Thursday, March 16, 2023

Leveraging Oracle's Compress and Partition Features for Efficient Data Management

Recently, one of my clients loaded a substantial amount of data into an Oracle database table, which consumed an excessive amount of disk space and resulted in sluggish query performance. To address these issues, I suggested implementing Oracle's compress and partition table features. By applying these techniques, we successfully saved disk space and improved query performance.
In this example, we will create a table that combines both compression and partitioning with an interval. The table will store a sales transaction dataset, which includes columns for transaction ID, product ID, date, and revenue. We will use range partitioning with an interval of 1 month and enable Advanced Row Compression for the table. Creating the table with compression and interval partitioning:
CREATE TABLE sales_transactions
(
  transaction_id NUMBER,
  product_id NUMBER,
  transaction_date DATE,
  revenue NUMBER
)
COMPRESS FOR OLTP
PARTITION BY RANGE (transaction_date) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION sales_transactions_initial VALUES LESS THAN (TO_DATE('2023-01-01', 'YYYY-MM-DD'))
);
Here we have created a range-partitioned table with a 1-month interval. The PARTITION BY RANGE clause specifies that the table is partitioned on the transaction_date column, and the INTERVAL clause, using the NUMTOYMINTERVAL function, tells Oracle to create a new partition automatically for each month of data. The table has an initial partition, sales_transactions_initial, that holds data for dates before '2023-01-01'. We have also added the COMPRESS FOR OLTP clause, placed before the partitioning clause, to enable Advanced Row Compression for the table. Inserting data into the table:
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (1, 101, TO_DATE('2022-12-15', 'YYYY-MM-DD'), 1000);
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (2, 102, TO_DATE('2023-01-05', 'YYYY-MM-DD'), 1500);
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (3, 103, TO_DATE('2023-01-20', 'YYYY-MM-DD'), 2000);
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (4, 104, TO_DATE('2023-02-15', 'YYYY-MM-DD'), 2500);
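Because interval partitioning creates the monthly partitions automatically as rows arrive, you may want to confirm what Oracle actually created after the inserts above. The following check is a minimal sketch using the USER_TAB_PARTITIONS data dictionary view (it assumes you are connected as the table owner); the COMPRESS_FOR column should also reflect the Advanced Row Compression setting inherited from the table:
-- List the partitions of sales_transactions, including the ones Oracle
-- created automatically for the 1-month interval, with their upper bounds
-- and compression attribute.
SELECT partition_name, high_value, compress_for
  FROM user_tab_partitions
 WHERE table_name = 'SALES_TRANSACTIONS'
 ORDER BY partition_position;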

Querying the table:
SELECT * FROM sales_transactions 
   WHERE transaction_date >= TO_DATE('2023-01-01', 'YYYY-MM-DD') 
   AND transaction_date < TO_DATE('2023-02-01', 'YYYY-MM-DD');
This query retrieves sales transactions for January 2023. Oracle will use partition pruning to read only the relevant partition(s) for the specified date range, improving query performance. By combining compression and interval partitioning, you can optimize storage space, manage large tables efficiently, and enhance query performance in your Oracle database.
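If you want to verify that partition pruning is taking place, one simple check (a sketch, not the only approach) is to generate the execution plan for the query and look at the Pstart/Pstop columns in the plan output, which show the range of partitions Oracle will actually scan:
-- Generate and display the execution plan for the January 2023 query.
EXPLAIN PLAN FOR
SELECT * FROM sales_transactions
   WHERE transaction_date >= TO_DATE('2023-01-01', 'YYYY-MM-DD')
   AND transaction_date < TO_DATE('2023-02-01', 'YYYY-MM-DD');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);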

Wednesday, June 15, 2022

Convincing an Intern to Perform Data Science Tasks Using SQL

An intern has joined my client's company to work on a data science task, and I will be helping her along the way. She is familiar with the Python and R programming languages and has completed a number of projects with these tools. However, I have convinced her that it is a good idea to learn the database query language SQL and to use it to perform most of her data work. These are the four reasons I gave:
  1. Unlike in an academic environment, business data are mostly stored in relational databases. It is economical and secure to perform data analytics/data science work within databases using SQL.
  2. SQL is a standard language. SQL scripts can run, with minor or no changes, on any relational database, such as Snowflake, Oracle, SQL Server, MySQL, PostgreSQL, etc.
  3. Most data analytics/data science processes can be implemented in SQL and deployed in databases (a small example follows this list).
  4. SQL was developed nearly 50 years ago. It is still widely used and will continue to be so for many years to come. Not many programming languages have survived such a long period of time. It is a good investment to learn SQL.
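As a small illustration of point 3, below is a sketch of the kind of analysis that can live entirely in the database. The table name (sales) and its columns (order_date, amount) are hypothetical; the aggregation and window function are standard SQL and should run with little or no change on Snowflake, Oracle, PostgreSQL, MySQL 8+, and (with EXTRACT replaced by DATEPART) SQL Server:
-- Monthly revenue plus a 3-month moving average, computed entirely in SQL.
-- The sales table and its columns are illustrative, not from a real schema.
SELECT order_month,
       monthly_revenue,
       AVG(monthly_revenue) OVER (ORDER BY order_month
                                  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
         AS moving_avg_3_months
  FROM (SELECT EXTRACT(YEAR FROM order_date) * 100 + EXTRACT(MONTH FROM order_date) AS order_month,
               SUM(amount) AS monthly_revenue
          FROM sales
         GROUP BY EXTRACT(YEAR FROM order_date) * 100 + EXTRACT(MONTH FROM order_date)) monthly
 ORDER BY order_month;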


Thursday, December 30, 2021

Upcoming Webinar: Optimize Inventory Safety Stock - A Fundamental Principle and Tool

Many factors involved with inventory affect the bottom line of a business, including the purchase price, selling price, bank interest rate, warehouse space cost, utility cost, and the cost of items being damaged, stolen, or becoming obsolete. Planners agree that balancing these factors and setting safety stock levels to improve overall profitability is by no means an easy task.

In the webinar, Dr. Jay Zhou will introduce a much-needed guiding principle that pinpoints the optimal safety stock level to maximize profitability. It is a powerful, quantitative principle, yet one straightforward enough to be easily understood by everyone. The benefits of the approach are many-faceted: fully measurable financial outcomes, improved planning productivity, and more.

Dr. Zhou will show how it works in practice in a live demo, using PeakProfit, a software tool from Friesian Analytics that implements the principle, to find the optimal safety stock levels for stock items convincingly and elegantly.

The webinar is designed for all inventory professionals and managers, regardless of whether they have 2 months or 20 years of working experience. Please register for the webinar, scheduled for Thursday, January 13, 2022, at 11:00 AM (EST), here.

Friday, December 24, 2021

Upcoming Webinar - Optimize Inventory Safety Stock - A Fundamental Principle and Tool

Without a clear and quantified guiding principle, planners find themselves having a hard time determining inventory safety stock levels. Often, it becomes an “art”, involving manual judgement and spreadsheets, rather than a science. It is not uncommon to see a company’s safety stock levels stay the same for years even though business conditions have changed drastically. As a result, the company’s bottom line is negatively impacted.

In the webinar, Dr. Jay Zhou will introduce a much-needed guiding principle that pinpoints the optimal safety stock level to maximize profitability. It is a powerful, quantitative principle, yet one straightforward enough to be easily understood by everyone. The benefits of the approach are many-faceted: fully measurable financial outcomes, improved planning productivity, and more.

Dr. Zhou will show how it works in practice in a live demo, using PeakProfit, a software tool from Friesian Analytics that implements the principle, to find the optimal safety stock levels for stock items convincingly and elegantly.

The webinar is designed for all inventory professionals and managers, regardless of whether they have 2 months or 20 years of working experience. Please register for the webinar, scheduled for Thursday, January 13, 2022, at 11:00 AM (EST), here.

Sunday, December 19, 2021

Upcoming Webinar: Optimize Inventory Safety Stock - A Fundamental Principle and Tool

Many planners take a trial-and-error approach, based on experience or guesswork, to set inventory safety stock levels. In the webinar, Dr. Jay Zhou will introduce a fundamental principle that pinpoints the optimal safety stock level to maximize profitability. It is a powerful, quantitative principle, yet one straightforward enough to be easily understood by everyone. The benefits of the approach are many-faceted: fully measurable financial outcomes, improved planning productivity, and more. Following the introduction of the principle, Dr. Zhou will show how it works in practice using PeakProfit, a software tool from Friesian Analytics that implements the principle.

The webinar is designed for all inventory professionals and managers, regardless of whether they have 2 months or 20 years of working experience. Please register for the webinar, scheduled for Thursday, January 13, 2022, at 11:00 AM (EST), here.

Sunday, November 28, 2021

We are hiring data engineers.

We are looking for multiple Data Engineer contractors for a six-month project; the positions may be remote. The engineers will help us build and maintain a Snowflake data warehouse that captures historical and ongoing business data. The data in the warehouse need to be cleaned, transformed, and merged to generate a holistic view of the business, which will serve as the foundation for downstream, value-added, in-depth analytics and machine learning.

Responsibilities
Build Snowflake data pipelines/ETL jobs to ingest data from text files, relational databases, and NoSQL databases.
Perform ongoing maintenance and administration of the Snowflake warehouse.
Design and implement Snowflake schemas including tables, views, and materialized views.
Understand the data from a business perspective and write SQL scripts based on business logic.
Manage data sharing and data access for business users.
Optimize data storage and warehouse query performance.

Qualifications and Experience
Experience with Snowflake warehouse, data pipeline, Apache Airflow, and ETL tools in an enterprise environment.
Strong experience working with large data sets from multiple sources, including data cleansing, merging, and aggregation.
Strong SQL and Python programming skills. Knowledge in R programming is a plus.
Experience with MySQL, MongoDB, and Amazon AWS services (S3, SQS, Lambda, etc.).
Experience with business intelligence tools is a plus.
Experience with machine learning is a plus.


To apply for the position, please send your resume to us at jobs@sqlytics.com.

Friday, September 24, 2021

An Antique Neural Network for Image Classification

Quite accidentally, I found the picture below, which shows my research work from 1994, when I trained a neural network to identify land use types from a satellite image. The neural network had 3 input neurons, corresponding to the blue, green, and red bands of a Landsat TM image; 14 hidden neurons; and 7 output neurons, representing seven land use types, including cornfields, wheat fields, water bodies, impervious surfaces, etc. The number of training data points was 65.

I coded everything from scratch in C++, including the algorithm and the user interface. There were no prebuilt packages to use, no internet to search for information, and no such thing as asking questions on social media. The computer I used had a 486 CPU, 32 megabytes of memory, and a hard drive with less than 500 megabytes of storage.

How things have changed in 27 years! With a deep learning neural network, such as a convolutional neural network, many layers of neurons serving different purposes can be stacked together to form a complex structure and collectively perform recognition tasks that were unthinkable in the past. All of this can be done with a few lines of Python, and the execution often takes place in a cloud computing environment with virtually unlimited computation and storage resources. The progress is astonishing.