Friday, October 20, 2023

Speeding Up Bulk Data Loads in Oracle with Hints

Bulk loading volumes of data from staging tables into production tables is a common task, but a slow and tedious process using basic INSERT statements in Oracle. But using query hints can dramatically improve load times, as I recently discovered how dramatical it could be.

In our ETL process, we needed to load order data from a staging table to the main orders table. Our initial INSERT statement looked like:
    INSERT INTO Orders (order_id, customer_id, order_date, total_amount)
       SELECT order_id, customer_id, order_date, total_amount
         FROM Staging_Orders;
This took many hours to millions of rows. But then we came across hint syntax that lets you fine-tune SQL behavior.
Adding a parallel and direct-path hint like:
INSERT /*+ APPEND PARALLEL(4) */ INTO Orders (order_id, customer_id, order_date, total_amount)
SELECT order_id, customer_id, order_date, total_amount
FROM Staging_Orders;
In the above SQL statement, the /*+ APPEND */ hint instructs Oracle to perform a Direct Path Insert for faster data loading. and the PARALLEL(4) hint instructs Oracle to use parallel processing with four parallel execution servers (you can adjust the number as needed based on your system's resources).

The results were mind-blowing - loading finished in under 10 minutes! For big data loads, enabling direct-path INSERT with parallelism cuts the time from hours to minutes.

I hope this example gives you some ideas on optimizing bulk data operations in Oracle through hints. Let me know if you have any other tips for supercharged ETL performance!

Thursday, March 16, 2023

Leveraging Oracle's Compress and Partition Features for Efficient Data Management

Recently, one of my clients loaded a substantial amount of data into an Oracle database table, which consumed an excessive amount of disk space and resulted in sluggish query performance. To address these issues, I suggested implementing Oracle's compress and partition table features. By applying these techniques, we successfully saved disk space and improved query performance.
In this example, we will create a table that combines both compression and partitioning with an interval. The table will store a sales transaction dataset, which includes columns for transaction ID, product ID, date, and revenue. We will use range partitioning with an interval of 1 month and enable Advanced Row Compression for the table. Creating the table with compression and interval partitioning:
CREATE TABLE sales_transactions
(
  transaction_id NUMBER,
  product_id NUMBER,
  transaction_date DATE,
  revenue NUMBER
)
PARTITION BY RANGE (transaction_date) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION sales_transactions_initial VALUES LESS THAN (TO_DATE('2023-01-01', 'YYYY-MM-DD'))
)
COMPRESS FOR OLTP;
In this example, we have created a range-partitioned table with an interval of 1 month. The PARTITION BY RANGE clause specifies that we are partitioning the table based on the transaction_date column. The INTERVAL clause defines a 1-month interval for partition creation using the NUMTOYMINTERVAL function. The table also has an initial partition, sales_transactions_initial, that holds data for dates before '2023-01-01'. Lastly, we have added the COMPRESS FOR OLTP clause to enable Advanced Row Compression for the table. Inserting data into the table:
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (1, 101, TO_DATE('2022-12-15', 'YYYY-MM-DD'), 1000);
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (2, 102, TO_DATE('2023-01-05', 'YYYY-MM-DD'), 1500);
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (3, 103, TO_DATE('2023-01-20', 'YYYY-MM-DD'), 2000);
INSERT INTO sales_transactions (transaction_id, product_id, transaction_date, revenue)
  VALUES (4, 104, TO_DATE('2023-02-15', 'YYYY-MM-DD'), 2500);

Querying the table:
SELECT * FROM sales_transactions 
   WHERE transaction_date >= TO_DATE('2023-01-01', 'YYYY-MM-DD') 
   AND transaction_date < TO_DATE('2023-02-01', 'YYYY-MM-DD');
This query retrieves sales transactions for January 2023. Oracle will use partition pruning to read only the relevant partition(s) for the specified date range, improving query performance. By combining compression and interval partitioning, you can optimize storage space, manage large tables efficiently, and enhance query performance in your Oracle database.