Wednesday, April 02, 2014

Compare Oracle Tables in Different Databases: Find an ETL Issue

Problem

There are situations where we need to compare tables or views located in different database. For example, a few years ago I was involved in a project to help building a data warehouse to achieve the bank card transactions in production. The card transactions in the production Oracle database are moved through an ETL process to the data warehouse, another Oracle database. The key questions are: how do we know that the data values are the same after the movement through ETL? How do we compare tables in different databases? (We do not automatically assume that the ETL tool does the data movement job correctly. It turned out the some data values actually changes by the ETL.)

One of ways to compare table across Oracle databases is to use database link. Before we create database link using the following command, the user may need to be granted the "create database link " privilege.

SQL> create database link dl_db1 connect to user1 identified by abc123 
using '(description=(address=(protocol=TCP)(host=server1)(port=1521))
(connect_data=(sid=ORCLID)))';
Once the database link is created, we can compare a table the current database to that in another database. To access the table in another database that the database link points to, we specify "@dl_db1" after the table name as shown below. We can query remote table just like local table.
SQL> select count(1) from tbl_123 a, tbl_123@dl_db1 b where a.id=b.id;
In the project that I mentioned above, we compared the tables before and after being moved by ETL and identified that in some cases values changed. Further investigation showed that it was caused by that the ETL tool converted tables in the source database into text files and then loaded them into tables into the destination database. It is usually not a good idea to dump tables text files as some information are lost during the process.

No comments: