Saturday, September 28, 2013

Generate SQL Create Table/View Queries for Existing Tables/Views

Sometimes it is useful to keep a copy of the DDL statements, i.e., the CREATE TABLE/VIEW statements, for all or some of our tables and views so that we can recreate them later. The function dbms_metadata.get_ddl() can be used for this.

SQL> select dbms_metadata.get_ddl('VIEW','V_6K_OBS') from dual;

DBMS_METADATA.GET_DDL('VIEW','V_6K_OBS')
CREATE OR REPLACE FORCE VIEW "BDM"."V_6K_OBS" ("TABLE_NAME", "COLUMN_NAME", "DATA_TYPE", "DATA_TYPE_MOD", "DATA_TYPE_OWNER", "DATA_LENGTH", "DATA_PRECISION", "DATA_SCALE", "NULLABLE", "COLUMN_ID", "DEFAULT_LENGTH", "DATA_DEFAULT", "NUM_DISTINCT", "LOW_VALUE", "HIGH_VALUE", "DENSITY", "NUM_NULLS", "NUM_BUCKETS", "LAST_ANALYZED", "SAMPLE_SIZE", "CHARACTER_SET_NAME", "CHAR_COL_DECL_LENGTH", "GLOBAL_STATS", "USER_STATS", "AVG_COL_LEN", "CHAR_LENGTH", "CHAR_USED", "V80_FMT_IMAGE", "DATA_UPGRADED", "HISTOGRAM") AS select "TABLE_NAME","COLUMN_NAME","DATA_TYPE","DATA_TYPE_MOD","DATA_TYPE_OWNER","DATA_LENGTH","DATA_PRECISION","DATA_SCALE","NULLABLE","COLUMN_ID","DEFAULT_LENGTH","DATA_DEFAULT","NUM_DISTINCT","LOW_VALUE","HIGH_VALUE","DENSITY","NUM_NULLS","NUM_BUCKETS","LAST_ANALYZED","SAMPLE_SIZE","CHARACTER_SET_NAME","CHAR_COL_DECL_LENGTH","GLOBAL_STATS","USER_STATS","AVG_COL_LEN","CHAR_LENGTH","CHAR_USED","V80_FMT_IMAGE","DATA_UPGRADED","HISTOGRAM" from user_tab_columns where rownum <=6000

SQL> select dbms_metadata.get_ddl('TABLE','MV_UNIVAR_STS') from dual;

DBMS_METADATA.GET_DDL('TABLE','MV_UNIVAR_STS')
CREATE TABLE "BDM"."MV_UNIVAR_STS" ( "FILENAME" VARCHAR2(32), "C" NUMBE R, "TOT" NUMBER, "TOT_DIS" NUMBER, "MI_VAL" VARCHAR2(512), "MX_VAL" VARC HAR2(512) ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MA XTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "BDM"

Generate SQL Queries "Automatically"

We do not have to write every SQL query manually. It is often much more efficient to generate SQL statements "automatically" using queries. Of course, we can also use SQL queries to generate statements in other programming languages such as C/C++. That is what I did when I took a Computer Software Engineering course at university: in a number of projects, I used SQL queries to automatically generate large quantities of neatly formatted C++ code for many object classes, and I earned a good grade.

For example, the following query generates a set of queries that calculate the number of records and the average value for every NUMBER column whose name ends with "AMT1" in tables whose names begin with "TBL".

SQL> select 'select '||''''||table_name||''''||','||''''||column_name||''''||', count(*), avg('||column_name||') avg_value from '|| table_name||';' from user_tab_columns where table_name like 'TBL%' and data_type = 'NUMBER' and column_name like '%AMT1';

select 'TBL_FRAUDDETAIL_HIST','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_FRAUDDETAIL_HIST;
select 'TBL_MATCHED_SO_FAR1124','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_MATCHED_SO_FAR1124;
select 'TBL_MATCHED_SO_FAR1201','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_MATCHED_SO_FAR1201;
select 'TBL_MATCHED_SO_FAR1226','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_MATCHED_SO_FAR1226;
select 'TBL_TXN_4_POC1','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_4_POC1;
select 'TBL_TXN_4_POC2','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_4_POC2;
select 'TBL_TXN_4_POC3','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_4_POC3;
select 'TBL_TXN_4_POC4','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_4_POC4;
select 'TBL_TXN_4_POC5','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_4_POC5;
select 'TBL_TXN_4_POC6','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_4_POC6;
select 'TBL_TXN_FOR_POC_EXT','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_FOR_POC_EXT;
select 'TBL_TXN_FOR_POC_EXT2','MD_TRAN_AMT1', count(*), avg(MD_TRAN_AMT1) avg_value from TBL_TXN_FOR_POC_EXT2;

12 rows selected.

The following query generates queries that count the distinct values of every character (VARCHAR) column in tables whose names start with "DEMO".

SQL> select 'select '||''''||table_name||''''||','||''''||column_name||''''||', count(distinct '||column_name||') from '||table_name||';' from user_tab_columns where table_name like 'DEMO%' and data_type like 'VAR%';

select 'DEMO_CUSTOMERS_LOCAL','CUST_FIRST_NAME', count(distinct CUST_FIRST_NAME) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','CUST_LAST_NAME', count(distinct CUST_LAST_NAME) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','CUST_STREET_ADDRESS1', count(distinct CUST_STREET_ADDRESS1) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','CUST_STREET_ADDRESS2', count(distinct CUST_STREET_ADDRESS2) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','CUST_CITY', count(distinct CUST_CITY) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','CUST_STATE', count(distinct CUST_STATE) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','CUST_POSTAL_CODE', count(distinct CUST_POSTAL_CODE) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','PHONE_NUMBER1', count(distinct PHONE_NUMBER1) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','PHONE_NUMBER2', count(distinct PHONE_NUMBER2) from DEMO_CUSTOMERS_LOCAL;
select 'DEMO_CUSTOMERS_LOCAL','CUST_EMAIL', count(distinct CUST_EMAIL) from DEMO_CUSTOMERS_LOCAL;

10 rows selected.
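
To actually run the generated statements, in SQL*Plus we can spool them to a script file and then execute that file, so the whole exercise becomes a hands-off, two-step process. Below is a minimal sketch of this workflow; the file name gen_counts.sql is just an example, and the "set" commands only suppress headings and feedback so that the spooled file contains nothing but the generated queries.

SQL> set heading off feedback off pagesize 0 linesize 200
SQL> spool gen_counts.sql
SQL> select 'select '''||table_name||''','''||column_name||''', count(distinct '||column_name||') from '||table_name||';' from user_tab_columns where table_name like 'DEMO%' and data_type like 'VAR%';
SQL> spool off
SQL> @gen_counts.sql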

Create Database Link to DB on Amazon EC2 Instance

From a database, we can create a database link to another remote database, such as one on an Amazon EC2 virtual server, as shown below. Here, ec2-12-34-567-899.compute-1.amazonaws.com is the Amazon EC2 Linux instance's public DNS name.

SQL> create database link dl_aws_ec2 connect to prod_DB identified by PWDXXX using '(description=(address=(protocol=TCP)(host=ec2-12-34-567-899.compute-1.amazonaws.com)(port=1521))(connect_data=(sid=XE)))';
Database link created.
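
If the remote database already has an entry in the local tnsnames.ora, the link can also be created more compactly by referring to that TNS alias instead of spelling out the whole connect descriptor. A sketch, where dl_aws_ec2_tns and aws_xe are hypothetical link and alias names:

SQL> create database link dl_aws_ec2_tns connect to prod_DB identified by PWDXXX using 'aws_xe';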

SQL> select count(1) from user_tables@dl_aws_ec2;

COUNT(1)
15

Just as with any database, we can list a few tables whose names start with "DEMO", count the number of records in them, and, if we want, make local copies of the tables.

SQL> select table_name from user_tables@dl_aws_ec2 where table_name like 'DEMO%' and rownum <5 order by table_name;

TABLE_NAME
DEMO_CUSTOMERS
DEMO_ORDERS
DEMO_ORDER_ITEMS
DEMO_PAGE_HIERARCHY

SQL> select count(*) from DEMO_CUSTOMERS@dl_aws_ec2;

COUNT(*)
7

SQL> create table DEMO_CUSTOMERS_LOCAL as select * from DEMO_CUSTOMERS@dl_aws_ec2;
Table created.

Sunday, September 22, 2013

Trim Function - Remove Leading and Trailing Blanks

Leading and trailing blanks can be removed with the Oracle trim function, as shown below.
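
A simple way to see the effect is to list each string next to its trimmed version together with their lengths. The query below is only a sketch of such a check; tbl_str and its columns id and str are hypothetical names, since the table used in the original example is not shown here.

SQL> select id, str, trim(str), length(str), length(trim(str)) from tbl_str order by id;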

If we look at the lengths of the original and trimmed strings (columns 4 and 5), we notice that the fourth record has 4 blanks in the original string. After trimming, however, that string becomes NULL (length zero). If we want to keep one blank for such a record, we can use the NVL function to replace the NULL with a single blank.

In one project, I used an expression similar to the following to fix a data field in debit card transactions.

nvl(substr( trim(SD_TERM_NAME_LOC),1,18),' ')

What I did was to first remove leading and trailing blanks, then extract the first 18 characters (in case the terminal name is too long). If the terminal name is all blanks or NULL, I replace it with a single blank. This data preparation step is necessary before we build a predictive model.
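
For example, the cleanup can be applied in place with an update along these lines; tbl_debit_txn is a hypothetical table name (the actual table from the project is not shown), while SD_TERM_NAME_LOC is the column discussed above.

SQL> update tbl_debit_txn set SD_TERM_NAME_LOC = nvl(substr(trim(SD_TERM_NAME_LOC),1,18),' ');
SQL> commit;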

Some Observations on NULL Value Handling in Oracle SQL

We need to be aware of how NULL/missing values are handled in SQL queries so that we will not be surprised by results that appear "wrong". This is described using the following simple table as an example. The fifth record has a NULL value.

SQL> select id, value from tbl_data order by id;

ID VALUE
1 -1
2 0
3 1
4 2
5  

If we calculate the total number of records in the table, the number of records with value >= 0, and the number with value < 0, they are 5, 3 and 1, respectively, as shown below. Notice that the number of records with value >= 0 (3) plus the number with value < 0 (1) is less than the total number of records (5). This is because records whose value is NULL satisfy neither condition in the where clause and are therefore excluded from both counts.

SQL> select count(*) from tbl_data;

COUNT(*)
5

SQL> select count(*) from tbl_data where value>=0;

COUNT(*)
3

SQL> select count(*) from tbl_data where value<0;

COUNT(*)
1

A better way to calculate this kind of statistics is to use "case when" instead of the where clause, as shown below.

SQL> select count(*) total, sum(case when value>=0 then 1 else 0 end) non_negative, sum(case when value<0 then 1 else 0 end) negative, sum(case when value is null then 1 else 0 end) n_missing from tbl_data;

TOTAL NON_NEGATIVE NEGATIVE N_MISSING
5 3 1 1

Or, even better, we can combine "case when" with "group by" to calculate the statistics. "Group by" is one of my favorites, as it gives a complete picture of the data (including NULL values).

SQL> select v, count(*) from (select case when value>=0 then 'non-negative' when value <0 then 'negative' else 'missing' end v from tbl_data) group by v;

V COUNT(*)
negative 1
non-negative 3
missing 1

In summary, if we are aware of how NULL values are handled in the database, we will not be surprised by query results that appear "wrong".