Friday, December 25, 2015

Convert Values Using Decode Function

We use decode() function to convert a value to other values. We have the following table.

SQL> select * from tbl_test order by id;

     ID MESSAGE
------- --------------------------------
      1 A
      2 B
      3 C
      4 D
The following query transforms characters values into numbers or weights.
SQL> select a.*, decode(message, 'A', 0.3, 'B',0.4, 0.35) weight from tbl_test a order by id;

     ID MESSAGE                              WEIGHT
------- -------------------------------- ----------
      1 A                                        .3
      2 B                                        .4
      3 C                                       .35
      4 D                                       .35
In the above query, A is mapped to 0.3, B to 0.4. All other values are mapped to the default value of 0.35. If the default value is not defined, other characters where the mapping is not defined will be given a NULL value.
SQL> select a.*, decode(message, 'A', 0.3, 'B',0.4) weight from tbl_test a order by id;

     ID MESSAGE                              WEIGHT
------- -------------------------------- ----------
      1 A                                        .3
      2 B                                        .4
      3 C
      4 D
I have used decode function to convert categorical values into weights that are derived from a logistic regression model. For example, I used the following query to convert transaction code into weights. Before decode function is applied, CARD_TXN_CDE is cleaned up using trim, substr and nvl functions. The trim function removes leading and trailing blanks. The substr function extracts the first 18 characters of the string. The nvl function converts NULL values to blank.
decode(nvl(substr( trim(CARD_TXN_CDE),1,18),' '),
' ',-.660841,
'01',-.518927,
'1',-.076546,
'10',-.294631,
'12',.077699,
'14',-.709884,
'25',-.30619,
'40',.021855,
'51',-.004593,
'52',-.069521,
'53',-.000344,
'57',.570421,
'59',.858444,
'67',1.481654,
'81',.29988,
'91',-.004755,
'96',.02628,
'AD',-.001036,
'AK',-.150162,
'CB',-.001588,
'D5',.364975,
'G',-.015795,
'H',-.274374,
'I',-1.065177,
'J',-.027991,
'N',.157622,
'NK',.763406,
'NR',.080558,
'P',-.961133,
'PE',-.19558,
'PH',.479081,
'PR',.134741,
'S',-.239287,
'SV',.475934,
'T1',.241061,
'T2',-.277572,
'T3',.901487,
'T4',-.0137,
'TS',.01362,
'U2',-.186914,
'X',1.301152,
'XX',-.11462,
'Z0',-.7141,
'Z4',-.004642,
'Z6',-.014541,
0)

Thursday, July 30, 2015

Remove the Rows with Lowest or Highest Values

In a project, I need to remove rows with lowest or highest values within groups. I used the following table to illustrate the problem. For grp 10, rows with val 1 and 4 need to be removed and for grp 11, rows with val 5.5 and 9.6.

SQL> select * from tbl_x2 order by 1,2;

       GRP        VAL
---------- ----------
        10          1
        10          2
        10          3
        10          4
        11        5.5
        11        7.3
        11        9.6

7 rows selected.
I used the following SQL statement to do it.
delete from tbl_x2 a
where a.rowid in
(
select rid from
(
with tbl as (
select b.rowid rid,
row_number() over(partition by grp order by val) rnk,
row_number() over(partition by grp order by val desc) rnkr
from tbl_x2 b)
select rid from tbl where rnk=1 or rnkr=1
)
);
The following query shows that rows with lowest or highest values within groups are removed.
SQL> select * from tbl_x2 order by 1,2;

       GRP        VAL
---------- ----------
        10          2
        10          3
        11        7.3
The trick is to use row_number() function to generate ranks or reverse ranks based on val for each grp. For example, the following query show all rows in the original table and the ranks (rnk) and reversed ranks (rnkr). The lowest val has a rank of 1 and the highest val has a reversed rank of 1.
select b.*,
row_number() over(partition by grp order by val) rnk,
row_number() over(partition by grp order by val desc) rnkr
from tbl_x3 b

       GRP        VAL        RNK       RNKR
---------- ---------- ---------- ----------
        10          1          1          4
        10          2          2          3
        10          3          3          2
        10          4          4          1
        11        5.5          1          3
        11        7.3          2          2
        11        9.6          3          1

Friday, April 10, 2015

Extract SQL Source Code For Objects From Datapump Dumpfile

When we dump the whole schema as a dumpfile, the SQL statements for creating those database objects (DDL) can be extracted from the dumpfile.
First, we create the dumpfile using Oracle datapump export utility expdp.

expdp myuser/myuser dumpfile=myuserdump.dmp directory=myuser_dir
We use Oracle datapump import utility impdp and specify "sqlfile=".
impdp myuser/myuser dumpfile=myuserdump.dmp directory=myuser_dir 
sqlfile=myuserdump.sql
When "sqlfile=" is used, data import is not actually done. Instead, it produces a file containing the SQL statements to create database objects. For example, the following "create table" and "create index" are taken from the sqlfile produced by impdp. We can simple run those SQL queries to create objects.
CREATE TABLE "MYUSER"."DMPZKM" 
   ( "CLUSTER_ID" NUMBER, 
 "RECORD_COUNT" NUMBER, 
 "PARENT" NUMBER, 
 "TREE_LEVEL" NUMBER, 
 "DISPERSION" BINARY_DOUBLE, 
 "NORM" BINARY_DOUBLE, 
  CONSTRAINT "DMPSKM" PRIMARY KEY ("CLUSTER_ID") ENABLE
   ) ORGANIZATION INDEX NOCOMPRESS PCTFREE 10 INITRANS 2 MAXTRANS 255 LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT);

CREATE UNIQUE INDEX "MYUSER"."DMPQKM" ON "MYUSER"."DMPXKM" 
("CLUSTER_ID", "ATTRIBUTE_NAME", "ATTRIBUTE_SUBNAME", "BIN_ID") 
  PCTFREE 10 INITRANS 2 MAXTRANS 164 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT)
  TABLESPACE "DMUSER" PARALLEL 1 ;

Sunday, April 05, 2015

Table Joining Using Function Based Index

The post Create Function Based Index on Oracle Table shows that indexes can be created based on functions. Here is an example of the actual use of function based index in table joining. In a project, I try to join the following two tables based on a unique sequence numbers:
1. Checks deposited, tbl_checks_deposited.
2. Checks that are returned, tbl_checks_returned.
Both tables contain column sequence_ids that are unique for each record and are supposed to link these two tables. I used the following query to join them.

select a.*, b.return_code
from  tbl_checks_deposited a,  tbl_checks_returned b
where a.sequence_id=b.sequence_id;
However, I did not get many successful matches. After some investigation, I have found that the sequence_id in the second table tbl_checks_returned has extra leading and trailing blanks. This could happen when the data is loaded from text files into an Oracle database. I used trim() function to remove the leading and trailing blanks and redo the join using the following query. To make the join faster, I first create a function based index on table tbl_checks_returned.
create index tbl_checks_returned_idx on tbl_checks_returned(trim(sequence_id));

select a.*, b.return_code
from  tbl_checks_deposited a,  tbl_checks_returned b
where a.sequence_id=trim(b.sequence_id);
Now the join returns the correct number of records.

Remove Character at Specific Location in String

In the post Remove the Trailing Character From a String Using SQL, we use substr SQL function. To remove a character at a specific location in a string, I create a function that takes advantage of substr again as shown below.

create or replace function remove_at(
   str IN varchar2,
   pos IN number)
return varchar2
is
begin
return substr(str,1,pos-1)||substr(str, pos+1, length(str));
end;
/
Now, I apply this function using different locations and the results look correct.
SQL> select id, val from tbl_val;

     ID VAL
------- --------------------------------
      1 abcdef
SQL> select id, remove_at(val,3) val from tbl_val;

     ID VAL
------- --------------------------------
      1 abdef

SQL> select id, remove_at(val,5) val from tbl_val;

     ID VAL
------- --------------------------------
      1 abcdf

SQL> select id, remove_at(val,1) val from tbl_val;

     ID VAL
------- --------------------------------
      1 bcdef

SQL> select id, remove_at(val,0) val from tbl_val;

     ID VAL
------- --------------------------------
      1 abcdef

SQL> select id, remove_at(val,10) val from tbl_val;

     ID VAL
------- --------------------------------
      1 abcdef

Saturday, April 04, 2015

Remove the Trailing Character From a String Using SQL

We can using SQL function substr to remove the last character from a string. For example, the following query shows there is a extra comma at the end of the string. Please also see Remove Character at Specific Location in String.

SQL> select * from tbl_test;

     ID VAL
------- --------------------------------
      1 Hello,
The following query will remove it. Substr takes three parameters. The first parameter is the string to process, the second is the starting position and the third is the length of the substring to extract. Here, we define the length to be the total length of the original string minus one.
SQL>  select id, substr(val, 1, length(val)-1) val from tbl_test;

     ID VAL
------- --------------------------------
      1 Hello

Convert Oracle Number to Hex and Vice Verse

To convert a hexadecimal number (base 16) to a decimal number, we use to_number function and specify the format as 'XXXXX' as shown below.

SQL> select to_number('30003','XXXXXX') from dual;

TO_NUMBER('30003','XXXXXX')
---------------------------
       196611
We can verify the conversion by using the following query.The fifth position from the right represents 16 to the 4th power.
SQL> select power(16,4)*3+3 from dual;

POWER(16,4)*3+3
---------------
         196611
To convert a decimal number to a hex number, we use to_char() function.
SQL> select to_char(196611, 'XXXXXX') from dual;

TO_CHAR
-------
  30003

Friday, April 03, 2015

Chi-square Test in Oracle Database

When we want to test the association of two categorical variables, such as sex and certain disease, we may use Chi-square test. Fortunately, with Oracle statistical function stats_crosstab, this can be done using a SQL query. For example, we have a table containing two columns of discrete values, sex and target.

SQL> select sex, target, count(*) num from tbl_chiqs group by sex, target
 order by sex, target;

S     TARGET        NUM
- ---------- ----------
F          0        152
F          1         13
M          0        411
M          1         26
We want to find out if sex and target are associated. Since both are categorical variables, we perform Chi-square test using a SQL query.
select
stats_crosstab(sex, target, 'CHISQ_OBS') chi_squared,
stats_crosstab(sex, target, 'CHISQ_SIG') p_value
from
tbl;

CHI_SQUARED    P_VALUE
----------- ----------
 .735719671 .391035481   
Based on the p value of the Chi-square statistics, we can not reject the NULL hypothesis that sex and target are independent given the significant level of 0.05. As we see, it is very convenient to perform statistical tests such as Chi-square, within a database using SQL query.It is not necessary to move the data to another statistical software to perform those tests.

Thursday, April 02, 2015

Oracle Impdp Remap Tablespace

When I import a Oracle dump file produced by datapump export utility expdp from another database instance, I get the following error. This is caused the the default tablespace for the user is different.


shell> impdp myuser/mypassword directory=DM_DUMP dumpfile=tmpm_r_pmml.dmp

ORA-39083: Object type TABLE:"MYUSER"."DATA_TRAIN" failed to create with error:
ORA-00959: tablespace 'MYUSER' does not exist
To find our the default tablepsace name for the current user, I log onto the user.
Shell> sqlplus myuser/mypassword

SQL> select default_tablespace from user_users;

DEFAULT_TABLESPACE
------------------------------
TBS

I add "remap_tablespace=DMUSER:TBS" to impdp command and the dump file is imported correctly.
impdp dmuser/dmuser directory=DM_DUMP dumpfile=tmpm_r_pmml.dmp 
   remap_tablespace=MYUSER:TBS

Thursday, March 26, 2015

Best Practice In Loading Text Files Into Oracle Database

The post Five Ways of Loading Text Files Into Oracle Database shows common ways of loading test files. There are many issues when loading text files into an Oracle database. These issues include:

  • Data format conversions such as strings to numbers and strings to dates
  • NULl value representation in text files. NUll are ., n/a, null, blanks, etc.?
  • Other data representation issues such as $ in front of amount.
Thus it is important to find out effective ways to handle those data issues and make sure data are indeed loaded correctly. For many projects, I have used the following process of importing text files into an Oracle database.
  • Step 1. Define external tables pointing to the text files. In the external tables, we simply define all columns as string (varchar2). Creating an external table does not physically load the data. It just contains the "pointer" to the files. We can run SQL queries against an external table just like a regular table.
  • Step 2. Perform data cleansing and conversion using SQL functions and store the data into tables or materialized views.
For example, the following a few lines of text files to be loaded.
10,"ABASTECEDORA NAVAL Y INDUSTRIAL, S.A.",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
15,"ABDELNUR, Nury de Jesus","individual","CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
25,"ACEFROSTY SHIPPING CO., LTD.",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
36,"AEROCARIBBEAN AIRLINES",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
39,"AEROTAXI EJECUTIVO, S.A.",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
41,"AGENCIA DE VIAJES GUAMA",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
First, we create an external table as shown below. All columns are simply defined as varchar2 as we will do the data conversion later.
create directory dir_files as '/home/data';
create table tbl_sdn_ext(
  v1  varchar2(512),
  v2  varchar2(512),
  v3  varchar2(512),
  v4  varchar2(512),
  v5  varchar2(512),
  v6  varchar2(512),
  v7  varchar2(512),
  v8  varchar2(512),
  v9  varchar2(512),
  v10  varchar2(512)
)
organization external
( type oracle_loader
default directory DIR_ORA_EXT
access parameters
( records delimited by newline
skip 1
fields terminated by ','
optionally enclosed by '"'
missing field values are null
)
location('sdn.csv')
);
Once we have the external table, we can apply SQL functions to do various data conversions and store the results into tables physically as shown below.
create table tbl_sdn2
as
with tbl as
(
select
case when trim(V1)='-0-' then null else upper(trim(V1)) end V1,
case when trim(V2)='-0-' then null else upper(trim(V2)) end V2,
case when trim(V3)='-0-' then null else upper(trim(V3)) end V3,
case when trim(V4)='-0-' then null else upper(trim(V4)) end V4,
case when trim(V5)='-0-' then null else upper(trim(V5)) end V5,
case when trim(V6)='-0-' then null else upper(trim(V6)) end V6,
case when trim(V7)='-0-' then null else upper(trim(V7)) end V7,
case when trim(V8)='-0-' then null else upper(trim(V8)) end V8,
case when trim(V9)='-0-' then null else upper(trim(V9)) end V9,
case when trim(V10)='-0-' then null else upper(trim(V10)) end V10
from tbl_sdn_ext
)
select
v1 sdn_id,
v2 full_name,
trim(substr(v2, 1, regexp_instr(v2,',')-1)) lnm,
trim(substr(v2, regexp_instr(v2,',')+1, length(v2))) fnm,
v3 dtype,
v4 country
from
tbl;

Tuesday, March 24, 2015

Most Frequently Used SQL Functions by Data Scientist

As a data scientist, I have done many projects for banks, insurance companies, marketing firms, etc. Most of the projects involve building predictive models. I have won three head to head competitions for building best predictive models.

In a typical project, I receive the data, usually in the format for test files, from a client. Then I load them into Oracle databases. I do most of the data manipulation in the database using SQL. As a result of years of work, I have created close to 800 SQL script files under my work directories on my computers. I figure out it would be interesting to see what are the most frequently used SQL functions from a data scientist's perspective. So I perform a simple analysis on my SQL scripts. I have found that the following 45 functions accounts for about 65% of Oracle SQL functions that I have used for various projects. The following table shows ranks of frequencies of those functions as they present in my SQL scripts. As we may see, a typical data analytics project only involves a relatively small number of frequently used SQL functions.

We may have the following observations: 1. Many of the functions are used for data cleanse and data conversion including trim, nvl, decode,to_char, to_number, trunc, substr,etc. This precisely reflects the fact that at least 80% of the work is about data manipulation. 2. Some simple statistics functions include sum, count, min, max, stddev, corr, median. It is convenient to calculate useful statistics using SQL. We may use those SQL functions to produce reports similar to that by SAS UNIVARIATE.

Rank SQL Function Frequency
1  trim 10.03%
2  sum 8.22%
3  nvl 5.93%
4  count 5.67%
5  decode 5.38%
6  to_char 4.63%
7  min 3.53%
8  max 3.33%
9  to_number 2.25%
10  trunc 2.22%
11  row_number 1.97%
12  avg 1.88%
13  round 1.33%
14  to_date 1.20%
15  substr 1.07%
16  sign 0.93%
17  upper 0.77%
19  stddev 0.45%
20  mod 0.35%
21  lower 0.30%
22  exp 0.28%
23  ratio_to_report 0.27%
24  median 0.25%
25  abs 0.23%
26  lag 0.23%
27  dense_rank 0.22%
28  length 0.22%
29  greatest 0.18%
30  sys_guid 0.15%
31  ceil 0.15%
32  chr 0.13%
33  regexp_instr 0.10%
35  percent_rank 0.10%
36  replace 0.10%
37  regexp_replace 0.10%
38  cume_dist 0.08%
39  instr 0.07%
40  rtrim 0.07%
41  width_bucket 0.07%
42  prediction 0.07%
43  corr 0.05%
44  lead 0.03%
45  lpad 0.03%

Sunday, March 22, 2015

The 2015 Rexer Analytics Data Miner Survey has launched!

The 2015 Rexer Analytics Data Miner Survey has launched. The founder of Rexer Analytics, Karl Rexer, is regarded by Deep-data-mining.com as one of 10 Most Influential People in Data Analytics.

Analytic Professionals — Share your views: Participate in the 2015 Data Miner Survey

Data Analysts, Predictive Modelers, Data Scientists, Data Miners, and all other types of analytic professionals, students, and academics: Please participate in the 2015 Rexer Analytics Data Miner Survey.


Survey Link: www.rexeranalytics.com/Data-Miner-Survey-2015-Intro.html
Access Code: CL72L4It is OK to share this Access Code with others: It can be used by multiple people.
Survey results will be unveiled at the Fall-2015 Boston Predictive Analytics World event.

Rexer Analytics has been conducting the Data Miner Survey since 2007. Each survey explores the analytic behaviors, views and preferences of data miners and analytic professionals. Over 1200 people from around the globe participated in the 2013 survey. Summary reports (40 page PDFs) from previous surveys are available FREE to everyone who requests them by emailing DataMinerSurvey@RexerAnalytics.com. Also, highlights of earlier Data Miner Surveys are available at www.rexeranalytics.com/Data-Miner-Survey-Results-2013.html, including best practices shared by respondents on analytic success measurement, overcoming data mining challenges, and other topics. The FREE Summary Report for this 2015 Data Miner Survey will be available to everyone Fall-2015.

Please tell other data analysis professionals about the survey.

Rexer Analytics is a consulting firm focused on providing data mining and analytic CRM solutions. Recent solutions include customer loyalty analyses, customer segmentation, predictive modeling to predict customer attrition and to target direct marketing, fraud detection, sales forecasting, market basket analyses, and complex survey research. More information is available at www.RexerAnalytics.com or by calling +1 617-233-8185.

Saturday, March 21, 2015

Find Out Database Object Dependencies

Some database objects are depending other objects. For example, a view may query other tables or views. We can query user_dependencies to find out these dependencies.

SQL> create view v_x_y as select a.*, b.val val2 from t_x a, t_y b 
where a.id=b.id;

View created.
The following query shows that view v_x_y depends on two tables, t_x and t_y.
SQL>select * from user_dependencies where name='V_X_Y';

NAME, TYPE, REFERENCED_OWNER, REFERENCED_NAME, REFERENCED_TYPE, 
REFERENCED_LINK_NAME, SCHEMAID, DEPENDENCY_TYPE
V_X_Y VIEW DMUSER T_X TABLE  116 HARD
V_X_Y VIEW DMUSER T_Y TABLE  116 HARD
We create another view v_x_y2 based on view v_x_y.
SQL> create view v_x_y2 as select * from v_x_y;
The following query shows that V_X_Y2 depends on V_X_Y.
SQL> select * from user_dependencies where name='V_X_Y2';

NAME, TYPE, REFERENCED_OWNER, REFERENCED_NAME, REFERENCED_TYPE, 
REFERENCED_LINK_NAME, SCHEMAID, DEPENDENCY_TYPE
V_X_Y2 VIEW DMUSER V_X_Y VIEW  116 HARD
We create a materialized view based on two tables.
SQL> create materialized view mv_x_y as select a.*, b.val val2 
from t_x a, t_y b where a.id=b.id;
SQL> select * from user_dependencies where name='MV_X_Y';
Materialized view created.
The materialized view MV_X_Y depends on tables T_X ,T_Y and MV_X_Y.
NAME, TYPE, REFERENCED_OWNER, REFERENCED_NAME, REFERENCED_TYPE, 
REFERENCED_LINK_NAME, SCHEMAID, DEPENDENCY_TYPE
MV_X_Y MATERIALIZED VIEW DMUSER T_X TABLE  116 REF
MV_X_Y MATERIALIZED VIEW DMUSER T_Y TABLE  116 REF
MV_X_Y MATERIALIZED VIEW DMUSER MV_X_Y TABLE  116 REF

Monday, March 16, 2015

Watch Out Materialized View Refresh Group

In the post Refresh an Oracle Materialized View, we add a materialized view to a refresh group to be refreshed. When we drop the materialized view, it will be removed from the refresh group. Thus if we drop and then recreate the same materialized view, it will not be refreshed according to the schedule defined in the refresh group. We have to manually add the materialized view back to the refresh group. So I normally do not use refresh group. Instead, I create a procedure to refresh the materialized view using dbms_mview.refresh.

create or replace procedure proc_all(md varchar2 default NULL)
is
begin
 dbms_mview.refresh('MV_A');
 dbms_mview.refresh('MV_B');
 dbms_mview.refresh('MV_C');
 dbms_mview.refresh('MV_D');
end;
Then I use DBMS_SCHEDULER to schedule the job.
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'my_proc',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN proc_all; END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'freq=daily; byhour=19;',
    end_date        => NULL,
    enabled         => TRUE,
    comments        => 'daily job.');
END;
/

Display Date Type in Oracle SQLPLUS

To display the date type in the desired format, we can manually format the date using to_char() function. Or, we use "alter session set NLS_DATE_FORMAT" to set the environment variable and the format will be applied to all date columns in any queries.

SQL> select sysdate from dual;

SYSDATE
---------
16-MAR-15

SQL> select to_char(sysdate,'YYYYMMDD:HH24:MI:SS') from dual;

TO_CHAR(SYSDATE,'
-----------------
20150316:07:16:02

SQL> alter session set NLS_DATE_FORMAT = 'YYYYMMDD:HH24:MI:SS';

Session altered.

SQL> select sysdate from dual;

SYSDATE
-----------------
20150316:07:16:31

Sunday, March 15, 2015

Replace Consecutive Blanks With a Single Blank (Continued)

In an earlier post Replace Consecutive Blanks With a Single Blank, we use SQL regexp_replace() to replace consecutive blanks with a single blank. We may also use Linux command tr to do the same before we load a text file into a database.

$ cat x.txt
hello    world!
This   is a  test .
Using "tr -s ' '" command to squeeze consecutive blanks into one.
$ cat x.txt | tr -s ' '
hello world!
This is a test .
Many Linux/Unix commands are very useful to clean up text files. I am using Windows but I installed open source Cygwin that provides a Linux environment. In another example Calculate Histogram for a Text File, we calculate the histogram using Linux commands.

Saturday, March 14, 2015

Import XML File Into Oracle

We can import XML files into an Oracle database schema. First, we use BFILE data type to store the XML file name. Then we use xmltype() function to convert the file content into oracle XMLTYPE.

create table test_docs (id number, text_doc bfile);
truncate table test_docs;
insert into test_docs values (1, BFILENAME('DM_DUMP', 'test.xml'));
SQL> select xmltype(text_doc, nls_charset_id('AL32UTF8')) xml_doc from test_docs where id=1;
We may also create a new table containing the xmltype data type column using CTAS (Create Table As Select) query as shown below.
create table test_xml 
as select id, xmltype(text_doc, nls_charset_id('AL32UTF8')) xml_doc 
from test_docs;

Thursday, March 12, 2015

Fix Views That Stop Working

A view is based on query against other database objects that may involve tables, views, database links, etc.. Sometimes when the underline objects changes, the view may become invalid. For example, a view is defined based on a remote table. When I dropped and recreate the database link pointing to the remote table, the querying against the view returns error.


SQL*Plus: Release 11.2.0.1.0 Production on Sun Nov 30 10:09:48 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL>  select count(*) from V_CLAIM
old   1: select count(*) from &1
new   1: select count(*) from V_CLAIM
select count(*) from V_CLAIM
                     *
ERROR at line 1:
ORA-04063: view "PROD.V_CLAIM" has errors
I use dbms_utility.invalidate procedure to "fix" the view as shown below. First, we need to find the object_id for the view.
SQL> select object_name, object_id from user_objects where object_name='V_CLAIM';
OBJECT_NAME                               OBJECT_ID
---------------------------------------- ----------
V_CLAIM                                   16995
SQL> exec dbms_utility.invalidate(16995);

PL/SQL procedure successfully completed.

SQL> @ct V_SH_FH_CLAIM
old   1: select count(*) from &1
new   1: select count(*) from V_CLAIM

  COUNT(*)
----------
     24782

Move Large Amount of Data Using Insert Into

When we insert many records into a table, it is faster to use hint /*+ append */. This way, Oracle will use direct path load to insert large amount of data that is much faster than conventional insert. As we see from the example below, when /*+ apeend */ hint is used, insert only takes 0.18 second vs 0.49 with conventional insert.

SQL> insert /*+ append */ into tbl_y select * from TBL_CLAIM_ALL@dl_apex2;

1031 rows created.

Elapsed: 00:00:00.18
SQL> commit;

Commit complete.

Elapsed: 00:00:00.07
SQL> insert into tbl_y select * from TBL_CLAIM_ALL@dl_apex2;

1031 rows created.

Elapsed: 00:00:00.49
SQL> commit;

Wednesday, March 11, 2015

More on Get Source Code for Oracle Database Objects

The following are the three ways to get the source code for Oracle database objects such as views, materialized views, procedures, functions, synonyms, database links etc.
1. For views, we query user_views. This is described in Get the source code for Oracle database views and materialized view
2. For materialized view, we query user_mviews. This is also described in Get the source code for Oracle database views and materialized view.
3. For objects, including views and materialized views, we can use dbms_meta.get_ddl(). See Generate SQL Create Table/View Queries for Existing Tables/Views, SQL Scripts for Oracle Database Link and Hide PL/SQL Scripts For Function or Procedure For example, the following query returns the "create table" statement.

SQL> select dbms_metadata.get_ddl('TABLE', 'TBL_A') from dual;

DBMS_METADATA.GET_DDL('TABLE','TBL_A')
--------------------------------------------------------------------------------

  CREATE TABLE "PROD"."TBL_A"
   (    "ID" NUMBER,
        "VALUE" VARCHAR2(64),
        "VALUE2" VARCHAR2(32)
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DE
FAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "PROD"

Monday, March 09, 2015

Terminate Oracle Session on Amazon RDS

To kill an Oracle session on Amazon RDS instance, in stead of using "alter system kill session", we use the procedure provided by Amazon rdsadmin.rdsadmin_util.kill(sid, serial#).
First, we find out the sid and serial# for the session that we want to terminated. We log in as the master user of an Amazon RDS Oracle instance.
SQL> select username, sid, serial#, command, status from v$session;

USERNAME                              SID    SERIAL#    COMMAND STATUS
------------------------------ ---------- ---------- ---------- --------
TESTPROD723                            12      56369          3 ACTIVE
                                       13         17          0 ACTIVE
RDSADMIN                               15         27          0 INACTIVE
TEST1001                               21      12787          0 INACTIVE
                                       28      50603          0 ACTIVE
                                      609          1          0 ACTIVE
                                      610          1          0 ACTIVE
                                      611          1          0 ACTIVE
                                      612          1          0 ACTIVE
                                      613          1          0 ACTIVE
                                      614          1          0 ACTIVE
                                      615          1          0 ACTIVE
                                      616        103          0 ACTIVE
                                      618          3          0 ACTIVE
                                      619          7          0 ACTIVE
                                      622          1          0 ACTIVE
                                      623          3          0 ACTIVE
                                      625        131          0 ACTIVE
TEST3                                 627      35083         74 ACTIVE

29 rows selected.
We are not allowed to run "alter system kill session".
SQL> alter system kill session '627,35083';
alter system kill session '627,35083'
*
ERROR at line 1:
ORA-01031: insufficient privileges
We can terminate a session using rdsadmin.rdsadmin_util.kill
SQL> exec rdsadmin.rdsadmin_util.kill(627,35083);

PL/SQL procedure successfully completed.

Debug Oracle PL/SQL scripts

To debug PL/SQL scripts, I have found two commands are extremely helpful, "set echo on" and "show errors". "Set echo on" prints the scripts in a file along with the line number. "show errors" displays the compilation errors. By combining the outputs from the above two commands, we can easily identify where the problems are and fix them. When we are done debugging, we can turn off the display using "set echo off". In the example below, I put my "create or replace procedure" in a script file called debug_proc.sql.

SQL> set echo on
SQL> @debug_proc.sql
SQL> create or replace procedure
  2  proc_test_ins(
  3  p1 number,
  4  p2 date)
  5
  6  is
  7
  8  begin
  9
 10  insert into tbl_test_x(
 11  id, dt)
 12
 13  select
 14  p1, p2 from dua;
 15  commit;
 16
 17
 18  end;
 19
 20  /

Warning: Procedure created with compilation errors.
The "set echo on" show the line number for each line of scripts. To see the detailed compilation errors, we run "show errors" command.
SQL> show errors
Errors for PROCEDURE PROC_TEST_INS:

LINE/COL
--------
ERROR
--------------------------------------------------------------
--------------------------------------------------------------
------------------------------------------------------------
10/1
PL/SQL: SQL Statement ignored

14/13
PL/SQL: ORA-00942: table or view does not exist
As we see, there is a typo on line 14, "from dua" should be "from dual".
SQL> create or replace procedure
  2  proc_test_ins(
  3  p1 number,
  4  p2 date)
  5
  6  is
  7
  8  begin
  9
 10  insert into tbl_test_x(
 11  id, dt)
 12
 13  select
 14  p1, p2 from dual;
 15
 16  commit;
 17
 18
 19  end;
 20
 21  /

Procedure created.
To turn off the display of the scripts and the line numbers, I run "set echo off".
SQL> set echo off
SQL> @debug_proc.sql

Procedure created.
Now we can run the procedure.
SQL> exec proc_test_ins(1, sysdate);
SQL> select * from  tbl_test_x;

     ID DT
------- ---------
      1 09-MAR-15

Monday, February 09, 2015

Refresh Materialized View Daily at Specific Time

We can define a specific time of the day to refresh a materialized view. For example, the following query makes the existing materialized view to be refreshed immediately and then every day at 7pm.

SQL> alter materialized view MV_NAME refresh start with sysdate next trunc(sysdate)+19/24;
In the above SQL command, trunc(sysdate) makes the date start from the beginning of the day (midnight). The time is then added by 19/24 of the day. For Oracle date type, 1 unit is a day.

Monday, January 19, 2015

List Materialized Views In a Refresh Group

To show the members in a materialized view refresh group, we can use the following query that takes advantage of DBA_RGROUP and DBA_RCHILD. DBA_RGROUP includes all refresh groups. DBA_RCHILD contains all the children in any refresh group.

SQL>  select r.owner, r.name gp_name, c.name mv_name 
from DBA_RCHILD c, DBA_RGROUP r 
where c.owner=r.owner and c.REFGROUP=r.REFGROUP 
order by r.owner, r.name, c.name;

OWNER                          GP_NAME                        MV_NAME
------------------------------ ------------------------------ ------------------------------
TEST_COMPS                     MV_GRP                         MV_SH_ABC
TEST_COMPS                     MV_GRP                         MV_SH_DEF
TEST_COMPS                     MV_GRP                         MV_SH_TEST

Hide PL/SQL Scripts For Function or Procedure

To protect the intellectual properties or for security reasons, we may want to hide the source PL/SQL code. To do this, we use "wrap" command to translate the PL/SQL script files into a format that can not be understood by human beings but can be processed by the database.
For example we have the following script file add_func.sql.

$ cat add_func.sql
create or replace function add2 (
  a number,
  b number
)
return number
is
begin
return a+b;
end;
/
We run wrap to convert it into a plb file that is not understandable by human beings.
oradba@bdm64-PC ~/projects_c/sql
$ wrap iname=add_func.sql
Above command generates the following file, add_func.plb. The following are its content.
create or replace function add2 wrapped
a000000
b2
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
8
52 7d
bWldIYVqPP9njmOrnggm/xpHL1wwg8eZgcfLCNL+XlpZoX8JUI8JabjDpZmBMsCyJfvCkA5+
0S5E4sqxyFDKJHWpF3yctwqYsCxZtMwF+uzZPXKV7Pumreu29A==
We connect to the database and run the file add_func.plb to create the function just like running a regular PL/SQL script file.
SQL> @add_func.plb
The following queries show that the source code for this function is not readable. This is exactly what we want.
SQL> select text from user_source where name='ADD2';

TEXT
----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
------------------------------------------------------------
function add2 wrapped
a000000
b2
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
8
52 7d
bWldIYVqPP9njmOrnggm/xpHL1wwg8eZgcfLCNL+XlpZoX8JUI8JabjDpZmBMsCyJfvCkA5+
0S5E4sqxyFDKJHWpF3yctwqYsCxZtMwF+uzZPXKV7Pumreu29A==


SQL> select dbms_metadata.get_ddl('FUNCTION', 'ADD2') from dual;

DBMS_METADATA.GET_DDL('FUNCTION','ADD2')
--------------------------------------------------------------------------------

  CREATE OR REPLACE FUNCTION "PROD"."ADD2" wrapped
a000000
b2
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
8
52 7d
bWldIYVqPP9njmOrnggm/xpHL1wwg8eZgcfLCNL+XlpZoX8JUI8JabjDpZmBMsCyJfvCkA5+
0S5E4sqxyFDKJHWpF3yctwqYsCxZtMwF+uzZPXKV7Pumreu29A==