Thursday, March 26, 2015

Best Practice In Loading Text Files Into Oracle Database

The post Five Ways of Loading Text Files Into Oracle Database shows common ways of loading test files. There are many issues when loading text files into an Oracle database. These issues include:

  • Data format conversions such as strings to numbers and strings to dates
  • NULl value representation in text files. NUll are ., n/a, null, blanks, etc.?
  • Other data representation issues such as $ in front of amount.
Thus it is important to find out effective ways to handle those data issues and make sure data are indeed loaded correctly. For many projects, I have used the following process of importing text files into an Oracle database.
  • Step 1. Define external tables pointing to the text files. In the external tables, we simply define all columns as string (varchar2). Creating an external table does not physically load the data. It just contains the "pointer" to the files. We can run SQL queries against an external table just like a regular table.
  • Step 2. Perform data cleansing and conversion using SQL functions and store the data into tables or materialized views.
For example, the following a few lines of text files to be loaded.
10,"ABASTECEDORA NAVAL Y INDUSTRIAL, S.A.",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
15,"ABDELNUR, Nury de Jesus","individual","CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
25,"ACEFROSTY SHIPPING CO., LTD.",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
36,"AEROCARIBBEAN AIRLINES",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
39,"AEROTAXI EJECUTIVO, S.A.",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
41,"AGENCIA DE VIAJES GUAMA",-0- ,"CUBA",-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0- ,-0-
First, we create an external table as shown below. All columns are simply defined as varchar2 as we will do the data conversion later.
create directory dir_files as '/home/data';
create table tbl_sdn_ext(
  v1  varchar2(512),
  v2  varchar2(512),
  v3  varchar2(512),
  v4  varchar2(512),
  v5  varchar2(512),
  v6  varchar2(512),
  v7  varchar2(512),
  v8  varchar2(512),
  v9  varchar2(512),
  v10  varchar2(512)
)
organization external
( type oracle_loader
default directory DIR_ORA_EXT
access parameters
( records delimited by newline
skip 1
fields terminated by ','
optionally enclosed by '"'
missing field values are null
)
location('sdn.csv')
);
Once we have the external table, we can apply SQL functions to do various data conversions and store the results into tables physically as shown below.
create table tbl_sdn2
as
with tbl as
(
select
case when trim(V1)='-0-' then null else upper(trim(V1)) end V1,
case when trim(V2)='-0-' then null else upper(trim(V2)) end V2,
case when trim(V3)='-0-' then null else upper(trim(V3)) end V3,
case when trim(V4)='-0-' then null else upper(trim(V4)) end V4,
case when trim(V5)='-0-' then null else upper(trim(V5)) end V5,
case when trim(V6)='-0-' then null else upper(trim(V6)) end V6,
case when trim(V7)='-0-' then null else upper(trim(V7)) end V7,
case when trim(V8)='-0-' then null else upper(trim(V8)) end V8,
case when trim(V9)='-0-' then null else upper(trim(V9)) end V9,
case when trim(V10)='-0-' then null else upper(trim(V10)) end V10
from tbl_sdn_ext
)
select
v1 sdn_id,
v2 full_name,
trim(substr(v2, 1, regexp_instr(v2,',')-1)) lnm,
trim(substr(v2, regexp_instr(v2,',')+1, length(v2))) fnm,
v3 dtype,
v4 country
from
tbl;

No comments: