Sunday, June 17, 2012

How to securely mask credit card numbers

Often banks need to send large amount of data to outside vendors to perform data analysis. It is safer to mask sensitive data items such as credit card numbers, social security numbers, etc. The following is one approach to securely mask credit card numbers.

1. Generate a random number for each credit card number.
2. Generate a rank for each random number using row_number().
3. Generate the masked credit card number based on the rank.
Step 1. create table tbl_card_w_rand_num as select card_number, dbms_random.random() rnd from tbl_cards;
Step 2. Generate the masked credit card numbers based on the rank of random numbers. Keep this table in a safe place.

create table tbl_masked_card_num as with tbl as(
select a.*, row_number() over(order by rnd) rnk from tbl_card_w_rand_num a)
select a.*, 'A'||lpad( rnk,15,'0') masked_card_num from tbl a;
Step 3. Create data using masked_card_num

Wednesday, June 13, 2012

Calculate moving average: SAS vs Oracle


Oracle analytic windows functions provide a great way to calculate cross-row information such as moving average. The following examples show that using Oracle analytic function to calculate moving average is more compact, readable and less error-prone than using SAS scripts.

1. Calculating Moving Average using Oracle Analytic Function
create table ds2 as
select a.*, avg(num) over(partition by account_id order by month rows between 4 preceding and current row) mov_aver from ds1 a;

2. Calculating Moving Average using SAS
proc sort data=ds1;
by  account_id  month;
run;
%let n = 4;
data ds2;
set ds1;
by account_id;
retain num_sum 0;
if first.account_id then do;
count=0;
num_sum=0;
end;
count+1;
last&n=lag&n(num);
if count gt &n then num_sum=sum(num_sum,num,-last&n);
else num_sum=sum(num_sum,num);
if count ge &n then mov_aver=num_sum/&n;
else mov_aver=.;
run;

Sunday, June 10, 2012

Converting SAS scripts into Oracle PL/SQL. Why Bother?

I am converting large quantity of existing SAS scripts into Oracle. So far, all the SAS scripts can be implemented into Oracle PL/SQL (mostly SQL). Given the same input data, SAS and Oracle produce the exactly same result most of the time. In very limited number of cases, there are very small discrepancies in SAS and Oracle results. One of such example is calculating percentiles. However, these discrepancies are very small and explainable.

Why the conversion of existing SAS scripts into Oracle PL/SQL?  The following are the reasons.
1. Production data existing in Oracle DB all the time. It is better to run analytics processes within the database without data movement and enjoy all those benefits offered by DB, e.g., security, manageability, scalability, indexes, materialized views,table partitions, SQL, etc.
2. To democratize data analytics. Only limited number of people know how to use SAS( many of them have at least master degrees). However, there are far more SQL programmers/analysts who are querying data on daily basis. To us, data analytics is simply a more advanced way of querying data, nothing more, nothing less. It is natural to  take advantage of analytics power using SQL.

Tuesday, June 05, 2012

SAS proc rank and its Oracle SQL equivalent


The following SAS rank and Oracle SQL produce the same results. In SAS, when there are ties, the average rank is used. To do the same in Oracle, we calculate the unique row numbers and then its average based on group id and variable (amt).

SAS rank.
proc rank data=tbl_in  out=tbl_out ties=mean descending;
by grp;
var amt;
ranks amt_rnk;
run;

Oracle SQL rank.
create table tbl_out
as with tbl as
(
select a.*,
row_number() over(partition by grp order by  amt  desc)  amt_rnk0  from tbl_in a
) select a.*, (avg( amt_rnk0  ) over(partition by  grp,  amt )) amt_rnk
  from tbl a;

Saturday, June 02, 2012

Five Ways of Creating Unique Record Identifier For Oracle Tables

I have used the following five ways to create a unique record identifier for an Oracle table using SQL.
1. Simply use Oracle pseudocolumn rowid
Oracle pseudocolumn rowid comes with every table. It looks like the following.
SQL> select rowid from table_test where rownum <3;
ROWID
------------------
AAAYLPAAVAAEvLTAAA
AAAYLPAAVAAEvLTAAB
2. Create a  unique record id using rownum
create table table_test_new as select rownum as uniq_id, a.* from table_test a;
3. Create a unique record id using row_number()
create table table_test_new2 as select row_number() over(order by LOG_DATE)  uniq_id, a.* from  table_test a;
4. Create a unique record id using sequence.
create sequence rec_id_seq;
create table table_test_new3 as select rec_id_seq.nextval uniq_id, a.* from  table_test a;
5. Create a unique record id using sys_guid()
According to Oracle document, SYS_GUID generates and returns a globally unique identifier (RAW value) made up of 16 bytes.
create table table_test_new4 as select sys_guid() uniq_id, a.* from  table_test a;
What are other ways to create a unique record id that I am missing?