Tuesday, September 04, 2012

3 most useful little SQL functions for data manipulation

I have found the following three little SQL functions useful for data manipulation. The are almost always used in my data mining projects in these years. They are not fancy, but very convenient.

1. Mod function.
Mod(m,n) in Oracle returns the remainder of m divided by n. For example, mod(13,7)=6 because 6 is the remainder of 13 divided by 7. If m and n are positive integers, mod(m,n) is always within the range of 0 and n-1. We can use Mod function for random sampling. For SQL Server, Mod function is operator m%n.

select record_id from tbl_test2 order by record_id;
 RECORD_ID
----------
         1
         2
         3
         4
         5
         6
         7
The following query picks every other record id.
select record_id from tbl_test2 where mod(record_id,2)=1 order by record_id;
 RECORD_ID
----------
         1
         3
         5
         7
The following query picks every third record id.
select record_id from tbl_test2 where mod(record_id,3)=1 order by record_id;
 RECORD_ID
----------
         1
         4
         7
We can combine mod and dbms_random.random functions to perform random sample. See previous post "Random Sampling using SQL".
2. Nvl function
Oracle nvl(x,y) function replaces null value in x with y. The following query replaces null value with 'X';

select var, nvl(a.var, 'X')  from tbl_test3;
VAR                              NVL(A.VAR,'X')
-------------------------------- --------------------------------
A                                A
B                                B
                                 X
C                                C
3. Decode function
Decode function maps values of a variable to other values.  In a number of projects, I used decode function to convert the categories into weights (derived from logistic regression models). The following example assigns a weight of -.420916 to A, -.028324 to B and so on. A default weight of 0 is assigned to all other categories that are not listed.
decode(var,
'A ',-.420916,
'B',-.028324,
'C',-.186108,
'D',-1.165688,
'E',-.016754,
'F',-.153281,
'G',-.010159,
'H',-1.136179,
0)

No comments: