Random sampling processes could become very sophisticated. One of such examples is a stratified random sampling method developed by the Department of Social and Health Services (DSHS) to audit hospital Medicaid claims. This method is described in a paper "Analysis of the Regression Estimator
and its Implementation by DSHS"( http://www.hca.wa.gov/documents/rfp/recovery_audit/exhibit_b.pdf)
Hospitals provide services for patients who are covered by Medicaid and then file claims to the states to get the payment. State governments want to perform post-payment reviews to make sure that they do not overpay hospitals. Because the size of claims from a hospital could be very large, only a randomly sampled claims will be audited.
The claims of highest 25 amounts are always reviewed. For the remaining claims, the process first divides them into many buckets, or strata, based on their claim amount. Random sampling will be performed for each stratum. The higher the variance in claim amount in a stratum, the more samples will be drawn from it. Please notice that the sample size is NOT proportional to the population size in the stratum.
The most common ways to implement the above methodology are probably using SAS or SQL. I have seen the SAS version of it. I also implemented two SQL versions. One SQL version strictly follows the steps in the SAS package. Another SQL version uses Oracle analytic functions heavily that results in 75% reduction in the volume of code. These analytic functions include row_number() over(), width_bucket () , sum() over(), count() over(), etc.
Popular Topics
Popular Topics
Showing posts with label row_number. Show all posts
Showing posts with label row_number. Show all posts
Wednesday, September 12, 2012
Wednesday, August 15, 2012
Rank in Oracle: rank() and dense_rank()
Unlike row_number() function, rank() and dense_rank() may produce duplicate ranks.
1. The data
select num from TBL_TEST order by num;
NUM
----------
1
1
2
2
2
3
4
4
5
2. rank() may have gaps.
select num, rank() over(order by num) as rnk from TBL_TEST order by num;
NUM RNK
---------- ----------
1 1
1 1
2 3
2 3
2 3
3 6
4 7
4 7
5 9
3. dense_ran() does not have gap in ranks.
select num, dense_rank() over(order by num) as rnk from TBL_TEST order by num;
NUM RNK
---------- ----------1 1
1 1
2 2
2 2
2 2
3 3
4 4
4 4
5 5
Thursday, August 02, 2012
Rank in Oracle
There are several ways to rank records based on certain variables. One of the is to use row_number
select * from TBL_TEST order by grp, num;
GRP NUM
---------- ----------
A 1
A 2
A 2
A 3
A 4
B 1
B 2
B 4
B 5
Row_number() generates unique sequece numbers without gap. Ties are given different numbers. I use row_number() to generate unique id.
Example 1. generate rank separately for GRP
with tbl as(
select a.grp, a.num, row_number() over(partition by GRP order by num) rnk from TBL_TEST a)
select * from tbl order by grp, rnk ;
GRP NUM RNK
---------- ---------- ----------
A 1 1
A 2 2
A 2 3
A 3 4
A 4 5
B 1 1
B 2 2
B 4 3
B 5 4
Example 2. Generate rank without partition by GRP
with tbl as(
select a.grp, a.num, row_number() over(order by grp, num) rnk from TBL_TEST a)
select * from tbl order by grp, rnk ;
GRP NUM RNK
---------- ---------- ----------
A 1 1
A 2 2
A 2 3
A 3 4
A 4 5
B 1 6
B 2 7
B 4 8
B 5 9
I will discuss rank functions including dense_rank(), ran() in another post.
select * from TBL_TEST order by grp, num;
GRP NUM
---------- ----------
A 1
A 2
A 2
A 3
A 4
B 1
B 2
B 4
B 5
Row_number() generates unique sequece numbers without gap. Ties are given different numbers. I use row_number() to generate unique id.
Example 1. generate rank separately for GRP
with tbl as(
select a.grp, a.num, row_number() over(partition by GRP order by num) rnk from TBL_TEST a)
select * from tbl order by grp, rnk ;
GRP NUM RNK
---------- ---------- ----------
A 1 1
A 2 2
A 2 3
A 3 4
A 4 5
B 1 1
B 2 2
B 4 3
B 5 4
Example 2. Generate rank without partition by GRP
with tbl as(
select a.grp, a.num, row_number() over(order by grp, num) rnk from TBL_TEST a)
select * from tbl order by grp, rnk ;
GRP NUM RNK
---------- ---------- ----------
A 1 1
A 2 2
A 2 3
A 3 4
A 4 5
B 1 6
B 2 7
B 4 8
B 5 9
I will discuss rank functions including dense_rank(), ran() in another post.
Tuesday, June 05, 2012
SAS proc rank and its Oracle SQL equivalent
The following SAS rank and Oracle SQL produce the same results. In SAS, when there are ties, the average rank is used. To do the same in Oracle, we calculate the unique row numbers and then its average based on group id and variable (amt).
SAS rank.
proc rank data=tbl_in out=tbl_out ties=mean descending;
by grp;
var amt;
ranks amt_rnk;
run;
Oracle SQL rank.
create table tbl_out
as with tbl as
(
select a.*,
row_number() over(partition by grp order by amt desc) amt_rnk0 from tbl_in a
) select a.*, (avg( amt_rnk0 ) over(partition by grp, amt )) amt_rnk
from tbl a;
Saturday, June 02, 2012
Five Ways of Creating Unique Record Identifier For Oracle Tables
I have used the following five ways to create a unique record identifier for an Oracle table using SQL.
1. Simply use Oracle pseudocolumn rowid
Oracle pseudocolumn rowid comes with every table. It looks like the following.
SQL> select rowid from table_test where rownum <3;
ROWID
------------------
AAAYLPAAVAAEvLTAAA
AAAYLPAAVAAEvLTAAB
2. Create a unique record id using rownum
create table table_test_new as select rownum as uniq_id, a.* from table_test a;
3. Create a unique record id using row_number()
create table table_test_new2 as select row_number() over(order by LOG_DATE) uniq_id, a.* from table_test a;
4. Create a unique record id using sequence.
create sequence rec_id_seq;
create table table_test_new3 as select rec_id_seq.nextval uniq_id, a.* from table_test a;
5. Create a unique record id using sys_guid()
According to Oracle document, SYS_GUID generates and returns a globally unique identifier (RAW value) made up of 16 bytes.
create table table_test_new4 as select sys_guid() uniq_id, a.* from table_test a;
What are other ways to create a unique record id that I am missing?
1. Simply use Oracle pseudocolumn rowid
Oracle pseudocolumn rowid comes with every table. It looks like the following.
SQL> select rowid from table_test where rownum <3;
ROWID
------------------
AAAYLPAAVAAEvLTAAA
AAAYLPAAVAAEvLTAAB
2. Create a unique record id using rownum
create table table_test_new as select rownum as uniq_id, a.* from table_test a;
3. Create a unique record id using row_number()
create table table_test_new2 as select row_number() over(order by LOG_DATE) uniq_id, a.* from table_test a;
4. Create a unique record id using sequence.
create sequence rec_id_seq;
create table table_test_new3 as select rec_id_seq.nextval uniq_id, a.* from table_test a;
5. Create a unique record id using sys_guid()
According to Oracle document, SYS_GUID generates and returns a globally unique identifier (RAW value) made up of 16 bytes.
create table table_test_new4 as select sys_guid() uniq_id, a.* from table_test a;
What are other ways to create a unique record id that I am missing?
Sunday, May 27, 2012
Calculate Gain Chart Using SQL in Oracle
The following script calculates the gain chart based on score and payment_ind (target variable, payment or non payment). The mod function is to reduce the number of data points in the gain chart
with tbl2 as
(select a.*, row_number() over(order by score desc ) risk_rank from tbl_testing_data a),
tbl3 as
( select a.*,
count(1) over(order by risk_rank) total_num,
sum(case when payment_ind='Y' then 1 else 0 end ) over(order by risk_rank) total_bad,
count(1) over(order by risk_rank)/count(1) over() pcnt_tot,
sum(case when payment_ind='Y' then 1 else 0 end ) over(order by risk_rank)/ sum(case when payment_ind='Y' then 1 else 0 end ) over()
pcnt_bad
from tbl2 a
)
select * from tbl3 where mod(risk_rank,100)=1 order by risk_rank;
Remove Duplicates From Data
It is quite common to have duplicates in the data. From example, an application is submitted or a claim is filed multiple times. With Oracle function row_number function, we can easily remove the duplicates.
The following SQL first calculates the rank of records by app_date for each app_id, then it only selects those records with ranks equals 1. The only earliest record for each app_id will be selected.
with tbl as
(
select a.*,
row_number() over(partition by app_id order by APP_DATE ) rnk from CELL_PHONE_SERVICE_APPS a
)
select * from tbl where rnk=1;
If we want to keep the latest record for each app_id, we simply generate rank for each app_id by the descending order of app_date as shown below.
with tbl as
(
select a.*,
row_number() over(partition by app_id order by APP_DATE desc ) rnk from CELL_PHONE_SERVICE_APPS a
)
select * from tbl where rnk=1;
Subscribe to:
Posts (Atom)