Monday, May 28, 2012

Randomly Sample a Precise Number of Records



With Oracle dbms_random.random and row_number function, we can easily randomly select a certain number of records from the data.

The following scripts select 50 records randomly from each group (strata). First, a number number is generate for each record. Then a rank is generated by row_number function based on the random number for each strata. Finally, only records ranked top 50 in each strata are selected. The rank of random number is also random.

create table tbl_id_w_rnd as select id, strata, dbms_random.random() rnd from tbl_data;

create materialized view mv_precise_50_samples as
With tbl as
(
Select  a.*, row_number() over(partition by strata order by rnd) rnk from tbl_id_w_rnd  a
)
Select * from tbl where rnk<=50;
--test
select strata, count(1) from  mv_precise_50_samples group by strata order by strata;

1 comment:

Unknown said...

Thanks for providing us with such a useful information. Keep up the good work and continue providing us more quality information from time to time.

Sample for research