Monday, May 28, 2012
Randomly Sample a Precise Number of Records
With Oracle dbms_random.random and row_number function, we can easily randomly select a certain number of records from the data.
The following scripts select 50 records randomly from each group (strata). First, a number number is generate for each record. Then a rank is generated by row_number function based on the random number for each strata. Finally, only records ranked top 50 in each strata are selected. The rank of random number is also random.
create table tbl_id_w_rnd as select id, strata, dbms_random.random() rnd from tbl_data;
create materialized view mv_precise_50_samples as
With tbl as
Select a.*, row_number() over(partition by strata order by rnd) rnk from tbl_id_w_rnd a
Select * from tbl where rnk<=50;
select strata, count(1) from mv_precise_50_samples group by strata order by strata;
Labels: random sampling