Deep Data Mining Blog: Calculate Statistical Mode- the Most Frequent Value

Thursday, August 08, 2013

Calculate Statistical Mode- the Most Frequent Value

Statistical mode is the value that happens most often (as shown in the chart below). For discrete variables, mode can be calculated using the same approach describe in post Find the most frequent items using SQL. We simply select the top 1 most frequent item.

For continuous variables, we can use the function width_bucket() to divide the data into segments and calculate the frequency in each segment. We can then pick the average value of the data points from the most frequent segment as the mode. We use the 1,000 random numbers described in the post Generate 1000 Normally Distributed Numbers. The following query calculate the frequency and average value for each of the segment defined by width_bucket(variable, min_value, max_value, number_of_buckets). Buckets 0 and 12 are for values lower than -3 and higher than 3, respectively.

SQL> select s, count(1), avg(num) from (select width_bucket(num, -3,3, 11) s, num from TBL_1K_RND) group by s order by s;

S COUNT(1) AVG(NUM)

---------- ---------- ----------

0 2 -3.0538402

1 5 -2.6387896

2 29 -2.1748723

3 58 -1.6073189

4 115 -1.0457551

5 185 -.53460548

6 208 .010066623

7 194 .525553927

8 119 1.05966651

9 47 1.5985678

10 32 2.14458861

11 4 2.67165765

12 2 3.06640121

Again, we use row_number() function to generate rank based on frequency and calculate the mode as follows.

with tbl_1 as
(
select s, count(1) cnt, avg(num) num from (select width_bucket(num, -3,3, 11) s, num from TBL_1K_RND) group by s order by s),
tbl_2 as
(
select a.*, row_number() over(order by cnt desc) rnk from tbl_1 a
)
select num from tbl_2 where rnk=1;

NUM
----------

.010066623

10 Most Influential People	Text Files and Oracle DB	Predictive Model vs Rule	Build Predictive Model	About Predictive Model Variable	Logistic Regression
Recency Frequency Monetary Analysis	Unique Identifier in Oracle	Materialized View	Database Link	Calculate Percentage Using SQL	Handle NULL Value
Calculate Cumulative Perentage	Find Score Cutoff Value	Remove Duplicates	Calculate Correlation Coefficients	Oracle vs SQL Server	Random Sampling
Table Insert	Read Only Table	Clustering	Ranking	Find Most Frequent	Median Value
Oracle Source Code	Debug PL/SQL	Hide PL/SQL Scripts	Repair Views	Dump Schema	Move Big Files to Amazon

Popular Topics

Popular Topics

Thursday, August 08, 2013

Calculate Statistical Mode- the Most Frequent Value

No comments: