Deep Data Mining Blog: October 2016

Wednesday, October 26, 2016

Ranking High in Kaggle Competition is a Huge Advantage for Job Seekers

For someone looking for a job in the data analytics field, a high ranking in Kaggle competitions is a tremendous advantage. Employers see competition winners as having strong problem-solving skills and hands-on expertise. Indeed, to complete a Kaggle competition project, participants have to put in a lot of effort to deal with data issues before any predictive model can be built. This is very similar to real-world applications, where 80% of the time is spent on data cleansing and manipulation. It is no surprise that some employers prefer Kaggle competition winners over PhD graduates, whose skills are perceived as more theoretical.

Take my nephew, Yuyu Zhou, as an example. He earned a master's degree in data analytics. While in school, he spent a few weeks participating in Kaggle competitions with classmates. His team placed in the top 3% and top 5% in two Kaggle prediction competitions, respectively (see my blog post "a Young Data Scientist- Kaggle Competition Top 5% Winner: Yuyu Zhou"). Once he graduated, he was quickly hired by a prestigious company and has been earning a PhD-level salary. Those few weeks he spent on Kaggle competitions were the best time investment of his life.

Thursday, October 20, 2016

Oracle Ora_Hash function- Part 1 Random Sampling

Oracle ora_hash() is a very useful function. I have used it for different purposes, such as generating random numbers. The query in this post uses it to assign records to buckets, each of which holds a similar number of records.
First, we create a table and populate it with 1,000 numbers.

create table t_n (n number);

-- Insert the integers 1 through 1000, then commit.
begin
  for i in 1..1000
  loop
    insert into t_n values (i);
  end loop;
  commit;
end;
/

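In the query below, the second parameter of ora_hash, 5, sets the maximum bucket value, so bucket numbers range from 0 to 5. As we see, each bucket has a similar number of records. (The output lists buckets 0 through 4, which account for 834 of the 1,000 rows.)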

with tbl as
(
select ora_hash(n, 5) bucket, n  from t_n)
select bucket, count(*), min(n), max(n) from tbl
group by bucket order by 1;
BUCKET COUNT(*) MIN(N) MAX(N)
     0      154      2    993
     1      164      7    999
     2      168      6    991
     3      175      4    995
     4      173      8   1000
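
Because the buckets are roughly equal in size, keeping a single bucket gives an approximately random sample of the table. The queries below are a minimal sketch of that idea against the t_n table created earlier. The choice of bucket 0, the maximum bucket value 9 in the second query, and the seed value 1 in the third are illustrative assumptions, not values taken from the original post.

```sql
-- Sketch: keep one bucket out of the six (0 to 5) produced by ora_hash(n, 5).
-- In the output above, bucket 0 holds 154 of the 1,000 rows.
select n
from t_n
where ora_hash(n, 5) = 0;

-- Assumption: a maximum bucket value of 9 gives ten buckets (0 to 9),
-- so one bucket is roughly a 10% sample.
select n
from t_n
where ora_hash(n, 9) = 0;

-- Assumption: the optional third argument is a seed value; changing it
-- produces a different assignment of rows to buckets, and so a different sample.
select n
from t_n
where ora_hash(n, 9, 1) = 0;
```

Since ora_hash is deterministic for a given input, maximum bucket value, and seed, rerunning the same query returns the same rows, which makes the sample repeatable.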
