Deep Data Mining Blog: A Young Data Scientist, Kaggle Competition Top 5% Winner: Yuyu Zhou

Saturday, March 05, 2016

A Young Data Scientist, Kaggle Competition Top 5% Winner: Yuyu Zhou

Yuyu Zhou is a graduate student in Analytics at the University of New Hampshire. His teams have finished in the top 3% and the top 5% of two Kaggle prediction competitions. In an interview, I asked him how their predictive models performed so well. Yuyu said,

"One of the keys to our success is that we spend a tremendous amount of time building feature variables. Those variables are usually the result of combining several raw variables. For example, the ratio of body weight to height is a better predictor of a patient's health than body weight or height alone."
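To make the ratio idea concrete, here is a minimal Python sketch (standard library only) of deriving a feature from two raw columns. The column names `weight_kg` and `height_m` and the helper `add_ratio_feature` are hypothetical, chosen only for illustration. The sketch builds the familiar body mass index, which divides weight in kilograms by the square of height in meters, as one well-known variant of the weight-to-height ratio mentioned in the quote.

```python
def add_ratio_feature(rows, numerator, denominator, name):
    """Return copies of rows with a new column `name` = numerator / denominator.

    Hypothetical helper for illustration. The new value is None when either
    input is missing or the denominator is zero, so bad records do not crash
    the pipeline.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the raw data is left untouched
        num = row.get(numerator)
        den = row.get(denominator)
        if num is None or not den:
            new_row[name] = None
        else:
            new_row[name] = num / den
        result.append(new_row)
    return result


# Hypothetical raw records: weight in kilograms, height in meters.
patients = [
    {"weight_kg": 70.0, "height_m": 1.75},
    {"weight_kg": 90.0, "height_m": 1.80},
    {"weight_kg": 65.0, "height_m": None},  # missing height
]

# Intermediate column: height squared (None when height is missing).
for p in patients:
    h = p["height_m"]
    p["height_m_sq"] = h * h if h is not None else None

# Derived feature: body mass index = weight / height^2.
features = add_ratio_feature(patients, "weight_kg", "height_m_sq", "bmi")
for row in features:
    print(row["bmi"])
```

In a real project the same pattern would usually be applied to whole columns with a dataframe library; the explicit loop here only keeps the sketch dependency-free.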

"My training in computer science was extremely helpful in these projects. I can write Java, Python, and SQL scripts to perform tasks such as data cleansing, merging, and transformation. As we know, more than 80% of a project's time is typically spent on these tasks before we start building predictive models."

"We tried many types of predictive models and found that gradient boosted trees consistently performed the best."

The following is a summary of Yuyu's contributions to those two projects.

Kaggle Competition: Rossmann Store Sales Prediction (ranked top 5%) Oct 2015 – Dec 2015

Built a predictive model of daily sales for Rossmann stores using a Python machine learning library.
Conducted data cleaning and feature engineering to improve data quality.
Designed the final prediction model by combining multiple gradient boosted tree models.
Prediction accuracy ranked 163rd out of 3,303 teams.
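Combining several models can be illustrated with a minimal, standard-library Python sketch: a weighted average of the predictions from several models, plus the root mean square percentage error (RMSPE), which was the evaluation metric of the Rossmann competition (store-days with zero sales were excluded from scoring). The function names `blend` and `rmspe`, the equal default weights, and the sample numbers are hypothetical; this is not the team's actual code, and weighted averaging is an assumed combination method. In practice the per-model predictions would come from gradient boosted tree models trained with different settings or seeds.

```python
import math


def blend(predictions, weights=None):
    """Weighted average of several models' predictions (one list per model)."""
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights by default
    if len(weights) != len(predictions):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * model_preds[i] for w, model_preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


def rmspe(y_true, y_pred):
    """Root mean square percentage error, skipping rows whose actual value is 0."""
    terms = [((t - p) / t) ** 2 for t, p in zip(y_true, y_pred) if t != 0]
    if not terms:
        raise ValueError("no nonzero actual values to score")
    return math.sqrt(sum(terms) / len(terms))


# Hypothetical daily sales predictions from two tree models for three store-days.
model_a = [100.0, 200.0, 50.0]
model_b = [110.0, 190.0, 60.0]
actual = [100.0, 200.0, 0.0]  # the last day has zero sales and is not scored

combined = blend([model_a, model_b])
print(combined)  # [105.0, 195.0, 55.0]
print(rmspe(actual, combined))
```

Weighted averaging is only one way to combine models; stacking with a second-level model is another. The weights would normally be tuned on a held-out validation period rather than fixed in advance.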

Kaggle Competition: Property Risk Level Prediction (ranked top 3%) July 2015 – Aug 2015

Developed statistical models to predict the risk level of properties to be insured by Liberty Mutual Inc.
Led the team and conducted cost-benefit analysis of new ideas.
Implemented the ideas using Python statistical packages.
Prediction accuracy ranked 71st out of 2,236 teams.

Yuyu is currently looking for a full-time job in data analytics. Please feel free to contact him if you are hiring. He can be reached by email at yuyu.zhou@hotmail.com or by phone at (508) 933-7311. Here is his LinkedIn profile.
