Tuesday, January 15, 2019

About Dr. Zhou's Oracle SQL for Data Science Course


On January 31, 2017, I was invited by Prof. Wei Ding of the Department of Computer Science, University of Massachusetts Boston, and gave 3 talks about my data science projects across different industries. The talks were extremely well received. The following is what Prof. Ding said about them.

"It was a fortune to have Jay come to our computer science department to share his experience in solving business problems with predictive analytics on February 28, 2017. What Jay had presented in his 3 talks, each lasting for 1 hour in different topics of data mining, was totally impressive and beyond our wildest expectation. Having built competition-winning predictive models for some of the biggest companies and produced hundreds of millions of dollars’ savings, Jay shared the secret of his success with students and faculty without reservation. His strong presentations were such an inspiration for our computer science students and faculty and his methodology was innovative and powerful, even for very seasoned data scientists among the audience. Jay, thank you so much for your hard work preparing and delivering these presentations!" -Prof. Ding Wei, Department of Computer Science, University of Massachusetts Boston

The audience was particularly amazed by how I came up with solutions using the Oracle SQL environment. To share my expertise, I created the online course Oracle SQL for Data Science to show how to perform common data science tasks using Oracle SQL and the benefits of doing so.

I let Charlie Berger, Senior Director of Product Management, Machine Learning, AI and Cognitive Analytics at Oracle, know about my course, and he told me, "Your course is amazing."

Deep Learning World


Deep Learning World is the premier conference covering the commercial deployment of deep learning. The event’s mission is to foster breakthroughs in the value-driven operationalization of established deep learning methods. DLW runs parallel to the established Predictive Analytics World for Industry 4.0 at the same venue. Combo passes are available.



How to turn Deep Tech into Broad Application

The hype is over: deep learning is entering the “trough of disillusionment”. Companies are realizing that not every business problem requires the deep learning hammer. Of course, there are use cases that are best solved with artificial neural networks: image, speech and text recognition; anomaly detection and predictive maintenance on sensor data; complex data synthesis and sampling; reinforcement and sparse learning; and many more applications show the potential of artificial intelligence for real-world business scenarios. At the Deep Learning World conference, data science experts present projects that went beyond experimentation and prototyping and showcase solutions that created economic value for the company. The case study sessions will focus on what worked and what didn’t, while the deep dive sessions will explain topics such as RNN, CNN, LSTM, transfer learning and more in analytical and technical detail. Meet the European deep learning community in May in Munich and learn from well-known industry leaders!


Deep-data-mining.com blog readers receive a 15% discount with code: DDMPAWDLW

Predictive Analytics World for Industry 4.0


6-7 May, 2019 – Munich
Predictive Analytics World is the leading vendor-independent conference for applied machine learning for Industry 4.0.
Business users, decision makers and experts in predictive analytics will meet on 6-7 May 2019 in Munich to discover and discuss the latest trends and technologies in machine & deep learning for the era of Internet of Things and artificial intelligence.

Putting Machine Intelligence into Production

Smart Factory, Smart Supply Chain, Smart Grid, Smart Transport: artificial intelligence promises an intelligent and fully automated future, but the reality is that most machines, most vehicles and most grids lack sensors, and even where sensors do exist they might not be connected to the Internet of Things. Many companies have invested in their infrastructure and are experimenting with prototypes, e.g. for predictive maintenance, dynamic replenishment, route optimization and more, but even if they succeed in delivering a proof of concept, they face the challenge of deploying their predictive model into production and scaling their analytics solution to company-wide adoption. The issues are not merely analytical but a combination of technical, organisational, legal and economic details. At Predictive Analytics World for Industry 4.0, experienced data scientists and business decision makers from a wide variety of industries will meet for two days to demonstrate and discuss dozens of real-world case studies from well-known industry leaders. In addition, predictive analytics experts will explore new methods and tools in detail in special deep dive sessions. Finally, Predictive Analytics World is accompanied by the Deep Learning World conference, which focuses on the industry and business application of neural networks. Take the chance, learn from the experts and meet your industry peers in Munich in May!
 Hot topics on the 2019 Agenda:
  • Predictive Maintenance & Logistics
  • Anomaly Detection & Root Cause Analysis
  • Fault Prediction & Failure Detection
  • Risk Management & Prevention
  • Route & Stock Optimization
  • Industry & Supply Chain Analytics
  • Image & Video Recognition
  • Internet of Things & Smart Devices
  • Stream Mining & Edge Analytics
  • Machine Learning, Ensemble Learning & Deep Learning
  • Process Mining & Network Analyses
  • Mining Open & Earth Observation Data
  • Edge Analytics & Federated Learning
… and many more related topics

Predictive Analytics World for Industry 4.0 will be co-located with Deep Learning World, the premier conference covering the commercial deployment of deep learning in 2019. Deep-data-mining.com blog readers receive a 15% discount with code: DDMPAWDLW


Tuesday, January 08, 2019

Analytics & AI in Travel North America


Analytics & AI in Travel North America, launched by EyeForTravel, will take place on March 14-15 at the Hilton Parc 55 Hotel, San Francisco, USA. With over 350 senior data, analytics, pricing, product development and digital marketing experts from the world’s leading travel companies attending, the event will explore strategies for brands to address the biggest opportunity right now: how to conquer hyper-personalization.
Confirmed speakers include Hilton’s SVP of Analytics, Google’s head of AI global product partnerships, Expedia’s Head of Platform – Loyalty, Wyndham Hotel Group’s Vice President of Global Revenue Management Operations and Sales, Carlson Wagonlit’s Principal Data Scientist, and many more.

Attendees can expect to explore insights into the following:

• Harnessing AI and Data to Transform your Loyalty Strategy: Discover how weaving AI into your business, capturing preference data and delivering a truly personalized service will give you the edge in winning loyal customers from your competition

• Overcoming Pricing Peril with Personalized & Real-Time Revenue Generation Tactics: Make the shift to real-time pricing on an individual level, nail down the use cases for overcoming it, forecast like a pro and optimize direct revenue

• Getting Up Close and Personal with the Customer and Capitalizing on Every Channel: Use AI to fuel CRM and CS to bring customer data to life at every touchpoint, use the rich and famous on social to avoid brand erosion and secure market share

• Immersing Yourself in an AI-driven Predictive Future to Seize New Profits: It all comes down to being predictive if you want to turn new profits. Deliver AI-led futures in your company for more efficient internal mechanics and travel customer-centricity

• Driving Real-Time, Hyper-Personalization to Move Your Profit Needle: Delve into new levels of granularity, become the Amazon of travel and deliver the perfect travel itinerary every time for unstoppable loyalty

• Seizing Voice, AR and VR Makes You Grab that Conversion: Be part of the lucky few that benefits from voice enabled search, drive direct bookings and use AR and VR to give your customer the confidence to convert

• Dominating Direct Bookings Through A Mastery of Mobile: Create an AI-enabled mobile product that drives direct bookings, focus on UI and UX that screams out loyalty and bolster your bottom line

• Outclassing your Competition with Total RM and Surge Ancillary Sales: Build state-of-the-art infrastructure that supports ancillary revenue and squeeze every ounce of profit from all revenue streams


Thursday, December 20, 2018

Could Not Connect to Amazon RDS Oracle Database From Car Dealer WiFi

I was trying to connect to my Oracle database on Amazon RDS from a car dealer while my car was in service. My laptop was connected to the public WiFi. When I tried to connect to the Oracle server, I got "Error Message = IO Error: The Network Adapter could not establish the connection".

I realized the issue was caused by my new IP address not being included in the inbound rules of the Amazon security group. I found my IP address, then logged on to the AWS console and found the security group associated with the DB instance. After I added an inbound rule for 64.188.5.xxx/32, I was able to connect to the DB immediately.

Wednesday, December 19, 2018

Statistically Manufactured Personal Data

To avoid the trouble of dealing with personal data when we test our analytics processes, I have created mock personal data that closely reflect the American population from a statistical point of view. The largest data set has 1 million records with variables including first name, last name, sex, date of birth, social security number, address, phone number and email. The values of these variables are produced to be as realistic as possible. The records represent about 0.3% of the population of the United States.

The following observations about the 1 million mock personal data records are very close to the real statistics of the U.S. population.

1. The top 4 states by number of people are California (138223 persons, 13.82%), Texas (99217 persons, 9.92%), Florida (69640 persons, 6.96%) and New York (49979 persons, 5%). These are close to the real distribution of the U.S. population.
2. Females make up 51% of the records and males 49%.
3. The top 3 last names are Smith (10800 persons, 1.08%), Williams (8000 persons, 0.8%) and Jones (6900 persons, 0.69%).
4. The top 3 female first names are Ava (4707 persons, 0.93%), Olivia (4508 persons, 0.89%) and Isabella (4311 persons, 0.85%), and the top 3 male first names are Noah (5075 persons, 1.03%), Elijah (4736 persons, 0.96%) and Liam (4434 persons, 0.9%). These percentages are relative to each sex.
5. The following table shows the distribution of persons by age group for both sexes. The older age groups contain more women than men, reflecting that women live longer than men.
Age Group          Female #  Female %    Male #    Male %
Under 5 years         34603     6.81%     35656     7.25%
5 to 9 years          34707     6.83%     34010     6.92%
10 to 14 years        30192     5.94%     33013     6.72%
15 to 19 years        34361     6.76%     32689     6.65%
20 to 24 years        32512     6.39%     36647     7.45%
25 to 29 years        35626     7.01%     37278     7.58%
30 to 34 years        34344     6.76%     31977     6.50%
35 to 39 years        33325     6.55%     31927     6.49%
40 to 44 years        33332     6.56%     34456     7.01%
45 to 49 years        35070     6.90%     35443     7.21%
50 to 54 years        37321     7.34%     34876     7.09%
55 to 59 years        31623     6.22%     31315     6.37%
60 to 64 years        28801     5.67%     24218     4.93%
65 to 69 years        20999     4.13%     19881     4.04%
70 to 74 years        16617     3.27%     14065     2.86%
75 to 79 years        13520     2.66%     10272     2.09%
80 to 84 years        10693     2.10%      7983     1.62%
85 years and over     10754     2.12%      5894     1.20%
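
Statistics such as the state distribution in item 1 can be checked with a short query once a CSV file is loaded into a table. The following is only a sketch: the table name dm_mock_person and the column name state are assumptions for illustration, not the documented file layout, and FETCH FIRST requires Oracle 12c or later.

```sql
-- Hypothetical table dm_mock_person with a column named state (assumed names).
-- ratio_to_report divides each state's count by the total over all states.
select state,
       count(*) as persons,
       round(100 * ratio_to_report(count(*)) over (), 2) as pct
from   dm_mock_person
group  by state
order  by persons desc
fetch  first 4 rows only;
```

The row limit is applied after the analytic function is evaluated, so the percentages are relative to all records and not just the top 4 states.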
A small file with 100 records is available as a free download. Files with 5K, 50K, 250K and 1 million records are available for purchase at https://www.datamanufacturer.com.
File Name                  Description                                                          Price
dm_mock_person_100.csv     100 mock personal data records. CSV format.                          Free
dm_mock_person_5k.csv      5K mock personal data records. About 0.7M bytes. CSV format.         $2.95
dm_mock_person_50k.csv     50K mock personal data records. About 7M bytes. CSV format.          $7.95
dm_mock_person_250k.csv    250K mock personal data records. About 35M bytes. CSV format.        $9.95
dm_mock_person_1m.csv      1 million mock personal data records. About 140M bytes. CSV format.  $39.95

Tuesday, December 18, 2018

Generate Random String in Oracle

The following query generates a random email address.
SQL> select dbms_random.string('l', 8)||'@'||dbms_random.string('l', 7)||'.com' 
        email from dual;

EMAIL
------------------------------------------------------------------------------------
irslsxrf@wltikyv.com
The first parameter, 'l', means the string is created from lowercase letters. The second parameter is the length of the string.
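
The same idea can be extended to produce several addresses at once. The sketch below assumes nothing beyond the DBMS_RANDOM package itself; the row count of 5 and the length ranges are arbitrary choices for illustration. Other values of the first parameter are 'u' (uppercase letters), 'a' (mixed-case letters), 'x' (uppercase letters and digits) and 'p' (any printable characters).

```sql
-- Generate 5 random email addresses, one per row.
-- dbms_random.value(low, high) returns a number >= low and < high,
-- so trunc(dbms_random.value(5, 11)) yields a length from 5 to 10.
select dbms_random.string('l', trunc(dbms_random.value(5, 11)))
       || '@'
       || dbms_random.string('l', trunc(dbms_random.value(4, 9)))
       || '.com' as email
from   dual
connect by level <= 5;
```

The output differs on every run because the values are random.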

Monday, December 17, 2018

Find the Most Frequent Values

To find the most frequent value, we can use the STATS_MODE function. The following query lists the area codes in the state of Missouri.
SQL> select areacode from T_PHONE_AREA where state='Missouri' order by 1;
  AREACODE
----------
       314
       314
       314
       314
       314
       314
       314
       314
       314
       314
       314
       314
       417
       417
       573
       573
       573
       636
       636
       636
       636
       636
       636
       660
       816
       816
       816
       816
       816
       816
       816
       816
       816
       816

34 rows selected.
In the following query, stats_mode(areacode) returns area code 314, which is the most frequent value (12 of the 34 rows).
SQL> select stats_mode(areacode) from T_PHONE_AREA where state='Missouri';

STATS_MODE(AREACODE)
--------------------
                 314
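
STATS_MODE returns a single value, and when several values tie for the highest frequency Oracle picks one of them. As a sketch of an alternative that shows the count and keeps ties, a GROUP BY query can be used against the same T_PHONE_AREA table (FETCH FIRST ... WITH TIES requires Oracle 12c or later).

```sql
-- Most frequent area code(s) in Missouri, with the number of occurrences.
-- WITH TIES returns every area code that shares the top count.
select areacode, count(*) as cnt
from   T_PHONE_AREA
where  state = 'Missouri'
group  by areacode
order  by cnt desc
fetch  first 1 rows with ties;

-- STATS_MODE is an aggregate function, so it also works per group:
select state, stats_mode(areacode) as most_frequent_areacode
from   T_PHONE_AREA
group  by state;
```

For the Missouri rows listed above, the first query would return area code 314 with a count of 12.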

Remove the Last Word From a String Using Oracle SQL

I use the following query to remove the last word of a sentence.
with tbl as (select 'Boston city' as name  from dual)
select  name, substr(name, 1, instr(name,' ',-1)-1 ) simple_name  from tbl;

NAME        SIMPLE_NAME
----------- -----------
Boston city Boston     
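
One caveat: when the string contains no space, instr returns 0 and the substr expression above yields NULL. A minimal sketch that keeps single-word names unchanged, using a CASE expression (the second sample row is added here only for illustration):

```sql
with tbl as (
  select 'Boston city' as name from dual
  union all
  select 'Boston' from dual          -- single word, no space
)
select name,
       case
         when instr(name, ' ', -1) > 0
           then substr(name, 1, instr(name, ' ', -1) - 1)  -- drop the last word
         else name                                         -- nothing to remove
       end as simple_name
from   tbl;
```

Both rows return Boston as simple_name.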

Find Out Table Columns That Are Indexed

I use the following query to find out columns that are indexed for a table.
select index_name, column_name from USER_IND_COLUMNS where table_name='MYTABLE';
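
For composite indexes it helps to see the order of the columns and whether the index is unique. The following sketch joins USER_IND_COLUMNS to USER_INDEXES; MYTABLE is the same placeholder table name as above and must be given in uppercase unless the table was created with a quoted name.

```sql
-- List indexed columns in their position within each index,
-- together with the uniqueness of the index.
select ic.index_name,
       ic.column_position,
       ic.column_name,
       i.uniqueness
from   user_ind_columns ic
join   user_indexes i
       on i.index_name = ic.index_name
where  ic.table_name = 'MYTABLE'
order  by ic.index_name, ic.column_position;
```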

Oracle Function Based Index is Handy

In table t_my_table, the values in column name are in lowercase. However, I want to join this table with another table where names are in uppercase. I create a function-based index.
create index t_my_tablei on t_my_table(upper(name));
That way, I don't need to create another column or table that contains upper(name) and create an index on it. When I join the two tables on upper(a.name) = b.name, the function-based index on upper(name) can be used and the join is fast.
select a.*, b.* from t_my_table a, my_another_table b where upper(a.name) = b.name;
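
To confirm what the function-based index covers and whether the optimizer chooses it, the data dictionary and EXPLAIN PLAN can be consulted. This is a sketch using the table and index from above; whether the index actually appears in the plan depends on table sizes and statistics.

```sql
-- Show the expression behind the function-based index.
-- (In USER_IND_COLUMNS such a column appears under a system-generated name.)
select index_name, column_expression
from   user_ind_expressions
where  table_name = 'T_MY_TABLE';

-- Display the execution plan of the join to see if T_MY_TABLEI is used.
explain plan for
select a.*, b.*
from   t_my_table a, my_another_table b
where  upper(a.name) = b.name;

select * from table(dbms_xplan.display);
```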

Sunday, December 16, 2018

Amazon RDS Oracle Instance Running Out of Disk Space

My Oracle database instance on Amazon RDS ran out of disk space. I fixed this by modifying the instance and adding extra storage. This is the link to instructions.
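
Before adding storage, it can be useful to see which tablespaces take up the space. The query below is a sketch that assumes the connected user can read the DBA_DATA_FILES view; it reports the allocated size of the data files, not the space actually used inside them.

```sql
-- Allocated data file size per tablespace, largest first.
select tablespace_name,
       round(sum(bytes) / 1024 / 1024 / 1024, 2) as size_gb
from   dba_data_files
group  by tablespace_name
order  by size_gb desc;
```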