Friday, September 24, 2021
I coded everything from scratch, including the algorithm and the user interface, in C++. There were no prebuilt packages to use, no internet to search for information, and no such thing as asking questions on social media. The computer had a 486 CPU, 32 megabytes of memory, and a hard drive with less than 500 megabytes of storage.
How things have changed in 27 years! In a deep learning neural network, such as a convolutional neural network, many layers of neurons serving different purposes can be stacked together to form a complex structure and collectively perform recognition tasks that were unthinkable in the past. All of this can be done with a few lines of Python, and the execution often takes place in a cloud computing environment with virtually unlimited computation and storage resources. The progress is astonishing.
Wednesday, September 22, 2021
Since many companies store their critical business data in Oracle databases, it is advantageous to perform random sampling within the same environment using SQL, without moving the data. For example, it is time-consuming to pull a large data set out of a database and do random sampling using Python on a laptop computer. In addition, the data are exposed to various security risks once they leave the protection of the database.
Competition-winning data scientist and long-time Oracle SQL practitioner Dr. Jay Zhou has created an online course to share his expertise in performing random sampling using Oracle SQL. Students will learn practical skills that can be applied immediately in their work. Hundreds of people from 85 countries have taken the course.
- How to quickly view random samples of the data. There are multiple ways to do this task.
- How to select a precise number of samples randomly.
- How to split data randomly. This is a necessary task when we build a machine learning model and need to produce three data sets, i.e., training, testing, and validation sets.
- How to select random samples by groups. For example, we want to randomly select 100 students, 50 of them female and 50 male, from a school.
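The last three tasks in the list can be sketched as follows. This is only an illustration of the general idea, not necessarily the methods taught in the course. Since an Oracle database is not assumed to be at hand, the SQL runs against an in-memory SQLite database through Python's standard `sqlite3` module (window functions need SQLite 3.25 or later). The `students` table, its columns, and the 70/15/15 split ratio are all hypothetical. In Oracle, the random sort key would typically come from `DBMS_RANDOM.VALUE` instead of `RANDOM()`, and the row limit from `FETCH FIRST n ROWS ONLY` instead of `LIMIT`.

```python
import sqlite3

# Hypothetical demo table: 1,000 students, half female and half male.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, gender TEXT)")
conn.executemany(
    "INSERT INTO students (id, gender) VALUES (?, ?)",
    [(i, "F" if i % 2 == 0 else "M") for i in range(1, 1001)],
)

# 1. A precise number of random samples:
#    sort by a random key and keep the first 100 rows.
exact_sample = conn.execute(
    "SELECT id FROM students ORDER BY RANDOM() LIMIT 100"
).fetchall()

# 2. A random three-way split: give every row a random rank,
#    then cut the ranks into 70% / 15% / 15% (an assumed ratio).
conn.execute("""
    CREATE TABLE split AS
    SELECT id,
           CASE WHEN rn <= 700 THEN 'training'
                WHEN rn <= 850 THEN 'testing'
                ELSE 'validation' END AS part
    FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rn
          FROM students)
""")
split_counts = dict(
    conn.execute("SELECT part, COUNT(*) FROM split GROUP BY part")
)

# 3. Random samples by group: rank rows randomly within each gender
#    and keep the first 50 of each.
stratified = conn.execute("""
    SELECT id, gender
    FROM (SELECT id, gender,
                 ROW_NUMBER() OVER (PARTITION BY gender
                                    ORDER BY RANDOM()) AS rn
          FROM students)
    WHERE rn <= 50
""").fetchall()
```

Ranking rows by a random key gives exact sample sizes, at the cost of sorting the table. When only an approximate sample is needed for a quick look at the data, Oracle's `SAMPLE` clause on a table is a lighter alternative.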
Are there any course requirements or prerequisites? Very basic Oracle SQL knowledge.
Who is this course for? SQL developers, data analysts, data scientists, and statisticians.
Saturday, September 18, 2021
Whether it is inventory planning, detecting fraudulent financial transactions, or finding costly insurance claims, it generally holds true that 95% of the work can be handled by automated algorithms. The remaining 5% needs to be done by domain experts using their expertise, intuition, and creativity. I call it the 95-5 rule of automation.
The 95-5 rule is not simply a flat division of labor between machines and human experts in that proportion. It also implies a structure and an order. Algorithms are first applied to the raw problem, which involves a large number of cases and big data and is hard or inefficient to solve manually. This step outputs a simpler problem in which the work is greatly reduced, generally by 95%. Human experts then work on this reduced problem and make their judgment calls to reach the final decision.
To recap, in the real world the 95-5 rule of automation works this way: algorithms are applied to a raw problem to reduce the work by 95%, and human experts then take on the reduced problem.
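The two-stage structure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm used in any client solution: the `triage` function, the `review_fraction` parameter, and the use of transaction amount as a risk score are all hypothetical stand-ins for whatever model a real application would use.

```python
import random


def triage(cases, score, review_fraction=0.05):
    """Stage 1: rank all cases by an automated score.

    Returns (auto_handled, for_review). The highest-scoring fraction
    is the reduced problem handed to human experts in stage 2.
    """
    ranked = sorted(cases, key=score, reverse=True)
    n_review = max(1, round(len(ranked) * review_fraction))
    return ranked[n_review:], ranked[:n_review]


# Hypothetical raw problem: 1,000 transactions with random amounts.
rng = random.Random(0)
transactions = [
    {"id": i, "amount": rng.uniform(1, 10_000)} for i in range(1000)
]

# Assumed scoring rule for illustration only: larger amount, higher risk.
auto_handled, for_review = triage(
    transactions, score=lambda t: t["amount"]
)

# Experts now look at 50 cases instead of 1,000.
print(len(auto_handled), len(for_review))
```

The point of the sketch is the ordering: the algorithm touches every case first, and the experts only ever see the small, ranked remainder.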
Here are the benefits reported by our clients that have adopted solutions based on the 95-5 rule of automation:
- Improved outcomes. For example, a bank sees its fraud loss reduced by 70%. Another bank finds the bad debt rate dropping by 50%.
- Increased efficiency. In a K12 education company, content tagging is 100 times more efficient than a manual process.
- More jobs. A group in a bank hires more analysts because the operation there drives a good return on investment.
- Improved employee morale. This is because employees work on the reduced problem, where the same amount of effort generates more fruitful outcomes. (I did not realize this point until I saw a report produced by an independent department at a client company.)
When the rule is applied to inventory planning, our advanced optimization algorithm generates a set of recommended safety stocks for all items, which serve as the foundation for planners to make further improvements.
One lesson we have learned is that, except in exceptionally simple circumstances, domain experts should not work on the raw problem directly. Unfortunately, this principle is violated every day, resulting in ineffective, inefficient, and unscalable operations and a stressed workforce. The whole situation is avoidable.