Friday, September 24, 2021

An Antique Neural Network for Image Classification

Quite by accident, I found the picture below, which shows my research work from 1994, when I trained a neural network to identify land use types from a satellite image. The neural network had 3 input neurons, corresponding to the blue, green, and red bands of a Landsat TM image, 14 hidden neurons, and 7 output neurons representing seven land use types, including cornfields, wheat fields, water bodies, and impervious surfaces. The number of training data points was 65.
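For readers who want to see what a 3-14-7 structure looks like in code, here is a minimal Python sketch of the forward pass only. It is not the original 1994 C++ program: the sigmoid activation, the bias terms, the random untrained weights, and the sample pixel values are all assumptions made for illustration.

```python
import math
import random

# Layer sizes taken from the text: 3 bands in, 14 hidden neurons, 7 land use types out.
N_INPUT, N_HIDDEN, N_OUTPUT = 3, 14, 7


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); small random weights are an assumed initialization."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Weighted sum plus bias for each neuron, passed through a sigmoid (assumed)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def predict(pixel, hidden, output):
    """Map one (blue, green, red) pixel to the index of the strongest output neuron."""
    h = layer_forward(pixel, *hidden)
    scores = layer_forward(h, *output)
    return scores.index(max(scores)), scores


rng = random.Random(0)
hidden = init_layer(N_INPUT, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUT, rng)

# Hypothetical reflectance values for one pixel, scaled to the range 0..1.
label, scores = predict([0.12, 0.30, 0.25], hidden, output)

# Count of trainable numbers in this sketch (weights plus biases).
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in (hidden, output))
print(label, n_params)
```

With bias terms included, a network of this shape has only 161 trainable parameters, which gives a sense of how small it is next to modern deep networks. Because the weights here are untrained, the predicted label is meaningless; training on the labeled pixels would be the missing step.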

I coded everything from scratch in C++, including the algorithm and the user interface. There were no prebuilt packages to use, no internet to search for information, and no such thing as asking questions on social media. The computer had a 486 CPU, 32 megabytes of memory, and a hard drive with less than 500 megabytes of storage.

How things have changed in 27 years! With a deep learning neural network, such as a convolutional neural network, many layers of neurons serving different purposes can be stacked together to form a complex structure and collectively perform recognition tasks that were unthinkable in the past. All of this can be done with a few lines of Python, and the execution is often done in a cloud computing environment with virtually unlimited computation and storage resources. The progress is astonishing.

Wednesday, September 22, 2021

Online Course: Oracle SQL for Random Sampling

Since many companies store their critical business data in Oracle databases, it is advantageous to perform random sampling within the same environment using SQL, without moving the data. For example, it is time-consuming to pull a large data set out of a database and do random sampling using Python on a laptop computer. In addition, the data are exposed to various security risks once they leave the protection of the database.

Dr. Jay Zhou, a competition-winning data scientist and long-time Oracle SQL practitioner, has created an online course to share his expertise in performing random sampling using Oracle SQL. Students will learn practical skills that can be applied immediately in their work. Hundreds of people from 85 countries have taken the course.

The course begins with a description of scenarios where random sampling is necessary. A number of useful Oracle SQL random functions are introduced. The course uses examples and presents SQL scripts to perform the following common tasks.
  • How to quickly view random samples of the data. There are multiple ways to do this task.
  • How to select a precise number of samples randomly.
  • How to split data randomly. This is a necessary task when we build a machine learning model and need to produce three data sets, i.e., training, testing, and validation sets.
  • How to select random samples by groups. For example, we want to randomly select 100 students, 50 of them female and 50 male, from a school.
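To make the logic of these tasks concrete, here is a small Python sketch of three of them. It is only a language-neutral illustration and not the Oracle SQL scripts taught in the course, which run inside the database. The function names, the 60/20/20 split proportions, and the toy student records are assumptions.

```python
import random
from collections import defaultdict


def sample_exact(rows, n, rng):
    """Select exactly n rows at random, without replacement."""
    return rng.sample(rows, n)


def split_random(rows, fractions, rng):
    """Shuffle a copy of the rows and cut it into consecutive parts by fraction."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    parts, start = [], 0
    for frac in fractions[:-1]:
        end = start + round(frac * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # the last part takes whatever remains
    return parts


def sample_by_group(rows, key, n_per_group, rng):
    """Select n_per_group random rows from each distinct value of rows[key]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {g: rng.sample(members, n_per_group) for g, members in groups.items()}


rng = random.Random(42)

# Toy data: 1000 hypothetical students, half female and half male.
students = [{"id": i, "gender": "F" if i % 2 else "M"} for i in range(1000)]

picked = sample_exact(students, 25, rng)
train, test, valid = split_random(students, (0.6, 0.2, 0.2), rng)
by_gender = sample_by_group(students, "gender", 50, rng)

print(len(picked), len(train), len(test), len(valid))
print({g: len(rows) for g, rows in by_gender.items()})
```

Each step here first requires copying every row into local memory, which is exactly the data movement the course argues against for large tables.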

Are there any course requirements or prerequisites?

Very basic Oracle SQL knowledge

Who is this course for?

SQL developers, data analysts, data scientists, statisticians

Please take the course here.

Saturday, September 18, 2021

Taking Operational Efficiency to the Next Level: Leverage the 95-5 Rule of Automation

Through an odyssey of over two decades helping clients in various industries solve hard problems, I have gained a deep appreciation of a pattern that can be leveraged to dramatically improve the quality and efficiency of the work and, ultimately, the return on investment of businesses.

Whether it is inventory planning, detecting fraudulent financial transactions, or finding costly insurance claims, it generally holds true that 95% of the work can be resolved by automated algorithms. The remaining 5% needs to be done by domain experts using their expertise, intuition, and creativity. I call it the 95-5 rule of automation.

The 95-5 rule is not simply a flat division of labor between machines and human experts in that proportion. It has a structural and temporal implication. Algorithms are first applied to a raw problem, which involves a large number of cases and big data and is hard or inefficient to solve manually. This step outputs a simpler problem in which the work is greatly reduced, generally by 95%. Human experts then work on this reduced problem and make their judgment calls to reach the final decision.

Take as an example our solution to a workers' compensation insurance claim problem. A company receives about 200 worker injury claims daily. Our algorithm highlights 10 of them (5% of the total) as potentially costly, using a machine learning model based on factors including age, cause of injury, and injured body part. Using these 10 cases as a starting point, analysts review them carefully and take proper action. The solution has resulted in a 40% reduction in claim loss.
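The triage step in this example can be sketched in a few lines of Python. This is a minimal illustration, not our production solution: the `toy_risk_score` function is a hypothetical stand-in for the trained machine learning model, and the claim fields and generated data are invented for the demonstration.

```python
import random


def triage(claims, score, review_fraction=0.05):
    """Rank claims by score; return (top fraction for expert review, the rest)."""
    n_review = max(1, round(review_fraction * len(claims)))
    ranked = sorted(claims, key=score, reverse=True)
    return ranked[:n_review], ranked[n_review:]


def toy_risk_score(claim):
    """Hypothetical stand-in for a trained model; higher means potentially costlier."""
    score = 0.01 * claim["age"]
    if claim["cause"] == "fall":
        score += 0.3
    if claim["body_part"] == "back":
        score += 0.5
    return score


rng = random.Random(7)

# About 200 claims arrive per day in the example; these records are made up.
claims = [
    {
        "id": i,
        "age": rng.randint(18, 65),
        "cause": rng.choice(["fall", "strain", "cut", "vehicle"]),
        "body_part": rng.choice(["back", "hand", "knee", "shoulder"]),
    }
    for i in range(200)
]

to_review, handled_automatically = triage(claims, toy_risk_score)
print(len(to_review), len(handled_automatically))
```

The analysts then start from the short `to_review` list rather than from all 200 claims, which is the reduced problem the rule describes.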

To recap, in the real world the 95-5 rule of automation works this way: applying algorithms to a raw problem to reduce the work by 95% and subsequently having human experts take on the reduced problem.

Here are the benefits as reported by our clients that have adopted solutions based on the 95-5 rule of automation.

  • Improved outcomes. For example, a bank sees its fraud loss reduced by 70%. Another bank finds the bad debt rate dropping by 50%.
  • Increased efficiency. In a K-12 education company, content tagging is 100 times more efficient than a manual process.
  • More jobs. A group in a bank hires more analysts because the operation there drives a good return on investment.
  • Improved employee morale. This is because they work on the reduced problem, where the same amount of effort generates more fruitful outcomes. (I did not realize this point until I saw a report produced by an independent department at a client company.)

When the rule is applied to inventory planning, our advanced optimization algorithm generates a set of recommended safety stocks for all items which serves as the foundation for planners to make further improvements.

One lesson that we have learned is that, except in exceptionally simple circumstances, domain experts should not work with the raw problem directly. Unfortunately, this principle is violated every day, resulting in ineffective, inefficient, and unscalable operations and a stressed workforce. The whole situation is avoidable.

The 95-5 rule of automation has worked remarkably well for us. I hope you make the most of it in your organization and take operational efficiency to the next level.