Since many companies store their critical business data in Oracle databases, it is advantageous to perform random sampling within the same environment using SQL without data movement. For example, it is time-consuming to pull a large data set out of a database and do random sampling using Python on a laptop computer. In addition, the data are prone to various security issues once they are not protected by the database.
A Competition-winning data scientist and long-time Oracle SQL practitioner Dr. Jay Zhou creates an online course and shares his expertise in performing random sampling using Oracle SQL. Students will learn practical skills that can be applied immediately in their work. There were hundreds of people from 85 countries who took the course.
- How to quickly view random samples of the data. There are multiple ways to do this task.
- How to select a precise number of samples randomly.
- How to split data randomly. This is a necessary task when we build a machine learning model and need to produce three data sets, i.e., training, testing, and validation sets.
- How to select random samples by groups. For example, we want to randomly select 100 students, 50 of them female and 50 male, from a school.