Friday, July 20, 2012

SAS data set vs. relational database table

There is a key difference between SAS data sets and relational database tables. In SAS data set, like a text file, there is a natural, fixed order of records (or observations in SAS term). The order of a data set will not change unless we specifically modify it using method like "proc sort". So in SAS, it makes perfect sense to say "extract the first 500 observations from a data set".

However, in a relational database table, there is no natural order defined for records in a table. To say "exact the first 500 records from a table" does not make sense. We have to specifically define an order. Thus in a relational database, the following statements all make sense.
"Extract 500 records randomly from a table."
"Extract the top 500 records with the highest account numbers from a table (assuming account numbers are unique)".

Caution needs to be taken when converting SAS data set to a relational data table. The natural order of records in the SAS data set will be lost. One way to preserve the natural order of a SAS data set is to  create an extra column based on SAS system variable _N_.