Thursday, November 14, 2013

Add Line Number to Text File

Problem

We have the following text file.

$ cat ads_log_small.csv
ADS_ID,DEVICE_OS,NUM_IMPRESSION,NUM_CONVERSION
 32, Android, 1, 0
 32, Android, 1, 0
 32, Android, 1, 0
 32, Android, 2, 0
 32, Android, 1, 0
 32, Android, 2, 0
 32, Android, 1, 0
 32, Android, 3, 0
 32, Android, 2, 0

If we load records in the text file into a database table, the original order of records in the file is lost. In the post SAS data set vs. relational database table, we mentioned that,Unlike SAS data set, a database table is a set where records no order (unless we explicitly sort them by keys). If we want to preserve the original sequence of the record, how do we add a line number to the original file?

Solution

We can use Linux (or Cygwin on Windows) command awk to add a line number to the file.

$ cat ads_log_small.csv | awk '{if (NR==1) print "LINE_NUMBER" ,",",$0;    else print NR-1,",",$0}'
LINE_NUMBER , ADS_ID,DEVICE_OS,NUM_IMPRESSION,NUM_CONVERSION
1 ,  32, Android, 1, 0
2 ,  32, Android, 1, 0
3 ,  32, Android, 1, 0
4 ,  32, Android, 2, 0
5 ,  32, Android, 1, 0
6 ,  32, Android, 2, 0
7 ,  32, Android, 1, 0
8 ,  32, Android, 3, 0
9 ,  32, Android, 2, 0
We can write the output to a new file.
$ cat ads_log_small.csv | awk '{if (NR==1) print "LINE_NUMBER" ,",",$0;    else print NR-1,",",$0}' > ads_log_new.csv

No comments: