Thursday, November 14, 2013

How to Change Delimiters of Text File

Problem


A text file may contain many fields separated by a delimiter such as a tab, pipe (|), or comma. The following is an example of such a file. It has 4 fields separated by pipes (|). Some utilities recognize only certain types of delimiters. How do we change the delimiter from one type to another, e.g., from pipe (|) to comma (,)?
$ head ads_log.txt
ADS_ID|DEVICE_OS|NUM_IMPRESSION|NUM_CONVERSION
 32| Android| 1| 0
 32| Android| 1| 0
 32| Android| 1| 0
 32| Android| 2| 0
 32| Android| 1| 0
 32| Android| 2| 0
 32| Android| 1| 0
 32| Android| 3| 0
 32| Android| 2| 0

Solution

We can use an editor such as Notepad to replace the characters. However, if the file is very large, doing the work in an editor is very slow. A better way is to use the Unix/Linux command tr, which replaces every occurrence of one character with another. On Windows, we can install Cygwin, a free package that provides a Unix-like environment.

$ cat ads_log.txt | tr '|' ','
ADS_ID,DEVICE_OS,NUM_IMPRESSION,NUM_CONVERSION
 32, Android, 1, 0
 32, Android, 1, 0
 32, Android, 1, 0
 32, Android, 2, 0
 32, Android, 1, 0
 32, Android, 2, 0
 32, Android, 1, 0
 32, Android, 3, 0
 32, Android, 2, 0
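We can write the output to a new file, ads_log2.txt. Do not redirect the output to ads_log.txt itself, because the shell may truncate that file before it has been read.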
$ cat ads_log.txt | tr '|' ',' > ads_log2.txt
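Note that tr only swaps the delimiter character, so the leading spaces seen in the output above (" 32, Android") remain in each field. If the target utility is sensitive to that padding, a minimal sketch using awk could change the delimiter and trim the fields in one pass. The function name pipe_to_csv is hypothetical, chosen only for illustration, and the sketch assumes that no field contains a pipe or a comma as part of its data.

```shell
# pipe_to_csv is a hypothetical helper name, not a standard command.
# It reads pipe-delimited text from the named files (or from standard
# input) and prints comma-delimited text with surrounding spaces and
# tabs trimmed from every field.
# Assumption: no field contains a literal '|' or ',' as data.
pipe_to_csv() {
  awk 'BEGIN { FS = "|"; OFS = "," }
       {
         # Strip leading and trailing spaces/tabs from each field.
         for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
         # Reassign a field so awk always rebuilds the line with OFS,
         # even when no field needed trimming (e.g. the header line).
         $1 = $1
         print
       }' "$@"
}
```

It could then be used in the same way as the tr command, for example: pipe_to_csv ads_log.txt > ads_log2.txt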
