Using grep, sed, awk, regex, pipes and redirects
When it comes to file manipulation in OS X, I find myself using grep, sed and awk often, and usually in combination. Almost every time I use them, however, I have to reread the man pages and review regular expressions. I keep googling the same things and landing on the same pages, so I wanted to write a single post that collects all the necessary information and links. Here we go…
GREP
“grep searches the named input FILEs (or standard input if no files are named, or the file name - is given) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.”
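For a quick feel for what that means in practice, here is a minimal sketch. The file sample.txt and its contents are made up for the example; the -n and -c options are standard grep flags.

```shell
# Work in a throwaway directory so the example does not touch real files.
workdir=$(mktemp -d)
printf 'alpha\nbeta blatti.net\ngamma\n' > "$workdir/sample.txt"

# Print every line of the file that contains the pattern.
grep 'blatti.net' "$workdir/sample.txt"

# -n prefixes each matching line with its line number.
grep -n 'blatti.net' "$workdir/sample.txt"

# -c prints only the number of matching lines.
grep -c 'blatti.net' "$workdir/sample.txt"

# With no file named, grep reads standard input instead.
printf 'one\ntwo\n' | grep 'two'

rm -rf "$workdir"
```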
AWK
“Awk scans each input file for lines that match any of a set of patterns specified literally in prog or in one or more files specified as -f progfile. With each pattern there can be an associated action that will be performed when a line of a file matches the pattern. Each line is matched against the pattern portion of every pattern-action statement; the associated action is performed for each matched pattern.”
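The pattern-action idea is easier to see in a small example. This is only a sketch: the people helper and its name/number data are invented for illustration.

```shell
# Hypothetical sample data: a name and a number on each line.
people() { printf 'alice 30\nbob 25\ncarol 41\n'; }

# Pattern: second field greater than 28. Action: print the first field.
people | awk '$2 > 28 { print $1 }'

# A regular expression works as a pattern too; /^b/ matches lines starting with "b".
people | awk '/^b/ { print $2 }'

# Each line is tested against every pattern-action statement,
# so a single line can trigger more than one action.
people | awk '/^c/ { print "starts with c: " $1 } $2 > 40 { print "over 40: " $1 }'
```

awk splits each line into fields on whitespace by default, which is why $1 and $2 work here without any extra setup.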
SED
“The sed utility is a stream editor that reads one or more text files, makes editing changes according to a script of editing commands, and writes the results to standard output.”
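A few one-liners show the most common editing commands. The input strings are made up; the third command is the first stage of the pipeline at the end of this post.

```shell
# s/old/new/ replaces the first match on each line and writes the result to standard output.
printf 'foo bar foo\n' | sed 's/foo/baz/'     # baz bar foo

# The g flag replaces every match on the line.
printf 'foo bar foo\n' | sed 's/foo/baz/g'    # baz bar baz

# & in the replacement stands for whatever the pattern matched.
printf '7\n' | sed 's/^[0-9]$/Period &/'      # Period 7

# /pattern/d deletes the lines that match.
printf 'keep\ndrop me\nkeep too\n' | sed '/drop/d'
```

Used this way, sed never changes its input; it only writes the edited text to standard output.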
REGULAR EXPRESSIONS
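Regular expressions let you describe a pattern of text instead of a fixed string. Most of us are at least familiar with the shell wildcard *, where *.db means “every file ending in .db”. Strictly speaking that is a shell glob, not a regular expression: in a regular expression, * means “zero or more of the preceding item” and . matches any single character. There are many other useful constructs, and they are the key to using the above commands efficiently. Brush up on regular expressions here.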
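Here is a minimal sketch of a few regular expression basics, using grep on some made-up lines (the sample helper is hypothetical). It also shows why the dot in a pattern like "blatti.net" matches more than you might expect.

```shell
# Made-up sample lines, fed to grep on standard input.
sample() { printf 'blatti.net\nblattiXnet\nPeriod 1\n42\nnotes.db\n'; }

# An unescaped . matches any single character, so the first two lines both match.
sample | grep 'blatti.net'

# Escape the dot, or use -F for a fixed string, to match a literal period.
sample | grep 'blatti\.net'
sample | grep -F 'blatti.net'

# ^ and $ anchor the match to the start and end of the line; [0-9] is a character class.
sample | grep '^[0-9][0-9]*$'

# The regex way to say "ends in .db" (the shell glob would be *.db).
sample | grep '\.db$'
```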
PIPES AND REDIRECTS
In order to send the results from one command to another command you need to “pipe” them together. An example:
cat *.txt | grep "blatti.net"
cat *.txt writes the contents of every file that ends in “.txt” to standard output. Piping those results makes them the input for the second command. grep "blatti.net" reads standard input and writes each line that contains “blatti.net” to standard output. So when these two commands are piped together, we get every line that contains “blatti.net” in every file that ends in “.txt” in the working directory.
While most of your work can be done with standard input and output, you’ll probably want to write your results to a file. Enter redirection. Using the example above, let’s say we wanted to take our results and write them to a file. This is done with the > redirector.
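To try the pipeline without touching real files, here is a sketch that runs it in a temporary directory. The files a.txt and b.txt and their contents are made up. It also shows that grep can read the files itself, and that pipes can be chained further.

```shell
# Build a throwaway directory with two hypothetical .txt files.
workdir=$(mktemp -d)
printf 'visit blatti.net\nnothing here\n' > "$workdir/a.txt"
printf 'blatti.net again\n' > "$workdir/b.txt"

(
  # The subshell keeps the cd from affecting the rest of the session.
  cd "$workdir" || exit 1

  # The pipeline from the text: cat writes, grep filters.
  cat *.txt | grep "blatti.net"

  # grep can read the files directly; with several files it
  # prefixes each matching line with the name of its file.
  grep "blatti.net" *.txt

  # Pipes chain: a third command counts the matching lines.
  cat *.txt | grep "blatti.net" | wc -l
)

rm -rf "$workdir"
```

The cat version is handy when you want the lines without file names in front; the direct grep version is handy when you want to know where each match came from.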
cat *.txt | grep "blatti.net" > results.txt
This command will take every line that contains “blatti.net” in every file that ends in “.txt” in the working directory, and write them to the file “results.txt”. Note that this will overwrite any existing “results.txt” file in the working directory. Which raises the question “What if I want to append instead of overwrite?” Well then, use >> instead of >. The >> redirector is especially useful if you are creating logs.
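The difference between the two redirectors is easy to check for yourself. A minimal sketch, writing to a made-up results.txt inside a temporary directory:

```shell
workdir=$(mktemp -d)
out="$workdir/results.txt"      # hypothetical output file

echo "first run"  >  "$out"     # > creates the file, or empties it if it already exists
echo "second run" >  "$out"     # so "first run" is gone now
echo "third run"  >> "$out"     # >> adds to the end instead

cat "$out"                      # prints "second run" then "third run"

rm -rf "$workdir"
```

One thing to watch for with the example in the text: once results.txt exists in the working directory, *.txt matches it too, so a later run would read its own output file as input. Writing the results under a name or in a directory the glob does not match sidesteps that.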
Now you too can create obnoxious commands with limited knowledge! The command below is one I created using the resources listed in this article. It is part of a piece of software that pulls text out of a PDF and returns a specified section in a text file. (I’m sure it is inefficient - so someone chime in and tell me how to make it better so I’ll know for next time)
sed 's/^[0-9]$/Period &/' /tmp/skypdf.txt | sed 's/[A-Z]*$/&,/' | sed '/Period/G' | sed '$!N;s/\n/ /' | sed '/Period/{x;p;x;}' | sed 's/^ //' > ~/Desktop/full.txt
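For next time, here is the same pipeline broken out one stage per line, with a comment on what each stage does. This is only a sketch: the format_periods wrapper and the sample input are made up (the real layout of /tmp/skypdf.txt isn't shown here), it reads standard input instead of the file, and I've assumed \n (a newline) in the fourth stage.

```shell
# Hypothetical wrapper: reads text on standard input, writes the result to standard output.
format_periods() {
  sed 's/^[0-9]$/Period &/' |    # a line that is just one digit becomes "Period <digit>"
  sed 's/[A-Z]*$/&,/' |          # [A-Z]* can match nothing, so every line gets a trailing comma
  sed '/Period/G' |              # append the (empty) hold space: a blank line after each Period line
  sed '$!N;s/\n/ /' |            # join each pair of lines with a space
  sed '/Period/{x;p;x;}' |       # print a blank line before each line containing "Period"
  sed 's/^ //'                   # strip one leading space
}

# Made-up input: a period number followed by a subject.
printf '1\nMATH\n2\nART\n' | format_periods
```

Each sed here runs as its own process over the whole stream, so folding the stages into a single sed with several -e scripts would not necessarily behave the same way, because commands like N and G change which lines the later expressions see.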
Now say that 10 times fast!