Friday, August 5, 2011

Sed - An Introduction

In this post, I want to discuss a Unix tool called sed. The name is short for "stream editor".

Sed is a Unix utility that parses text and implements a small programming language for applying transformations to that text. It allows powerful and interesting data processing to be done from shell scripts. The most essential command is "s" (substitute). For example, create a file a.text with the body "Hello, welcome". Then run the following command.

$ sed 's/welcome/bye/' <a.text >c.text

A new file named c.text, with the body "Hello, bye", now appears in the same directory. The "s" command replaced "welcome" with "bye", and the shell redirection ">c.text" wrote the result to the new file; a.text itself is unchanged. We can also see how "s" works directly in the terminal.
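The steps above can be sketched as one small script. This is only a sketch: the scratch directory made with mktemp is my own addition so that no existing files are overwritten; the file names are the ones used above.

```shell
# Work in a scratch directory (an assumption, not part of the original steps).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the input file with the sample text.
printf 'Hello, welcome\n' > a.text

# sed reads a.text on standard input; the shell writes sed's output to c.text.
sed 's/welcome/bye/' < a.text > c.text

cat a.text   # prints: Hello, welcome  (the original is untouched)
cat c.text   # prints: Hello, bye
```

Note that sed itself never touches a.text here; it only reads from standard input and writes to standard output.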

$ echo welcome | sed 's/welcome/bye/'

This prints "bye". The "welcome" after echo is the input text, and "s/welcome/bye/" performs the substitution. Now try this:

$ echo you are welcome boss | sed 's/welcome/going/'

We get "you are going boss". Here "s" is the substitute command, "welcome" is the search pattern, and "going" is the replacement text. The "/" character is the delimiter.
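The delimiter does not have to be "/": whatever character follows "s" is used as the delimiter. This is handy when the text itself contains slashes. A small sketch, where the path is made up for illustration:

```shell
# The input contains slashes, so "|" is used as the delimiter instead of "/".
with_pipe=$(echo /usr/local/bin | sed 's|/usr/local|/opt|')
echo "$with_pipe"      # prints: /opt/bin

# The same substitution with "/" as the delimiter needs every slash escaped.
with_slash=$(echo /usr/local/bin | sed 's/\/usr\/local/\/opt/')
echo "$with_slash"     # prints: /opt/bin
```

Both commands do the same thing; the first one is just easier to read.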

In the replacement, the "&" symbol stands for the text that the pattern matched.

$ echo welcome boss | sed 's/[a-z][a-z]/(&)/'

This prints "(we)lcome boss": each [a-z] matches a single lowercase letter, and there are two of them here, so "&" holds the first two letters. To match the entire word:

$ echo welcome boss | sed 's/[a-z]*[a-z]*/(&)/'

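Now the output is "(welcome) boss", because [a-z]* matches a run of zero or more lowercase letters (a single [a-z]* would be enough here). If you want to match the entire string, put a space between the two patterns: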

$ echo welcome boss | sed 's/[a-z]* [a-z]*/(&)/'

Now the output is "(welcome boss)". If the text is "welcome boss123", the above command will not put the entire string inside the parentheses; 123 will be left outside.

$ echo welcome boss123 | sed 's/[a-z]* [a-z]*/(&)/'

The output is "(welcome boss)123". To include the digits, add [0-9]* to the pattern:

$ echo welcome boss123 | sed 's/[a-z]* [a-z]*[0-9]*/(&)/'

Now the output is "(welcome boss123)". We can also use "&" more than once in the replacement.

$ echo welcome boss123 | sed 's/[a-z]* [a-z]*[0-9]*/& &/'

Output will be "welcome boss123 welcome boss123".

Now you have an idea of how "&" is used. If you want to keep the first word of a line and delete the rest, mark the part to keep with escaped parentheses, \( and \), and refer to it as \1 in the replacement:

$ echo welcome boss123 | sed 's/\([a-z]*\).*/\1/'

The output is "welcome". We can also swap two words using \1 and \2:

$ echo welcome boss123 | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'

The output is "boss welcome123": only the matched "welcome boss" is rearranged, and the unmatched "123" stays where it was.
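As one more sketch of \1 and \2, the same idea can turn a "last, first" name around. The name below is sample input I made up, not something from the examples above.

```shell
# \1 captures the word before the comma, \2 the word after it.
swapped=$(echo "Doe, John" | sed 's/\([A-Za-z]*\), \([A-Za-z]*\)/\2 \1/')
echo "$swapped"    # prints: John Doe
```

The comma and the space are part of the match but sit outside both groups, so they are dropped from the output.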

You can add flags after the last delimiter. These flags specify what happens when a pattern occurs more than once on a single line, and what to do when a substitution is made.
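Here is a short sketch of three common flags: "g" (replace every occurrence), a number (replace only that occurrence), and "p" (print the line when a substitution was made). The sample text and the variable names are made up for illustration.

```shell
text="one one one"

# No flag: only the first occurrence on the line is replaced.
first=$(echo "$text" | sed 's/one/two/')
echo "$first"      # prints: two one one

# g flag: every occurrence on the line is replaced.
all=$(echo "$text" | sed 's/one/two/g')
echo "$all"        # prints: two two two

# Numeric flag: only the second occurrence is replaced.
second=$(echo "$text" | sed 's/one/two/2')
echo "$second"     # prints: one two one

# p flag with -n: -n turns off the normal output, so only lines
# where a substitution was made are printed.
printed=$(printf 'one\nthree\n' | sed -n 's/one/two/p')
echo "$printed"    # prints: two
```

Without -n, the "p" flag would print each changed line twice: once by "p" and once by sed's normal output.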

These are some of the basics of sed; there are many more commands that can be used in sed.


