Sunday, December 03, 2006

Display Only Unique Lines From a File

There are many situations in which you may want to display only the unique lines in a file. Here is an easy way of doing it.

# uniq filename

e.g. # uniq filename > outputfile (All of the duplicate lines are filtered out; outputfile has only unique lines in it.)
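For example, given a file where the duplicate lines sit next to each other (the file name and contents here are just a sketch):

# cat filename
apple
apple
banana
cherry
cherry
# uniq filename
apple
banana
cherry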

3 comments:

Anonymous said...

Be careful with this; on Linux the man page says that the lines must be SUCCESSIVE to be filtered out.
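For example (a quick sketch with a made-up file), a duplicate that is not adjacent survives:

$ cat filename
apple
banana
apple
$ uniq filename
apple
banana
apple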

craig.terlau@mapledale.k12.wi.us said...

$ cat hosts1 hosts2 | egrep -o ..:..:..:..:..:.. | sort | uniq | wc -l

Here are some tools you can use to extract just the unique MAC addresses from a text file containing MAC addresses. In this case, the MAC addresses need to be in the form:

00:0d:93:6e:40:88

cat is used here with two input files, hosts1 and hosts2. Both are read, and you could specify as many input files as you want.

egrep (extended grep) looks for six character pairs separated by ':' characters, and its -o option prints only the matching text rather than the whole line. You can be pretty sure these are MAC addresses if they look like 00:0d:93:6e:40:88.
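The dots in that pattern match any character, so as a sketch you could tighten it to accept only hex digits:

$ cat hosts1 hosts2 | egrep -o '[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}' | sort | uniq | wc -l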

Then the output of egrep is piped into sort. This is because the uniq command only compares adjacent lines in a text file.

Next the output from sort is piped into uniq which strips out matching adjacent lines.

This output is sent to wc -l, which returns the number of lines of output. Alternatively, the output of all of this could be redirected to another file:


$ cat hosts1 hosts2 | egrep -o ..:..:..:..:..:.. | sort | uniq > hosts

Ben said...

The first person who responded is right; uniq will only filter out successive duplicate lines. One easy way is to use sort with the uniquing option, like this:

sort -u myFile.txt

That will, of course, sort the lines before removing duplicates, but sort has a number of options for choosing sort keys within the lines (such as for a timestamp).
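For instance (a sketch; the file layout here is made up), if each line begins with a timestamp followed by a message, you can skip the timestamp and deduplicate on the rest of the line:

$ sort -u -k2 myFile.txt

The -k2 option tells sort to use everything from the second whitespace-separated field onward as the key, so lines that differ only in their timestamps are treated as duplicates.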