Bash – Fast way of finding lines in one file that are not in another

Tags: bash, diff, find, grep

I have two large files (sets of filenames), roughly 30,000 lines in each. I am trying to find a fast way of finding the lines in file1 that are not present in file2.

For example, if this is file1:

line1
line2
line3

And this is file2:

line1
line4
line5

Then my result/output should be:

line2
line3

This works:

grep -v -f file2 file1

But it is very, very slow when used on my large files.
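A side note on grep: with -f, every line of file2 is treated as a regular expression. If the lines are plain filenames and only exact, whole-line matches matter (as in the example above), fixed-string matching is usually much faster:

grep -vxFf file2 file1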

I suspect there is a good way to do this using diff, but I want the output to be just the lines and nothing else, and I cannot seem to find a switch for that.

Can anyone help me find a fast way of doing this, using bash and basic Linux binaries?

EDIT: To follow up on my own question, this is the best way I have found so far using diff:

diff file2 file1 | grep '^>' | sed 's/^> //'

Surely, there must be a better way?

Best Answer

The comm command (short for "common") may be useful here. From its man page: "comm - compare two sorted files line by line".

# find lines only in file1
comm -23 file1 file2 

# find lines only in file2
comm -13 file1 file2 

# find lines common to both files
comm -12 file1 file2 

The man page is actually quite readable for this.
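Note that comm expects both inputs to be sorted. If the files are not already sorted, a minimal sketch using bash process substitution to sort them on the fly:

# lines only in file1, sorting both inputs first
comm -23 <(sort file1) <(sort file2)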
