How to make (non-gnu-)grep ignore binary files

find grep hp-ux unix

Hey, I am on an HP-UX server here. When recursively grepping a directory tree, I run into problems when the tree also contains binary files:
grep treats them as text files and displays very long lines containing a lot of non-printable characters. This not only makes the output hard to scan, but also often makes my terminal unusable (and writes funny strings to its title).

GNU grep has an option --binary-files=TYPE which would help (and by default it does not print the matching line for binary files anyway), but I do not have the GNU tools available.

Is there a way to simulate the behavior of GNU grep, or to ignore files that look like they are binary?

Btw, if there is an easy way to do this in Perl, that would be fine, too.

Best Answer

Building on the previous answer, you can use the "file" command to identify text files, and then limit your grep to only those files. For example:

  find dir -type f -print |
    xargs file |
    grep text |
    cut -f1 -d: |
    xargs grep "expression"

That's:

  • Find all regular files under directory "dir"
  • Pass these as arguments to "file"
  • Look for output from "file" containing the word "text"
  • Chop out the first colon-delimited field and use it as a filename
  • Search these files using grep.

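This will fail for filenames containing whitespace or colons, and since "grep text" looks at the whole line of "file" output, a binary file whose name happens to contain "text" will slip through. Otherwise it will do what you want.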
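
If the whitespace and colon limitation matters, one possible workaround is to classify the files one at a time in a loop rather than through xargs and cut. The following is only a sketch using plain POSIX shell features: grep_text_files is a made-up helper name, and it assumes that "file" prints the name exactly as given, followed by a colon, before the description. Filenames containing newlines will still break it.

```shell
# Sketch: grep only those files that file(1) describes as text.
# grep_text_files is a hypothetical helper name, not a standard command.
# Usage: grep_text_files PATTERN DIR
grep_text_files() {
    pattern=$1    # note: plain sh has no "local", so these are global
    dir=$2
    find "$dir" -type f -print |
    while IFS= read -r f; do
        desc=$(file "$f")
        # Strip the leading "name:" so that only the description is tested
        # (assumes file(1) echoes the name unchanged).
        desc=${desc#"$f":}
        case $desc in
            # /dev/null as an extra operand makes grep print the filename.
            *text*) grep -e "$pattern" /dev/null "$f" ;;
        esac
    done
}
```

Called as grep_text_files "expression" dir, this should give the same results as the pipeline above for ordinary filenames. It runs "file" and "grep" once per file, so it will be noticeably slower on large trees.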