Linux – How to use bzdiff with the diff -I option to find differences between two bzipped files

I'm trying to diff MySQL dumps (created with mysqldump and piped to bzip2) to see whether anything changed between consecutive dumps. The following are the tails of two dumps:

tmp1:

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2011-03-11  1:06:50

tmp2:

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2011-03-11  0:40:11

When I run bzdiff on their bzipped versions:

$ bzdiff tmp?.bz2 
10c10
< -- Dump completed on 2011-03-11  1:06:50
---
> -- Dump completed on 2011-03-11  0:40:11

According to the bzdiff manual, any option given to bzdiff is passed on to diff. I therefore looked at diff's -I option, which takes a regexp: changes whose lines all match it are ignored. When I then try:

$ bzdiff -I'Dump' tmp1.bz2 tmp2.bz2

I get an empty diff. I would rather match as much of the "Dump completed" line as possible, but when I try:

$ bzdiff -I'Dump completed' tmp1.bz2 tmp2.bz2
diff: extra operand `/tmp/bzdiff.miCJEvX9E8'
diff: Try `diff --help' for more information.

The same thing happens with these variations:

$ bzdiff '-IDump completed' tmp1.bz2 tmp2.bz2
$ bzdiff '-I"Dump completed"' tmp1.bz2 tmp2.bz2
$ bzdiff -'"IDump completed"' tmp1.bz2 tmp2.bz2

If I diff the uncompressed files directly, there is no problem:

$ diff -I'^[-][-] Dump completed on' tmp1 tmp2

also gives an empty diff.
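For reference, here is a small sketch of how the -I option behaves when diff is called directly, as in the command above. It assumes GNU diff, whose manual describes -I as ignoring changes whose lines all match the regexp; the file contents are made-up stand-ins for the dump tails.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU diff): -I RE ignores a hunk only when every changed
# line in it matches RE. File contents are made up for illustration.
dir=$(mktemp -d)
printf '%s\n' 'SET SQL_NOTES=1;' '-- Dump completed on 2011-03-11  1:06:50' > "$dir/a"
printf '%s\n' 'SET SQL_NOTES=1;' '-- Dump completed on 2011-03-11  0:40:11' > "$dir/b"
printf '%s\n' 'SET SQL_NOTES=2;' '-- Dump completed on 2011-03-11  0:40:11' > "$dir/c"

# Only the timestamp line differs: reported without -I (exit status 1).
plain=0
diff "$dir/a" "$dir/b" > /dev/null || plain=$?

# With -I the only hunk consists of matching lines, so it is ignored (exit status 0).
ignored=0
diff -I 'Dump completed' "$dir/a" "$dir/b" > /dev/null || ignored=$?

# Both lines differ: the hunk also holds a non-matching line, so it is still reported.
mixed=0
diff -I 'Dump completed' "$dir/a" "$dir/c" > /dev/null || mixed=$?

echo "plain=$plain ignored=$ignored mixed=$mixed"
rm -r "$dir"
```

Because diff is invoked directly here, the regexp containing a space reaches it as a single argument.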

bzdiff is a shell script, usually located at /bin/bzdiff. Essentially, it collects the command-line options and passes them on to diff as follows:

OPTIONS=
FILES=
for ARG
do
    case "$ARG" in
    -*) OPTIONS="$OPTIONS $ARG";;
     *) if test -f "$ARG"; then
            FILES="$FILES $ARG"
        else
            echo "${prog}: $ARG not found or not a regular file"
            exit 1
        fi ;;
    esac
done
[...]
bzip2 -cdfq "$1" | $comp $OPTIONS - "$tmp"
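To see what this loop does to an option containing a space, the sketch below copies only its option-collecting branch into a hypothetical collect_options function (the file branch is left out) and prints the words produced by the unquoted $OPTIONS expansion. /tmp/bzdiff.XXXX is a placeholder for the real temporary file name.

```shell
#!/usr/bin/env bash
# Sketch: mimic only the option-collecting part of bzdiff's loop.
# collect_options is a hypothetical helper; /tmp/bzdiff.XXXX is a placeholder.
collect_options() {
    OPTIONS=
    for ARG
    do
        case "$ARG" in
        -*) OPTIONS="$OPTIONS $ARG";;
        esac
    done
}

collect_options '-IDump completed' tmp1.bz2 tmp2.bz2

# Same unquoted expansion as in: $comp $OPTIONS - "$tmp"
args=( $OPTIONS - /tmp/bzdiff.XXXX )
echo "diff would receive ${#args[@]} arguments:"
printf '  [%s]\n' "${args[@]}"
# Prints:
# diff would receive 4 arguments:
#   [-IDump]
#   [completed]
#   [-]
#   [/tmp/bzdiff.XXXX]
```

diff therefore sees three operands (completed, - and the temporary file) instead of two, which matches the "extra operand" error shown above.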

I think the problem lies in how the spaces are handled when $OPTIONS is passed on to diff, but I couldn't figure out how to quote the option so that it is interpreted correctly.

Any ideas?

EDIT
@DerfK: Good point about the .; I had forgotten about that. I tried your suggestion with multiple levels of quotes, but the option is still not recognized:

$ bzdiff "-I'\"Dump.completed.on\"'" tmp1.bz2 tmp2.bz2
diff: extra operand `/tmp/bzdiff.Di7RtihGGL'

Best Answer

At this point I would have given up on spaces and used Dump.completed so that the . matches the space (since this is a regex).
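A sketch of this workaround, replaying bzdiff's final step ($comp $OPTIONS - "$tmp") on plain files so that bzip2 is not needed. run_like_bzdiff is a hypothetical helper and the file contents are made up; it assumes GNU diff, which exits with status 2 on usage errors such as an extra operand.

```shell
#!/usr/bin/env bash
# Sketch: "old" stands in for the decompressed first dump on stdin,
# "new" for bzdiff's temporary file. Contents are made up.
dir=$(mktemp -d)
printf '%s\n' 'SET SQL_NOTES=1;' '-- Dump completed on 2011-03-11  1:06:50' > "$dir/old"
printf '%s\n' 'SET SQL_NOTES=1;' '-- Dump completed on 2011-03-11  0:40:11' > "$dir/new"

# Hypothetical helper: collect options as bzdiff does, then expand them unquoted.
run_like_bzdiff() {
    local OPTIONS= ARG
    for ARG
    do
        OPTIONS="$OPTIONS $ARG"
    done
    diff $OPTIONS - "$dir/new" < "$dir/old"
}

# Spaces split the regexp into several words: diff reports an extra operand (exit 2).
with_space=0
run_like_bzdiff '-I^-- Dump completed on' 2> /dev/null || with_space=$?

# Dots keep the option in one word, and the timestamp line is ignored (exit 0).
with_dots=0
run_like_bzdiff '-I^--.Dump.completed.on' || with_dots=$?

echo "with spaces: $with_space, with dots: $with_dots"
rm -r "$dir"
```

With the real files, the corresponding call should be bzdiff -I'^--.Dump.completed.on' tmp1.bz2 tmp2.bz2, which still anchors most of the "Dump completed on" line.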

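Adding layers of quotes cannot help here, though. Bash removes one layer of quotes from what you type when you hit return, so typing "-'\"IDump completed\"'" runs bzdiff with the single argument -'"IDump completed"', which the case statement accepts because it starts with -. Inside bzdiff, however, $OPTIONS is expanded unquoted on the diff command line, and Bash does not remove quotes that are part of a variable's value: the expanded value is only split at whitespace (and subjected to filename globbing). diff therefore receives two words, -'"IDump and completed"', with the quote characters still in them, instead of a single -IDump completed argument. The same splitting breaks -I'Dump completed': diff gets -IDump and completed as separate words, sees three operands (completed, - and the temporary file), and complains that the temporary file is an extra operand. An option without whitespace, such as -IDump.completed, passes through intact.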
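A quick way to check what actually reaches diff is to replay bzdiff's accumulation of $OPTIONS and its unquoted expansion for the nested-quote argument and for the dotted regexp. This sketch assumes the bzdiff lines quoted in the question; the output shows the quote characters surviving as literal text inside the words.

```shell
#!/usr/bin/env bash
# Sketch: replay how bzdiff accumulates options and then expands them unquoted.
OPTIONS=
for ARG in "-'\"IDump completed\"'" '-IDump.completed'
do
    OPTIONS="$OPTIONS $ARG"    # same accumulation as in bzdiff
done

# Unquoted expansion: split at whitespace; quotes inside the value stay literal.
words=( $OPTIONS )
printf '[%s]\n' "${words[@]}"
# Prints:
# [-'"IDump]
# [completed"']
# [-IDump.completed]
```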