Bash: avoid interpreting special characters

bash

Hello everyone. I have a problem with a script I wrote in bash. The script recursively searches [PATH] for the patterns listed in [INPUT FILE]. If a pattern is not found, it is written to [OPTIONAL OUTPUT FILE]. If [OPTIONAL OUTPUT FILE] is not given, the default [OUTPUT FILE] name is: out.
The problem is with the special character '.' (dot).
Here is the code of the script:

#!/bin/bash

#This script is responsible for simply searching recursively patterns given in input file in path where we have to search. If pattern is not found then is written to output file;
#@version 1.0

function help()
{
    echo -e "This script is responsible for simply recursively searching patterns\ngiven in [INPUT FILE] in [PATH]. If pattern is not found then it is\nwritten to [OPTIONAL OUTPUT FILE]. If [OPTIONAL OUTPUT FILE] is not\ngiven the default [OUTPUT FILE] name is: out"
    echo 'Usage: ./search.sh [INPUT FILE] [PATH TO DIRECTORY] [OPTIONAL OUTPUT FILE]'
    echo 'e.g. : ./search.sh input_file /var/www/html/ output_file'
    echo 'or   : ./search.sh help -> this help'
}

in=$1
path=$2
out=${3:-out}

if [ $# -lt 2 ]; then help; exit; fi

if [ ! -e $in ]; then echo "Input file: $1 does not exist"; exit; fi

if [ ! -d $path ]; then  echo "Path: $path does not exist"; exit; fi

#Delete lines that are either blank or only contain spaces
sed -i '/^ *$/d' $in

tmp='tmpFile'
cat $in | sed -e 's,\\,\\\\,g' | sed -e 's,\",\\\",g' |  sed -e 's,-,\\-,g' | sed -e 's/\./\\./g' > $tmp
counter=0
#Write each line from input file and save it to array
while read line
do
    linesTable[$counter]=$line
    let counter++
done < $tmp

#Clear file
echo -n '' > $tmp

for line in "${linesTable[@]}"
do
    #Find recursively pattern line in path and save result to array
    echo "$line"
    table=($(grep -r -- "$line" $path))
#   echo $(grep -r -- "$line" $path)
    #If array is empty write string to tmp file
    if [ 0 -eq ${#table[@]} ]; then echo "$line" | tee -a $tmp; fi
done

#Free memory taken by arrays
unset table[@]
unset linesTable[@]
#Sort and remove repeated strings. Result save to output file
sort $tmp | uniq > $out
#Remove tmp file
rm -f $tmp

I can't stop the '.' from being interpreted as a special character.
Here is the content of the input file, called in:

asdf
1234
ALA MA
gtrrr
@
% asdf
~i
?
+
{
|
`
(
)
.
*
-
'
"
""
--
,
;
:
~
\\
\
~~~
printg("asdf\d%d\\\", &g);

The path to search is, e.g., /home/user/test/
In this path I have 3 files, e.g. a, b and c:
a)

dddno
asdf

asdfasd

asdf
asd

b)

s;dfhiasdf
asdf
asd
fas--
--
0

asdf-
-

c)

d
dafdf
dd
re v
1234
v
c
v

I run the script like this: ./search.sh in /home/user/test/ out
The output file out should contain . (dot), but it does not. Could somebody help me with this? I am stuck at this point.
Thank you in advance.


Hello Dennis. Thank you for your suggestions. They really helped me, but I have a few more questions.
The purpose of this script is to find the literal strings listed in input_file in the given path.
So I guess I have to use the grep -F option and remove part of the sed expression.

sed -i '/^ *$/d' "$in"

but I don't know how to remove the blank and space-only lines the way you did. I tried this, but it doesn't work:

<"$in" sed -e '/^ *$/d'

so I went back to my own solution.
The second problem is that your code for appending to an array doesn't work for me:

patterns+=("$line")

I got this error:

./search.sh: line 45: syntax error near unexpected token `"$line"'
./search.sh: line 45: `         patterns+=("$line")'

I have tried to use let but it doesn't work either.

The script now looks like this:
#!/bin/bash

in="$1"
path="$2"
out=${3:-out}

function help()
{
    cat << EOF
This script is responsible for simply recursively searching patterns given in [INPUT FILE] in [PATH]. If pattern is not found then it is written to [OPTIONAL OUTPUT FILE]. If [OPTIONAL OUTPUT FILE] is not given the default [OUTPUT FILE] name is: out
Usage: $0 [INPUT FILE] [PATH TO DIRECTORY] [OPTIONAL OUTPUT FILE]
e.g. : $0 input_file /var/www/html/ output_file
or   : $0 help -> this help
EOF
}

#Delete lines that are either blank or only contain spaces
function extract_patterns()
{
    sed -i '/^ *$/d' "$in"
}

function report_missing_patterns()
{
    local pattern

    for pattern in "$@"; do
        grep -q -r -F -- "$pattern" "$path"
        #if [ 0 -ne $? ]; then printf "%s\n" "$pattern"; fi
        if [ 0 -ne $? ]; then echo "$pattern"; fi
    done
}

function process_patterns()
{
    local patterns line counter=0
    patterns=()

    while read -r line; do
        patterns[$counter]="$line"
        let counter++
    done < "$in"

    #report_missing_patterns "${patterns[@]}" | sort -u > "$out"
    report_missing_patterns "${patterns[@]}" | sort -u | tee "$out"
}

if [ $# -lt 2 ]; then help; exit 1; fi

if [ ! -e "$in" ]; then echo "Input file: $in does not exist"; exit 2; fi

if [ ! -d "$path" ]; then  echo "Path: $path does not exist"; exit 3; fi

extract_patterns | process_patterns

I have commented out the line #report_missing_patterns "${patterns[@]}" | sort -u > "$out"

because I want to display the results on screen as well as write them to output_file.

Best Answer

I don't understand what particular problem you are stuck on. Your description is really unclear. So I'll give some general advice on simplifying the script; if it's not enough to solve your problem, try to come up with a clearer explanation.

I'm pretty sure this script is much more complicated than it needs to be. Spending a few minutes browsing the documentation of a command to see if one of its options could help you can save hours of debugging. Spending a few minutes thinking about the general structure of the script can save hours of debugging.


Here are several ways in which you could have made your script simpler.

  • All variable substitutions should be inside double quotes, i.e., always write "$foo" and not just $foo. You've done it sometimes, but not systematically. Always use double quotes unless you know why you do not want them in a particular case.

  • Here's a simpler way of writing your help function; it's called a “here document”.

    function help()
    {
        cat <<EOF
    This script is responsible for simply recursively searching patterns
    given in [INPUT FILE] in [PATH]. If pattern is not found then it is
    written to [OPTIONAL OUTPUT FILE]. If [OPTIONAL OUTPUT FILE] is not
    given the default [OUTPUT FILE] name is: out
    Usage: $0 [INPUT FILE] [PATH TO DIRECTORY] [OPTIONAL OUTPUT FILE]
    e.g. : $0 input_file /var/www/html/ output_file
    or   : $0 help -> this help
    EOF
    }
    
  • Give your script a non-zero exit code to indicate failure:

    if [ $# -lt 2 ]; then help; exit 2; fi
    if [ ! -e "$in" ]; then echo "Input file: $1 does not exist"; exit 2; fi
    if [ ! -d "$path" ]; then  echo "Path: $path does not exist"; exit 2; fi
    
  • Modifying the input file in is surprising, and you can combine the whitespace-only line removal with the multiple sed expressions that add a backslash before some characters.

    <"$in" sed -e '/^ *$/d' -e 's,[-\\".],\\&,g' > "$tmp"
    

    However the quoting you perform here is strange. Why are you quoting - and ", which are not special to grep, but not * and [ which are special? What is the syntax of the patterns supposed to be?

    If you meant the patterns to be literal strings to look for, all this work is unnecessary (except for removing whitespace-only lines): call grep -F.

  • In the part that reads the lines from $tmp, you don't need the counter variable, you can just append to the array. You also need to pass the -r argument to the read built-in, so that it doesn't strip some backslashes.

    while read -r line; do
        linesTable+=("$line")
    done <"$tmp"
    
  • In the loop over the patterns, you store the output of grep in a variable, but all you're doing is testing if grep found a match. It would be a lot easier (and faster) to use the return code of grep for that. (I've also removed what is presumably debugging output from the loop; you don't need tee to append to a file, just use the redirection operator >>.)

    for line in "${linesTable[@]}"; do
        grep -q -r -- "$line" "$path"
        if [ $? -ne 0 ]; then echo "$line" >>"$tmp"; fi
    done
    
  • You don't need to free memory at the end of the script. If this were really a function inside a bigger script, you should instead declare the variables with the local builtin.
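
The points above about read -r and grep -F can be checked in isolation. The following is only a small sketch with made-up file contents and variable names (plain, raw, regex_result and fixed_result are not part of the original script). It shows that read without -r drops the backslash from an escaped dot, and that an unescaped dot used as a regular expression matches any non-empty line, so the pattern is never reported as missing.

```shell
#!/bin/bash
# Sketch: how an escaped dot gets lost, and why a bare dot is always "found".
# All names and file contents here are made up for illustration.

# 1. Without -r, read treats the backslash as an escape and removes it.
read plain <<< '\.'
read -r raw <<< '\.'
printf 'read:    %s\n' "$plain"   # prints "read:    ."
printf 'read -r: %s\n' "$raw"     # prints "read -r: \."

# 2. Build a throwaway directory whose files contain no literal dot.
dir=$(mktemp -d)
printf '%s\n' 'asdf' '1234' > "$dir/a"

# As a regular expression, "." matches any single character.
if grep -q -r -- '.' "$dir"; then regex_result=found; else regex_result=missing; fi

# With -F the pattern is a fixed string, so only a literal dot matches.
if grep -q -r -F -- '.' "$dir"; then fixed_result=found; else fixed_result=missing; fi

printf 'regex: %s\n' "$regex_result"   # prints "regex: found"
printf 'fixed: %s\n' "$fixed_result"   # prints "fixed: missing"

rm -rf "$dir"
```

If the patterns really are literal strings, this suggests that -F alone is enough and the backslash-adding sed expressions can go.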


Here's a restructured version of the part of your script after the command line parsing. I've incorporated the local changes outlined above and used functions to make the structure clearer. Note that the clearer structure means I don't need to use a temporary file. I don't know if the resulting script does what you want, since you don't explain precisely what you want.

function extract_patterns () {
    <"$in" sed -e '/^ *$/d' -e 's,[-\\".],\\&,g'
}

function report_missing_patterns () {
  local pattern
  for pattern in "$@"; do
    grep -q -r -- "$pattern" "$path"
    if [ $? -ne 0 ]; then printf "%s\n" "$pattern"; fi
  done
}

process_patterns () {
  local patterns line
  patterns=()
  while read -r line; do
      patterns+=("$line")
  done
  report_missing_patterns "${patterns[@]}" | sort -u >"$out"
}

extract_patterns | process_patterns
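
If the patterns are meant to be literal strings, as the follow-up in the question suggests, the restructured script can drop the escaping entirely. What follows is a hedged sketch rather than a tested replacement: it assumes every non-blank input line is a fixed string, it uses tee so the result is both shown and saved, and it appends to the array by index, which should also work on a bash that rejects patterns+=(...) (the += operator is, as far as I know, only available from bash 3.1 on). The demo data at the bottom is invented for illustration.

```shell
#!/bin/bash
# Sketch only: literal-string variant of the restructured script.
# Assumption: each non-blank input line is a fixed string, not a regex.

extract_patterns () {
    # Print the input without blank or space-only lines; the input file
    # itself is left untouched (no sed -i).
    <"$in" sed -e '/^ *$/d'
}

report_missing_patterns () {
    local pattern
    for pattern in "$@"; do
        # -F makes grep treat ".", "*", "\" and similar characters literally.
        if ! grep -q -r -F -- "$pattern" "$path"; then
            printf '%s\n' "$pattern"
        fi
    done
}

process_patterns () {
    local patterns line
    patterns=()
    while IFS= read -r line; do
        # Index-based append; an alternative to patterns+=("$line").
        patterns[${#patterns[@]}]=$line
    done
    # tee shows the result on screen and also writes it to the output file.
    report_missing_patterns "${patterns[@]}" | sort -u | tee "$out"
}

# Demo with throwaway data (paths and contents are made up).
path=$(mktemp -d)
in=$(mktemp)
out=$(mktemp)
printf '%s\n' 'asdf' 'fas--' > "$path/a"
printf '%s\n' 'asdf' '' '.' '--' '*' > "$in"

extract_patterns | process_patterns
result=$(cat "$out")
rm -rf "$path" "$in" "$out"
```

In this demo only . and * should be reported as missing, since asdf and -- both occur in the searched file. In the real script, in, path and out would come from the command-line arguments as before.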