Bash – Using wget and Awk to count similar expressions

awk bash unix wget

I am trying to create a script that uses wget to download a data set and then awk to sort through the file and report the most common filter used, which is in column $14. So far I have the wget command working, as seen below:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv 

But would I then pipe that to an awk script, or should I try to do it all in one script? Also, I know how you would check for common words; it would be something like

$14=="charcoal" {++charcoal} 

but I am not sure how to implement this in an awk script. Any advice or help would be greatly appreciated.

Thanks, kevin

Best Answer

This prints the type of filter that occurs most.

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | awk -F, '
    NR > 1 {                # skip the CSV header row
        filters[$14]++      # tally each filter type in column 14
    }
    END {
        for (filter in filters) {
            if (filters[filter] > max) {
                max = filters[filter]
                type = filter
            }
        }
        print type
    }'

You can easily print each type with its count instead, if you prefer. AWK can do the sorting itself if needed, or you can pipe the output to the external sort utility.
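For example, here is a sketch of that counts-per-type variant. It uses a small hypothetical sample file in place of the download so you can try it offline; in practice you would pipe straight from the same wget command, and the column names here are made up (only the position of column 14 matters):

```shell
# Hypothetical stand-in for the downloaded CSV; column 14 ("Filter")
# holds the filter type, as in the real data set.
cat > sample.csv <<'EOF'
id,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,Filter
1,x,x,x,x,x,x,x,x,x,x,x,x,charcoal
2,x,x,x,x,x,x,x,x,x,x,x,x,glass fiber
3,x,x,x,x,x,x,x,x,x,x,x,x,charcoal
EOF

# Same counting logic as above, but printing every filter type with
# its count, most frequent first; NR > 1 skips the header row.
awk -F, 'NR > 1 { counts[$14]++ }
    END { for (f in counts) print counts[f], f }' sample.csv |
sort -rn
```

This prints `2 charcoal` followed by `1 glass fiber`. The external `sort -rn` orders the lines numerically in reverse, so the most common type comes first without needing the max-tracking loop.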