Bash – Using wget and Awk to count similar expressions

awk bash unix wget

I am trying to create a script that uses wget to download a data set and then awk to sort through the file and report the most common filter used, which is in column $14. So far I have the wget command working, as seen below:

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv 

But would I then pipe that to an awk script, or should I try to do it all in one script? Also, I know how you would check for common words; it would be something like

$14=="charcoal" {++charcoal} 

but I am not sure how to implement this in an awk script. Any advice or help would be greatly appreciated.

Thanks, kevin

Best Answer

This prints the type of filter that occurs most.

wget -O- http://energy.gov/sites/prod/files/FieldSampleAirResults_0.csv | awk -F, '
    NR > 1 {                # skip the CSV header row
        filters[$14]++      # tally each filter type in column 14
    }
    END {
        for (filter in filters) {
            if (filters[filter] > max) {
                max = filters[filter]
                type = filter
            }
        }
        print type
    }'

You can easily print each type with its count instead, if you prefer. AWK can do the sorting itself if needed, or you can pipe the output to the external sort utility.
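For example, here is a sketch of that counts-per-type variant. It uses a small hypothetical sample file in place of the download so you can try it offline; in practice you would pipe straight from the same wget command, and the column names here are made up (only the position of column 14 matters):

```shell
# Hypothetical stand-in for the downloaded CSV; column 14 ("Filter")
# holds the filter type, as in the real data set.
cat > sample.csv <<'EOF'
id,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,Filter
1,x,x,x,x,x,x,x,x,x,x,x,x,charcoal
2,x,x,x,x,x,x,x,x,x,x,x,x,glass fiber
3,x,x,x,x,x,x,x,x,x,x,x,x,charcoal
EOF

# Same counting logic as above, but printing every filter type with
# its count, most frequent first; NR > 1 skips the header row.
awk -F, 'NR > 1 { counts[$14]++ }
    END { for (f in counts) print counts[f], f }' sample.csv |
sort -rn
```

This prints `2 charcoal` followed by `1 glass fiber`. The external `sort -rn` orders the lines numerically in reverse, so the most common type comes first without needing the max-tracking loop.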