Linux – Convert a raw email message to plain text in linux

email, linux, maildir, unix-shell

I'm in the cur folder of a maildir store.
I want to cat a message, pipe it to a command, and have the body of the message printed out. Simple.

Example:
If it's a MIME message and there is a plaintext version, show me the plaintext.
If it's an HTML message with no plaintext, render the HTML and give me some semblance of the message text.
If it's just an image, display nothing, or maybe an [image] placeholder.

Why do I want this?
I'm trying to train SpamAssassin, and I want to print key headers and an excerpt from the email body so I can quickly skim through all the messages and decide which ones are legitimate (ham) and which are spam.
I am already extracting a list of messages from the maildir that match a given X-Spam score and displaying the headers I want. I just need to append the body of the message, but I've hit a roadblock.

Some other questions here suggested using mutt. I installed it and looked at it, but from what I could see I'd have to point it at the specific maildir, which would complicate the process. Ideally I'd like something that just "interprets" an email message from a file and displays it.

Your help is appreciated.
Thank You

Best Answer

I've managed to come up with the following script, but it's still a bit lacking. I was still refining it when I noticed Andrew suggested munpack from the mpack package.

I found the tool reformime to extract the text/plain portion of the MIME message. I was using GNU recode too, but found that it was stripping stuff that wasn't quoted-printable (QP), so I elected to use sed, probably quite inefficiently, to remove the QP encoding and substitute common characters that were QP-escaped.
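Substituting individual QP escapes with sed only covers the characters that are listed explicitly. As a rough alternative, here is a minimal pure-bash sketch of a general decoder. The function name decode_qp is hypothetical, and the sketch assumes well-formed QP input in which every "=" is either followed by two hex digits or ends the line as a soft line break:

```shell
#!/bin/bash
# decode_qp (hypothetical helper): decode quoted-printable text from stdin.
# Assumes well-formed input; a stray "=" not followed by two hex digits
# is not handled specially.
decode_qp() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%$'\r'}            # drop a trailing CR from CRLF input
        line=${line//\\/\\\\}         # protect literal backslashes from printf %b
        if [[ $line == *= ]]; then
            # Soft line break: remove the trailing "=" and join with the next line
            line=${line%=}
            printf '%b' "${line//=/\\x}"
        else
            # Turn each =XX into \xXX and let printf %b emit the byte
            printf '%b\n' "${line//=/\\x}"
        fi
    done
}

# Example: two physical lines joined by a soft line break
printf 'Hello=2C=20world=3A ok=\nay\n' | decode_qp
```

This could stand in for the character-by-character sed substitutions when applied to a body part that is known to be QP-encoded. It does not convert character sets, so non-ASCII bytes come out in whatever charset the message declared.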

Here's the script I came up with. I can now go into a maildir folder, run the script, and get a summary of messages. Supplying an argument matches specific scores using a regexp.

#!/bin/bash

DEFSC="3[0-9]"
SPAMSCORE=${1-$DEFSC}

echo "Scanning for messages with a Spam Score filter of ${SPAMSCORE}"

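# Get a list of messages with the desired spam score.
# -l prints just the names of matching files, which also works when the
# folder contains a single message (plain grep omits the filename then).
grep -l "^X-Spam-Score: ${SPAMSCORE}\$" * > ~/tmpspam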

while IFS= read -r MSG; do
    # Extract Message ID for easy reading
    MSGID=$(echo "${MSG}" | grep -oe '^[0-9]*')
    echo "================= ${MSGID} ================="
    # Find the headers that we are looking for, and strip the encoded-word
    # wrapper plus a few common QP escapes from them
    grep -e "^X-Spam-Status" -e "^Subject:" -e "^From:" "${MSG}" | sed -r 's/=\?[^?]*\?[^?]*\?([^?]*)\?=/\1/g;s/=20/ /g;s/=2C/,/g;s/=3A/:/g'
    # Use reformime to find which MIME section is text/plain
    MIMESEC=$(~/reformime -i < "${MSG}" | grep -B 1 '^content-type: text/plain' | head -n 1 | grep -oe "[0-9.]*$")
    # Display the first 10 non-empty lines of that MIME section, with URLs masked
    echo '- - - - - - - - - - - - - - - - - - - - - - '
    if [ -n "${MIMESEC}" ]; then
        ~/reformime -e -s "${MIMESEC}" < "${MSG}" | awk '/./{a=a+1;if(a<=10){print $0;}}' | sed -r 's/https?:\/\/[A-Za-z0-9.?%+_@&;=\/-]*/<<url>>/g'
    else
        # No text/plain part was found, so there is nothing to extract
        echo '[no text/plain part]'
    fi
    echo '============================================'
done < ~/tmpspam

# Delete Temp File
rm -f ~/tmpspam

As an example:

skim_msgs.sh '4[0-9]'

OUTPUT: (finds one message)

Scanning for messages with a Spam Score filter of 4[0-9]
================= 1518851309 ================= 
From: John Doe <jdoe@gmail.com> 
Subject: Watch "If Cops Talked Like Pilots" on YouTube 
X-Spam-Status: No, score=4.1
- - - - - - - - - - - - - - - - - - - - - - 
<<url>>
============================================
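The script only looks for a text/plain part, so HTML-only or image-only messages yield no excerpt. The fallback order described in the question (plain text first, then HTML, otherwise a placeholder) might be sketched as follows. The function name pick_section is hypothetical, and the sketch assumes the same `reformime -i` listing layout that the script's grep relies on: a "section: N" line followed by a "content-type: TYPE" line.

```shell
#!/bin/bash
# pick_section (hypothetical helper): read a `reformime -i` style listing on
# stdin and print "plain N", "html N", or "none", preferring text/plain.
# Assumes each "section:" line precedes that section's "content-type:" line.
pick_section() {
    awk '
        $1 == "section:"      { sec = $2 }
        $1 == "content-type:" {
            # Remember only the first section of each type
            if ($2 == "text/plain" && plain == "") plain = sec
            if ($2 == "text/html"  && html  == "") html  = sec
        }
        END {
            if (plain != "")     print "plain " plain
            else if (html != "") print "html " html
            else                 print "none"
        }
    '
}

# Intended use inside the loop (not run here, since it needs reformime):
#   read -r KIND MIMESEC < <(~/reformime -i < "${MSG}" | pick_section)
```

With KIND set to "html", the extracted section could be piped through a text-mode renderer such as `lynx -dump -stdin` or `w3m -dump -T text/html` before taking the excerpt, and with "none" the script could print an [image] placeholder instead.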