Php – parsing raw email in php

emailPHP

I'm looking for good/working/simple to use PHP code for parsing raw email into parts.

I've written a couple of brute force solutions, but every time, one small change/header/space/something comes along and my whole parser fails and the project falls apart.

And before I get pointed at PEAR/PECL, I need actual code. My host has some screwy config or something, I can never seem to get the .so's to build right. If I do get the .so made, some difference in path/environment/php.ini doesn't always make it available (apache vs cron vs CLI).

Oh, and one last thing, I'm parsing the raw email text, NOT POP3, and NOT IMAP. It's being piped into the PHP script via a .qmail email redirect.

I'm not expecting SOF to write it for me, I'm looking for some tips/starting points on doing it "right". This is one of those "wheel" problems that I know has already been solved.

Best Answer

What are you hoping to end up with at the end? The body, the subject, the sender, an attachment? You should spend some time with RFC2822 to understand the format of the mail, but here's the simplest rules for well formed email:

HEADERS\n
\n
BODY

That is, the first blank line (double newline) is the separator between the HEADERS and the BODY. A HEADER looks like this:

HSTRING:HTEXT

HSTRING always starts at the beginning of a line and doesn't contain any white space or colons. HTEXT can contain a wide variety of text, including newlines as long as the newline char is followed by whitespace.

The "BODY" is really just any data that follows the first double newline. (There are different rules if you are transmitting mail via SMTP, but processing it over a pipe you don't have to worry about that).

So, in really simple, circa-1982 RFC822 terms, an email looks like this:

HEADER: HEADER TEXT
HEADER: MORE HEADER TEXT
  INCLUDING A LINE CONTINUATION
HEADER: LAST HEADER

THIS IS ANY
ARBITRARY DATA
(FOR THE MOST PART)

Most modern email is more complex than that though. Headers can be encoded for charsets or RFC2047 mime words, or a ton of other stuff I'm not thinking of right now. The bodies are really hard to roll your own code for these days to if you want them to be meaningful. Almost all email that's generated by an MUA will be MIME encoded. That might be uuencoded text, it might be html, it might be a uuencoded excel spreadsheet.

I hope this helps provide a framework for understanding some of the very elemental buckets of email. If you provide more background on what you are trying to do with the data I (or someone else) might be able to provide better direction.

Related Solutions

Javascript – How to validate an email address in JavaScript

Using regular expressions is probably the best way. You can see a bunch of tests here (taken from chromium)

function validateEmail(email) {
    const re = /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
    return re.test(String(email).toLowerCase());
}

Here's the example of regular expresion that accepts unicode:

const re = /^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i;

But keep in mind that one should not rely only upon JavaScript validation. JavaScript can easily be disabled. This should be validated on the server side as well.

Here's an example of the above in action:

function validateEmail(email) {
  const re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(email);
}

function validate() {
  const $result = $("#result");
  const email = $("#email").val();
  $result.text("");

  if (validateEmail(email)) {
    $result.text(email + " is valid :)");
    $result.css("color", "green");
  } else {
    $result.text(email + " is not valid :(");
    $result.css("color", "red");
  }
  return false;
}

$("#email").on("input", validate);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<label for=email>Enter an email address:</label>
<input id="email">
<h2 id="result"></h2>

Php – How to prevent SQL injection in PHP

The correct way to avoid SQL injection attacks, no matter which database you use, is to separate the data from SQL, so that data stays data and will never be interpreted as commands by the SQL parser. It is possible to create SQL statement with correctly formatted data parts, but if you don't fully understand the details, you should always use prepared statements and parameterized queries. These are SQL statements that are sent to and parsed by the database server separately from any parameters. This way it is impossible for an attacker to inject malicious SQL.

You basically have two options to achieve this:

Using PDO (for any supported database driver):

 $stmt = $pdo->prepare('SELECT * FROM employees WHERE name = :name');

 $stmt->execute([ 'name' => $name ]);

 foreach ($stmt as $row) {
     // Do something with $row
 }

Using MySQLi (for MySQL):

 $stmt = $dbConnection->prepare('SELECT * FROM employees WHERE name = ?');
 $stmt->bind_param('s', $name); // 's' specifies the variable type => 'string'

 $stmt->execute();

 $result = $stmt->get_result();
 while ($row = $result->fetch_assoc()) {
     // Do something with $row
 }

If you're connecting to a database other than MySQL, there is a driver-specific second option that you can refer to (for example, pg_prepare() and pg_execute() for PostgreSQL). PDO is the universal option.

Correctly setting up the connection

Note that when using PDO to access a MySQL database real prepared statements are not used by default. To fix this you have to disable the emulation of prepared statements. An example of creating a connection using PDO is:

$dbConnection = new PDO('mysql:dbname=dbtest;host=127.0.0.1;charset=utf8', 'user', 'password');

$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

In the above example the error mode isn't strictly necessary, but it is advised to add it. This way the script will not stop with a Fatal Error when something goes wrong. And it gives the developer the chance to catch any error(s) which are thrown as PDOExceptions.

What is mandatory, however, is the first setAttribute() line, which tells PDO to disable emulated prepared statements and use real prepared statements. This makes sure the statement and the values aren't parsed by PHP before sending it to the MySQL server (giving a possible attacker no chance to inject malicious SQL).

Although you can set the charset in the options of the constructor, it's important to note that 'older' versions of PHP (before 5.3.6) silently ignored the charset parameter in the DSN.

Explanation

The SQL statement you pass to prepare is parsed and compiled by the database server. By specifying parameters (either a ? or a named parameter like :name in the example above) you tell the database engine where you want to filter on. Then when you call execute, the prepared statement is combined with the parameter values you specify.

The important thing here is that the parameter values are combined with the compiled statement, not an SQL string. SQL injection works by tricking the script into including malicious strings when it creates SQL to send to the database. So by sending the actual SQL separately from the parameters, you limit the risk of ending up with something you didn't intend.

Any parameters you send when using a prepared statement will just be treated as strings (although the database engine may do some optimization so parameters may end up as numbers too, of course). In the example above, if the $name variable contains 'Sarah'; DELETE FROM employees the result would simply be a search for the string "'Sarah'; DELETE FROM employees", and you will not end up with an empty table.

Another benefit of using prepared statements is that if you execute the same statement many times in the same session it will only be parsed and compiled once, giving you some speed gains.

Oh, and since you asked about how to do it for an insert, here's an example (using PDO):

$preparedStatement = $db->prepare('INSERT INTO table (column) VALUES (:column)');

$preparedStatement->execute([ 'column' => $unsafeValue ]);

Can prepared statements be used for dynamic queries?

While you can still use prepared statements for the query parameters, the structure of the dynamic query itself cannot be parametrized and certain query features cannot be parametrized.

For these specific scenarios, the best thing to do is use a whitelist filter that restricts the possible values.

// Value whitelist
// $dir can only be 'DESC', otherwise it will be 'ASC'
if (empty($dir) || $dir !== 'DESC') {
   $dir = 'ASC';
}