Multipart/form-data – Why Use It for Mixed Data and File Transfers?

asp.net · c# · file handling · json · webforms

I'm working in C#, doing some communication between two apps I'm writing. I have come to like Web API and JSON. Now I'm at the point of writing a routine to send a record that includes some text data and a file between the two servers.

According to the internet I am supposed to use a multipart/form-data request as shown here:

SO Question "Multipart forms from C# client"

Basically you write a request manually that follows a format like so:

Content-type: multipart/form-data, boundary=AaB03x

--AaB03x
content-disposition: form-data; name="field1"

Joe Blow
--AaB03x
content-disposition: form-data; name="pics"; filename="file1.txt"
Content-Type: text/plain

 ... contents of file1.txt ...
--AaB03x--

Copied from RFC 1867 – Form-based File Upload in HTML
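
For what it's worth, in C# you don't have to assemble that body by hand; HttpClient's MultipartFormDataContent generates the boundary and the per-part headers for you. A rough sketch (the endpoint URL and field names are placeholders, not anything from my actual apps):

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class MultipartSketch
{
    public static async Task UploadAsync(string file_path)
    {
        using (var client = new HttpClient())
        using (var content = new MultipartFormDataContent())   // boundary is generated automatically
        {
            // Text field, equivalent to name="field1" in the raw request above
            content.Add(new StringContent("Joe Blow"), "field1");

            // File part, equivalent to name="pics"; filename="file1.txt"
            content.Add(new ByteArrayContent(File.ReadAllBytes(file_path)), "pics", "file1.txt");

            // "https://example.com/upload" is a placeholder endpoint
            var response = await client.PostAsync("https://example.com/upload", content);
            response.EnsureSuccessStatusCode();
        }
    }
}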

This raw format is quite distressing to someone who is used to nice JSON data. So obviously the solution is to create a JSON request, Base64-encode the file, and end up with a request like this:

{
    "field1":"Joe Blow",
    "fileImage":"JVBERi0xLjUKJe..."
}

And we can make use of JSON serialization and deserialization anywhere we would like. On top of that, the code to send this data is quite simple. You just create your class for JSON serialization and then set the properties. The file string property is set in a few trivial lines:

using (FileStream fs = File.Open(file_path, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    byte[] file_bytes = new byte[fs.Length];
    fs.Read(file_bytes, 0, file_bytes.Length);
    MyJsonObj.fileImage = Convert.ToBase64String(file_bytes);
}
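
For context, here is roughly what the whole thing looks like end to end. The class name, property names, and endpoint are illustrative, and I'm assuming Json.NET (JsonConvert) for the serialization, but any JSON serializer works the same way:

using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class RecordUpload                  // hypothetical DTO for the JSON request
{
    public string field1 { get; set; }
    public string fileImage { get; set; }  // Base64-encoded file contents
}

public static class JsonSendSketch
{
    public static async Task SendRecordAsync(string file_path)
    {
        var MyJsonObj = new RecordUpload
        {
            field1 = "Joe Blow",
            fileImage = Convert.ToBase64String(File.ReadAllBytes(file_path))
        };

        string json = JsonConvert.SerializeObject(MyJsonObj);
        using (var client = new HttpClient())
        {
            // "https://example.com/api/records" is a placeholder endpoint
            await client.PostAsync("https://example.com/api/records",
                new StringContent(json, Encoding.UTF8, "application/json"));
        }
    }
}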

No more silly delimiters and headers for each item. Now the remaining question is performance, so I profiled that. I have a set of 50 sample files that I will need to send across the wire, ranging from about 50KB to 1.5MB. First I wrote some code to simply stream the file into a byte array, to compare against the logic that streams the file in and then converts it to a Base64 string. Below are the two chunks of code I profiled:

Direct Stream to Profile multipart/form-data

var timer = new Stopwatch();
timer.Start();
using (FileStream fs = File.Open(file_path, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    byte[] test_data = new byte[fs.Length];
    fs.Read(test_data, 0, test_data.Length);
}
timer.Stop();
long test = timer.ElapsedMilliseconds;
//Write time elapsed and file size to CSV file

Stream and Encode to profile creating JSON request

var timer = new Stopwatch();
timer.Start();
using (FileStream fs = File.Open(file_path, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    byte[] file_bytes = new byte[fs.Length];
    fs.Read(file_bytes, 0, file_bytes.Length);
    ret_file = Convert.ToBase64String(file_bytes);
}
timer.Stop();
long test = timer.ElapsedMilliseconds;
//Write time elapsed, file size, and length of UTF8 encoded ret_file string to CSV file

The results were that the simple read always took 0ms, while the Base64 encoding took up to 7ms. Below are the longest times:

File Size | Base64 Output Size | Time
1352KB    | 1802KB             | 5ms
1031KB    | 1374KB             | 7ms
463KB     | 617KB              | 1ms

However, in production you would never just blindly write multipart/form-data without first checking your delimiter, right? So I modified the form-data code to check for the delimiter bytes in the file itself, to make sure everything would be parsed OK. I didn't write an optimized scanning algorithm; I just made the delimiter small so it wouldn't waste a lot of time.

var timer = new Stopwatch();
timer.Start();
using (FileStream fs = File.Open(file_path, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    byte[] test_data = new byte[fs.Length];
    fs.Read(test_data, 0, test_data.Length);
    string delim = "--DXX";
    byte[] delim_checker = Encoding.UTF8.GetBytes(delim);

    for (int i = 0; i <= test_data.Length - delim_checker.Length; i++)
    {
        bool match = true;
        for (int j = i; j < i + delim_checker.Length; j++)
        {
            if (test_data[j] != delim_checker[j - i])
            {
                match = false;
                break;
            }
        }
        if (match)
        {
            break;
        }
    }
}
timer.Stop();
long test = timer.ElapsedMilliseconds;

Now the results show that the form-data method is actually significantly slower. Below are the results where either method took more than 0ms:

File Size | FormData Time | JSON/Base64 Time
181KB     | 1ms           | 0ms
1352KB    | 13ms          | 4ms
463KB     | 4ms           | 5ms
133KB     | 1ms           | 0ms
133KB     | 1ms           | 0ms
129KB     | 1ms           | 0ms
284KB     | 2ms           | 1ms
1031KB    | 9ms           | 3ms

It doesn't seem that an optimized algorithm would do much better either, seeing as my delimiter was only 5 characters long. Certainly not 3x better, which is roughly the advantage the Base64 encoding shows over scanning the file bytes for a delimiter.
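
For what it's worth, if I did want an optimized search, newer .NET already ships one: MemoryExtensions.IndexOf does a vectorized subsequence search over spans (available on .NET Core 2.1+ or via the System.Memory package), so the nested loops above collapse to something like this:

using System;
using System.Text;

// ... inside the timing block, after reading the file into test_data ...
byte[] delim_checker = Encoding.UTF8.GetBytes("--DXX");

// Returns the offset of the first occurrence, or -1 if the delimiter never appears in the file
int index = test_data.AsSpan().IndexOf(delim_checker);
bool boundary_collision = index >= 0;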

Obviously the Base64 encoding inflates the size, as the first table shows, but it's really not that bad even when sent as UTF-8, and it would compress well if desired. The real benefit is that my code is nice and clean and easy to understand, and the JSON request payload doesn't hurt my eyeballs to look at that much either.
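
On the "compress well" point, here is a quick sketch of what gzipping the UTF-8 JSON payload could look like with GZipStream; whether the extra CPU is worth it depends on the link, so treat it as an illustration only:

using System.IO;
using System.IO.Compression;
using System.Text;

static class CompressionSketch
{
    // Compress the serialized JSON before sending; the server would need Content-Encoding: gzip handling
    public static byte[] CompressJson(string json)
    {
        byte[] raw = Encoding.UTF8.GetBytes(json);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionLevel.Fastest))
            {
                gzip.Write(raw, 0, raw.Length);
            }   // disposing the GZipStream flushes the compressed data into output
            return output.ToArray();
        }
    }
}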

So why on earth would anyone not simply Base64-encode files into JSON instead of using multipart/form-data? There are the standards, but those change relatively often. Standards are really just suggestions anyway, right?

Best Answer

multipart/form-data is a construct created for HTML forms. As you've discovered, the upside of multipart/form-data is that the transfer size stays close to the size of the object being transferred, whereas a text encoding of the object inflates the size substantially. Keep in mind that internet bandwidth was a more valuable commodity than CPU cycles when the protocol was invented.
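
For scale: Base64 emits four output characters for every three input bytes, so the payload grows by a factor of about 4/3 (roughly 33%) before any headers, which lines up with the 1352KB to 1802KB row in the question's first table.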

According to the internet I am supposed to use a multipart/form-data request

multipart/form-data is the best protocol for browser uploads because it's supported by all browsers. There is no reason to use it for server-to-server communication, which is usually not form-based: the objects being communicated are more complex, with nesting and types, requirements that JSON handles well. Base64 encoding is a simple solution for transferring binary objects in whatever serialization format you choose. Binary protocols like CBOR or BSON are even better, because they serialize to smaller payloads than Base64 and they are close enough to JSON that adopting one should be an easy extension to an existing JSON-based exchange. I'm not sure how their CPU cost compares to Base64, though.
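
To illustrate the CBOR option, a rough sketch using the System.Formats.Cbor package; the map keys mirror the JSON example from the question, and this is just one possible encoding rather than a drop-in for the asker's code:

using System.Formats.Cbor;
using System.IO;

static class CborSketch
{
    public static byte[] EncodeRecord(string field1, string file_path)
    {
        var writer = new CborWriter();
        writer.WriteStartMap(2);                               // two key/value pairs

        writer.WriteTextString("field1");
        writer.WriteTextString(field1);

        writer.WriteTextString("fileImage");
        writer.WriteByteString(File.ReadAllBytes(file_path));  // raw bytes, no Base64 inflation

        writer.WriteEndMap();
        return writer.Encode();                                // compact binary payload
    }
}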