Best Practices for Uploading, Previewing, and Editing Large Files Before Saving to Database

database, memory, performance, upload, web-development

I am allowing users to upload a relatively large .csv file (about 1 MB) – it will contain thousands of records that each represent one of my business objects. I want the users to upload the file and then be shown paged results (after I parse it server-side, add validation messages, etc.). They can review/edit the results page by page until they are satisfied. At that point, they'd click a Save button and all of the data would be saved to my database. Alternatively, they could click a Cancel button – this would release all of the uploaded data and none of it would touch my database.

I would prefer not to use "staging" tables in my database. By that I mean storing the uploaded data in my db when it is uploaded and subsequently deleting it if the user chooses to cancel the whole operation. Also, my situation is such that I would have to delete the data from the "staging" table even if the user chose to save it (I'm working with legacy systems).

My initial reasoning for NOT using staging tables is primarily the following:
This feature would likely be only a small subset of my needs. I could realistically have several extremely similar features on my site that would each require their own "staging" tables, and I would prefer not to have my database littered (for lack of a better word) with such tables.

I would also prefer to avoid saving the file to disk as that type of solution is IMHO not as scalable as storing the file within the database.

Also, it should be noted that all data validation and parsing must (in my circumstances) be done server-side.

My question is this: in order to prevent seemingly excessive/unnecessary "staging" tables in my database, is it a bad practice to store ALL of the uploaded file's data (e.g. in a byte array or base64-encoded string) in a property on my view model? The intent is to keep this large-ish value on my view model while the user pages through and reviews/edits it.

Although I think this question is fairly technology independent, if it helps, I am using ASP.Net MVC/Web API and an MSSQL database.
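
To make the question concrete, here is roughly the shape of view model I have in mind (the class and property names are only illustrative):

```csharp
using System.Collections.Generic;

// Illustrative only: the uploaded file rides along on the view model
// instead of living in a staging table or on disk.
public class CsvRowViewModel
{
    public int RowNumber { get; set; }
    public IList<string> Fields { get; set; } = new List<string>();
    public IList<string> ValidationMessages { get; set; } = new List<string>();
}

public class CsvUploadViewModel
{
    // The whole uploaded file, kept on the view model between requests
    // (the practice I'm asking about); it could equally be a base64 string.
    public byte[] RawFile { get; set; }

    // Only the rows for the page currently being reviewed/edited.
    public IList<CsvRowViewModel> CurrentPage { get; set; } = new List<CsvRowViewModel>();

    public int PageNumber { get; set; }
    public int PageSize { get; set; }
    public int TotalRows { get; set; }
}
```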

Best Answer

Storing files in memory doesn't seem such a bad idea if there are only a few of them. 1 MB of data isn't very much on modern servers, so it all comes down to how many simultaneous users you have and what happens when you 'run out' of memory.

In my experience, I try to keep the memory footprint of each web request small: the server is usually a better judge of how to cache and allocate memory between processes, and it's difficult to judge (or even test) how an application scales as the number of users increases and the memory pool shrinks rapidly as a result. If you go down this route, I'd recommend very careful load testing if memory is even slightly under stress.

Also, if you keep the pre-checked file in memory only, debugging/auditing the upload/edit/save cycle becomes harder: you have no record of what the file looked like before it was reviewed and saved, so if you ever have an issue with the editing code (or a user disputes what it did), you can't trace back and look at the code's effect, even over a period of hours or days. In-memory processing needs more effort spent on building debug/audit tools.

Alternatively, two possible suggestions:

  1. Have you considered running a second database instance with staging/temporary working tables in that database only? This would keep your main database 'clean' and would also keep the (possibly expensive) deletion and post-deletion optimisation work off your main database. You could also store a 'purge' time so that a scheduled job cleans up old staging data whenever a user never gets around to clicking the 'Save' button. (They'd be prompted to re-upload their file if the gap between upload and save exceeded a sensible threshold.) A sketch of such a purge job follows this list.

  2. If you're not expecting too many concurrent users of this functionality, you could also consider using an in-memory cache (such as memcached) to store the file whilst it is being reviewed. A 1 GB memcached instance could hold up to a thousand of your files simultaneously, so unless you were under very heavy load this would save you from having to prompt the user to re-upload because the cache had flushed out their file. Note that this solution doesn't necessarily preclude a good audit trail: if you choose to audit, you can run a simple script that reads memcached entries and writes them to disk for later auditing without affecting live code. A sketch of this caching approach also follows the list.
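
For suggestion 1, the purge could be as simple as a scheduled task running something like the following against the staging database (the table name, column name and connection string are assumptions made for the sake of the sketch):

```csharp
using System;
using System.Data.SqlClient;

// Minimal sketch of the scheduled purge for suggestion 1. The table and
// column names (StagingUploads, PurgeAfterUtc) are assumptions; the table
// lives in the separate staging database, not the main one.
public static class StagingPurgeJob
{
    public static int PurgeExpired(string stagingConnectionString)
    {
        using (var connection = new SqlConnection(stagingConnectionString))
        using (var command = new SqlCommand(
            "DELETE FROM StagingUploads WHERE PurgeAfterUtc < @now", connection))
        {
            command.Parameters.AddWithValue("@now", DateTime.UtcNow);
            connection.Open();

            // Number of abandoned uploads removed; a scheduler (SQL Agent,
            // a Windows scheduled task, etc.) would call this periodically.
            return command.ExecuteNonQuery();
        }
    }
}
```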
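
For suggestion 2, the review workflow against a cache might be sketched like this. MemoryCache is used only to keep the sample self-contained; a memcached client (for example Enyim's) follows the same store-with-expiry / fetch-by-key / remove pattern. All names are illustrative:

```csharp
using System;
using System.Runtime.Caching;

// Minimal sketch of suggestion 2: park the uploaded file in a cache while the
// user reviews it, keyed by a token that is round-tripped with the review pages.
public static class UploadCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static string Store(byte[] csvBytes, TimeSpan reviewWindow)
    {
        var key = Guid.NewGuid().ToString("N");
        Cache.Set(key, csvBytes, DateTimeOffset.UtcNow.Add(reviewWindow));
        return key; // send this token back with the paged results
    }

    public static byte[] Retrieve(string key)
    {
        // Null means the entry expired or was evicted, so prompt for a re-upload.
        return Cache.Get(key) as byte[];
    }

    public static void Discard(string key)
    {
        // Cancel button: release the data without it ever touching the database.
        Cache.Remove(key);
    }
}
```

The token would travel with the view model (or session) between review pages; a null on retrieval corresponds to the "prompt for a re-upload" case described above.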
