I have been tasked with developing a web-based (i.e., runs in the browser) viewer for a proprietary log file.
I have no control over the format of the logs; I just consume them. Each line of the log file contains binary data followed by a text message, so part of each line must be de-serialized on read.
My question is: what is the preferred approach here to quickly read the file and make it available for searching and paged text retrieval?
My approaches:
I have a web service read the file and bulk insert its contents
- Into a SQL Server installation
- Into a serverless NoSQL store like LiteDB
- Into a serverless SQLite instance, with one SQLite file per log file.
The data is wiped after that log file's data has not been viewed for a week.
The problem is these approaches work fine on files that are less than 100 MB but quickly break down with log files that sometimes grow to 2 GB.
Even on a 2 GB file, parsing is relatively fast and seemingly not the bottleneck.
With the serverless options, writing the parsed data is also relatively quick.
Full-text query performance is not great with any of the options, though once an index has had time to build on SQL Server, it's decent.
I am wondering if anyone has advice or experience with this sort of project: quickly de-serialize a large file, bulk insert it into some sort of data store (maybe an in-memory data store would be better), and make it available for interaction via a web service.
CLARIFICATIONS:
@Basile – Sorry, I didn't mean to imply that; I just wanted to make sure I was looking down the best avenue and had the best-suited solution.
@Basile – The log files are dead simple: just a 45-byte binary blob that I read as bytes and copy into a struct (in .NET), then I read the rest of the line as text until a newline character. Repeat until EOF.
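In sketch form, the read loop looks roughly like this (the type names here are illustrative, not my real ones — the actual struct layout is proprietary):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// One parsed line: the fixed 45-byte binary prefix plus the text that follows.
// (Illustrative type; the real struct layout is proprietary.)
public record LogEntry(byte[] Header, string Message);

public static class LogParser
{
    private const int HeaderSize = 45; // fixed-size binary blob at the start of each line

    public static IEnumerable<LogEntry> Parse(Stream stream)
    {
        while (stream.Position < stream.Length)
        {
            // Read the fixed-size binary header.
            byte[] header = new byte[HeaderSize];
            int read = stream.Read(header, 0, HeaderSize);
            if (read < HeaderSize) yield break; // truncated tail

            // Read the rest of the line as text, up to the newline.
            // (Assumes single-byte, ASCII-compatible text for simplicity.)
            var sb = new StringBuilder();
            int b;
            while ((b = stream.ReadByte()) != -1 && b != '\n')
                if (b != '\r') sb.Append((char)b);

            yield return new LogEntry(header, sb.ToString());
        }
    }
}
```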
@DocBrown Sorry, I read this and thought it was appropriate; see here.
@Christophe I don't have 3 DBs, those are just the three approaches I've tested so far.
Best Answer
I don't think it is a great idea to preload 2GB of log data. That would make for an unbearable user experience, and if you run this thing on a production server you will set off a bunch of alarms in the NOC.
I would focus on keeping a small memory footprint and reading as little of the file as possible. There are ways to search the file without actually loading it all. Some common use cases:
User wants to see log data from a certain portion of the file, as indicated by dragging a scrollbar.
User wants to see log data for a certain time period.
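One way to support both use cases without preloading is a single sequential pass that records the byte offset of every Nth line — a sparse index that costs ~8 bytes per indexed line regardless of file size. A rough sketch, reusing the 45-byte-header format from the question (the class name and stride are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SparseLineIndex
{
    private const int HeaderSize = 45; // binary blob preceding the text on each line

    // One sequential pass: remember the byte offset of every `stride`-th line.
    // The file's bytes are read once, but almost nothing is kept in memory.
    public static List<long> Build(Stream stream, int stride = 1000)
    {
        var offsets = new List<long>();
        long lineStart = 0;
        long lineNumber = 0;
        while (lineStart < stream.Length)
        {
            if (lineNumber % stride == 0) offsets.Add(lineStart);

            // Skip the binary blob, then scan to the end of the text portion.
            stream.Position = lineStart + HeaderSize;
            int b;
            while ((b = stream.ReadByte()) != -1 && b != '\n') { }

            lineStart = stream.Position;
            lineNumber++;
        }
        return offsets;
    }
}
```

With this in hand, a scrollbar drag at fraction `f` becomes a seek to `offsets[(int)(f * (offsets.Count - 1))]` followed by parsing at most `stride` lines, and a time-period query can binary-search the offsets by deserializing only the header's timestamp at each probe (assuming timestamps increase monotonically through the file).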