File Handling – Optimize Reading of Multiple Files in C#

algorithms, c#, file handling, .net

I have a unique problem which I'm hoping someone can assist with.

I have one big text file, our Production file. The data in the file is pipe-delimited, in the following format:

Reference|Cost Centre|Analytics Base Value|.... 
UMBY_2288|023437|2883484|... 
NOT_REAL|1343534|283434|...

The file averages about 30 MB, with about 120,000 rows.

I also have about 20 Regional files. They have the same structure as the big file but are smaller, averaging about 50,000 rows each.

I have to loop through each line in the big Prod file. For each Reference code, I have to search every Regional file to see which ones contain that code, and then copy some of the data from the matching lines into a report. There is no way to predetermine which files to look in, and each reference can appear in multiple Regional files.

As you can imagine, looping through every row of every file multiple times is very time-consuming. Due to memory constraints, I can't load the files into memory.

Does anyone have any smart ideas on how I can do this? I don't need code samples, just pointers on ways I could solve this problem.

I'm developing the tool in C#.

Best Answer

The solution is to read each file once, storing the data in memory. Keep an associative array or similar data structure where the key is the reference code. Then, as you process the master file, looking up each reference should take just microseconds.
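
A minimal sketch of that idea, under some assumptions that are not in the question: the reference is always the first pipe-delimited field, only one column per regional row is needed for the report, and a header row (if the files have one) can be treated like any other row. The names `RegionalIndex`, `Build`, `Match` and the `valueColumn` parameter are made up for illustration. Each regional file is streamed once into a `Dictionary` keyed by reference, and only the needed column is kept rather than the whole line, to limit memory use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RegionalIndex
{
    // Streams every regional file once and maps each reference code to the
    // (file name, value) pairs found for it. valueColumn is a hypothetical
    // zero-based index of the one column the report needs.
    public static Dictionary<string, List<(string File, string Value)>> Build(
        IEnumerable<string> regionalPaths, int valueColumn)
    {
        var index = new Dictionary<string, List<(string File, string Value)>>(StringComparer.Ordinal);
        foreach (var path in regionalPaths)
        {
            // File.ReadLines is lazy, so only one line is held at a time.
            foreach (var line in File.ReadLines(path))
            {
                var fields = line.Split('|');
                if (fields.Length <= valueColumn || fields[0].Length == 0)
                    continue; // skip blank or short rows

                if (!index.TryGetValue(fields[0], out var hits))
                {
                    hits = new List<(string File, string Value)>();
                    index[fields[0]] = hits;
                }
                hits.Add((Path.GetFileName(path), fields[valueColumn]));
            }
        }
        return index;
    }

    // Streams the production file once and yields one result per regional
    // row that shares the production row's reference code.
    public static IEnumerable<(string Reference, string File, string Value)> Match(
        string prodPath, Dictionary<string, List<(string File, string Value)>> index)
    {
        foreach (var line in File.ReadLines(prodPath))
        {
            int bar = line.IndexOf('|');
            if (bar <= 0)
                continue; // no reference field on this line

            string reference = line.Substring(0, bar);
            if (index.TryGetValue(reference, out var hits))
            {
                foreach (var hit in hits)
                    yield return (reference, hit.File, hit.Value);
            }
        }
    }
}
```

With the sizes given in the question (about 20 files of about 50,000 rows), the index would hold roughly 1,000,000 small entries. Whether that fits depends on the actual memory limit and on how many columns the report needs, so it is worth measuring before ruling the in-memory approach out.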

If the data is too big to fit in memory, you can create a temporary SQLite database.
