I don't have all the answers. Hopefully I can shed some light on it.
To simplify my previous statements about .NET's threading models, just know that the Task Parallel Library uses Tasks, and the default TaskScheduler for Tasks uses the ThreadPool. The higher you go in the hierarchy (the ThreadPool is at the bottom), the more overhead there is when creating the items. That extra overhead certainly doesn't mean it's slower, but it's good to know that it's there. Ultimately, the performance of your algorithm in a multi-threaded environment comes down to its design. What performs well sequentially may not perform as well in parallel. There are too many factors involved to give you hard and fast rules; they change depending on what you're trying to do. Since you're dealing with network requests, I'll try to give a small example.
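As a quick illustration of that hierarchy, here is a small sketch (class and variable names are my own) showing that work queued either way ends up on ThreadPool threads; a Task just adds scheduling bookkeeping on top:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SchedulerDemo
{
    static void Main()
    {
        // Queue work directly on the ThreadPool (lowest overhead).
        ThreadPool.QueueUserWorkItem(_ =>
            Console.WriteLine($"ThreadPool item on pool thread: {Thread.CurrentThread.IsThreadPoolThread}"));

        // A Task adds a small amount of bookkeeping, but the default
        // TaskScheduler still runs it on a ThreadPool thread.
        Task t = Task.Run(() =>
            Console.WriteLine($"Task on pool thread: {Thread.CurrentThread.IsThreadPoolThread}"));

        t.Wait();
        Thread.Sleep(100); // give the queued work item time to run before exiting
    }
}
```

Both lines report running on a pool thread; the difference is in the bookkeeping around them, not where they run.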
Let me state that I am no expert with sockets, and I know next to nothing about Zeroc-Ice. I do know a bit about asynchronous operations, though, and this is where it will really help you. If you send a synchronous request via a socket, then when you call Socket.Receive(), your thread will block until a response is received. This isn't good: your thread can't make any more requests since it's blocked. Using the asynchronous methods such as Socket.BeginSend() and Socket.BeginReceive(), the I/O request is made and put in the IRP queue for the socket, and your thread keeps going. This means your thread could actually make thousands of requests in a loop without any blocking at all!
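To make that concrete, here is a self-contained sketch (plain .NET sockets, not Ice; the loopback echo server and all names are my own, just so the sample runs on its own) showing how BeginReceive() returns immediately and delivers the response in a callback:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class AsyncSocketSketch
{
    static void Main()
    {
        // Tiny loopback echo server so the sample is self-contained.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        ThreadPool.QueueUserWorkItem(_ =>
        {
            using var server = listener.AcceptTcpClient();
            var stream = server.GetStream();
            var buf = new byte[64];
            int n = stream.Read(buf, 0, buf.Length);
            stream.Write(buf, 0, n); // echo the bytes back
        });

        var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(IPAddress.Loopback, port);
        client.Send(Encoding.ASCII.GetBytes("ping"));

        var done = new ManualResetEvent(false);
        var recvBuf = new byte[64];
        // BeginReceive queues the I/O request and returns immediately;
        // this thread is free to issue more sends instead of blocking.
        client.BeginReceive(recvBuf, 0, recvBuf.Length, SocketFlags.None, ar =>
        {
            int n = client.EndReceive(ar);
            Console.WriteLine(Encoding.ASCII.GetString(recvBuf, 0, n)); // prints "ping"
            done.Set();
        }, null);

        // ...the main thread could keep sending more requests here...
        done.WaitOne();
        listener.Stop();
    }
}
```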
If I'm understanding you correctly, you are using calls via Zeroc-Ice in your testing code, not actually trying to reach an HTTP endpoint. If that's the case, I'll admit that I don't know how Zeroc-Ice works. I would, however, suggest following the advice listed here, particularly the part: Consider Asynchronous Method Invocation (AMI). The page shows this:
By using AMI, the client regains the thread of control as soon as the invocation has been sent (or, if it cannot be sent immediately, has been queued), allowing the client to use that thread to perform other useful work in the mean time.
Which seems to be the equivalent of what I described above using .NET sockets. There may be other ways to improve performance when trying to do a lot of sends, but I would start here or with any other suggestion listed on that page. You've been very vague about the design of your application, so I can't be more specific than I have been above. Just remember: do not use more threads than absolutely necessary to get done what you need, otherwise you'll likely find your application running far slower than you want.
Some examples in pseudocode (I tried to make it as close to Ice as possible without actually having to learn it):
var iterations = 100000;
for (int i = 0; i < iterations; i++)
{
MyObjectPrx obj = iceComm.stringToProxy("whateverissupposedtogohere");
// The thread blocks here waiting for the response.
// That slows down your loop and you're just wasting
// CPU cycles that could instead be sending/receiving more objects
obj.DoStuff();
}
A better way:
public interface MyObjectPrx : Ice.ObjectPrx
{
Ice.AsyncResult GetObject(int obj, Ice.AsyncCallback cb, object cookie);
// other functions
}
public static void Finished(Ice.AsyncResult result)
{
MyObjectPrx obj = (MyObjectPrx)result.GetProxy();
obj.DoStuff();
}
static void Main(string[] args)
{
// threaded code...
var iterations = 100000;
for (int i = 0; i < iterations; i++)
{
int num = //whatever
MyObjectPrx prx = //whatever
Ice.AsyncCallback cb = new Ice.AsyncCallback(Finished);
// This call returns immediately and the loop continues;
// it doesn't wait for a response, it just continually sends out socket
// requests as fast as your CPU can handle them. The response from the
// server will be handled in the callback function when the request
// completes. Hopefully you can see how this is much faster when
// sending requests. If your server does not use an async model
// like this, however, it's quite possible that your server won't
// be able to handle the load.
prx.GetObject(num, cb, null);
}
}
Keep in mind that more threads != better performance when trying to send over sockets (or really when doing anything). Threads are not magic; they will not automatically solve whatever problem you're working on. Ideally, you want one thread per core, unless a thread is spending much of its time waiting; then you can justify having more. Running each request in its own thread is a bad idea, since context switches will occur and resources will be wasted. (If you want to see everything I wrote about that, click edit and look at the past revisions of this post. I removed it since it only seemed to cloud the main issue at hand.)
You can definitely make these requests in threads if you want to make a large number of requests per second. However, don't go overboard with thread creation. Find a balance and stick with it. You'll get better performance with an asynchronous model than with a synchronous one.
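One way to find that balance is to cap the number of in-flight requests rather than capping threads. Here is a sketch using SemaphoreSlim and the newer async/await style (rather than the Begin/End pattern above); SendRequestAsync is a hypothetical stand-in for whatever call you're making (an AMI invocation, for instance), and the cap of twice the core count is just a starting point to tune:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottledRequests
{
    // Hypothetical stand-in for one asynchronous request.
    static Task SendRequestAsync(int id) => Task.Delay(10);

    static async Task Main()
    {
        // Cap in-flight requests instead of spawning a thread per request.
        int maxInFlight = Environment.ProcessorCount * 2;
        var gate = new SemaphoreSlim(maxInFlight);

        var tasks = Enumerable.Range(0, 1000).Select(async i =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try { await SendRequestAsync(i); }
            finally { gate.Release(); }      // free the slot for the next request
        });

        await Task.WhenAll(tasks);
        Console.WriteLine("all requests completed");
    }
}
```

No thread is ever parked waiting on the network here; the semaphore simply limits how many requests are outstanding at once.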
I hope that helps.
First, I agree with the sentiment of some of the commenters above that what you need may simply be sensible data structures. Each data structure can fulfill a simple contract: read itself in from a byte stream and write itself out to a byte stream. For 50 MB files, that may be all you need. Take that into account with the rest of the answer.
However, I feel that you may be trying to pry at some deeper concepts here.
The first that comes to mind is efficiency with buffers. I believe a common trick here is to have preallocated buffer "parts" of a known size and to work with lists of those parts. In C#, IList<byte[]> comes to mind as an efficient wrapper around presumably preallocated arrays. See here. Note that these buffer sizes often have an affinity to disk sector size and memory page size as well. Efficient structure definition up front can allow for interesting optimizations later. For example, the TAR archive format uses a 512-byte header record for this sort of reason. If you're copying a file out of a TAR, your sector boundaries don't get messed up, which can be very nice.
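A minimal sketch of the buffer-parts idea, assuming a part size of 4096 bytes (a common page-size multiple); the class and method names are mine:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class BufferParts
{
    // Fixed part size; 4096 matches a common memory page / sector multiple.
    const int PartSize = 4096;

    // Read a stream into a list of preallocated, fixed-size parts.
    public static List<byte[]> ReadParts(Stream input)
    {
        var parts = new List<byte[]>();
        while (true)
        {
            var part = new byte[PartSize];
            int filled = 0;
            while (filled < PartSize)
            {
                int n = input.Read(part, filled, PartSize - filled);
                if (n == 0) break; // end of stream
                filled += n;
            }
            if (filled == 0) break;
            // Inserting or removing whole parts in the list is cheap:
            // no bytes in the other parts ever need to be shifted.
            parts.Add(part);
            if (filled < PartSize) break; // last, partially filled part
        }
        return parts;
    }

    static void Main()
    {
        var data = new byte[10_000];
        var parts = ReadParts(new MemoryStream(data));
        Console.WriteLine(parts.Count); // 10000 bytes / 4096 -> 3 parts
    }
}
```

A real implementation would also track how much of the final part is in use; this sketch only shows the allocation pattern.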
Second, I have to wonder whether a study of the design behind the rope data structure for string handling might yield some insight. It follows a similar line of thought. How useful it is will depend on your editing strategy.
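For a flavor of why a rope helps, here is a minimal sketch (my own naming, and far from a production rope; no balancing or splitting): concatenation just allocates a small node instead of copying every character.

```csharp
using System;

// Minimal rope sketch: a binary tree whose leaves hold string pieces.
// Concatenation is O(1) (allocate a node) instead of O(n) (copy both strings).
abstract class Rope
{
    public abstract int Length { get; }
    public abstract char CharAt(int index);

    public static Rope From(string s) => new Leaf(s);
    public static Rope operator +(Rope a, Rope b) => new Concat(a, b);

    sealed class Leaf : Rope
    {
        readonly string text;
        public Leaf(string s) { text = s; }
        public override int Length => text.Length;
        public override char CharAt(int i) => text[i];
    }

    sealed class Concat : Rope
    {
        readonly Rope left, right;
        public Concat(Rope l, Rope r) { left = l; right = r; }
        public override int Length => left.Length + right.Length;
        // Indexing walks the tree: go left if the index falls in the
        // left piece, otherwise go right with the index shifted down.
        public override char CharAt(int i) =>
            i < left.Length ? left.CharAt(i) : right.CharAt(i - left.Length);
    }
}

class RopeDemo
{
    static void Main()
    {
        Rope r = Rope.From("hello, ") + Rope.From("world");
        Console.WriteLine(r.Length);    // prints 12
        Console.WriteLine(r.CharAt(7)); // prints w
    }
}
```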
Best Answer
I do not think you will gain much, if anything, in performance by using memory-mapped files instead of performing normal text-file processing. From the moment that you change the length of a single line even by just one byte, the remainder of the file will need to be read, shifted by one byte, and written back to disk. From the point of view of I/O, this is equivalent to normal text-file processing: read a line, modify it, write it, repeat. And the headache of having to do all the text processing yourself is probably not worth the hassle.
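For comparison, the normal text-file processing baseline is short enough to sketch; the helper name and temp-file convention here are my own:

```csharp
using System;
using System.IO;

class LineRewrite
{
    // The straightforward approach: read each line, modify it, write it
    // to a new file, then swap the files. Changing a line's length this
    // way costs the same I/O as shifting the rest of a memory-mapped
    // file would, without any of the manual byte juggling.
    public static void RewriteLines(string path, Func<string, string> transform)
    {
        string tmp = path + ".tmp";
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tmp))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                writer.WriteLine(transform(line));
        }
        File.Delete(path);
        File.Move(tmp, path);
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "alpha", "beta" });
        RewriteLines(path, l => l.ToUpperInvariant());
        Console.WriteLine(string.Join(",", File.ReadAllLines(path))); // prints ALPHA,BETA
        File.Delete(path);
    }
}
```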
Have you established an acceptable performance metric for your system?
Have you tried the normal text file processing approach and found it to exceed that metric before starting to look for a more efficient solution?