How to process a file line by line in multiple threads, with one thread per line, in C# .NET?

I've created a C# WPF project that has to process a CSV file. The number of records is not limited to a few hundred or a few thousand; it may run to millions. I need to read each record, process it (which generally takes 5 to 10 seconds), and then update the record with a new value.

The operation consists of a network call to a server through a web service. That server then calls another server to connect to an authority server, and the authority server's response comes back along the same path. The authority server is slow because it has a very large database of about one billion records, so the encrypt, decrypt and authenticate operation takes about 5-10 seconds to complete.

I cannot perform the operation in one thread, because processing the whole file could take months, so I want to create hundreds of threads to process the data. The approach I'm considering is to create one thread that starts up to 100 worker threads and monitors them for free slots. When a worker returns data after processing, the result is written to the file and a new thread is created for the next line.

This approach seems too complex. Should I implement it, and if so how? Or how else should I solve the problem?

Jon Skeet

There are two options that can help you here:

Parallel LINQ is the simpler option, but provides a lot less customization. It would look something like:

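// Requires: using System.IO; using System.Linq;
var results = File.ReadLines("input.csv")        // reads lines lazily
                  .AsParallel()
                  .AsOrdered()                    // keep output in input order
                  .WithDegreeOfParallelism(100)   // at most 100 concurrent workers
                  .Select(ProcessLine);

// Enumerating the results here is what actually runs the query.
File.WriteAllLines("output.csv", results);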

(You need to implement the ProcessLine method, of course.)
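As a rough idea of the shape ProcessLine might take, here is a minimal sketch. The CallAuthority helper, the simple comma-separated layout, and the choice to overwrite the last field with the new value are all assumptions for illustration; none of them come from the question.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the blocking web-service call
// (5-10 seconds in the question). Replace with the real call.
string CallAuthority(string key)
{
    Thread.Sleep(10); // simulated latency
    return key.ToUpperInvariant();
}

// Takes one CSV line and returns the updated line.
// Assumes plain comma-separated fields with no quoted commas.
string ProcessLine(string line)
{
    string[] fields = line.Split(',');
    fields[fields.Length - 1] = CallAuthority(fields[0]);
    return string.Join(",", fields);
}

Console.WriteLine(ProcessLine("abc,old")); // prints "abc,ABC"
```

Because PLINQ calls this method from many threads at once, it should not touch shared mutable state without synchronization.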

That will give you a lot of parallelism, but probably via lots of threads which are blocked most of the time. A more sophisticated solution would use asynchronous IO, so that you would need hardly any actual threads.
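One way the asynchronous alternative could look is sketched below. It caps the number of in-flight calls with a SemaphoreSlim and awaits them all with Task.WhenAll. The names ProcessLineAsync, ProcessAllAsync and maxInFlight are hypothetical, and Task.Delay merely stands in for an awaited web-service call, so treat this as a sketch under those assumptions rather than a finished implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical async version of the per-line work. Replace the delay
// with an awaited web-service call.
async Task<string> ProcessLineAsync(string line)
{
    await Task.Delay(10); // no thread is blocked while waiting
    return line + ",processed";
}

// Runs at most maxInFlight calls at once; results come back in input order.
async Task<string[]> ProcessAllAsync(IEnumerable<string> lines, int maxInFlight)
{
    using var throttle = new SemaphoreSlim(maxInFlight);
    var pending = new List<Task<string>>();

    async Task<string> RunOne(string line)
    {
        try { return await ProcessLineAsync(line); }
        finally { throttle.Release(); } // free the slot even on failure
    }

    foreach (string line in lines)
    {
        await throttle.WaitAsync(); // wait for a free slot before starting more
        pending.Add(RunOne(line));
    }
    return await Task.WhenAll(pending);
}

string[] demo = await ProcessAllAsync(new[] { "a", "b", "c" }, 2);
Console.WriteLine(string.Join(" | ", demo));
// prints "a,processed | b,processed | c,processed"
```

For the real file, File.ReadLines("input.csv") could be passed as the lines argument and the returned array written with File.WriteAllLines. This sketch keeps every result in memory, so with millions of records you would probably want to write results out in batches instead.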

One thing to be aware of: if you're making web requests over the network, you may need to configure the maximum number of requests you can make in parallel to the host. See ServicePointManager.DefaultConnectionLimit and the <connectionManagement> settings element.
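Setting the limit in code might look like the sketch below. The value 100 is only chosen to match the degree of parallelism used earlier, and the setting affects HttpWebRequest-style clients on .NET Framework, where the default for client applications is 2 connections per host. It needs to be set before the first request to the host is made.

```csharp
using System;
using System.Net;

// On newer .NET versions this API is flagged obsolete; the pragma only
// quiets that warning so the sketch compiles cleanly.
#pragma warning disable SYSLIB0014

// Raise the per-host connection cap. 100 is an illustrative value.
ServicePointManager.DefaultConnectionLimit = 100;

Console.WriteLine(ServicePointManager.DefaultConnectionLimit); // prints 100
```

If the calls go through HttpClient on modern .NET instead, the comparable knob is SocketsHttpHandler.MaxConnectionsPerServer.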

See more on this question at Stack Overflow