Improve LINQ query performance

Assume there is a class:

public class StopTime
{
    public TimeSpan? ArrivalTime { get; set; }
    public TimeSpan? DepartureTime { get; set; }
    public string StopID { get; set; }
    public int StopSequence { get; set; }
    public string TripID { get; set; }
}

I have to read data from a CSV file and map it to this class. The CSV file can contain many records; in my case, about 500000.

After parsing the CSV file and mapping the data to a list of StopTime in different functions, I want to filter the StopTimes by TripID. In my scenario there are about 8000 distinct TripIDs in the list of StopTime.

I have tried to create a Dictionary of a list using this code:

var TripIdStops = new Dictionary<string, List<StopTime>>();

foreach (var tripId in ListOfTripId)
{
    TripIdStops.Add(tripId, StopTimes.Where(x => x.TripID == tripId).ToList());
}

To build the dictionary, this loop has to filter all of StopTimes once per trip ID: 500000 records scanned for each of 8000 TripIDs, which is roughly 4 billion comparisons.

Unsurprisingly, this is very time-consuming. Is there a way to improve the performance?

Jon Skeet

It sounds like you want a lookup:

var stopTimesByTripId = StopTimes.ToLookup(st => st.TripID);

Or to narrow it down by ListOfTripId first:

var tripIdSet = new HashSet<string>(ListOfTripId);
var stopTimesByTripId = StopTimes.Where(st => tripIdSet.Contains(st.TripID))
                                 .ToLookup(st => st.TripID);

In both cases you'll only need to iterate through StopTimes once.
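
As a rough illustration of how the resulting lookup might be used, here is a minimal, self-contained sketch. The sample data, the `LookupDemo` class, and the `GroupByTrip` helper are hypothetical names made up for this example, and the `StopTime` class is trimmed to the properties the example needs. It also shows one way to get a `Dictionary<string, List<StopTime>>` if the rest of the code really needs that type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copy of the StopTime class from the question.
public class StopTime
{
    public string StopID { get; set; }
    public int StopSequence { get; set; }
    public string TripID { get; set; }
}

public static class LookupDemo
{
    // Hypothetical helper: groups all stop times by trip ID in a single pass.
    public static ILookup<string, StopTime> GroupByTrip(IEnumerable<StopTime> stopTimes)
    {
        return stopTimes.ToLookup(st => st.TripID);
    }

    public static void Main()
    {
        // Made-up sample data standing in for the parsed CSV records.
        var stopTimes = new List<StopTime>
        {
            new StopTime { TripID = "T1", StopID = "A", StopSequence = 1 },
            new StopTime { TripID = "T2", StopID = "B", StopSequence = 1 },
            new StopTime { TripID = "T1", StopID = "C", StopSequence = 2 },
        };

        var byTrip = GroupByTrip(stopTimes);

        // Indexing by key yields every stop time for that trip.
        Console.WriteLine(byTrip["T1"].Count());

        // Unlike a Dictionary, an unknown key yields an empty sequence
        // instead of throwing KeyNotFoundException.
        Console.WriteLine(byTrip["missing"].Any());

        // If a Dictionary<string, List<StopTime>> is required elsewhere,
        // the lookup can be converted without rescanning the source list.
        Dictionary<string, List<StopTime>> asDictionary =
            byTrip.ToDictionary(g => g.Key, g => g.ToList());
        Console.WriteLine(asDictionary["T2"].Count);
    }
}
```

Whether to keep the lookup or convert it to a dictionary depends on the calling code; the lookup is read-only, which is often enough when the data is only filtered after loading.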


See more on this question at Stack Overflow