Summing duplicate values while reading in data

I am reading 5000 rows of data from a stream, top to bottom, in the form shown below, and storing them in a new CSV file.

ProductCode | Name | Type     | Price
ABC         | Shoe | Trainers | 3.99
ABC         | Shoe | Trainers | 4.99
ABC         | Shoe | Trainers | 5.99
ABC         | Shoe | Heels    | 3.99
ABC         | Shoe | Heels    | 4.99
ABC         | Shoe | Heels    | 5.99
...

Instead of duplicate entries, I want the CSV to contain a single row for each product and type, with the Price summed:

ProductCode | Name | Type     | Price
ABC         | Shoe | Trainers | 14.97
ABC         | Shoe | Heels    | 14.97

I store each row as a Product:

public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public string Price { get; set; }
}

After reading the data from the stream I end up with an IEnumerable<Product>.

My code is then:

string fileName = Path.Combine(directory, string.Format("{0}.csv", name));
var results = Parse(stream).ToList(); // Parse returns IEnumerable<Product>
if (results.Any())
{
    using (var streamWriter = File.CreateText(fileName))
    {
        // writes the header line out
        streamWriter.WriteLine("{0},{1}", header, name);

        results.ForEach(p => { streamWriter.WriteLine(_parser.ConvertToOutputFormat(p)); });
        streamWriter.Flush();
        streamWriter.Close();
    }

    Optional<string> newFileName = Optional.Of(SharpZipWrapper.ZipFile(fileName, RepositoryDirectory));
    // cleanup
    File.Delete(fileName);
    return newFileName;
}

I don't want to go through the 5000 rows a second time to remove the duplicates; I would rather check whether an entry already exists before I add it to the CSV file.

What is the most efficient way to do this?

Jon Skeet

That sounds like you just need an appropriate LINQ transformation:

results = results
    .GroupBy(p => p.ProductCode)
    .Select(g => new Product {
        ProductCode = g.Key,
        Name = g.First().Name,
        Type = g.First().Type,
        Price = g.Sum(p => p.Price)
    })
    .ToList();

Or if your ProductCode isn't a unique ID for some odd reason:

results = results
    .GroupBy(p => new { p.ProductCode, p.Name, p.Type })
    .Select(g => new Product {
        ProductCode = g.Key.ProductCode,
        Name = g.Key.Name,
        Type = g.Key.Type,
        Price = g.Sum(p => p.Price)
    })
    .ToList();

This is assuming you've already changed your Product type to have a decimal type for the Price property, however. Prices aren't text, so shouldn't be stored as strings.
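As a minimal sketch of that change, assuming the price text in the stream uses a dot as the decimal separator (as in the sample rows), the type and the grouping could look roughly like this. The names `ProductAggregator`, `ParsePrice` and `SumPrices` are hypothetical and only there for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public decimal Price { get; set; }   // decimal instead of string
}

// Hypothetical helper class, not part of the question's code.
public static class ProductAggregator
{
    // Converts price text such as "3.99" to a decimal.
    // Assumption: the input always uses '.' as the decimal separator.
    public static decimal ParsePrice(string text)
    {
        return decimal.Parse(text.Trim(), NumberStyles.Number, CultureInfo.InvariantCulture);
    }

    // Collapses rows sharing ProductCode, Name and Type into one row
    // whose Price is the sum of the group's prices.
    public static List<Product> SumPrices(IEnumerable<Product> products)
    {
        return products
            // Anonymous types compare by value, so they work as composite keys.
            .GroupBy(p => new { p.ProductCode, p.Name, p.Type })
            .Select(g => new Product
            {
                ProductCode = g.Key.ProductCode,
                Name = g.Key.Name,
                Type = g.Key.Type,
                Price = g.Sum(p => p.Price)
            })
            .ToList();
    }
}
```

With something like this in place, the question's code would call `ProductAggregator.SumPrices(Parse(stream))` instead of `Parse(stream).ToList()`, provided `Parse` fills `Price` via `ParsePrice`. `GroupBy` reads the input once and yields groups in the order their keys first appear, so no separate duplicate-removal pass over the rows is needed.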

See more on this question at Stack Overflow