Check whether a collection of objects, each containing another collection of objects, contains all values of a List<string> via LINQ

I have a collection of objects where each object contains another collection of objects. I need the fastest way to find the objects whose inner collection contains all values of a List<string>.

Here is an example:

class Video {        
  List<Tag> Tags; 
}

class Tag{
  public Tag (string name){
    Name = name;
  }

  string Name;
}

List<string> selectedTags = new List<string>();
selectedTags.Add("Foo");
selectedTags.Add("Moo");
selectedTags.Add("Boo");

List<Video> videos = new List<Video>();

// Case A
Video videoA = new Video();
videoA.Tags = new List<Tag>();
videoA.Tags.Add(new Tag("Foo"));
videoA.Tags.Add(new Tag("Moo"));
videos.Add(videoA);  

videoA should not be selected by LINQ because it doesn't contain all tags.

// Case B
Video videoB = new Video();
videoB.Tags = new List<Tag>();
videoB.Tags.Add(new Tag("Foo"));
videoB.Tags.Add(new Tag("Moo"));
videoB.Tags.Add(new Tag("Boo"));
videos.Add(videoB);  

videoB should be selected by LINQ because it contains all tags.

I tried this with foreach loops, but it's too slow so I'm looking for a LINQ solution.

foreach (Video video in videos) {
  if (video.Tags.Count() > 0) {
    bool containAllTags = true;
    foreach (string tagToFind in selectedTags) {
      bool tagFound = false;
      foreach (Tag tagItem in video.Tags) {
        if (tagToFind == tagItem.Name)
          tagFound = true;
      }
      if (!tagFound)
        containAllTags = false;
    }
    if (containAllTags)
      result.Add(video);
  }
}

The resulting LINQ should look like this:

IEnumerable<Video> result = from vid in videos
                            where vid.Tags.( ..I dont know.. )
                            select vid;

I tried several approaches with .Any, .All, etc., but I can't find the solution, and I can't use .Intersect because one is a List of strings and the other is a List of objects. Note that in the production version, Video and Tag elements have many more properties.

Jon Skeet

With your current code, you logically want:

IEnumerable<Video> result = from vid in videos
                            where selectedTags.All(tag =>
                                     vid.Tags.Any(t => t.Name == tag))
                            select vid;

Or equivalently:

var result = videos.Where(vid => selectedTags.All(tag => 
                                      vid.Tags.Any(t => t.Name == tag)));

This is assuming you've made Tag.Name and Video.Tags public, of course - ideally as properties rather than as fields.

Note how we're calling All against selectedTags, as (assuming I've read your requirements correctly) it's important that all the selected tags are present in the video - it's not important that all the video's tags are selected.
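To make that concrete, here is a minimal runnable sketch using the sample data from the question. It assumes Tag.Name and Video.Tags have been made public properties; the Title property and the TagFilterDemo and WithAllTags names are hypothetical, added only so the output can be labelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Tag
{
    public Tag(string name) { Name = name; }
    public string Name { get; }          // assumed public, as noted above
}

public class Video
{
    public string Title { get; set; }    // hypothetical, only used to label the output
    public List<Tag> Tags { get; } = new List<Tag>();
}

public static class TagFilterDemo
{
    // Keeps the videos whose tags include every selected tag name.
    public static List<Video> WithAllTags(
        IEnumerable<Video> videos,
        IReadOnlyCollection<string> selectedTags)
    {
        return videos
            .Where(vid => selectedTags.All(tag => vid.Tags.Any(t => t.Name == tag)))
            .ToList();
    }

    public static void Main()
    {
        var selectedTags = new List<string> { "Foo", "Moo", "Boo" };

        // Case A: missing "Boo"
        var videoA = new Video { Title = "A" };
        videoA.Tags.Add(new Tag("Foo"));
        videoA.Tags.Add(new Tag("Moo"));

        // Case B: has all three selected tags
        var videoB = new Video { Title = "B" };
        videoB.Tags.Add(new Tag("Foo"));
        videoB.Tags.Add(new Tag("Moo"));
        videoB.Tags.Add(new Tag("Boo"));

        foreach (var video in WithAllTags(new[] { videoA, videoB }, selectedTags))
            Console.WriteLine(video.Title);   // prints B
    }
}
```

One difference from the foreach version in the question: if selectedTags is empty, All returns true for every video, including videos with no tags at all, whereas the original loop skips videos whose tag list is empty.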

Now that could be relatively slow if you have a lot of tags to check and a lot of tags per video, as every selected tag triggers a linear scan of the video's tag list.

However, knowing how to optimize it will really depend on some other choices:

  • If the order of the tags isn't important, could you change Video.Tags to be a set instead of a list?
  • Are you always looking through the same set of videos, so you could perform some pre-processing?
  • Is the total number of tags available large? What about the number of tags per video? What about the number of selected tags?
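As a rough sketch of where the first two questions could lead: if the tag names of each video can be pre-processed into a HashSet<string>, the check becomes a superset test. Everything here is an assumption for illustration, in particular the tagsByVideo dictionary keyed by a video title and the TagSetDemo and WithAllTags names. Whether it actually helps depends on the answers to the questions above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagSetDemo
{
    // Returns the keys of the videos whose tag set contains every selected tag.
    // tagsByVideo is a hypothetical pre-computed index: video title -> tag names.
    public static List<string> WithAllTags(
        IDictionary<string, HashSet<string>> tagsByVideo,
        IEnumerable<string> selectedTags)
    {
        // Materialize once so the selected tags are not re-enumerated lazily per video.
        var selected = selectedTags.ToList();

        return tagsByVideo
            .Where(entry => entry.Value.IsSupersetOf(selected))
            .Select(entry => entry.Key)
            .ToList();
    }

    public static void Main()
    {
        var tagsByVideo = new Dictionary<string, HashSet<string>>
        {
            ["A"] = new HashSet<string> { "Foo", "Moo" },
            ["B"] = new HashSet<string> { "Foo", "Moo", "Boo" },
        };

        var matches = WithAllTags(tagsByVideo, new[] { "Foo", "Moo", "Boo" });
        Console.WriteLine(string.Join(",", matches));   // prints B
    }
}
```

The trade-off is that the sets have to be built up front and kept in sync with the videos, which only pays off if the same videos are queried repeatedly.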

Alternatively, you can project each video's tags to their names and check whether any of the selected tags are missing from that sequence:

var result = videos.Where(vid => !selectedTags.Except(vid.Tags.Select(t => t.Name))
                                              .Any());
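To see what Except contributes on its own, here is a small sketch working purely on tag names. The ExceptDemo and MissingTags names are hypothetical; the method returns the selected tags that the video lacks, so an empty result means the video qualifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Returns the selected tag names that are absent from the video's tag names.
    public static List<string> MissingTags(
        IEnumerable<string> selectedTags,
        IEnumerable<string> videoTagNames)
    {
        return selectedTags.Except(videoTagNames).ToList();
    }

    public static void Main()
    {
        var selected = new[] { "Foo", "Moo", "Boo" };

        // Case A: "Boo" is missing, so the video is rejected.
        Console.WriteLine(string.Join(",", MissingTags(selected, new[] { "Foo", "Moo" })));   // prints Boo

        // Case B: nothing is missing, so the video is selected.
        Console.WriteLine(MissingTags(selected, new[] { "Foo", "Moo", "Boo" }).Count);        // prints 0
    }
}
```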


See more on this question at Stack Overflow