My name
is
Jon Skeet

Unit test Distinct with keyselector and comparer

How would I go about unit testing this piece of LINQ code?

public static IEnumerable<T> Distinct<T, TKey>(this IEnumerable<T> items, Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer)
{
    return items.Distinct(new KeyEqualityComparer<T, TKey>(keySelector, comparer.Equals));
}

I do not know how to use the comparer at the end. What is the correct usage of this?

Edit: Sorry about being unclear;

What I would like to know:

What this code does.
How this code is used.
How I can unittest this piece of code.

KeyEqualityComparer:

class KeyEqualityComparer<T, TResult>: IEqualityComparer<T>
{
    private readonly Func<T, TResult> _KeySelector;
    private readonly Func<TResult, TResult, bool> _Predicate;

    public KeyEqualityComparer(Func<T, TResult> keySelector, Func<TResult, TResult, bool> predicate)
    {
        if (keySelector == null)
            throw new ArgumentNullException("keySelector");
        _KeySelector = keySelector;
        _Predicate = predicate ?? System.Collections.Generic.EqualityComparer<TResult>.Default.Equals;
    }

    public bool Equals(T x, T y)
    {
        return _Predicate(_KeySelector(x), _KeySelector(y));
    }

    public int GetHashCode(T obj)
    {
        // Always return the same value to force the call to IEqualityComparer<T>.Equals
        return 0;
    }
}

This code is trying to do something similar to MoreLINQ's DistinctBy method, but not as well.

Fundamentally, the idea is that given a collection of items, you want to find a distinct set of items, but specifically testing for distinctness by comparing some notion of a key. As an example, you might have a Person type like this:

// TODO: Use C# 6 primary constructor and read-only autoprops :)
public class Person
{
    public string Name { get; set; }
    public string Hobby { get; set; }
    public string Profession { get; set; }
}

with data like this:

var people = new List<Person>
{
    new Person { Name="Tom", Hobby="Minecraft", Profession = "Student" },
    new Person { Name="Robin", Hobby="Taunting", Profession = "Outlaw" },
    new Person { Name="Robin", Hobby="Angry Birds", Profession = "Student" },
};

Now we could get a distinct set of people by name (in which case we'd get Tom and one of the Robins) or by profession (in which case we'd get the outlaw and one of the students). We can additionally specify an equality comparer to use when comparing keys, so that "TOM" and "Tom" might be considered equal, for example.

The Distinct method in LINQ already allows you to specify a custom equality comparer, so your Distinct method (which I wouldn't overload, by the way) just projects items to their keys and uses the given key equality comparer to compare those keys.
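To make the "how is this used" part concrete, here is a rough sketch of calling the method from the question. The KeyEqualityComparer below is a compacted copy of the one in the question (constant hash code included), and DistinctDemo and SamplePeople are names I've made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public string Hobby { get; set; }
    public string Profession { get; set; }
}

// Compacted copy of the question's comparer, including its constant hash code.
class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly Func<TKey, TKey, bool> _predicate;

    public KeyEqualityComparer(Func<T, TKey> keySelector, Func<TKey, TKey, bool> predicate)
    {
        _keySelector = keySelector;
        _predicate = predicate;
    }

    public bool Equals(T x, T y) { return _predicate(_keySelector(x), _keySelector(y)); }
    public int GetHashCode(T obj) { return 0; }
}

public static class DistinctExtensions
{
    // The method from the question, unchanged.
    public static IEnumerable<T> Distinct<T, TKey>(this IEnumerable<T> items, Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer)
    {
        return items.Distinct(new KeyEqualityComparer<T, TKey>(keySelector, comparer.Equals));
    }
}

// Hypothetical demo class, not part of the question.
public static class DistinctDemo
{
    public static List<Person> SamplePeople()
    {
        return new List<Person>
        {
            new Person { Name = "Tom", Hobby = "Minecraft", Profession = "Student" },
            new Person { Name = "Robin", Hobby = "Taunting", Profession = "Outlaw" },
            new Person { Name = "Robin", Hobby = "Angry Birds", Profession = "Student" },
        };
    }

    public static void Main()
    {
        var people = SamplePeople();

        // Key = Name: the two Robins collapse into one, leaving 2 people.
        var byName = people.Distinct(p => p.Name, StringComparer.Ordinal).ToList();

        // Key = Profession: the two students collapse into one, leaving 2 people.
        var byProfession = people.Distinct(p => p.Profession, StringComparer.Ordinal).ToList();

        // The key comparer decides what "equal keys" means: "Tom" and "TOM"
        // are the same key under OrdinalIgnoreCase, leaving 2 names.
        var names = new[] { "Tom", "TOM", "Robin" };
        var ignoringCase = names.Distinct(s => s, StringComparer.OrdinalIgnoreCase).ToList();

        Console.WriteLine(byName.Count);
        Console.WriteLine(byProfession.Count);
        Console.WriteLine(ignoringCase.Count);
    }
}
```

The two-argument call inside the extension method still binds to the built-in Enumerable.Distinct, because the question's overload needs three arguments.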

Unfortunately, the implementation given in the question is bad. An IEqualityComparer<T> has to provide two methods: one to compare two items for equality (which is being done correctly) and one to get a hash code (which is being done very badly: it's "valid" but horrendously inefficient). Because every item reports the same hash code, Distinct can't use hashing to narrow down the candidates and has to compare each new item against every distinct item it has already seen. Basically this changes an O(N) algorithm into an O(N^2) algorithm by performing far more comparisons than are really needed.
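One way to avoid that cost is to have the comparer hold on to the key's IEqualityComparer<TKey> and use it for hashing as well as equality. The following is only a sketch of that idea, not the MoreLINQ implementation: it changes the constructor from the question to take the key comparer itself instead of a predicate, and the null-key handling is my own assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of a replacement for the question's KeyEqualityComparer.
// Assumption: the constructor takes the key comparer, not a Func predicate.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer;

    public KeyEqualityComparer(Func<T, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
    {
        if (keySelector == null)
            throw new ArgumentNullException("keySelector");
        _keySelector = keySelector;
        // Fall back to the default comparer for the key type when none is given.
        _keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    public bool Equals(T x, T y)
    {
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        // Hash the key with the same comparer used for equality, so items with
        // equal keys always share a hash code and unequal keys usually don't.
        TKey key = _keySelector(obj);
        // Some comparers (StringComparer, for one) throw on a null argument.
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

public static class DistinctExtensions
{
    // Same signature as the question's method, now passing the comparer through.
    public static IEnumerable<T> Distinct<T, TKey>(this IEnumerable<T> items, Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer)
    {
        return items.Distinct(new KeyEqualityComparer<T, TKey>(keySelector, comparer));
    }
}
```

Callers don't need to change: people.Distinct(p => p.Name, StringComparer.OrdinalIgnoreCase) still compiles, but the underlying Distinct can now bucket items by key hash.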

In terms of unit tests, you could use pretty much the examples I've given above:

Get a distinct set by name
Get a distinct set by profession
Get a distinct set by name with a case-insensitive equality comparer
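Those three cases might look something like this as tests. I'm assuming NUnit here (any test framework would do), and the Distinct method in the block is a HashSet-based stand-in of my own so that the example is complete; swap in the real method under test. The fixture, test and SamplePeople names are hypothetical too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;

public class Person
{
    public string Name { get; set; }
    public string Hobby { get; set; }
    public string Profession { get; set; }
}

public static class DistinctExtensions
{
    // Stand-in for the method under test (assumption: HashSet-based, not the question's code).
    public static IEnumerable<T> Distinct<T, TKey>(this IEnumerable<T> items, Func<T, TKey> keySelector, IEqualityComparer<TKey> comparer)
    {
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (var item in items)
        {
            // Add returns false when an equal key has already been seen.
            if (seenKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}

[TestFixture]
public class DistinctTests
{
    private static List<Person> SamplePeople()
    {
        return new List<Person>
        {
            new Person { Name = "Tom", Hobby = "Minecraft", Profession = "Student" },
            new Person { Name = "Robin", Hobby = "Taunting", Profession = "Outlaw" },
            new Person { Name = "Robin", Hobby = "Angry Birds", Profession = "Student" },
        };
    }

    [Test]
    public void DistinctByName_ReturnsOnePersonPerName()
    {
        var result = SamplePeople().Distinct(p => p.Name, StringComparer.Ordinal);

        // EquivalentTo ignores ordering, which this test doesn't care about.
        Assert.That(result.Select(p => p.Name), Is.EquivalentTo(new[] { "Tom", "Robin" }));
    }

    [Test]
    public void DistinctByProfession_ReturnsOnePersonPerProfession()
    {
        var result = SamplePeople().Distinct(p => p.Profession, StringComparer.Ordinal);

        Assert.That(result.Select(p => p.Profession), Is.EquivalentTo(new[] { "Student", "Outlaw" }));
    }

    [Test]
    public void DistinctByName_WithCaseInsensitiveComparer_TreatsTomAndTOMAsEqual()
    {
        var people = new[] { new Person { Name = "Tom" }, new Person { Name = "TOM" } };

        Assert.That(people.Distinct(p => p.Name, StringComparer.OrdinalIgnoreCase).Count(), Is.EqualTo(1));
        Assert.That(people.Distinct(p => p.Name, StringComparer.Ordinal).Count(), Is.EqualTo(2));
    }
}
```

The tests only go through the public extension method, so they should pass unchanged whichever comparer implementation sits behind it.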

You would also want to pin down the requirements for which item should be returned when several are equivalent. LINQ to Objects always returns the first it comes across, for example (although I don't think that's documented).

Or you could look at my DistinctByTest for the MoreLINQ tests :)

See more on this question at Stack Overflow