Captain Pav on C#: Variance

I want to discuss the importance of variance in the context of generic type arguments.

Starting in C# 4.0, we have two new modifiers for the type parameters of generic interfaces and delegates: in and out.

in is used to indicate that a type parameter is contravariant, or that the type is only accepted as input. An example of a contravariant interface in the framework is IEqualityComparer<in T>. The equality comparer accepts two instances of T as input to its Equals method (and one to GetHashCode). There is no way to return a T from the equality comparer.

out is used to indicate that a type parameter is covariant, or that the type is only returned as output. An example of a covariant interface in the framework is IEnumerable<out T>. The enumerable is only able to return instances of T; there is no way to pass a T to any method of the enumerable.

Knowing that a type parameter is covariant or contravariant is extremely useful for abstraction. If a parameter is neither covariant nor contravariant, it is known as invariant.
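To make the two modifiers concrete, here is a minimal sketch of declaring them on your own interfaces. IProducer, IConsumer, StringProducer and ObjectConsumer are hypothetical names made up for this illustration; they are not framework types.

```csharp
using System;

// Hypothetical interfaces for illustration; they are not part of the framework.
public interface IProducer<out T>   // covariant: T appears only in output positions
{
    T Produce();
}

public interface IConsumer<in T>    // contravariant: T appears only in input positions
{
    void Consume(T item);
}

public class StringProducer : IProducer<string>
{
    public string Produce() { return "hello"; }
}

public class ObjectConsumer : IConsumer<object>
{
    public object Last { get; private set; }
    public void Consume(object item) { Last = item; }
}

public static class VarianceBasics
{
    public static void Main()
    {
        // Covariance: a producer of string can stand in for a producer of object.
        IProducer<object> producer = new StringProducer();
        object produced = producer.Produce();

        // Contravariance: a consumer of object can stand in for a consumer of string.
        var sink = new ObjectConsumer();
        IConsumer<string> consumer = sink;
        consumer.Consume("world");

        Console.WriteLine(produced);   // hello
        Console.WriteLine(sink.Last);  // world
    }
}
```

If you try to use T in the wrong position (say, a method taking a T parameter on IProducer<out T>), the compiler rejects the interface declaration itself, which is what makes the assignments above safe.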

When we know that the type will only be accepted as input, an implementation written for a less derived type can be used wherever one for a more derived type is expected, because it can reliably interact with any more derived instance as though it were the less derived type.

Consider the case of an equality comparer and a class structure as follows:

using System;
using System.Collections.Generic;
using System.Linq;

public class LivingThing
{
    public LivingThing(string kingdom){ this.Kingdom = kingdom; }
    public string Kingdom {get; private set;}
}

public class Mammal : LivingThing
{
    public Mammal(string family) : base("Animalia"){ this.Family = family; }
    public string Family {get;set;}
}

public class Dog : Mammal
{
    public Dog(string breed) : base("Canidae"){ this.Breed = breed; }
    public string Breed {get;set;}
}

public class Cat : Mammal
{
    public Cat(string breed) : base("Felidae"){ this.Breed = breed; }
    public string Breed {get;set;}
}

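public class LivingThingComparer : IEqualityComparer<LivingThing>
{
    public bool Equals(LivingThing x, LivingThing y){
        if(Object.ReferenceEquals(x, y)) return true;
        if(null == x || null == y) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.Kingdom, y.Kingdom);
    }

    // IEqualityComparer<T> also requires GetHashCode; it must agree with Equals
    public int GetHashCode(LivingThing obj){
        return null == obj ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Kingdom ?? string.Empty);
    }
}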

var someAnimal = new LivingThing("Animalia");
var someFungus = new LivingThing("Fungi");

new LivingThingComparer().Equals(someAnimal, someFungus); // returns false

Comparing two living things is based on the kingdom to which each belongs. Fungi and animals are not equal. However, this is not very useful; often we want to compare more derived things than some animal and some fungus.

var dog = new Dog("Pug");

new LivingThingComparer().Equals(someAnimal, dog); // returns true

Both someAnimal and the dog belong to the Animalia kingdom and are therefore equal as far as LivingThings go.

var cat = new Cat("Persian");

new LivingThingComparer().Equals(cat, dog); // returns true

It may seem strange to consider a Persian cat and a Pug dog the same, but in terms of living things at the kingdom level they are, and we can count on the Kingdom property being available on any derived LivingThing.

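public class MammalComparer : IEqualityComparer<Mammal>
{
    public bool Equals(Mammal x, Mammal y){
        if(Object.ReferenceEquals(x, y)) return true;
        if(null == x || null == y) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.Family, y.Family); // no need to check kingdom
    }

    // IEqualityComparer<T> also requires GetHashCode; it must agree with Equals
    public int GetHashCode(Mammal obj){
        return null == obj ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Family ?? string.Empty);
    }
}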

new MammalComparer().Equals(cat, dog);// returns false
new MammalComparer().Equals(someAnimal, dog); // does not compile

someAnimal is a LivingThing, which is less derived than a Mammal, so the comparer has no idea how to compare them. The Family property by which the comparer decides equality does not exist on LivingThing.
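The calls above pass more derived arguments to Equals, which ordinary subtyping already allows. Where the in modifier really earns its keep is in assigning the comparer itself: a comparer of LivingThing can be used wherever a comparer of Dog is expected. Here is a minimal sketch of that, with the classes redeclared in trimmed form so it compiles on its own; ContravarianceDemo is a name made up for this example.

```csharp
using System;
using System.Collections.Generic;

public class LivingThing
{
    public LivingThing(string kingdom) { Kingdom = kingdom; }
    public string Kingdom { get; private set; }
}

public class Mammal : LivingThing
{
    public Mammal(string family) : base("Animalia") { Family = family; }
    public string Family { get; set; }
}

public class Dog : Mammal
{
    public Dog(string breed) : base("Canidae") { Breed = breed; }
    public string Breed { get; set; }
}

public class LivingThingComparer : IEqualityComparer<LivingThing>
{
    public bool Equals(LivingThing x, LivingThing y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.Kingdom, y.Kingdom);
    }

    public int GetHashCode(LivingThing obj)
    {
        return obj == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Kingdom ?? string.Empty);
    }
}

public static class ContravarianceDemo
{
    public static void Main()
    {
        // IEqualityComparer<in T>: a comparer of LivingThing is usable as a comparer of Dog.
        IEqualityComparer<Dog> dogComparer = new LivingThingComparer();

        // Every dog is in Animalia, so this comparer treats them all as equal
        // and the set keeps only the first one.
        var dogs = new HashSet<Dog>(dogComparer);
        dogs.Add(new Dog("Pug"));
        dogs.Add(new Dog("Bulldog"));
        Console.WriteLine(dogs.Count); // 1

        // The reverse direction is rejected by the compiler, because a comparer
        // written for Mammal (such as the post's MammalComparer) cannot handle
        // an arbitrary LivingThing:
        // IEqualityComparer<LivingThing> c = new MammalComparer();
    }
}
```

The HashSet<Dog> only ever hands Dogs to the comparer, and every Dog is a LivingThing, so nothing can go wrong at runtime.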

OK. Now that we understand contravariance, let's look at covariance using the enumerable case and the same class structure as above.

IEnumerable<Mammal> collectionOfMammals = new List<Cat>(){ new Cat("Persian"), new Cat("Siamese") };
collectionOfMammals = new Dog[]{ new Dog("Pug"), new Dog("Bulldog") };

The same IEnumerable<Mammal> variable can be assigned a collection of a more derived type of mammal. It is safe to allow this assignment because the client will only be able to access the objects in the collection as Mammals, which both Cat and Dog are. We can even mix the bag.

collectionOfMammals = new List<Cat>(){ new Cat("Persian"), new Cat("Siamese") }.Cast<Mammal>().Concat(new Dog[]{ new Dog("Pug"), new Dog("Bulldog") });

Because the client only has access to the Mammal abstraction of each object, it is safe to mix the bag. Every property the client could use is available on both types. The danger would only arise when trying to add to the underlying collection when its actual element type is different.

foreach(var mammal in collectionOfMammals){
    Console.WriteLine(mammal.Breed); // doesn't compile; while both cat and dog have this property, it is not visible behind the mammal abstraction
    Console.WriteLine(mammal.Family); // all mammals will have this property so safe to use
}

collectionOfMammals.Add(new Dog("Poodle")); // doesn't compile; the List methods that would make the enumerable invariant are hidden behind the enumerable abstraction

Assume that the enumerable interface had the ability to add items and was still marked as covariant.

IEnumerable<Mammal> mammals = new List<Dog>() { new Dog("Pug") };
mammals.Add(new Cat("Persian"));

At first it may seem harmless: we've declared the variable as an enumerable of mammals and we're adding a mammal. However, the actual type is List<Dog>, and you cannot add a Cat to that collection type.
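You don't have to imagine this failure, because C# arrays of reference types are covariant even though they can be written to. A rough sketch of what happens, using trimmed stand-ins for the Mammal, Dog and Cat classes above (ArrayCovarianceDemo and StoreCatInDogArray are names made up for this example):

```csharp
using System;

// Trimmed stand-ins for the classes in this post.
public class Mammal { }
public class Dog : Mammal { }
public class Cat : Mammal { }

public static class ArrayCovarianceDemo
{
    // Returns true when storing a Cat into a Dog[] viewed as Mammal[] fails at runtime.
    public static bool StoreCatInDogArray()
    {
        Mammal[] mammals = new Dog[1];   // compiles: arrays of reference types are covariant
        try
        {
            mammals[0] = new Cat();      // compiles, but the runtime checks the element type
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StoreCatInDogArray()); // True
    }
}
```

This is exactly the trade the out modifier avoids: because IEnumerable<out T> has no way to accept a T, the compiler can allow the assignment without needing a runtime check on every write.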

OK. Now we know what covariant and contravariant type parameters are, but so what? Well, let me tell you... Before these keywords were available, life was a hellish nightmare when using immutable generics that had nested generic parameters.

Consider a simple example that often drives me crazy:

IDictionary<string, IEnumerable<string>> dict = new Dictionary<string, IList<string>>(); // doesn't compile
// In IDictionary<TKey, TValue>, TValue is not covariant due to the Add(TKey, TValue) method, so you cannot use a further derived type as the TValue (IList<string> : IEnumerable<string>)

This at least makes sense: I could use the Add method to add the wrong type of collection to the actual type (Dictionary<string, IList<string>>). Sometimes I'd rather the framework allowed this and threw a runtime exception if the client tried to call Add with an invalid collection type.
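In the meantime, one workaround is to rebuild the dictionary with the value type you want. The sketch below is just one way to do it, assuming a shallow copy is acceptable; DictionaryProjectionDemo and AsEnumerableValues are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryProjectionDemo
{
    // Copies the dictionary, exposing each value through the IEnumerable<string> abstraction.
    // The copy is shallow: the new dictionary shares the original list instances.
    public static IDictionary<string, IEnumerable<string>> AsEnumerableValues(
        IDictionary<string, IList<string>> source)
    {
        return source.ToDictionary(
            pair => pair.Key,
            pair => (IEnumerable<string>)pair.Value);
    }

    public static void Main()
    {
        var source = new Dictionary<string, IList<string>>
        {
            { "letters", new List<string> { "A", "B" } }
        };

        IDictionary<string, IEnumerable<string>> dict = AsEnumerableValues(source);
        Console.WriteLine(dict["letters"].Count()); // 2
    }
}
```

The cast inside the lambda is what tells ToDictionary to build a Dictionary<string, IEnumerable<string>> rather than one keyed to IList<string>.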

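Task<T> is probably the most frustrating example of not using covariance. Task<T> is a class, and only interfaces and delegates can declare variant type parameters; if Task<T> implemented a (hypothetical) ITask<out T>, life would be much better. There are so many cases where I'd like to do the following: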

Task<IEnumerable<string>> StringTask(){
    return Task.FromResult(new List<string>(){ "A", "B", "C" });
} // doesn't compile; Task<List<string>> cannot be assigned to Task<IEnumerable<string>>

Obviously I wouldn't want to just use FromResult, but imagine I'm awaiting something that returns a list and I want to expose it as IEnumerable. I have to actually upcast the list to IEnumerable<string> before returning. It gets waaay worse with nested generic types, where you have to re-create the entire nested structure and upcast to the appropriate abstraction.
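For the simple, non-nested case, here is a sketch of two ways to get the upcast to happen. LoadListAsync is a made-up stand-in for whatever operation actually produces the list, and TaskVarianceDemo is just a name for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TaskVarianceDemo
{
    // Hypothetical stand-in for some awaited operation that produces a concrete list.
    static Task<List<string>> LoadListAsync()
    {
        return Task.FromResult(new List<string> { "A", "B", "C" });
    }

    // Option 1: name the result type explicitly so FromResult builds a Task<IEnumerable<string>>.
    public static Task<IEnumerable<string>> StringTask()
    {
        return Task.FromResult<IEnumerable<string>>(new List<string> { "A", "B", "C" });
    }

    // Option 2: await inside a declared async method. The conversion applies to the
    // awaited List<string> being returned, not to a Task, so it compiles.
    public static async Task<IEnumerable<string>> StringTaskAsync()
    {
        List<string> list = await LoadListAsync();
        return list;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", StringTask().Result));       // A,B,C
        Console.WriteLine(string.Join(",", StringTaskAsync().Result));  // A,B,C
    }
}
```

Neither option helps once the list is buried inside another generic type, which is where the re-creating and upcasting really starts to hurt.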

Well, that's about all I have to say about variance for now. I'm sure I'll be touching this subject again in a future post. Let me know what you think!

Friday, October 3, 2014
