Saturday, October 11, 2014

Intro to Tasks

Today I'd like to look at a very popular topic in the .NET community:  Tasks.  Most people like to focus on the async and await keywords, but I find myself typically using Tasks directly for anything that is not trivial.
I struggled way more than I should have when I first investigated the Task<T> return type in conjunction with the async and await keywords.  I did not understand what the async keyword really meant and that the await keyword applied to the Task type and not the async method signature.  So here's what it took for me to understand tasks.

The async keyword is applied to a method that, in the method's body, uses the await keyword.  Here's an example:

public async Task<Client> GetCustomerByName(string name){
    if(String.IsNullOrWhiteSpace(name)){
        throw new ArgumentException();
    }

    // await unwraps the Task<Client> returned by Fetch into a Client
    Client client = await new ClientRepository().Fetch(new FindClientByNameCriteria(name));
    return client.IsActive ? client : null;
}

There are two observations worth discussing here.  First, the Fetch method returns Task<Client>, but the await keyword unwraps the result for you, so client is a plain Client object.  Second, the return type of the method is Task<Client>, yet the method body returns a Client object.  The compiler wraps the returned value in a Task for you.  This holds for every return statement in an async method, including one that appears before the first await:  you still return a Client, never a Task<Client>.  Let's look at that now.

public async Task<Client> GetCustomerByName(string name){
    if(String.IsNullOrWhiteSpace(name)){
        throw new ArgumentException();
    }

    // you would never do this, but for the sake of an example....
    if(StringComparer.OrdinalIgnoreCase.Equals(name, AdminUserName)){
        // returning Task.FromResult(AdminUser) here would not compile in an async method
        return AdminUser;
    }

    Client client = await new ClientRepository().Fetch(new FindClientByNameCriteria(name));
    return client.IsActive ? client : null;
}

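The thought behind async Task methods is that all the code up to the first await of an operation that has not yet completed is executed synchronously, typically to ensure that the call has a chance to succeed or that any pre-processing is completed before the rest of the method is executed asynchronously.  Note that an exception thrown in that synchronous portion is still captured in the returned Task rather than thrown directly at the call site.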
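To make the ordering concrete, here is a minimal sketch.  The names SyncPrefixDemo, ComputeAsync, ValidateAsync, and Log are hypothetical and exist only for this illustration; the gate task is a TaskCompletionSource that the caller completes by hand so the order of events is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SyncPrefixDemo
{
    // Hypothetical log used only to record the order of execution.
    public static readonly List<string> Log = new List<string>();

    public static async Task<int> ComputeAsync(Task gate)
    {
        Log.Add("before await");   // runs synchronously, before the caller gets the Task back
        await gate;                // gate is not complete yet, so control returns to the caller here
        Log.Add("after await");    // runs later, as the continuation
        return 42;
    }

    public static async Task<int> ValidateAsync(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
        {
            // Thrown before any await, yet stored in the returned Task
            // instead of being thrown at the call site.
            throw new ArgumentException("name is required", nameof(name));
        }
        await Task.Yield();
        return name.Length;
    }

    public static async Task<string> RunOrderAsync()
    {
        Log.Clear();
        var gate = new TaskCompletionSource<bool>();
        Task<int> pending = ComputeAsync(gate.Task);
        Log.Add("caller resumed");
        gate.SetResult(true);      // lets the rest of ComputeAsync run
        int value = await pending;
        return string.Join(" -> ", Log) + " = " + value;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunOrderAsync()); // before await -> caller resumed -> after await = 42

        Task<int> faulted = ValidateAsync(" ");   // does not throw here
        Console.WriteLine(faulted.IsFaulted);     // True
    }
}
```

The second call is the part that tends to surprise people:  the argument check runs synchronously, but the caller only sees the exception when it awaits or otherwise inspects the Task.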

Let's rewrite this without using async or await.

public Task<Client> GetCustomerByName(string name){
    if(String.IsNullOrWhiteSpace(name)){
        // without async, this is thrown directly to the caller, not stored in a Task
        throw new ArgumentException();
    }

    return new ClientRepository().Fetch(new FindClientByNameCriteria(name))
                                 .ContinueWith(x => x.Result.IsActive ? x.Result : null);
}

First, notice that we do not use the async keyword on this method because we are not using the await keyword in the method body.

Abstractly, you can think of the await keyword as a more terse way of specifying the continuation.  Everything after an await will effectively go into a ContinueWith.  If there are multiple awaits, there would be multiple ContinueWiths.
Therefore, whenever you use the await keyword, the method will return a Task (or Task<T>) since ContinueWith returns a Task.  Since everything in the method following the await is placed into a continuation, there is no way to return anything else.

It may be strange, at first, to return the object directly in an async method, but consider the continuation approach.  Inside the continuation, we just return whatever object we want, and ContinueWith wraps it in a Task.  An async method works the same way:  your code is effectively wrapped in a continuation, so you return the object you want and it is wrapped in a Task for you.
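A small side-by-side sketch of that equivalence.  FetchNameAsync is a hypothetical stand-in for a repository call; both methods return a plain string from their body and both hand the caller a Task<string>.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Hypothetical stand-in for a repository call that returns Task<string>.
    static Task<string> FetchNameAsync() => Task.FromResult("alice");

    // async/await: return the plain value; the compiler wraps it in a Task<string>.
    public static async Task<string> UpperWithAwait()
    {
        string name = await FetchNameAsync();
        return name.ToUpperInvariant();
    }

    // Explicit continuation: the lambda returns a plain string
    // and ContinueWith wraps it in a Task<string>.
    public static Task<string> UpperWithContinuation()
    {
        return FetchNameAsync().ContinueWith(t => t.Result.ToUpperInvariant());
    }

    public static async Task Main()
    {
        Console.WriteLine(await UpperWithAwait());        // ALICE
        Console.WriteLine(await UpperWithContinuation()); // ALICE
    }
}
```

One difference worth keeping in mind:  if the antecedent fails, await rethrows the original exception, while reading t.Result inside a continuation throws an AggregateException that wraps it.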

OK, so now that we've covered the basics, let's look at a few special cases or best practices when using Tasks.

Methods with void return type

You may be wondering how to convert a method that has a void return type into an async method since the return type must be a Task.  Luckily, there is a non-generic Task that comes to the rescue.

Microsoft recommends that async methods return Task instead of void, with the exception of event handlers (i.e., callbacks).  I assume the reasoning here is that the code that is 'calling back' does not need to know when the callback has completed execution, and a callback that returns a value is a very strange paradigm.

Async event handlers also come with very non-intuitive exception handling.  Typically, an exception in a Task-returning method is captured in the returned Task, where the caller can observe it.  In an async void event handler there is no Task to hold it, so the exception is raised on the SynchronizationContext that was active when the handler started, and you cannot easily catch an exception raised by an async event handler.

Aside from event handlers, you should convert void methods to returning the non-generic Task type.  This allows the caller to be signaled when the method has finished executing, investigate the task for error, and specify continuation functions on success and failure.
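As a rough sketch of what the caller gains, assume a hypothetical SaveAsync that used to return void.  Once it returns Task, the caller can wait for completion and handle failure with an ordinary try/catch.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskReturningDemo
{
    // Hypothetical operation that previously had a void return type.
    public static async Task SaveAsync(bool fail)
    {
        await Task.Yield();
        if (fail)
        {
            throw new InvalidOperationException("save failed");
        }
    }

    public static async Task<string> RunAsync(bool fail)
    {
        try
        {
            await SaveAsync(fail);           // the caller is signaled when the work finishes
            return "saved";
        }
        catch (InvalidOperationException ex)
        {
            return "error: " + ex.Message;   // the failure is observable through the Task
        }
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync(false)); // saved
        Console.WriteLine(await RunAsync(true));  // error: save failed
    }
}
```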

Async All the Way

Let me start by clearly stating that this is by far the most problem-laden aspect of asynchronous programming:  Context.  We don't often have to worry much about context in synchronous programming; everything is in the same context and only one thing is executing at a time.  When we move into asynchronous programming, everything comes down to context.  Even more annoying, code that works just fine in a console application will fail with deadlocks in GUI and web applications.

The easiest way to avoid this problem is to be asynchronous everywhere.  Deadlock context issues tend to arise when using the Wait() method or Result property of a task.  It can be the case that the thread that calls Wait() is also the thread that has to process the task that Wait() was called on.  The thread will block waiting on itself to finish the task and you are deadlocked!  The thread cannot complete the Task's work because it is blocked waiting for that work to be completed.  Going async all the way avoids calls to Result and Wait.

Note:  Using Result in a continuation is perfectly acceptable as the task has finished and the result is guaranteed to be available.
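A minimal sketch of reading Result inside a continuation.  LengthAsync is a hypothetical helper; the OnlyOnRanToCompletion option is my addition so the continuation only runs when the antecedent succeeded, since Result would still throw for a faulted task even though it can no longer block.

```csharp
using System;
using System.Threading.Tasks;

public static class ResultInContinuationDemo
{
    public static Task<int> LengthAsync(Task<string> source)
    {
        return source.ContinueWith(
            // The antecedent has already finished when this runs,
            // so reading Result cannot block the thread.
            t => t.Result.Length,
            // Skip the continuation if the antecedent faulted or was canceled.
            TaskContinuationOptions.OnlyOnRanToCompletion);
    }

    public static async Task Main()
    {
        var source = new TaskCompletionSource<string>();
        Task<int> length = LengthAsync(source.Task);   // nothing has run yet
        source.SetResult("hello");                     // now the continuation can run
        Console.WriteLine(await length);               // 5
    }
}
```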

You'll find that as you are converting your code to be asynchronous, it is naturally 'contagious' and code calling asynchronous methods tends to want to be asynchronous also (typically via continuation).

One of the nicer aspects of this approach is that the continuation is nearby the code that it is continuing.  Often, the continuation applies only to that block of code and is not re-used anywhere.  It is frustrating to have to write a method for the callback and specify it to execute after the called code is finished since they no longer feel related and there is no way to tell that the method only applies to that continuation.

Consider the difference:

public Task<CustomerDto[]> FetchCustomers(Criteria<Customer> criteria) {
    return new CustomerRepository().Fetch(criteria)
                                   .ContinueWith(x => x.Result.Select(y => new CustomerDto(y))
                                                              .ToArray());
}

vs

private static CustomerDto[] ConvertCustomers(Task<Customer[]> customers){
     // the continuation receives the finished Task, so read its Result
     return customers.Result.Select(y => new CustomerDto(y)).ToArray();
}
public Task<CustomerDto[]> FetchCustomers(Criteria<Customer> criteria){
    return new CustomerRepository().Fetch(criteria)
                                   .ContinueWith(ConvertCustomers);
}

vs

private static CustomerDto[] ConvertCustomers(IEnumerable<Customer> customers){
     return customers.Select(y => new CustomerDto(y)).ToArray();
}

public async Task<CustomerDto[]> FetchCustomers(Criteria<Customer> criteria){
    var customers = await new CustomerRepository().Fetch(criteria);
    return ConvertCustomers(customers);
}

I tend to use the first approach as it contains all the logic for that method.

Awaiter Configuration

This is another annoying implementation detail.  Tasks have a method, ConfigureAwait(bool), that specifies whether the code after the await needs to resume on the context that was captured when the await began (for example, the UI thread) or may resume wherever is convenient.  It drives me crazy that the default is not false, especially when it's best practice to set this to false.  I would think that code that required same context would need to specify it since that is the minority of cases.  This is another reason that I favor continuations directly over using the async keyword.
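The effect is easier to see with a context that counts what gets posted to it.  The sketch below is only an illustration:  CountingContext is a hypothetical SynchronizationContext that forwards to the thread pool, standing in for a real UI or request context.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that counts how many continuations are posted to it.
public sealed class CountingContext : SynchronizationContext
{
    private int _posts;
    public int Posts => Volatile.Read(ref _posts);

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref _posts);
        base.Post(d, state);   // the base class queues the callback to the thread pool
    }
}

public static class ConfigureAwaitDemo
{
    // Default await: the continuation is posted back to the captured context.
    static async Task CapturedAsync()
    {
        await Task.Delay(20);
    }

    // ConfigureAwait(false): the continuation skips the captured context.
    static async Task ContextFreeAsync()
    {
        await Task.Delay(20).ConfigureAwait(false);
    }

    public static int CountPosts(bool useConfigureAwaitFalse)
    {
        var context = new CountingContext();
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            Task work = useConfigureAwaitFalse ? ContextFreeAsync() : CapturedAsync();
            // Blocking is safe here only because this demo context
            // never needs the blocked thread to run the continuation.
            work.GetAwaiter().GetResult();
            return context.Posts;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountPosts(false)); // 1
        Console.WriteLine(CountPosts(true));  // 0
    }
}
```

In a real GUI application the context would marshal that one post back to the UI thread, which is exactly the hop that ConfigureAwait(false) avoids.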

If you are developing WinForms applications, you may want the context that called the code to also handle processing the code.  This is true especially when manipulating any GUI elements or bound properties.  In web applications, using the HttpContext is a good reason to require same context.  One way around that is to copy whatever you need from the context into local variables before awaiting.

Context free code is typically easier to manage and maintain.  A strategy for converting context sensitive code into more manageable asynchronous code is to put the core logic of the method into an async method and call that from the context sensitive method, requiring same context.
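Here is one way that split might look.  This is a sketch under assumptions:  OrderTotals and TotalsViewModel are hypothetical names, and the fetchPrices delegate stands in for whatever asynchronous data access the real method would do.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;

// Context-free core logic: every await opts out of the captured context.
public static class OrderTotals
{
    public static async Task<decimal> LoadTotalAsync(Func<Task<decimal[]>> fetchPrices)
    {
        decimal[] prices = await fetchPrices().ConfigureAwait(false);
        return prices.Sum();
    }
}

// Hypothetical context-sensitive caller, e.g. a view model bound to a form.
public sealed class TotalsViewModel
{
    public string TotalText { get; private set; }

    public async Task RefreshAsync(Func<Task<decimal[]>> fetchPrices)
    {
        // Plain await: execution resumes on the captured context,
        // so it is safe to touch bound properties afterwards.
        decimal total = await OrderTotals.LoadTotalAsync(fetchPrices);
        TotalText = total.ToString("0.00", CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var viewModel = new TotalsViewModel();
        await viewModel.RefreshAsync(() => Task.FromResult(new[] { 1.50m, 2.25m }));
        Console.WriteLine(viewModel.TotalText); // 3.75
    }
}
```

Only the thin outer method cares about context; the core method can be reused and tested without one.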


I think that about sums up what I have to say for today about Tasks.  I know I'll be revisiting this topic in the future with some better examples, but for now... Enjoy!
