Friday, March 27, 2015

Tracing is EVIL: Part 1

So, for the last year or so I've been looking into application logging and event tracing.  I looked into ETW after listening to a .net rocks episode a long time ago, but quickly stopped when I realized that starting and stopping the tracing required elevated privileges on the server.  I haven't worked at many places where that would even be up for disucssion...  So I eventually came across System.Diagnostics.TraceSource.  This seemed to be a good fit as I could turn it on and off via configuration file and I could add custom listeners to defer the 'write' to be handled by some external handler(s).

The application under test is built around interfaces and base classes implementing the tracing where constructed.

Let's bring in an example...

/// interface for loading stuff
public interface ILoadStuff {
    /// load the stuff with the specified id
    Stuff LoadStuff(int id);
}

public class TracingStuffLoaderWrapper : ILoadStuff {
    protected readonly ILoadStuff wrapped;
    protected readonly TraceSource trace;
    public TracingStuffLoader() : this(new StuffLoaderProxy()/*WCF/service proxy*/, new TraceSource("NS.TracingStuffLoaderWrapper", Information)){}

    public TracingStuffLoader(ILoadStuff wrapped, TraceSource trace){
        // arg null check
        this.wrapped = wrapped;
        this.trace = trace;
    }

    public Stuff LoadStuff(int id){
       trace.TraceInformation("Request to load stuff with id `{0}`.", id);
       var stuff = wrapped.LoadStuff(id);
       trace.TraceInformation("Stuff fetched successfully");
       return stuff;
    }
}

or

public abstract TracingStuffLoaderSql : ILoadStuff {
    protected readonly TraceSource trace;
    protected TracingStuffLoaderSql() : this(new TraceSource("NS.StuffLoaderSql")){}
    protected TracingStuffLoaderSql(TraceSource source){
        this.trace = source;
    }

    protected abstract SqlCommand GenerateLoadCommand(int id, SqlCommand cmd);
    protected abstract Stuff DecodeStuffFromDataSet(DataSet ds);
    public Stuff LoadStuff(int id){
        trace.TraceInformation("Request to load stuff with id `{0}`.", id);
        using(SqlConnection conn = GetOpenConnection()){
            trace.TraceInformation("Generating load command.");
            using(SqlCommand cmd = GenerateLoadCommand(id, conn)){
                trace.TraceInformation("Fetching data from database.");

                DataSet ds = new DataSet();
                new SqlDataAdapter().Fill(ds);
                trace.TraceInformation("Dataset fetched with {0} tables and {1} rows in the first table.", ds.Tables.Count(), ds.Tables.First().Rows.Count());

                Stuff decoded = DecodeStuffFromDataSet(ds);
                trace.TraceInformation("Decoding succeeded: {0}", decoded);
                return decoded;
            }
        }
    }
}


Using this pattern, the StuffLoader can work against any Sql database, just implement the SqlCommand method in the derived class.  The subclass would hopefully do the right thing and use the trace source to write messages, but even without writing to it, this gives a very good idea as to where a failure may have occurred..

If the trace prints "Generating load command." and does not print "Fetching data from database." the failure is inside the GenerateLoadCommand subclass implementation.  Otherwise, my base class is broken and I should have a good idea where.

This approach was working fine as there were only a few traces happening here and there.  After introducing tracing into just about every operation in the system I've uncovered a drastic flaw in this approach!!!

This SO post shares exactly my feels.  I could not believe that the tracing was synchronous and that the default trace listener was so horribly performing.  The app actually started hanging and crashing from unresponsive tracing.  I insisted that it could not be the tracing as it *must* be asynchronous....

When profiling the application locally, except for start up, the app has very little resource use (<5% CPU, < 100MB RAM) across 3 WCF services and a website.  With as few as 10 users, the site went from blazing fast to painfully slow.

Lesson learned.... DO NOT USE THE DEFAULT TRACE.  However, no matter what I tried, I could not get the default trace to stop outputting.  I had to disable tracing completely and enable it only after an error is observed; try to reproduce to capture (while horribly impacting the running application for all other users), and disable.  This really sux.

Anyway, so I've known about ETW for a while, but haven't really tried to use it.  I feel like ETW is the best choice for what I'm striving for.  In fact, my misunderstanding of how TraceSource works is exactly how EventSource works, so lucky me :)

So, how do you convert between old TraceSource (.net 2.0 feature) stuff and implement the new EventSource stuff (.net 4.5/.1 feature), you ask?

Stay tuned for Part 2!









No comments:

Post a Comment