A Taxonomy of Test Doubles

Many words have been written in the TDD community about the myriad ways of mocking in service of unit tests. After all this effort, a great deal of confusion and ambiguity remains in the understanding of many developers, maybe even most, who are using mocks.

No less than the likes of the eminently wise Martin Fowler has tackled the subject. Fowler's article is indispensable, and it in large part built the foundation of my own understanding of the topic. But it is quite long, and was originally written several years ago, when mocks were almost exclusively hand-rolled, or created with the record/replay idiom that was popular in mocking frameworks before lambdas and expressions were added to C# and VB.NET with Visual Studio 2008. Add to that the fact that the article was written in the context of a long-standing argument between two different philosophies of mocking.

Unfortunately these arguments continue on even today, as can be seen in the strongly-worded post that Karl Seguin wrote last week. Looking back now, with several more years of community experience and wisdom in unit testing and mocking behind us, we can bring a bit more perspective to the discussion than what was available at that time. But we won't throw away Fowler's post completely. Within his post, there are firm foundations we can build on, in the definitions of the different types of mocks that Fowler identified.

There are four primary types of test doubles. We'll start with the simplest, and move through in order of ascending complexity.

Dummies

A dummy is probably the most common type of test double. It is a "dumb" object that has no real behavior. Methods and setters may be called without any exception, but without any side-effect, and getters will return default values. Dummies are typically used as placeholders to fill an argument or property of a specific type that won't actually be used by the test subject during the test in question. While a "real" object wouldn't actually be used, an instance of a concrete type may have strings attached, such as dependencies of its own, that would make the test setup difficult or noisy.

Dummies are most efficiently created using a mock framework. These frameworks will typically allow a mock to be created without actually configuring any of the members. Instead they will provide sensible defaults, should some innocuous behavior be necessary to satisfy the subject.
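To make the idea concrete, here is a minimal hand-rolled sketch of a dummy. A mock framework would normally generate the equivalent for you; I've written it out by hand only so the example has no framework dependency. The names IAuditLog, DummyAuditLog, and Calculator are hypothetical and exist purely for illustration.

```csharp
using System;

// Hypothetical dependency that the subject requires but this test never exercises.
public interface IAuditLog
{
    void Record(string entry);
    int EntryCount { get; }
}

// Dummy: accepts every call, has no side effects, and returns default values.
public sealed class DummyAuditLog : IAuditLog
{
    public void Record(string entry) { }
    public int EntryCount => default;
}

// Hypothetical test subject. Add() never touches the audit log.
public sealed class Calculator
{
    private readonly IAuditLog _log;

    public Calculator(IAuditLog log) =>
        _log = log ?? throw new ArgumentNullException(nameof(log));

    public int Add(int a, int b) => a + b;

    // Only this member uses the log; the test below never calls it.
    public void Reset() => _log.Record("reset");
}

public static class DummyExample
{
    public static void Main()
    {
        // The dummy only fills the constructor argument.
        var calculator = new Calculator(new DummyAuditLog());
        Console.WriteLine(calculator.Add(2, 3)); // prints 5
    }
}
```

The point is that the dummy says nothing about the behavior under test. It is there only so the subject can be constructed.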

Stubs

A stub is a test double which serves up "indirect input" to the test subject. An indirect input is information that is not provided to an object by the caller of its methods or properties, but rather in response to a method call or property access by the subject itself, to one of its dependencies. An example of this would be the result of a factory creation method. Factories are a type of dependency that is quite commonly replaced by a stub. Their whole purpose is to serve up indirect input, toward the goal of avoiding having to provide the product directly when it may not be available at the time.

Stubs tend to be quite easy to set up even with more primitive mocking frameworks. Typically, all that is needed is to specify ahead of time the value that should be returned in response to a particular call. The usual simplicity of stubs should not lull you into assuming that a given double is not too complicated, however. Stubs can get quite complex if they need to yield a variety of different objects across multiple calls. The setup for this kind of scenario can get messy quickly, and that should be taken as a sign to move on to a more complex type of double.
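As a rough illustration of the factory case described above, here is a hand-rolled stub sketched under my own assumptions. IConnection, IConnectionFactory, and ConnectionReporter are hypothetical names, not part of any real library. The stub does nothing except hand back the value it was given ahead of time.

```csharp
using System;

// Hypothetical product and factory; the names are illustrative only.
public interface IConnection { string Name { get; } }
public interface IConnectionFactory { IConnection Create(); }

public sealed class FixedConnection : IConnection
{
    public FixedConnection(string name) => Name = name;
    public string Name { get; }
}

// Stub: the value to return is specified ahead of time.
public sealed class StubConnectionFactory : IConnectionFactory
{
    private readonly IConnection _canned;
    public StubConnectionFactory(IConnection canned) => _canned = canned;
    public IConnection Create() => _canned;
}

// Hypothetical subject: it obtains its connection indirectly, via the factory.
public sealed class ConnectionReporter
{
    private readonly IConnectionFactory _factory;
    public ConnectionReporter(IConnectionFactory factory) => _factory = factory;
    public string Describe() => "Connected to " + _factory.Create().Name;
}

public static class StubExample
{
    public static void Main()
    {
        var factory = new StubConnectionFactory(new FixedConnection("test-db"));
        var reporter = new ConnectionReporter(factory);
        Console.WriteLine(reporter.Describe()); // prints "Connected to test-db"
    }
}
```

Note that the test never passes the connection to the subject directly. The subject asks its dependency for it, which is what makes the input "indirect".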

Mocks

A mock is a type of test double that is designed to accept and verify "indirect output" from the subject class. An indirect output is a piece of information that is provided by the test subject to one of its dependencies, rather than as a return value to the caller. For example, a class that calls Console.WriteLine with a message for printing to the screen is providing an indirect output to that method.
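Here is a small sketch of what accepting and verifying indirect output might look like with a hand-rolled mock. I'm assuming a hypothetical IMessageWriter abstraction standing in for Console.WriteLine, since a static method can't be swapped out directly. MockMessageWriter and Greeter are likewise made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over Console.WriteLine so the output can be intercepted.
public interface IMessageWriter { void WriteLine(string message); }

// Hand-rolled mock: records the indirect output and verifies it afterwards.
public sealed class MockMessageWriter : IMessageWriter
{
    private readonly List<string> _received = new List<string>();

    public void WriteLine(string message) => _received.Add(message);

    public void VerifyWritten(params string[] expected)
    {
        if (_received.Count != expected.Length)
            throw new InvalidOperationException(
                $"Expected {expected.Length} messages but received {_received.Count}.");

        for (var i = 0; i < expected.Length; i++)
        {
            if (_received[i] != expected[i])
                throw new InvalidOperationException(
                    $"Message {i}: expected \"{expected[i]}\" but received \"{_received[i]}\".");
        }
    }
}

// Hypothetical subject: its only observable effect is the call to its dependency.
public sealed class Greeter
{
    private readonly IMessageWriter _writer;
    public Greeter(IMessageWriter writer) => _writer = writer;
    public void Greet(string name) => _writer.WriteLine("Hello, " + name + "!");
}

public static class MockExample
{
    public static void Main()
    {
        var writer = new MockMessageWriter();
        new Greeter(writer).Greet("World");
        writer.VerifyWritten("Hello, World!"); // throws if the output differs
        Console.WriteLine("verified");
    }
}
```

Greet returns nothing, so the only way for a test to know it did its job is to ask the dependency what it was told.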

The term "mock" for a particular type of test double is in a certain way unfortunate. In the beginning there was no differentiation. All doubles were mocks. And all the frameworks that facilitated easy double creation were called mocking frameworks. The reason that "mock" has stuck as the name of a particular type of double is that in those early days, most test doubles took a form close to what we today still call a "mock". Mocks were used primarily to specify an expectation of a particular series of method calls and property accesses.

These "behavioral mocks", which serve what Fowler calls behavior verification as opposed to state verification, gave birth to the record/replay idiom for mock configuration that reached its peak in the days of RhinoMocks. And due to the tendency of inexperienced developers to create complicated object interactions and temporal coupling, mocks continue to be a very popular and common form of test double. Mocking frameworks make it far easier to unit test classes that rely on these types of coupling. This has led many to call for the abolition of mocks and mocking frameworks in a general sense, claiming that they provide a crutch that makes it too easy to leave bad code in place. I'm sympathetic to the sentiment, but I think that this is throwing the baby out with the bathwater.

Fakes

Fakes are the most complicated style of test double. A fake is an object that acts simultaneously as both a stub and a mock, providing bidirectional interaction with the test subject. Often fakes are used to provide a substantial portion of the dependency's interface, or even all of it. This can be quite useful in the case of a database dependency, for example, or a disk storage service. Properly testing an object that makes use of storage or persistence mechanisms often requires testing a full cycle of behavior which includes both pushing to and pulling from the storage. An in-memory fake implementation is often a very effective way of avoiding relying on such stateful storage in your tests.

Given their usefulness, fakes are also probably the most misused type of test double. I say this because many people create fakes using a mocking framework, thinking they are creating simple mocks. Or worse, they knowingly implement a full-fledged fake using closures around the test's local variables. Unfortunately, due to the verbosity of mocking APIs in statically typed languages, this can very easily become longer and more complex code than an explicit test-specific implementation of the interface/base class would be. Working with very noisy, complicated, and fragile test setup is dangerous, because it's too easy to lose track of what is going on and end up with false passes. When your test's "arrange" step starts to overshadow the "act" and the "assert" steps, it's time to consider writing a "hand-rolled fake". Hand-rolled fakes not only remove brittle and probably redundant setup from your tests, but they also often can be very effectively reused throughout all the tests for a given class, or even multiple classes.
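For comparison, a hand-rolled fake for a storage dependency might look something like the sketch below. The IDocumentStore interface, the InMemoryDocumentStore fake, and the NoteBook subject are all hypothetical, and a real storage interface would surely be larger. The fake is a small working implementation backed by a dictionary, so a test can push data in through the subject and pull it back out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical storage dependency.
public interface IDocumentStore
{
    void Save(string id, string content);
    string Load(string id); // returns null when the id is unknown
}

// Hand-rolled fake: a working in-memory implementation, reusable across tests.
public sealed class InMemoryDocumentStore : IDocumentStore
{
    private readonly Dictionary<string, string> _documents =
        new Dictionary<string, string>();

    public void Save(string id, string content) => _documents[id] = content;

    public string Load(string id) =>
        _documents.TryGetValue(id, out var content) ? content : null;

    // Extra member for assertions; not part of the interface.
    public int Count => _documents.Count;
}

// Hypothetical subject exercising a full load/save cycle.
public sealed class NoteBook
{
    private readonly IDocumentStore _store;
    public NoteBook(IDocumentStore store) => _store = store;

    public void Append(string id, string line)
    {
        var existing = _store.Load(id);
        _store.Save(id, existing == null ? line : existing + "\n" + line);
    }

    public string Read(string id) => _store.Load(id);
}

public static class FakeExample
{
    public static void Main()
    {
        var store = new InMemoryDocumentStore();
        var notebook = new NoteBook(store);

        notebook.Append("todo", "write tests");
        notebook.Append("todo", "write code");

        Console.WriteLine(notebook.Read("todo") == "write tests\nwrite code"); // prints True
        Console.WriteLine(store.Count); // prints 1
    }
}
```

Nearly all of the setup lives in the fake class rather than in the test, so the "arrange" step shrinks to constructing the fake and the subject.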

It's Not Just Academic

These are the primary categories into which nearly all, if not all, test doubles can be grouped. Fowler did a great job of identifying the categories, but I think this crucial information is buried within a lot of context-setting and illustration that doesn't necessarily offer great value today. Mocking is ubiquitous among the subset of developers that are doing unit testing. But too many people go about unit testing in an ad hoc fashion, rather than deliberately with a plan and a system for making sense of things. I believe that a simple explanation of the major types and usages of test doubles, as I've tried to provide here, can aid greatly in bringing consistency and clarity of intent to developers' unit tests. At the very least, I hope it can instill some confidence that, with a little discipline, pattern and reason can be found in the often messy and overwhelming world of unit testing.

Retrospective on a Week of Test-First Development

Any programmer who is patient enough to listen has heard me evangelizing the virtues of Test-Driven Design. That is, designing your application, your classes, your interface, for testability. Designing for testability unsurprisingly yields code which can very easily have tests hung onto it. But going beyond that, it drives your code to a better overall design. Put simply, this is because testing places the very same demands on your code as does incremental change.

You likely already have an opinion on whether that is correct or not. In which case, I'm either preaching to the choir, or to a brick wall. I'll let you decide which echo chamber you'd rather be in, but if you don't mind hanging out in the pro-testability room for a while, then read on.

Last week I began a new job. I joined a software development lab that follows an agile process, and places an emphasis on testability and continuous improvement. The lead architect on our development team has encouraged everyone to develop ideally in a test-first manner, but I'm not sure how many have taken him up on that challenge. I've always wondered how well it actually works in practice, and honestly, I've always been a bit skeptical of the benefits. So I decided this big change of environment was the perfect opportunity to give it a shot.

After a week of test-first development, here are the most significant observations:
  1. Progress feels slower.
  2. My classes have turned out smaller, and there are more of them.
  3. My interfaces and public class surfaces are much simpler and more straightforward.
  4. My tests have turned out shorter and simpler, and there are more of them.
  5. I spent a measurable amount of time debugging my tests, but a negligible amount of time debugging the subject classes.
  6. I've never been so confident before that everything works as it is supposed to.

Let's break these out and look at them in detail.

1. Progress feels slower.

This is the thing I worried most about. In the past, writing tests has always been an exercise in patience. Writing a test after writing the subject means sitting there and trying to think about all the ways that what you just wrote could break, and then writing tests for all of them. Each test includes varying amounts of setup and dependency mocking. And mocking can be tough, even when your classes are designed with isolation in mind.

The reality this week is that yes, from day to day, hour to hour, I am writing less application code. But I am re-writing code less. I am fixing code less. I am redesigning code less. While I'm writing less code, it feels like each line that I do write is more impactful and more resilient. This leads very well into...

2. & 3. My classes have turned out smaller, and there are more of them.
My interfaces and public class surfaces are much simpler and more straightforward.

The next-biggest worry I had was that in service of testability, my classes would become anemic or insipid. I thought there was a chance that my classes would end up so puny and of so little presence and substance that it would actually become an impediment to understandability and evolution.

This seems reasonable, right? Spread your functionality too thin and it might just evaporate like a puddle in dry heat. Sprinkle your functionality across too many classes and it will become impossible to find the functionality you want.

In fact the classes didn't lose their presence. Rather I would say that their identities came into sharp and unmistakable focus. The clarity and simplicity of their public members and interfaces made it virtually impossible to misuse them, or to mistake whether their innards do what they claim to. This enhances the value and impact of the code that consumes them. Furthermore it makes test coverage remarkably achievable, which is something I always struggled with when working test-after. On that note...

4. My tests have turned out shorter and simpler, and there are more of them.

The simple surface areas and limited responsibilities of each class significantly impacted the nature of the tests that I am writing, compared to my test-after work. Whereas I used to spend many-fold more time "arranging" than "acting" and "asserting", the proportion of effort that arranging takes has dropped dramatically. Setting up and injecting mocks is still a non-trivial part of the job. But now this tends to require a lot less fiddling with arguments and callbacks. Of course an extra benefit of this is that the tests are more readable, which means their intent is more readily apparent. And that is a crucial aspect of effective testing.

5. I spent a measurable amount of time debugging my tests, but a negligible amount of time debugging the subject classes.

There's not too much to say here. It's pretty straightforward. The total amount of time I spent in the debugger and doing manual testing was greatly reduced. Most of my debugging was of the arrangement portions of tests. And most of that ended up being due to my own confusion about bits of the mocking API.

6. I've never been so confident before that everything works as it is supposed to.

This cannot be overstated. I've always been fairly confident in my ability to solve problems. But I've always had terrible anxiety when it came to backing up correctness in the face of bugs. I tend to be a big-picture thinker when it comes to development. I outline general structure, but before ironing out all the details of a given portion of the code, I'll move on to the interesting work of outlining other general structure.

Test-first development doesn't let me get away with putting off the details until there's nothing "fun" left. If I'm allowed to do that then by the time I come back to them I've usually forgotten what the details need to be. This has historically been a pretty big source of bugs for me. Far from the only source, but a significant one. Test-driven design keeps my whims in check, by ensuring that the details are right before moving on.

An Unexpected Development

The upshot of all this is that despite the fact that some of the things I feared ended up being partially true, the net impact was actually the opposite of what I was afraid it would be. In my first week of test-first development, my code has made a shift toward simpler, more modular, more replaceable, and more provably correct code. And I see no reason why these results shouldn't be repeatable, with some diligence and a bit of forethought in applying the philosophy to what might seem an incompatible problem space.

The most significant observation I made is that working like this feels different from the other work processes I've followed. It feels more deliberate, more pragmatic. It feels more like craft and less like hacking. It feels more like engineering. Software development will always have a strong art component. Most applied sciences do, whether people like to admit it or not. But this is the first time I've really felt like what I was doing went beyond just art plus experience plus discipline. This week, I feel like I moved closer toward that golden ideal of Software Engineering.

Don't Give Up Assembly Privacy for the Sake of Unit Testing

Just a little PSA to other .NET unit testing newbs out there like me. This info is available in various places on the internet, but you'll be lucky to find it without just the right search terms. So hopefully adding another blog post to the mix will make it easier to stumble on.


Unit testing frameworks need to instantiate your types, in order to run unit tests. What this means is that they need to be able to see them. The easiest way to do this is to make your classes public and/or place the tests right inside your project.


Placing the tests right in your project means that you'll very likely have to distribute the unit test framework assemblies along with your product. This might give more information to potential hackers than you would like. And making your classes public brings with it the often undesirable side-effect of opening up essentially all the types in your assembly to be used by anyone who knows where the assembly file is, in essentially any way they like.


But you don't have to make these concessions. You can move the tests out of your assembly, to keep them out of the deployment package, and still keep your classes internal (though not private). The .NET framework allows an assembly to declare "friend" assemblies that are allowed to see its internal classes. (Yes, very similar to the old C++ friend keyword). This is accomplished by adding an assembly attribute called InternalsVisibleTo to your AssemblyInfo.cs file.


If your unit test project does not have a strong name, it's as simple as referencing its assembly name:


[assembly: InternalsVisibleTo("MyCoolApp.UnitTests")]

However, I strongly recommend giving your unit test assembly a strong name. A strong name incorporates a public-private key pair, which the .NET framework uses along with some hashing and encryption technology to prevent other people from creating assemblies that can masquerade as your own. Furthermore, if you give your app itself a strong name (which you should if you plan to distribute it), any assemblies it references will need strong names, and so will any friend assemblies it merely allows to see its internals.


So, if you decide to give your unit test project a strong name, you'll need the public key (not just the token) as well:


[assembly: InternalsVisibleTo("MyCoolApp.UnitTests, PublicKey={Replace this, including curly braces, with the public key}")]



(If you need to learn about strong names, and/or how to extract the public key from your assembly, this is a good place to start: http://msdn.microsoft.com/en-us/library/wd40t7ad.aspx.)


Once you've done this, you should be able to compile and run your unit tests from a separate project or even solution, and still keep the classes you're testing from being "public".


This is all well and good, but if you're working with mocks at all, you probably have another problem on your hands. The most popular .NET mock frameworks (e.g. RhinoMocks, Moq, and NMock) use the Castle Project's DynamicProxy library to create proxies for your types on the fly at runtime. Unfortunately, this means that the Castle DynamicProxy library ALSO needs to be able to reference your internal types. So you might end up with an error message like this:


'DynamicProxyGenAssembly2, Version=0.0.0.0, Culture=neutral, PublicKeyToken=a621a9e7e5c32e69' is attempting to implement an inaccessible interface.

Complicating matters, the Castle DynamicProxy library places the proxies it generates into a temporary assembly, which you can't just run the strong name tool against, because the temporary assembly doesn't exist as a stand-alone file. Fortunately, there are programmatic ways of extracting this information, and the work has been done for us. The public key for this assembly, at the time of this writing, has been made available, here and here. You might find some code at those links that could help you extract the public key from any future releases of Castle as well.
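For what it's worth, here is a rough sketch of one programmatic way to read a public key from an assembly that is already loaded, using only standard reflection calls. This is my own illustration, not the code from those links. The PublicKeyDump class and GetPublicKeyHex method are hypothetical names, and I'm using the core library's assembly as the target only because it is always available. For a generated proxy you would presumably pass the assembly of the proxy object's runtime type instead.

```csharp
using System;
using System.Reflection;

public static class PublicKeyDump
{
    // Returns the full public key of an assembly as a lowercase hex string,
    // or an empty string if the assembly has no public key.
    public static string GetPublicKeyHex(Assembly assembly)
    {
        byte[] key = assembly.GetName().GetPublicKey();
        if (key == null || key.Length == 0)
        {
            return string.Empty;
        }

        // BitConverter.ToString yields "00-24-00-..."; strip the dashes.
        return BitConverter.ToString(key).Replace("-", "").ToLowerInvariant();
    }

    public static void Main()
    {
        // Stand-in target: substitute the assembly whose key you need,
        // e.g. someProxyInstance.GetType().Assembly (assumption, not tested against Castle).
        Assembly target = typeof(object).Assembly;

        Console.WriteLine(target.GetName().Name);
        Console.WriteLine(GetPublicKeyHex(target));
    }
}
```

The resulting hex string is the form that goes after PublicKey= in the InternalsVisibleTo attribute.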


The important information is basically that, as of today, you need to add this to your AssemblyInfo.cs file, without line breaks:


[assembly: InternalsVisibleTo("DynamicProxyGenAssembly2, PublicKey=002400000480000094000000060200000024000052534131
0004000001000100c547cac37abd99c8db225ef2f6c8a360
2f3b3606cc9891605d02baa56104f4cfc0734aa39b93bf78
52f7d9266654753cc297e7d2edfe0bac1cdcf9f717241550
e0a7b191195b7667bb4f64bcb8e2121380fd1d9d46ad2d92
d2d15605093924cceaf74c4861eff62abf69b9291ed0a340
e113be11e6a7d3113e92484cf7045cc7"
)]

Caution: The name and public key of this temporary assembly were different in earlier versions, and could change again in later versions, but at least now you know a little more about what to look for, should they change.


So remember: You don't have to open your assembly wide to just anyone who wants to reference your types, just for the sake of unit testing. It takes a bit of work, but you don't need to compromise.