When I was first learning about classes and object-oriented design in college, classes were explained to us cutely as "data plus behavior". This is oversimplified and inadequate in the same way that explaining the Cartesian plane as "a horizontal axis plus a vertical axis" would be. This neat package doesn't begin to encompass all that the Cartesian plane is, especially compared with its constituent parts.
Through my own work and experience, I've found it useful to break classes down into four different categorizations that better express the continuum of data/behavior relationships. Each category serves a different class of needs, and each interacts with other classes and categories in consistent ways. The four categories, as I have identified them, are data, algorithms, stateful algorithms, and models. Let's take a look at each of these in turn. We'll go over the traits that qualify each, the roles they serve, and their typical relationships with other classes.
Data
Data classes are quite straightforward. They are "pure" data, with little to no behavior attached. Your typical "data transfer objects", primitives, structs, event argument classes, etc. fall into this category. There are few, if any, methods defined on this type of class. When a data class does have methods, they will typically be related to type conversion or projection, simple derived value calculation, or state change validation. However, methods offering complex mutability or interactions with other objects would push a class out of this category and toward the "model" category.
Classes in this category are most often composed completely of other data classes. When I was initially considering other qualifications, I thought immutability would be a big one, but I've found that's not necessarily true. Rather, it's dependency on services (especially stateful algorithms) that almost certainly indicates that a class is a model rather than a data class. Data classes very rarely have more than a passing dependency on external behavior, though they may have multiple dependencies on external state, made primarily of other data classes. This makes sense, since the most common thing to do with algorithms, stateful or otherwise, is to compose them into more complex behavior.
Data classes are the quanta of information in your application. They will be passed around and between other classes, created and processed by them in various ways. They exist first and foremost to be handled. They are rarely seen as dependencies except indirectly via configuration layers, but rather show up most commonly as state, or as operational inputs.
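As a rough illustration of the traits above, here is a minimal Python sketch of two data classes. The names `Money` and `LineItem` are hypothetical, and freezing them is a choice made for this example rather than a requirement of the category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical data class: state plus a couple of trivial helpers."""
    amount_cents: int
    currency: str

    @property
    def amount(self) -> float:
        # Simple derived value calculation.
        return self.amount_cents / 100

    def to_dict(self) -> dict:
        # Type conversion / projection.
        return {"amount_cents": self.amount_cents, "currency": self.currency}


@dataclass(frozen=True)
class LineItem:
    """Composed entirely of other data: a string, an int, and a Money."""
    sku: str
    quantity: int
    unit_price: Money

    def __post_init__(self):
        # Validation of the state, with no dependency on any service.
        if self.quantity < 0:
            raise ValueError("quantity must not be negative")
```

Neither class depends on external behavior; anything that wanted to price, persist, or ship a `LineItem` would take it as an input.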
Algorithms
An algorithm is a class that is essentially functional, at least in spirit. A strong indicator that you have an algorithm on your hands is when a development tool such as ReSharper suggests that methods, or the class itself, be made static. This generally indicates a near-complete lack of state.
One common application of algorithm classes is as a replacement for what would be a method library in a language which allows loose methods. For example, a string library or a numeric processing library. In fact, data processing in general is one big problem space that lends itself to algorithm classes. Also, many of the methods seen on primitives in languages such as C# could very easily be extracted into algorithm classes.
Algorithms rarely have public properties and almost never have public setters. In general, algorithm classes have the potential to be expressed even without private properties, i.e. as truly functional code. Usually this means handling data that would otherwise be stored in properties as arguments instead. But often the realities of memory and processing power limitations mean that this isn't actually possible, so complex algorithm classes do often have some private state.
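To make the trade-off concrete, here is a small Python sketch (the class name `RunningMean` is invented for illustration) of an algorithm whose would-be internal state, a count and a mean, is passed in and returned as arguments instead of being stored in properties.

```python
from typing import Tuple


class RunningMean:
    """Hypothetical algorithm class with no state of its own."""

    def add(self, count: int, mean: float, value: float) -> Tuple[int, float]:
        # The accumulated state arrives as arguments and leaves as the result.
        new_count = count + 1
        new_mean = mean + (value - mean) / new_count
        return new_count, new_mean


# The caller owns the state and threads it through each call.
algorithm = RunningMean()
count, mean = 0, 0.0
for value in [2.0, 4.0, 6.0]:
    count, mean = algorithm.add(count, mean, value)
```

The class stays stateless at the price of a noisier call site. When the state grows large or expensive to copy, that price is usually what pushes it back into private fields.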
Depending on the amount of internal state required for performant execution of the algorithm, such classes may be expressed as either static/singleton, or by instance. For algorithms requiring no internal state, I prefer a container-enforced singleton pattern (as described in my post on lifecycle control via IoC containers). Static classes encourage tight coupling, and are very difficult to sub out for testing, even with a powerful isolation tool such as Microsoft's Moles, Telerik's JustMock, or TypeMock Isolator. As such, I avoid statics like the plague.
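A minimal Python sketch of the idea follows. The `Container` here is a toy written for this example, not a real IoC library and not the implementation from the post mentioned above; `SlugGenerator` and `ArticlePublisher` are likewise hypothetical names.

```python
class Container:
    """Toy IoC container (an assumption for illustration only)."""

    def __init__(self):
        self._factories = {}
        self._singletons = {}

    def register_singleton(self, key, factory):
        self._factories[key] = factory

    def resolve(self, key):
        # The container, not the class, enforces the single instance.
        if key not in self._singletons:
            self._singletons[key] = self._factories[key]()
        return self._singletons[key]


class SlugGenerator:
    """Stateless algorithm written as an ordinary instance class."""

    def slugify(self, title: str) -> str:
        return "-".join(title.lower().split())


class ArticlePublisher:
    """Consumer that receives the algorithm instead of calling a static."""

    def __init__(self, slugs):
        self._slugs = slugs

    def url_for(self, title: str) -> str:
        return "/articles/" + self._slugs.slugify(title)
```

Because `ArticlePublisher` only sees whatever object it is handed, a test can pass in a fake with a `slugify` method and needs no isolation tool at all.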
Instanced algorithm classes are usually lightweight and treated as throw-away objects, repeatedly created and disposed. Some may also choose to implement them with a reset/clear mechanism to allow reuse of the same object for different inputs. However, given the memory and garbage collection power currently available, I view this as an optimization, and avoid it unless it provides clear, needed performance benefits. Otherwise, it simply adds complexity to the implementation and muddies the interface with methods that don't directly relate to its purpose.
Algorithm classes often have dependencies on other algorithms. Data dependencies, however, are usually just another form of argument for the algorithm: something that affects the output of the work from one instance to another, but is likely constant within one instance. Usually this is data that simply isn't going to be available at the call site where the algorithm will be triggered. Environmental state is one example. Mode switches are another. Anything more complicated or mutable than this is an indicator that you might in reality have a stateful algorithm on your hands.
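Here is a sketch of both kinds of dependency in Python, using hypothetical names. `PriceFormatter` depends on another algorithm (`DigitGrouper`) and on two pieces of environmental data, the separators, which are fixed for the life of the instance. The sketch assumes non-negative amounts.

```python
class DigitGrouper:
    """Pure algorithm: groups a digit string in threes from the right."""

    def group(self, digits: str, separator: str) -> str:
        groups = []
        while len(digits) > 3:
            groups.append(digits[-3:])
            digits = digits[:-3]
        groups.append(digits)
        return separator.join(reversed(groups))


class PriceFormatter:
    """Algorithm with an algorithm dependency and constant data dependencies."""

    def __init__(self, grouper: DigitGrouper, decimal_separator: str,
                 group_separator: str):
        self._grouper = grouper
        # Environmental data: unknown at the call site, constant per instance.
        self._decimal_separator = decimal_separator
        self._group_separator = group_separator

    def format(self, amount_cents: int) -> str:
        # Assumes amount_cents >= 0.
        whole, cents = divmod(amount_cents, 100)
        grouped = self._grouper.group(str(whole), self._group_separator)
        return f"{grouped}{self._decimal_separator}{cents:02d}"
```

Code that calls `format` never needs to know which locale conventions are in effect; that decision was made once, wherever the instance was constructed.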
Stateful Algorithms
The simple explanation of a stateful algorithm is that it is a bundle of behavior with a limited reliance on internal or external state. It differs from a pure algorithm in that the internal state isn't necessarily directly related to the processing itself. The most common application of a stateful algorithm is a class whose primary purpose is to expose the behavior of a model, while internally managing and obscuring the state complexities of that model. The goal is to encapsulate in a service layer all the concerns that the other layers need not worry about explicitly.
IO services are common instances of these types of classes. Some examples include file streams, scanner services, or printer services. Any time a complex API is wrapped so as to expose a simple interface for bringing in or flushing out data in the natural format of the application, a stateful class is the most likely mechanism. Looking at the file stream example, the state involved might consist of a buffer, a handle to the endpoint of the output, and parameters regarding how the data should be accessed (such as the file access sharing mode).
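The file stream case can be sketched in a few lines of Python. `BufferedLineWriter` is a hypothetical class made up for this example; its state is a buffer, a handle to the output endpoint, and one parameter governing how data is written, none of which the caller has to think about.

```python
import io


class BufferedLineWriter:
    """Hypothetical IO service: simple interface, hidden state."""

    def __init__(self, sink, buffer_size: int = 3):
        self._sink = sink                # handle to the output endpoint
        self._buffer = []                # lines not yet written
        self._buffer_size = buffer_size  # access parameter

    def write_line(self, line: str) -> None:
        self._buffer.append(line)
        if len(self._buffer) >= self._buffer_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._sink.write("".join(line + "\n" for line in self._buffer))
            self._buffer.clear()


sink = io.StringIO()
writer = BufferedLineWriter(sink, buffer_size=2)
writer.write_line("first")   # buffered, nothing written yet
writer.write_line("second")  # buffer is full, so both lines are flushed
```

The state here serves the wrapping of the endpoint rather than any computation on the data, which is what separates this from a pure algorithm that happens to cache something.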
Stateful algorithms may take dependencies on any other type of class, but most commonly on pure or other stateful algorithms. They are more likely to interact with data classes and models as arguments, or internally in the course of API wrapping.
Models
Models are a diverse, messy bunch. They're probably the closest to fitting the classical "data plus behavior" description. A class in this category will have intertwined state and behavior. The state will typically be of value without the behavior, but the behavior exists only in the context of the state. Most often, a model class comes about not because extracting the behavior and state into separate classes is impossible, but rather because it muddies the expressiveness of the design. Domain models, UI classes, and device APIs are the places where the model category of classes tends to serve best. Note, however, that these spaces also tend to attract convoluted coupling, inscrutable interfaces, unintuitive layering, and a host of other design pathologies.
These warts proliferate because it is difficult to make generalizations or rules about models. The foremost rule I keep in dealing with them is to try to avoid them. Not in the "model classes are considered harmful" sort of way. If you look at a program I've written, you won't find it devoid of model classes. In fact, they aren't even particularly rare. When I say I try to avoid them, all I really mean is that before I write a model class I make sure that the role I'm trying to fill isn't better addressed by some at least relatively simple combination of the other categories.
It's common for the behavior of a model class to depend on some private aspect of the state. Separating the behavior from the state would thus also mean dividing the state into public and private portions, to be referenced by the consumers of the model, and the model's services, respectively. This can be a nightmare to maintain, and the service implementor must often make a decision between downcasting a public data interface to a compound public/private one, or internally mapping public state objects or interfaces to their associated private state objects or interfaces. You see this type of thing crop up repeatedly in ORM solutions, where the otherwise internal state of the change and identity tracking is pulled out of the model and maintained by the ORM.
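The mapping option can be sketched roughly as follows in Python. This is a toy that mimics the shape of ORM change tracking and does not reproduce how any particular ORM works; `Customer` and `ChangeTracker` are hypothetical names.

```python
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    """Public state: all that consumers of the model ever see."""
    id: int
    name: str
    email: str


class ChangeTracker:
    """Service holding the private state that was pulled out of the model."""

    def __init__(self):
        # Maps each public object to its private tracking state. The entity
        # itself is kept in the tuple so its id() stays valid while tracked.
        self._tracked = {}

    def attach(self, entity) -> None:
        self._tracked[id(entity)] = (entity, asdict(entity))

    def changed_fields(self, entity) -> list:
        _, original = self._tracked[id(entity)]
        current = asdict(entity)
        return sorted(name for name in current
                      if current[name] != original[name])
```

Even in this tiny form the bookkeeping burden is visible: the tracker has to maintain a parallel structure keyed by object identity, and every entity must be attached before any question can be asked about it.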
This separation is difficult and messy to make. If you find yourself facing this choice, you can be fairly certain that a model solution is at least an acceptable fit for the role. There are benefits to the separation, but often the cost of implementation is high, and should be deferred until necessary. But beware as you forge ahead with the consolidated model, because every model is different. I find that the cleanest way to incorporate a model into a project is to establish a layer in which that model and its service classes will live, and make heavy use of data transfer objects and projections to inter-operate with other layers.
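A consolidated model with a projection at its boundary might look something like this Python sketch. `Account` and `AccountSummary` are hypothetical; the point is only that the behavior leans on private state while other layers receive a plain data class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountSummary:
    """Data transfer object handed to other layers."""
    owner: str
    balance_cents: int
    transaction_count: int


class Account:
    """Model: behavior intertwined with state, some of it private."""

    def __init__(self, owner: str):
        self._owner = owner
        self._balance_cents = 0
        self._history = []  # private state that the behavior relies on

    def deposit(self, cents: int) -> None:
        if cents <= 0:
            raise ValueError("deposit must be positive")
        self._balance_cents += cents
        self._history.append(("deposit", cents))

    def withdraw(self, cents: int) -> None:
        if cents <= 0 or cents > self._balance_cents:
            raise ValueError("invalid withdrawal")
        self._balance_cents -= cents
        self._history.append(("withdraw", cents))

    def to_summary(self) -> AccountSummary:
        # Projection: only this immutable snapshot crosses the layer boundary.
        return AccountSummary(self._owner, self._balance_cents,
                              len(self._history))
```

Other layers can log, display, or serialize an `AccountSummary` without ever holding a reference to the model itself.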
A model can, and often does, take dependencies on any category of class. Because of the varied roles models can serve and the many ways that state and behavior may be mixed in them, it's very difficult to identify any patterns or rules as far as how dependencies typically arise and how they should be handled. Any such effort would likely hinge on further subdividing the categories into common roles and basing the rules and patterns at that level instead. I'll leave that as an exercise for the reader, for now. =)
A Guideline
As I've said, I find these categories to be a useful guideline in my own programming efforts. It's quite easy and natural to mix state and behavior together in most every class you write, simply because it's convenient on the spot to have behavior neighboring the state it's relevant to. But it tends to be far easier to reason about the first three categories in the abstract than it is to do so about models. This is likely because the former tend to enable and encourage immutability, which minimizes and simplifies a whole host of problems from concurrency, to persistence, to testing. For this reason I try to identify places that lend themselves well to the extraction of a data class or an algorithm class.
This strategy seems to make my code more composable, and my applications more discrete in terms of purpose and dependency, which in turn makes the divisions between the different layers and functionality regions clearer and firmer. This not only discourages leakage of concerns, but also makes the design more digestible to the reader. And that is always a good thing, even if the reader is just yourself six months later.