Introduction
The subject of this tutorial is a bit more advanced than the subjects I have tackled in the past, but I hope it will find some interested readers. From time to time in C++, programmers find themselves torn between using a Generic Programming style, which primarily involves the use of templates and compile-time mechanisms (e.g., the STL), and using an Object-Oriented Programming style, which often favors inheritance and run-time dispatching mechanisms (e.g., Qt). As is so often the case, there is no obvious "winner", i.e., there is no paradigm that should always be preferred, even within a single problem domain.
This tutorial will take you through a number of alternative techniques and will describe how these choices affect the performance and flexibility of the code. Novice-to-advanced programmers will discover in this tutorial a number of somewhat classic techniques that will certainly come in handy. Expert programmers might rediscover those techniques, but mostly will see in this tutorial an occasion to reflect on some of the trade-offs they face and the surgical ways to manipulate them.
Working Example: Printing the Fibonacci Sequence
For this tutorial, we'll be using a very simple example problem: printing a Fibonacci sequence. As is usual in these kinds of examples, the techniques that I will show will be much too complex for the simple example used (i.e., overkill), but the point is to imagine a more complicated scenario where the techniques shown are more appropriate given the complexity of the problem. Let's say we want two different versions of the Fibonacci printing function: one that just prints the numbers, and one that prints a table of the numbers and their sequence number. It's that interchangeable behavior that will require some polymorphism.
The Conventional Approach
I'll start off by showing the conventional approach. Printing the Fibonacci sequence is a trivial problem, and thus, the conventional solution is trivial:
#include <cstddef>
#include <iostream>
#include <iomanip>
#include <utility>  // std::swap

template <typename T>
void next_fibonacci(T& prev, T& start) {
  using std::swap;
  prev += start;      // prev now holds the next value of the sequence
  swap(prev, start);  // start holds the next value, prev the previous one
}

void print_fibonacci_values(int limit) {
  int prev = 0, start = 1;
  while(start < limit) {
    std::cout << std::setw(10) << start << std::endl;
    next_fibonacci(prev, start);
  }
}

void print_fibonacci_table(int limit) {
  int prev = 0, start = 1;
  for(std::size_t seq_num = 1; start < limit; ++seq_num) {
    std::cout << std::setw(10) << seq_num
              << std::setw(10) << start << std::endl;
    next_fibonacci(prev, start);
  }
}
Like many algorithms of this kind, it involves some sort of main loop, more generally called the main control flow, where each iteration performs one step of the algorithm. When you want to report or print the values between the steps, the conventional solution is to insert, within the main loop, the code necessary to report or print the values.
In order to reduce code repetition, we can, in general, lump the code executed at each iteration into a separate function: a role played by the next_fibonacci function in our example. Obviously, this is not always the most practical solution, because the iteration code could be large and require many parameters, since the entire "state" of the algorithm must be passed to and changed by that function.
More importantly, the main loop's code must be repeated for each different choice of reporting or printing method. Every piece of code is an opportunity for a bug, so repetition is undesirable and leads to bloated, unmaintainable code. It would be nice to avoid repeating that code, no?
Inversion of Control Pattern
The repetition of the main loop is due to the fact that the control flow of the algorithm is implemented in the top-level function, and the printing code is inserted in the middle of every iteration, turning the whole thing into one monolithic or "frozen" block. Now, we'll discuss a form of inversion of that "conventional" pattern, which is often called Inversion of Control (IoC). The idea with this pattern is to have the top-level function choose an option (e.g., the type of printing), and then hand over control to another function with customization points or polymorphic aspects.
Here is the function that controls our Fibonacci algorithm:
template <typename T, typename Printer>
void fibonacci(const T& limit, Printer print) {
  using std::swap;
  T prev = T(0), start = T(1);
  for(std::size_t seq_num = 1; start < limit; ++seq_num) {
    print(seq_num, start);  // hand the current state to the visitor
    prev += start;
    swap(prev, start);
  }
}
which accepts a generic Printer functor: a simple callable object to which the current state of the Fibonacci algorithm is given at each iteration. This functor class could be seen as a policy class or, in this context, an algorithm visitor, reflecting the fact that it's a form of polymorphism injected into an algorithm (cf., in OOP, the term Visitor is used to describe a technique for "tacked-on" polymorphism).
To implement printing of the values only, we simply write:
void print_value_only(std::size_t, int value) {
  std::cout << std::setw(10) << value << std::endl;
}

void print_fibonacci_values(int limit) {
  fibonacci(limit, print_value_only);
}
Or, if we want to print the complete table, we simply write:
void print_seq_value(std::size_t seq, int value) {
  std::cout << std::setw(10) << seq << std::setw(10) << value << std::endl;
}

void print_fibonacci_table(int limit) {
  fibonacci(limit, print_seq_value);
}
I hope the advantages of this method are clear. We have one complete implementation of the algorithm, two independent implementations that contain only the "custom behavior" code (i.e., the printing), and a few trivial pieces of decoration. There is no code duplication; in fact, because we have this Printer template argument, the compiler takes care of duplicating the code so that the programmer doesn't have to (note: programmers make errors, compilers don't).
This simple example does not do justice to this design pattern. In practice, this method of injecting customizations into the core of an algorithm is extremely useful, especially for more complex visitors that actually play a crucial role in the control flow, as opposed to simply observing and reporting. This pattern also casts a different light on certain algorithms, letting us see them in terms of their generic aspects and their customizable parts, often allowing closely related algorithms to share significant amounts of code. In these contexts, a more appropriate term might be "participatory polymorphism", as the polymorphic objects play an active part in the behavior of the algorithm in which they are invoked, as opposed to limiting the effects to the polymorphic object itself. A small sketch of that "participatory" flavor follows.
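Here is a minimal, hypothetical variation on our example (the names fibonacci_until and print_first_five are mine, not part of the code we build on below) in which the visitor's return value steers the main loop, letting the printer stop the iteration early:

#include <cstddef>
#include <iostream>
#include <iomanip>
#include <utility>

// Hypothetical variant: the visitor returns false to stop the iteration early.
template <typename T, typename Visitor>
void fibonacci_until(const T& limit, Visitor visit) {
  using std::swap;
  T prev = T(0), start = T(1);
  for(std::size_t seq_num = 1; start < limit; ++seq_num) {
    if( !visit(seq_num, start) )
      return;  // the visitor participates in the control flow
    prev += start;
    swap(prev, start);
  }
}

// Example visitor: print at most the first five values of the sequence.
bool print_first_five(std::size_t seq, int value) {
  std::cout << std::setw(10) << value << std::endl;
  return seq < 5;
}

Here the visitor is no longer a passive observer: the algorithm consults it to decide whether to continue at all.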
Generic versus Object-Oriented
Now that we have established the basic code we'll be working with, I will move on to the main topic of this tutorial: walking the line between Generic Programming (GP) and Object-Oriented Programming (OOP).
The analogies between GP and OOP run very deep. Many concepts in both paradigms have their direct counterparts in the other; the main difference is that GP revolves around compile-time mechanisms while OOP relies on run-time mechanisms. Here is a table with some common analogs (N.B.: this list might not be unanimous, and the analogies are not perfect):
Object-Oriented Prog.  |  Generic Prog.
-----------------------+-----------------------
interfaces             |  archetypes
base classes           |  concepts
base class ref/ptr     |  template arguments
virtual functions      |  overloaded functions
dynamic dispatching    |  ADL
inheritance            |  CRTP
composition            |  policies
Needless to say, if any of these terms are unknown to you, look them up. It's also important to point out that these analogies are not perfect, i.e., the corresponding mechanisms differ slightly in purpose and implementation between the two programming styles. And because GP has access to a complete second programming language, template meta-programming, it presents a much richer set of possibilities and mechanisms.
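To make one of those analogies concrete, here is a minimal sketch (hypothetical animal classes, not part of the running example) of the inheritance / CRTP correspondence; the structure of the two halves is the same, but the dispatch moves from run-time to compile-time:

#include <iostream>

// OOP side of the analogy: inheritance + virtual function, run-time dispatch.
struct animal {
  virtual void speak() const = 0;
  virtual ~animal() { }
};
struct dog : animal {
  void speak() const { std::cout << "Woof!" << std::endl; }
};
void speak_oop(const animal& a) { a.speak(); }  // resolved through the vtable

// GP side of the analogy: the Curiously Recurring Template Pattern (CRTP).
template <typename Derived>
struct animal_crtp {
  void speak() const {
    static_cast<const Derived&>(*this).speak_impl();  // resolved statically
  }
};
struct cat : animal_crtp<cat> {
  void speak_impl() const { std::cout << "Meow!" << std::endl; }
};
template <typename Derived>
void speak_gp(const animal_crtp<Derived>& a) { a.speak(); }  // one instantiation per Derived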
Combinatorics in Generic Programming
Generic programming has many attractive features. However, it's plagued by one important and inevitable problem: the combinatorial explosion of instantiations. This is both a blessing and a curse.
Just to make it clear, the problem is that for every special case (instantiation) of the templates, a brand new and complete piece of code is generated by the compiler for that instantiation specifically. This can get out of hand very quickly, a problem often referred to by the clumsy saying that "templates cause code bloat". The trade-off here is between the many optimization opportunities created by having one completely specialized piece of code for each use-case, and the penalties in compilation times and binary sizes (code bloat) associated with compiling and optimizing several instantiations of nearly identical code.
Our Fibonacci example illustrates this issue clearly; let's look at our main function template's signature:
template <typename T, typename Printer>
void fibonacci(const T& limit, Printer print);
Clearly, this function template will be instantiated once for every possible combination of a value-type T and a printing-policy type Printer. For a minute, imagine that instead of the Fibonacci series we had some complex algorithm that called upon many other functions and classes, probably also templates themselves, and that, overall, required a large amount of time to compile and optimize. Understandably, we will need one instantiation of that algorithm for each value-type: the value-type is the fundamental operational type of the algorithm and has a profound impact on performance (and we will probably select only one value-type in the end anyway).
But what about the policy type (e.g., Printer)? This problem is much less obvious. If the policy is used very rarely within the algorithm, compared to all the other operations that need to be done, then it would seem that whichever way the policy is inserted into the algorithm, it will likely have very little impact on performance. On the other hand, if the policy is used a lot in the algorithm, then the performance impact of the method by which it is inserted could be significant, especially if the code performed by the policy object is small, easily inlineable, and optimizable for its calling context. This is the case for the Hash policy in the STL unordered containers (hash-tables), where the hash function is used extensively and is generally very optimizable. But in our Fibonacci example, a printing policy would generally use IO-streams to print values to the console or to a file, both of which are fairly expensive operations, and thus, there is probably very little performance impact associated with inlining and optimizing the calls to the printer.
Another aspect to consider about the policy is whether it's ever really likely to change and whether it has trivial cases. For example, we might wish to compute some value using an algorithm, but add to it a printing policy for debugging purposes. Then, most of the time, the policy type would be some empty type that does nothing at all, i.e., its calls will all be optimized away by the compiler; only occasionally would the code be recompiled with a debug-print policy. In that case, we should make the implementation choice that favors the final (released) code, or the most likely version of it. This is the case for the Allocator policy in the standard STL containers: the standard allocator (new / delete) is nearly always used, but allowing that policy to be changed can be helpful in certain special situations. If the allocator policy were not a template argument of the containers, but instead used a more dynamic, object-oriented approach, it would penalize 99% of the people to accommodate the needs of a few; with the current solution, the default case has no overhead.
This pretty much summarizes the core issue of combinatorial explosion in GP. In short, every combination of individual customizations generates a new complete chunk of code, i.e., a new instantiation of the class or function template. So, the question now becomes: how do we get around this problem? The answer is that we will have to trade off some of the performance by using a dynamic method, i.e., an object-oriented approach.
Object-Oriented Approach
Our little Fibonacci example can easily be turned into an object-oriented style by simply replacing the template argument with a base-class reference (as per our analogy table). For that, we first need to define a base class, and the rest is straightforward:
template <typename T>
struct sequence_printer {
  virtual void operator()(std::size_t seq, const T& value) const { }
  virtual ~sequence_printer() { }
};

template <typename T>
void fibonacci(const T& limit, const sequence_printer<T>& print) {
  using std::swap;
  T prev = T(0), start = T(1);
  for(std::size_t seq_num = 1; start < limit; ++seq_num) {
    print(seq_num, start);  // dynamically dispatched through the base class
    prev += start;
    swap(prev, start);
  }
}
and along with that, we need, of course, our derived classes to perform the different printing methods:
template <typename T>
struct value_only_printer : sequence_printer<T> {
  void operator()(std::size_t seq, const T& value) const {
    std::cout << std::setw(10) << value << std::endl;
  }
};

template <typename T>
struct seq_value_printer : sequence_printer<T> {
  void operator()(std::size_t seq, const T& value) const {
    std::cout << std::setw(10) << seq << std::setw(10) << value << std::endl;
  }
};

void print_fibonacci_values(int limit) {
  fibonacci(limit, value_only_printer<int>());
}

void print_fibonacci_table(int limit) {
  fibonacci(limit, seq_value_printer<int>());
}
This solves the combinatorial problem quite well. For a given value-type, we end up with one instantiation of the fibonacci function and one instantiation of each of the printing methods used. In other words, if A is the number of algorithms and P is the number of policies, we have a number of instantiations (or size of the code) that is proportional to O(A + P), instead of the O(A * P) we had with the GP approach.
The cost is, of course, the fact that each call to the printing method is dispatched dynamically through the virtual function call on the base-class reference. The point is often made that the overhead of a virtual function call is not significant, and that's correct: it's no more costly than calling any other function pointer, and only slightly more costly than calling a normal function. The real performance penalty comes not from the function call itself, but from (1) a possibly missed opportunity to inline the function and optimize it for its calling context, and (2) the lack of predictability of where the execution will go next, which thwarts the pre-fetching, caching, and out-of-order execution pipelines that the good folks at Intel and AMD have spent the last decades perfecting. Nevertheless, the overhead is not that great, and certainly not significant in comparison to costly operations like file I/O or console I/O. But if you do a dynamic dispatch just to go and compute a few simple math expressions, then yes, the overhead will be very significant.
In any case, the point remains: the object-oriented approach certainly presents a nice alternative for stopping the proliferation of instantiations and keeping the size of the executables within the bounds of reason.
A Bit of Set Theory
I'm going to make a little detour to discuss a bit of set theory. I've already mentioned the combinatorial explosion problem, which, of course, implies the joining of sets and a combination of their cardinalities. But which sets are we talking about here? That's the interesting question. In technical terms, the sets involved in our Fibonacci example are the set of all value-types T that could be used, and the set of all policy types Printer that could be used. But it's more important to see those sets according to the purpose they serve. For example, we could call the first set the "operating set" O and the second the "policy set" P. Now, the set of all instantiations of the GP version of the fibonacci function is the Cartesian product of the operating set O and the policy set P, i.e., |O x P| = |O| * |P|. On the other hand, the set of all instantiations in the OOP version is the disjoint union of O and P, i.e., |O + P| = |O| + |P|. For example, with 3 value-types and 4 printers, the GP version can require 3 * 4 = 12 instantiations of the whole algorithm, while the OOP version requires only 3 + 4 = 7 separately compiled pieces. The compilation independence (the disjoining of the sets) is a result of the dynamic binding.
In general, it's quite useful to think about the functional sets that compose a software design and to group them according to different criteria, some of which I have mentioned in previous sections. A good starting point for defining functional sets is abstraction, because abstractions (base classes or concepts) are the representatives of corresponding sets of possible specializations (derived classes or instantiations). These sets are usually infinite (or unbounded), but we can generally estimate their typical size; e.g., there may be infinitely many geometric primitives, but a typical geometry library might include at most about a dozen types of primitives.
The point I'm trying to make here is that for any given method or algorithm that you wish to implement, an important step in its design is to identify the sets of functional abstractions that it requires. Beware: identifying those abstractions is often a lot harder than you might think, and the right question to ask yourself is "what is the core logic of this algorithm?", because generally, everything else can be externalized into a kind of abstraction (value-type, policy, visitor, etc.). Once the functional sets are identified, they must be examined against some of the criteria discussed previously, that is, their role in the algorithm's performance, their size (or cardinality), and their inter-dependencies.
This latter point is especially important because, more often than not, the functional sets are not entirely disjoint, and that can pose some logistical and instantiation challenges. We have a common example of that in the Fibonacci example, where the policy set P is clearly a product of some policy logic and the algorithm's operating set O (i.e., the policy class Printer must know about the value-type T, or provide an instantiation for each possible value-type). The point is that it is worth investing the necessary time to identify the functional sets and examine their inter-dependencies, in order to avoid spurious or unnecessary ties between them, or to identify where it's possible to disjoin the sets through dynamic binding (as shown in the previous section) or through type-erasure (as will be shown shortly).
Dual Approaches
One question that might come up at this point is: what if I can't really choose between GP and OOP? Is there a solution that allows both? This is certainly a legitimate concern, and the answer is, of course, yes, this is possible. However, there are a few things to watch out for.
If we use, as our starting point, the generic version of our fibonacci function:
template <typename T, typename Printer>
void fibonacci(const T& limit, Printer print);
we notice that the generic print object could be of any type, including a polymorphic class. This means that we could simply call the generic version with our polymorphic classes:
void print_fibonacci_values(int limit) {
  fibonacci(limit, value_only_printer<int>());
}

void print_fibonacci_table(int limit) {
  fibonacci(limit, seq_value_printer<int>());
}
Simple enough, right? Well, this will indeed have the desired effect, i.e., it will behave almost exactly the same as the original generic version. The important part here is that the virtual calls can be optimized away (devirtualized), because the printers are passed by value to the fibonacci function template: the dynamic type is known at compile-time (which is always the case for an object passed by value). The only difference in this code, as compared to the original GP version, is that the construction of the printers requires an initialization of the virtual table pointer, but that is essentially equivalent to initializing the function pointer that is passed to the function in the original GP version, and it represents, of course, a negligible overhead.
But here, we seek a dual solution, so we need to find a way to accommodate an OOP version in there as well. All we really need is to send over our printer objects in the form of a base-class reference (as before). However, we cannot really do that, because the fibonacci function template takes its parameter by value. We could, in principle, make the call like this:
void print_fibonacci_values(int limit) {
  fibonacci< int, const sequence_printer<int>& >(
      limit, value_only_printer<int>());
}
but that would cause two problems: (1) it requires specific enumeration of template arguments, and (2) it provides a reference type for a template argument that is generally assumed to be a value type (copyable). The first problem has implications beyond the fact that it's ugly, especially when it comes to maintainability, but let's not dwell on that.
The second problem is more important because it's a breach of contract (the interface). The contract here, as implied by the signature of the fibonacci function template, is that the printer must be a copyable object that can be carried around by value (and, often, should be stateless). Sending over a reference contradicts that contract. To solve this problem, we can turn our reference into a copyable object using a standard adaptor called a reference-wrapper (std::reference_wrapper), which we can create with the simple std::cref function template (for a const-reference):
#include <functional>  // std::cref, std::reference_wrapper

void print_fibonacci_values(int limit) {
  value_only_printer<int> tmp;
  fibonacci(limit, std::cref< sequence_printer<int> >(tmp));
}
where the tmp object is constructed before the function call, guaranteeing its existence for as long as the fibonacci function is running. Note that std::cref refuses to wrap a temporary outright: its rvalue-reference overload is deleted, precisely to prevent the reference-wrapper from outliving the object it refers to. This is the kind of caution to apply when using adaptors like reference-wrappers.
To make things a bit nicer, we could also provide two Fibonacci functions:
template <typename T, typename Printer>
void fibonacci(const T& limit, Printer print) {
  // as before..
}

template <typename T>
void fibonacci_oop(const T& limit, const sequence_printer<T>& print) {
  fibonacci(limit, std::cref(print));
}
which certainly makes things easy: if you want the generic version, call fibonacci with any functor you like (incl. a trivial one); and if you want the object-oriented version, call fibonacci_oop with a printer object derived from the base-class. And all this is done with no code repetition, minimal decoration (wrappers and forwarding functions), and minimal overhead (any overhead that is not needed is washed away by the compiler).
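A quick usage sketch (the demo function is hypothetical, but it reuses only the pieces defined above) shows the two entry points side by side:

void demo_dual_interface() {
  // Generic path: Printer is deduced; one full instantiation per printer type.
  fibonacci(100, print_value_only);

  // Object-oriented path: one instantiation per value-type; the printer is
  // reached through the sequence_printer<int> base-class reference.
  seq_value_printer<int> table_printer;
  fibonacci_oop(100, table_printer);
}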
However, a few problems remain with this solution. For one, if the printers are to be used with dynamically dispatched calls, then they must inherit from the OOP base-class sequence_printer<T>, which could be a problem. For example, it is very common, when writing functors, to make the call operator a function template instead of making the class as a whole a class template, as in this simple case:
struct value_to_stream_printer {
  std::ostream* p_out;
  value_to_stream_printer(std::ostream& aOut = std::cout) : p_out(&aOut) { }
  template <typename T>
  void operator()(std::size_t seq, const T& value) const {
    (*p_out) << std::setw(10) << value << std::endl;
  }
};
This kind of pattern makes sense from a practical point of view and from the general principle of minimum requirements. The printer class does not require knowledge of the value-type T until it reaches the code that prints the values, i.e., it only needs T within the call operator function. In that case, the scope of the template argument T should be reduced to the smallest possible area where it is needed, to avoid excessive or spurious type-parametrization, such as creating a unique instantiation of an otherwise identical class for each value-type T. But now the problem becomes clear: value_to_stream_printer cannot inherit from sequence_printer<T> because T is unknown at class scope.
Naively, this could seem like a pretty simple problem to solve: let's just redefine the base-class like so:
struct sequence_printer {
  virtual ~sequence_printer() { }
  template <typename T>
  virtual void operator()(std::size_t seq, const T& value) const { }  // error!
};
but that's not going to work, because a "virtual member function template" is impossible; and not just because the folks who design the C++ language want to mess with your mind, but because it is not feasible in practice. The reason is simple: a function template is a set of instructions for the compiler on how to generate code for a given set of template arguments, which implies a theoretically infinite number of possibilities. A virtual function, on the other hand, tells the compiler to place an entry for this function within a look-up table (a.k.a. the virtual table) such that derived classes can put the correct function pointer in there, i.e., one function pointer. One function pointer cannot act as a representative of an infinite number of possible function template instantiations, regardless of how it's implemented (virtual table or otherwise).
Type-erasure
Type-erasure is the general term for any technique that circumvents or subverts the type rules of a programming language. Regardless of the context and language, the basic problem is often posed as follows: "A" needs to send arbitrary data to "B", but the interface between them can only accommodate one kind of data. In basic terms, the solution always involves casting away the type, sending the data over, and casting it back to what it should be. There is always a danger present: the danger that there is a disagreement between what "A" sends over and what "B" thinks it got. Under normal circumstances, static type-safety guarantees harmony in that transaction, but when the type is erased, you are left to your own devices: you can either enforce type-safety dynamically, or you can encapsulate the type-erasure to externally ensure type-harmony.
But before I talk too much, let's just show an example of what I'm talking about. We could make a wrapper class that wraps the generic Printer object into a class that inherits from sequence_printer<T>:
template <typename Printer, typename T>
struct seq_printer_wrapper : sequence_printer<T> {
  Printer print;
  seq_printer_wrapper(const Printer& rhs) : print(rhs) { }
  virtual void operator()(std::size_t seq, const T& value) const {
    print(seq, value);  // forward to the wrapped generic policy
  }
};

template <typename T, typename Printer>
void fibonacci_dyn(const T& limit, Printer print) {
  fibonacci_oop(limit, seq_printer_wrapper<Printer, T>(print));
}
where the seq_printer_wrapper class template acts as glue between some generic policy and the operating types with which it will be used. And we would call this a type-erasure technique because the type of the policy Printer is erased (or invisible) from the context in which it is called upon to print values. Now we have a very nice solution, because the printers do not have to inherit from the OOP base-class but can still be used in a dynamic way, with no additional overhead: the glue-class seq_printer_wrapper will not cause any overhead beyond the dynamic dispatching (the virtual function call) itself.
In terms of set theory, we can see that the point of this technique is to make the link between the GP approach, which causes a Cartesian product of the functional sets, and the OOP approach, which makes a disjoint union instead. This link is made by encapsulating the Cartesian set product within the thin wrapper class seq_printer_wrapper, thus disjoining the union between the algorithm and its policy. In other words, we don't avoid the Cartesian product, but rather limit it to a small section of glue code, thus avoiding the long compilation times and the bloated binaries.
Conclusion
In this tutorial, I hope I managed to show a number of techniques that you can use when walking that line between generic programming and object-oriented programming, between static and dynamic polymorphism. But the broader point I hope to have gotten across is that when you program in a multi-paradigm language like C++, your software engineering analysis has to cross the boundaries between paradigms, as opposed to following the rule book of one paradigm or another (the so-called "pure" approaches, like "Pure OOP" or "Pure Functional").
The key is to understand the trade-offs involved in both paradigms. On the one hand, we can benefit from a deeper static analysis, and thus, better performance, when relying on generic policies and static polymorphism, as well as increased flexibility when relying on meta-programming techniques to compute, infer and verify the types of the objects involved. The price for that is, of course, the increased compilation times, larger executables, reduced encapsulation, and combinatorial explosions. On the other hand, we can benefit from greater run-time flexibility and limited combinatorics when relying on dynamic polymorphism (or object-oriented programming), as well as increased encapsulation, smaller executables and fast compilations. And again, the price for that is, of course, the reduced opportunities for optimization (and slight overhead), the immutability of the types involved (unless someone finds a way to add run-time reflection to the C++ language), and the delaying of many diagnostics to manual run-time checks and asserts.
Once those trade-offs are understood, and given your particular requirements for the project at hand, you can devise, using some of the techniques illustrated in this tutorial, methods to surgically insert the correct trade-offs where most appropriate, and that is the beauty of multi-paradigm software engineering.