I am interested in getting some opinions on the use of include files to sub-divide a large module into functionally related groups. What I'm interested in is opening a discussion on best practices and minimal requirements. Here's the scenario:

I am working with a library with 200+ functions. They naturally divide into two sets: an advanced set of calls and a "simplified" set which fills in some of the more arcane, rarely used arguments with typical defaults and then calls the advanced functions. So, instead of having one file with 2000+ lines of code, you have a master file with some functions in it that includes the files containing the other functions. Something like this:

master.cpp: Contains non-categorized general functions and miscellaneous stuff
master.h: Contains the usual stuff needed to make the program

advanced.cpp: Contains functional code for the advanced functions
advanced.h: Contains function prototypes for advanced.cpp
simplified.cpp: Contains functions that call things in advanced.cpp
simplified.h: Contains prototypes for functions in simplified.cpp

The trick is that master.h includes advanced.h and simplified.h while
master.cpp includes advanced.cpp and simplified.cpp.
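
For concreteness, here is a bare-bones sketch of that layout (the contents shown are placeholders, not code from an actual project):

// master.h - the one header a user of the library would include
#ifndef MASTER_H
#define MASTER_H

#include "advanced.h"                // prototypes for the advanced call set
#include "simplified.h"              // prototypes for the simplified wrappers

// ...prototypes for the general / miscellaneous functions...

#endif // MASTER_H

// master.cpp - the single translation unit handed to the compiler
#include "master.h"

#include "advanced.cpp"              // function bodies pulled in verbatim
#include "simplified.cpp"            // wrappers that forward to the advanced set

// ...definitions of the general / miscellaneous functions...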

This has the following advantages:

1) With proper structuring of the includes, it is really easy to add/remove entire feature sets.
2) Studying the source and looking for specific code blocks gets easier because there's less to look through
3) It's easier to structure the code in an organized fashion

This idea is somewhere between the old Fortran "File per Function" paradigm and the more recent tendency to pack everything into a few mega-files. I already know that the strategy works (I use it all the time now). What I hope to get from this posting is ideas about where the dangers might be and how to work around them.

For example, to prevent multiple inclusions from generating multiple-definition errors, the typical include file usually has something like this:

#ifndef STRUCTS_H
#define STRUCTS_H
...BODY OF FILE
#endif // STRUCTS_H

In this case, you would never want one of these files included more than once. I suspect that the above code might actually make the problem worse. In some ways I think that leaving the ifdefs out completely would be better, since then a second instance of the include would certainly bomb the compiler - which might be better than some arcane error produced by a multiple inclusion.

the more recent tendency to pack everything into a few mega-files.

Who does that? I've never seen any well organized project do that.

master.cpp: Contains non-categorized general functions and miscellaneous stuff
master.h: Contains the usual stuff needed to make the program
advanced.cpp: Contains functional code for the advanced functions
advanced.h: Contains function prototypes for advanced.cpp
simplified.cpp: Contains functions that call things in advanced.cpp
simplified.h: Contains prototypes for functions in simplified.cpp

The problem with having "advanced" and "simplified" compilation units is that the names are not meaningful. Also, 2000 lines of code is small. What happens when your codebase reaches 1 million lines? This kind of organization is hard for programmers to index, and will also result in slow recompilation. Splitting up the "advanced" and "simplified" parts also doesn't provide much more organization, since C++ already provides classes, namespaces and overloading.

When a programmer looks at a C file, he'll often use the names of the functions as guides. When a programmer is going through a source tree, he should also be able to use the filenames as guides.

For example, most C++ projects will have a header and a source file per class. So, if the classes are well designed and have meaningful names, then the filenames will follow suit. It also makes it easy for programmers to look a class up (by searching for its file), or to guide themselves through a project simply by looking at the file names. The classes will have private members and public members to separate out the unimportant functions. Then there will be a few extra headers to define constants and group together useful classes.

The same argument can be made about C. You should break up your code into logical parts, and keep each logical part together. For example, say your program makes use of linked lists. You would expect to see one header and one source file for all of the functions manipulating the list, as well as the structure itself. You can hide unimportant functions by not declaring them in the header file.
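
A minimal sketch of that kind of split, with made-up names (the static helper never appears in the header, so callers cannot see it):

// list.h - the public interface for the linked list
#ifndef LIST_H
#define LIST_H

struct Node {
    int value;
    Node *next;
};

Node *list_push(Node *head, int value);
int list_length(const Node *head);

#endif // LIST_H

// list.cpp - implementation, including helpers hidden from the header
#include "list.h"

// Internal helper: not declared in list.h, so it is invisible to callers.
static Node *make_node(int value, Node *next) {
    return new Node{value, next};
}

Node *list_push(Node *head, int value) {
    return make_node(value, head);
}

int list_length(const Node *head) {
    int n = 0;
    for (; head != nullptr; head = head->next)
        ++n;
    return n;
}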

Essentially, a program is well designed if a directory listing (ls -R) is enough to give a programmer a good idea of how the program works.

Hiroshe,

Thanks for the thoughtful response. First, though, the names used were not intended to represent an actual project, but a conceptual framework. Second, I fundamentally agree with everything you say, but this is not about classes, which are obviously inherently organized into units, nor is it about naming conventions - though I agree with you that intelligent names are among the most important factors in an intelligible program. The problem is that the discussion was supposed to be about logical grouping into sub-files using the include mechanism. I might note that this does not appear to slow down compilation at all. I suppose that it should in theory since the preprocessor would need to hook more files together, but this has not been my experience - in fact if anything it seems to speed up compilation.

Well, who does the Megafile thing? Right now I am working on a re-write/rationalization of a library with over 200 entrypoints that are all stuffed into one big file and rather poorly organized. Each entry point is in duplicate, with an advanced form and a simple form in which the simple form calls the advanced form after supplying missing arguments. I didn't write this thing, I just have to deal with it.

Now back to the point. It does occur to me that I could make a bunch of smaller modules following the classical source/header file model and link these into a single library. However, for the short term, the idea was to split this monster up into functionally related groups using the include mechanism. What I hope to learn here is if there are any hidden dangers in this technique.

However, for the short term, the idea was to split this monster up into functionally related groups using the include mechanism. What I hope to learn here is if there are any hidden dangers in this technique.

Provided the "modules" are independent enough to justify breaking them down, or the breakdown is internal and not exposed to users of the library, it's all good. The key with organizing a library is transparency. A user of the library must be able to figure out what headers to include for which functionality with relative ease.

If users end up asking "what the heck do I include to do XYZ?", your library is poorly organized.

master.cpp includes advanced.cpp and simplified.cpp.

I would not recommend doing that. There is really no reason to create a "master" cpp file that includes a bunch of other cpp files. You should compile them separately and link them together (either as a static or dynamic library).
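
In other words, master.cpp would include only headers, each cpp file would be compiled on its own, and the objects would be linked into the library (a rough sketch; the g++/ar commands shown in comments are just one way to do it):

// master.cpp - includes headers only, never other cpp files
#include "master.h"        // which in turn pulls in advanced.h and simplified.h

// ...general / miscellaneous function definitions...

// Build sketch: compile each translation unit separately, then archive
// the objects into a static library (or link them into a shared one):
//   g++ -c master.cpp advanced.cpp simplified.cpp
//   ar rcs libmylib.a master.o advanced.o simplified.o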

I generally find it acceptable to create some "include everything" header files, and many libraries do exactly that. Although, I would still recommend breaking things up into different headers, such that users still have the option to include only the minimal number of headers for the specific functions / classes that they need. Several Boost libraries are structured this way, i.e., you have the choice of either including the "include all" header to get everything, or including just the individual headers for individual features of the library.
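
A sketch of that pattern with made-up header names: the convenience header just pulls in the individual feature headers, so users can take either route.

// mylib_all.h - optional "include everything" convenience header
#ifndef MYLIB_ALL_H
#define MYLIB_ALL_H

#include "mylib_devices.h"           // one feature area per header
#include "mylib_transfers.h"
#include "mylib_diagnostics.h"

#endif // MYLIB_ALL_H

// User code can then do either of these:
//   #include "mylib_all.h"          // everything
//   #include "mylib_devices.h"      // just the device functions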

I might note that this does not appear to slow down compilation at all. I suppose that it should in theory since the preprocessor would need to hook more files together, but this has not been my experience - in fact if anything it seems to speed up compilation.

If your experience is limited to this 2,000-LOC library, then I can easily imagine that you have not found it makes a difference in compilation times. Also, C-style code (no classes, no templates) typically compiles very fast anyway. 2,000 lines of code is a trivial amount, and I would expect it to compile quickly and easily. If we moved that figure up to 200,000 LOC, then we'd be starting to have a conversation. When proper structuring of the library makes the difference between compiling the project in 30 minutes versus compiling it in 5 hours, it becomes a real issue.

There is a whole set of techniques that people use (and are very serious about using) that are aimed at reducing compilation times. All these techniques boil down to reducing the amount of code that the compiler has to chew on at once, while at the same time reducing compilation redundancy (where the compiler chews on essentially the same piece of code several times).

First rule is that headers should include the absolute bare minimum amount of other headers, e.g., such as using a forward-declaration of a class instead of including the header where it's defined, or hiding an external dependency behind opaque types (or forward-declarations of opaque types) such that the dependency is only included in the cpp file.

Second rule is that cpp files should be rather small and split in such a way that any external dependencies / included headers they use are logically split. It is not uncommon to have a single header file associated with several cpp files; for example, if some of the functions require a particular (heavy) dependency, such as for networking / GUI / etc., then you can group those in one cpp file. Obviously, if you split things up too much, the compiler will end up doing a lot of the same work again and again, which will end up increasing overall compilation times, i.e., it's a fine balance, and it takes practice to achieve it.
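
As a sketch of the first rule (the class and file names are made up, and QSerialPort just stands in for any heavy dependency): the header only forward-declares the dependency, and only the cpp file actually includes it.

// device_monitor.h - no heavy includes, only a forward declaration
#ifndef DEVICE_MONITOR_H
#define DEVICE_MONITOR_H

class QSerialPort;                    // forward declaration instead of #include

class DeviceMonitor {
public:
    explicit DeviceMonitor(QSerialPort *port);
    bool isReady() const;
private:
    QSerialPort *port_;               // a pointer only needs the forward declaration
};

#endif // DEVICE_MONITOR_H

// device_monitor.cpp - the only file that pays for the heavy header
#include "device_monitor.h"
#include <QSerialPort>

DeviceMonitor::DeviceMonitor(QSerialPort *port) : port_(port) {}

bool DeviceMonitor::isReady() const {
    return port_ != nullptr && port_->isOpen();
}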

Another important aspect of reducing compilation times and structuring a library's files is the problem of maintenance and inter-dependencies. Typically, in any non-trivial project, you would do incremental compilations, usually managed by a build-system that can detect which files have changed since the last compilation. Furthermore, any non-trivial project is developed under version control, and often by several people, concurrently working on it. For those reasons, having a very fine-grained file organization and having minimal inclusions between headers (as I just explained) is very beneficial.

I find that a good trick in this regard is this: whenever you want to figure out how to split things, ask yourself "If I ever edit/change function X, what other functions am I likely to modify as well?". By that rule, the kind of "advanced" / "simplified" functions that you are describing should probably be paired together in the same header / cpp files, because whenever (or if ever) you modify the advanced function in any serious way, there is a good chance that you will want to modify the simplified function too. This also promotes fine-grained splits. Incremental compilation (which recompiles only the cpp files that have changed, plus any cpp file that includes, directly or indirectly, a header that has changed) is much faster, and it is faster still when modifications have very small ripple effects (i.e., the number of recompilations that a particular file modification triggers). When the whole project takes 2 hours to compile from scratch, minimizing those ripple effects saves a lot of time. Furthermore, when you are working on a big project, with lots of people, under version control (git / svn / cvs / etc.), this grouping by likelihood of being edited together makes coordination quite a bit less painful.

In some ways I think that leaving the ifdefs out completely would be better, since then a second instance of the include would certainly bomb the compiler - which might be better than some arcane error produced by a multiple inclusion.

That makes no sense to me. The whole point of the header-guards is to prevent multiple inclusions from being an error by preventing the same code from appearing multiple times. It's virtually impossible in a project to prevent or enforce a rule against multiple inclusions of the same header; multiple inclusions will happen. The important thing is that when they do, all inclusions of the header file after the first one are ignored - that's what the header guards do. Why would you want multiple inclusions to trigger compilation errors instead?

The problem is that the discussion was supposed to be about logical grouping into sub-files using the include mechanism.

There's not much point in splitting up "simplified" and "advanced" code. The language mechanisms already handle that well (private/public, overloading, and, in the case of C, plain old not putting it in the header).
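
For example (hypothetical signatures), a "simplified" call can be nothing more than an overload that fills in the typical defaults and forwards to the "advanced" one:

// Advanced form: the caller supplies all of the arcane arguments.
int transfer(int channel, int buffer_size, int flags);

// Simplified form: just an overload with typical defaults that forwards on.
inline int transfer(int channel) {
    return transfer(channel, /*buffer_size=*/4096, /*flags=*/0);
}

Both live naturally in the same header; no separate "simplified" file is needed.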

Well, who does the Megafile thing? Right now I am working on a re-write/rationalization of a library with over 200 entrypoints that are all stuffed into one big file and rather poorly organized. Each entry point is in duplicate, with an advanced form and a simple form in which the simple form calls the advanced form after supplying missing arguments. I didn't write this thing, I just have to deal with it.

If you have 200 functions (many of which are just overloads), and there is no way to break them down productively, then it sounds fine. The programmer simply ignores the functions he doesn't use. "A lot" doesn't necessarily mean "unorganized" or "bad". If they're poorly organized, then the problem is that they're poorly organized, and splitting up the "simplified" and "advanced" functions won't solve much.

First rule is that headers should include the absolute bare minimum amount of other headers

I completely agree. But remember, these are not really typical header files per se. Some are CPP source to be included in-line. The header-like files are only to be included into another header file - again in an in-line fashion.

That makes no sense to me. The whole point of the header-guards is to prevent multiple inclusions from being an error by preventing the same code from appearing multiple times.

Well, that's exactly the point. Using includes the way I'm proposing requires that they are never, ever included by anything else. The idea is to put a bunch of functions in another file and then include them in-line when compiling the library.

If they're poorly organized, then the problem is that they're poorly organized, and splitting up the "simplified" and "advanced" functions won't solve much.

Here again, I agree - with one exception. In the advanced-simplified case, I may want to generate a library with NO simplified functions at all. Right now they are scattered all around and I would need 200+ conditional blocks to turn them all on/off. If I move them into a separate pair of files, then a single #ifdef/#endif around the include lines for the simplified set (one each for the cpp and h files) does the trick. True enough, I could, with a lot of cutting and pasting, sort them all into groups in one file and accomplish the same thing. Still, it seems more readable to have something like this in "Library.cpp":

#ifdef LIBRARY_HAS_SIMPLIFIED_FUNCTIONS
    #include "simplified.cpp"
#endif

than it does to have the same conditional block with 400 lines of code between the #ifdef and the #endif.

However, the opposite applies in the case of simplified.h, since I really don't want the user to have to deal with an include file with a split personality. In fact, I don't even want them to know that simplified.h exists. I now think it would be much better to have one library include file, the contents of which depend on the presence or absence of the simplified functions - more or less like capturing the output of the preprocessor's rendition of library.h, which has this in it:

#ifdef LIBRARY_HAS_SIMPLIFIED_FUNCTIONS
    #include "simplified.h"
#endif

Which segues nicely into this next bit:

A user of the library must be able to figure out what headers to include for which functionality with relative ease.

Don't get me wrong here. The library user would never, ever include any of these files. Only the compiler would (should) ever see them when building the library. In the end, there would only be one library and one include file.

After re-reading Mike's post, I'm inclined to limit the proposed use of includes for source in-lining only to special cases and instead break functional groups up into separate source/include file pairs - rather like the Boost model that was mentioned, or indeed like the Qt model of including some classes in groups and others only in the files where they are needed. It does take a lot of care to get the include tree rationalized, but it's probably worth the effort most of the time.

By the way, so far this thread has been fantastic. The replies have all been thoughtful and have been very useful in refining my sense of when to and when not to use this technique - as well as giving me a lot of additional ideas about code structuring in general. I cut my teeth in the assembly and Fortran world - a world where header files either don't exist at all or don't play much of a role - so this kind of advice is really golden.

The library user would never, ever include any of these files.

Provided you can guarantee this, then it's only a question of transparency for maintenance programmers. All in all, I see no problem with breaking internal headers down like you've suggested. An important lesson is that best practice is not always practice. But care must be taken when deviating from best practice, and it looks like you're taking great care in doing that. :)

From what I gather, you want to make a macro such that if it's set, you get a few extra functions that are easier to use.

The question is why? Why not just have them available all of the time? An even better question: what's the disadvantage of having them on all the time? Having more functions available doesn't imply that it's unorganised. I don't see splitting them up serving any advantage.

The most it will do is make the programmer's autocomplete list a little longer, and the table of contents in the documentation for that class in that namespace a little longer.

Right now they are scattered all around and I would need 200+ conditional blocks to turn them all on/off.

Leave them all on. Again, having "more" functions does not imply that it's unorganised. It's unorganised because it's unorganised, and turning them "off" won't help.

#ifdef LIBRARY_HAS_SIMPLIFIED_FUNCTIONS
    #include "simplified.cpp"
#endif

#ifdef LIBRARY_HAS_SIMPLIFIED_FUNCTIONS
    #include "simplified.h"
#endif

The problem with this is that the ENTIRE library will need to be recompiled if you change this one macro. That's going to take a while. The second problem is that you'll have 2 incompatible binaries for the library. This means that if some programs didn't use the simplified version and some did, the user would need 2 copies of the library on his or her computer (assuming it's dynamically linked). And again, I don't see this helping other than to "hide some functions at compile time".

The question is why? Why not just have them available all of the time?

That's a really good point, but it still focuses on the pluses and minuses of a single case. More to the point, then: I am, in fact, porting an SDK into a Qt library, and eventually I will have two versions of the library - one with all of the simple functions completely removed. The macro to remove them makes it much simpler to test and verify that nothing gets broken when I remove them.

The problem with this is that the ENTIRE library will need to be recompiled if you change this one macro. That's going to take a while.

This is true, but then the library takes less than 30 seconds to re-compile.

Yet it occurs to me that there is another "why?" to address: "Why do away with the simplified functions at all?" Well, at least in this case, these functions hide the hardware device identifier from the programmer. This is a single gchar argument, which means that the programmer must have passed this information to the library at some earlier point. Because of the way this library is written, and also because of some arcane hardware issues, the library's copy of the device identifier is not always correct. Therefore, application-level code written with the simplified functions is more fragile. However, until I get the library completely ported to Qt and regression tested, I want to be able to compile the simple form of the functions back into the library for testing purposes. Therein lies my interest in this not-so-typical use of the include directive. It is primarily a convenience issue.
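
To make the fragility concrete, the pattern looks roughly like this (hypothetical names, not the actual SDK code): the simplified form leans on the library's cached device identifier, which is exactly the piece that can go stale.

// Advanced form: the caller states exactly which device it means.
int set_gain(char device_id, double gain);

// The library's cached "current device", which the hardware can invalidate.
static char g_current_device = 'A';

// Simplified form: trusts the cached identifier, so it breaks when the
// cache no longer matches the real hardware state.
int set_gain(double gain) {
    return set_gain(g_current_device, gain);
}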

Using macros to create debug builds and release builds is perfectly acceptable. As long as the release builds are the only ones used in the field, then there's not much to worry about.

If I understand correctly, it sounds like you have some old library that is poorly organized, and you are tasked with porting it to a new support library (e.g., Qt). It sounds like that poorly organized interface (set of functions) for the library is something that you would eventually want to re-design, but that in between, you need to keep that interface such that you can do some regression tests when the port is completed.

Is that assessment correct? If so, then you have to realize two things: (1) you can't really significantly change the interface, and (2) you don't need to. In other words, you shouldn't worry about the organization of the interface, only the back-end. So, you should just stop worrying about that whole simplified / advanced business and how that is done, because it has to remain that way, and you have no choice.

Now, if you're going to eventually re-design that interface to solve some of those robustness issues that you said the library's interface has, then you need to start thinking about how the new interface is going to be structured, and then, model your back-end (Qt port) of the implementation to tailor it towards that new interface. Then, you write whatever code is needed to glue together the old interface with the new back-end so that you can do your regression tests. And finally, you write the new interface, and migrate whatever user-side code you need to migrate. Does that sound like a good game plan? It's probably what I would do in that situation.

Mike,

That sounds like a really good plan. In fact, I've been thinking of making a real Qt class out of the library - patterned after qtserialport - so that I could have a separate instance for each hardware device, plus a helper class that manages a list of available devices. This would obviously break binary compatibility with the old library, but that wouldn't matter much in a larger sense, because anyone using the older library could just continue to do so. It would also eliminate the extended vs. simplified interface by default, since each class instance would already contain all the device-specific information that the user now supplies in the extended interface. In fact, it really simplifies the entire structure of the library. Actually, this sounds like a rather fun project. Thanks again for all the useful suggestions and ideas.

I think you have the right idea, but everything is better explained with some terrible bits of ASCII art, so here it goes.

Currently, you have this:

(Old API)  --->  (Old Impl)

where the "Old API" is that set of advanced / simplified functions that is currently used to call that library, and the "Old Impl" is the current implementations of those functions.

I think you were originally talking about doing this:

(Repaired API)  --->  (New Impl)

where you were trying to find a way to make the old API a bit better or more organized, but without breaking compatibility. It's kind of difficult to do that, and often just leads to only very trivial improvements to the API (such as reorganizing the header files' structure).

The solution I recommend is either one of these two:

(New API) ---------------------> (New Impl)
                            ^
(Old API) --> (Glue Code) --'

Or:

                        (New API) --> (New Impl)
                            ^
(Old API) --> (Glue Code) --'

The idea is that you create the "New Impl" such that it is primarily meant to be called and used with some "New API" - a completely re-designed API that is more modern, robust and modular, and whatever else you want to add to it. In other words, you design the new API and the new implementation together (it could literally be that the "New API" is a set of class declarations, and the "New Impl" is just their implementations / definitions). Then, for regression testing and/or backward compatibility, you maintain the "Old API" (essentially unmodified) but replace its back-end (implementation) with some simple glue code that forwards all the work either via the new API (option 2 above) or to the new implementation directly (option 1 above).

From a maintenance point of view, going through the new API is preferred, because the glue code then uses the same well-documented and maintained interface that user-side code should be using in the future, instead of relying on implementation details that you might want to change later. But if your old API is a bit more low-level, so to speak, then you might have to create some special "cut through the interface" methods in your new implementation to allow for such bare-metal access.
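
As a very small sketch of that glue (all names hypothetical): the old flat entry point keeps its signature but simply forwards into the new class-based API.

// old_api.h (kept as-is, this is what existing user code calls)
int lib_read_register(char device_id, int reg);

// glue.cpp: the old entry point re-implemented on top of the new API
#include "old_api.h"
#include "new_api.h"                 // declares the hypothetical Device class

int lib_read_register(char device_id, int reg) {
    Device dev(device_id);           // the new API owns the device state
    return dev.readRegister(reg);
}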

since each class instance would already contain all the device-specific information that the user now supplies in the extended interface.

Yes. Classes are a great way to bundle together a bunch of default parameters and things like that. The "simplified" versus "advanced" dichotomy can be reduced to simply providing a few different constructors for the class, with more or fewer parameters given to the constructor (the rest being given default values). The class can have functions for advanced manipulations, and that doesn't infringe on the most basic use most people would make of it, such as default-constructing the object and calling some basic function(s) on it. This is one of the most mundane advantages of classes (as opposed to free functions), yet one of the most useful in making an API nice and clean for the user.
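
A brief sketch of what that might look like for the device library discussed above (the class and the defaults are hypothetical):

class Device {
public:
    // "Simplified" use: default device and typical settings.
    Device() : Device(/*device_id=*/'A', /*timeout_ms=*/1000) {}

    // "Advanced" use: the caller supplies the device and the arcane settings.
    Device(char device_id, int timeout_ms)
        : id_(device_id), timeout_ms_(timeout_ms) {}

    // The same member functions serve both kinds of caller.
    int readRegister(int reg) const { (void)reg; return 0; }   // placeholder body

private:
    char id_;
    int timeout_ms_;
};

// Usage:  Device d;              // simple case
//         Device d2('B', 250);   // advanced case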
