This is a long post with no specific questions as such. I'm just looking for any advice from people with real-life experience in the software development arena.

----------

Intro:
I've just taken up a summer job, and I was really enthusiastic about this. It will be 12 weeks of C++, which is exactly what I was looking for. I was slightly flattered to have been offered this position as I'm only a second (going into third) year engineering undergrad, not a computer scientist. I won a C++ game competition with over 200 entries so I know how to code, even though I lack large-scale project experience.
The project is a sort of simulator run by the college, and it has been under development for years. It is run by a professor and the team consists of bright and friendly post-grads. I honestly had no reservations or cynicism going into this.
My task is to implement a certain new algorithm for the project and implement some new features (I won't get into details). So essentially I would have to understand how the current code works, understand where the modifications need to be made and then actually make the modifications.

Problem:
So I was quite shocked after my first day on Friday. I was completely unprepared for the demands, and now I don't know what to do.

First, the project was clearly developed by many people. Just glancing through the code put me in a panic. Some of it is nicely indented C++, some of it is totally outdated with C-style casts. Some parts use references and others pointers; some #defines and others const variables; some arrays and others STL containers; std functions and C functions (strcmp, etc.); with variables, functions, classes etc. NamedLikeThis, sometimes_like_this or _EVEN_LikeThis. Some of it is actually written in French with French comments and French couts! There are commented-out lines of code everywhere, with comments such as // FIXME QUICK!! . There is no consistency as to the coding style whatsoever.
Second, the project is 139 files, with 16,000 lines of code. This isn't that big (my game was 10,000 lines long and this was a solo effort), but it is still substantial considering that there is no obvious scheme to the code. Beyond int main the 139 files generally have #includes all over the place and it is absolutely impossible to construct any sort of hierarchy in my mind as to what goes where and what is essential and what is completely auxiliary. Some of the included files are missing completely, but the project still compiles.
Third, there is absolutely no documentation or meaningful comments anywhere. It may as well be bare code. There is no manual. The variable names are not instructive either.
Well, lastly, I've been using Visual Studio ever since I started and it's become an extension of my programming, like a piano to a pianist. I like hovering over things to see the type, right-clicking and going-to-definition, seeing the args with IntelliSense and seeing the member functions by typing a dot. Now I'm forced to use Eclipse with Linux, since the project is over a network, and even though I'm sure Eclipse has these features too, for the time being anyway it may as well be Notepad as it will probably take about 12 weeks to get used to it.

So this sounds like a whinge, and ok maybe it is. And honestly, I admit that I lack experience and it's not my place to look at others' code and say if it's good or bad. Basically all I want to know is where do I start? I've only ever written programs myself, big and complex, but still entirely mine, with my heart and soul in every neat, indented line of code.
So how do I start to dissect messy code that is not documented and not written by me? How does somebody look at code and start to make sense of it? Is this actually possible now that some of the original authors are long gone? Every book I've read stresses the value of writing code with maintenance in mind, but hey this is the real world, right and am I just being naive?
I am looking at 16,000 lines of code and it may as well be in Chinese: the lines themselves make sense e.g.

Future::Schedule(TL_ADVERTISE_EVENT, TL_ADVERTISE_EVENT_TIME, ARR_CRL_ADV_tkn);

is clearly a function call from a class or namespace, but how do I know what it does? For a start I can't find those macro definitions anywhere because of a new unfamiliar IDE (not a problem, but a hindrance and waste of time), I don't even know if those args are variables or constants or functions, the function documentation does not say what this is supposed to do, the macro names themselves are not instructive and the call is done without any comments.
So book theory aside, how do I construct the bigger picture? Should I grab an A0 sized wall chart and start drawing out box diagrams of every variable in all 139 files? (The OO approach) It might make sense by then but it will take time. Should I try to construct a function call hierarchy?
Or maybe should I start at the first line of int main and trace the program all the way to the end? (The procedural approach)
Should I start at the top chains of classes or at the very lowest chains? How do I even tell which are which?

All I want is just a grasp of what the program does, that's all. I will take it from there.
I know I can program, but this is a really steep hill for me. I have not read about this kind of reverse-programming in any book or on any website. There is seriously NO documentation, and the comments are scarce as you can see.. it seems like archeology more than programming..

To give an illustration I will just post int main(). People's names have been changed for discretion. (Now imagine that it's exactly like this for the entire 16,000 lines.)

I would appreciate ANY advice at all. Like where to start, what to read, how to approach this.
Even some real-life anecdotes from real commercial situations would be helpful. Even if you can't help, just at least give me your opinion, so that I know if others share my reaction or not...

<snipped code>

> Every book I've read stresses the value of writing code with maintenance in mind,
> but hey this is the real world, right and am I just being naive?
Most real-world software, especially commercial software, is much better written.

I think the problem you have is that the s/w was written by droves of students on 3 month assignments, so everyone did their own thing. The lack of a design and a standard just makes it worse.

Is there even a config management system in place, like CVS?
All those "John", "Mike", "Tom", "Dick", "Harry" comments don't inspire either.

My first suggestion would be to make your own local repository, check in the whole project "as is", then import it into visual studio (if that's your IDE of choice).
Just delete all the useless commented out code and // user comments.

Thanks for the reply,

Yeah we use CVS. Also I don't think it's possible to port it to Windows, at least not with ease. There are lots of unix specific headers. But like I said the IDE is just a nuisance, nothing more.

If anybody can suggest anything at all about how to interpret other people's code I would really appreciate that.

Source Insight can help you understand the meaning of each defined symbol, find references, etc. It supports C/C++ and can make this work faster.
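Without such a tool, the "where is this symbol defined?" lookup can be approximated by hand. Here is a minimal sketch, assuming the identifiers in a call like the `Future::Schedule(...)` one quoted earlier really are `#define` macros (they could just as well be constants or enum values, which this does not cover). The function name `findMacroDefinitions` is made up for illustration; it takes the text of one source file and reports the lines that define the given macro.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based line numbers in `source` on which `#define <name>` appears.
// Only whole-identifier matches count, so searching for TL_ADVERTISE_EVENT
// does not report a definition of TL_ADVERTISE_EVENT_TIME.
std::vector<std::size_t> findMacroDefinitions(const std::string& source,
                                              const std::string& name) {
    std::vector<std::size_t> hits;
    std::istringstream in(source);
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        std::size_t i = 0;
        auto skipBlanks = [&] {
            while (i < line.size() && (line[i] == ' ' || line[i] == '\t')) ++i;
        };
        skipBlanks();
        if (i >= line.size() || line[i] != '#') continue;   // not a directive
        ++i;
        skipBlanks();                                        // "#  define" is legal
        if (line.compare(i, 6, "define") != 0) continue;
        i += 6;
        if (i >= line.size() || (line[i] != ' ' && line[i] != '\t')) continue;
        skipBlanks();
        // Read the macro name: letters, digits and underscores.
        std::size_t start = i;
        while (i < line.size() &&
               (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_')) {
            ++i;
        }
        if (line.compare(start, i - start, name) == 0) hits.push_back(lineNo);
    }
    return hits;
}
```

Run over each of the 139 files in turn, this would at least tell you which header to open. Commented-out definitions (lines starting with `//`) are skipped because the line has to begin with `#`.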

It uses versioning, so just delete the comments; they'll be maintained in old versions.

I'd look for unix-specific calls; if there are none, you should be ok bringing it to Visual Studio.

Of course there's an argument to be made for learning the Eclipse way. You could also look into doing it from a terminal (depending on your comfort with that kind of thing).

I've seen far worse code... (and I've always gotten myself into a lot of trouble by 'complaining' about it --even when it was obvious that it was costing the company thousands of dollars in delays and workarounds).

It might be worth suggesting to the prof in charge that he consider a term (not now, but in the future ;) ) where a small group of students cull and clean the code, then identify simple refactorings and do them. Tell him it will be good experience for them (if they want it), and it will make it faster (probably true) and smaller (also probably true). If you can show that it breaks under common conditions you can also argue that the effort would increase the program's stability and security.

But for now, don't worry about it. You can't be responsible for the mess. You can only not contribute to it. If you get into making it better you'll just give yourself a headache and produce dissatisfied evaluations of your performance (because you spent your time doing something other than what you were supposed to do).

The best way to get used to the code is just to get a local copy of it and start trying to change things. After playing around with it for a short bit you'll have a pretty general idea of where things are and what you have to do to make modifications. Then you can check out the current copy and start adding/modifying what you need to accomplish your assignment.

Alas. Good luck though!

Thank you everyone for the advice,

> ...139 files, with 16,000 lines of code...
> ...So how do I start to dissect messy code that is not documented and not written by me?
> How does somebody look at code and start to make sense of it?

ok. so you have about 70 components or so, each with a header and a .cc, averaging 200+ lines of code per component. (could have been a lot worse; imagine your plight with 16 components with about 1000 lines of code per component).

you could start by making a dependency graph of these components. compile a component a.cc with a -MMD switch to get a.d which lists its dependencies. compile all components this way and prepare the dependency graph out of information in the .d files.

now, scan through components which are at the lowest level in the dependency graph (those that have no other dependencies) to get an overall idea of what they do. and move up level by level in the dependency hierarchy to see what the high level architecture is.

then, determine where the component you are going to write fits in in this hierarchy and focus on the sub-graph that involves this component.
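The steps above can be sketched in code. This is only a rough illustration, assuming the .d files are in the usual make-rule form that GCC emits (`a.o: a.cc a.h b.h \`, with backslash-newline continuations; for example `g++ -MMD -c a.cc` writes a.d next to a.o). The names `DepGraph`, `componentName`, `addDepFile` and `computeLevels` are made up for this example, and treating a.h and a.cc as one component called "a" is an assumption about how the project's files are named.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Component name -> names of the components it depends on.
using DepGraph = std::map<std::string, std::set<std::string>>;

// "src/a.cc" -> "a": drop directory and extension so a.h and a.cc
// collapse into a single component.
std::string componentName(const std::string& path) {
    std::size_t slash = path.find_last_of('/');
    std::string base = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::size_t dot = base.find_last_of('.');
    return (dot == std::string::npos) ? base : base.substr(0, dot);
}

// Adds the rules found in the text of one .d file to the graph.
void addDepFile(const std::string& text, DepGraph& graph) {
    // Join continuation lines: backslash-newline becomes a space.
    std::string flat;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == '\n') {
            flat += ' ';
            ++i;
        } else {
            flat += text[i];
        }
    }
    std::istringstream lines(flat);
    std::string line;
    while (std::getline(lines, line)) {
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;   // not a rule
        std::istringstream targetStream(line.substr(0, colon));
        std::string target;
        if (!(targetStream >> target)) continue;
        const std::string owner = componentName(target);
        std::set<std::string>& deps = graph[owner];
        std::istringstream prereqs(line.substr(colon + 1));
        std::string file;
        while (prereqs >> file) {
            const std::string dep = componentName(file);
            if (dep == owner) continue;              // a.cc and a.h belong to "a"
            deps.insert(dep);
            graph.emplace(dep, std::set<std::string>{});  // make sure the node exists
        }
    }
}

// Level 0 = no dependencies; otherwise 1 + the highest level among its
// dependencies. Include cycles are cut where they are detected, so levels
// inside a cycle are only approximate.
int levelOf(const std::string& name, const DepGraph& graph,
            std::map<std::string, int>& levels, std::set<std::string>& inProgress) {
    auto known = levels.find(name);
    if (known != levels.end()) return known->second;
    if (!inProgress.insert(name).second) return 0;   // already on the current path
    int level = 0;
    auto node = graph.find(name);
    if (node != graph.end()) {
        for (const std::string& dep : node->second) {
            level = std::max(level, 1 + levelOf(dep, graph, levels, inProgress));
        }
    }
    inProgress.erase(name);
    levels[name] = level;
    return level;
}

std::map<std::string, int> computeLevels(const DepGraph& graph) {
    std::map<std::string, int> levels;
    std::set<std::string> inProgress;
    for (const auto& entry : graph) levelOf(entry.first, graph, levels, inProgress);
    return levels;
}
```

Feed the contents of every .d file to `addDepFile`, call `computeLevels`, and read the components at level 0 first, then level 1, and so on. Since -MMD leaves system headers out of the .d files, the graph only contains the project's own components.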

Doxygen can also generate useful information, and some pretty hierarchies.
http://www.stack.nl/~dimitri/doxygen/

Plus the HTML results are nicely clickable in a browser for following chains of information.

> compile a component a.cc with a -MMD switch to get a.d which lists its dependencies.

I have never heard of this switch before. Is that a gcc *nix thing or Windows or general or what?


In my experience a full program trace is not worth your time, as long as you are only working on a specific section. These types of college projects, where people just layer code without a good redesign every once in a while, tend to be too disjointed to try to understand fully. I mean it's doable, but your time is probably just better spent understanding whatever section you need to understand in order to get your work done. When I've seen people work on these kinds of projects in the past the professor is usually of little help, but you might try finding a student who has worked on it in the past to help you out. They may even still have a college email address and be willing to shed at least a little light on some of the thought processes that went into it.

Mainly the advice I would give on a project like this is that if you do not have the power to restructure a large part of the program then your time is probably better spent simply cleaning up your area of the code and leaving it in better shape than you found it. Take it as a lesson to always use comments and all the other best practices I'm sure you've been taught.
