Let's participate in writing. For my graduation project I had to do a literature study. I discussed the book "Clean Code", written by Robert C. Martin (a.k.a. "Uncle Bob"). I copy-pasted this and hope I didn't forget to adjust the formatting.
Motivation
My literature study started when I inspected the existing code. It was almost unreadable. The product had been worked on by many programmers: not only the people who work there now, but also those who left over the past few years. A senior programmer told me that many employees had switched companies and that each had done their part in their own way. There were parts of the code whose purpose he didn't know, but when he removed them, the site broke. Variables often have names like a or s, so you have to read the surrounding code before you can use them. When the outcome didn't match expectations, the functionality was sometimes changed, and again the web shop broke.
Goal
My idea was to make a coding standard which suits the Indian programmers. This will make it easier for them to use the functionality properly, and for a senior to understand the framework and make additions in the same way. When they do, putting the new functionality into a web shop will be no problem for the others. New employees will also have fewer problems getting familiar with the web shop. Of course the code itself should not be a mess, but how do I make sure it's not? And how do I make sure it will still be clean after some years of development? My goal is not to know everything about writing clean code. I will leave out the parts which fall outside the scope of the project.
Questions
Before I can write a good standard, I want some questions answered. The questions for the literature review are:
- Is there a preferred kind of standard which is followed most of the time in the code?
- What are the biggest advantages of writing clean code?
- How to write clean code?
- How to keep files clean and simple?
Approach
I don't want to search for an existing coding standard, because I want to write one suitable for the Indian programmers. Therefore I need literature with more than just naming standards. During my education I learned quite a few techniques, like the Hungarian notation naming standard and writing quality functions with good algorithms (Mark Allen Weiss). Maybe you've heard of "Uncle Bob". Robert C. Martin is almost a guru when it comes to clean code. He gives lectures at big events and has written a book with his vision on how to write clean code. He discusses all kinds of coding issues, including the Hungarian notation and how to write a good function, from naming to number of lines. The book came out on the 1st of August 2008; on Saturday the 2nd I found it after searching through Chennai for 6 hours.
Preferred kind of standard followed?
Although several different styles were used, some kind of preference was certainly shown. Most functions had a name like getClients(amt, arg); where 'amt' stands for amount and 'arg' stands for arguments. Unfortunately comment lines were rarely written, and if they were, they were often in Hindi or Tamil. Most employees speak Tamil or Hindi, but only a few speak both languages. Furthermore, there was nothing else of interest to be found. Some code copied from the internet was clean, but most of it was a mess.
What are the biggest advantages of writing clean code?
Some people believe that in the future programmers won't have to write code anymore. Everything will be generated. Nowadays most of the lines of a Java object can be generated. New components make it easier to create a program in a short time. You can copy almost everything from the internet, but still... there will be code. Clients want custom work, and they will always ask for new features and adjustments. To make adjustments correctly, you must be able to understand the code. If not, bugs will be built in.
In Martin's book, clean code is when you look at a routine and it's pretty much what you expected. The book also suggests that the best metric for code quality is WTFs per minute. Clean code has 0 WTFs per minute. Clean code is a simple, straightforward piece of functionality which does one thing well and where it's difficult for bugs to hide. Clean code can be read and enhanced by a developer other than the original author. (see http://sievertschreiber.files.wordpress.com/2009/12/good-code-is-measured-in-wtf-per-minute.jpg)
As a coder, you are an author and your work will be read. Make sure it can be read easily. We read 10 times more code than we write; keep that in mind while writing another line of code for your future readers.
A coding standard increases reusability by getting all programmers to write in the same way. This means that a programmer can use existing code for a new project and understand someone else's work.
How to write clean code?
Martin gives lots of examples of how to write clean code. I want to highlight the ones which I think are most important for this project. I will sometimes use Martin's (Java) examples. To keep things consistent, I will write my own examples in Java as well. In the first section I will use far more examples than in the others, because naming is best shown with examples. The other sections will be more narrative.
Naming
Use meaningful, intention-revealing names. The name must say what the variable is used for. Avoid disinformation which could trick you or your colleague. For example, if you want to store the elapsed time in days, name it that way. Of course don't make names too long: the more you need to read, the slower you read.
- Good: int elapsedTimeInDays;
- Bad: int n; // elapsed time in days
- Bad: int elapsedTime; // in days
- Bad: int elapsedNumberOfDaysFromStarttime;
- Good: void postPayment( ) { ... }
- Bad: void do(){...}
Use pronounceable and searchable names. Unpronounceable names make it difficult to search through your code and find errors. You will need to look for the name elsewhere in the code.
- Good: Date generationTimestamp;
- Bad: Date genymdhms; // generate year month day hour minute second
- Bad: Date randTime; // sounds like ran time
Use names for constant numbers. When you search the code, these names tell you what a number is used for, so you don't need to guess. Write constants in capitals.
- Good: final int WEEKDAYS = 7; for (int i = 0; i < WEEKDAYS; i++) { ... }
- Bad: for (int i = 0; i < 7; i++) { ... } // what does 7 mean?
Use camel casing. It's easier to read words when each new word is marked with a capital. Besides, most editors can jump from uppercase to uppercase, which gives you extra functionality and speed when writing code. Start with a lowercase letter when you write functions or variables. Start with a capital when you write a class name.
- Good: int aNewInteger;
- Bad: int anewinteger;
- Bad: int Anewinteger;
- Bad: int AnewInteger;
- Good: public class MyClass { ... }
- Bad: public class Myclass { ... }
- Bad: public class myClass { ... }
Avoid encodings or the Hungarian Notation*. Classes are objects, so a class name should be a noun and not a verb. Methods should have verb or verb phrase names like postPayment, deletePage, save or getName. They are the actions objects can perform.
- Good: public class Car { ... }
- Bad: public class Drive { ... }
- Good: postPayment( ... ) { ... }
- Bad: do(...){...}
- Good: setName( ... ) { ... }
- Bad: name( ... ) { ... }
- Good: getName( ) { ... }
- Bad: name( ) { ... }
Keep it simple and straight. If you want to use delete, always use delete. If you'd rather use remove or setActive, that's fine too. But don't use more than one word for one type of action. Don't pick words that are too clever. Inactivate, kill or disable will end up being functions only the author knows of. A maintainer could easily overlook the function and write the same function under another name. Say what you mean, mean what you say. Pay special attention to the function name add, which is used too often. For specific purposes choose insert or append.
- Good: taskList.append(task); OR taskList.add(task);
- Bad: taskList.push(task);
- Good: delete( ); OR remove( );
- Bad: deleteItemFromPage( );
- Bad: inactivate( );
* The Hungarian Notation used to be a good format, but nowadays it's no longer of any use. In early times the compiler couldn't see the difference between an integer, a string or a form. A programmer could make it easier for himself to detect a form by starting the name with frm, an integer with i, a string with str, etc. Nowadays you only need to keep your mouse above the variable and you'll get all the info you need.
Add meaningful context and don't add gratuitous context. If your function is getting too long you can extract parts to new functions and give those good names which will tell you what will happen. If you are working on an imaginary project called Gas Station Deluxe it's a bad idea to prefix every class with GSD. Frankly you are working against your tools. You type a G and press the completion key and are rewarded with a mile-long list of every class in the system.
Good:
String getMessage(List<Message> messages) {
    if (messages.size( ) == 0)
        return getNoMessages( );
    else if (messages.size( ) == 1)
        return getOneMessage( );
    else
        return getMoreMessages( messages.size( ) );
}
Bad:
String printTopBarGuestMessage(List<Message> guestMessages) {
    if (guestMessages.size( ) == 0) {
        ... a lot of code ...
    } else if (guestMessages.size( ) == 1) {
        ... a lot of code ...
    } else {
        ... a lot of code ...
    }
    return outcome;
}
Naming is very important to keep your code readable. I personally think it's the most important thing. If I have to fix a bug and everywhere I look in the code I see variable names of 1 character, I know it will be a long evening.
Camel casing gives great advantages with a good development tool, for example Eclipse, a free development tool. If you search for a class whose name has several capitals, just type the capitals and the code completion does the rest. Because code completion is getting better day by day, I think long names are no longer a problem for writing, but they still are for reading. Again camel casing helps. Your mind will split the words faster and you're able to read the purpose of the class.
Constants increase readability and make it easier to change the value later in the project. I agree that it's better to use a constant for the number of rows than to pin it to 8 in the code. Also, if you need a value more than once it's better to use a constant, because then the reader sees that the places probably belong together. But in the case of a loop which only has to run once over all weekdays, I think a line of comment is just as good.
I found it interesting to read why the Hungarian Notation is no longer of any use. I didn't like using it when we had to in a school project. Every form started with 'frm', and when you were looking for one you had to search by eye, because typing the first character on your keyboard didn't help. After that project I never used the notation again. Most of the other guidelines Martin describes I already followed.
Classes and functions
The standard Java convention says that classes should begin with a list of variables. Static constants come first, followed by static variables, followed by instance variables. All should be private; there's almost no reason to define a public variable, writes Martin. After the variables come the constructors, if you define any. If not, Java will create one for you without arguments. The constructors are followed by the public functions. Private functions are put under the public function they're used by. I don't agree with the placement of the private functions. I think all private functions should be at the bottom of the class. If you write a private function, you want it to be used as much as possible. If you later write a new public function which calls a private function written earlier, that private function will very likely end up above the public function calling it. Therefore I foresee problems maintaining this. I'd say privates at the bottom and protected in between. About the public variables I agree. Almost. The exception is standard numbers you would like to use everywhere in your project. You put these in a separate class. I think of such a class as a singleton, although strictly speaking static constants don't need an instance at all. If you use it for, let's say, your maximum number of search results, you can simply refer to Constants.MAX_SEARCHRESULTS and change the value in the Constants class without problems.
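A minimal sketch of such a constants class. The names Constants and MAX_SEARCHRESULTS come from the text above; the value 25 and the helper limitResults are made up for illustration. Note that this version uses static final fields and a private constructor, so no instance is ever created.

```java
import java.util.Arrays;
import java.util.List;

class ConstantsExample {

    /** Project-wide constants; the private constructor prevents instantiation. */
    static final class Constants {
        static final int MAX_SEARCHRESULTS = 25; // hypothetical value

        private Constants() { }
    }

    /** Hypothetical helper: cuts a result list down to the configured maximum. */
    static List<String> limitResults(List<String> results) {
        int end = Math.min(results.size(), Constants.MAX_SEARCHRESULTS);
        return results.subList(0, end);
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("a", "b", "c");
        System.out.println(limitResults(results).size());
    }
}
```

Changing the limit later means touching one line in Constants, and every caller picks it up.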
According to Martin the first rule of functions and classes is that they should be small. The second rule is that they should be smaller than that. Classes should have few responsibilities. Put simply: count the number of public functions. What does the class do? Is each function placed in the right class? Can we extract parts? In health care software you'll find a client object. It contains the name and illnesses. You can make this smaller by creating a person class. The client class will extend the person class and contain the illnesses. The person class will contain the name and birth date. Names like manager, super or processor often hint that a class is charged with too many responsibilities. A class should have one responsibility if possible, and most of the time it is possible.
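The client / person split could look roughly like this. It is only a sketch: the field types, the method names and the use of plain strings for illnesses are my own assumptions, not something taken from the book or from real health care software.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClientExample {

    /** Holds only what every person has: a name and a birth date. */
    static class Person {
        private final String name;
        private final LocalDate birthDate;

        Person(String name, LocalDate birthDate) {
            this.name = name;
            this.birthDate = birthDate;
        }

        String getName() { return name; }

        LocalDate getBirthDate() { return birthDate; }
    }

    /** Adds the one thing that is specific to a client: the illnesses. */
    static class Client extends Person {
        private final List<String> illnesses = new ArrayList<>();

        Client(String name, LocalDate birthDate) {
            super(name, birthDate);
        }

        void addIllness(String illness) { illnesses.add(illness); }

        // Read-only view, so callers cannot change the list behind our back.
        List<String> getIllnesses() {
            return Collections.unmodifiableList(illnesses);
        }
    }

    public static void main(String[] args) {
        Client client = new Client("Jane Doe", LocalDate.of(1980, 5, 17));
        client.addIllness("asthma");
        System.out.println(client.getName() + ": " + client.getIllnesses());
    }
}
```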
Martin states a maximum of about 20 lines of code for a function. This suits the next most important thing, in my opinion: a function should do one thing and one thing only, and it should do it well. Having said this, what to do with a switch statement? This statement is built to do more than one thing and it's difficult to write a small switch statement. The only reason we use it is because sometimes you just have to. Flags imply that a function does more than one thing. A boolean flag normally gives a function two behaviours. Avoid flags. They are sometimes useful, but most of the time you can leave them out by writing a second function which does the other part. If you need a split somewhere, put it in a separate function which chooses between the two. Most of the time the boolean flag is the result of a condition anyway. That condition can be moved into the function, and you won't need the flag.
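To illustrate the point about flags, here is a small sketch of my own. The price formatting functions and the VAT rate of 21% are hypothetical and only serve as an example.

```java
import java.util.Locale;

class FlagExample {

    static final double VAT_RATE = 0.21; // hypothetical rate, for illustration only

    // With a flag: one function, two behaviours. A call like
    // formatPrice(100.0, true) does not tell the reader what "true" means.
    static String formatPrice(double amount, boolean includeVat) {
        if (includeVat) {
            return String.format(Locale.ROOT, "%.2f", amount * (1 + VAT_RATE));
        }
        return String.format(Locale.ROOT, "%.2f", amount);
    }

    // Without a flag: two small functions, each doing one thing.
    static String formatNetPrice(double amount) {
        return String.format(Locale.ROOT, "%.2f", amount);
    }

    static String formatGrossPrice(double amount) {
        return formatNetPrice(amount * (1 + VAT_RATE));
    }

    public static void main(String[] args) {
        System.out.println(formatNetPrice(100.0));
        System.out.println(formatGrossPrice(100.0));
    }
}
```

The caller now picks a behaviour by name instead of by passing true or false.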
Use as few arguments as possible. The more you have to specify, the more work for the caller. Make it easy. Martin prefers functions with one argument (monads) or two arguments (dyads); functions with three arguments (triads) should be avoided. Many of you will think this is impossible. Maybe, but try first. Martin points out that you can reduce the number of arguments by creating extra classes. Look at this piece of code:
public Circle(double x, double y, double radius) { ... }
This constructor has three arguments, which Martin says is too many. You can refactor this using another class and make it easier to understand using two dyads.
public Circle(Point center, double radius) { ... }
public Point(double x, double y) { ... }
Personally I think you can make exceptions to this rule. For example, if you want to create a new instance of a class Person and you can pass the first name, last name and birth date to the constructor, that is easier, and you can even enforce that these three arguments are always present. If another function needs these three values, it should ask for the instance of Person. In this case a triad is useful. Still, the example above shows an improvement. Not only have you made the circle easier to create, you also know what the point is used for.
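For completeness, the Circle and Point constructors can be filled in as follows. The fields and the area method are my own additions to have something to run; the book only shows the constructor signatures.

```java
class CircleExample {

    /** Groups the two coordinates that always travel together. */
    static class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Takes two arguments (a dyad) instead of three. */
    static class Circle {
        final Point center;
        final double radius;

        Circle(Point center, double radius) {
            this.center = center;
            this.radius = radius;
        }

        // Hypothetical method, added only to have something to call.
        double area() {
            return Math.PI * radius * radius;
        }
    }

    public static void main(String[] args) {
        Circle circle = new Circle(new Point(2.0, 3.0), 1.5);
        System.out.println("center " + circle.center.x + ", " + circle.center.y);
        System.out.println("area " + circle.area());
    }
}
```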
The bracket placement he prefers is the opening bracket behind the statement on the same line. This decreases the number of lines you use, so more code fits on your screen. Here I totally disagree. Not everything is about speed, it's also about readability. Sometimes you can't fit a complete statement on one line, for example because you really need a triad and you need specific variable names. Your editor will put the remaining part on a new line and add indentation, the same indentation the next statement would have. If you always put the bracket on a new line, you'll see directly when a statement or function definition is 2 lines long. Besides, you'll write small functions, right? You'll still have around 30 lines of code on your screen.
I also have my doubts about one of Martin's suggestions concerning non-void functions. He writes that you should avoid returning null. Otherwise null checks have to be built in everywhere in the code in case a function returns null. I know sometimes you just can't do anything other than return null. I would keep it in mind, but a null pointer exception is easily found and corrected. He did give another piece of advice which I think is far more useful: avoid negative conditionals. If you have a boolean property called active, you can write either isActive( ) or isInactive( ). Think about which one you need; it would be weird to see the clause if (!var.isInactive( )) rather than if (var.isActive( )).
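Both pieces of advice fit in one small sketch. The Account class and the findActive function are hypothetical names of my own; returning an empty list is one common way to avoid null, not the only one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NullExample {

    static class Account {
        private final boolean active;

        Account(boolean active) { this.active = active; }

        // Positive form: callers write if (account.isActive())
        // instead of if (!account.isInactive()).
        boolean isActive() { return active; }
    }

    /** Returns the active accounts; an empty list, never null, when there are none. */
    static List<Account> findActive(List<Account> accounts) {
        if (accounts == null) {
            return Collections.emptyList();
        }
        List<Account> result = new ArrayList<>();
        for (Account account : accounts) {
            if (account.isActive()) {
                result.add(account);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(new Account(true), new Account(false));
        // No null check needed before looping or asking for the size.
        System.out.println(findActive(accounts).size());
        System.out.println(findActive(null).size());
    }
}
```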
In some software you'll find that the data model setter for a certain attribute returns the instance of its class. This makes chaining possible: calling setters right after each other on the same line. Martin didn't speak of this in his book and I would like to say something about it. Chaining makes your code ugly. If you don't want to type the variable name, copy it and paste it a couple of times. Your indentation stays ok and you don't get enormously long lines of code.
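For readers who haven't seen chaining setters, this is roughly what the two styles look like side by side. Product and its fields are hypothetical names for illustration.

```java
class ChainingExample {

    static class Product {
        private String name;
        private double price;

        // Chaining style: each setter returns the instance it was called on.
        Product setName(String name) {
            this.name = name;
            return this;
        }

        Product setPrice(double price) {
            this.price = price;
            return this;
        }

        String getName() { return name; }

        double getPrice() { return price; }
    }

    public static void main(String[] args) {
        // Chained: everything on one line, which grows with every attribute.
        Product chained = new Product().setName("Lamp").setPrice(19.95);

        // Unchained: one call per line, the variable name repeated.
        Product plain = new Product();
        plain.setName("Lamp");
        plain.setPrice(19.95);

        System.out.println(chained.getName().equals(plain.getName()));
    }
}
```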
Martin describes how he writes functions like this. He starts with rough drafts and many nested loops. He writes tests to make sure the function keeps doing what it should do, and then he starts taking out parts of the code and putting them into new functions. Sometimes he even extracts new classes. At the end he checks every function against his rules, reorders the functions, and it's done. I don't know if he meant that sarcastically.
Comments
Nothing beats a well-placed comment. Everyone used to agree: the more comment lines your code has, the more readable it gets. Martin disagrees. Comments are needed when your code is not clean. It is better to write extra code to make things clear than to write a comment line. The problem is that you will always change the code, but do you also change the comment? Nothing is more damaging than an old comment which now tells lies about a function. And some comments are worse than the code itself. For example, if you have a regular expression to check an email address, what would you rather see?
// Set the email pattern
Pattern p = Pattern.compile(".+@.+\\.[a-z]+");
Or this:
final String EMAIL_PATTERN = ".+@.+\\.[a-z]+";
Pattern emailChecker = Pattern.compile(EMAIL_PATTERN);
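A runnable version of the second variant, as a sketch. The pattern is the one from the example; the isEmail function is my own addition. The pattern is deliberately simple and should not be taken as a complete email validator.

```java
import java.util.regex.Pattern;

class EmailExample {

    // Pattern taken from the example above: something, an @, something,
    // a dot and a lowercase ending. Not a full address validator.
    static final String EMAIL_PATTERN = ".+@.+\\.[a-z]+";
    static final Pattern EMAIL_CHECKER = Pattern.compile(EMAIL_PATTERN);

    // Hypothetical helper: the names say what happens, no comment needed.
    static boolean isEmail(String candidate) {
        return EMAIL_CHECKER.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmail("bob@example.com"));
        System.out.println(isEmail("bob.example.com"));
    }
}
```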
Martin writes in his book that you need to keep in mind that the only truly good comment is the comment you found a way not to write.
Don't write any unnecessary comments. For example:
/* returns the full name */
String getFullName() { ... }
You could have guessed that getFullName would return the full name. Compare it with this one:
/* returns the first name, prefix if it has one, and the last name */
String getFullName() { ... }
Now you know the partner's name is not included.
But of course sometimes comments can be a good thing. For example if you want to clarify a decision you made while you were writing the function. You can also use it for amplification. If one line is very important, you can tell why.
If you decide to write a comment line, think about the space it will take and the time it takes to read. Write a short, good comment, don't talk too much, and think twice before you decide to write a multi-line comment. Take the next two examples:
/**
* argument needed
*/
Object argument;
Or:
/** argument needed */
Object argument;
If you have 5 members, reading them all in the first style takes a lot longer. To keep the indentation tidy you can use slashes (//) instead of /* */ when you want to write more than one line of comment.
Lots of problems can occur when you don't write good comments. Because it's just a comment, your compiler will not give errors when you copy-paste an attribute and forget to change the Javadoc with it. Or you write a good comment describing the purpose of the complete class, and then your colleague changes the code. Add to that functionality misinterpreted because of mumbling comments, too much information, and of course all the TODOs you placed and never fixed. Martin is very clear in his opinion of comments: only use them if you really need to.
How to keep files clean and simple?
Programs grow, websites get new features and clients get more wishes. Therefore, maintaining files is very important in order to keep a high level of code reusability. Martin gives a good piece of advice, the Boy Scout Rule: “Leave the campground cleaner than you found it.” If you come across a piece of code and something doesn't feel right, make it better. Don't leave the mess for your colleagues or for yourself the next time. Extract parts from classes and functions into new ones to make it all a lot simpler. Keep files small. If a file becomes too big, it becomes unreadable. Again, extract parts. Have a look at the circle / point example or the client / person example. It's very likely you can do something like this with your classes too.
When you change code, the functionality it delivers shouldn't change. To check this you can write test scripts. When you're done, run the tests. All tests pass? Then it should be ok. That assumption can only be made when the tests are well written. All functionality should be checked. If your tests don't cover all the functionality, you could have created a bug without knowing it.
Let's assume you did test everything and a test fails. Then you know you made a mistake, or the test is wrong or outdated. If a test fails, you want to know why as soon as possible. It doesn't help to write one test which checks all your functionality: you'll first need to find the assert which didn't get what it expected before you can start debugging. All your tests run with one click, so why not write more tests, each with one assert? Now you know exactly what fails, and after a possible fix you can run only this test to save time. If a test fails because it was written for the wrong functionality, the bug is in the test rather than in the code. But make sure you check it twice before committing your changes.
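A sketch of what "one assert per test" means in practice. In a real project you would use a test framework such as JUnit; to keep the example self-contained I wrote plain Java with a tiny check helper. The fullName function and the test names are hypothetical.

```java
class OneAssertPerTest {

    // Hypothetical function under test.
    static String fullName(String first, String prefix, String last) {
        if (prefix == null || prefix.isEmpty()) {
            return first + " " + last;
        }
        return first + " " + prefix + " " + last;
    }

    // Minimal stand-in for a framework assert.
    static void check(boolean condition, String testName) {
        if (!condition) {
            throw new AssertionError(testName + " failed");
        }
    }

    // Each test checks exactly one expectation, so a failure
    // points straight at one behaviour.
    static void testFullNameWithoutPrefix() {
        check("Jan Jansen".equals(fullName("Jan", "", "Jansen")), "testFullNameWithoutPrefix");
    }

    static void testFullNameWithPrefix() {
        check("Jan van Dijk".equals(fullName("Jan", "van", "Dijk")), "testFullNameWithPrefix");
    }

    static void testNullPrefixIsTreatedAsEmpty() {
        check("Jan Jansen".equals(fullName("Jan", null, "Jansen")), "testNullPrefixIsTreatedAsEmpty");
    }

    public static void main(String[] args) {
        testFullNameWithoutPrefix();
        testFullNameWithPrefix();
        testNullPrefixIsTreatedAsEmpty();
        System.out.println("all tests passed");
    }
}
```

When one of these fails, the name in the error already tells you which behaviour broke, and you can rerun just that test after the fix.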
There's even more. Subversion and Continuous Integration will limit the problem of committing bugs. Subversion keeps track of all the project's commits. Every commit gets a revision number and is stored. Continuous Integration will run all available tests after every commit. If a test fails, the commit can be rejected. You can even add a code style checker to your project which will not allow you to commit files, functions or classes that are too long. A style checker can flag lots of bad code, for example an empty void function, an unused variable, an unnecessary class cast, you name it. It takes more effort, but if you don't allow exceptions, your project won't have any, and you will save more and more time and bugs.
Conclusion
The naming rules were something everyone could think of, and I already used this naming before I had heard of Robert Martin. But I found his ideas about the size of functions and classes, and how you can reduce them, very interesting to read, and I can really use the knowledge for my project. For me the Boy Scout Rule will not apply, because I will not be maintaining the project in the future. But I can leave instructions for the Indian programmers.
The book had more to offer than I described for this research. It's not only very interesting but also fun to read. “Uncle Bob” left a lot of humor in his very clear writing. I would advise every programmer to read it or to go to one of his lectures. He's truly inspiring.
Not for the first time during my education I found out how much has already changed in creating programs. First there was procedural programming, soon there was object-oriented programming. Almost all applications used to be desktop applications; for three years now I haven't touched the code of any program which didn't run on a server and have a web interface. In one of our first projects we had to learn the rule of thumb that your code should have as many lines of comment as lines of code; nowadays you need to make things clear with the code itself. What will be the next step? Will there really always be code?
Literature list
Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship, first edition, August 2008.