A recent Hacker News discussion about source code comments has grown into a debate about whether you need them or not. Apparently it is a contentious issue. The comment that started it all included “Comments are for the weak” and these 5 words incited a hefty discussion and this post. Some people argued code comments are a bad code smell and others said that comments are essential for people to understand code. My theory is that this divide is between people using low level languages like Assembly, C, C++ and to some degree Java and people using high level languages like Python and Ruby.
The biggest reason I think that is that low level languages need a lot more lines to do the same thing and those lines are harder to understand. A simple but poignant example is opening a file and reading its content:
C code (taken from here):
char * buffer = 0;
long length;
FILE * f = fopen (filename, "rb");
if (f)
{
fseek (f, 0, SEEK_END);
length = ftell (f);
fseek (f, 0, SEEK_SET);
buffer = malloc (length);
if (buffer)
{
fread (buffer, 1, length, f);
}
fclose (f);
}
Python:
buffer = open(filename).read()
The python example is one line long and uses descriptive names for the actions – open and read – and doesn’t bother you with implementation details of allocating memory for the data. The C example is about 12 lines long and exposes a lot of implementation details both about memory and about how file systems work (seek etc.). Both methods have their uses and advantages but one thing for sure – anyone not familiar with C will have a hard time understanding the C code and even non-programmers can understand roughly what the python code does.
One commenter specifically caught my attention giving an example of “readable” C code from the Unix source code. Here is the second function from the source file:
/*
* Wake up all processes sleeping on chan.
*/
wakeup(chan)
{
register struct proc *p;
register c, i;
c = chan;
p = &proc[0];
i = NPROC;
do {
if(p->p_wchan == c) {
setrun(p);
}
p++;
} while(--i);
}
This is part of one the most influential operating systems written, but at least in my book this code wouldn’t pass code review. It will fail because:
- using one letter variable names is bad – this is the biggest offender by far.
- wakeup is a really general name for such a specific function. wakeup_on_channel is better.
- Abbreviating “Number” to N in NPROC.
- Not declaring input variables type (TiL that the default is int…)
This brings me to the second contributing point to my theory – writing something hard to read will be strongly discouraged by some communities more then others. High level languages are written with the axiom that code must be easy to understand. It’s even in their name – high means farther from machine code and closer to humans while low-level means closer to machines and their language.
Looking back at the HN discussion, you can make some good educated guesses about who in that thread is a high level programmer and who works closer to the metal. These two groups might not get each other’s context and so this discussions goes round and round. Both groups need to acknowledge that languages like C will need more comments and documentation to be understandable and that while commenting is good it might be a strong code smell that you need to refactor in a higher level language. I usually use this Python idiom – if you feel the need to comment something, make it into a function and write a docstring.