How Hard Can It Be to Read a File?
Reading code is good for you. Everyone knows this, but few actually follow the advice. It’s like healthy eating: Nobody would oppose it, but it is often ignored out of inertia. I’m certainly guilty of skipping my healthy dose of code reading in the past, but I’m going to make a habit out of it from now on. To support the process, I will write about the code I’m reading.
For the first exercise, I’m going to take a look at a simple bit of library code: reading a file into a string. I chose this example because it turns out that reading a file with Java is more involved than I thought it should be.
Coming from Ruby, I’m used to being able to read the contents of a file into a string using one line of code like this:
str = File::read "bla.txt"
The simplest solution I found for Java is this:
String str = "";
try {
    BufferedReader in = new BufferedReader(new FileReader("bla.txt"));
    String line;
    while ((line = in.readLine()) != null) {
        str += line + "\n";
    }
    in.close();
} catch (IOException e) {
}
Let us ignore the missing error handling and the inefficiency of concatenating strings for the moment. The number of lines is not of interest either. This is not some “Java sucks” rant; there are more than enough of those already.
The difficulty of this trivial task illustrates a concept I have been thinking about a lot lately: the Radius of Comprehension, which I wrote about here. It is a property of a codebase, defined as follows:
If you are looking at a given fragment of code, how far away from that bit of the code do you need to have in your mind at that time in order to understand the fragment at hand?
So how far from the code in our examples do we have to have in our heads? In Ruby, we need to understand one method in one class. (I’m not including String here, as I consider that to be part of fundamental understanding of the language.) However, to actually find the read method, you need to look into the IO class, the parent of File. The names of the classes and the method are pretty obvious, so you quickly find them when looking through the documentation.
I’m going to make a naive attempt at quantifying the radius of comprehension here. The Ruby solution tentatively gets a score of three: two classes and one method, each with a meaningful name. I’ll talk about the merits of this quantification at the end of the post, but first let’s get the comparison with Java.
In the Java version, we have two classes, BufferedReader and FileReader, and one method, readLine. Having to loop through the file and concatenate all the lines makes the code more verbose and thus harder to read, but I wouldn’t say it makes it harder to understand. A source of confusion in this code is the indirect relationship between the FileReader and the actual reading. FileReader is a self-explanatory name, but BufferedReader? Having one reader and passing it to another reader, from which you can actually read strings in your program, increases the radius of comprehension by more than an obvious inheritance relationship would. As a consequence, I assign the Java code a radius of four: one for the method, one for the FileReader, and two for BufferedReader.
Error Handling
There is another factor I want to take into account here: error handling. In both examples an exception could be thrown, and we need to understand what kind of exception that is in order to write robust code. So I would add one to both radius scores.
In the Java code there is an additional complication: an error could occur when creating the readers or in the call to readLine. In the latter case, we should close the reader to avoid leaking resources. In Ruby we don’t have to worry about closing anything. Understanding this additional error case in Java adds to the radius. Thus, with error handling we end up with: Ruby: 4, Java: 6.
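To make the two error cases concrete, here is a minimal sketch of how the Java snippet might look with them handled. The class name FileSlurper and the method readAll are hypothetical names chosen for illustration, and the sketch assumes the caller wants the IOException passed on rather than swallowed. It also swaps the string concatenation for a StringBuilder.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class, named for illustration only.
class FileSlurper {

    // Reads the file at the given path line by line and returns its contents,
    // with every line terminated by "\n" (as in the snippet above).
    static String readAll(String path) throws IOException {
        StringBuilder contents = new StringBuilder();

        // Case 1: if opening the file fails, the exception is thrown here
        // and there is nothing to close yet.
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            // Case 2: readLine may fail after the file has been opened.
            while ((line = in.readLine()) != null) {
                contents.append(line).append("\n");
            }
        } finally {
            // Runs on success and on failure; closing the BufferedReader
            // also closes the FileReader it wraps.
            in.close();
        }
        return contents.toString();
    }
}
```

A call such as FileSlurper.readAll("bla.txt") then comes close to the Ruby one-liner at the call site, but the reader of the helper still has to keep both failure points in mind. On Java 7 and later, a try-with-resources statement expresses the same cleanup more compactly.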
Again, the intention of this post is not to prove that Java stinks and Ruby is great. I’m looking at a very limited use case here and actively ignore scenarios where the added complexity in Java’s library might be useful. The point is to understand more about the radius of comprehension.
What I like about this metric is that the two examples differ by a factor of 1.5 (6 versus 4), while the more straightforward metric of lines of code differs by a factor of 10 (ten lines versus one). This score feels about right. The Java code is more complicated, but not hugely so.
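The tally behind these numbers can be written down as a small sketch. The class RadiusTally and its methods are hypothetical and exist only to make the arithmetic explicit. The point values are the ones assigned in the text above, not the output of any established metric.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical tally of the ad-hoc scores assigned in this post.
class RadiusTally {

    // Sums the points given to each thing a reader has to keep in mind.
    static int total(Map<String, Integer> parts) {
        int sum = 0;
        for (int points : parts.values()) {
            sum += points;
        }
        return sum;
    }

    static Map<String, Integer> rubyParts() {
        Map<String, Integer> parts = new LinkedHashMap<>();
        parts.put("File class", 1);
        parts.put("IO class (parent of File)", 1);
        parts.put("read method", 1);
        parts.put("exception type", 1);
        return parts;
    }

    static Map<String, Integer> javaParts() {
        Map<String, Integer> parts = new LinkedHashMap<>();
        parts.put("readLine method", 1);
        parts.put("FileReader class", 1);
        parts.put("BufferedReader class (indirection)", 2);
        parts.put("exception type", 1);
        parts.put("closing the reader on failure", 1);
        return parts;
    }

    public static void main(String[] args) {
        int ruby = total(rubyParts());
        int java = total(javaParts());
        System.out.println("Ruby: " + ruby + ", Java: " + java);
        System.out.println("Radius ratio: " + (double) java / ruby);
        // Lines of code: ten for the Java snippet, one for the Ruby one.
        System.out.println("Line ratio: " + 10.0 / 1);
    }
}
```

Listing the parts by name also shows how subjective the weights are: changing the two points for BufferedReader to one or three moves the ratio noticeably.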
Why Quantify the Radius of Comprehension?
While I’m satisfied with the result of my calculation here, the ad-hoc way I pulled the number from my pants is nowhere near scientific or generally useful. Maybe I can improve and formalize the process while reading more code. There is, however, another, deeper question: Is it even worthwhile to try and formalize the radius of comprehension?
Mike Taylor, who invented the term, writes:
I’m talking about a human issue here (and therefore, sadly, an all but impossible one to measure, though we know it when we see it)
While we may not be able to measure it given the vagueness of the definition, it might be possible to approximate it. Like any metric about code quality, it brings the danger of creating an arbitrary goal that is pursued at the expense of others, with a negative overall effect on quality. But I also see potential in quantifying the radius of comprehension.
Code always has two audiences: the compiler or interpreter and other programmers who have to use or maintain it. While we can reliably and immediately assess how well the machine understands our code, feedback on how well others understand it is rare and fuzzy. If we had a way to determine the radius of comprehension, we could judge how well other programmers can understand what we wrote – and take steps to improve it.
