Clarity in Software

by coatta 5/24/2009 9:21:00 AM

I've been wondering recently about what makes good code or, to phrase it another way, what basic principles you can follow that will generally keep you pointed in the right direction. I've come up with two that really stand out for me: correctness and clarity. The former is obvious, but can be turned around into something more useful -- that is, you can't code something until you've defined what it means for it to be correct. So there is a need for well-specified requirements and perhaps even a motivation for techniques such as test-driven development (TDD). I can't say that I've gotten to the point of doing TDD myself, but I can't imagine working on a project without a solid set of unit tests and the continuous integration tools to ensure that they are run whenever the code changes.

Clarity, on the other hand, is not so easy to draw a box around. As with all issues of communication between human beings, there is no scientifically proven right way. But that should not be an excuse to ignore the issue. In fact, as time goes on we're seeing that software is not really ephemeral -- that is, much software continues to be used and modified over long periods of time. Clarity is therefore increasingly important, because it governs how difficult it is to maintain and modify a given piece of software (see https://www.softwaretechnews.com/stn_view.php?stn_id=47&article_id=118).

So how does one achieve clarity in software? As I mentioned above, this is really an issue of communication. In this case, the question is whether the source code effectively communicates its structure and purpose. I don't believe in the idea of self-documenting code. There are always design issues and requirements-related information that need to be communicated via prose -- although that can simply mean comments within the code. However, the code itself can either contribute to one's understanding of the software or, in the worst case, detract from it.

I think that there are lessons to be gleaned from best practices in technical writing. For example, in writing that is expository in nature, the rule of thumb is that one should begin with an overview and then proceed to elucidate the points identified there. To me this is very much akin to the idea of starting at a high level of abstraction and then working down through successively more detailed layers. But what do I often see when I read code? Something that starts with private member declarations and helper routines and works its way up to the more abstract operations provided. This is completely backwards from the point of view of good communication. Now I realize that back in the bad old days, compilers couldn't handle forward references so this inversion was necessary, but there's really no excuse for it now.
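As a minimal sketch of the overview-first ordering (the `InvoiceReport` class and its methods are made up purely for illustration, and Python is used only for brevity), the public operation comes first and the helpers follow in the order they are used:

```python
class InvoiceReport:
    """Hypothetical class, used only to illustrate top-down ordering."""

    def __init__(self, line_items):
        # line_items: list of (description, quantity, unit_price) tuples
        self._line_items = list(line_items)

    # Public, most abstract operation first: this is the "overview".
    def render(self):
        """Return the full report as text."""
        lines = [self._format_line(item) for item in self._line_items]
        lines.append(self._format_total())
        return "\n".join(lines)

    # Supporting details follow, in the order render() relies on them.
    def _format_line(self, item):
        description, quantity, unit_price = item
        return f"{description}: {quantity} x {unit_price:.2f}"

    def _format_total(self):
        total = sum(qty * price for _, qty, price in self._line_items)
        return f"Total: {total:.2f}"
```

A reader who stops after `render()` already knows what the class is for; the private helpers are there for whoever needs the detail.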

Even worse than this are source files in which there is no discernible organization at all. Public and private declarations are interspersed and there is no rhyme or reason to the proximity of various elements of the code. My own theory is that this is often the result of development environments which provide sophisticated tools for navigating within source code. When you can immediately get to a definition by choosing it from a drop-down, or can find something with a mouse click or two, the actual location of those entities within the source code starts to become irrelevant. But to someone who is reading the code for the first time, the lack of structure is akin to reading an article in which the order of the sentences has been randomized.

Another style which I find takes away from clarity is statements that compose a large number of actions together. This is great if you'd like to win the Marcel Proust Award for Turgid Code, but looking again to best practices in technical writing we find that the ideal is normally short, concise sentences that each convey essentially one idea. Brevity is not the ultimate goal, because at some point increased brevity impedes communication.
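To make the contrast concrete, here is a small sketch with two invented functions (`summarize_dense` and `summarize_clear`, both hypothetical) that compute the same thing. The first packs filtering, parsing, summing and formatting into one statement; the second gives each idea its own line:

```python
# Dense version: one statement that filters, parses, sums and formats.
def summarize_dense(rows):
    return "total=%d" % sum(int(r.split(",")[1]) for r in rows if r.strip() and not r.startswith("#"))


# Same behaviour, with roughly one idea per statement.
def summarize_clear(rows):
    # Skip blank lines and comment lines.
    data_rows = [r for r in rows if r.strip() and not r.startswith("#")]
    # The amount is assumed to be the second comma-separated field.
    amounts = [int(row.split(",")[1]) for row in data_rows]
    total = sum(amounts)
    return f"total={total}"
```

The second version is longer, but each intermediate name tells the reader what the step is for.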

There are many other aspects to writing code that is easy to understand. In fact, there is quite a history of literate programming that you can look into (see http://www.literateprogramming.com/). But the main point that I want to make is that programming is more than just creating functionality. It's also about communicating to others, and that requires a different kind of thinking about the code that we write.

ORM's and Locking (Part 2)

by coatta 2/3/2009 9:15:00 PM

Back in January I wrote about some of my initial experiences with ORM's and deadlocks. By explicitly marking each transaction as being either read or write we were able to solve a large number of deadlocks that cropped up in our initial testing. In fact, that tactic was good enough to take the system live. Even with people accessing the system on a regular basis we did not see any deadlocks. This lulled me into a false sense of security that I wouldn't have to think about deadlocks any more. In hindsight I can't imagine why I thought this -- self-delusion rears its ugly head again.

Anyhow, after a while we got around to stress testing the system, pushing one or two orders of magnitude more data through it than was typical. Once again the deadlocks started cropping up. To make this part of the story short, it finally occurred to me that all the usual sorts of deadlocks that I had grown to love when programming directly with locks were just as possible in an ORM-based system with the locking done by the database: locks acquired in the wrong order, read locks acquired when write locks should have been, etc.
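For readers who haven't met the "wrong order" problem, here is a minimal sketch using plain in-process locks rather than an ORM (the `Account` class and `transfer` function are hypothetical). Two transfers in opposite directions would each grab one lock and wait forever for the other if they locked source-then-target; sorting by a fixed key gives every thread the same acquisition order:

```python
import threading


class Account:
    """Hypothetical account guarded by its own lock."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(source, target, amount):
    if source is target:
        return  # nothing to do, and avoids locking the same lock twice
    # Always lock the account with the smaller id first, regardless of
    # the direction of the transfer.
    first, second = sorted((source, target), key=lambda a: a.account_id)
    with first.lock:
        with second.lock:
            source.balance -= amount
            target.balance += amount


def run_demo(rounds=500):
    a = Account(1, 1000)
    b = Account(2, 1000)

    def move(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    threads = [
        threading.Thread(target=move, args=(a, b)),
        threading.Thread(target=move, args=(b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

The database analogue is presumably to touch rows or tables in a consistent order across transactions, though how much control you have over that depends on the SQL the ORM generates.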

But even though the issues are fundamentally the same, the experience is quite different. There are a number of reasons for this:

  1. The database has much nicer tools for dealing with deadlocks. In the first place, it doesn't actually deadlock; it notices that a deadlock exists and aborts one of the transactions. Second, it has very nice tools that identify the exact cause of the deadlock (at least SQL Server does, I assume other DB's do too) -- that is, it will identify the resources in conflict, the cycle of resources held/requested that created the deadlock, and the statements associated with the resource requests.
  2. To compensate for this somewhat, the connection between your code and the underlying SQL is indirect. The ORM is responsible for generating the SQL statements that the DB executes, and it's not always obvious which SQL statements are associated with particular operations in the code. Not only does one have to have a good grasp of the underlying mapping of objects to the DB, it's also necessary to understand how the ORM moves data back and forth between the objects in memory and the DB.
  3. An extension of this last point is that the ORM can make it hard to resolve some locking issues. For example, the particular version of NHibernate that we are using always uses read locks when it is pulling in collection data. This is done implicitly as part of loading an object and there isn't a way to force it to use an update lock which is necessary if elements of the collection are written to later in the transaction. Similarly, if you're working directly with locks a fairly common technique to decrease contention is to release locks before the end of a transaction when that doesn't compromise serializability -- a technique that may not be available to you with your ORM.
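Because the database aborts one transaction rather than hanging, the aborted work can in principle be retried. The following is only a sketch of that idea, not what our system does: `DeadlockVictimError` is a hypothetical stand-in for whatever error your driver or ORM raises (SQL Server reports a deadlock victim as error 1205), and `transaction` is assumed to be a callable that opens a fresh session and transaction on every call.

```python
import random
import time


class DeadlockVictimError(Exception):
    """Hypothetical stand-in for the driver/ORM error raised when the
    database aborts a transaction to break a deadlock."""


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay=0.05):
    """Run transaction() and retry it if it was chosen as a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockVictimError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            # Back off with some jitter so the competing transactions
            # are less likely to collide again in lockstep.
            time.sleep(base_delay * attempt * random.random())
```

Retrying hides the symptom rather than the cause, so it complements, and does not replace, fixing the lock ordering or lock mode that produced the deadlock.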

Overall, I find the experience with the ORM to be positive on balance. It's certainly made it easier for us to program without thinking explicitly about locking most of the time. But as with any layer of abstraction, it tends to complicate things when things get complicated :-)


Disclaimer

My opinions are my own, but you can borrow them if you like.

© Copyright 2010