Self-Documenting Source Code Unrealized
from COBOL to javadoc®
© Conrad Weisert, Information Disciplines, Inc., 12 May, 2000

Updated July, 2011


COBOL's claim and failure

In 1960 COBOL enthusiasts were proclaiming that the new "English-like, business-oriented" language would yield programs that were "self-documenting". By 1968 we never heard that claim any more, except in ridicule. Millions of lines of COBOL source code were viewed as impenetrable nightmares beyond human comprehension. The structured revolution of the 1970s was welcomed by desperate programming managers largely because of the notorious COBOL maintenance problems.

The COBOL zealots had targeted the wrong audiences. Early articles described how a non-technical manager or an end user could easily read a COBOL program in order to understand what it did and to verify that it satisfied requirements. Of course, managers and end users rarely had any interest in reading program source code, but even when they tried, they found COBOL's syntax of little help.

While it's true that some of COBOL's PROCEDURE DIVISION statements convey their intent to the uninitiated, e.g.

    ADD 1 TO ITEM-COUNTER

others are quite cryptic [1]:

    UNSTRING FASTKEY-RECORD-INPUT-AREA
       DELIMITED BY ALL '/' OR ALL '*' OR ALL ' '
       INTO OUTPUT-1 OUTPUT-2 OUTPUT-3 OUTPUT-4.

And even if the manager got past those out-of-context statements, what was he or she supposed to make of whole pages of them? Some COBOL statements actually badly mislead the reader, e.g. the notorious always-true condition:

    IF TRANSACTION-CODE IS NOT EQUAL TO 'P' OR 'S'  . . .

or this nonintuitive way of modifying QUANTITY-ORDERED:

    MULTIPLY PRICE BY QUANTITY-ORDERED.

Needless to say, programming language designers no longer consider readability by managers and end users an objective.
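
To see why those two statements mislead, it may help to spell out what they do in a language without COBOL's abbreviations. The following is only a sketch in Java: the class and method names are hypothetical, and it assumes the standard COBOL rules that an abbreviated condition repeats its subject and relational operator for each operand, and that MULTIPLY ... BY stores the product in its second operand.

```java
// Hypothetical class and method names, used only for illustration.
class CobolPitfalls {

    // COBOL expands  IF TRANSACTION-CODE IS NOT EQUAL TO 'P' OR 'S'
    // to  TRANSACTION-CODE NOT = 'P'  OR  TRANSACTION-CODE NOT = 'S'.
    // No single character equals both 'P' and 'S', so this is always true.
    static boolean asWritten(char transactionCode) {
        return transactionCode != 'P' || transactionCode != 'S';
    }

    // What the programmer almost certainly meant: neither 'P' nor 'S'.
    static boolean asIntended(char transactionCode) {
        return transactionCode != 'P' && transactionCode != 'S';
    }

    // MULTIPLY PRICE BY QUANTITY-ORDERED overwrites the second operand
    // with the product; this returns the new value of QUANTITY-ORDERED.
    static int multiplyPriceByQuantityOrdered(int price, int quantityOrdered) {
        quantityOrdered = price * quantityOrdered;
        return quantityOrdered;
    }

    public static void main(String[] args) {
        for (char code : new char[] {'P', 'S', 'X'}) {
            System.out.println(code + ": asWritten=" + asWritten(code)
                    + ", asIntended=" + asIntended(code));
        }
        System.out.println(multiplyPriceByQuantityOrdered(5, 3));
    }
}
```

The English-like wording reads as "neither P nor S", yet the expanded condition accepts every transaction code, including 'P' and 'S' themselves.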

Two audiences for program documentation

The main audience for source code is the maintenance programmer, who is sometimes the original developer returning at a later time. Since we assume that this audience has a command of the programming language, issues of language syntax matter less to code readability than issues of quality and style: program structure, choice of names, layout on the page, use of commentary, etc. A well-organized, highly readable source program in any language can be a work of art.

A second audience for information about a program is the potential component user, another programmer who is thinking of incorporating the component into a larger program. Traditionally, we didn't expect such people to read through (or even have access to) source code. Instead we prepared a separate document, the component write-up or usage documentation, for that audience. Today's object-oriented and event-driven programming paradigms are encouraging more component reuse, which in turn is focusing attention on the creation, maintenance, and usability of the component usage documentation.

Trying to serve both audiences: javadoc

To meet the demand for high-quality, consistent component usage documentation, many in the Java community favor an approach in which:

That's an intriguing idea. If it works then:

Alas, it doesn't work well, for reasons explained below.

What does javadoc do?

If you feed javadoc a Java source code file containing no comments, it will generate HTML entries for everything public [2] in the file, i.e. the principal class and within that class definition:

Of course, such information has limited usefulness without any further explanation. If you choose your method parameter names with care, the experienced and perceptive reader may be able to figure out how to use parts of your component, but more is needed, in the form of:

Note that javadoc doesn't recognize such comments just anywhere in your source code. You have to place them immediately in front of one of the public items listed above. It follows that such a comment should pertain only to that one public item. Commentary of a more general nature will only confuse the reader, because it will appear in the special section for a particular member.
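
As a rough illustration of the placement rule, here is a minimal sketch of a documented method. The class `ItemCounter` and its `add` method are hypothetical; `@param`, `@return`, and `@throws` are standard javadoc tags. The class is left package-private only so the sketch doesn't depend on its file name; a real component would presumably declare it public so that javadoc documents it.

```java
/**
 * Counts the items seen so far. (Hypothetical example class.)
 */
class ItemCounter {
    private int count;

    /**
     * Adds a number of items to the running count.
     *
     * @param itemsToAdd how many items to add; must not be negative
     * @return the running count after the addition
     * @throws IllegalArgumentException if itemsToAdd is negative
     */
    public int add(int itemsToAdd) {
        // An ordinary comment like this one is ignored by javadoc, so it
        // reaches only the maintenance programmer reading the source.
        if (itemsToAdd < 0) {
            throw new IllegalArgumentException("itemsToAdd must not be negative");
        }
        count += itemsToAdd;
        return count;
    }

    public static void main(String[] args) {
        ItemCounter counter = new ItemCounter();
        counter.add(1);
        System.out.println(counter.add(2));
    }
}
```

Here the eleven lines of comment and tags precede a five-line method body, which gives some idea of the clutter the maintenance programmer has to read around.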

Unfortunately, all those tags and highly structured comments tend to impair the readability of the source code for the primary maintenance programmer audience. Defenders claim that we'll get used to it, and that an experienced Java maintenance programmer will soon be able to read the source code as easily as if it were written for his or her eyes. I have yet to see that claim validated and I remain skeptical of it.

Usability of javadoc output

HTML intrusion

Since javadoc puts your comments into HTML, programmers soon discovered that they could improve the usage documentation by embedding HTML tags, such as <p>, in their javadoc comments. But such codes further impair readability for the maintenance programmer looking at the original source code, and we don't expect every Java programmer to be an HTML expert.

Worse, they soon discovered that they had to do so! Since certain symbols common in usage documentation (<, >, &) are reserved in HTML, you have to code special HTML character symbols to produce them.
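
A small sketch shows what that looks like in practice. The class `RangeCheck` and its method are hypothetical; `&lt;`, `&gt;`, and `&amp;` are the standard HTML character entities for <, >, and &. As in any such sketch, the class is left package-private only so that it compiles under any file name.

```java
// Hypothetical example class.
class RangeCheck {

    /**
     * Tests whether a value lies within a closed range.
     * <p>
     * Returns true if low &lt;= value &amp;&amp; value &lt;= high.
     * The caller must ensure that high &gt;= low.
     *
     * @param value the value to test
     * @param low   the smallest acceptable value
     * @param high  the largest acceptable value
     * @return true if value is in the range, false otherwise
     */
    public static boolean inRange(int value, int low, int high) {
        return low <= value && value <= high;
    }

    public static void main(String[] args) {
        System.out.println(inRange(5, 1, 10));
        System.out.println(inRange(0, 1, 10));
    }
}
```

The one-line method body states the condition more legibly than the comment that is supposed to explain it.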

Furthermore, HTML is a relatively unstable "language", often deprecating tags in new versions. We can't have the validity of our Java program documentation depend upon the whims of the HTML community.

Putting up with all that work and inflexibility could be justified if the end result were a first-rate usage document. Sorry. Here again, javadoc falls well short of what a competent writer could produce with comparable time and effort.

First, the web-based documents consume far too much screen space and (assuming we print them) paper. In an attempt to achieve a distinctive "look and feel", javadoc generates HTML that your browser turns into huge fonts and lots of white (and sometimes blue) space. Unnecessary and confusing repetition, such as a "Constructor summary" section followed later by a "Constructor detail" section, would never occur in a well-organized write-up that you'd prepare manually.

Second, the rigid section structure makes it hard for the author to write or the reader to find information of a general or tutorial nature.

Of course, you can edit the generated HTML [3], but what's the point? You could have written it more simply and more maintainably in the first place.

What about the uniformity argument?

All right, javadoc partisans concede, but doesn't the consistency of usage documentation outweigh those disadvantages? Shouldn't a programmer browsing through Java class libraries all over the world get information in a form he or she is accustomed to and can quickly grasp?

Not necessarily. Since javadoc is a tool for documenting only Java classes, it's useless for documenting your organization's C++, IDL, PL/I, or BASIC components. For a very small programming staff using only Java, worldwide javadoc uniformity may outweigh compatibility with non-Java component documentation. The rest of us are reluctant to divide our world into the Java and non-Java realms.

Conclusion

Mixing usage documentation and maintenance documentation in the same source code file will probably never be practical. Relying on javadoc hurts the quality of both kinds of documentation, and drives a wedge between Java components and others.

IDI continues to teach and to practice the highest level of source-code readability aimed at the maintenance programmer. We use a traditional write-up format, exemplified by our freeware components. Unless javadoc is significantly improved and simplified and able to handle non-Java components, we're sticking with tradition.


1 -- From James Janossy: COBOL: a Software Engineering Introduction, 1989, Dryden Press, ISBN 0-03-029564-5, p. A10

2 -- You can also include protected items, thereby creating documentation for users who may wish to derive another class from yours.

3 -- In the few samples I've examined, the generated HTML is itself far more voluminous than the original source code and full of repeated constants and formatting details. I wouldn't want to change it manually.

