Fad methodologies undermining rigor . . .

Implicit Data Dictionaries are Dangerous!

by Conrad Weisert
September 8, 2005
© 2005 Information Disciplines, Inc.


What is a Data Dictionary?

Elsewhere on this web site, I've explained why rigorous data definitions are an indispensable component of a system specification (or detailed requirements document). The set of such definitions is the data dictionary of the application system. Some organizations have consolidated the data dictionaries of all their application systems into a global or corporate data dictionary in order to reduce redundancy and increase consistency.

The data dictionary was a component of structured systems analysis as codified by Tom DeMarco[1] and others a quarter-century ago. But DeMarco got it only half right.

A data dictionary might be maintained either as pure documentation or by a computer-based data dictionary processor. Some data dictionary processors were integrated into a broader C.A.S.E. (computer aided software engineering) tool, many of which were tied to specific methodologies. Some were combined with a data directory tool that kept track of where data items were referred to or created.
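To make the two roles concrete, here is a minimal Python sketch of a data dictionary (one authoritative definition per item) combined with a data directory (a where-used index). The class names, the fields, and the choice to keep an item's definition separate from its representation are illustrative assumptions. They are not the schema of any particular C.A.S.E. tool or dictionary processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """One entry in a hypothetical data dictionary."""
    name: str
    definition: str        # what the item *is*, in business terms
    representation: str    # what it *looks like*; deliberately kept separate
    derived: bool = False  # True for items computed on demand rather than stored


class DataDictionary:
    """Minimal dictionary plus a 'data directory' (where-used index)."""

    def __init__(self) -> None:
        self._items: dict[str, DataItem] = {}
        # Data directory: item name -> documents that refer to or create it
        self._where_used: dict[str, set[str]] = {}

    def define(self, item: DataItem) -> None:
        # Exactly one authoritative definition per name
        if item.name in self._items:
            raise ValueError(f"duplicate definition: {item.name}")
        self._items[item.name] = item

    def record_use(self, item_name: str, document: str) -> None:
        # Refuse references to items nobody has defined yet
        if item_name not in self._items:
            raise KeyError(f"undefined data item: {item_name}")
        self._where_used.setdefault(item_name, set()).add(document)

    def lookup(self, name: str) -> DataItem:
        return self._items[name]

    def where_used(self, name: str) -> set[str]:
        # Return a copy so callers cannot alter the index
        return set(self._where_used.get(name, set()))


dd = DataDictionary()
dd.define(DataItem(
    name="Birth date",
    definition="Calendar date on which the insured person was born",
    representation="ISO 8601 date (YYYY-MM-DD)",
))
dd.record_use("Birth date", "Use case: Enroll policyholder")
dd.record_use("Birth date", "Report: Premium schedule")
print(sorted(dd.where_used("Birth date")))
```

Rejecting a use of an undefined item at the moment it is recorded is one possible design choice. A looser tool might accept such references and report them later.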

Does anyone really know what "everyone knows"?

Sometimes impatient managers, analysts, or users would complain that defining data items, especially elementary ones, was a waste of time because "everyone knows what ___ is". Once in a while they'd be right; the name of the item was so self-descriptive that no one could misunderstand it. Most of the time, however, they'd be wrong.

DeMarco himself fell into that trap:
"You could define Age further, specifying it to be precisely two iterations of any digit,[2] but that would be a pointless exercise during analysis. Everyone knows what Age is, so there is no value in continuing." - p. 139

Sorry, I don't know what Age means in the context of a particular application. Is 59 ½ a legitimate value? When does the age in years change for a person born February 29? Is the age of a car figured from its model year, its manufacturing date, or its delivery date? The answers may have a big impact on insurance or taxes.
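The February 29 question can be made concrete with a minimal Python sketch. It treats Age as derived from a birth date and an as-of date, consistent with note 2, and it turns the leap-day rule into an explicit, hypothetical parameter. The function name and the two conventions are illustrative assumptions. They do not describe any particular legal or actuarial rule.

```python
from datetime import date


def age_in_years(birth: date, as_of: date, leap_birthday: str = "mar1") -> int:
    """Completed years of age on `as_of`.

    `leap_birthday` (a hypothetical parameter) says when someone born on
    February 29 has a birthday in a non-leap year: "feb28" or "mar1".
    """
    if leap_birthday not in ("feb28", "mar1"):
        raise ValueError(f"unknown convention: {leap_birthday!r}")
    years = as_of.year - birth.year
    try:
        # This year's birthday, if the calendar has that date
        birthday = birth.replace(year=as_of.year)
    except ValueError:
        # Only reachable for a February 29 birth in a non-leap year
        if leap_birthday == "feb28":
            birthday = date(as_of.year, 2, 28)
        else:
            birthday = date(as_of.year, 3, 1)
    if as_of < birthday:
        years -= 1  # birthday not yet reached this year
    return years


born = date(2000, 2, 29)
as_of = date(2023, 2, 28)
print(age_in_years(born, as_of, "feb28"))  # 23
print(age_in_years(born, as_of, "mar1"))   # 22
```

On the same day, the same person is 23 under one convention and 22 under the other, so a data dictionary entry for Age has to say which convention applies. Whether a fractional value such as 59 ½ is legitimate is a separate decision that this integer-valued sketch does not address.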

We know from long experience that inconsistent interpretations of data items by sponsoring end users and by developers account for a lot of the late-stage reworking (unplanned "refactoring") that causes schedule slippage, cost overrun, and user disappointment.

What happened to data dictionaries?

Those lessons had been fairly well learned a decade ago, and for a while rigorous data definition was taken for granted in mainstream courses, proprietary methodologies, and a few textbooks. But then a couple of fad methodologies put an abrupt end to that, and the lessons of the past are mostly forgotten by today's practitioners.

The demise of the formal data dictionary resulted from a growing distaste for most forms of user-requirements documentation. On more and more large projects today we find data definitions, if they exist at all, embedded within use cases (UML), user stories (XP), or both: process-driven rather than data-driven documentation. Defenders of this practice claim that the data items are thereby sufficiently defined for both the sponsoring end-user representatives and the developers. When this approach is deliberately chosen, we call it an implicit data dictionary.

The embedding is subtle:

I've encountered these phenomena surprisingly often in projects driven by the most dogmatic UML- or XP-zealots.

Not mutually exclusive

When I encounter such a project and sense that it's in serious trouble, I try to persuade the project team to establish a data dictionary independent of their other methodology choices. You don't have to give up anything. You can still have use cases along with a data dictionary. You can still have user stories along with a data dictionary. In fact, both of them, as well as report specifications and other documentation, will become a lot simpler once we replace the embedded data information with simple references.

When they take that advice, project teams almost always discover inconsistencies or ambiguities that might otherwise have gone undetected until late in the project.
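What "simple references" buy can be shown with a minimal Python sketch. It assumes a hypothetical convention in which use-case or user-story text names data items in braces, such as {Birth date}, and it models the data dictionary as a plain mapping from name to definition. A mechanical check like this only catches references to items that were never defined. It cannot tell whether a definition matches what users and developers actually mean, and that still takes human review.

```python
import re

# Hypothetical notation: narrative text refers to data items as {Item name}
REFERENCE = re.compile(r"\{([^{}]+)\}")


def undefined_references(text: str, dictionary: dict[str, str]) -> list[str]:
    """Names referenced in `text` with no dictionary entry,
    in order of first appearance, without duplicates."""
    missing: list[str] = []
    for match in REFERENCE.finditer(text):
        name = match.group(1).strip()
        if name not in dictionary and name not in missing:
            missing.append(name)
    return missing


data_dictionary = {
    "Policyholder age": "Completed years between Birth date and Policy effective date",
    "Birth date": "Calendar date on which the policyholder was born",
}

use_case_step = (
    "3. The system computes {Policyholder age} from {Birth date} "
    "and looks up the {Premium rate} for that age."
)

print(undefined_references(use_case_step, data_dictionary))  # ['Premium rate']
```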


1 -- Tom DeMarco, Structured Analysis and System Specification, 1978, Yourdon Press, ISBN 0-917072-07-3.

2 -- Asserting that Age is self-descriptive wasn't DeMarco's only mistake. By discussing decimal digits, he also echoed the common error of confusing what a data item looks like (its COBOL PICTURE) with what it is. (Note that Age is a derived and perishable data item: we can define it and use it in business rules, report specifications, etc., but we should never store it in a data base or permanent file.)


Last modified September 10, 2005