Misguided advice from some "experts" . . .

Avoid Redundant Data Validation

Conrad Weisert, 1 July 2011
©2011, Information Disciplines, Inc.

NOTE: This article may be reproduced and circulated freely, as long as the copyright credit is included.


Good or bad advice?

Every function or subroutine should validate its arguments. For example:

  • If a function takes a Date parameter, it should check that the date is legal, i.e. that the month is between 1 and 12, and so on.
  • If a function takes a MailingAddress parameter, it should check that the U.S. ZIP code (or foreign equivalent) is properly formatted, that the City is not blank, etc.

If the validation fails, and if the programming language supports exceptions (or assertions), the program should then throw (or assert) an out-of-range or illegal-argument condition.

If we don't do that, we risk hard-to-diagnose errors, even loss of control, downstream in the processing, as program logic tries to process the illegal data.

I heard that old programming quasi-standard again last week in a presentation by an alleged "expert". The mostly student audience nodded appreciatively.

Such a practice is, of course, not only painfully redundant and inefficient, but often also harmful to the integrity of a program design.

Object-oriented safety

One of OOP's valuable contributions to program simplicity is the ability to localize validity checking (input editing) to the constructors. Once an object has been created, we should be able to depend upon its value being legal. For example, no matter what the internal representation of a Date object is, it should be impossible for any Date constructor (or any other method in the Date class) to leave a Date object in an illegal state. Therefore, a function that receives a Date parameter knows that it's a legal date and needn't bother doing a costly [1] revalidation.

The source of the above quasi-standard confused two very different things:

  1. A Date parameter to a function, and
  2. three integer <year, month, day> parameters to a function.

But the only function that should take the latter is a constructor for the Date class, which, of course, must do all needed validation.
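
As a minimal sketch of that division of labor, consider the C++ fragment below. The Date class, its members, and the quarterOf function are hypothetical names chosen for illustration, and the sketch assumes the Gregorian calendar and a simple year/month/day internal representation. The constructor is the only place where three raw integers are examined; quarterOf receives a Date and simply uses it.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical Date class: every legality check lives in the constructor.
class Date {
public:
    Date(int year, int month, int day)
        : year_(year), month_(month), day_(day) {
        // The month is checked first, so daysInMonth is never called
        // with an out-of-range month.
        if (month < 1 || month > 12)
            throw std::invalid_argument("month must be between 1 and 12");
        if (day < 1 || day > daysInMonth(year, month))
            throw std::invalid_argument("day is out of range for the month");
    }

    int year() const { return year_; }
    int month() const { return month_; }
    int day() const { return day_; }

    // Gregorian leap-year rule (an assumption of this sketch).
    static bool isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    // Precondition: 1 <= month <= 12.
    static int daysInMonth(int year, int month) {
        static const int days[] = {31, 28, 31, 30, 31, 30,
                                   31, 31, 30, 31, 30, 31};
        return (month == 2 && isLeapYear(year)) ? 29 : days[month - 1];
    }

private:
    int year_, month_, day_;
};

// Receives a Date, so it can rely on the value being legal.
// No revalidation of the month is needed here.
int quarterOf(const Date& d) {
    return (d.month() - 1) / 3 + 1;
}
```

With this sketch, quarterOf(Date(2011, 7, 1)) yields 3, while an attempt to construct Date(2011, 2, 29) throws and leaves no object behind for quarterOf to worry about.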

Unfortunately, a program can undermine that safety by deliberately creating uninitialized or incomplete objects. Extremely long constructor parameter lists sometimes tempt us to create an empty object and then fill in the fields one-by-one with set (write accessor or mutator) functions like this:

    USMailingAddress addr;
    addr.setCity("Chicago");       //  Don't 
    addr.setZip("60606");          //    do 
    addr.setStreet("Suite 513",    //      this!
                   "120 East Wacker Drive");

That technique risks leaving the object in an inconsistent, incomplete, or illegal state. What if setZip throws an exception?

Worse, the practice is often used to justify the inclusion of public set functions that serve no other sensible purpose and invite violations of data integrity. Would a customer move to the same street address in a different city? Hardly. Contrary to some textbooks, write accessors are rarely needed in a class definition. A constructor is (or should be) an atomic function; if it gets inconsistent parameters or if it can't complete its work, then no object is created.
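
One way to picture the atomic alternative is the C++ sketch below. Everything in it is an assumption made for illustration: the member names, the single street string (the example above passes two street lines), the two-letter state field, and the deliberately crude ZIP check that accepts only five digits or ZIP+4. The point is only that the class offers a validating constructor and read accessors, and no set functions at all.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical address class with no public set functions.
class USMailingAddress {
public:
    // Atomic construction: either every field is acceptable and the
    // object exists, or an exception is thrown and no object is created.
    USMailingAddress(std::string street, std::string city,
                     std::string state, std::string zip)
        : street_(std::move(street)), city_(std::move(city)),
          state_(std::move(state)), zip_(std::move(zip)) {
        if (street_.empty() || city_.empty())
            throw std::invalid_argument("street and city must not be blank");
        if (state_.size() != 2)
            throw std::invalid_argument("state must be a two-letter code");
        if (!isZip(zip_))
            throw std::invalid_argument("ZIP must be 5 digits or ZIP+4");
    }

    // Read accessors only.
    const std::string& street() const { return street_; }
    const std::string& city() const { return city_; }
    const std::string& state() const { return state_; }
    const std::string& zip() const { return zip_; }

private:
    // Accepts "ddddd" or "ddddd-dddd" (format check only).
    static bool isZip(const std::string& z) {
        if (z.size() != 5 && z.size() != 10) return false;
        for (std::size_t i = 0; i < z.size(); ++i) {
            if (i == 5) {
                if (z[i] != '-') return false;
            } else if (!std::isdigit(static_cast<unsigned char>(z[i]))) {
                return false;
            }
        }
        return true;
    }

    std::string street_, city_, state_, zip_;
};
```

A customer who moves gets a new USMailingAddress object, built and validated as a whole, rather than an old one patched field by field.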

Non-object equivalent

We don't need OOP to enjoy the safety described above. For decades before we knew about OOP, good programmers had been cleanly separating external data representation from internal. When data originates in the outside world (including keyboard entry), an input editing function:

  1. validates its legality, and
  2. converts it to internal representation.

From that point on, the value may or may not be correct, but we can be sure that it's legal and internally consistent. Unlike in genuine OOP, the internal representation is not hidden from the rest of the program, but disciplined practice can provide equivalent simplicity and safety.

A program should never pass unedited data from an external source to a processing function.
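
The same discipline can be sketched without classes. In the fragment below the names DateRec, editDate, daysInMonth, and dayOfYear are all hypothetical, and the external format "YYYY-MM-DD" is merely an assumption for the example. The input editing function performs both steps, validation and conversion; the processing function then works on the open internal record without checking it again.

```cpp
#include <cassert>
#include <cstdio>

// Internal representation, visible to the whole program (nothing is hidden).
struct DateRec {
    int year;
    int month;
    int day;
};

// Precondition: 1 <= month <= 12. Gregorian calendar assumed.
int daysInMonth(int year, int month) {
    static const int days[] = {31, 28, 31, 30, 31, 30,
                               31, 31, 30, 31, 30, 31};
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : days[month - 1];
}

// Input editing function: validates external text and converts it to the
// internal representation. Returns false and leaves *out untouched if the
// text is not a legal date.
bool editDate(const char* text, DateRec* out) {
    int y = 0, m = 0, d = 0;
    char extra = '\0';
    // Exactly three numbers must be converted; a fourth successful
    // conversion means there were trailing characters.
    if (std::sscanf(text, "%d-%d-%d%c", &y, &m, &d, &extra) != 3) return false;
    if (m < 1 || m > 12) return false;
    if (d < 1 || d > daysInMonth(y, m)) return false;
    out->year = y;
    out->month = m;
    out->day = d;
    return true;
}

// Processing function: by convention it is handed only edited data,
// so it does not re-check the month or the day.
int dayOfYear(const DateRec& date) {
    int n = date.day;
    for (int m = 1; m < date.month; ++m) n += daysInMonth(date.year, m);
    return n;
}
```

Nothing in the language stops a careless programmer from filling in a DateRec by hand; the safety here comes from the convention that every DateRec in the program has passed through editDate.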

Needed validations

It's possible, of course, that a parameter may have a perfectly legal value that is nevertheless out of the range the function can handle. Then context validation (usually of the whole object) is appropriate and desirable. For example [2]:

    assert(startDate <= endDate);

    if (balance > creditLimit) . . .

But in all cases we should still be able to assume that each object conforms to its specifications.
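
The distinction can be made concrete with a small C++ sketch. The names here are hypothetical: Date is a bare stand-in for a class whose constructor has already guaranteed legal values, and DateRange is an invented class that needs its two dates in order. The sketch throws an exception where the fragment above uses assert, only so that the check also runs in builds that disable assertions. DateRange examines the relationship between the two dates and nothing else.

```cpp
#include <cassert>
#include <stdexcept>
#include <tuple>

// Stand-in for a Date whose legality is assumed to be already guaranteed.
struct Date {
    int year;
    int month;
    int day;
};

// Chronological comparison, year first, then month, then day.
bool operator<=(const Date& a, const Date& b) {
    return std::tie(a.year, a.month, a.day) <=
           std::tie(b.year, b.month, b.day);
}

// Context validation: each Date is legal on its own, but this class can
// only handle a start date that does not follow the end date.
class DateRange {
public:
    DateRange(const Date& start, const Date& end)
        : start_(start), end_(end) {
        if (!(start <= end))
            throw std::invalid_argument("start date must not follow end date");
    }

    const Date& start() const { return start_; }
    const Date& end() const { return end_; }

private:
    Date start_, end_;
};
```

Note what DateRange does not do: it never asks whether a month lies between 1 and 12.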

Special situations?

Computer programming is a vast open-ended discipline that continually encounters new situations. If you've encountered situations where either:

  1. revalidation of an object, or
  2. member-by-member initialization by set functions

is necessary or advisable, let me know [3] (cweisert@acm.org), and I'll amend this article accordingly.

[1] Depending on the internal (private) representation, the cost of revalidation may be far higher than the cost of the original validation. Considerable computation may be required for a month() accessor, for example, to extract the month number from a serial number date.

[2] The example, using relational operators, is valid C++ or C#. The Java equivalent would use named functions.

[3] Or send me your rebuttal article to post on this web site.

Last modified 1 July 2011
