Why do OOP gurus reject objects?

Too Many String Data Items

by Conrad Weisert
July 21, 2010
© 2010 Information Disciplines, Inc.

This article may be circulated freely as long as the copyright notice is included.

Surprising examples

In the past week's reading of articles and textbooks, I've encountered these data declarations in source code (Java, C#, etc.):

    string  customerName;
    string  city;
    string  productCode;

What do those declarations mean? The first one specifies that a customerName can be a sequence of between 0 and 65535 [1] positions, each of which is a letter, a digit, a punctuation mark, or a control character. That's probably not what the programmer intended.

The second example combines the same flaw with a misleading data name. The programmer obviously meant cityName. In an object-oriented programming world, a code reader would naturally expect city to be an object that models the properties (e.g. name, location, population, founding date) and behavior of a city.

Note that a CityName object would be a member of City (where its data name would be just name), but could also be used independently, e.g. for a field on a mailing label. Presumably the City class would have a name() accessor function [2], which should return a CityName object, not a string [3].
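As a rough sketch of that relationship in Java: the class shapes, the 60-character limit, and the population member below are all assumptions chosen for illustration, not a recommended standard. The point is only that City holds a CityName and that name() hands back a CityName rather than a raw string.

```java
import java.util.Objects;

/** A city's name as its own type, rather than an unconstrained string. */
final class CityName {
    private static final int MAX_LENGTH = 60; // assumed limit, for illustration only
    private final String text;

    CityName(String text) {
        Objects.requireNonNull(text, "text");
        String trimmed = text.trim();
        if (trimmed.isEmpty() || trimmed.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("not a plausible city name: \"" + text + "\"");
        }
        this.text = trimmed;
    }

    /** Conversion to text for output, e.g. a field on a mailing label. */
    @Override public String toString() { return text; }

    @Override public boolean equals(Object other) {
        return other instanceof CityName && ((CityName) other).text.equals(text);
    }

    @Override public int hashCode() { return text.hashCode(); }
}

/** A city modeled as an object; its name is one member among several. */
final class City {
    private final CityName name;
    private final long population;

    City(CityName name, long population) {
        this.name = Objects.requireNonNull(name, "name");
        this.population = population;
    }

    CityName name() { return name; }          // returns a CityName, not a String
    long population() { return population; }
}

class CityDemo {
    public static void main(String[] args) {
        // "Springfield" and 100000 are made-up sample values.
        City city = new City(new CityName("Springfield"), 100000L);
        CityName labelField = city.name();    // usable independently of City
        System.out.println(labelField);
    }
}
```

Conversion to a character string happens only at the edge, where the name is actually printed.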

Rejecting objects

The surprising thing is that many of these examples come from sources that claim to support the object paradigm. OOP offers an obvious, simple, and type-safe way of representing such data items. We pointed this out seven years ago, but the flood of silly string data continues at an increasing rate.
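To make "type-safe" concrete, here is a minimal Java sketch. The product-code format (three uppercase letters, a hyphen, four digits) is purely hypothetical, as are the class and method names; a real organization would substitute its own rules.

```java
import java.util.Objects;
import java.util.regex.Pattern;

final class ProductCode {
    // Hypothetical format: three uppercase letters, a hyphen, four digits.
    private static final Pattern FORMAT = Pattern.compile("[A-Z]{3}-[0-9]{4}");
    private final String value;

    ProductCode(String value) {
        Objects.requireNonNull(value, "value");
        if (!FORMAT.matcher(value).matches()) {
            throw new IllegalArgumentException("malformed product code: " + value);
        }
        this.value = value;
    }

    @Override public String toString() { return value; }
}

final class CustomerName {
    private final String value;

    CustomerName(String value) {
        Objects.requireNonNull(value, "value");
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty customer name");
        }
        this.value = trimmed;
    }

    @Override public String toString() { return value; }
}

class OrderDemo {
    /** The parameter types state, and the compiler enforces, what each argument means. */
    static String describeOrder(CustomerName customer, ProductCode product) {
        return customer + " ordered " + product;
    }

    public static void main(String[] args) {
        CustomerName customer = new CustomerName("A. Customer");
        ProductCode product = new ProductCode("ABC-1234");
        System.out.println(describeOrder(customer, product));
        // describeOrder(product, customer);  // would not compile: arguments swapped
    }
}
```

With two string parameters, the swapped call in the last comment would compile and fail, if at all, only at run time.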

Is avoiding objects an error or just a poor choice? Experienced programmers could have a long debate on that without agreeing [4]. However, at best such code suggests a naïve beginner and betrays a lack of understanding of what OOP is about.

So, when should we use string data?

String data items are appropriate for actual text. If you're developing a compiler, a query decoder, or a word processor, then your input consists of character strings. If you're generating a report or free-text messages, then your output consists of character strings. That's what character strings are for.

It starts with the data dictionary

In a mature software development organization, individual programmers wouldn't even be making these choices. Why should each computer application system have its own way of representing, say, names of people? A corporate data administrator or enlightened analyst, having recognized the need, would have proposed a standard. Upon appropriate review and approval that standard would have been disseminated through a corporate data dictionary or other standards repository. Then if the organization practiced object-oriented programming, the supporting class definition would be developed and placed in a central library.

From that point on, programmers wouldn't be tempted to declare a PersonName data item as a string, and doing so would be recognized by everyone as an error.
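What such a library class might look like is sketched below in Java. Everything here is an assumption: the split into given and family names, the two output forms, and the method names stand in for whatever the organization's approved standard actually specifies (and a real standard would have to cope with names that don't fit this two-part shape).

```java
import java.util.Objects;

/** Hypothetical central-library class implementing an approved PersonName standard. */
final class PersonName {
    private final String given;
    private final String family;

    PersonName(String given, String family) {
        this.given = requireText(given, "given");
        this.family = requireText(family, "family");
    }

    private static String requireText(String s, String label) {
        Objects.requireNonNull(s, label);
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException(label + " name is empty");
        }
        return trimmed;
    }

    /** Form for sorted listings: family name first. */
    String sortForm() { return family + ", " + given; }

    /** Form for correspondence: given name first. */
    String displayForm() { return given + " " + family; }

    @Override public String toString() { return displayForm(); }
}

class PersonNameDemo {
    public static void main(String[] args) {
        PersonName name = new PersonName("Ada", "Lovelace");
        System.out.println(name.sortForm());
        System.out.println(name.displayForm());
    }
}
```

Every application that uses the shared class gets the same representation and the same formatting rules, instead of each inventing its own.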

[1] or some other huge implementation-defined size limit.
[2] or getName() for those who prefer verbs as function names.
[3] I'm indebted to my colleague Nevin Liber for pointing out the need for this clarification, inserted August 20.
[4] In grading students' work and in project code reviews I consider this a serious error, one that can undermine software maintainability and reliability.

Last modified August 20, 2010