September 28, 2008
NOTE: This article may be reproduced and circulated freely, as long as the copyright credit is included.
I was recently surprised to discover that two students in an intermediate programming course lacked facility in understanding and manipulating binary integers, the internal data format of most computers. Those students were uncomfortable, therefore, with shifting, masking, and bitwise logical operations, as well as with octal and hexadecimal notations.
It's certainly possible for some computer programmers using certain high-level tools to avoid confronting binary data. SQL and XML are important character-based tools. Many of the popular scripting languages and spreadsheet processors require no knowledge of binary data. Some older programming languages such as COBOL dealt exclusively with characters and decimal numbers.
But the C family of programming languages—C, C++, Java, and C#—not only supports binary integers but relies heavily on them as the basic numeric data type. You cannot do any serious programming in those languages if you're uncomfortable with binary integers.
The same thing applies to the hardware. A few early computers, such as IBM's very popular 1400 minicomputer series, featured a kind of built-in decimal arithmetic. Those computers, however, were messy to program in comparison with a "clean" binary machine, and today the main memory and registers of just about every computer are based on 32-bit or 64-bit binary integers.
So, many computer users, including some programmers, can survive knowing little about binary data formats. Computer scientists and serious professional programmers, on the other hand, continue to deal with them routinely.
Since it's extremely awkward to read or write a long binary integer, shorthand notations were needed. When popular computers had 36-bit words, divisible by 3, octal (base 8) notation was an obvious choice. It was easy to memorize the bit patterns that made up the octal digits 0-7, so that upon seeing
0774425560
a programmer instantly visualized
000 111 111 100 100 010 101 101 110 000.
When 32-bit words and 8-bit bytes, divisible not by 3 but by 4, took over, it was more natural to use hexadecimal (base 16) notation. That required 6 additional digit characters, and the unfortunate choice was the letters A-F. Instant visualization wasn't quite so easy then, unless you made your living looking at memory dumps, but the notation was still more convenient than binary zeros and ones.
The C family of languages supports both octal and hexadecimal literal constants in source code. Every serious programmer needs to understand them.
A concise explanation of binary, octal, and hexadecimal can be found in Wikipedia under Binary_numeral_system.
Last modified July 3, 2012