by Conrad Weisert, April 10, 2006
© Information Disciplines, Inc., Chicago
NOTE: This article may be circulated freely as long as the copyright notice is included.
A couple of years ago Software Development Magazine featured a short case study illustrating the so-called "test first" approach to program development. The task was a simple one that appears in hundreds of textbooks on programming: Decompose a positive integer into its prime-number factors. The author showed that a correct program would emerge by successively
Starting with 2 he showed the reader how some rather trivial code that had little apparent relationship to a factoring algorithm would yield the correct result (a single prime factor: 2). He went on to code test cases for 3, 4, 5, 6, 7, and so on. Each time he ran a new test case, he would modify or extend the factoring function to get the right answer, so that, for example, 6 yielded two prime factors: 2, 3. |
Update, May, 2006This article generated some vigorous responses from readers holding a broad range of views. Several, even a couple who supported the concept, objected to the term "document first programming", on the grounds that it just sounds bureaucratic. Accordingly, I'm substituting another term that more exactly captures what we're trying to do in software development: Think-First ProgrammingIf you consider documentation a vital step in thinking, then that amounts to the same thing. If not, then I wish you luck. |
The problem was, that even if you believed that the routine worked, it was an atrocious implementation in almost every way. The process of writing the least code that made each test case yield the right answer produced a function that:
That sounds like the distaster it was, but the author meant it as a positive demonstration. Of course, one atrocious example doesn't invalidate a technique. This was test-first done badly, but thoughtful programmers might do it better. Here, however, we're going to try something different.
Once we know that we need to factor integers, we need to specify an exact Java interface. Before writing any code we think a few minutes and then write the following:
// Decompose integer into prime factors // ------------------------------------ // Upon return result[0] contains the number of factors and // result[1] . . . result[result[0]] contain the factors in ascending order. public static long[] primeFactors(long N)
Having decided upon the above interface, we now know:
// Decompose integer into prime factors // ------------------------------------ // Upon return result[0] contains the number of factors (0 if N is 0), and // result[1] . . . result[result[0]] contain the factors in ascending order. public static long[] primeFactors(long N) {long [] fctr = new long[64]; // Result array long n = Math.abs(N); // Guard against negative return fctr; }Now we know enough to code some test-driver cases if we want to, and if we're confident that we won't want to change the interface when we begin writing the algorithmic code.
Note that we now know:
Although we have in mind dividing by successive primes 2, 3, 5, 7, 11, 13, etc., we choose not to consume the space for a huge table. Instead we'll divide by 2, 3, and successive integers of the form 6N-1 and 6N+1. We'll start with the special cases:
// Decompose integer into prime factors // ------------------------------------ // Upon return result[0] contains the number of factors and // result[1] . . . result[result[0]] contain the factors in ascending order. public static long[] primeFactors(long N) {long [] fctr = new long[64]; // Result array long n = Math.abs(N); // Guard against negative int maximumFactor = 1 + (int) Math.sqrt(n); short fctrIndex = 0; if (n > 0) { // Guard against zero // First do special cases 2 and 3 while (n % 2 == 0) {fctr[++fctrIndex] = 2; n /= 2;} while (n % 3 == 0) {fctr[++fctrIndex] = 3; n /= 3;} } fctr[0] = fctrIndex; // Store number of factors return fctr; }We can now expect to get correct results from test cases for 0, 1, 2, 3, 6, 8, 9, 12, 16, 18, 24, etc., and their negatives. If we like, we can run tests for those cases now, and the results will confirm that we're on the right track.
Proceeding to the main body of the algorithm we insert the loop:
// Decompose integer into prime factors // ------------------------------------ // Upon return result[0] contains the number of factors and // result[1] . . . result[result[0]] contain the factors in ascending order. public static long[] primeFactors(long N) {long [] fctr = new long[64]; // Result array long n = Math.abs(N); // Guard against negative int maximumFactor = 1 + (int) Math.sqrt(n); short fctrIndex = 0; if (n > 0) { // Guard against zero // First do special cases 2 and 3 while (n % 2 == 0) {fctr[++fctrIndex] = 2; n /= 2;} while (n % 3 == 0) {fctr[++fctrIndex] = 3; n /= 3;} // Then every 6n-1 and 6n+1 for (int k = 5; k <= maximumFactor; k += 6) for (int dvsr = k; dvsr <= k+2; dvsr+=2) { while (n % dvsr == 0) {fctr[++fctrIndex] = dvsr; n /= dvsr;} } } fctr[0] = fctrIndex; // Store number of factors return fctr; }
That nested loop will try divisors 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37, 41, 43, etc. Is that all right? Most of those trial divisors are prime but a few of them, such as 25 and 35 are not. As the programmer, I know that's not only acceptable but desirable, because:
I know that, but what about the future maintenance programmer? Might he or she be surprised by division by 25 and waste time studying the effect of that action? Such situations call for a courtesy explanation to inform the reader-programmer that the action was intended. I might amend the block comment before the loop like this:
// Then every 6n-1 and 6n+1. NOTE: Some trial divisors will be // non-primes, e.g. 25, 35, 49, 55. They have no effect, however, // since their prime factors will have already been tried.
We might now proceed to run all the test cases. Some of them reveal a bug, however, which we must repair. The final factor may or may not have been stored in the result array. We need to insert the following line upon exit from the main loop:
if (n > 1) fctr[++fctrIndex] = n;
All our test cases, however we may have packaged them, now run correctly.
Alas, our feeling of satisfaction is short-lived. For a while we felt smugly superior to the Software Development author because we knew to quit when the divisor exceeded its maximum possible value, the square root of the original number. For large prime numbers that made a huge difference in performance. But as one perceptive reviewer pointed out, we can quit after the square root of the current quotient, which may be a lot smaller than the original argument.
So, finally (we hope), we get rid of
maximumFactor
, and change the terminating
condition in the controlling
for
statement accordingly. Naturally,
we update the block comment to reflect what the code actually does. Note that we
really need commentary to explain this, not just better data naming or clearer
constructs.
The intent is to compare the current divisor with the square root of the current
quotient, but we don't want to burden the function with actually taking square roots
inside the loop. The extra multiplication will make the function slightly slower for
large prime numbers, but eliminating those unnecessary iterations will greatly speed it
up in other cases.
You can see the final version in IntegerFunction.java.
Our document-first (or think-first) strategy yielded an implementation in which improvements like this last one were easy and straightforward. On the other hand, the test-first process that never revealed the algorithm is less hospitable to incremental improvement.
Our document-first strategy has produced a reasonable program module, while the published test-first version produced an atrocity. But we shouldn't draw from that any conclusions about the relative merit of the two approaches. In fact, they're not mutually exclusive. You can interleave writing documentation with coding test drivers to suit your preferred style.
Independent of the development sequence, you can expect to get a better program from a good programmer than from an incompetent one.
^{2} -- Given 1000000007 it tries over a billion divisors; the version we show here needs fewer than 16 thousand, and you can easily find faster implementations.
^{3} -- Among other sins, using static shared variables to communicate among functions.
Return to IDI home page
Main article
Last modified May 8, 2006