Monday 30 October 2017


Don Knuth [prev] advises young people to follow their own passion and be really wary of trendy and faddy scientific fashion. There was a famous set-to in 1986 between Doug McIlroy and Don Knuth. In parallel they were given a text file and an integer k, and asked to write a program to print the k most common words in the file, with the number of occurrences of each, in order of decreasing frequency.
Knuth wrote ten pages of fully commented Pascal 'literate programming' using an associative data structure and hashing, which on close scrutiny had edge-cases that would cause a crash and some other (trivial) bugs.
McIlroy wrote an effective, if not notably efficient, six lines of shell script -
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
Here's what that telegraphic code means (about two-thirds of the way through the piece). The same source has a final slap at The Master: "Knuth has shown us here how to program intelligibly, but not wisely. I buy the discipline. I do not buy the result. He has fashioned a sort of industrial-strength Fabergé egg—intricate, wonderfully worked, refined beyond all ordinary desires, a museum piece from the start." Ouch! Really what McIlroy is saying is "not fit for purpose", in a similar way to my reflections on writing far too long an explanation for an everyday matter. But make no mistake: Knuth is part of the Pantheon of computing, like Dennis "dmr" Ritchie and Grace "Cobol" Hopper.
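For anyone who, like me, finds the six-line script a bit too telegraphic, here is a stage-by-stage reading of it. This is only a sketch: the wrapper function name `top_words` is my own invention, not McIlroy's, and I am assuming plain ASCII text arriving on standard input.

```shell
#!/bin/sh
# top_words K: read text on stdin and print the K most frequent words,
# each preceded by its count, most frequent first.
# NOTE: "top_words" is a hypothetical name used only for this illustration.
top_words() {
    tr -cs 'A-Za-z' '\n' |  # turn every run of non-letters into one newline: one word per line
    tr 'A-Z' 'a-z' |        # fold upper case to lower case so "The" and "the" match
    sort |                  # bring identical words next to each other
    uniq -c |               # collapse each run of duplicates into "count word"
    sort -rn |              # sort numerically by count, largest first
    sed "${1}q"             # print the first K lines, then quit
}
```

It would be called as something like `top_words 10 < some_file.txt`, where `some_file.txt` stands for whatever text you have to hand. The exact padding in front of each count depends on which `uniq` you have, so don't rely on the column widths.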

My programming was always a kludge; I never believed that elegant telegraphic code such as McIlroy's above would work. I couldn't ever get my head around associative arrays: my mind just didn't work that way. So my code was lumbering and inefficient, but at least I put comment lines in so that normal people could follow my logic even as they snickered at the inefficient loopy way in which I achieved an end. Almost all my programs included GOTO statements, which actually caused physical pain to the real coders I used to work with. For the sort of tasks I was engaged upon - analysing the 25,000 protein-coding genes of the human genome in, say, 2002 - the inefficiencies didn't matter much. My program might deliver results in 15 minutes rather than 10. It did show when doing all-against-all high-throughput analyses, which took 30 hours to run. I was out of my depth then and went off to walk in Spain.

But this isn't about me, it is about Donald E. Knuth . . . from the horse's mouth:
  • The whole thing that makes a mathematician’s life worthwhile is that he gets the grudging admiration of three or four colleagues.
  • Science is what we understand well enough to explain to a computer. Art is everything else we do.
  • Beware of bugs in the above code; I have only proved it correct, not tried it.
