Friday 19 January 2024

'tis an ICL wind

There's a hoot-nanny scandal fanning up across the water over the last tuthree weeks. ~25 years ago, the UK Post Office procured software to reduce the paper mountain of audit trail and help out the thousands of sub-posties who were selling stamps and doling out pensions across the country. It's big business: the pensions alone turn over £2 billion a week. When a new proprietor takes over the franchise they are given a float of £30K, about enough for the first 150 pensions. All that cash is one reason why hoods hold up sub-Post Offices. Obvs, much of that throughput is now done by bank transfer, but that was not at all the norm back when the need for accounting software was identified and the contract put out to tender. The winner, with the lowest bid, was an established British >!huzzah!< company called ICL - International Computers Limited.

With 20/20 hindsight, it is clear that ICL over-promised and under-delivered; there was also mission-creep from the client side, bluster from ICL sales and lack of vision from ICL IT Centraal. Doubtless there were competent programmers and engineers in the mix but ICL delivered an evolving kludge: any unexpected results [lots] were patched up until the next anomaly or crash got people off the golf-courses and back at their terminals on the weekend. ICL was eventually on its knees and bought by Japanese MegaCorp Fujitsu. The narrative now is about soaking Fujitsu [and good luck with that! - check the fine print in the original contract] for the compo which the unfortunate sub-posties deserve. But the fault ultimately lies at the door of the management of ICL back in the mid-1990s.

Fastbackwards to the mid 1980s . . . 

In 2019, Trinity College Dublin, my old alma, noted the 50th Anniversary of its IT laboratory. They pushed out an obsessively detailed PDF to mark the occasion. It didn't make enough waves for them to invite the likes of me back for sherry-and-canapés although I had quite close ties with the IT Effectives in the mid-90s of the last century. With one of these self-taught gurus, I dramatically busted a young chap who'd hacked into the computer for which I was responsible. We were all self-taught in the early days.

In the mid/late 1980s, ten years before the Post Office procurement fiasco, a committee of senior TCD people was assembled to spec out a new mainframe computer to serve the academics and administrators of the college for the next several years. This committee got their ToDo list in order and put the contract out to tender. They had £1 million to spend because that's what it cost back then. The winner of that process was . . . ICL.

In 1990, I inherited my seat at the IT table from Des Higgins [prev on whales], who had left the country to write software and crunch data at the EMBL computer lab in Heidelberg. Des is an excellent field biologist, who taught himself how to code during his PhD on spiders. GenBank, the database of DNA sequences, was launched in 1982 and was doubling in size about every 15 months. In 1987 a research-active TCD Lecturer called Paul Sharp hired Des to compare these sequences in order to get a handle on the pattern and process of evolution. Multiple sequence alignment is famously an example of NP-completeness, which is a class of mathematical problems where a proposed answer can be verified easily - in polynomial time with a suitable computer - but for which nobody knows an easy way to find the best answer. Pairwise aligning 2 sequences is trivial: you can make a good fist of it with a word-processor but it does not scale. Time taken for a multiple sequence alignment MSA ~= len^N where len is the length of the sequences and N is the number of sequences that need to be included.
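To put rough numbers on that len^N scaling, here is a back-of-envelope sketch. The function names and the choice of len = 10 are mine, purely for illustration, and the cost model is the naive one from the paragraph above rather than a measurement of any real program.

```python
# Naive cost model from the text: time ~ len ** N, counted in seconds.
# All names and the example values are illustrative assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def msa_seconds(length, n_sequences):
    """Seconds needed under the naive model time = length ** n_sequences."""
    return length ** n_sequences


def in_years(seconds):
    """Convert a number of seconds into (Julian) years."""
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        secs = msa_seconds(10, n)
        print(f"N={n:2d}: {secs:>14,d} s = {in_years(secs):12.6f} years")
```

Each extra sequence multiplies the bill by len, which is why the curve gets out of hand so quickly.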

Problem? If two sequences take 1m 40s to complete then len^2 = 100 seconds but ten sequences take len^10 seconds which is about 300 years. Clearly, any non-trivial MSA is going to require some hefty compute-chops or a hella long time. Fortunately for the future of bioinformatics, in 1970 Needleman and Wunsch had conjured up an efficient recipe for the pairwise case, and by the late 1980s others had built on it a fix for this - effectively reducing any MSA problem to a series of pairwise [ie doable] alignments. It still required compute power but not 300 years of clank-and-whirr. Higgins and Sharp were delighted to push their frontiers with TCD's new high-spec ICL mainframe . . . except that it was often 'down'. It just couldn't be relied on to be working when they needed to run data through it.
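For the curious, the pairwise [ie doable] building block can be sketched in a few lines. This is a minimal Needleman-Wunsch-style global alignment, assuming a toy scoring scheme of my own choosing (match +1, mismatch -1, gap -1). It is not Des's code and not what any Clustal version actually runs; it only shows why two sequences are tractable: the work is a len x len table, not len^N.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not taken from any published matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        """Score for pairing a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two letters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

The table has (len+1) x (len+1) cells whatever the sequences say, so doubling the length quadruples the work rather than sending it off to the next century.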

Having been turned away by the IT people agane, Des retired to Kennedy's Bar on Westland Row and thought about the problem. The war-story is that he wrote the solution on a beer mat in the back room at Kennedy's but that's a metaphor. What he actually did was break down the MSA problem into 4 chunks, each of which was small enough to be solved by a desktop PC. Coming up with a clever name? Now that could be done on a beer-mat. They named Des's working kludge [cf non-working ICL kludge] Clustal for Clustered Alignment.

To solve an MSA problem you ran the data sequentially through Clustal I then Clustal II then Clustal III then Clustal IV. When I came on the scene a tuthree years later, Des had clagged the four modules into an all-singing, all-dancing ClustalV. Later, waggishly treating V [roman 5] as an alphabetical letter, the next version included weighting and was named V . . W = ClustalW then . . X ClustalX with an X-graphics interface. The current [and probably last] version is Clustal-omega [as in alpha and omega = beginning and end]. It won't have escaped your classical notice that this reads clustalω although it is usually written as ClustalΩ. Actually-actually it is usually written clustalo b/c QWERTY keyboard. [Probably the last] because Des retired, a bit early, a couple of years ago and there are a gazillion other options for doing MSA: some of which were on Des' watch [T-COFFEE with Cedric Notredame] and others not [MUSCLE, MAFFT, Kalign].

But for many in the field Clustal is Original and Best. Because it was written for personal computers rather than implemented on a million-dollar mainframe it became the tool of choice for Everyman. The several Higgins et al. Clustal papers have been collectively cited an eye-watering number of times. If ICL had delivered a computer that was reliable and fit-for-purpose, none of this would have happened.
