Wednesday, 28 February 2018

Trickle down

Doctors are supposed to sign up to an old saw - Primum non nocere = above all do no harm. I'd like to hope that when I rock up for my final admission to hospital, they'll go easy on the interventions and Do Nothing. But until then, what is a good life? I don't think that doing nothing rather than risk doing harm is the answer. The parable of the talents in the bible tells us not to sit on our assets but get them out into circulation where they can do some good. But if we buy into the doing good schtick, surely it's sensible to go with William MacAskill's book Doing Good Better: Effective Altruism and a Radical New Way to Make a Difference and do more good rather than minimal-good. The current justification for giving massive tax-breaks to the super-rich is that they can leverage the money more efficiently. If policy is the alternative of spreading the largesse more thinly then, the argument goes, there isn't enough extra to be noticeable. The benefit accrued by the Rich is supposed to trickle down to the rest of us in a rising tide that will float all boats. Lefty economists deny that the model works, maintaining that it just exaggerates the disparity between rich and poor.

On a few occasions in my life, a peculiar combination of circumstances has seen me masquerading as The Rich [the first against the wall when the revolution comes]. In 2001, an old friend of ours booked The Beloved and me a long weekend in La Frateria di Padre Eligio just outside of Cetona in Southern Tuscany. The Frateria had a rather peculiar business model. They charged a fabulous amount of money to stay in a 13th-century monastic site. The food was local, authentic and perfectly cooked. You could get a decent bottle of local wine but you could also order up some seriously expensive Brunello di Montalcino. The staff at the hostelry were almost all recovered drug-addicts from the inner city ghettos of Italy. They worked in an intentional community that was supported most directly by privileged people getting away from their busy business lives and buying some tranquility.

The site was perched near the top of a ridge of hills, with a chapel built by St Francis himself at the top of the complex - nearer to god, I guess. The lodgings were a bit further down, with a cascade of terraces lurching down to a small stream and a road at the bottom of the property. We had a breakfast each day in a room on the top storey with a view across a wide sweep of a valley to a range of blue hills in the distance. This view was framed and part hidden by floor length white linen curtains billowing in a cooling breeze. Coffee, fruit, bread and pastries. It would have been a gross impropriety to think about an Ulster Fry. If we wanted exercise, we could walk away from the gardens through a hanger of open woodland to visit a metal-working shop where some of the lads worked making sculptures and ornamental gates. There was a good choice of places to recover from the walk or just wait for lunch: lawns and loggias; metal-work benches (bring your own cushion); or low stone walls between pots of geraniums or thyme or rosemary; sun or shade. The gravel paths were always being swept or raked: it seemed to be part of The Practice: clear the path to see your way forward - because looking back to the city was not generally very happy in retrospect.

One evening we dressed up for the 7 course taster menu in the dining room. It was an amazing meal: a succession of bonnes bouches each so different, some challenging (roast quails like micro-chickens); each yummy; each complementing the previous course. We put ourselves in the hands of the sommelier who brought us glasses of this and that to suit the food. The only other people in the room were a middle-aged suit and the two significantly younger women he was entertaining. He, of course, wanted an expensive wine; it was brought; the label was inspected and approved; the bottle was taken away and opened; a jigger was poured; the man made a big fuss and then nodded decisively; three glasses were poured; everyone calmed down to eat. Then huge fuss, our friend the sommelier returned with elaborate and profuse apologies: the wine was corked, he'd thought it was okay earlier but now he was having second thoughts. He whisked away the three glasses of decisively approved wine and replaced them with new. I couldn't help but think that the whole charade was set up to humiliate the rich man.

As often on The Blob, this story has come rushing back at me like a train full of madeleines because of something more recent that has caught my attention. This was another intentional community in Cleveland, Ohio. On a corner of Shaker Square is a French restaurant called Edwins where almost all the workers in the kitchen and front of house have done time banged up in chokey and are now hard at work making a new life for themselves. It was the brain child of Brandon Chrostowski, himself a retired con. Since the place was set up in 2014, more than 200 ex-offenders have been through a gruelling training program to learn how to prep and serve haute cuisine so you can't see the join. The recidivism rate is about 1% - compared to about 40% for other rehabilitation schemes across the USA. They've made a movie called Knife Skills which you can watch if you are Stateside. The rest of us can make do with the trailer. Here's Chrostowski setting out his stall in 2013.

You know what they say "Give a man a fish and he'll eat for a day; teach a man to fish and you can get rid of him for the entire weekend."

Tuesday, 27 February 2018

PAM the matrix

Goddammit.  It would be a challenge to summarise the life and contributions of Margaret Dayhoff in a haiku or a tweet-length trib. When she sailed over the horizon of my attention two weeks ago, I resolved to write a regular 500-1000 word Blob for my Women in Science series. But in my first biog attempts I realised that I needed a lot of background to give context to her achievements and suddenly I'd written 600 words on sequence alignment and hadn't mentioned Dayhoff. So I hived that off, started again and wrote 1,000 words on her invention of the one letter code for proteins. Dang! and I'd only started scraping the surface.
Here [L, the chap with the longest hair] she is holding a hank of paper tape to show that she could handle data; and lots of it. Back in those days (1960s) input and output was largely through tree-based media: paper tape or Hollerith cards with holes punched out of them. Having dragooned the amino acids of the growing population of protein sequences into strings of letters, Dayhoff and others started to compare the strings to try to figure out a) where they came from and b) how they worked. Bizarrely, the first 3-D structures of proteins (which we think of now as being intrinsically more difficult to produce than DNA sequences) were worked out in 1958 [Kendrew, Myoglobin and Perutz], before working out the 1-D sequences really got going. Except in peculiar circumstances, nobody sequences proteins directly anymore: protein sequence is inferred from the DNA codons.

A key issue with devising a single 'distance' between two aligned sequences was what to do about the mismatches. Everyone agreed that same vs different was too simplistic a model to have much utility for 'difficult' cases. I looked at two better models to cope with wrong-wrong-almost-right. You can count the number of changes in the underlying DNA and score them from easy-to-change to hard-to-change. OR you can look at the intrinsic physico-chemical properties of amino acids - size, charge, hydrophobicity - and mix them up into a theoretical similarity score. Glycine with no side chain is well different from Phenylalanine which has a gurt big lumpy hydrophobic side-chain; lysine is positively charged and glutamate is negatively charged but they are both charged and they have exactly (+/- 1) the same molecular weight.

Margaret Dayhoff adopted a much more pragmatic wysiwyg approach. She gathered all the sequences in the nascent protein database and aligned them in pairs. For any pair of sequences that were at least 85% identical, she tallied up a) the number of places where the amino acid remained the same b) the nature and number of the differences.
These were gathered into a big 20 x 20 matrix, and after a bit of scaling this was published as a Point Accepted Mutation PAM matrix. From the initial PAM 1 [1% different] matrix a series of PAM 30 . . . PAM 120 . . . PAM 250 matrices were extrapolated to serve as models for more distantly [than 85% ID] related sequences. The top left corner of the PAM30 matrix appears [R]. It was very much a heuristic [good enough] solution to the problem and that fit really well with evolutionary biologists who observe nature's good enough solutions to survival on a daily basis.
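The tallying step is easy to sketch in a few lines of Python. This is only a toy under stated assumptions, not Dayhoff's actual procedure: the sequences are invented, the alignments are assumed to be ungapped and of equal length, the function names are my own, and the "bit of scaling" that turns raw counts into a published PAM matrix is left out entirely.

```python
from collections import Counter


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions carrying the same amino acid."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def tally_pairs(alignments, threshold=85.0):
    """Count unchanged and changed positions in closely related pairs.

    Each key is an unordered pair of one-letter codes, so ('D', 'E')
    covers both D-to-E and E-to-D; ('K', 'K') counts conserved lysines.
    """
    counts = Counter()
    for seq_a, seq_b in alignments:
        if percent_identity(seq_a, seq_b) < threshold:
            continue  # too diverged to trust the alignment
        for a, b in zip(seq_a, seq_b):
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Invented sequences, purely for illustration.
close_pair = ("MKVLDDEGAAKLLSTWQHRN", "MKVLDEEGAAKLLSTWQHKN")  # 18 of 20 match
distant_pair = ("MKVLDDEGAA", "MRILEDKGSA")                   # 5 of 10 match
counts = tally_pairs([close_pair, distant_pair])
print(counts[("D", "E")], counts[("K", "R")], counts[("L", "L")])
```

Only the close pair gets through the 85% filter, and its two differences are both of the conservative sort: an acid for an acid and a base for a base.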

It was also approved because it was based on actual observed changes and differences rather than what we-the-scientists thought was likely to occur. PAM matrices were hailed and adopted because they looked right and had considerable internal consistency. For example, all the cells on the main diagonal, coloured old rose in the diagram, are positive: in the majority of cases (85%+) the same AA is present in both sequences in the alignment, so they get a positive score. All the off-diagonal cells are negative because when one AA is replaced by another it can go to any one of 19 other possibilities and so each one is rare. EXCEPT the cells coloured caramel which are 'conservative substitutions': two amino acids that look and behave like each other (same size, same charge, same hydrophobicity etc.): D aspartic acid and E glutamic acid is one such case. Such changes are tolerated by the mills of evolution and so become a point accepted mutation. PAM matrices were the de facto gold standard for sequence comparison for a generation. They have largely been replaced by BLOSUM matrices which were invented by Henikoff and Henikoff in 1992.

Margaret Dayhoff? Definitely a nett contributor.

Monday, 26 February 2018

Anscombe's quartet

When I started working at The Institute January five years ago, I was given the most absurd workload. A couple of different classes required me to teach Excel - the spreadsheet programme. Knowing this since I was hired in early December, one of the very last things I did in Trinity before I left was to take a half day course in learning Excel. Some weeks, the only thing that kept my nose above the water was the belief that it was up to the students to figure things out and that being told things in detail by me would be counter-productive. Let us all give thanks for spread-sheet software - I had a data-analysis job before such things existed and it was hard work.

One of the things that I have really taken on board is that the first thing to do with any data is to graph it. The human eye is really good at picking out patterns and a graph will pick out any trends, grouping and outliers. After eyeballing the data, then you can carry out a statistical test to see if any of the patterns or comparisons are significant. Apart from anything else, a really good graph will have huge explanatory power for any report you make. If you do the statistics first, you may delude yourself into thinking that something is going on when in reality the data is so noisy that your 'significant' association has no explanatory power. Like my genetic and geographic distance plot. One of the proverbs of statistics is "Correlation is not causation"; nevertheless "r", the correlation coefficient between two variables, does give a quantified estimate of the strength of their association. r varies between -1 and +1, with values close to zero indicating mere noise. Doing that GenDist v GeogDist analysis from my PhD really brought home to me the value of r² which is formally the % of the variability in the dataset explained by the supposed relationship between the two variables. A relationship which has r = 0.8 will have a positive trend
Three weeks ago, I picked up a scrap of paper at work from a previous class that had been considering Frank Anscombe's insistence that the first thing you should do with data is to graph it. The four graphs show four datasets with highly significant positive [and suspiciously similar] associations: for each one r = 0.82 and r² = 0.67. These classic 'fudged' datasets are known as Anscombe's Quartet. They were published by Anscombe in 1973 to show that a correlation coefficient - for all its pretensions at objective truth - can be a piss-poor summary of what is actually going on.
  • Is there really a 'trend' in the data shown bottom right? It's the kind of picture that you'd get from plotting height against foot length for 10 mice and one cat. The outlier is driving the whole analysis.
  • The top two pictures really bring home that a correlation coefficient depends both on the slope of the relationship and on how tight it is.
  • The picture bottom left makes clear that a linear trend is a rather woolly approximation for the true nature of the relationship between the two variables.
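If you don't have the scrap of paper to hand, the arithmetic is quick to check. What follows is a minimal sketch: the numbers are the quartet as I know it from the published paper [typed in by hand, so check them against the original before you lean on them], and pearson_r is my own bare-bones helper rather than anything from a statistics package.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Datasets I-III share their x values; dataset IV has ten identical x
# values and a single outlier.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in QUARTET.items():
    r = pearson_r(xs, ys)
    print(f"dataset {name}: r = {r:.2f}, r squared = {r * r:.2f}")
```

Four near-identical lines of output for four utterly different pictures, which is the whole point: the summary statistic cannot tell them apart, but your eye can.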
Frank Anscombe was born in Hove, UK in 1918 but was head-hunted by Princeton and later Yale after WWII, when the US could pay salaries that seemed astronomical to benighted, rationed Brits. You can get the substance of his sermon on the value of graphs and analysis in the first page summary here. The rest of the gospel is behind a [modest] JSTOR paywall.

Sunday, 25 February 2018

A cursory inspection

I was over the water in England at the tail end of last week on the Generation Game. I arrived on Wednesday tea-time to visit The Boy and his family. The parents left almost immediately to have dîner à deux leaving me in sole charge of the Gdaus. The instructions were "read three short stories of her choice to Gdau.II [aged 2] and finish the chapter of Arthur Ransome's Winter Holiday . . . it'll be fine". And it was. The airy, obvious to all thinking people, instructions were sufficient. Amazingly, having switched off under my stern eye 👁, Gdau.II slept through the night for the first time in months winning the parents a valuable catch-up on their sleep deficit.

Earlier that day, I'd picked up a hire-car at the airport. From Green Motion [never heard of 'em] as it happens, because that popped up top on the Ryanair Website. Renting cars is a source of (usually unwarranted) anxiety for me, so I also bought some anti-excess-charges insurance on the same run through Ryanair's site. The Help at Green Motion walked me round the car and pointed out two scuff-marks on one of the alloy wheels and a tiny chip on the wing-mirror above. She flagged those on the rental sheet and said they'd charged the previous punter £140 for the damage. She also said the extra insurance was nothing to do with them: they'd charge me up to £984.60 damages and I would have to claim it back through the third party insurance.  I would never have noticed the marks if they had chosen to find them at the end of my contract, and an unscrupulous company or its employees could make a nice little earner out of it: the scratches that keep on giving. Seems that Green Motion has a black hat reputation for this sort of thing.

Having dealt with the outside, The Help waved at the dash-board of the car and said everything worked from that central console and I'd have no difficulty. I muttered something like my trusty 12 year-old Yaris don't have no central console but didn't want to emphasise what a rube I was, in case she found more dints and scratches when I returned the car at the end of the contract. Well, I was in trouble immediately because a fan came roaring on when I turned the ignition, which freaks me out because that sort of thing will quickly drain the Yaris battery. The central console had a hierarchy of menus at least partly driven by a touch-screen. The nearest I could get to 'fan' was something called climate control that allowed the driver to set the internal temperature of the car to the nearest half °C, which is as good as human physiology can manage for my core body temperature. Over the next 3 days, I eventually worked out that I could switch the fan off by cranking the internal T°C down to zero.

After a few kilometers getting the gears, lights and wipers sorted out, I pressed a four-squares icon on the console and fired up photograph mode. Not being Russian, I had no intention of clocking up hundreds of dash-cam photos on my journey. But I couldn't get back to the original default mode for the console, so was unable to adjust my temperature. The seat was way back but the arrows pointing up and down on the first console screen had no effect on my posture: these eventually transpired to be connected to the air circulating fan: did I want the draft in my face or up me trouser-leg >!frisson!< . For an all computerised car, I was disappointed to find that I had to crank the seat-back forward by hand.  The Help's airy confidence about how her cars worked was a good example of the curse of knowledge: of course everyone knows the difference between pressing a button and pressing-and-holding a button. I only discovered that difference when I had to set the clock and tune the radio on the Yaris. In my day buttons were on or off.

I've written about being unable to find reverse gear on a hire-car in the 1980s. More recently I couldn't open the petrol tank on another hire-car - there was a lever on the floor near the driver's feet: of course everyone knows that the obvious place to control access to petrol is a lever diametrically opposite to the petrol tank. I returned the car with a little more petrol in the tank than when I accepted it but there was no way they were going to credit my account for that. Avis, Hertz, Budget, Europcar or Enterprise next time I think.

Sunday Sunday 250218

Catch these:

Saturday, 24 February 2018

Maths through cards

I wrote this about 20 years ago for the Home Education Network HEN Newsletter. The small girls mentioned grew up straight and tall as Dau.I and Dau.II, so having many packs of playing cards in the home did them no harm. Playing cards was almost the entirety of their math-education! I was shocked to find out that about a third of my current students had never handled a deck of cards: no wonder innumeracy is so prevalent.

I have idled away hours (probably days) of my life playing solitaire or patience.  It's fun to play these, officially one-person games, with small children, although it requires a certain amount of 'patience' to play at a different (at the moment slower) pace.  I look forward to the days, now not far off, when they will be running circles around my dotage. Klondike is the variety that was originally fitted-as-standard in Windows.

There is a deal of maths in a pack of cards.  All those things beyond +, -, *, /, that appear in the children's maths books: Sets, sums, symmetry, equality, counting, logic....  If you think about it, there isn't much left outside of what they call mathematics nowadays.
Most of the games most people play seem to be competitive, even if, like strip-jack-naked or beggar-my-neighbour they involve neither a jot nor a tittle of skill.  I am sure there are well adjusted families who handle winning and losing with equanimity but we find it all rather stressful.  So here's a tuthree games of patience to play with your kids . . . or self if stuck in departures with a pack of cards.

1. Tens.  a. Shuffle a complete pack.  b. Deal out 12 cards.  c. If any of the 12 are 'face' cards (king, queen, jack) remove them and place at bottom of pack.  d. Fill in the holes.  e. Repeat steps c-d until 12  Ace-10 type cards remain.  Now you're ready to start.  Dealing from the top of the pack, cover up any PAIR of exposed cards that add up to 10 or any single 10s: Ace counts as one.  If you get it out, you'll have the 12 face cards exposed and all others covered.  Occasionally, you cannot complete the process, but usually you can get this one out.  No skill or significant decision making required.  After a few games you'll never forget the pairs of numbers that sum to ten.

2. Up and down.  a. Shuffle a complete pack.  b.  Deal out 28 cards face up in a pyramid - one card in the top row, covered by two in the next row, covered by three ... until the last row has seven cards.  c.  From the stock that remains in your hand, turn over the top card on the table in front of you.  d. If the value of one of the wholly uncovered cards is either one higher or one lower than the card in front of you, then take this card from the pyramid and add it to the pile in front of you.  e.  Repeat step d until you can make no more moves, then repeat step c.  It is rare to get this one out.  Some decision making is required.
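Because Tens needs no skill, it is simple enough to hand to a computer. Here is a rough Python sketch of the rules as set out in game 1 above, with a few assumptions of my own flagged loudly: every face card is lumped together as the number 11, the function names are invented, a single 10 is always covered before any pair, and the first workable pair found is the one that gets covered.

```python
import random
from collections import deque

FACE = 11  # stand-in value for any jack, queen or king


def find_move(tableau):
    """Return the pile positions to cover next, or None if stuck."""
    for i, value in enumerate(tableau):
        if value == 10:
            return (i,)
    for i in range(len(tableau)):
        for j in range(i + 1, len(tableau)):
            if tableau[i] + tableau[j] == 10:
                return (i, j)
    return None


def play_tens(rng):
    """Play one game of Tens; True means the whole pack was dealt out."""
    cards = list(range(1, 11)) * 4 + [FACE] * 12
    rng.shuffle(cards)
    pack = deque(cards)
    tableau = [pack.popleft() for _ in range(12)]
    # Steps c-e: face cards go to the bottom of the pack and their
    # holes are filled from the top until twelve Ace-10 cards remain.
    while FACE in tableau:
        hole = tableau.index(FACE)
        pack.append(tableau[hole])
        tableau[hole] = pack.popleft()
    # Main play: cover single 10s, or pairs that add up to 10.
    while pack:
        move = find_move(tableau)
        if move is None:
            return False
        for position in move:
            if pack:
                tableau[position] = pack.popleft()
    return True


games = 1000
wins = sum(play_tens(random.Random(seed)) for seed in range(games))
print(f"got it out in {wins} of {games} games")
```

Only the top card of each pile matters, so the sketch keeps one number per pile. Run it and you can judge for yourself how often "usually" is.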

There was a time when I knew twenty different games of solitaire, some totally mindless, robotic time-passers, some requiring a good deal of foresight and following a logical path many steps long.  Stop me and I'll tell you a few.  It is useful to have a few in your toolkit because playing one game will effectively shuffle the deck w.r.t. another.

For those who can handle a more competitive edge, our once-upon-a-neighbour Roger recommends Cribbage.  This is good for exposing the logic of permutations, matching, and summing to 15 or 31.  Score is usually kept on a cutey marker board with rows of little holes drilled in it [see R].  You can get the rules out of any book of card games.  That same book should have the rules for Piquet, perhaps the most elegant and sophisticated two-handed card game ever played, which is a cross between rummy, whist and medieval French.
A classical deck of cards is just beautiful to look at.  The detailing of the face cards of the traditional "English" pack is beautiful, conventional, stylised and nearly 500 years old.  The other cards give scope for an investigation of  rotational or reflectional symmetry.  Putting aside the abstraction of the pure maths that is in a deck of cards, you can also cover the applied maths of structural engineering when you build card houses.  Every family should have several packs.

Friday, 23 February 2018

Squeaky the Whale

Conflict drives technology.  RADAR was developed to extend the range of detection for incoming hostile aircraft in WWII. It gives an edge to be able to find planes beyond the limits of human sight and hearing. The Cold War really took the battle underwater as anyone who has seen The Hunt for Red October will verify. In the 1960s, the US developed SOSUS sound surveillance system: a series of static oceanic listening posts to detect the movement of Soviet nuclear submarines. Thousands of hours of tonks, clangs, purrs and rumbles were recorded and sent back to the US for analysis. They picked up a lot of other sounds including the super-low pitched whurps and wwwwls of whale-song.

In the late 1980s the whole project was declassified and the Woods Hole Oceanographic Institution WHOI started to deconstruct the acoustic clutter. In among the regular basso of blue Balaenoptera musculus [R in Monterey Bay] and fin Balaenoptera physalus whales which call at between 10 and 40 Hz [cycles per second] - listen at 10x speed so our ears can pick it up - the Woods Hole people detected a single caller from the Pacific Ocean at 52 Hz - listen here. The latter was captured by Bill Watkins, who tracked and tricked with the high-pitched caller for the last 20 years of his life. "High-pitched" is relative, of course. For comparison fit young humans can detect sounds over 3 orders of magnitude - 20 Hz to 20,000 Hz. Although it is probably fair to say that they can't distinguish pitches at the extremes of the range. You probably get more sensation through your feet or buttocks than your ears for really low pitched sound.

The 52Hz whale caught the imagination of the chattering classes and the Daily Mail & the BBC, who called 'him' the loneliest whale in the world - romantically searching for conspecifics of the opposite sex but never getting off with anybody. There is no evidence for this tale of unrequited love. For all we know he was a sensation with the ladies and fathered dozens of whale-spawn - who none of them inherited the call sign of dear old Dad. It was suggested that he might be deaf and was thus unable to pitch his song at normal levels. It is known that whales change their tune to suit local style, presumably because females are choosy but normative.

Another suggestion was that Squeaky could do no other because he was a hybrid between the two big species of Balaenoptera and this led to conflicting developmental instructions for the vocal cords. Hybrids are known between B musculus and B physalus - they both have the same gross karyotype with 22 pairs (2N = 44) of chromosomes - B musculus vs B physalus and see [comparative karyotypes L]. The chromosomes are photographed as a splat down the microscope and each one is cut out [with scissors in the old days] and reassembled in blocks: metacentrics with the centromere [waist] in the middle, telocentrics with the centromere at the end and for pedantic niceness sub-metas and sub-telos for those chromosomes that fall between the two extreme states. Both pictures were assembled by Ulfur Arnason albeit at 40 years separation in time, but they are 'different' wrt the split between metacentrics 8 or 7 and sub-metacentrics 6 or 7. I don't believe it. Or rather I think they have essentially the same chromosome count and maternal B musculus and paternal B physalus will be well able to line up neatly in the hybrid when it comes to making sperm and egg. Phew! I'm glad that's sorted. If you have any reproductive, fertility or socialisation problems yourself, Uncle Bob will give you a helpful answer without getting up off the sofa.

Thursday, 22 February 2018

New weapon against MRSA

We have squandered our patrimony, consumed our seed corn 
and laid waste the gardens of our delight. 
[don't google that, I just made it up] In the 20thC, science, aided by The Gods of Serendip and St Loads of Money, discovered a range of chemicals that would kill, or inhibit the growth of, harmful bacteria. In 1925 my 9 y.o. father almost died when he was made to go for a winter swim and caught pneumonia. His mother sat, beside herself with anxiety, with him until the fever broke. Exactly 70 years later, 2 y.o. Dau.II caught Christmas pneumonia and spent 2 days in hospital getting rehydrated and antibioticked. The antibiotics made all the difference. These miracle drugs have seen off child-killers like diphtheria, scarlet fever, meningitis, tetanus, septicaemia, and pertussis, and young adult 'issues' [and discharges] like syphilis, gonorrhoea, and chlamydia. You can see from the links how much this has engaged and enraged The Blob. And what have we done with these miracle drugs? Why, we've sold them to farmers to inject their sheep and to chicken factories as growth promoters. V e r y   s l o w   h a n d   c l a p.

Because bacteria are more numerous than sand on a beach and have a huge collective ability to develop resistance to antibiotics and spread that ability around. Big Pharma argues that there is no market in developing novel antibiotics because it will cost €1 billion and 10 years in testing and licensing and it will be difficult to recoup that money in the few remaining years of patent protection. Somehow that argument does not prevent them developing clever-sexy novel therapies for minority inherited disorders and selling them to the insurance companies of grateful parents for $150,000 a year. Another issue with scaling up the production of natural antibiotics is that only a tiny fraction of bacteria Out There can be grown under controlled conditions in the laboratory and ultimately in a production vat. We just don't know enough about their particular and peculiar dietary requirements.

Accordingly, it is left to academic scientists to dream up new ways of approaching the problem and trying to persuade their governments to take a punt in blue skies research. Two years ago I gave tribs to a couple of scientists who discovered teixobactin, the first distinctively different antibiotic in 30 years. What Kim Lewis and Slava Epstein figured was that in a handful of soil there are 10,000 species of bacteria fighting over limited resources in a brutal take-no-prisoners war. Some of the successful microbes will have produced chemicals that were toxic to rivals. They developed a brilliant, high-throughput, iChip that would identify those bacteria that were thus weaponized.

It's a model for the future development of novel antibiotic therapies. Now Sean Brady [R] and his crew from the Laboratory of Genetically Encoded Small Molecules, Rockefeller University, NY, NY, have done something similar to launch malacidin into the war against MRSA and related diseases. This is so important that Nature Microbiology is exposing the paper beyond its pay-wall. Brady's lab is also working in parallel characterising the bacteria of the human microbiome to better understand the environment in which his novel therapeutics are going to have to work.

I think it's fair to say that Lewis and Epstein were on a massive fishing expedition when they isolated Eleftheria terrae from their multiple soil samples. Sean Brady's project was much more hypothesis driven. Previously identified interferers in cell-wall building were small peptides that were dependent on calcium for their action. Sequence analysis had shown that many of these Ca-dependent peptides contained an Asp-X-Asp-Gly motif: supposed to be where the calcium ions bound to the functional molecule. Brady's team were able to tailor their pipeline to reject microbial products that didn't contain that short sequence and enrich those that did. By gathering 2,000 soil samples, each containing 1000s of bacterial species, from all over the world, Rockefeller is now warehousing one of the largest bacterial collections on the planet. Much good it may do them because most of them can't grow on a Petri dish in the lab. Whatever about a whole new world of potentially therapeutic bacteria, I've been battered with a whole new world of acronyms in trying to read their paper. AD, BGC, eDBA, eSNaPD, NP, NPST, NRPS, pTARa.
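In Dayhoff's one letter code, Asp-X-Asp-Gly is D, anything, D, G, and hunting for it is a job for a regular expression. The sketch below is emphatically not the Rockefeller pipeline [which I gather works from DNA sequence pulled out of soil rather than finished peptides]: the peptide strings are invented, the function names are mine, and it pretends every residue is one of the standard letters.

```python
import re

# Asp-X-Asp-Gly in one-letter code: D, any residue, D, G.
CALCIUM_MOTIF = re.compile(r"D[A-Z]DG")


def has_calcium_motif(peptide):
    """True if the peptide contains a D-x-D-G stretch anywhere."""
    return CALCIUM_MOTIF.search(peptide.upper()) is not None


def enrich(peptides):
    """Keep the candidates carrying the motif and reject the rest."""
    return [peptide for peptide in peptides if has_calcium_motif(peptide)]


# Hypothetical peptides, purely for illustration.
candidates = ["WNDADGKE", "MKVLSTWQ", "GDSDGA", "DDGA"]
print(enrich(candidates))
```

The last candidate is there as a trap: it has two aspartates and a glycine but no X between the aspartates, so it is rejected along with the one that has no aspartate at all.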

Malacidin [nothing to do with Mal=bad or Malus=apple, it is an acronym: metagenomic acidic lipopeptide antibiotic-cidin] works by interfering in the construction of the thick cell-wall which is an essential building block of Gram-positive bacteria. The diagram details the targets of three different antibiotics, daptomycin, friulimicin and malacidin: each finds a different way to interfere with bacterial cell-wall building.

Bacteria can double their numbers in less than an hour under optimum growth conditions, and that requires twice as much cell-wall. So none of those antibiotics will work against thin cell wall Gram-negatives: these include Bordetella pertussis, Campylobacter jejuni, Chlamydia trachomatis, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Salmonella typhi, Treponema pallidum, Vibrio cholerae, etc. etc. However, malacidin seems to be able to knock MRSA on the head and that is what is making the headlines.

Wednesday, 21 February 2018

Mapping the ground

You'll have to take my word for it, I'll never find a reference for this story. Ireland was mapped in the 19thC by teams of land-surveyors: chains, theodolites, triangulation and wet socks. The cost of labour has rocketed since the 1840s. The Ordnance Survey of Ireland needed to update their maps 30 years ago because the pace of change and development was hotting up and the 1911 survey didn't show a rattle of by-passes, bridges and housing estates. This obsolescence of old paper maps benefitted me last year when I purchased the whole SE corner of Ireland at 1:50,000 scale. So the OSI put out to tender a contract to capture the landscape in a precise series of overlapping aerial photographs. The company that won the tender had looked at Irish weather, the size and speed of their Cessna and reckoned that they would budget 18 months to be sure to be sure of getting enough cloud-free days to cover the country. They started in early Summer in the midst of a rare anticyclone, flew all the hours of daylight and finished the task 15 days later: Win!

The gossip is that the Department of Agriculture does it all by satellite now, snapshotting every field every five days. That may be just a rumour to stop farmers burning their hedge-clippings and throwing their plastic silage wrappers into the bonfire. You may be certain sure that the Dept Ag doesn't have the manpower to analyse all these data: they are too busy drinking tea and doing sudoku at their desks.

Photo-technology has moved on mightily since the 1990s. You just think Google Streetview. My students capture the view down their microscopes with their smartphones . . . which is a little annoying. One of the most amazing technologies matches the macro with the micro and uses LiDAR to detect minute variation in height over swatches of countryside - even when the hard surface is masked by vegetation. We know what LiDAR is and how it works but there isn't consensus on what the acronym means; either light detection and ranging OR light imaging, detection and ranging. See The Blob on APGAR for a backronym.
The UK Environment Agency has taken out a contract to map the whole country from the air using LiDAR. They are hoping to discover wonders like the layering of a modern motorway and a 19thC farmstead on top of a Roman fort and its associated access roads [L].  Bloomin' amazin'. The amount of data when you map the entire country at 1 metre resolution is impossible to store, let alone process, without a huge server. But these data are going to be made freely available to industry and Josie Public and you may bet your sweet bippy that the data are going to be put to some quite unforeseen creative use.  There are 11 terabytes of data in it, which have been downloaded 500,000 times and stored on another server somewhere else. That's a LOT of bytes. But it still is a microdot compared to the trillion crap photos which are uploaded to The Cloud every year . . . never to be seen again.
In England, even if there are some occluding trees, you can still get access to the interesting sites for field work on the ground. It's a bit different in the Central American jungle where the storied vegetation fills the sky from ground to 50m up and then cascades down again as lianas, vines and thorns.  With LiDAR you can strip away the jungle to reveal a whole network of interconnected Mayan cities where nowadays only people with blow-guns hunt for a bit of bush-meat. And the mapping data doesn't stay on the server; chunks of it can be downloaded onto a laptop to create a virtual reality landscape linked to GPS. The archaeologist can image the next pyramid invisible in the impenetrable jungle and make a direct bee-line for the gods of the Maya. Asombrosamente increíble!

Tuesday, 20 February 2018

One letter code

Margaret Oakley [R after she became Margaret O Dayhoff through marriage] was born in Philadelphia in 1925. You cannot overestimate her importance to the development of the tools for making sense of biological sequences. For Dayhoff, the same claim can be made as for Dennis "C and Unix" Ritchie: without them it would all be different. Grace Hopper, inventor of COBOL, was another woman in the right place, at the right time, with the right mindset and toolkit and she has a pretty high profile. Margaret Dayhoff otoh really doesn't get the same press but her contributions have had more impact; not least because she kicked off the area of bioinformatics and molecular sequence analysis which has supported me for almost all my working life. Developing a whole new field is chaotic - in the sense that it is sensitive to initial conditions.

I've riffed before on Pointless - the TV quiz game where success is when you can give a correct answer which nobody else has picked. If the question is "Name a female scientist who contributed to biomedical science in the late 20thC" then Margaret Dayhoff will be a winning Pointless answer. The answer to "Which pair of scientists made the first contribution to cracking the genetic code?" is not "Crick and Watson" - they 'just' gave us the physical structure of DNA. It is rather Nirenberg and Matthaei who in 1961 determined that UUU codes for Phenylalanine. That was the first codon assignment. The rest tumbled into place over the next 4 years, revealing that 20 amino acids are the basic inventory from which all proteins - all the enzymes, all the receptors, actin & myosin, haemoglobin, oxytocin, insulin - are constructed. The trouble is that the 20 amino acids were known and named years before the genetic code was AThing. The smallest, glycine, is from γλυκός glycos because it tastes sweet. I'm not sure about the connexion with soya Glycine max. Serine was first isolated from sericum the Latin for silk etc.

Dayhoff's first qualification was in mathematics which she subsequently started to apply to physical chemistry including the nature of chemical bonds. From there she moved into the structure of proteins and applied her mathematical and computing toolkit to the storage, retrieval and analysis of protein sequences - of which an increasing number were coming on stream. In 1960, she was appointed associate director of the National Biomedical Research Foundation in Maryland. Back then, protein sequencing was running in parallel and quite a way ahead of DNA/RNA sequencing. The first substantive piece of RNA sequencing saw RW Holley take a whole year 1965 to work out the 80ish bases of Alanine tRNA. That would now be knocked off in a μ-second. aNNyway, Dayhoff saw that the inventory of protein sequences was growing exponentially and, albeit from a small baseline, was going to get massive. Writing down each sequence on paper wasn't going to be the answer. Accordingly, she started to record sequences on punched cards [prev] and quickly grew dissatisfied with the convention that each amino acid was represented by a three-letter abbreviation based on its first three letters in English: Phe, Gly, Ser have been mentioned above. Dayhoff realised that with only 20 AAs in the inventory, each could be uniquely identified with one of the 26 letters in the Latin alphabet.

But whoops, here are those 20 amino acids: alanine - arginine - asparagine - aspartic acid - cysteine - glutamine - glutamic acid - glycine - histidine - isoleucine - leucine - lysine - methionine - phenylalanine - proline - serine - threonine - tryptophan - tyrosine - valine - and the first thing you note is that 20% of them begin with A!  So her first pass was to assign the easy [unique initial] ones:
  • C H I M S V 
  • it was also easy to assign F to phenylalanine at this stage which freed up 
  • P for proline
  • 8/20 done
the next decision was to give priority to the first in alphabetical order:
  • A = alanine; [G = glutamine]; L=leucine; T = threonine
  • that allowed K for lysine as the next unassigned letter in the alphabet.
  • 13/20 done
hmm, she thought, there are two cluttering overlaps because of the acid side-chains aspartate and glutamate and their amides asparagine and glutamine so:
  • let's reverse a bit to give G = glycine then
  • D = aspartate, then E = glutamate to fill in the early hole between C = Cys and F = Phe
  • N = asparagiNe and Q = glutamine [G looks a bit like Q] fills a similar later hole.
  • note that D precedes E because Aspartate precedes Glutamate
  • (18-1)/20 done
The rest are assigned by their second letter
  • R = aRginine; Y=tYrosine
  • and W the biggest letter is given to the largest amino acid tryptophan
  • and that's it!
  • 20/20 for Margaret Dayhoff
That now universally agreed convention was determined by the contingency that Dayhoff spoke English at home. If she'd been born in Tampere, and followed the same algorithm then K would be assigned to cysteine [alaniini - arginiini - asparagiini - asparagiinihappo - kysteiini - glutamiini - glutamiinihappo - glysiini - histidiini - isoleusiini - leusiini - lysiini - metioniini - fenyylialaniini - proliini - seriini - treoniini - tryptofaani - tyrosiini - valiini] and all bets would have been off if she'd come from Kiev [аланін - аргінін - аспарагін - аспарагінова кислота - цистеїн - глутамін - глутамінова кислота - гліцин - гістидин - ізолейцин - лейцин - лізин - метіонін - фенілаланін - пролін - серин - треонін - триптофан - тирозин - валін].
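The assignment steps above can be sketched in a few lines of Python. This is only my reading of the story as told here, not Dayhoff's own procedure: the function name one_letter_codes is made up for illustration, only the first step (unique initials) is actually computed, the later steps are typed in by hand as the bullets describe them, and the provisional G = glutamine detour is skipped in favour of the final G = glycine.

```python
from collections import Counter

AMINO_ACIDS = [
    "alanine", "arginine", "asparagine", "aspartic acid", "cysteine",
    "glutamine", "glutamic acid", "glycine", "histidine", "isoleucine",
    "leucine", "lysine", "methionine", "phenylalanine", "proline",
    "serine", "threonine", "tryptophan", "tyrosine", "valine",
]


def one_letter_codes(names):
    """Return {amino acid name: one-letter code} following the steps in the post."""
    code = {}
    initials = Counter(name[0].upper() for name in names)

    # Step 1: any amino acid whose initial is unique keeps it (C H I M S V).
    for name in names:
        if initials[name[0].upper()] == 1:
            code[name] = name[0].upper()

    # F for phenylalanine (phonetic) frees up P for proline: 8/20.
    code["phenylalanine"] = "F"
    code["proline"] = "P"

    # Step 2: contested initials, plus K for lysine as the next free letter.
    code.update({"alanine": "A", "glycine": "G", "leucine": "L",
                 "threonine": "T", "lysine": "K"})

    # Step 3: the acids and their amides fill the holes in the alphabet.
    code.update({"aspartic acid": "D", "glutamic acid": "E",
                 "asparagine": "N", "glutamine": "Q"})

    # Step 4: second letters, and the biggest letter for the biggest residue.
    code.update({"arginine": "R", "tyrosine": "Y", "tryptophan": "W"})
    return code


codes = one_letter_codes(AMINO_ACIDS)
for name in AMINO_ACIDS:
    print(f"{codes[name]} = {name}")
```

Running the same step 1 on the Finnish list would give different unique initials, which is the point about contingency.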

Life has gotten more complex since those idyllic simple early days: we've discovered selenocysteine Sec U and pyrrolysine Pyl O. We finally give B to aspar* and Z to glutam* as ambiguity codes because a lot of the chemical protein sequencing protocols render the acids indistinguishable from their amides. Phew! with U and O we have a full set of vowels to play with.

Now the alphabet is almost full [J and X only unassigned] and we can use protein sequences to write names as a kind of geek-code. If you want to out-geek the geeks you can write your name as a peptide using Peptify a toy developed by Nuritas to stop their employees playing solitaire on their lunch-breaks. Nuritas is the spin-off of Nora Khaldi [bloboprev] an entrepreneurial woman in science. Here's PeptoBob me:

Monday, 19 February 2018

Frozen accident

The diagram above, largely due to Willie Taylor, is perhaps the most important chunk of infrastructural information in molecular biology. I've been here before with The Masters of Imm. It shows the common ground as to size, electrical charge, solubility among the 20 amino acids which make up proteins. A 'conservative substitution' is a change in the inventory of amino acids (AAs) that will cause least structural change to the constituent protein: roughly called if/when two amino acids appear in any one of the gathering-together circles. But some are more conservative than others! Leucine L and isoleucine I are essentially the same; whereas Glutamate E and Lysine K are both 'charged' but one is negative and the other positive, so they are not quite the same. How to quantify this? One way is to count the number of changes in the underlying DNA that will result in a change of amino acid.
green-for-go arrows are AA changes that require only a single change in the DNA, red-for-harder are examples where two changes are needed and some amino acid substitutions require three changes: there is NO commonality in the codons for TRP and ASP or between CYS and MET. It was long ago noted that conservative substitutions tend to be in the same row or column - they involve only a single change to the DNA.  That has implications for the evolution of the genetic code from a simpler arrangement with fewer amino acids which got more complex through a series of mutations which were found to have utility wrt survival and procreation. Perhaps more importantly it provides a bit of stretchability and robustness: few substitutions are going to create huge waves in the structure and integrity of the protein: UCC = Ser to ACC = Thr just adds a small methyl group to the AA side-chain; UAU = Tyr to UUU = Phe still results in a large aromatic hydrophobic amino acid. In other words, the genetic code is not really a 'frozen accident' but is extremely non-random.
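Counting the DNA changes between two amino acids is easy to do by machine. The sketch below assumes the standard genetic code, written as the usual 64-letter string with bases ordered U, C, A, G; the helper names codons_for and min_changes are mine, invented for illustration. For each pair it reports the smallest number of base changes that turns some codon for one amino acid into some codon for the other.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in UCAG order (UUU, UUC, UUA, UUG, UCU, ...).
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(aa):
    """All codons that encode the amino acid with this one-letter code."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def min_changes(aa1, aa2):
    """Fewest single-base changes needed to get from aa1 to aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


for pair in ["ST", "YF", "LI", "EK", "WD", "CM"]:
    print(pair[0], "->", pair[1], ":", min_changes(*pair), "change(s)")
```

Ser/Thr, Tyr/Phe and Leu/Ile all come out at one change, while Trp/Asp and Cys/Met need three.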

These issues are teased out at length by Koonin & Novozhilov here [for free]. They have bearing on the problem of measuring similarity which is currently engaging some of my students in their final year research project.

Sunday, 18 February 2018

Sunday Misc 180217

Really miscellaneous today:

Saturday, 17 February 2018

Give us a hand

I love my job: it's not too hard on the knees, I have a great deal of autonomy, the work is within my competence but it's possible to embrace greater challenge if I'm bored. Right at the beginning of my career I had another wonderful job: working in Diergaarde Blijdorp aka Rotterdam Zoo. The work was physical, dirty and often soggy [my position was in Afdeling Vissen - aquarium-land] but I really looked forward to each working day. Every day was different but enough routine so that institutionalised me didn't go off the rails. My work-mates were a motley crew: a taxidermist; an amateur herpetologist with a flat full of live reptiles; the foreman was Afrikaans; one fellow couldn't wake to an alarm-clock but had to be phoned; two guys who'd done National Service in signals and talked in Morse. They'd all left school in their mid-teens because they loved animals, but many of them had a deeper knowledge of biology than BSc me. The only bloke with any sort of formal higher education was Chris who worked in the book&gift shop.  If I ever wanted to talk about things other than work or animals, I'd drift in to visit with Chris for a couple of minutes. If I kept a bucket in my hand it could pass for work.

Chris's eccentricity was that he couldn't walk, his limbs were banjaxed by a neuro-degenerative disease and he was delivered to work by his full-time night carer in a wheel-chair van and collected in the evening. At work, if he needed to get to the jacks, he'd flag down one of his co-workers for the small amount of help needed. Some were more engaged in the helping than others. A couple of years before I appeared on the scene, when it was proposed that Chris might be coming to work, the management asked The Lads if they were willing to facilitate this stranger's transition to gainful employment. The response was 'mixed': some willing but apprehensive; some feeling 'whatever'; some were proud to be given a chance to give back. The only person who was vocally against the whole project was Jan; he got really cross about the imposition and the unspoken peer pressure and denounced the management for ticking social-inclusion boxes.  Turned out that Jan had been in a desperate traffic accident in his early 20s, spent weeks in hospital, and months in rehab - it was touch-and-go whether he would ever walk again. Clearly, he had some justifiable baggage about Project Chris.

Things had settled down to same-old-same-old routine by the time I rocked up. At a certain time in the middle of the morning, we'd all down buckets and brooms and schlep off for coffee and buns in the staff canteen over by the elephant house. Chris had a joy-stick operated motorised wheel-chair and someone was likely to hop on the back axle to cadge a lift. Equally likely, if the weather was fine and The Lads consequently frisky, someone would steer Chris into the shrubbery <ho ho> in the same way that we might throw snowballs at each other if there was a dusting of snow. As a late-comer on the scene, I witnessed that the most attentive person for Chris's welfare and inclusion was Jan. He had completely changed his relationship with disability; in a way Chris had healed the sick. When I learned the back-story, I was quite unaccountably buoyed up for the rest of the day.
This all came flooding back to me when I saw a short film about disability made by some local lads for the Donal Walsh #LiveLife National Film Competition. Donal Walsh was a Kerry teenager who died from cancer in 2013. The Film competition is to continue Donal's I'm done for but you-all should live life to the full message . . . and don't top yersel' ye daft buggers.  This may remind you of Stephen Sutton another early departer and The Boy's hi-jinks driving a wheelchair. The filmlet cited above is far better than the competition! Better story board, better acting, better lighting, better continuity. Most importantly, from my experiences in Blijdorp [above], the story has the ring of truth. If Cormac Lalor doesn't win the competition, I'll be calling "Fix!"

Friday, 16 February 2018

Where do we III come from?

My sense of identity really hasn't exercised me in any emotional way. Coming from Horse-riding-Protestant stock from King's County, my sense of self, and expectation of entitlement, is bred in my bones. Being straight, white, male and middle-class helps too. We know exactly when our family established its patrimony in Ireland - 1643 - and the manor house in Wales from which we migrated. The family takes this ancestry schtick with a pinch of salt, a dollop of humility and a wry smile knowing that my great grandfather was the 'natural son' of the owner of the Big House. My PhD thesis hinged on the idea that by looking at present day populations we could infer something about their ancestry and therefore inform people about the pattern of colonial migration in New England and the Canadian Maritimes. A similar analysis can be driven by looking at much richer and more extensive data of European human genomic DNA variation.

I've looked at this sort of analysis before: I through 23andMe, and II through linguistic analysis, and also meanderings about PIE. We are now a little bit more confident about where the Brits come from. We used to do this all the time in population genetics and molecular evolution: we inferred ancestral states from present day DNA because the dead are dead and disintegrated beyond yielding sequenceable DNA.

Not any more! The technology for making sense of ancient DNA has moved on really fast and far in this century. The person who has delivered the most quality ancient human material into the public domain could well be Lara Cassidy [R], a 20-something PhD candidate working in Dan Bradley's [bloboprev] Archaeological Genetics lab in Trinity College Dublin. Ancient DNA work is really difficult: you need to be a good pair of hands: dexterous, meticulous, painstaking, tidy in your habits and careful of your data.  Any DNA that exists from hundreds or thousands of years ago is going to be degraded, fragmentary and hard to recover. The least bit of contamination: a fallen eyelash or a fingerprint; something left from your last experiment; a cough from the cleaner; will deliver enough contaminating DNA to swamp out any signal from your current sample. Then you must have a completely different set of skills in computational analysis, number crunching and programming. Ancient DNA is like running a time machine: from a fragment of bone [preferably the 'petrous' bone near the ear-hole] we can see certain attributes of the long-ago dead: their sex; their skin, eye and hair colour; their probable height; their susceptibility to disease. It's as if Achilles or Cúchulainn walks again. Cassidy has knocked off numerous ancient DNA genomes! A life-time's work in 4 years.

One of the most fraught questions in Irish departments of archaeology and anthropology is whether we are the direct, genetic, descendants of the builders of Newgrange and the folk of antient legend. Did those people adopt the cultural practices and borrow the tools of more sophisticated neighbours or were those ancient people displaced by the bearers of those tools and artefacts? 100+ years ago, with the British Empire as the invisible background to cultural discourse, the consensus was that superior migrants had brought culture to the benighted West. The next generation was throwing shapes about national identity after a bloody war of independence and 'migration' became a dirty word. The next generation after that adopted a bit of this a bit of that compromise position. Cassidy and her co-workers have now dumped a sackful of data on the fossil-cluttered desks of archaeologists and shown, maybe uncomfortably, that the colonial invasion / physical displacement model is most likely true. Here's the data graphed out [explanatory background yest]:

The further back you go, the thinner the seam of data gets. The earliest DNAccesible human bones in Ireland were discovered by spelunkers in a limestone cave in the NW. They have been carbon dated to the Mesolithic and their owners /users were probably hunter-gatherers. But in terms of Eurogenetics, those bones are on another planet.
A more recent, and much better preserved skull [L reconstructed head of Our Lady of Ballynahatty] was unearthed at Ballynahatty near Belfast. She is Neolithic and from an era that had embraced farming. The largest cultural artifact of that era, 120km due S but still in Ireland is Newgrange: a mighty pile of engineered stones, some decorated, some sorted by colour, protecting a passage tomb whose access passage precisely aligns with the Winter Solstice. It is older than Stonehenge, older than the Pyramids at Giza.  Her DNA profile [marked Bh above] bears no genetic resemblance to modern Irish people but slides neatly into place between Spain and Sardinia; she was clearly European but not our sort of European.

1,000 years later, another cultural transition appears in the archaeological record. It is fatuous and just wrong to think of the Neolithic society which created Newgrange [and the Ringstone, of which we are Guardians] as "banging two rocks together". That society was cohesive, sophisticated, religious, hierarchical and driven by the aesthetic. But the metalworkers from The East bringing copper and tin together in durable bronze weapons & knives; gold fancy-goods; and distinctive domestic pottery were a different culture altogether. The team from Trinity have shown clearly that they were different genetically as well. Three Bronze Age skulls from Rathlin Island off the Antrim coast of Northern Ireland have now also had their genomes exposed to the public gaze. They are a bit on the edge of the local modern demographic [marked Ra on the genetic map above] but recognisably of and from these WEA islands. It's all been published in the prestigious PNAS.

You might think that Lara Cassidy is lucky to have gotten such a fabulous project with which to get her start in science [7 peer-reviewed pubs; two as 1st author; 1 in Science 2 in PNAS; not to mention all the press coverage]. It is not always like that: with the best will and skill in the world you can sign up to a project which has no hope of working out because it has been poorly conceived or grossly underfunded or has a terrible supervisor. But that project was lucky to get Cassidy because her telegraphic CV indicates quite extraordinary levels of achievement = determination and dedication. I've suggested before that you make your own luck, through finding a good fit to your talents and working damned hard at your craft. You can almost hear a Professor Bracknell echoing Oscar Wilde with "To sequence one ancient genome, Ms Cassidy, looks like fortune; to sequence several looks like carefulness." 
Bob B'godde Bracknell I wish I'd said that.
Bracknell: You will, Bob, you will next [last] time you are invited to Commons at TCD

but it's not about me, it's about More women in science.

Thursday, 15 February 2018

Eurogene - the map

I told y'all that you should go to Dublin on Darwinday to hear Dan Bradley talk about the Genetic Origins of the Irish. But I know that some things you can't delegate: you just have to do them yourself. Accordingly, I leapt into the Little Red Yaris at 1705hrs and drove to Dublin to hear the news from the frontiers of biogeography. But the news is always based on the olds and the most beautiful and informative picture of my 2018 [at top: far better copy] was published in 2008! I may well have been entranced by that map when it came out ten years ago, but I've since forgotten all about it. Heck, I've forgotten my car-keys and where I left my glasses as well.

That map is Fig 1 in a paper in Nature: Genes mirror geography within Europe which sampled the sequenced genomes of 3000 Europeans (and four Turks) and tallied up each person's state at 500,000 different variable sites in their DNA sequence. That's a shed-load of data and you can't make much headway by ticking off (3000 x 3000)/2 x 500,000 cases of Sean is different from Jean here but the same there, while Giovanni is different again. Well actually you can, and that's what John Novembre et al. did in 2008. They put the whole dataset into a hopper called Principal Components Analysis and gave it all a good shake and a jiggle. PCA reconciles all the internal inconsistencies, and calculates the position of each person in N-dimensional hyperspace. No, I too only have a hazy notion of what that really means but in practice it calculates how near or far each person is from each other person in the dataset. It will come as no surprise that the quartet of Turks looked really similar to each other genetically and rather different from the Europeans . . . and the Irish too: like each other, quite similar to Brits and Scots and less like Poles and Greeks. A lot of that difference will smooth itself out over the next 100 years as our 200,000 resident Poles make babies with their Irish neighbours.
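To get a feel for the size of the ticking-off job, here is a toy sketch of the brute-force tally: Sean vs Jean vs Giovanni, site by site. It is not PCA and it is not what the Nature paper did; the three five-site "genomes" and the function name pairwise_differences are invented purely for illustration. The last lines count the site-by-site comparisons for the real study, using the exact number of pairs n(n-1)/2 rather than the rounder (3000 x 3000)/2.

```python
from itertools import combinations


def pairwise_differences(genotypes):
    """Count differing sites for every pair of people.

    genotypes: dict mapping a name to a string with one character per site.
    Returns {(name_a, name_b): number of sites where they differ}.
    """
    return {
        (a, b): sum(x != y for x, y in zip(genotypes[a], genotypes[b]))
        for a, b in combinations(sorted(genotypes), 2)
    }


# Made-up data: three people scored at five variable sites.
toy = {"Sean": "AAGTC", "Jean": "AAGTT", "Giovanni": "GAGCT"}
for pair, n_diff in pairwise_differences(toy).items():
    print(pair, n_diff)

# Scale of the real dataset: every pair of people, at every site.
n_people, n_sites = 3000, 500_000
n_pairs = n_people * (n_people - 1) // 2
print(f"{n_pairs:,} pairs x {n_sites:,} sites = {n_pairs * n_sites:,} comparisons")
```

That is a little over two trillion comparisons, which is why nobody does it with a pencil.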

You can do this sort of study because the cost of generating the primary data has collapsed over the last 30 years. The first ever chunk of genomic DNA, yeast Saccharomyces cerevisiae chromosome III, was contracted by the EEC (=EU) 30 years ago at 320,000 ecus (=€) or €1 /base. We carried out the first non-trivial added-value analysis of that data - one of my three big ideas in science. With that stepping stone achieved, planners looked to sequencing The Human Genome: it cost €300,000,000 (10c per base) and took ten years. Now you can sequence A human genome for €1,000; it will take a day; and there is enough server power to do many genomes in parallel. So 3,000 genomes is quite affordable in a big science sort of way.

What is most striking about the distribution of genomes across the most explanatory axes of the PCA landscape is how closely it maps onto the geography of Europe.  The pale blue of Greece and the Balkans is nearest to Turkey over on the right; the grey Italian peninsula runs parallel and a little more distant; and further away again is a purple peninsula of Iberian genomes. At the opposite end of the continent, the Irish intercalate with the Brits; the Scandinavians have both shared and separate identity etc. etc. If you look closely, you can see Paddy-No-Pals off on his own in the sea like a sort of Uber-Irish outlier. Maybe she is not Paddy at all but Caitlín Ní Uallacháin. Also note the five rogue ITs in the sea at bottom left of the diagram; they do indeed have Italian passports but they are actually Sardinians. There is no evidence here that the compatriots of Szilárd, Wigner, von Neumann and Teller come from Mars.

I was doing a similar analysis waaaay back in the 1980s. It took me 2 years of tramping the streets of towns and cities in New England and the Canadian Maritimes, scoring genetic variation in domestic cats Felis catus to gather a sample of 10,000 cats in 35 different populations diagnosed for 7 genetic variants. (35 x 35)/2 x 7 is quite a bit smaller than (3000 x 3000)/2 x 500,000 !! But it was all my own work. One finding was that genetic distance was correlated (highly significant statistically) with geographic distance but that relationship only explained 16% of the variability in the sample. 84% of the variation was noise - some of which could be accounted for by the history of the patterns of French, English and Dutch colonisation in the 1600s. That was what my PhD thesis concluded aNNyway.

When you cough up your $100 to get your DNA sequenced, 23andMe will compare your DNA to a database like this one and place your genome on the map. Unless you are truly and incestuously descended from the Pharaohs, your genome will be a mess of fragments from the miscegenation of your ancestors. 23andMe will give you a summary sound-byte like "50% Irish; 25% English; 20% French; a toe from the Maghreb and a Neanderthal fingernail". You may take that assessment with a huge pinch of salt because the data will be inherently noisy.

Wednesday, 14 February 2018

Measuring similarity

Sequence comparison and analysis: that's what I do. It's not my day job anymore; for the last five years I've worked in The Institute trying to make sense of science in a much more general sense with / for my students. But any credibility I have in the scientific community hangs upon my small-small contributions to revealing the pattern and process of evolution through the analysis of DNA and protein sequences. One of the key concepts is working out where genes, molecules, biochemical pathways and organisms came from . . . by comparing a bunch of related sequences.
If you can show that two sequences are more similar to each other than either is to a third one, then you have established a tree of relationships. In the simplest-possible-tree [L] A and C are closely related sister 'taxa' while B is only a cousin; and yes, B and D are sisters to each other also. Operational Taxonomic Units OTUs, here A B C and D, could be individuals or species or their genes. These assignments of similarity and relatedness are based on calculating how similar are the sequences when they are aligned together. The gross differences are easy to tally up. Here is a fragment of the protein sequence for beta-haemoglobin from four mammals; two from Order Primates, two from Order Rodentia:
       *:*** .***:********.
Note that almost all the amino acids (AA the building blocks of all proteins here represented by 20 different letters) are identical in all four species. You can check out the encoding here. The convention is that, when all the AAs at a given site are the same, then a * is put under the column. Next note that for the majority of the other columns, the two rodents have one variant and the two primates have another.  In one place, however, outlined in red, rats look more like primates than their fellow rodents; but that's just a random blip. The easiest way of getting a final answer on who is related to whom is to tally up the number of same AAs and divide by the total length of the sequence [here an arithmetically convenient N=20] to get a % identity and then report that in a matrix or table:
This works out pretty good. If we choose a cut-off between 85% identical and 80% identical we can [correctly] sort the four species into 'same order' vs 'different order'.  For big differences [mouse and baboon had a last common ancestor 100 million years ago (mya) while {rat and mouse} or {baboon and human} are "only" separated by 30 mya] identity vs difference works nicely. For relationships that are closer - ¿are chimpanzees Pan troglodytes or gorilla Gorilla gorilla our nearest relative? - it might be useful to have some gradation of difference rather than the stark black&white; 1 vs 0; same/different. One way to do that is to consult the DNA which makes the protein: because of the redundancy of the genetic code DNA is intrinsically more variable than proteins. In the first column of the alignment everyone has a K = Lys = Lysine. But it might turn out that the rodents make lysine from the codon AAA, while the primates use AAG, in which case the column would lose its identity * and give us some distinguishing information. I'll tell you more about mechanisms for calculating similarity later.
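The tally-and-divide recipe for % identity is short enough to write out. A minimal sketch follows, with a loud caveat: the four 20-letter sequences are toy data I made up to behave like two primates and two rodents, NOT the real beta-haemoglobin fragments from the alignment, and the names percent_identity and TOY are hypothetical.

```python
from itertools import combinations

# Invented, already-aligned toy sequences (N = 20); not real haemoglobin data.
TOY = {
    "primate_1": "KVNVDEVGGEALGRLLVVYP",
    "primate_2": "KVNVEEVGGEALGRLLVVYP",
    "rodent_1": "KINADEIGGESLGRLLVVYP",
    "rodent_2": "KINADEIGGESLGRLLIVYP",
}


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where the two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


def conservation_line(seqs):
    """'*' under every column where all sequences agree, ' ' elsewhere."""
    return "".join("*" if len(set(col)) == 1 else " " for col in zip(*seqs))


print(conservation_line(TOY.values()))
for a, b in combinations(TOY, 2):
    print(f"{a:10s} {b:10s} {percent_identity(TOY[a], TOY[b]):5.1f}%")
```

With these toy sequences the within-order pairs come out at 95% and the cross-order pairs at 80% or less, so a cut-off between 85% and 80% sorts them the way the text describes.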

Tuesday, 13 February 2018

Red shoes and Green songs

This year at The Institute about a quarter of my contact hours are doing QM - quantitative methods aka remedial maths. Much of this term is getting to grips with Excel, in which I've forced myself to become a bit of a whizz. Last week one of my classes was plotting out their shoe size vs height to reveal that humans are clearly sexually dimorphic - not as extreme as Gorilla gorilla but adult males are bigger than females and have bigger feet too. One of the tasks was to compute the average shoe-size for the class and I had a bit of a rant when everyone reported the answer as 6.236783. That's crazy I raved what does ...783 mean? how can you have 3/millionths of a shoe size? Much better, I suggested to say that average shoe-size was 6.2 - about halfway between size 6 and size 6-&-a-half. And I showed them the button for controlling the number of significant figures.  The rule of thumb is that, if you cannot detect a difference with the perceptual aids available then you shouldn't report it as a mathematical certainty.

This sort of issue came up the weekend we spent in Cork because Dau.II's bloke is the only person I talk to who knows, and cares about, the difference between a perfect fifth and a fifth of bourbon. This came up in the context of Philip Ball's book The Music Instinct which I eventually threw across the room in a pet. I passed the Grade I Theory exam because at the age of 9 I could count to 8 and divide it in halves and quarters. That's the level Mr Ball starts his book, but he rapidly disappears up his own orifice in a tornado of assumptions about what will be obvious to all thinking people after the most cursory explanation . . . not! Anyway in Cork The W!ld Corkonial Boy pointed out that in a typical octave (12 notes on a standard piano) the normal human ear can only detect about 200 gradations of pitch, so it's absolute bloody nonsense to assert that middle C resonates at 261.625565 Hertz. The current convention - since about 1939 - is that the A above middle C is tuned to 440 Hz and the A one octave down is at 220 Hz. The fact that an octave results from halving the length of a vibrating string has been agreed since Pythagoras worried about beans. The best you can say about the human perception of middle C is that it is 261.63 because you can only detect any finer difference with an oscilloscope and some fancy electronics.
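For the curious, the over-precise figure for middle C is just arithmetic. The sketch below assumes twelve-tone equal temperament (my assumption; the post doesn't name a tuning system), with the A above middle C at 440 Hz, each octave a doubling of frequency, and middle C nine semitones below that A. The function name pitch_hz is made up.

```python
A4_HZ = 440.0  # the A above middle C, by modern convention
SEMITONES_PER_OCTAVE = 12


def pitch_hz(semitones_from_a4):
    """Frequency of the note this many semitones above (+) or below (-) A440."""
    return A4_HZ * 2 ** (semitones_from_a4 / SEMITONES_PER_OCTAVE)


middle_c = pitch_hz(-9)       # middle C is nine semitones below A440
print(f"{middle_c:.6f} Hz")   # what the calculator says
print(f"{middle_c:.1f} Hz")   # closer to what an ear could justify
print(pitch_hz(-12), pitch_hz(12))  # the As an octave down and an octave up
```

The six decimal places come from the twelfth root of two, not from anything an ear can measure.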

Same with colour. We only have 3 different photosensitive pigments in our retinas and it is their relative stimulation by different wavelengths of light that allows us to recognise cerise, magenta, viridian, and cobalt. There are actually two limits to perception with the eyes. One is about two colours, which an instrument can differentiate as to the reflected wavelengths of light, but which normal people cannot distinguish. Many people in Ireland paint their interior walls in 'magnolia' which is a super-pale cream: less austere than the flat white of, say, white lead 2PbCO3·Pb(OH)2. But Fleetwood magnolia is indistinguishably different from Dulux magnolia.  The other issue is whether we are capable of resolving two dots / pixels at a given distance from the eye. Up to the age of about 40 my eyes gave stalwart service: reading very small print and discerning small objects on the horizon. As senile degeneration of the retina set in and the shape of my eye-ball slumped, I was only able to read books if held at extreme arm's length. When I needed an arm-extension prosthesis to get the text in focus, the letters were too distant to see. That's when I twigged that I needed glasses. Television and colour printing rely on our inability to differentiate pixels if they are sufficiently similar in hue and sufficiently close together. We can 'see' something as grey even if it is made up of pixels covering the entire visible spectrum.

Monday, 12 February 2018

Dollar Street

Reminder: Darwinday Today. prev - prevlier
If you haven't seen these compare and contrast photos before, then you haven't been reading the Blog. If you haven't seen pictures like these before, then you should get out and about more. A German family displays a week's food: all that Speck, Wurst, Schinken, Leberkäse and beer :
compared to the sack of rice and a smaller sack of lentils that supports a family from Chad for a similar length of time
Admittedly the lads from Chad are refugees but the local family is not conspicuous for the extent of its larder. The last link displays at least a dozen countries by week-food. It may be more difficult to live on $4 / day in NYC than it is to live on $1 a day in Chad. It's well known to obesitologists that it's cheaper to eat badly in America than to eat well.

What about a wider exploration of standard of living round the world? The stuff that folks have is a better indicator of relative wealth than comparing income at the current US$ rate of exchange. We all know food is cheaper in Chad than in Germany. Dau.II sent us a link to Dollar Street at Gapminder.  You'll be familiar with Gapminder at least in its spokesman Hans "Data-display" Rosling, who died a year ago [obit]. Here is Anna Rönnlund explaining how Dollar Street works. The idea is to send photographers out to 250+ family homes in 50 countries to capture a specific list of material goods which you more or less have to possess as a member of the human race. Most homes have something to keep the rain off; something to keep stray dogs out; a place to sleep and a place to sit; a place to cook food, and something to eat it with; somewhere to shit; a pair of shoes; paper and something to write with; something cherished. Dollar Street is all clickable and comparable.

The quality and abundance of the items enumerated and recorded is dictated far more by economics than by geography. A middle class bedroom is essentially the same in China as in Canada and very different from the same thing in a house on the other side of the tracks in the same town. None of this is exactly surprising but it brought me up short (and grateful) aNNyway.

Sunday, 11 February 2018

International Day of Women and Girls in Science - Today

Charles Darwin casts a long shadow which has occluded the fact that the day before his birthday - Today! - is International Day of Women and Girls in Science IDOWAGIS
aka Международный день женщин и девочек в науке
aka  Dia Internacional de la Mujer y la Niña en la Ciencia
- Irish link - UN link - Official site.

The day was occluded for Darwin-obsessed me but not for my pal Russ who is a little more in tune with the outside world. But really! Whatever next will the UN allocate a day for? It's getting as crowded as the Catholic calendar of saints where most days are doubled- and trebled-up with saints you've heard of . . . and a rake of other niche saints of whom only Cardinals have heard.  Today, for example, is the feast of Saints Benedict of Aniane, Blaise, Cædmon, Gobnait and Gregory II, and the Feast of Our Lady of Lourdes.

Which of these is a UNreal thing?:
  • International Day of potters and weavers
  • International Day of bankers, accountants and actuaries
  • International Day of dowsers and well-diggers
  • International Day of dancers, skaters and acrobats
  • International Day of jugglers and snake-charmers
You are requested and required to celebrate IDOWAGIS this afternoon. If you know any young women then pick a story, any story, from my Women in Science Series - there are 64=2^6 of them - and read it aloud with her. If 1 out of every 10 or 20 girls thinks "I could do that" then it will have been a huge win. To a close approximation nobody pageviews that list: only about 0.005% of my traffic hits that page. Which is a surprise and a shame because some of those stories are ripping and inspirational.

Providentially and with great synchronicity (me and women-of-science must love each other very much) last week, I downloaded & filleted my list of Blobs about Women in Science and converted it into a PDF. My printer driver at work allows you to print PDFs in 2pages/side booklet format, so I converted it to 14pt to take account of the reduction that entails at printing. It runs to 60,000 words or about 100 pages and 2.3 Mb. I'll send an e-copy to the first 20 people who ask at Or by leaving a comment here. You have until 16th Feb 2018.

Here's the introduction:
I started blogging for Ireland about five years ago, when I secured a new job at an Institute of Technology. That first term I was [t]asked to teach classes in Human Physiology, Water Chemistry, Climate Change, Computing, Remedial Maths, Physics, Chemistry, Molecular Biology, & Food Microbiology. All of which were a long way from bioinformatics, genome sequence analysis and gene discovery: fields in which I had published some contributions. I got the job through a peculiar combination of retirements, maternity leave and consolidation of part-time appointments. In prepping material to keep one step ahead of the students, I have travelled widely down the by-ways of science. I called my blog Science Matters, because it does. There I have recorded my trials and triumphs in the class room; and with hindsight there was lots of don’t do this at home or at work kids. I’ve also noted interesting stuff along the way; some of the material I come across gets incorporated into the curriculum but much more exists only on The Blob. After several months of following my own interests I realised that I had often been surprised and delighted by the contributions to science by women. So I created an executive summary / index of the pieces I’ve written where women had a starring role and here I’ve filleted The Blob to concentrate in one place those essays. I am disappointed at how little traction that list has gained because I think we need to have better stories and better role models for young women in science. If one of the following stories makes one girl choose to dip her toe in the ocean of science, then it will all have been worth while. There are enough accountants and farmers; pop stars and pilots; commis chefs and waitrons . . . but you can never have enough scientists.