Tuesday, 8 November 2016

Class(ify) wars

You can manage quite well without a Theory of Everything, but if your interests are diverse and you have a lot of stuff (doesn't have to be physical, could be Blob posts) it sure is handy to have a way to find things. Especially if, like me, you now have a two-week event horizon. When I was born, indeed right up until about 1980, information was stored as physical objects - books, journals, paintings, photographs - and you didn't have to be OCD to see that it was useful to put similar things in the same bin / shelf / room. If you're in Europe or you live in Hicksville, Iowa, your library will use the Dewey Decimal Classification (DDC) to classify its books / the world. If you live in Washington DC, your library, The Library of Congress, will use a different scheme (LCC). The terminal C in both cases stands for Classification. Because both schemes were invented and developed by 19thC librarians - Herbert Putnam with help from Charles A. Cutter for LCC; and Melvil Dewey for DDC - they share a particular Encyclopedia Britannica world-view, which seemed to think that Civil War Generals each needed a full couple of pages but 10 million species of beetle could all be dealt with in a single article called Invertebrates [previo-rant on the subject of encyclopedic balance].

Both systems are hierarchical, but LCC starts with leading letters while DDC is constrained by 10 uber-digits (0 to 9) to start the bins rolling. There is, of course, some overlap because in 19thC there was a concept called Geography, almost a tangible thing, and everyone was happy about where geography ended and history or science began. But DDC treats Geography 910 as a subset of History in the 900s, while LCC bins it in the Gs with Anthropology and Recreation. I'm a lot less certain about the boundaries of geography than Putnam or Dewey: where does biogeography stop being science? Heck, where do you put ecology? Clearly there wasn't much need to allocate library shelving to space flight in 1902, except to hold copies of novels by Jules Verne and HG Wells. Now, libraries need to find room for such material, not only on shelves but also in the catalogue. Squeezing in a lot of completely new post-19thC information can mean dauntingly long call numbers, which have to be put on the spine of each relevant book. This problem is more pronounced in Dewey because its Universe was set by the 10 leading digits in 1876. LCC has subsequently created a bin for 'Technology'.

If the Universe is small, like the subjects available for the Leaving Certificate, it's easy enough to have a fudge and file Home Economics under both Applied Science and Social Studies. Although, of course, the physical book has to go on one shelf or the other: no library is going to buy two copies of All In The Cooking, the HomeEc textbook, because of a classification problem. And the librarian has to point the punter at the right shelf, at least until everything is transferred to Kindle. But with larger datasets you have to make a choice, and that is one of the fundamental skill-sets of librarians.

Philosophically there is a difference. Dewey imagined that he was setting up a system to classify all knowledge (the little we knew then) whereas the LCC is just a catalog of what's in the LoC.  That is pretty stonking big - nearly 25 million books and 15 million serials, newspapers, monographs and sheet music.  With Google's plan to scan all the books in all the libraries of the world, this heap of writing is now searchable on-line.  What do we know that isn't written down? Is it possible to classify [make sense of; put into context; relate to other data] it? Not so much?  So the universe of knowledge to a degree is the contents of the LoC - with a little help from other very large libraries.

To a certain extent the war between DDC and LCC is artificial and superficial, but so much is invested in both systems now that it would be hard to merge them or ditch one in favour of the other. A researcher at LoC in the 60s developed MARC, a digital electronic cataloging system which made the life of regional librarians much easier - they could use MARC wholesale. That could have given LCC a VHS vs Betamax or ASCII vs EBCDIC advantage to sweep Dewey into oblivion, but it ain't happened yet. So we're stuck with both and you'll need a conversion chart:

DDC   LCC          Subject
000   QA, A        Computers, gen. knowledge
100   B            Psychology & Philosophy
200   BL           Religion
300   H, J, K, L   Social sciences
400   P            Language
500   Q            Science and Maths
600   T, R, S      Technology, Appl Science
700   N, GV        Arts
800   P            Literature
900   G, D, F      Geography & History

I'm a Dewey Boy, myself, so I've put the table in Dewey order. And just to illustrate the hierarchical nature of these things, let's drill down one level in the 500s:

DDC   LCC     Subject
500   Q       Natural Science
510   QA      Mathematics
520   QB      Astronomy
530   QC      Physics
540   QD      Chemistry
550   QE      Geology
560   QE701   Paleontology
570   QH301   Biology
580   QK      Botany
590   QL      Zoology

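
If you'd rather let a computer do the drilling down, the two charts above can be treated as a pair of lookup tables. What follows is only a rough sketch in Python: the names DDC_MAIN, DDC_500S and drill_down are my own inventions for illustration, the rows are copied straight from the charts rather than from any official DDC-to-LCC crosswalk, and only the 500s get a second level because that is all that is charted here.

```python
# Rows copied from the two charts above. This is the blog's rough chart,
# not an official DDC-to-LCC crosswalk.
DDC_MAIN = {
    0: ("QA, A", "Computers, gen. knowledge"),
    100: ("B", "Psychology & Philosophy"),
    200: ("BL", "Religion"),
    300: ("H, J, K, L", "Social sciences"),
    400: ("P", "Language"),
    500: ("Q", "Science and Maths"),
    600: ("T, R, S", "Technology, Appl Science"),
    700: ("N, GV", "Arts"),
    800: ("P", "Literature"),
    900: ("G, D, F", "Geography & History"),
}

# Second level, charted for the 500s only.
DDC_500S = {
    510: ("QA", "Mathematics"),
    520: ("QB", "Astronomy"),
    530: ("QC", "Physics"),
    540: ("QD", "Chemistry"),
    550: ("QE", "Geology"),
    560: ("QE701", "Paleontology"),
    570: ("QH301", "Biology"),
    580: ("QK", "Botany"),
    590: ("QL", "Zoology"),
}


def drill_down(ddc):
    """Return (DDC, LCC, subject) rows, broadest bin first, narrowest charted bin last."""
    number = int(float(ddc))  # "598.2" -> 598; anything after the point is ignored
    if not 0 <= number <= 999:
        raise ValueError(f"not a Dewey number: {ddc!r}")
    main = number // 100 * 100     # 598 -> 500, the leading digit picks the main class
    division = number // 10 * 10   # 598 -> 590, the second digit picks the division
    path = [(f"{main:03d}", *DDC_MAIN[main])]
    if division in DDC_500S:
        path.append((f"{division:03d}", *DDC_500S[division]))
    return path


if __name__ == "__main__":
    for row in drill_down("598.2"):
        print(*row, sep="  ")
```

Ask it about birds at 598 and it walks from 500 (Q) down to 590 (QL); ask about geography at 910 and it stops at the 900s because no second level is charted for them.
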
Modern science wouldn't agree with this 19thC worldview at all at all. Dumping microbiology as subset 576 of biology rather than giving it parity of esteem with animals and plants is essentially giving the microbial world parity of esteem with birds 598. Now, birds are very pretty but there are more species of microbe in my intestine than there are birds in the whole world. Even the most objective criteria for any classification are thus clearly constrained by the time and place they were dreamed up. Which is fine, but don't leave here thinking that DDC or LCC are objective and value-neutral.

