In a few previous posts I’ve stressed the difference between information and meaning (a distinction I picked up from Claude Shannon, the father of information theory) and some of its implications. For example, in this post I pointed out that Shannon’s separation of meaning and information is compatible with structuralist and post-structuralist theories which maintain that there is no inherent meaning in the text. (I’ve also had to deal with it in the course of digitizing a book – see here). Work on Artificial Intelligence has tended to reinforce this distinction: computers are very good at processing information but not very good at understanding meaning.
But last week Bill Turkel wrote a post which turned my understanding of the meaning/information dichotomy on its head. This isn’t such a new development: it follows on from a post he wrote in March 2006, which was itself inspired by an article by Rudi Cilibrasi and Paul Vitányi published in 2005. There’s a lot of mathematical stuff about compression algorithms which I can’t claim to understand, but the crux is that, without understanding anything about meaning, computers can compare similarities in the information content of texts and cluster them accordingly. The result is patterns that make sense to humans who can understand the meaning of the text. Bill’s example used entries from the Dictionary of Canadian Biography, finding geographical and chronological clusters of entries.
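As far as I follow it, the measure involved is the normalized compression distance (NCD): NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. Here’s a minimal sketch in Python using the standard zlib compressor – my own illustration with invented sample entries, not Bill’s actual code or data:

```python
# Minimal sketch of "clustering by compression" (Cilibrasi & Vitanyi, 2005).
# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
# where C(s) is the compressed length of s. Texts that share vocabulary
# and phrasing compress well together, so their NCD is smaller --
# no understanding of meaning required.
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: closer to 0 for similar texts."""
    cx = compressed_size(x.encode("utf-8"))
    cy = compressed_size(y.encode("utf-8"))
    cxy = compressed_size((x + y).encode("utf-8"))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Invented entries, purely for illustration:
entries = {
    "trader_1": "Born in Montreal in 1820, he traded furs along the St Lawrence.",
    "trader_2": "A Montreal fur trader of the 1820s, he worked the St Lawrence.",
    "teacher": "She emigrated to Vancouver Island in 1890 and taught school.",
}
names = list(entries)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(ncd(entries[a], entries[b]), 3))
```

On strings this short the compressor’s overhead makes the numbers noisy; experiments like Bill’s work on full-length texts and feed the resulting matrix of pairwise distances into a clustering algorithm.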
Despite the attention-grabbing title of my post, the distinction between information and meaning isn’t a false one. However, these experiments show that in practice the relationship between information and meaning within the context of a particular linguistic/cultural system is not as arbitrary and unpredictable as theorizing might suggest. Does this mean that structuralism could make a comeback against post-structuralism? Or do we need to move beyond both and find a new way to think about text? Whatever the implications for theory, this is an exciting development which promises to be very useful in practice.
Last week I posted some thoughts in response to the discussions at A Historian’s Craft and Civil War Memory about history and philosophy. In that post I took some of the philosophical problems that affect history and tried to restate them in scientific terms. As Brett pointed out, this really amounted to stating the obvious in fairly uncontroversial terms, but I think that was worth doing in order to bypass the unproductive hostility between the two extremes in the postmodernism wars (although the extent to which those extremes even exist is debatable). Whether the major problems we face as historians are philosophical, scientific, or a bit of both, the question remains: how much time should we spend thinking about these problems? In this post I’ll be discussing that question, but I have to warn you in advance that I can’t answer it. So there might not be much point reading any further…
[posted by Gavin Robinson, 4:49 pm, 5 February 2007]
In my previous post about theories of digital text, I used Shannon’s communication theory to divide text into information and meaning, and then talked exclusively about text as information: a sequence of characters selected from a finite set. That allowed me to concentrate on one part of the problem while excluding the more difficult problems associated with meaning. In this post, I’ll be trying to tackle some of the problems of meaning, while still avoiding as many of them as I can. I will also continue to avoid offering concrete definitions of “text” and “a text”, mainly because I haven’t found any satisfactory definitions yet, but I won’t be able to avoid using the word “text”.
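To make “a sequence of characters selected from a finite set” concrete: Shannon’s entropy sums −p log₂ p over the observed character frequencies p, giving the average information per character while ignoring meaning entirely. A tiny sketch – my own illustration, not code from any of the works I’ve been reading:

```python
# First-order Shannon entropy of a text, in bits per character,
# estimated from observed character frequencies. The number depends
# only on which characters occur and how often -- never on meaning.
from collections import Counter
from math import log2

def entropy_per_char(text: str) -> float:
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy_per_char("aaaaaaaa"))  # 0.0: one symbol, no surprise, no information
print(entropy_per_char("abababab"))  # 1.0: two equally likely symbols, one bit each
```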
[posted by Gavin Robinson, 5:07 pm, 2 February 2007]
As the next stage of my Digital History Projects I’ve been doing background reading and thinking about the theory of text. This week I’ve read Schreibman, Siemens, and Unsworth, A Companion to Digital Humanities (2004); Burnard, O’Brien O’Keeffe, and Unsworth, Electronic Textual Editing (2006); Susan Hockey, Electronic Texts in the Humanities (2000); and C. E. Shannon, ‘A Mathematical Theory of Communication’ (1948). I can’t say that I understood everything (especially Shannon’s equations and Jerome McGann’s pretentious jargon) but it’s given me a lot to think about, and things are nowhere near as simple as I first assumed.