We have no shortage of data in digital communications; meaning is often much harder to come by. When listening to online conversations, sentiment analysis very often stands in for meaning, even though it is flawed and hard to verify without human intervention.
Reading about the Stanford Literary Lab's distant reading method today got me thinking about that problem. Distant reading is the data analysis of literature. Computers can learn to spot genres, for instance:
People recognize, say, Gothic literature based on castles, revenants, brooding atmospheres, and the greater frequency of words like “tremble” and “ruin.” Computers recognize Gothic literature based on the greater frequency of words like . . . “the.” Now, that’s interesting. It suggests that genres “possess distinctive features at every possible scale of analysis.” More important for the Lit Lab, it suggests that there are formal aspects of literature that people, unaided, cannot detect.
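To make the word-frequency idea concrete, here is a minimal sketch of how one might compare how often a few words turn up in two texts. It is not the Lit Lab's actual method; the two sample sentences, the `relative_frequencies` helper and the word list are all made up for illustration.

```python
import re
from collections import Counter

def relative_frequencies(text, words):
    """Return each word's share of all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in words}

# Invented toy samples, not real corpus data.
gothic_sample = "The castle stood in ruin and the wind made the shutters tremble."
other_sample = "She poured tea and talked of marriage, money and manners."

# Hypothetical feature list: one function word, two "Gothic" words.
features = ["the", "tremble", "ruin"]

gothic_profile = relative_frequencies(gothic_sample, features)
other_profile = relative_frequencies(other_sample, features)

for word in features:
    print(word, round(gothic_profile[word], 3), round(other_profile[word], 3))
```

On real novels you would compute profiles like these over thousands of words and many texts, then hand them to a classifier. The toy version only shows that a word as plain as "the" can be counted and compared like any other.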
Naturally they look at networks and relationships between words – the method…
…turns characters into nodes (“vertices” in network theory) and their verbal exchanges into connections (“edges”). A lot goes by the wayside in this transformation, including the content of those exchanges and all of Hamlet’s soliloquies (i.e., all interior experience); the plot, so to speak, thins. But Moretti claims his networks “make visible specific ‘regions’ within the plot” and enable experimentation. (What happens to Hamlet if you remove Horatio?)
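As a rough sketch of the nodes-and-edges idea, the snippet below builds a tiny character network and then removes one character to see who stays connected. The edge list is a simplified guess for illustration, not Moretti's data, and the function names (`build_network`, `remove_character`, `reachable`) are hypothetical.

```python
from collections import defaultdict

def build_network(exchanges):
    """Characters become nodes; each verbal exchange becomes an undirected edge."""
    graph = defaultdict(set)
    for speaker, listener in exchanges:
        graph[speaker].add(listener)
        graph[listener].add(speaker)
    return graph

def remove_character(graph, name):
    """Return a copy of the network without the given character."""
    return {c: nbrs - {name} for c, nbrs in graph.items() if c != name}

def reachable(graph, start):
    """Collect every character connected to `start` by some chain of exchanges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen

# Illustrative edge list only; not taken from the play's actual dialogue counts.
exchanges = [
    ("Hamlet", "Horatio"),
    ("Hamlet", "Claudius"),
    ("Hamlet", "Gertrude"),
    ("Claudius", "Gertrude"),
    ("Horatio", "Marcellus"),
    ("Horatio", "Bernardo"),
]

full = build_network(exchanges)
without_horatio = remove_character(full, "Horatio")

print(sorted(reachable(full, "Hamlet")))
print(sorted(reachable(without_horatio, "Hamlet")))
```

In this toy network, taking Horatio out cuts Marcellus and Bernardo off from Hamlet entirely, which is the kind of experiment the quoted passage hints at.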
It looks like CrisisVu, a Twitter monitoring service, may also be thinking along these lines.
: : Also worth reading is the Los Angeles Review of Books article Literature is not Data: Against Digital Humanities (hat tip to Andrew Sullivan).