Text-mining

Over the last two decades, text-mining has become part and parcel of many academic disciplines, but it has largely failed to have a fundamental scholarly impact on History. Admittedly, Digital History (and its methods) is gaining traction, but the historical mainstream remains, on the whole, unimpressed by the enthusiasm of the computational evangelists. This applies all the more to the branches of History which rely on textual sources.

The discrepancy between History and many of the Social Sciences lies partly in the way historians craft their arguments. Quantification and pattern extraction clash with traditional hermeneutics, which builds its interpretation of texts on close reading.

And why, one can rightfully ask, should historians prefer an exegesis of completely decontextualized text statistics over rigorous, manual scrutiny? Finding a convincing reply to this question forms the core of our research: what do we actually learn from counting (the most basic operation in text mining)? How can we relate the methods and findings of computational history to the existing historiography?
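To make "counting" concrete, here is a minimal sketch of what such a basic operation might look like: the relative frequency of a single term in two periods. The two-period corpus, the function name `relative_frequency`, and the crude tokenizer are all assumptions made for illustration, not data or tooling from our projects.

```python
import re
from collections import Counter


def relative_frequency(texts, term):
    """Occurrences of `term` per 1,000 tokens across a list of texts."""
    # Crude tokenizer (an assumption): lowercase runs of ASCII letters only.
    tokens = [t for text in texts for t in re.findall(r"[a-z]+", text.lower())]
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[term] / len(tokens)


# Invented toy corpus, for illustration only.
corpus = {
    "1860s": ["Reform of the franchise is overdue.",
              "The franchise must be extended."],
    "1920s": ["Our democracy rests on the franchise.",
              "Democracy demands that women vote."],
}

for period, texts in corpus.items():
    print(period, round(relative_frequency(texts, "democracy"), 1))
```

The number this produces says nothing by itself about why a term rises or falls. That gap between a count and an interpretation is precisely the question raised above.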

Each of the projects outlined below combines methodological reflection with substantive historical research questions:

– Representation and Gender, a Digital Humanities Approach: Over the last decades, the rising tide of gender equality has lifted many boats. In most democracies, more women have gained access to national legislatures, with some countries achieving almost gender-balanced parliaments. But to what extent is this increasing numerical presence conducive to women’s “substantive” representation, that is, the advancement of women’s issues and perspectives? Measuring “representation” via data-mining remains a complicated methodological issue, because capturing such complex phenomena by extracting discursive patterns needs to be theoretically well-informed: can “women’s” issues be subsumed under feminist policy goals? If so, how can these be detected? Or should we prioritize the actual legislative behaviour of women MPs? What is the place of ideology in women’s substantive representation? Should conservatives and progressives be given an equal voice?

– Historical Affect Computation: Emotion Mining has asserted itself as a field of growing importance at the interface of Natural Language Processing and Machine Learning. Unfortunately, most of the existing tools are pitched towards the present and would fare poorly on, say, Victorian political discourse. The aim of this project is twofold: first, to adapt sentiment-mining techniques to the past by devising strategies that make algorithms sensitive to specific historical contexts; second, to properly validate these emotion measures and assess whether they really track what historians expect.

– Conceptual Change: An oft-repeated objection against text-mining is that “words are not ideas”. Over the last few years, however, a stream of literature has demonstrated how distributional semantics can offer efficient representations of word meaning, and may even provide a model for tracking ideas over time. Moreover, these models can computationally identify semantic shifts. This project investigates how distributional semantics ties in with “Begriffsgeschichte”, i.e. how computational models of meaning align with interpretations based on close reading. As a case study, we focus on the history of “democracy” in parliamentary and media discourse. The case should establish best practices, and define the benefits and limitations of a computational “Begriffsgeschichte”.
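
The distributional idea behind the last project can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses raw co-occurrence counts rather than the trained embedding models discussed in the literature, and the sentences, the window size, and the helper names `context_vector` and `cosine` are invented for the purpose. A word is represented by the words that surround it in each period, and a low cosine similarity between the two representations hints at a change in usage.

```python
import math
import re
from collections import Counter


def context_vector(sentences, target, window=2):
    """Count the words occurring within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), i + window + 1
            # Collect the neighbours, skipping the target position itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented toy sentences, for illustration only.
early = context_vector(
    ["Democracy breeds mob rule.", "The mob favours democracy."], "democracy"
)
late = context_vector(
    ["Democracy protects liberty.", "Liberty requires democracy."], "democracy"
)

print("early contexts:", dict(early))
print("late contexts:", dict(late))
print("similarity across periods:", cosine(early, late))
```

Whether a drop in similarity reflects genuine conceptual change, or merely a change in genre, spelling, or corpus composition, is exactly the kind of question that has to be settled against close reading.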