“The role of computational techniques for cultural analytics in the arts and humanities, paying particular attention to the affordances and limitations of macro-analysis”.
The arts and humanities find themselves entangled in the era of “Big Data”, bringing about the need to critically assess how technology affects our understanding of knowledge and how it alters our epistemic processes. The challenge is how to embrace and employ technologies in scholarly investigation with the proviso that they add to the process and do not overcomplicate it.
Some humanists are perhaps overcautious in this regard, worried about an over-reliance on computational techniques, while others have inflated or unrealistic expectations of what computational techniques can offer the arts and humanities. This concern is echoed by Schäfer and Van Es who, citing Anderson (2008) and Schmidt and Cohen (2014), write: “Although data sets can provide new insights that offer opportunities for fine-grained detail previously not available, their possibilities are frequently overestimated” (Schäfer & Van Es, 2017).
The balanced approach is to proceed with caution, as with any research, and to plan the processes thoroughly. If the digital tool will add to the research, use it; if it will not, then don’t. It really is that simple. Hoover states: “Decisions about what to count can be obvious, problematic or extremely difficult, and poor initial choices can lead to wasted effort and worthless results” (Hoover, 2008).
The concept of Cultural Analytics, developed by Manovich in 2005, can be defined as “the analysis of massive cultural datasets and flows using computational and visualization techniques” (Manovich, 2007).
The process of looking at culture and representing it in a quantitative dataset is a relatively new concept. Traditionally, qualitative methods were used for cultural research projects. Now the opportunity exists to use computational techniques to transform these corpora into quantitative datasets.
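To make this transformation concrete, the following is a minimal sketch, assuming the scikit-learn library and two invented sample sentences, of how a small corpus can be “datafied” into a document-term matrix, the most basic quantitative representation of a text:

```python
# A minimal sketch of "datafication": turning a tiny corpus of texts
# into a quantitative dataset (a document-term matrix). The sample
# sentences are purely illustrative; scikit-learn is assumed available.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "It was a dark and stormy night on the moors.",
    "The moors were dark, the night was long.",
]

vectorizer = CountVectorizer()           # tokenise and count each word
matrix = vectorizer.fit_transform(corpus)

# Each row is a text, each column a word, each cell a raw count.
for word, count in zip(vectorizer.get_feature_names_out(), matrix.toarray()[0]):
    print(word, count)
```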
This raises important questions for epistemology when it comes to knowledge production and analysis. In some cases it offers opportunities to answer old questions, an example being research carried out in UCC to discover who wrote Wuthering Heights: many claim it was written not by Emily Brontë but by her brother Branwell. McCarthy and O’Sullivan used stylometric analysis tools. O’Sullivan explains stylometry as a statistical technique which indicates likely authorship, forming an impression of how an author writes by counting the frequency of the 100 most used words over sample texts. A series of visualisations was used to show the results. I found the research to be thorough: it used the first edition of Wuthering Heights to avoid editorial changes skewing the results, and it also addressed gender biases, which strengthened the finding that Branwell was not the author (McCarthy & O’Sullivan, 2021).
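As a rough illustration of the stylometric idea described above, here is a minimal sketch using two invented text snippets and only five shared words rather than the 100 a real study would use; dedicated tools (such as the stylo package for R) implement this properly with measures like Burrows’s Delta:

```python
# A toy version of most-frequent-word stylometry: profile each text by
# the relative frequency of the corpus's commonest words, then compare
# the profiles. The two snippets below are invented placeholders.
from collections import Counter
import re

def word_frequencies(text, vocabulary):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return [counts[w] / total for w in vocabulary]

sample_a = "the wind on the moor was wild and the night was dark"
sample_b = "the heart of the house was cold and the hall was silent"

# Build the shared vocabulary from the most frequent words overall.
combined = Counter(re.findall(r"[a-z']+", (sample_a + " " + sample_b).lower()))
vocabulary = [w for w, _ in combined.most_common(5)]   # ~100 in practice

profile_a = word_frequencies(sample_a, vocabulary)
profile_b = word_frequencies(sample_b, vocabulary)

# A simple distance between profiles: smaller suggests similar style.
distance = sum(abs(a - b) for a, b in zip(profile_a, profile_b))
print(vocabulary, distance)
```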
The digital humanities try to answer these questions using a wide range of methods such as Distant Reading, Short-form Humanities+, Cultural Analytics, Data Analysis, Cluster Analysis, Data Mining and Digital Methods (Berry & Fagerjord, 2017). The term Digital Methods applies to texts which were “born digital” as well as to traditional humanities texts.
The humanities have seen an emphasis towards ‘datafication’, i.e. transforming everything into data that can be quantified. Traditionally the field has relied on qualitative data, but this does not diminish the worth of quantitative data. “Quantitative approaches … represent elements or characteristics of … texts numerically, applying the powerful, accurate, and widely accepted methods of mathematics to measurement, classification, and analysis” (Hoover, 2013). It is necessary to look at the opportunities this offers.
It is also necessary to introduce the concept of macroanalysis. In the digital humanities this sits at the intersection of close reading and distant reading. Close reading is, as it sounds, the reading of a piece of text, e.g. a book, to find some deeper meaning; the meaning derived from a close reading is the reader’s interpretation of the passage or text.
Distant reading is when computational techniques are used to analyse the data. Franco Moretti provides a useful definition: “Distant reading … allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes—or genres and systems… less is more. If we want to understand the system in its entirety, we must accept losing something” (Moretti, 2013). Whilst acknowledging we may lose something from the text, we have the potential to find something new.
Humanists need to think about how to carry out their research. Many factors will need to be considered, e.g. time and resources, and when analysing a corpus you wish to turn into a quantitative dataset it is essential to ask some key questions:
- What is the technology enabling me to do?
- Is the technology adding to the text?
- Is this something that my peers would recognise as worthwhile?
- Do I have the necessary skillset to use the tool?
- Is this tool well supported by the community?
- Is the technology sustainable, or will it become obsolete and the research lost?
A final piece of advice: if you don’t get the answers you expect, don’t assume the output is factual; check for errors.
There is little doubt that humanists should consider using computational techniques for cultural analytics. Manovich states: “We are living through an exponential explosion in the amounts of data we are generating, capturing, analysing, visualizing, and storing – including cultural content” (Manovich, 2017). In a world of datafication, realistically how many texts can be close read? Computational techniques have become essential in these projects for narrowing datasets down to a representative sample, a task impossible without technology.
The arts and humanities have traditionally tended to make qualitative claims; computational techniques offer the opportunity to support those claims with quantitative data, providing new forms of empirical evidence.
Manovich writes about the opportunities cultural analytics provides to quantitatively analyse the structure of datasets and visualise the results, “revealing the patterns which lie below the unaided capacities of human perception and cognition” (Manovich, 2017). This can be achieved using techniques such as most-frequent-word analysis, concordance, topic modelling, first-level coding and stylometry, with tools like Voyant, MALLET and R.
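As one example of these techniques, here is a minimal topic-modelling sketch; MALLET itself is a Java tool, so the gensim library’s LDA implementation stands in for it here, run on four invented toy documents purely to show the workflow:

```python
# A minimal topic-modelling sketch using gensim's LDA (standing in for
# MALLET). The four "documents" are invented; real corpora need far
# more text, preprocessing and tuning.
from gensim import corpora, models

documents = [
    "moor wind storm heath night",
    "storm night wind rain moor",
    "love letter heart marriage hope",
    "marriage love hope heart home",
]
texts = [doc.split() for doc in documents]

dictionary = corpora.Dictionary(texts)                # map words to ids
bow_corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words counts

# Ask LDA for two topics; it should roughly separate storms from romance.
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      random_state=1, passes=10)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```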
Another key advantage is the range of options computational techniques offer with regard to the interpretation and representation of findings. Text analysis tools such as Voyant offer a user-friendly platform for creating visualisations like distribution charts, keyword-in-context (KWIC) displays or word clouds. These visualisations can be altered to show different scenarios based on topics of your choosing, offering multiple means of engagement with the data.
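Voyant builds such displays interactively, but the underlying KWIC idea is simple enough to sketch in a few lines of plain Python, assuming an invented sample passage:

```python
# A minimal keyword-in-context (KWIC) sketch: print each occurrence of
# a keyword with a few words of context either side. The sample text
# is a placeholder.
import re

def kwic(text, keyword, window=3):
    words = re.findall(r"\w+", text.lower())
    for i, word in enumerate(words):
        if word == keyword:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            print(f"{left:>30} [{word}] {right}")

sample = ("The wind swept over the moor. The moor was silent at night, "
          "and nothing moved on the moor but the wind.")
kwic(sample, "moor")
```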
Despite the affordances of analytic tools, it is important to note that they do not replace the human interpreter but rather offer the opportunity to think through the data (Rockwell & Sinclair, 2016). This significant truth can sometimes get lost amid an over-reliance on such tools.
The opposite is also true: the danger of declining computational techniques and relying solely on close reading is that it is genuinely time-consuming and may lead to a narrowing of the texts considered, as they are open to the interpretation of the reader. That, of course, is not to say close reading has no place; it does, but it is something humanists need to be conscious of.
We should be aware that computers cannot read a text; rather, they parse it, analysing it into logical syntactic components, e.g. verbs. Distant reading is not reading: it is a contextless approach, something that does not come naturally to humanists.
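A minimal sketch of what “parsing, not reading” means in practice, assuming the NLTK library and an invented sentence:

```python
# NLTK tags each token with a part of speech (VB* = verb forms) without
# any grasp of the sentence's meaning. Resource names below are for
# NLTK <= 3.8; newer releases may instead want 'punkt_tab' and
# 'averaged_perceptron_tagger_eng'.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Heathcliff wandered the moors and mourned her memory."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)

# Keep only the verbs: the computer identifies the grammatical role of
# 'wandered' and 'mourned', not their sorrow.
verbs = [word for word, tag in tagged if tag.startswith("VB")]
print(tagged)
print("verbs:", verbs)
```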
Rockwell and Sinclair believe that “Computers can do much more than ‘crunch numbers’. They can help us formalise claims and test them” (Rockwell & Sinclair, 2016). It is essential that the research question determines our choice of technology and not the other way around; the reverse is a real danger and something we need to avoid.
The arts & humanities has no prescriptive set of ‘laws’ unlike its scientific counterparts this is both an affordance as well as a limitation. This can bring about the existence of ‘Bad Data’ Pidd believes that most ‘bad data’ is actually created by ourselves and gives the example of British Newspapers 1600-1950. He states in his blog talking humanities “It has a low accuracy rate but remains one of the most frequently used resources according to Jisc” (Pidd, 2015). It we want or research to be data-driven we need to be mindful of the datasets we use, and allow time for fixing data to make it usable.
Sustainability is a major limitation on the use of computational techniques: suitable platforms are scarce, and will those that exist still be available to us in a decade? What then will happen to the research? Some digital humanists take a cautious approach and only use tools that allow them to export datasets, for example to an XML file. This, I believe, is where computer scientists can help, by collaborating with digital humanists and using their extensive programming skillsets to create new and sustainable technologies to support research questions. Berry and Fagerjord point to a need to bring together expertise from other disciplinary areas, stating that “Digital humanities is not and should not be understood as a ‘one size fits all’ discipline” (Berry & Fagerjord, 2017). There is no denying it: DH employs techniques from computer science to perform data analysis, and we should welcome collaboration in this regard.
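A minimal sketch of that cautious export approach, using only Python’s standard library; the element and attribute names here are invented for illustration:

```python
# Export a word-frequency dataset to plain XML so the results outlive
# any one analysis tool. Element/attribute names are illustrative.
import xml.etree.ElementTree as ET
from collections import Counter

counts = Counter("the moor the wind the night the moor".split())

root = ET.Element("frequencies", source="sample-text")
for word, count in counts.most_common():
    ET.SubElement(root, "word", form=word, count=str(count))

# Write a portable, human-readable file any future tool can parse.
ET.ElementTree(root).write("frequencies.xml", encoding="utf-8",
                           xml_declaration=True)
```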
Examining culture, and indeed the humanities, is far too complex an area to rely solely on digital methods to answer its questions. However, computational techniques offer the ability to ask different questions. Using them will unfortunately not allow you to answer your research questions faster, but they will support you in the process, so they are worth including in research projects when appropriate. You might even be lucky enough to answer an old research question using some of these tools, as with the research around the authorship of Wuthering Heights (McCarthy & O’Sullivan, 2021).
References
Berry, D., & Fagerjord, A. (2017). Digital Humanities: Knowledge and Critique in a Digital Age (1st ed., pp. 40-59). Cambridge: Polity Press.
Cummings, J., (2018). A world of difference: Myths and misconceptions about the TEI. Digital Scholarship in the Humanities, [online] 34(Supplement_1), pp.i58–i79. Available at: <https://doi.org/10.1093/llc/fqy071> [Accessed 19 February 2022].
Gartner, D., (2022). Metadata: giving new access to old books and images – Talking Humanities. [online] Talking Humanities. Available at: <https://talkinghumanities.blogs.sas.ac.uk/2022/02/04/metadata-giving-new-access-to-old-books-and-images/> [Accessed 19 February 2022].
Hoover, D. (2013). Textual Analysis. In Literary Studies in the Digital Age. [online] dlsanthology.mla.hcommons.org. Available at: <https://dlsanthology.mla.hcommons.org/textual-analysis/> [Accessed 19 February 2022].
Hoover, D. (2008). Quantitative Analysis and Literary Studies. In R. Siemens & S. Schreibman (eds.), A Companion to Digital Literary Studies. Malden: Blackwell Publishing.
Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. 1st ed. Urbana: University of Illinois Press.
Manovich, L. (2016). The Science of Culture? Social Computing, Digital Humanities and Cultural Analytics. [online] Journal of Cultural Analytics. Available at: <https://culturalanalytics.org/article/11060> [Accessed 20 February 2022].
Manovich, L. (2017). Cultural Analytics, Social Computing and Digital Humanities. In: M. T. Schäfer & K. van Es (eds.), The Datafied Society: Studying Culture through Data. Amsterdam: Amsterdam University Press, pp. 55–68. DOI: <https://doi.org/10.25969/mediarep/12514> [Accessed 20 February 2022].
McCarthy, R., & O’Sullivan, J. (2021). Who wrote Wuthering Heights? Digital Scholarship in the Humanities, 36(2), pp. 383–391. DOI: <https://doi.org/10.1093/llc/fqaa031> [Accessed 01 February 2022].
Pidd, M., (2015). Re-using bad data in the humanities – Talking Humanities. [online] Talking Humanities. Available at: <https://talkinghumanities.blogs.sas.ac.uk/2015/11/05/re-using-bad-data-in-the-humanities/> [Accessed 19 February 2022].
Rockwell, G., & Sinclair, S. (2016). The Measured Words: How Computers Analyze Text. Hermeneutica, 25-44. doi: 10.7551/mitpress/9780262034357.003.0002
Schäfer, M., & Van Es, K. (2017). The Datafied Society | Studying Culture Through Data (1st ed.). Utrecht, Netherlands: Amsterdam University Press.
Schöch, C. (2013). Big? Smart? Clean? Messy? Data in the Humanities. Journal of Digital Humanities. [online] Journalofdigitalhumanities.org. Available at: <http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/> [Accessed 19 February 2022].
Underwood, T., (2015). Seven ways humanists are using computers to understand text. [Blog] Using large digital libraries to advance literary history, Available at: <https://tedunderwood.com/2015/06/04/seven-ways-humanists-are-using-computers-to-understand-text/> [Accessed 19 February 2022].