Let them Speak

The project I have chosen to peer review is titled ‘Let them Speak’. It is a collaborative project using collections from three institutions: the Yale Fortunoff Video Archive, the USC Shoah Foundation Visual History Archive and the United States Holocaust Memorial Museum.

Dr. Gabor Mihaly Toth led this project. He used computational techniques to represent the mental, emotional and physical experiences that victims were likely to have undergone during the Holocaust, believing this would give insight into how those who did not survive felt at the time of the atrocity (Mihaly Toth 2021).

The goal of the project was to give a voice to those murdered in the Holocaust by creating tangible documentation and representation of the victims’ suffering. To achieve this aim, Toth used 2,700 oral history testimonies from Holocaust survivors held in the three collections. A tree visualisation was created in which you click on a word to explore an experience domain, e.g. fear. This then leads on to leaves containing textual fragments from survivors’ testimonies. You can read the fragments, click to hear them read aloud, or follow links to the video testimonies.

The project was funded by the Fortunoff Video Archive, the USC Shoah Foundation and the USC Viterbi School of Engineering, and was built in collaboration with the Yale Digital Humanities Lab.

I believe this is a worthwhile project that has achieved its aim. ‘Let them Speak’ gives a unique insight into the atrocities of the Holocaust. It can be viewed here: https://lts.fortunoff.library.yale.edu/

As befits a credible research project that follows best practice, the project website includes a Methodology section giving a comprehensive account of the methods. As mentioned earlier, three datasets were used, and the data was processed to make them unified and cross-searchable. Metadata categories were amended and, in some instances, created so that they were consistent across all three datasets. Python code was then applied to reduce these to five metadata categories. The project acknowledges that the original datasets were not amended, so some minor errors may remain from this process.
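To make the idea of unifying metadata more concrete, here is a minimal Python sketch of mapping differently named fields from three collections onto one shared schema. The source field names, the five shared categories and the sample record are all hypothetical and chosen for illustration; the methodology page does not say these are the fields or the code Toth actually used.

```python
# Hypothetical field names: each collection labels the same information
# differently, so every source field is mapped to one shared name.
FIELD_MAP = {
    "fortunoff": {"interviewee": "name", "place_of_birth": "birthplace",
                  "year_of_birth": "birth_year", "camps_ghettos": "places",
                  "lang": "language"},
    "usc": {"full_name": "name", "birth_city": "birthplace",
            "dob_year": "birth_year", "experience_places": "places",
            "interview_language": "language"},
    "ushmm": {"subject": "name", "origin": "birthplace",
              "born": "birth_year", "locations": "places",
              "language": "language"},
}

# Hypothetical set of five shared metadata categories.
SHARED_FIELDS = ["name", "birthplace", "birth_year", "places", "language"]


def unify(record, source):
    """Return a record that uses only the shared field names.

    Source fields with no mapping are dropped; shared fields missing from
    the source record are set to None, so every output has the same keys.
    """
    mapping = FIELD_MAP[source]
    unified = {field: None for field in SHARED_FIELDS}
    for key, value in record.items():
        if key in mapping:
            unified[mapping[key]] = value
    return unified


if __name__ == "__main__":
    usc_record = {"full_name": "A. B.", "birth_city": "Prague",
                  "dob_year": 1925, "shelf_mark": "X1"}
    print(unify(usc_record, "usc"))
```

Keeping the mapping in a plain dictionary means the original records are left untouched, which mirrors the project's choice not to amend the source datasets.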

Examples include removing strings such as ‘concentration camp’ and ‘ghetto’ from the datasets using computational techniques, which made the dataset easier to search. Another issue that arises when combining multiple datasets is orthographic variation in the metadata, for example the camp name appearing as “Litomerice”, “Litoměřice” and “Litoměrice”. The project decided to keep the spelling used in the English-language Wikipedia entry. The datasets, including the metadata, are available on the project website, which demonstrates a well-thought-out planning process.
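The two clean-up steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's own code: the suffix pattern is modelled on the review's examples, and the canonical table assumes “Litoměřice” is the form used by the English-language Wikipedia article. Variants are matched by stripping diacritics and case before looking them up.

```python
import re
import unicodedata

# Hypothetical pattern: removes a trailing "concentration camp" or "ghetto"
# descriptor (optionally in brackets) from a place-name string.
SUFFIX_PATTERN = re.compile(r"\s*\(?\b(concentration camp|ghetto)\b\)?\s*$",
                            re.IGNORECASE)

# Assumed canonical spellings, keyed by the accent-free, lower-cased form.
CANONICAL = {"litomerice": "Litoměřice"}


def fold(name):
    """Strip diacritics and case so spelling variants compare as equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


def normalise_place(raw):
    """Remove a trailing descriptor, then map known variants to one spelling."""
    name = SUFFIX_PATTERN.sub("", raw).strip()
    return CANONICAL.get(fold(name), name)


if __name__ == "__main__":
    for raw in ["Litomerice", "Litoměřice", "Litoměrice",
                "Theresienstadt (Concentration camp)"]:
        print(raw, "->", normalise_place(raw))
```

Names that are not in the canonical table are returned unchanged, so the table only needs entries for places where variants were actually found.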

Toth used linguistic annotation when creating the linguistic corpus to allow for optimal searching. This involved four main steps: sentence splitting, tokenisation, part-of-speech tagging and corpus construction. The Stanford Parser supported the tagging, for example by identifying a word as a noun in one sentence and as a verb in another. The resulting corpus was then saved in XML format.
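For readers new to these steps, as I was, the minimal Python sketch below walks a short text through sentence splitting, tokenisation, part-of-speech tagging and construction of an XML corpus. It is only an illustration: the tagging uses a tiny hypothetical word list standing in for a real tagger such as the Stanford Parser, and the element names (`testimony`, `s`, `w`) are assumptions rather than the project's actual XML schema.

```python
import re
import xml.etree.ElementTree as ET

# Toy lexicon standing in for a real tagger such as the Stanford Parser.
# Unknown words are tagged "X" and punctuation is tagged "PUNCT".
LEXICON = {"we": "PRON", "were": "AUX", "afraid": "ADJ", "they": "PRON",
           "took": "VERB", "us": "PRON", "to": "ADP", "the": "DET",
           "camp": "NOUN"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag(tokens):
    """Attach a part-of-speech label to each token."""
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            pos = LEXICON[tok.lower()]
        elif re.fullmatch(r"[^\w\s]", tok):
            pos = "PUNCT"
        else:
            pos = "X"
        tagged.append((tok, pos))
    return tagged


def build_corpus(testimony_id, text):
    """Build an XML tree: <testimony> containing <s> sentences of <w> words."""
    root = ET.Element("testimony", id=testimony_id)
    for n, sentence in enumerate(split_sentences(text), start=1):
        s_el = ET.SubElement(root, "s", n=str(n))
        for token, pos in tag(tokenise(sentence)):
            w_el = ET.SubElement(s_el, "w", pos=pos)
            w_el.text = token
    return root


if __name__ == "__main__":
    tree = build_corpus("t1", "We were afraid. They took us to the camp.")
    print(ET.tostring(tree, encoding="unicode"))
```

Storing each word with its tag as an attribute is what later allows searches such as “every verb” rather than only exact words.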

Testimonial fragments were extracted using data mining techniques. Issues arose because non-native speakers used different vocabulary, and there was the age-old problem of retrieving different sentences that share the same meaning. Strategies included analysing verb frequency and identifying verbs that represent physical or emotional experience. Other methods included word embedding models and topic modelling.
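The verb-based strategy can be illustrated with a short Python sketch, assuming sentences have already been tagged as word and part-of-speech pairs. The seed list of experience verbs is hypothetical; the project's own list and code are not reproduced here.

```python
from collections import Counter

# Hypothetical seed list of verbs signalling physical or emotional experience.
EXPERIENCE_VERBS = {"feared", "cried", "starved", "shivered", "hoped"}


def verb_frequencies(tagged_sentences):
    """Count how often each verb (lower-cased) occurs across tagged sentences."""
    counts = Counter()
    for sentence in tagged_sentences:
        for word, pos in sentence:
            if pos == "VERB":
                counts[word.lower()] += 1
    return counts


def experience_fragments(tagged_sentences, seeds=EXPERIENCE_VERBS):
    """Return, as plain text, the sentences containing at least one seed verb."""
    hits = []
    for sentence in tagged_sentences:
        if any(pos == "VERB" and word.lower() in seeds for word, pos in sentence):
            hits.append(" ".join(word for word, _ in sentence))
    return hits


if __name__ == "__main__":
    sample = [
        [("We", "PRON"), ("cried", "VERB"), ("every", "DET"), ("night", "NOUN")],
        [("They", "PRON"), ("took", "VERB"), ("us", "PRON")],
    ]
    print(verb_frequencies(sample).most_common())
    print(experience_fragments(sample))
```

Ranking verbs by frequency only surfaces candidates; a fixed seed list will also miss synonyms, which is the kind of gap that word embedding models and topic modelling can help to close.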

The website also gives a comprehensive description of the computational techniques used to deploy the project and create the end-user experience. The bespoke digital platform is powered by Docker running on Linux servers. The code is available on GitHub, in keeping with the project’s ethos of collaboration and sharing. The corpus is searched with BlackLab, a Java-based search engine that also provides a REST interface. Toth explains the reasons for his technology choices and links to further information.

It is welcome to see such transparency about the computational techniques and platforms used. Novices to digital humanities research, a category in which I include myself, can learn a great deal from it. The essays outlining the processes are a welcome support for anyone wanting to learn more about developing digital experiences such as this.

I believe this project was very successful. It collated multiple datasets into an easy-to-use interface for the end user. By linking the data to a set of 27 keywords, it created an environment that is clean and not overwhelming to browse. In my opinion it is also sustainable. This was achieved through extensive computational techniques which I believe will stand the test of time, not least the XML dataset available for download on the site. The project owners have full control of their servers and built their own platform, all of which supports my belief that this project uses sustainable technologies.

The topic deserved closer investigation, and the title, ‘Let them Speak’, is well chosen and conveys the spirit of the project.

Other notable features include the supplementary searchable linguistic corpus. Being able simply to read a fragment, or to explore further by watching videos or listening to audio, offers the end user good opportunities for interaction. The cross-institutional co-operation is an example of how collaboration should be done. Limitations were acknowledged, which matters when applying code to large datasets and relying on linguistic annotation and the Stanford Parser. The transparency about all computational techniques, down to how the platform was built, and the availability of the datasets add to the credibility of this project.

When it came to recommendations, I considered inventing some, as I really don’t think this project could have been carried out any better. My one criticism concerns the home screen, where the body text is set in italics. Reconsidering this choice would make the site more inclusive and avoid excluding viewers such as those with dyslexia, since “italic fonts decreased reading performance” (‘6 Surprising Bad Practices That Hurt Dyslexic Users’ 2011).

This project is an example of how digital humanities projects should be done. It has been well planned and implemented, showcasing best practice and digital humanities at its best. It gives us an opportunity to honour those who lost their lives, however horrific their deaths, and to reflect on and remember that these people lived, loved and had a voice. This project does indeed let them speak: it gives a voice to the voiceless through our interaction with fragments of collective suffering. To the drowned…


‘6 Surprising Bad Practices That Hurt Dyslexic Users’. 2011. 24 January 2011. https://uxmovement.com/content/6-surprising-bad-practices-that-hurt-dyslexic-users/.

Anderton, Lea. n.d. ‘Subject Guides: Digital Humanities: Examples’. Accessed 29 March 2022. https://subjects.library.manchester.ac.uk/digitalhumanities/examples.

Mihaly Toth, Gabor. 2021. ‘Let Them Speak’. Let Them Speak. 2021. https://lts.fortunoff.library.yale.edu/.