I have spent a good portion of the last week rehearsing and performing Handel’s Messiah with the Duke University Chapel Choir, so it has been on my mind. During one rehearsal, our director encouraged us to emphasize the word “glory” and asserted that it is the most frequently used word in Messiah. My library science studies, and more specifically a digital humanities class, introduced me to software that performs word frequency counts, so I decided to check this assertion. I scanned the libretto into Adobe Acrobat Pro, cleaned up the text that came through its OCR process, and plugged it into the Character and Word Counter with Frequency Statistics Calculator. Below is a link to the results in a PDF file, along with a word cloud generated at Word Cloud Generator. (NOTE: a libretto is printed to help an audience follow the words being sung, but it does not generally include every occurrence of a sung word, as the text might differ slightly from one voice part to another.)

Messiah words (PDF): 483 unique words, 1485 total words
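For anyone who would rather not rely on a web tool, the same kind of count can be approximated in a few lines of Python. This is only a minimal sketch, not the method the online calculator uses: the function name `word_frequencies`, the rule for what counts as a word, and the two-sentence sample text are all my own assumptions for illustration, so totals from a full libretto could differ from the figures above.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (Counter of lowercased words, total word count).

    Assumption: a "word" is a run of letters, optionally joined by
    straight or curly apostrophes. Other tools may tokenize differently.
    """
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(words), len(words)


# Hypothetical sample; in practice this would be the cleaned OCR text.
sample = (
    "Glory to God in the highest, and peace on earth. "
    "And the glory of the Lord shall be revealed."
)

counts, total = word_frequencies(sample)
print("Unique words:", len(counts))
print("Total words:", total)

# List the most frequent words first.
for word, n in counts.most_common(5):
    print(word, n)
```

On a real libretto, very common words such as “the” and “and” will likely crowd the top of the list, so filtering them out before comparing a word like “glory” against the rest may be worthwhile.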

Messiah word cloud


In addition to being an interesting exercise for a performance week, doing a word frequency count on the text of Messiah made me stop to think about the various ways that data can be reused. Obviously, those who created the program for the concert did not anticipate my doing this exercise, but with a clear copy of a program that could be scanned, I was able to complete the task with relatively little effort. But what about the items that are housed in archives of various sorts: do archivists have a responsibility to make these items available for reuse? And is it sufficient for archives to react to requests for use, such as those from the digital humanities community, or should archives be proactive in anticipating and suggesting other uses for the records they house? (For an example of how digital humanists have presented online items that were originally analog records in an archive, see the William Blake Archive.) Given that most archives have large backlogs that they can’t afford to process, I don’t imagine that archivists will devote a lot of time to brainstorming different ways to use their records. But I think special collections archivists should look at what scientists and social scientists are doing to preserve and share datasets in data repositories like the Inter-university Consortium for Political and Social Research (ICPSR), the Odum Institute Dataverse Network, and the Dryad Digital Repository, to name a few. Encouraging scholarship has always been at the heart of the work of special collections archives, so it’s time to embrace the possibility of new ways to facilitate that scholarship.
