An Exploratory Tour of Thomas Gray's English Poems
Alexander Huber

Introduction

This is a Spyral notebook, a dynamic document that combines writing, code and data in service of reading, analyzing and interpreting digital texts, in our case the English poems of eighteenth-century poet Thomas Gray (1716-1771).  Researchers, students, and teachers can use Spyral notebooks to combine textual notes, commentary, and ideas with live code, thus facilitating hybrid computationally-augmented analyses and interpretations.

Spyral notebooks are web-based documents that offer a literate programming model of writing, combining text cells (like this one) and code cells (like the one below).  You can click on a cell to edit it and add new cells by clicking the add icon that appears in the left column when hovering over a cell.  At any point, a click on a content item (word, phrase, line) will take you into the Voyant Tools environment.

In this Spyral notebook, we use corpus analytical techniques for the study of a particular poem or corpus of poems, i.e. intra-textually rather than across corpora, in order to facilitate our analysis and description.  It represents one of a number of complementary quantitative and qualitative approaches to the study of literary texts.

The purpose of this exercise is not to represent finished analyses of individual texts or the corpus as a whole, but to hint at exploratory avenues that can be adapted and pursued at the reader's own pace and guided by his or her own specific research questions and interests.


Corpus

We have imported Gray's 49 English poems into Voyant Tools as a specialized corpus (literally, a body of text), i.e. containing a collection of texts of a particular type and a particular language.  Let's load it and give it a more easily readable ID "gray": 

loadCorpus("45ebe1b0483492b4f4dc4279f35b8486").assign("gray");
gray: Object{1} corpusid: "45ebe1b0483492b4f4dc4279f35b8486"

Summary

Now that we have the corpus as an object, we can use any of the Spyral notebook functions to traverse and analyse the corpus.  We can for example ask for a brief summary of our corpus, a list of the constituent texts, or even an initial visual representation as a wordcloud in which the relative size of terms will express frequency in the document.

gray.summary()
"This corpus (45ebe1b0483492b4f4dc4279f35b8486) has 49 documents with 16,575 total words and 4,594 unique word forms."
gray.tool('Documents', {width:"100%"})

You can jump back into the Archive's version of the poems by simply adding the 4-letter poem code to this URL: 

https://www.thomasgray.org/texts/poems/{code}, e.g. https://www.thomasgray.org/texts/poems/elcc for the "Elegy Written in a Country Churchyard".
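In plain JavaScript, the pattern can be expressed as follows (the `poemUrl` helper is hypothetical, not part of Spyral or the Archive):

```javascript
// Hypothetical helper (not part of Spyral): build an Archive URL
// from a 4-letter poem code.
function poemUrl(code) {
  return "https://www.thomasgray.org/texts/poems/" + code;
}

poemUrl("elcc"); // → "https://www.thomasgray.org/texts/poems/elcc"
```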

gray.tool('Cirrus', {width:"100%",height:"500px"})

Reading

Next, let's explore the corpus by calling up the Reader tool, which helps us navigate through the 49 constituent poems of the corpus (using the panel at the bottom), and lets us (as the name of the tool indicates) read them.  Conveniently, it also allows us to highlight any of the words in the corpus.  This is a first exploratory avenue into a text and into the corpus as a whole.

Whole corpus

To launch the Reader tool for the entire corpus, we can simply use the assigned corpus ID:

gray.tool('Reader', {width:"100%",height:"650px"})

It is always a good idea to investigate the available options in each of the tools offered in Spyral.  You can find a tool's configuration options in the top right corner, and any modification options to the queries or display options at the bottom.  In the above Reader tool try out the Highlight Entities option at the bottom right of the tool.

Individual poems

If at any point you want to zero in on a particular poem (or group of poems), you can simply select them from the corpus via the Catalogue tool.

gray.tool( "Catalogue", {height:"650px",width:"100%"})

The Microsearch tool is a useful starting point for a first impression of the distribution of terms across the individual poems of a corpus.  Each document in the corpus is represented as a vertical block, where the height of the block indicates the relative size of the document compared to others in the corpus.  Occurrences of search terms are shown as red blocks (the brightness of the red further indicates the relative frequency within the corpus).  Multiple search terms are collapsed together.  Let us try it for instances of (direct) address in Gray's poems:

gray.tool("Microsearch", {query:"you,ye,thou,thee", width:"100%",height:"300px"} )
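Conceptually, the display reduces each match to a relative position along the document.  A toy sketch in plain JavaScript (illustrative only, not the Microsearch implementation, on a made-up token list):

```javascript
// Toy sketch (not the Microsearch implementation): reduce each match of a
// query term to its relative position (0..1) within the document, which is
// what the red blocks along each vertical bar encode.
function matchPositions(tokens, queryTerms) {
  const query = new Set(queryTerms.map(t => t.toLowerCase()));
  return tokens
    .map((token, i) => ({ term: token.toLowerCase(), pos: i / tokens.length }))
    .filter(entry => query.has(entry.term))
    .map(entry => entry.pos);
}

const tokens = "and thou who mindful of the unhonoured dead".split(" ");
matchPositions(tokens, ["you", "ye", "thou", "thee"]); // → [0.125]
```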

Analyzing

Analysing a corpus computationally can highlight trends, patterns, and usages in ways that would be more difficult or virtually impossible to perceive as a human reader and can thus enhance the analysis of literary discourse.  A small corpus like Gray's English poems has the advantage of allowing for a much closer link between the corpus and the contexts in which the constituent documents were produced and highlighting patterns of language use specific to Gray.  That said, you will gain more from the observations provided by these tools by having a solid understanding of the type and language of the texts in the corpus (thus partially mitigating the loss of social and discursive context in many of the tools).  If you are not familiar with Gray's poems, we would strongly encourage you to read some of his verse before progressing: Gray is not only an exceptionally versatile poet but also, on occasion and contrary to his reputation, very funny.


Topics

One approach into a corpus of possibly less-familiar texts is via a topic model.  As a method, topic modelling treats the poems as bags of words.  Its main strength is to trace thematic clusters in a corpus through the co-occurrence of words in these bags.  The Topics tool provides a rudimentary way of generating term clusters from a document or corpus and then seeing how each topic (term cluster) is distributed across the document or corpus.  In Spyral's topic modelling implementation, you can browse the topics that were identified on the left and the dominant topics by poem (here a score is given to each topic) on the right.  Feel free to adapt the number of terms displayed in each topic as well as the overall number of topics computed in the tool options below.

gray.tool("Topics", {topics:32,width:"100%",height:"500px"} )
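The bag-of-words view that topic modelling starts from can be illustrated in a few lines of plain JavaScript (toy code, not the Topics tool):

```javascript
// Toy illustration (not the Topics tool): the "bag of words" view discards
// word order and keeps only a count per word form.
function bagOfWords(text) {
  const counts = {};
  for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
    counts[word] = (counts[word] || 0) + 1;
  }
  return counts;
}

bagOfWords("The curfew tolls the knell of parting day");
// → { the: 2, curfew: 1, tolls: 1, knell: 1, of: 1, parting: 1, day: 1 }
```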

Frequency

In the initial stages of a quantitative exploration of texts and corpora, frequency plays an important role.  Frequency is most useful when comparing corpora, but it can also reveal interesting aspects of vocabulary use, register, or subject matter.  By default, Spyral's CorpusTerms list contains only lexical words; functional (or grammar) words are filtered out by a stop words list.  You can change this behaviour in the settings at the top right of the tool: select "Stopwords: None" to get a complete list of words.

gray.tool('CorpusTerms', {width:"100%"})

In a small corpus like Gray's English poems, the number of hapaxes (words that occur only once) and the ratio of types (different words or vocabulary) to tokens (total number of words) are comparatively higher than they would be in a bigger corpus.  Basic calculations like these can offer a starting point for more sophisticated analyses that combine quantitative with qualitative approaches.  In poetic texts, repetition, for example, constitutes a frequently employed rhetorical device and needs to be appraised in this context.
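These basic counts can be sketched in plain JavaScript (toy code, not Spyral's own calculations):

```javascript
// Toy sketch (not Spyral API): tokens, types, hapaxes, and type-token ratio.
function vocabularyStats(text) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  const counts = new Map();
  for (const token of tokens) counts.set(token, (counts.get(token) || 0) + 1);
  const hapaxes = [...counts.values()].filter(n => n === 1).length;
  return {
    tokens: tokens.length,
    types: counts.size,
    hapaxes,
    ttr: counts.size / tokens.length,
  };
}

vocabularyStats("Full many a flower is born to blush unseen");
// every word occurs once here, so types = tokens = hapaxes = 9 and ttr = 1
```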

Frequency wordlists can easily be scanned for interesting phenomena and, in combination with concordance searches, help to contextualize individual instances in their contexts of use.


Most tools can be used on either the corpus as a whole or any selection of the constituent poems.  Simply use the document number (zero-based) to specify the poem you want to focus on.  E.g. to see the DocumentTerms for just the "Elegy", you can use:

gray.tool('DocumentTerms', {docIndex:10,stopList:"None",width:"100%"} )

Concordance

Another fruitful way of exploring a document or a corpus is via a concordancing tool, which allows us to zero in on the instances of use of individual lexical items or phrases in a text or corpus.  This kind of analysis, which provides us with co-occurrence information and is often referred to as phraseology, allows us to detect any regularities or lexical and grammatical patterns (within the limitations of a small specialized corpus like ours).  Below is a Key Word In Context (KWIC) concordance (context here in the sense of the immediately surrounding text to the left and right of the search term) for the word-forms starting with "care".

gray.tool('Contexts', {query:"care*",stopList:"None",width:"100%"})

The Contexts tool has a slider to adjust the amount of "context" that is shown for each occurrence, though we are still limited by the space available in the rows. If we want a true sense of the word's context, we can click on the plus sign in the left-most column, which will expand that particular occurrence.

Feel free to experiment with different words: simply enter them in the input field above.  Notice that KWIC is implemented here as a stream-based rather than line-based concordance.  For many of the tools in this notebook, there will be a choice of more specialized applications that offer many advanced features, e.g. the AntConc corpus analysis toolkit for concordancing and text analysis.
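A bare-bones KWIC view can be sketched in a few lines of plain JavaScript (toy code on a made-up line, not the Contexts tool):

```javascript
// Toy KWIC sketch (not the Contexts tool): find tokens matching a prefix,
// such as "care" for "care", "cares", "careless", and keep n words of
// context on either side of each match.
function kwic(tokens, prefix, n) {
  const rows = [];
  tokens.forEach((token, i) => {
    if (token.toLowerCase().startsWith(prefix)) {
      rows.push({
        left: tokens.slice(Math.max(0, i - n), i).join(" "),
        term: token,
        right: tokens.slice(i + 1, i + 1 + n).join(" "),
      });
    }
  });
  return rows;
}

const words = "what cares he for the storm".split(" ");
kwic(words, "care", 2); // → [{ left: "what", term: "cares", right: "he for" }]
```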


Word trees are another wonderful way of exploring the use of key words in different phrases in a corpus.  The WordTree tool provides an interactive way to give a more compact view of words in their context of use than a traditional concordance (KWIC) does.  You can traverse the branches of the tree on either side of a keyword you choose and apply any options to limit the number of concordance entries that are fetched, the number of branches that are shown on each side of the keyword, or the length of the context retrieved for branches.

gray.tool('WordTree', {query:"eyes",stopList:"None",width:"100%"})

Collocation

The calculation of collocation, or the statistical tendency of words to co-occur, is a useful complement to concordances: both reveal similar information, but collocation has the advantage of being a statistical operation, which can aid and focus the human observation.  Spyral's CorpusCollocates tool outputs a table view of which terms appear more frequently in proximity to keywords across the entire corpus:

gray.tool('CorpusCollocates', {width:"100%"});
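The underlying counting is simple to sketch in plain JavaScript (toy code, not the CorpusCollocates implementation):

```javascript
// Toy sketch (not CorpusCollocates): tally every term that appears within
// `window` tokens of an occurrence of the keyword.
function collocates(tokens, keyword, window) {
  const counts = {};
  tokens.forEach((token, i) => {
    if (token.toLowerCase() !== keyword) return;
    const lo = Math.max(0, i - window);
    const hi = Math.min(tokens.length, i + window + 1);
    for (let j = lo; j < hi; j++) {
      if (j === i) continue;
      const neighbour = tokens[j].toLowerCase();
      counts[neighbour] = (counts[neighbour] || 0) + 1;
    }
  });
  return counts;
}

const verse = "the curfew tolls the knell of parting day".split(" ");
collocates(verse, "the", 1); // → { curfew: 1, tolls: 1, knell: 1 }
```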

Some of these collocates are instances of n-grams, i.e. uninterrupted sequences of words that tend to co-occur and that are syntactically and semantically meaningful.  Longer phrases, often less significant as semantic units but important as discourse indicators that can span syntactic phrases, are referred to as lexical bundles.
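The extraction of n-grams is straightforward to sketch in plain JavaScript (toy code, not a Spyral function):

```javascript
// Toy sketch: extract all n-grams, i.e. every uninterrupted sequence
// of n tokens, from a token list.
function ngrams(tokens, n) {
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(" "));
  }
  return out;
}

ngrams("the paths of glory".split(" "), 2);
// → ["the paths", "paths of", "of glory"]
```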


The CollocatesGraph tool represents keywords and terms that occur in close proximity as a force-directed network graph.  As such, it represents a more suggestive way into a corpus, but requires greater scrutiny of individual occurrences to ensure they are meaningful for analytical purposes.  You can fetch more collocates by increasing the value of the "Context" slider.


gray.tool('CollocatesGraph', {width:"100%",height:"400px"});

Trends

Beyond term frequencies and collocates, it can be interesting to explore how evenly terms are distributed in a corpus or in a document – is a given term more present at the beginning, middle or end? As with term frequencies, we have access to a convenient visualization tool for term distributions with the Trends tool.  We can apply Trends to the corpus, in which case the term distributions across the whole corpus are shown, or to a single document (e.g. the "Elegy", see below), which will show term distributions within that document.


gray.tool( "Trends", {query:"eye*, see, watch*, look*",width:"100%",height:"400px"})
gray.tool( "Trends", {docIndex:10,width:"100%",height:"400px"})

By default, Trends will show the distribution of the top 5 terms in the "Elegy".  In single document mode, we see the raw frequencies for 10 equal segments of text: Spyral divides the text into 10 equal parts and then counts frequencies in each part.  We can use the "Segments" slider to get a finer-grained look at distribution, dividing the document into 20 segments, for instance.
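The segmentation can be sketched in plain JavaScript (toy code, not the Trends implementation):

```javascript
// Toy sketch (not the Trends tool): divide a token stream into equal
// segments and count the occurrences of a term in each segment.
function termBySegment(tokens, term, segments) {
  const counts = new Array(segments).fill(0);
  tokens.forEach((token, i) => {
    const seg = Math.min(segments - 1, Math.floor((i / tokens.length) * segments));
    if (token.toLowerCase() === term) counts[seg] += 1;
  });
  return counts;
}

termBySegment("a b a c a b a d a e".split(" "), "a", 2); // → [3, 2]
```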

Another useful visualization for distribution is TermsRadio since it shows more terms than the Trends tool.  The TermsRadio is intended to scroll across the distributions, but we can request to see all of the bins at once using the "bins" and "visibleBins" parameters.


gray.tool('TermsRadio', {bins:10,visibleBins:10,height:'600px',width:'100%'});

Next Steps

There are many more tools and visualizations to explore in the complete list of tools offered in Spyral and we cannot demonstrate all of them in this application.  However, as we continue to apply Spyral notebooks to different corpora, we hope to be able to demonstrate the real potential of the integrated notebook environment for analytical and interpretative processes.  Please feel free to fork and adapt this notebook, which you can find in the Spyral catalogue of available notebooks.


Outlook

We are considering several avenues to expand and deepen our engagement with Spyral notebooks as an analytical tool:

  1. Gray's oeuvre: expansion of the notebook to include Gray's non-English poems (Latin and Greek), his prose, and his correspondence, and adapting/adding new analytical tools tailored to different genres;
  2. New corpora: expanding the notebook to include works by other (mid-)eighteenth-century poets, e.g. from the Eighteenth-Century Poetry Archive;
  3. Parallel corpora: using the 'Gray's "Elegy" in translation'-project as a starting point to study apparent translation equivalents in two or more languages as well as changes over time;
  4. Advanced analysis: implementing bespoke analytical tools in JavaScript, Spyral's scripting language.


Feedback

Please do not hesitate to send any thoughts, corrections, or ideas for improvement or expansion to the editor <huber@thomasgray.org>.


Select Bibliography

O'Keeffe, Anne, and Michael McCarthy, eds. The Routledge Handbook of Corpus Linguistics. London: Routledge, 2010.

Sinclair, John. Corpus, Concordance, Collocation. Oxford: Oxford University Press, 1991.

Hunston, Susan. Corpora in Applied Linguistics. Cambridge: Cambridge University Press, 2002.

Adolphs, Svenja. Introducing Electronic Text Analysis: A Practical Guide for Language and Literary Studies. London: Routledge, 2006.

Sinclair, Stéfan, and Geoffrey Rockwell. The Art of Literary Text Analysis with Spyral Notebooks. www.voyant-tools.org, 2019-2021.