A network analysis of the texts in books of hours

The research project aims to answer the following questions:

Which texts co-occur in books?

Collect all texts they share (through an analysis of the paths; betweenness centrality)

Which texts are unique to specific books?

Are there differences per century?

Manuscripts that occur together: shortest path of 3. Count the number of paths. Create a Gephi file. Do this for the total and per century.

Visualisation in a heatmap

Identify all manuscripts that occur only once

Creating the data set

To address these questions, we first select the data we work with. The data have all been exported from the BNM-i, a database constructed by the Huygens ING. The texts have all been saved as separate JSON files; 41902 files have been downloaded in total. The code below selects the texts which have been assigned a category containing the words 'getijden' or 'gebeden'. This is the case for 5437 texts.
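As a minimal sketch of such a selection: the function below keeps the records whose categories mention 'getijden' or 'gebeden'. The field name `categories` and the folder layout are assumptions made for illustration; the actual BNM-i export may structure this information differently.

```python
import json
from pathlib import Path

KEYWORDS = ("getijden", "gebeden")


def has_relevant_category(record, keywords=KEYWORDS):
    """Return True if any category of the record contains one of the keywords."""
    # 'categories' is a hypothetical field name; adapt it to the actual export.
    categories = record.get("categories", [])
    return any(k in c.lower() for c in categories for k in keywords)


def select_texts(folder):
    """Load every JSON file in the folder and keep the relevant records."""
    selected = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        if has_relevant_category(record):
            selected.append(record)
    return selected


# Small in-memory check with made-up records.
sample_hit = {"categories": ["Getijdenboek", "Kalender"]}
sample_miss = {"categories": ["Kroniek"]}
print(has_relevant_category(sample_hit), has_relevant_category(sample_miss))
```

Matching on lower-cased substrings means that compound category names are also caught, which may or may not be the intended behaviour.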

A further selection then takes place: we focus exclusively on the texts that have been assigned a standardised title.

The code below iterates over all the texts and determines the carriers (the books) in which these texts occur.

To be able to perform network analysis, we create an edges file and a nodes file. The nodes in this multimodal network are the texts and the books these texts are in. Note that the edges are directed: they represent the notion that a text occurs in a book.

The collections of nodes are represented as Pandas data frames.
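The construction of the two tables might look roughly as follows. The (text, book) pairs are made-up placeholders, and the column names `Source`, `Target`, `Id` and `Type` are assumptions chosen because Gephi recognises them; they are not necessarily the names used in the project.

```python
import pandas as pd

# Hypothetical (text title, book identifier) pairs.
occurrences = [
    ("Text A", "Book 1"),
    ("Text A", "Book 2"),
    ("Text B", "Book 1"),
    ("Text C", "Book 3"),
]

# One edge per text-in-book occurrence, pointing from the text to the book.
edges = pd.DataFrame(occurrences, columns=["Source", "Target"]).drop_duplicates()

# One node table per node type, combined into a single nodes table.
text_nodes = pd.DataFrame({"Id": sorted(edges["Source"].unique()), "Type": "text"})
book_nodes = pd.DataFrame({"Id": sorted(edges["Target"].unique()), "Type": "book"})
nodes = pd.concat([text_nodes, book_nodes], ignore_index=True)

print(nodes)
print(edges)
```

Keeping a `Type` column on the nodes makes it straightforward to colour texts and books differently in a later visualisation.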

Network analysis

Now that we have all the nodes and the edges, we are ready to perform the network analysis. We first create a network of all the nodes. Texts are shown in orange, and the books are shown in blue. The visualisation reveals that a number of texts appear in many different books. This is the case for 'Teksten op naam van Bernard Clairvaux' and 'Teksten op naam van Augustinus'.

We can analyse the networks in Python using the networkx package.

We want to establish the texts that co-occur in a book.
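One way to derive such co-occurrences is to project the text-book network onto the texts, so that two texts are linked whenever they share at least one book. The sketch below uses the bipartite projection in networkx on made-up data; it treats the network as undirected, which is an assumption made for the projection and differs from the directed edges described earlier.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (text, book) occurrences.
occurrences = [
    ("Text A", "Book 1"), ("Text A", "Book 2"),
    ("Text B", "Book 1"), ("Text B", "Book 2"),
    ("Text C", "Book 2"),
    ("Text D", "Book 3"),
]

texts = {text for text, _ in occurrences}
books = {book for _, book in occurrences}

# Build the two-mode graph: texts on one side, books on the other.
B = nx.Graph()
B.add_nodes_from(texts, bipartite=0)
B.add_nodes_from(books, bipartite=1)
B.add_edges_from(occurrences)

# Project onto the texts; the edge weight counts the books two texts share.
co = bipartite.weighted_projected_graph(B, texts)

for u, v, data in sorted(co.edges(data=True), key=lambda e: sorted(e[:2])):
    print(sorted((u, v)), data["weight"])
```

Texts that share no book with any other text, such as 'Text D' here, remain in the projected graph as isolated nodes.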

This network can be plotted. The visualisation displays all the texts that co-occur in one or more books. It looks as if there are a number of 'cliques' consisting of texts that appear together.

The information about the intensity of the co-occurrences (i.e. how often do two different texts co-occur?) can be visualised by varying the thickness of the edges. Such a visualisation can also be created in Gephi. The network should be imported as an undirected graph.

The cell below generates the CSV files that can be used for this purpose.
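A minimal sketch of such an export is given here. It assumes a weighted co-occurrence graph like the one described above, and it relies on the column names `Id`, `Label`, `Source`, `Target`, `Weight` and `Type` that Gephi's spreadsheet import recognises. The helper name `gephi_tables`, the file names and the temporary output folder are illustrative choices only.

```python
import tempfile
from pathlib import Path

import networkx as nx
import pandas as pd


def gephi_tables(graph):
    """Return a nodes table and an edges table for an undirected graph."""
    nodes = pd.DataFrame({"Id": list(graph.nodes), "Label": list(graph.nodes)})
    edges = pd.DataFrame(
        [(u, v, d.get("weight", 1), "Undirected") for u, v, d in graph.edges(data=True)],
        columns=["Source", "Target", "Weight", "Type"],
    )
    return nodes, edges


# Hypothetical co-occurrence graph.
co = nx.Graph()
co.add_edge("Text A", "Text B", weight=2)
co.add_edge("Text A", "Text C", weight=1)

nodes, edges = gephi_tables(co)

# Written to a temporary folder so the sketch runs anywhere;
# in practice, point out_dir to the project folder.
out_dir = Path(tempfile.mkdtemp())
nodes.to_csv(out_dir / "nodes.csv", index=False)
edges.to_csv(out_dir / "edges.csv", index=False)
print(out_dir)
```

In Gephi, the `Weight` column can then be used to scale the thickness of the edges.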

The cell below identifies the texts that occur most frequently with other texts.
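One possible reading of 'most frequently' is sketched below: texts are ranked first by the number of distinct texts they co-occur with (degree) and then by the total number of co-occurrences (weighted degree). The graph is made up, and this ranking is an assumption rather than the measure used in the project.

```python
import networkx as nx

# Hypothetical weighted co-occurrence graph.
co = nx.Graph()
co.add_weighted_edges_from([
    ("Text A", "Text B", 2),
    ("Text A", "Text C", 1),
    ("Text A", "Text D", 1),
    ("Text B", "Text C", 1),
])

# Number of distinct co-occurring texts, and summed co-occurrence counts.
partners = dict(co.degree())
strength = dict(co.degree(weight="weight"))

# Rank by partners, then by strength, then alphabetically for stable output.
ranking = sorted(partners, key=lambda t: (-partners[t], -strength[t], t))

for text in ranking:
    print(text, partners[text], strength[text])
```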

The following texts are unique in the network.

Which books do these texts appear in?
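Both steps can be sketched with Pandas, assuming that 'unique' means a text that occurs in exactly one book and that the edges table has a `Source` column for texts and a `Target` column for books. Both the data and the column names are placeholders.

```python
import pandas as pd

# Hypothetical (text, book) occurrences.
edges = pd.DataFrame(
    [
        ("Text A", "Book 1"),
        ("Text A", "Book 2"),
        ("Text B", "Book 1"),
        ("Text C", "Book 3"),
    ],
    columns=["Source", "Target"],
).drop_duplicates()

# Count the distinct books per text and keep the texts found in only one.
books_per_text = edges.groupby("Source")["Target"].nunique()
unique_texts = books_per_text[books_per_text == 1].index

# Look up the single book each of these texts appears in.
unique_in = (
    edges[edges["Source"].isin(unique_texts)]
    .set_index("Source")["Target"]
    .to_dict()
)
print(unique_in)
```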

Some other analyses