Booktrade Correspondence Project:

Guidelines for the Transcription and Encoding of Primary Sources


 


Procedure:

1. Transcribe the text of the source as meticulously as possible (see General remarks about the transcription below).
2. Proofread the completed transcription against the source very carefully.
3. Provide editorial annotations (see Editorial annotations below).
4. Create a summary of the text.
5. Encode the transcription, using the instructions below for the TEI Header and the text of the letters. Note that these contain merely the briefest of abstracts, and only those most germane to the project, of the comprehensive instructions detailed in the fifth edition of the guidelines of the Text Encoding Initiative ("P5"). The following two chapters of the TEI Guidelines are the most directly relevant to the project: 11. Representation of Primary Sources (especially from 11.3 onwards), and 13. Names, Dates, People, and Places.
6. Save the file using the file naming conventions below.
 

A. General remarks about the encoded transcription

  • Please be as precise as you can: the result of your work will be used for research purposes.
  • Words are joined or hyphenated as they are found in the source. Punctuation is left as found, except that sentences always end with a full stop, even if you would have to insert it in a <supplied> element, or if it looks a little like a dash or comma.
  • If you need to use non-keyboard characters, please use the Unicode character charts. For example, "à" ("Latin small letter a with grave") is rendered as "&#xE0;", "é" ("Latin small letter e with acute") as "&#xE9;", and "ß" ("Latin small letter sharp s", or "Ringel-S" or "sz") as "&#xDF;".
  • Remove unused elements in the <body> of the TEI file, but not from the <teiHeader>.
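
As an illustration of the points above on sentence-final full stops and non-keyboard characters, a line might be encoded as in the following sketch. The sentence and the values of @reason and @cert are invented for the purpose of the example.

```xml
<!-- Hypothetical sentence: the source has "café" and no final full stop -->
<p>We met him at the caf&#xE9;<supplied reason="no full stop in source" cert="high">.</supplied></p>
```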

B. The encoding

NB: by convention, in what follows, attribute names are preceded by the @-sign.

The TEI Header

The file description (fileDesc)

In the title element within the titleStmt in fileDesc, provide the title of the document in the following form:

[letter, draft letter, form letter (=Dutch 'circulaire'), postcard, picture postcard, prospectus, note, or telegram] from [Sender] to [Recipient], [date in English in the form dd Month yyyy]: a machine readable transcription

Personal names of senders and recipients are written using the following form: Firstnames, or initial(s) with full stop(s) not separated from each other by a space, followed by a space, followed by the surname.
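
By way of illustration, the title of a letter from K. Fuhri to A.C. Kruseman dated 18 November 1850 would look roughly as follows. This is a sketch only: the titleStmt in the project template may contain further elements.

```xml
<titleStmt>
  <title>Letter from K. Fuhri to A.C. Kruseman, 18 November 1850: a machine readable transcription</title>
</titleStmt>
```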

Each letter that is transcribed for the Booktrade Correspondence Project must be given an ID. This file ID is created by combining the identifier of the sender of the letter with the letter's date, with an underscore in between the identifier and the date. If possible, use a WikiData identifier for the sender of the letter (see below, under "Names"). The letter's date needs to be given in the form YYYYMMDD.

Record the file ID of the TEI File in idno, under <publicationStmt>. The file ID must also be captured in the @xml:id attribute of the TEI root element.

Example:
Sender: William Blackwood
Date: 7 Sept. 1892
File ID: WBLA_18920907
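
The two places where the file ID is recorded can be sketched as follows, using the example above. The fragment is abbreviated, and the publisher element is an assumption rather than part of these guidelines: TEI P5 normally expects a publisher, distributor or authority before an idno in publicationStmt, so follow whatever the project template provides there.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="WBLA_18920907">
  <teiHeader>
    <fileDesc>
      <!-- titleStmt omitted in this sketch -->
      <publicationStmt>
        <!-- Assumed placeholder; keep the value given in the project template -->
        <publisher>Booktrade Correspondence Project</publisher>
        <idno>WBLA_18920907</idno>
      </publicationStmt>
      <!-- sourceDesc omitted in this sketch -->
    </fileDesc>
  </teiHeader>
  <!-- text omitted in this sketch -->
</TEI>
```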

When a correspondent has sent more than one letter on the same day, use letters to distinguish the files. Examples: EFBO_18760613a.xml and EFBO_18760613b.xml were written by De Erven F. Bohn on the same day.

If you have difficulties creating a file ID using these instructions, please send an email to the BCP team at bdms.staff@gmail.com.

Note that the file ID must also be used as the name of the file! The file name is simply the file ID, combined with the .xml extension.

Example:
File ID: WBLA_18920907
File name: WBLA_18920907.xml

The source description (sourceDesc)

The title element within sourceDesc contains a bibliographic description of the document that has been transcribed, in the following form:

[letter, draft letter, form letter (=Dutch 'circulaire'), postcard, picture postcard, prospectus, note, or telegram] from [Sender] to [Recipient], [date in English in the form dd Month yyyy]

This strongly resembles the title element within fileDesc. One difference is that it omits the phrase 'a machine readable transcription'. A second difference is that it encodes the names of the correspondents using the orgName or the persName element, and the date using a date element with a @when attribute with the value "yyyy-mm-dd". Be careful to avoid leading or trailing spaces inside persName and orgName elements: they may be part of the sentence, but not of the name.

<title>Letter from <persName key="KFUR">K. Fuhri</persName> to <persName key="AKRUR">A.C. Kruseman</persName>, <date when="1850-11-18">18 November 1850</date>.</title>

In the source description, the names of persons and organisations need to be identified using existing Wikidata or BTC codes, using the @key attribute. See below, under "Names".

In an idno element under sourceDesc, enter the shelfmark of the source. Examples:

<idno type="callNo">UBA Fu 8-25</idno>
<idno type="callNo">UBL Ltk 1795 8, nr. 1</idno>
<idno type="callNo">UBL BOH C17, fol. 231</idno>

The text

Front

Under front, include a short paragraph summing up the contents of the letter in a div type="summary". The summary must be in English.
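
A minimal sketch of such a summary is given here. The wording of the summary is invented for illustration only.

```xml
<front>
  <div type="summary">
    <!-- Invented summary text, for illustration only -->
    <p>The sender confirms receipt of the proof sheets from London and asks for a further 25 copies.</p>
  </div>
</front>
```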

Body

The opener

The opener contains all information about the recipient (name, address), and the information about when the letter was sent. Note that the recipient's name goes within a <persName> or <orgName> element within an <addrLine> element. The opener does not include the sender's name and address details (though the sender's placeName is part of the dateline). These go in the closer. The following example is of a letter sent by an Amsterdam bookseller to Sijthoff publishers in Leiden:

<dateline>
<placeName>Amsterdam</placeName>
<date when="1883-01-19">19 Januari 1883</date>
</dateline>
<address>
<addrLine><persName>J. La Bree</persName></addrLine>
<addrLine>c/o <orgName key="ASYT">Sijthoff</orgName></addrLine>
<addrLine>Doezastraat 1, 3 &amp; 5</addrLine>
<addrLine>Leiden</addrLine>
</address>
<salute>Mijne Heeren</salute>

If the letter contains a preprinted letterhead, the text printed on the letterhead is indicated by means of {} surrounding the text in the appropriate elements. Example:

<dateline>
<placeName>{'s Gravenhage}</placeName>
<date when="1848-01-16">16 Januarij {184}8</date>
</dateline>

The opener of the letter may also contain a line indicating the general subject of the letter, or other text fragments which cannot easily be described using the categories listed in this section. Such miscellaneous text fragments can be marked up using a general seg element. Nevertheless, try to supply a @type attribute which indicates what type of information we are dealing with. You may supply 'subject' as a value.
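
A subject line in the opener might be encoded as in the following sketch. The dateline and salute are taken from the example earlier in this section; the subject line itself is invented.

```xml
<opener>
  <dateline>
    <placeName>Amsterdam</placeName>
    <date when="1883-01-19">19 Januari 1883</date>
  </dateline>
  <!-- Invented subject line, for illustration only -->
  <seg type="subject">Betreft: proefvellen</seg>
  <salute>Mijne Heeren</salute>
</opener>
```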

The closer

The closer contains information about the sender (name and address). The name of the sender must be encoded in a signed element. If the letter has been signed using a name which you can read, it ought to be transcribed and encoded in the TEI file. If the signature is illegible, it can be disregarded. If the signature represents the name of the sender in an incomplete manner, it is best to standardise this name using <choice>, <orig> and <reg>.
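
As a sketch of such a standardised signature, assume a hypothetical letter that is signed only "Bohn":

```xml
<!-- Hypothetical case: the letter is signed only "Bohn" -->
<signed><choice><orig>Bohn</orig><reg>De Erven F. Bohn</reg></choice></signed>
```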

Use address to encode the address of the sender if present in the source.

If a closing salute consists of two parts, use two salute elements. Examples:

<salute>Na vriendelijke groeten</salute>
<salute>Uw vriend</salute>

Abbreviated honorifics such as "ZEd", "Zed.", "Ued", "Yrs" are expanded using the abbr and expan elements (see below).
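
For instance, a closing salute reading "Yrs truly" could be encoded roughly as follows. The salute is an invented example; the expansion is wrapped in choice, with abbr for the form in the source and expan for the full form.

```xml
<salute><choice><abbr>Yrs</abbr><expan>Yours</expan></choice> truly</salute>
```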

The name and address of the sender may or may not be present in the form of a preprinted letterhead. In the following example the letterhead only contains the address, but not the name:

<signed>R. de Tracy Gould</signed>
<address><addrLine>{4, Garden Court,}</addrLine>
<addrLine>{Temple,}</addrLine>
<addrLine>{London. E.C.}</addrLine>
</address>

If a letter contains a postscript, it is encoded using a seg element. This element takes a @type attribute, with the value "postscript":

<closer>
<salute>your obedient servants</salute>
<signed>de Erven F. Bohn</signed>
<seg type="postscript">A sending by post of the sheets immediately after been printed would be much better to <lb/> us. </seg>
</closer>

Page breaks and line breaks

Line and page breaks in the source document are indicated using the (empty) <lb/> and <pb/> elements. They are preceded by a space, except where a space would not normally occur if the <lb/> was not present, e.g. in case of a hyphenated word. Note that no <lb/> is used immediately before the closing </p>, </addrLine>, </salute> tags, etc., as these already indicate that a line ends there.
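
The following sketch shows these rules applied to an invented paragraph. The first line break follows a space, the second falls inside a hyphenated word and so has no space before it, and there is no <lb/> before the closing </p>.

```xml
<!-- Invented text; in the source "immediately" is hyphenated at the end of a line -->
<p>We have received the proof sheets <lb/>and shall return them im-<lb/>mediately after correction <pb/>together with the title page.</p>
```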

Book titles

The titles of all books, journals or articles must be encoded using the <title> element. If, in the original letter, the title is surrounded by quotes, these quotes are not transcribed.
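
As a short sketch, suppose a source reads: the 125 copies of the "Tijdschrift". The quotation marks in this hypothetical reading are dropped and the title is tagged:

```xml
... the 125 copies of the <title>Tijdschrift</title> ...
```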

Tables

Information that is presented in tabular form must be encoded using the <table> element. Rows are added using the <row> element, and cells on these rows may be added using <cell>. No attributes are needed for these latter two elements.
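
A small table might be encoded as in the sketch here. The quantities and prices are invented for illustration only.

```xml
<!-- Invented figures, for illustration only -->
<table>
  <row>
    <cell>25 ex.</cell>
    <cell>&#x192; 12.50</cell>
  </row>
  <row>
    <cell>50 ex.</cell>
    <cell>&#x192; 25.00</cell>
  </row>
</table>
```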

Omitted and supplied text

When it is impossible to transcribe a certain section of the text as a result of illegible handwriting or damaged text, use the <gap> element. This element must remain empty. The @reason attribute specifies the reason for the omission, e.g. "illegible" or "cancelled". The @extent attribute gives an indication of the number of words or characters that are omitted.

Dear sirs, in reaction to your quote of <gap reason="ink stain" extent="1 word"/> we are forced to ...

If new text has been supplied for any reason, use the supplied element. For example, if the text that is transcribed is a copy of a letter that would have been sent out on a sheet of preprinted letterhead, the preprinted information contained in the letterhead is absent from the copy. In such a case that information is given in a supplied element, as follows:

<dateline>
<placeName><supplied reason="on preprinted letterhead missing from copybook">Haarlem</supplied></placeName>
<date when="1848-01-16">16 Januarij <supplied reason="on preprinted letterhead missing from copybook">184</supplied>8</date>
</dateline>

Unclear text

Use unclear to indicate a section which is difficult to read in the source. The @reason attribute indicates why the source is difficult to transcribe. For instance, the text may be damaged, the ink may be faded, or the author may have used a very unclear handwriting. Example:

Tell <unclear reason="bad handwriting" cert="medium">Harmer</unclear> that he can come tomorrow.

Always use a @reason attribute for unclear, gap and supplied. In addition, unclear and supplied take a @cert attribute to indicate the encoder's degree of certainty that the text given is correct (possible degrees of certainty in attribute value: "high", "medium", "low", "unknown"). The gap takes an @extent attribute (value: any string of letters and numbers).

Additions

Following the initial completion of the source text, other agents, such as librarians, recipients of letters or editors, may have added certain letters, words or sentences. Such added texts can be marked up using the <add> element. Use the @place attribute to explain how the texts have been added. As values, you can use 'above' or 'below', for example. If the source text is a typescript, and if the additions are in handwriting, also use the attribute @type with value 'handwriting'. You may also explain the motivation for the addition in a <note>.
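
A handwritten addition in a typescript letter might be encoded as in this sketch. The sentence is invented.

```xml
<!-- Invented sentence from a typescript letter; "25" was added by hand above the line -->
<p>Please send us <add place="above" type="handwriting">25</add> copies of the prospectus.</p>
```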

Deleted text

If parts of the text have been deleted by means of a strikethrough, the text often continues to be readable. The deleted words can be marked up using the <del> element. In this particular situation, you also need to use a @rend attribute with the value "strikethrough".
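
For example, with an invented sentence in which the writer struck out one word and continued with another:

```xml
<p>We expect the sheets on <del rend="strikethrough">Monday</del> Tuesday next.</p>
```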

Graphically distinct text

Underlining and italics used for stress or emphasis, for linguistic or rhetorical effect, are rendered through the emph element, in combination with the @rend attribute. For example:

... of which he understands <emph rend="underlined">nothing</emph>

The following values can be used with the @rend attribute: "bold", "italic", "bold-italic", "underlined", "double-underlined".

If in the MS a book title is enclosed in quotation marks or underlined, this is not indicated in the encoding. That it is a book title is indicated solely by the <title> tag, which replaces any typographic indication in the source.

Correction, abbreviation, regularisation

Book titles are often shortened or otherwise referred to rather impressionistically. In such cases the written title is given in an orig element and the full title is given in a reg element. These two elements are together enclosed in turn within a choice element. Example:

<choice>
<orig>Haafner's reize</orig>
<reg>J. Haafner's reize te voet door Ceylon</reg>
</choice>

Abbreviations are expanded as follows:

Your <choice><abbr>obdt.</abbr><expan>obedient</expan></choice> servant

Uw <choice><abbr>dw.</abbr><expan>dienstwillige</expan></choice> dienaar

Misspellings (which occur frequently in letters not written by native speakers, as in the letters by Dutch publishers with their foreign correspondents) are corrected using the choice, sic and corr elements:

It would be very <choice><sic>agreable</sic><corr>agreeable</corr></choice> to us ...

Unusual spellings that were current at the time are maintained without adding <sic>.

Figures

Figures (e.g. money) are always given with a full stop as a decimal marker. Sums of money may be preceded by a currency marker, e.g. the guilder sign for Dutch guilders (ƒ, represented as the character reference &#x192;) or the pound sign for English pounds (£, represented as the character reference &#xA3;).
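
As a sketch, suppose a Dutch source gives a price as "ƒ 2,50", with a comma as the decimal marker. Under this rule the hypothetical sentence would be transcribed with a full stop:

```xml
<!-- Hypothetical source reading: "ƒ 2,50" -->
<p>De prijs bedraagt &#x192; 2.50 per exemplaar.</p>
```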

Foreign words

Use <foreign> to identify a word or phrase as belonging to some language other than that of the surrounding text. Identify the language used by means of the codes of ISO 639-2 (note that 3-letter codes are in lower case; 2-letter codes are in capitals). E.g.:

Wij hebben inmiddels de <foreign xml:lang="eng">proof sheets</foreign> uit Londen ontvangen

Dates

Dates are standardised with the @when attribute, using the ISO 8601 standard, which represents the date in the form YYYY-MM-DD. In the event of uncertainty, digits can be left off from the end. For example, if a letter is known to date from 1854, while the day and month are unknown, the date would be given as "1854".

<date when="1848-01-16">16 Januarij {184}8</date>
<date when="1864-08-06">6 Aug. '64</date>
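
Incomplete dates can be sketched in the same way. In these invented examples only the year, or only the year and month, are known, so the remaining digits are left off the end of the @when value:

```xml
<date when="1854">1854</date>
<date when="1854-03">Maart 1854</date>
```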

Names

Names encountered in the text should always be marked up according to their type. The available elements are persName (for persons), placeName (for geographical names) and orgName (for organisations).

The persons and the organisations referred to in the text also need to be identified. This project uses the identifiers that have been defined within the WikiData project. To find the correct identifier for the person or the organisation that you have found in the letter, follow the steps below.

  • Visit WikiData at https://www.wikidata.org/
  • In the search bar on that website (which is normally labelled "Search Wikidata"), type in the name that you have found
  • If you have found the person or the organisation you were looking for, open the page that lists all the data that is available for this person or this organisation.
  • From the main header on this page, copy the identifier for this named entity. On WikiData, identifiers mostly consist of the letter "Q", followed by a number of digits.

The WikiData identifier must be given in the @key attribute of the <persName> or the <orgName> element, as follows:

<persName key="Q703935">Leonard Woolf</persName>

If you are unable to find the name that is mentioned in the letter, please send an email mentioning the full name to bdms.staff@gmail.com. In this situation, a local identifier will be minted for this name. Please consult the full list of all the existing local identifiers before you request a new identifier.

Honorifics (e.g. academic titles), added names or phrases indicating specific roles or function (e.g. 'Baron' or 'King') are assumed to be part of the names. Names must always be encoded fully, so including those additional words.

<orgName type="recipient" key="EFBO">Messrs de Erven F. Bohn</orgName>

If the body of your letter contains multiple references to the same person or organisation, it is not necessary to add @key attributes with identifiers for each of these references. If this person or organisation has been identified once, this is sufficient.
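
A sketch with invented sentences: the first reference carries the identifier, while a later reference to the same person is still tagged as a name but has no @key.

```xml
<p><persName key="Q703935">Leonard Woolf</persName> has written to us again. ...
We shall answer <persName>Mr Woolf</persName> tomorrow.</p>
```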

 

Editorial annotations

Editorial annotations can be used to provide more contextual information on the text that is transcribed. When deciding what information to annotate, think of what is typically annotated in a printed edition of correspondence. For example, identify bibliographic details of publications mentioned (author, full title, place of publication, publisher, and year), and identify people and organisations.

For your research, use the Leiden library catalogue (and other catalogues depending on the nature of the bibliographic item), Google, Wikipedia, biographical dictionaries and other reference works intelligently. For example: in BOH C107 fol. 1. the author (J.G.C. Schlenker) asks De Erven F. Bohn:

would you be so kind as to send the us the 125 copies of the Tijdschrift, the ones for the exchange journals, to Professor Van Haren Noman [...]

The search for 'tijdschrift bohn' in Leiden University's Catalogue yields 79 results, but Wikipedia has an article on Van Haren Noman, which mentions Tijdschrift two times: 'Tijdschrift der Nederlandse dierkundige vereniging', and 'Ned. Tijds. v. Geneesk.'. The query 'tijdschrift bohn dierkundige' yields no result in the Leiden OPAC, but 'tijdschrift bohn geneeskunde' does. A Google search for 'tijdschrift geneeskunde bohn "Haren Noman"' yields a number of interesting results. So the Tijdschrift in question must be the Nederlands tijdschrift voor geneeskunde.

Include the note element directly after the text the note applies to. The text of the annotation must be given in a p element. A note may contain several paragraphs.

<p> ... the calendering <note><p>Calendering, or satinizing, is a process of treating paper.</p></note>
is carried out by the foreman ...</p>

Written by: Peter Verhaar, Adriaan van der Weel and Andrew Stevens.