Booktrade Correspondence Project:
Guidelines for the Transcription and Encoding of Primary Sources
Procedure:
1. Transcribe the text of the source as meticulously as possible (see General remarks about the transcription below).
2. Proofread the completed transcription against the source very carefully.
3. Provide editorial annotations (see Editorial annotations below).
4. Create a summary of the text.
5. Encode the transcription, using the instructions below for the TEI Header and the text of the letters. Note that these contain merely the briefest of abstracts, and only those most germane to the project, of the comprehensive instructions detailed in the fifth edition of the guidelines of the Text Encoding Initiative ("P5"). The following two chapters of the TEI Guidelines are the most directly relevant to the project: 11. Representation of Primary Sources (especially from 11.3 onwards), and 13. Names, Dates, People, and Places.
6. Save the file using the file naming conventions below.
A. General remarks about the encoded transcription
- Please be as precise as you can: the result of your work will be used for research purposes.
- Words are joined or hyphenated as they are found in the source. Punctuation is left as found, except that sentences always end with a full stop, even if you would have to insert it in a <supplied> element, or if it looks a little like a dash or comma.
- If you need to use non-keyboard characters, please use the Unicode character charts. For example, "à" ("Latin small letter a with grave") is rendered as "&#xE0;", "é" ("Latin small letter e with acute") as "&#xE9;", and "ß" ("Latin small letter sharp s", or "Ringel-S" or "sz") as "&#xDF;".
- Remove unused elements in the <body> of the TEI file, but not from the <teiHeader>.
B. The encoding
NB: by convention, in what follows, attribute names are preceded by the @-sign.
The TEI Header
The file description (fileDesc)
In the <title> element within the <titleStmt> in <fileDesc>, provide the title of the document in the following form:
Personal names of senders and recipients are written using the following form: first names, or initial(s) with full stop(s) not separated from each other by a space, followed by a space, followed by the surname.
Each letter that is transcribed for the Booktrade Correspondence Project must be given an ID. This file ID is created by combining the identifier of the sender of the letter with the letter's date, with an underscore between the identifier and the date. If possible, use a WikiData identifier for the sender of the letter (see below, under "Names"). The letter's date needs to be given in the form YYYYMMDD.
Record the file ID of the TEI file in <idno>, under <publicationStmt>. The file ID must also be captured in the @xml:id attribute of the TEI root element.
Example:
Sender: William Blackwood
Date: 7 Sept. 1892
File ID: WBLA_18920907
When a correspondent has sent more than one letter on the same day, use letters to distinguish the files. Examples: EFBO_18760613a.xml and EFBO_18760613b.xml were written by De Erven F. Bohn on the same day.
If you have difficulties creating a file ID using these instructions, please send an email to the BCP team at bdms.staff@gmail.com.
Note that the file ID must also be used as the name of the file! The file name is simply the file ID, combined with the .xml extension.
Example:
File ID: WBLA_18920907
File name: WBLA_18920907.xml
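As a minimal sketch of where the file ID ends up in the encoded file, assuming the standard TEI namespace declaration on the root element (all other header content is left out here and marked by comments):

```xml
<!-- File name: WBLA_18920907.xml -->
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="WBLA_18920907">
  <teiHeader>
    <fileDesc>
      <!-- titleStmt omitted in this sketch -->
      <publicationStmt>
        <!-- other publication details omitted in this sketch -->
        <idno>WBLA_18920907</idno>
      </publicationStmt>
      <!-- sourceDesc omitted in this sketch -->
    </fileDesc>
  </teiHeader>
  <!-- text omitted in this sketch -->
</TEI>
```

The same string thus appears three times: in the file name, in @xml:id, and in <idno>.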
The source description (sourceDesc)
The <title> element within <sourceDesc> contains a bibliographic description of the document that has been transcribed, in the following form:
This strongly resembles the <title> element within <fileDesc>. One difference is that it omits the phrase 'A machine-readable transcription', and a second difference is that it encodes the names of the correspondents using the <orgName> or the <persName> element, and the date using a <date> element with a @when attribute with the value "yyyy-mm-dd". Be careful to avoid leading or trailing spaces inside <persName> and <orgName> elements: they may be part of the sentence, but not of the name.
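A short sketch of the spacing rule, assuming that EFBO is the identifier of De Erven F. Bohn (as the file names above suggest). The surrounding words and the written form of the date are illustrative only:

```xml
<!-- Correct: the spaces belong to the sentence and stay outside the name -->
from <orgName key="EFBO">De Erven F. Bohn</orgName>, <date when="1876-06-13">13 June 1876</date>

<!-- Incorrect: leading and trailing spaces inside the name element -->
from<orgName key="EFBO"> De Erven F. Bohn </orgName>, <date when="1876-06-13">13 June 1876</date>
```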
In the source description, the names of persons and organisations need to be identified using existing Wikidata or BTC codes, using the @key attribute. See below, under "Names".
In the <idno> element under <sourceDesc>, enter the shelfmark of the source.
Examples:
<idno type="callNo">UBL Ltk 1795 8, nr. 1</idno>
<idno type="callNo">UBL BOH C17, fol. 231</idno>
The text
Front
Under <front>, include a short paragraph summing up the contents of the letter in a <div type="summary">. The summary must be in English.
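A minimal sketch of such a summary. The wording of the summary itself is hypothetical and only loosely based on the postscript example further down in these guidelines:

```xml
<front>
  <div type="summary">
    <p>De Erven F. Bohn ask for the printed sheets to be sent by post.</p>
  </div>
</front>
```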
Body
The opener
The opener contains all information about the recipient (name, address), and the information about when the letter was sent. Note that the recipient's name goes within a <persName> or <orgName> element within an <addrLine> element. The opener does not include the sender's name and address details (though the sender's <placeName> is part of the dateline). These go in the closer. The following example is of a letter sent by an Amsterdam bookseller to Sijthoff publishers in Leiden:
<opener>
<dateline>
<placeName>Amsterdam</placeName>
<date when="1883-01-19">19 Januari 1883</date>
</dateline>
<address>
<addrLine><persName>J. La Bree</persName></addrLine>
<addrLine>c/o <orgName key="ASYT">Sijthoff</orgName></addrLine>
<addrLine>Doezastraat 1, 3 &amp; 5</addrLine>
<addrLine>Leiden</addrLine>
</address>
<salute>Mijne Heeren</salute>
</opener>
If the letter contains a preprinted letterhead, the text printed on the letterhead is indicated by means of {} surrounding the text in the appropriate elements. Example:
<dateline>
<placeName>{'s Gravenhage}</placeName>
<date when="1848-01-16">16 Januarij {184}8</date>
</dateline>
The opener of the letter may also contain a line indicating the general subject of the letter, or other text fragments which cannot easily be described using the categories listed in this section. Such miscellaneous text fragments can be marked up using a general <seg> element. Nevertheless, try to supply a @type attribute which indicates what type of information we are dealing with. You may supply 'subject' as a value.
The closer
The closer contains information about the sender (name and address). The name of the sender must be encoded in a <signed> element.
If the letter has been signed using a name which you can read, it ought to be transcribed and encoded in the TEI file. If the signature is illegible, it can be disregarded. If the signature represents the name of the sender in an incomplete manner, it is best to standardise this name using <choice>, <orig> and <reg>.
Use <address> to encode the address of the sender if present in the source.
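A minimal sketch of a standardised signature, assuming a hypothetical letter which William Blackwood signed with an initial only:

```xml
<closer>
  <signed>
    <choice>
      <orig>W. Blackwood</orig>
      <reg>William Blackwood</reg>
    </choice>
  </signed>
</closer>
```

The <orig> element keeps the signature as written, while <reg> carries the full form of the name.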
If a closing salute consists of two parts, use two salute elements. Examples:
<salute>Uw vriend</salute>
Abbreviated honorifics such as "ZEd", "Zed.", "Ued", "Yrs" are expanded using the <abbr> and <expan> elements (see below).
The name and address of the sender may or may not be present in the form of a preprinted letterhead. In the following example the letterhead only contains the address, but not the name:
<address><addrLine>{4, Garden Court,}</addrLine>
<addrLine>{Temple,}</addrLine>
<addrLine>{London. E.C.}</addrLine>
</address>
If a letter contains a postscript, it is encoded using a <seg> element. This element takes a @type attribute, with the value "postscript":
<closer>
<salute>your obedient servants</salute>
<signed>de Erven F. Bohn</signed>
<seg type="postscript">A sending by post of the sheets immediately after been printed would be much better to <lb/> us. </seg>
</closer>
Page breaks and line breaks
Line and page breaks in the source document are indicated using the (empty) <lb/> and <pb/> elements. They are preceded by a space, except where a space would not normally occur if the <lb/> was not present, e.g. in the case of a hyphenated word. Note that no <lb/> is used immediately before the closing </p>, </addrLine>, </salute> tags, etc., as these already indicate that a line ends there.
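A minimal sketch of these spacing rules, using a hypothetical Dutch sentence that runs over three lines in the source:

```xml
<!-- Line 1 ends after "te"; line 2 ends in the middle of "ver-zonden" -->
<p>Wij hebben de eer U mede te <lb/>deelen dat de boeken heden ver-<lb/>zonden zijn.</p>
```

The first <lb/> is preceded by a space, the second is not because it falls inside a hyphenated word, and there is no <lb/> before the closing </p>.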
Book titles
The titles of all books, journals or articles must be encoded using the <title> element. If, in the original letter, the title is surrounded by quotes, these quotes are not transcribed.
Tables
Information that is presented in tabular form must be encoded using the <table> element. Rows are added using the <row> element, and cells on these rows may be added using <cell>. No attributes are needed for these latter two elements.
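A minimal sketch of a two-row table. The quantities, titles and prices are hypothetical:

```xml
<table>
  <row>
    <cell>25 ex.</cell>
    <cell>Eerste deel</cell>
    <cell>12.50</cell>
  </row>
  <row>
    <cell>10 ex.</cell>
    <cell>Tweede deel</cell>
    <cell>5.00</cell>
  </row>
</table>
```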
Omitted and supplied text
When it is impossible to transcribe a certain section of the text as a result of illegible handwriting or damaged text, use the <gap> element. This element must remain empty. The @reason attribute specifies the reason for the omission, e.g. "illegible" or "cancelled". The @extent attribute gives an indication of the number of words or characters that are omitted.
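A minimal sketch of an empty <gap> element, assuming a hypothetical sentence in which roughly two words cannot be read:

```xml
<p>Ik ontving uw brief van <gap reason="illegible" extent="2 words"/> en dank u zeer.</p>
```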
If new text has been supplied for any reason, use the <supplied> element. For example, if the text that is transcribed is a copy of a letter that would have been sent out on a sheet of preprinted letterhead, the preprinted information contained in the letterhead is absent from the copy. In such a case some of that information may be supplied using the <supplied> element, as follows:
<dateline>
<placeName><supplied reason="on preprinted letterhead missing from copybook">Haarlem</supplied></placeName>
<date when="1848-01-16">16 Januarij <supplied reason="on preprinted letterhead missing from copybook">184</supplied>8</date>
</dateline>
Unclear text
Use <unclear> to indicate a section which is difficult to read in the source. The @reason attribute indicates why the source is difficult to transcribe. For instance, the text may be damaged, the ink may be faded, or the author's handwriting may be very unclear. Example:
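A minimal sketch of such an encoding, assuming a hypothetical sentence in which one word is written in faded ink and the transcriber is only moderately sure of the reading:

```xml
<p>De proeven zijn <unclear reason="faded ink" cert="medium">gisteren</unclear> verzonden.</p>
```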
Always use a @reason attribute for <unclear>, <gap> and <supplied>. In addition, <unclear> and <supplied> take a @cert attribute to indicate the encoder's degree of certainty that the text given is correct (possible degrees of certainty in attribute value: "high", "medium", "low", "unknown"). The <gap> element takes an @extent attribute (value: any string of letters and numbers).
Additions
Following the initial completion of the source text, other agents, such as librarians, recipients of letters or editors, may have added certain letters, words or sentences. Such added texts can be marked up using the <add> element. Use the @place attribute to explain how the texts have been added. As values, you can use 'above' or 'below', for example. If the source text is a typescript, and if the additions are in handwriting, also use the attribute @type with value 'handwriting'. You may also explain the motivation for the addition in a <note>.
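A minimal sketch of an addition written above the line, using a hypothetical English sentence:

```xml
<p>The sheets were sent <add place="above">yesterday</add> by post.</p>
```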
Deleted text
If parts of the text have been deleted by means of a strikethrough, the text often continues to be readable. The deleted words can be marked up using the <del> element. In this particular situation, you also need to use a @rend attribute with the value "strikethrough".
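A minimal sketch of a struck-through but still readable phrase, using a hypothetical English sentence:

```xml
<p>We received <del rend="strikethrough">your letter</del> your parcel on Monday.</p>
```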
Graphically distinct text
Underlining and italics used for stress or emphasis for linguistic or rhetorical effect are rendered through the <emph> element, in combination with the @rend attribute. For example:
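A minimal sketch, assuming a hypothetical Dutch sentence in which one word is underlined for stress:

```xml
<p>Ik kan dit <emph rend="underlined">niet</emph> aanvaarden.</p>
```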
The following values can be used with the @rend attribute: "bold", "italic", "bold-italic", "underlined", "double-underlined".
If in the MS a book title is enclosed in quotation marks or underlined, this is not indicated in the encoding. That it is a book title is indicated solely by the <title> tag, which replaces any typographic indication in the source.
Correction, abbreviation, regularisation
Book titles are often shortened or otherwise referred to rather impressionistically. In such cases the written title is given in an <orig> element and the full title is given in a <reg> element. These two elements are together enclosed in turn within a <choice> element. Example:
<choice>
<orig>Haafner's reize</orig>
<reg>J. Haafner's reize te voet door Ceylon</reg>
</choice>
Abbreviations are expanded as follows:
Uw <choice><abbr>dw.</abbr><expan>dienstwillige</expan></choice> dienaar
Misspellings (which occur frequently in letters not written by native speakers, as in the letters by Dutch publishers with their foreign correspondents) are corrected using the <choice>, <sic> and <corr> elements:
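A minimal sketch of such a correction, using a hypothetical misspelling in an English sentence:

```xml
<p>We have <choice><sic>recieved</sic><corr>received</corr></choice> your letter.</p>
```

The <sic> element preserves the reading of the source, and <corr> gives the corrected form.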
Unusual spellings that were current at the time are maintained without adding <sic>.
Figures
Figures (e.g. money) are always given with a full stop as a decimal marker. Sums of money may be preceded by a currency marker, e.g. for Dutch guilders the guilder sign (ƒ, represented as the entity &#x192;) and for English pounds the pound sign (£, represented as the entity &#xA3;).
Foreign words
Use <foreign> to identify a word or phrase as belonging to some language other than that of the surrounding text. Identify the language used by means of the codes of ISO 639-2 (note that 3-letter codes are in lower case; 2-letter codes are in capitals). E.g.:
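A minimal sketch, assuming that the language code is given in the standard TEI attribute @xml:lang and that the three-letter ISO 639-2 code "lat" is used for Latin. The sentence itself is hypothetical:

```xml
<p>The contract was printed <foreign xml:lang="lat">in extenso</foreign> last month.</p>
```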
Dates
Dates are standardised with the @when attribute, using the ISO 8601 standard, which represents the date in the form YYYY-MM-DD. In the event of uncertainty, digits can be left off from the end. For example, if a letter is known to date from 1854, while the day and month are unknown, the date would be given as "1854".
<date when="1864-08-06">6 Aug. '64</date>
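The truncation rule described above can be sketched as follows. The first line repeats the example from these guidelines, and the written forms in the other two lines are hypothetical:

```xml
<!-- Full date known -->
<date when="1864-08-06">6 Aug. '64</date>
<!-- Only year and month known -->
<date when="1854-03">Maart 1854</date>
<!-- Only the year known -->
<date when="1854">1854</date>
```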
Names
Names encountered in the text should always be marked up according to their type. The available elements are <persName> (for persons), <placeName> (for geographical names) and <orgName> (for organisations).
The persons and the organisations referred to in the text also need to be identified. This project uses the identifiers that have been defined within the WikiData project. To find the correct identifier for the person or the organisation that you have found in the letter, follow the steps below.
- Visit WikiData at https://www.wikidata.org/
- In the search bar on that website (which is normally labelled "Search Wikidata"), type in the name that you have found.
- If you have found the person or the organisation you were looking for, open the page that lists all the data that is available for this person or this organisation.
- From the main header on this page, copy the identifier for this named entity. On WikiData, identifiers mostly consist of the letter "Q", followed by a number of digits.
The WikiData identifier must be given in the @key attribute of the <persName> or the <orgName> element, as follows:
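A minimal sketch of the @key attribute. Note that "Q0000000" is a placeholder and not a real WikiData identifier; the second line reuses the local identifier given for Sijthoff in the opener example above:

```xml
<!-- Q0000000 is a placeholder, not a real WikiData identifier -->
<persName key="Q0000000">J. La Bree</persName>
<!-- A local project identifier, as used for Sijthoff elsewhere in these guidelines -->
<orgName key="ASYT">Sijthoff</orgName>
```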
If you are unable to find the name that is mentioned in the letter, please send an email mentioning the full name to bdms.staff@gmail.com. In this situation, a local identifier will be minted for this name. Please consult the full list of all the existing local identifiers before you request a new identifier.
Honorifics (e.g. academic titles), added names or phrases indicating specific roles or function (e.g. 'Baron' or 'King') are assumed to be part of the names. Names must always be encoded fully, so including those additional words.
If the body of your letter contains multiple references to the same person or organisation, it is not necessary to add @key attributes with identifiers for each of these references. If this person or organisation has been identified once, this is sufficient.
Editorial annotations
Editorial annotations can be used to provide more contextual information on the text that is transcribed. When deciding what information to annotate, think of what is typically annotated in a printed edition of correspondence. For example, identify bibliographic details of publications mentioned (author, full title, place of publication, publisher, and year), and identify people and organisations.
For your research, use the Leiden library catalogue (and other catalogues, depending on the nature of the bibliographic item), Google, Wikipedia, biographical dictionaries and other reference works intelligently. For example: in BOH C107 fol. 1. the author (J.G.C. Schlenker) asks De Erven F. Bohn:
The search for 'tijdschrift bohn' in Leiden University's Catalogue yields 79 results, but Wikipedia has an article on Van Haren Noman, which mentions Tijdschrift two times: 'Tijdschrift der Nederlandse dierkundige vereniging', and 'Ned. Tijds. v. Geneesk.'. The query 'tijdschrift bohn dierkundige' yields no result in the Leiden OPAC, but 'tijdschrift bohn geneeskunde' does. A Google search for 'tijdschrift geneeskunde bohn "Haren Noman"' yields a number of interesting results. So the Tijdschrift in question must be the Nederlands tijdschrift voor geneeskunde.
Include the <note> element directly after the text the note applies to. The text of the annotation must be given in a <p> element. A note may contain several paragraphs.
is carried out by the foreman ...</p>
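A minimal sketch of the placement of a note, assuming a hypothetical sentence in which the shortened title "Tijdschrift" occurs. The identification in the note follows the search described above:

```xml
<p>... het <title>Tijdschrift</title><note><p>Probably the <title>Nederlands tijdschrift voor geneeskunde</title>.</p></note> ...</p>
```

The note follows the annotated text directly, and its content sits inside a <p> element.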
Written by: Peter Verhaar, Adriaan van der Weel and Andrew Stevens.