Procedure:
1. Transcribe the text of the source as meticulously as possible (see General remarks about the transcription below).
2. Proofread the completed transcription against the source very carefully.
3. Provide editorial annotations (see Editorial annotations below).
4. Create a summary of the text.
5. Encode the transcription, using the instructions below for the TEI Header and the text of the letters. Note that these contain merely the briefest of abstracts, and only those most germane to the project, of the comprehensive instructions detailed in the fifth edition of the guidelines of the Text Encoding Initiative ("P5"). The following two chapters of the TEI Guidelines are the most directly relevant to the project: 11. Representation of Primary Sources (especially from 11.3 onwards), and 13. Names, Dates, People, and Places.
6. Save the file using the file naming conventions below.
NB: by convention, in what follows, attribute names are preceded by the @-sign.
In the title element within the titleStmt in fileDesc, provide the title of the document in the following form:
[letter, draft letter, form letter (=Dutch 'circulaire'), postcard, picture postcard,
prospectus, note, or telegram] from [Sender] to [Recipient], [date in English in the form dd Month yyyy]: a machine readable transcription
Personal names of senders and recipients are written using the following form: Firstnames, or initial(s) with full stop(s) not separated from each other by a space, followed by a space, followed by the surname.
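As a minimal sketch of where this title sits in the header, the fragment below reuses the correspondents and date of the Fuhri letter that serves as an example elsewhere in these guidelines. The surrounding elements are abbreviated and only indicated by comments.

```xml
<fileDesc>
  <titleStmt>
    <!-- Names are plain text here; note that the initials "A.C." are not separated by a space -->
    <title>Letter from K. Fuhri to A.C. Kruseman, 18 November 1850: a machine readable transcription</title>
  </titleStmt>
  <!-- publicationStmt and sourceDesc follow here -->
</fileDesc>
```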
Each letter that is transcribed for the Booktrade Correspondence Project must be given an ID. This file ID is created by combining the identifier of the sender of the letter with the letter's date. If possible, use a WikiData identifier for the sender of the letter. See below, under "Names". The identifier for the sender must be followed by the letter's date in the form YYYYMMDD.
Record the file ID of the TEI file in idno, under <publicationStmt>. The file ID must also be captured in the @xml:id attribute of the TEI root element.
Example:
Sender:
William Blackwood
Date:
7 Sept. 1892
File ID:
WBLA18920907
When a correspondent has sent more than one letter on the same day, use letters to distinguish the files. Examples: EFBO18760613a.xml and EFBO18760613b.xml were written by De Erven F. Bohn on the same day.
If you have difficulties creating a file ID using these instructions, please send an email to the BCP team at bdms.staff@gmail.com
Note that the file ID must also be used as the name of the file! The file name is simply the file ID, combined with the .xml extension.
Example:
File ID: WBLA18920907
File name: WBLA18920907.xml
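Put together, the file ID appears in two places inside the file itself. The fragment below is a sketch only: the namespace is the standard TEI one, but the <authority> element and its content are assumptions, added because TEI P5 expects an agency element before idno in publicationStmt.

```xml
<!-- Saved as WBLA18920907.xml -->
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="WBLA18920907">
  <teiHeader>
    <fileDesc>
      <!-- titleStmt goes here -->
      <publicationStmt>
        <!-- Assumption: the project is named as the responsible agency -->
        <authority>Booktrade Correspondence Project</authority>
        <idno>WBLA18920907</idno>
      </publicationStmt>
      <!-- sourceDesc goes here -->
    </fileDesc>
  </teiHeader>
  <!-- text goes here -->
</TEI>
```

Because the value of @xml:id must begin with a letter, an ID that starts with the sender's identifier and ends with the date is a valid value.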
The title element within sourceDesc contains a bibliographic description of the document that has been transcribed, in the following form:
[letter, draft letter, form letter (=Dutch 'circulaire'), postcard, picture postcard,
prospectus, note, or telegram] from [Sender] to [Recipient], [date in English in the form dd Month yyyy]
This strongly resembles the title element within fileDesc.
One difference is that it omits the phrase 'A machine-readable transcription', and a second difference is that it encodes the names of the correspondents using the orgName or the persName element, and the date using a date element with a @when attribute with the value "yyyy-mm-dd". Be careful to avoid leading or trailing spaces inside persName and orgName elements: they may be part of the sentence, but not of the name.
<title>Letter from <persName key="KFUR">K. Fuhri</persName> to <persName key="AKRUR">A.C. Kruseman</persName>, <date when="1850-11-18">18 November 1850</date>.</title>
In the source description, the names of persons and organisations need to be identified using existing Wikidata or BTC codes, using the @key attribute. See below, under "Names".
In the idno element under sourceDesc, enter the shelfmark of the source.
Examples:
<idno type="callNo">UBA Fu 8-25</idno>
<idno type="callNo">UBL Ltk 1795 8, nr. 1</idno>
<idno type="callNo">UBL BOH C17, fol. 231</idno>
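A sketch of the source description as a whole might look as follows. The wrapping <bibl> element is an assumption (TEI P5 does not allow title directly inside sourceDesc), and the shelfmark is left as a placeholder rather than paired with a particular letter.

```xml
<sourceDesc>
  <!-- Assumption: title and idno are grouped in a bibl element -->
  <bibl>
    <title>Letter from <persName key="KFUR">K. Fuhri</persName> to <persName key="AKRUR">A.C. Kruseman</persName>, <date when="1850-11-18">18 November 1850</date>.</title>
    <!-- Placeholder: replace with the actual shelfmark -->
    <idno type="callNo">[shelfmark of the source]</idno>
  </bibl>
</sourceDesc>
```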
Under front, include a short paragraph summing up the contents of the letter in a <div type="summary">. The summary must be in English.
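A minimal sketch of the summary in context, with placeholder text standing in for the actual summary:

```xml
<text>
  <front>
    <div type="summary">
      <!-- Placeholder: replace with a short English summary of the letter -->
      <p>[Sender] asks [Recipient] to ...</p>
    </div>
  </front>
  <body>
    <!-- the transcription of the letter goes here -->
  </body>
</text>
```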
The opener contains all information about the recipient (name, address), and the information about when the letter was sent. Note that the recipient's name goes within a <persName> or <orgName> element within an <addrLine> element. The opener does not include the sender's name and address details (though the sender's placeName is part of the dateline). These go in the closer. The following example is of a letter sent by an Amsterdam bookseller to Sijthoff publishers in Leiden:
<dateline>
<placeName>Amsterdam</placeName>
<date when="1883-01-19">19 Januari 1883</date>
</dateline>
<address>
<addrLine><persName>J. La Bree</persName></addrLine>
<addrLine>c/o <orgName key="ASYT">Sijthoff</orgName></addrLine>
<addrLine>Doezastraat 1, 3 &amp; 5</addrLine>
<addrLine>Leiden</addrLine>
</address>
<salute>Mijne Heeren</salute>
If the letter contains a preprinted letterhead, the text printed on the letterhead is indicated by means of {} surrounding the text in the appropriate elements. Example:
<dateline>
<placeName>{'s Gravenhage}</placeName>
<date when="1848-01-16">16 Januarij {184}8</date>
</dateline>
The opener of the letter may also contain a line indicating the general subject of the letter, or other
text fragments which cannot easily be described using the categories listed in this section.
Such miscellaneous text fragments can be marked up using a general seg element. Nevertheless, try to supply a @type
attribute which indicates what type of information we are dealing with. As a value, you can use a term such as 'subject'.
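As a sketch, a subject line could be placed in the opener as follows. The dateline and salute are reused from the example above; the content of the seg element is a placeholder.

```xml
<opener>
  <dateline>
    <placeName>Amsterdam</placeName>
    <date when="1883-01-19">19 Januari 1883</date>
  </dateline>
  <!-- Placeholder text; the @type value names the kind of fragment -->
  <seg type="subject">[subject line as written in the source]</seg>
  <salute>Mijne Heeren</salute>
</opener>
```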
The closer contains information about the sender (name and address). The name of the sender must be encoded in a signed element.
If the letter has been signed using a name which you can read, it ought to be transcribed and encoded in the TEI file. If the signature is illegible, it can be disregarded. If the signature represents the name of the sender in an incomplete manner, it is best to standardise this name using <choice>, <orig> and <reg>.
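A sketch of a standardised signature, assuming a hypothetical letter that is signed only with the short form 'Bohn':

```xml
<closer>
  <!-- orig: the signature as written (hypothetical); reg: the standardised name -->
  <signed><choice><orig>Bohn</orig><reg>De Erven F. Bohn</reg></choice></signed>
</closer>
```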
Use address to encode the address of the sender if present in the source.
If a closing salute consists of two parts, use two salute elements. Examples:
<salute>Na vriendelijke groeten</salute>
<salute>Uw vriend</salute>
Abbreviated honorifics such as "ZEd", "Zed.", "Ued", "Yrs" are expanded using the abbr and expan elements (see below).
The name and address of the sender may or may not be present in the form of a preprinted letterhead. In the following example the letterhead only contains the address, but not the name:
<signed>R. de Tracy Gould</signed>
<address><addrLine>{4, Garden Court,}</addrLine>
<addrLine>{Temple,}</addrLine>
<addrLine>{London. E.C.}</addrLine>
</address>
If a letter contains a postscriptum, it is encoded using a seg element. This element takes a @type attribute, with the value "postscript".
<closer>
<salute>your obedient servants</salute>
<signed>de Erven F. Bohn</signed>
<seg type="postscript">A sending by post of the sheets immediately
after been printed would be much better to <lb/> us. </seg>
</closer>
Line and page breaks in the source document are indicated using the (empty) <lb/> and <pb/> elements. They are preceded by a space, except where a space would not normally occur if the <lb/> were not present, e.g. in the case of a hyphenated word. Note that no <lb/> is used immediately before the closing </p>, </addrLine>, </salute> tags, etc., as these already indicate that a line ends there.
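The sentence below is a hypothetical illustration of these spacing rules. Retaining the hyphen of the broken word as written in the source is an assumption, since the rule above only addresses the space.

```xml
<!-- Hypothetical text: a space before <lb/> and <pb/>, but none inside the hyphenated word -->
<p>We have received your letter of the 7th and will <lb/>forward the sheets as soon as they are pub-<lb/>lished. <pb/>The remaining copies follow next week.</p>
```

Note that the last line of the paragraph ends with </p> only, without a final <lb/>.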
The titles of all books, journals or articles must be encoded using the <title> element.
When it is impossible to transcribe a certain section of the text as a result of illegible handwriting or damaged text, use the <gap> element. This element must remain empty. The @reason attribute specifies the reason for the omission, e.g. "illegible" or "cancelled". The @extent attribute gives an indication of the number of words or characters that are omitted.
Dear sirs, in reaction to your quote of <gap reason="ink stain"
extent="1 word"/> we are forced to ...
If new text has been supplied for any reason, use the supplied element.
For example, if the text that is transcribed is a copy of a letter that would have been
sent out on a sheet of preprinted letterhead, the preprinted information contained
in the letterhead is absent from the copy. In such a case some of that information
can be given in a supplied element, as follows:
<dateline>
<placeName><supplied reason="on preprinted letterhead missing from copybook">Haarlem</supplied></placeName>
<date when="1848-01-16">16 Januarij <supplied reason="on preprinted letterhead missing from copybook">184</supplied>8</date>
</dateline>
Use unclear to indicate a section which is difficult to read in the source. The @reason attribute indicates
why the source is difficult to transcribe. For instance, the text may be
damaged, the ink may be faded, or the author may have used very unclear handwriting. Example:
Tell <unclear reason="bad handwriting" cert="medium">Harmer</unclear>
that he can come tomorrow.
Always use a @reason attribute for unclear, gap and supplied. In addition, unclear and supplied take a @cert attribute to indicate the encoder's degree of certainty that the text given is correct (possible degrees of certainty in attribute value: "high", "medium", "low", "unknown"). The gap element takes an @extent attribute (value: any string of letters and numbers).
Following the initial completion of the source text, other agents, such as librarians, recipients of letters or editors, may have added certain letters, words or sentences. Such added texts can be marked up using the <add> element. Use the @place attribute to explain how the texts have been added. As values, you can use 'above' or 'below', for example. If the source text is a typescript, and if the additions are in handwriting, also use the attribute @type with value 'handwriting'. You may also explain the motivation for the addition in a <note>.
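A sketch of a handwritten addition in a typescript, using a hypothetical sentence. The @type value is written here as 'handwriting', and the content of the note is illustrative only.

```xml
<!-- Hypothetical typescript sentence with a handwritten insertion above the line -->
<p>We shall send you the sheets <add place="above" type="handwriting">by post</add><note><p>Added by hand above the line.</p></note> next week.</p>
```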
If parts of the text have been deleted by means of a strikethrough, the text often continues to be readable. The deleted words can be marked up using the <del> element. In this particular situation, you also need to use a @rend attribute with the value "strikethrough".
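For instance, a struck-through but still legible word could be encoded as follows (the sentence is hypothetical):

```xml
<!-- Hypothetical text: "fifty" was struck through and replaced by "sixty" -->
<p>Please send us <del rend="strikethrough">fifty</del> sixty copies.</p>
```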
Underlining and italics used for stress or emphasis for linguistic or rhetorical
effect are rendered through the emph element, in combination with
the @rend attribute. For example:
... of which he understands <emph rend="underlined">nothing</emph>
The following values can be used with the @rend attribute: "bold", "italic", "bold-italic", "underlined", "double-underlined".
If in the MS a book title is enclosed in quotation marks or underlined, this is not indicated in the encoding. That it is a book title is indicated solely by the <title> tag, which replaces any typographic indication in the source.
Book titles are often shortened or otherwise referred to rather impressionistically. In such cases the written title is given in an orig element and the full title is given in a reg element. These two elements are together enclosed in turn within a choice element. Example:
<choice>
<orig>Haafner's reize</orig>
<reg>J. Haafner's reize te voet door Ceylon</reg>
</choice>
Abbreviations are expanded as follows:
Your <choice><abbr>obdt.</abbr><expan>obedient</expan></choice> servant
Uw <choice><abbr>dw.</abbr><expan>dienstwillige</expan></choice> dienaar
Misspellings (which occur frequently in letters not written by native speakers,
as in the letters by Dutch publishers with their foreign correspondents) are
corrected using the choice, sic and corr elements:
It would be very <choice><sic>agreable</sic><corr>agreeable</corr></choice> to us ...
Unusual spellings that were current at the time are maintained without adding <sic>.
Figures (e.g. money) are always given with a full stop as a decimal marker. Sums of money may be preceded by a currency marker. E.g. in Dutch guilders the guilder sign (ƒ, represented as the entity &fnof;); in English pounds the pound sign (£, represented as the entity &pound;).
Use <foreign> to identify a word or phrase as belonging to some language other than that of the surrounding text. Identify the language used by means of the codes of ISO 639-2 (note that 3-letter codes are in lower case; 2-letter codes are in capitals). E.g.:
Wij hebben inmiddels de <foreign xml:lang="eng">proof sheets</foreign>
uit Londen ontvangen
Dates are standardised with the @when attribute, using the ISO 8601 standard, which represents the date in the form YYYY-MM-DD. In the event of uncertainty, digits can be left off from the end. For example, if a letter is known to date from 1854, while the day and month are unknown, the date would be given as "1854".
<date when="1848-01-16">16 Januarij {184}8</date>
<date when="1864-08-06">6 Aug. '64</date>
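Where the date is only partly known, the value of @when is shortened accordingly. The sketch below shows a year-only value and a year-and-month value; the second date is hypothetical.

```xml
<!-- Only the year is known -->
<date when="1854">1854</date>
<!-- Year and month are known, the day is not (hypothetical date) -->
<date when="1854-03">Maart 1854</date>
```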
Names encountered in the text should always be marked up according to their type. The available elements are persName (for persons), placeName (for geographical names) and orgName (for organisations).
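A sketch of the three name elements in running text. The sentence itself is hypothetical; the names and the key are taken from the opener example earlier in these guidelines.

```xml
<!-- Hypothetical sentence, for illustration only -->
<p>Please forward the enclosed to <persName>J. La Bree</persName>, c/o <orgName key="ASYT">Sijthoff</orgName> in <placeName>Leiden</placeName>.</p>
```

Note that the spaces and punctuation around each name stay outside the element.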
The persons and the organisations referred to in the text also need to be identified. This project uses the identifiers that have been defined within the WikiData project. To find the correct identifier for the person or the organisation that you have found in the letter, follow the steps below.
The WikiData identifier must be given in the @key attribute of the <persName> or the <orgName> element, as follows:
<persName key="Q703935">Leonard Woolf</persName>
If you are unable to find the name that is mentioned in the letter, please send an email mentioning the full name to bdms.staff@gmail.com. In this situation, a local identifier will be minted for this name. Please consult the full list of all the existing local identifiers before you request a new identifier.
Honorifics (e.g. academic titles), added names or phrases indicating specific roles or function (e.g. 'Baron' or 'King') are assumed to be part of the names. Names must always be encoded fully, so including those additional words.
<orgName type="recipient" key="EFBO">Messrs de Erven F. Bohn</orgName>
If the body of your letter contains multiple references to the same person or organisation, it is not necessary to add @key attributes with identifiers for each of these references. If this person or organisation has been identified once, this is sufficient.
Editorial annotations can be used to provide more contextual information on the text that is transcribed. When deciding what information to annotate, think of what is typically annotated in a printed edition of correspondence. For example, identify bibliographic details of publications mentioned (author, full title, place of publication, publisher, and year), and identify people and organisations.
For your research, make intelligent use of the Leiden library catalogue (and other catalogues, depending on the nature of the bibliographic item), Google, Wikipedia, biographical dictionaries and other reference works. For example: in BOH C107 fol. 1. the author (J.G.C. Schlenker) asks De Erven F. Bohn:
would you be so kind as to send the us the 125
copies of the Tijdschrift, the ones for the exchange journals, to Professor
Van Haren Noman [...]
The search for 'tijdschrift bohn' in Leiden University's Catalogue yields 79 results, but Wikipedia has an article on Van Haren Noman, which mentions Tijdschrift two times: 'Tijdschrift der Nederlandse dierkundige vereniging', and 'Ned. Tijds. v. Geneesk.'. The query 'tijdschrift bohn dierkundige' yields no result in the Leiden OPAC, but 'tijdschrift bohn geneeskunde' does. A Google search for 'tijdschrift geneeskunde bohn "Haren Noman"' yields a number of interesting results. So the Tijdschrift in question must be the Nederlands tijdschrift voor geneeskunde.
Include the note element directly after the text the note applies to. The text of the annotation must be given in a p element. A note may contain several paragraphs.
<p> ... the calendering <note><p>Calendering, or satinizing,
is a process of treating paper.</p></note>
is carried out by the foreman ...</p>
Written by: Adriaan van der Weel, Andrew Stevens and Peter Verhaar.