Technical Policy by Stella Wong

This document provides a summary description of the technical practices and procedures involved in the creation of this edition of the workdiaries. A comprehensive account is provided in the internal project documentation (including an alphabetical list of all XML elements, attributes, and values, annotations on their usage and exceptional cases as well as full documentation of the transformation).

Creation of the XML

The Workdiaries of Robert Boyle were encoded in TEI-XML by Dr Charles Littleton between 1997 and 2001 for a project funded by the Wellcome Trust. The original output of this project was an online edition of the workdiaries published on the web site of the Birkbeck Boyle Project, which incorporated HTML versions of the workdiaries transformed from the XML using XSLT. The AHRB Centre for Editing Lives and Letters 2004 edition was created from a revised and re-transformed version of the TEI-XML produced by Dr Littleton.

Transformation of the XML for the CELL edition

The XML was transformed into XHTML ('Transitional' and 'Frameset') using a customised utility written in Python. Eight different types of output were generated from the revised XML workdiary transcriptions:

  • Text-only diplomatic transcript (1 file per workdiary)
  • Text-only editorial transcript (1 file per workdiary)
  • Print-friendly diplomatic transcript (1 file per workdiary)
  • Print-friendly editorial transcript (1 file per workdiary)
  • Page-by-page diplomatic transcript (1 file per manuscript page)
  • Page-by-page editorial transcript (1 file per manuscript page)
  • Editorial Headnotes (1 file per workdiary)
  • Editorial Entry Notes (1 file per workdiary)

In addition, the four XML reference files (Places.xml, Bibliography.xml, Biographical.xml, BoyleWorksRef.xml) were transformed into XHTML 'Transitional'.

XML document structure

Each workdiary has been encoded as a separate XML file with the root element <TEI.2>. Metadata and general workdiary information is contained within the <teiHeader> for each workdiary XML file. The workdiary transcriptions are enclosed within the parent element <text>. Multiple occurrences of <div> are used to mark up structural divisions within the manuscript and individual workdiary entries, the basic units of each workdiary. Each workdiary entry comprises at least two basic parts: an editorial note, which precedes the text of the transcription proper, and the text of the entry itself. A summary breakdown of all elements and attributes is provided below.
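The structure described above can be illustrated with a short Python sketch. It is not the project's own utility: the sample file is a hypothetical, heavily abridged workdiary, the function name `list_entries` is invented for illustration, and the sketch assumes that the editorial note sits inside the entry's <div> ahead of the <p> that holds the entry text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abridged workdiary file; real files carry far more markup.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc><titleStmt><title>Workdiary 1</title></titleStmt></fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="entry" n="1" id="WD1-1">
        <note resp="editor">Editorial note for entry 1.</note>
        <p>Text of entry 1.</p>
      </div>
      <div type="entry" n="2" id="WD1-2">
        <note resp="editor">Editorial note for entry 2.</note>
        <p>Text of entry 2.</p>
      </div>
    </body>
  </text>
</TEI.2>"""


def list_entries(xml_text):
    """Return (id, editorial note, entry text) for each <div type="entry">."""
    root = ET.fromstring(xml_text)
    entries = []
    for div in root.iter("div"):
        # Other <div> types mark structural divisions and are skipped here.
        if div.get("type") != "entry":
            continue
        note = div.find("note[@resp='editor']")
        para = div.find("p")
        entries.append((
            div.get("id"),
            note.text if note is not None else None,
            para.text if para is not None else None,
        ))
    return entries


for entry in list_entries(SAMPLE):
    print(entry)
```

Filtering on the 'type' attribute matters because <div> is also used for structural divisions of the manuscript, not only for entries.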

Structure of the <teiHeader>

The TEI header provides information about the source manuscript, a record of revisions, responsibilities and contributions to the transcription, publication details and the hands and languages present in the text. <teiHeader> is the level 1 parent element which encloses three level 2 parent elements <fileDesc>, <profileDesc>, and <revisionDesc>. The use of these level 2 parent elements, their children, and attributes are as follows.

Elements and attributes enclosed within <fileDesc> which is a bibliographic description of the text:

<titleStmt>
Child elements: <title> (title of workdiary), <author> (Robert Boyle), <funder> (funding body), <respStmt> (used for each individual contributor to the project and their responsibilities; 'id' attribute). Child elements of <respStmt>: <name> (full name of individual), <resp> (list of responsibilities and contributions of the individual to the project).
<publicationStmt>
Child element: <publisher> which identifies the publisher of the electronic edition.
<noteStmt>
Child elements: <rs> (referencing string, used to link to the biographical register), <hi> (used to describe formatting), and <note> with the attributes 'resp' (indicating responsibility) and 'type' (which has the possible values 'content' giving a brief description of the workdiary content, 'length' giving the total number of entries for the workdiary, 'format' relating to the physical format of the manuscript source, and 'note' giving general notes and a commentary on the workdiary and detailing problems with its transcription or the physical manuscript source or layout).

Child elements of <note> are <add> (for words inserted in the workdiary text during Boyle's lifetime), <bibl> (for bibliographic references; 'id' attribute links to references in the bibliography), <title> (with 'level' attribute which describes the physical format of the reference; possible values are 'a' for article, 's' for series, 'm' for monograph), <biblScope> (describes the scope of the reference with attribute 'type' which refers to the type of reference; possible values are 'volume', 'date', 'pages'). Child element of <biblScope>: <num> (with the attribute 'value' giving the page numbers of a volume or the volume number of a bibliographic reference).
<sourceDescDesc>
Child element: <sourceDesc> (the manuscript's physical location, has the child element <bibl>, see above).

Elements and attributes enclosed within <profileDesc>, which is used to mark up compositional features of the workdiary:

<creation>, workdiary date of composition.
<langUsage>, describes the language(s) used in the workdiary.
Child element: <language> (with the attributes 'id', with the possible values 'en' for English, 'fr' for French, 'it' for Italian, and 'gr' for Greek, and 'usage' giving the number of entries ascribed to that language).
<handList>, describes the list of 'hands' or amanuenses which contributed to the composition of the workdiary.
Child element: <hand> (the permitted values of the 'id' attribute are shorthand names for the scribes, e.g. 'Boyle', 'Hand A', 'Slare'; the 'scribe' attribute gives the full name of the scribe; 'resp' gives the editor responsible for identifying the hand; 'character' lists the entries ascribed to that particular hand).

Elements and attributes enclosed within <revisionDesc>, which is used to mark up revisions to the workdiary and the individual responsible:

<change>.
Child elements: <date> (given in the form 'YYYY-MM-DD'), <respStmt>, <item> (a description of the revision carried out to the transcription text). Child elements of <respStmt>: <name> (name of the individual responsible for the revision to the XML), <resp> (role and responsibilities of the individual behind the revisions).
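As a rough illustration of how the header described above might be read, the following Python sketch pulls a few fields out of a <teiHeader>. Everything in the sample is invented for the example (the date, the editor's name, the entry counts), the function name `summarise_header` is hypothetical, and the exact nesting inside <change> is an assumption based on the summary above rather than on the project files.

```python
import xml.etree.ElementTree as ET

# Invented sample header; all values are placeholders, not project data.
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Workdiary 1</title>
      <author>Robert Boyle</author>
    </titleStmt>
  </fileDesc>
  <profileDesc>
    <langUsage>
      <language id="en" usage="10"/>
      <language id="fr" usage="2"/>
    </langUsage>
    <handList>
      <hand id="Boyle" scribe="Robert Boyle"/>
    </handList>
  </profileDesc>
  <revisionDesc>
    <change>
      <date>2004-01-15</date>
      <respStmt><name>Example Editor</name><resp>revision</resp></respStmt>
      <item>Example revision note.</item>
    </change>
  </revisionDesc>
</teiHeader>"""


def summarise_header(xml_text):
    """Collect the title, language usage, hands and revisions from a header."""
    header = ET.fromstring(xml_text)
    return {
        "title": header.findtext("fileDesc/titleStmt/title"),
        # Map each language id to the number of entries ascribed to it.
        "languages": {
            lang.get("id"): int(lang.get("usage"))
            for lang in header.findall("profileDesc/langUsage/language")
        },
        "hands": [h.get("id") for h in header.findall("profileDesc/handList/hand")],
        # One (date, name) pair per <change>.
        "revisions": [
            (c.findtext("date"), c.findtext("respStmt/name"))
            for c in header.findall("revisionDesc/change")
        ],
    }


print(summarise_header(HEADER))
```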

Structure of the XML transcription text

Workdiary entries are enclosed within <text><body></body></text>. The basic unit of each workdiary is the entry, which is encoded using <div> with the value 'entry' as the 'type' attribute, i.e. <div type="entry">.

The attributes of the <div> element assign a unique number to each workdiary entry: <div type="entry" n="1" id="WD1-1"> where the value of the 'n' attribute is the entry number within the workdiary and the 'id' value is a unique value for that entry relating to that workdiary, i.e. 'WD1-1' refers to entry no.1 of Workdiary 1. Archival reference numbers are attributed within the first <pb> (page break) element at the start of each workdiary. Each <div> is followed by editorial notes, marked up as <note resp="editor">.
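The 'id' convention can be unpacked mechanically. The sketch below is a minimal illustration, assuming every identifier has exactly the form 'WD', workdiary number, hyphen, entry number; the function name `parse_entry_id` is hypothetical, and identifiers that depart from this pattern are rejected rather than guessed at.

```python
import re

# Assumed pattern: 'WD' + workdiary number + '-' + entry number, e.g. 'WD1-1'.
ENTRY_ID = re.compile(r"^WD(\d+)-(\d+)$")


def parse_entry_id(entry_id):
    """Split an entry id into (workdiary number, entry number)."""
    match = ENTRY_ID.match(entry_id)
    if match is None:
        raise ValueError(f"unrecognised entry id: {entry_id!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_entry_id("WD1-1"))
```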

Marginal comments in the workdiary manuscripts are encoded using <note>. They are divided into items that were written at the same time as the entry they accompany (e.g. a date, or the source of a quotation) and items that were added retrospectively (numbers, summaries of the content of the entry, etc.). There are accordingly two types of marginalia:

'integral'
Integral marginalia is indicated by <note resp="author" type="integral">. For integral marginal comments the possible 'type' values are: 'integral', 'integral/date', 'integral/number', 'integral/reference'.
'retrospective'
Retrospective marginalia is indicated by <note resp="author">. For retrospective marginal comments the possible 'type' values are: 'n' (a number that appears in the margin), 'note' (for any miscellaneous or stray marginal memoranda that appear to be in the hand and writing medium of the original scribe), 'reference' (where the bibliographic details of a work referenced are provided), 'date' (an authorial date for the workdiary entry, marked up with <date resp="author">), 'endorsement', and 'mark' (for any symbols, ticks, crosses, circles that appear in the marginal text).
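The distinction between editorial notes and the two kinds of marginalia can be expressed as a small classifier. This is a sketch under stated assumptions rather than the project's own logic: the function name `classify_note` is hypothetical, the sample entry is invented, and any authorial note whose 'type' does not begin with 'integral' is treated as retrospective.

```python
import xml.etree.ElementTree as ET

# Invented sample entry with one editorial note and two marginal notes.
ENTRY = """<div type="entry" n="1" id="WD1-1">
  <note resp="editor">Editorial note.</note>
  <note resp="author" type="integral/date">Example date</note>
  <note resp="author" type="mark">X</note>
  <p>Text of the entry.</p>
</div>"""


def classify_note(note):
    """Label a <note> as 'editorial', 'integral', 'retrospective' or 'unknown'."""
    resp = note.get("resp")
    note_type = note.get("type", "")
    if resp == "editor":
        return "editorial"
    if resp == "author":
        # 'integral', 'integral/date', 'integral/number', 'integral/reference'
        if note_type == "integral" or note_type.startswith("integral/"):
            return "integral"
        return "retrospective"
    return "unknown"


div = ET.fromstring(ENTRY)
labels = [classify_note(n) for n in div.findall("note")]
print(labels)
```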

The text of the workdiary entry itself is contained within a paragraph element, <p>. Four types of textual emendation are represented in the textual encoding:

Insertions and deletions
All insertions to the text are marked up with <add>. Deleted and altered words and passages are marked up with <del>. If an insertion is in the margin or in-line the 'place' attribute is assigned with the value of either 'margin' or 'line'.
Replacements
Where an insertion replaces a deleted text this is marked up as a replacement with <rep> as the parent element and the child elements <add> for the replacing word and <del> for the replaced word.
Alterations
Where a letter or letters of a word have been changed then the entire word, in its final form, is marked up as a correction with <corr>. The description of the alteration is provided in the 'sic' attribute.
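To show how this markup can be consumed, the sketch below flattens a paragraph to the text as it stood after all emendations: deletions are dropped, while insertions, the <add> half of a replacement, and the final form held in <corr> are kept. How the edition's own diplomatic and editorial transcripts treat these elements is not stated here, so this reading rule, the function name `final_reading`, and the sample sentence are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample paragraph exercising <del>, <add>, <rep> and <corr>.
PARA = (
    '<p>The <del>red</del><add place="line">blue</add> liquor was '
    '<rep><del>hot</del><add>cold</add></rep> and '
    '<corr sic="example description of the alteration">clear</corr>.</p>'
)


def final_reading(elem):
    """Flatten an element to its text after all emendations are applied."""
    if elem.tag == "del":
        # Deleted text is dropped; the text following </del> is added by the caller.
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(final_reading(child))
        parts.append(child.tail or "")
    return "".join(parts)


print(final_reading(ET.fromstring(PARA)))
```

A reading that records the deletions instead (for example, by wrapping them in brackets) would only need a different branch for <del>.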

Other features of the texts that have been encoded are as follows (a comprehensive account, listing all attributes and values, with details concerning usage and exceptional cases, is included in the internal project documentation):

<abbr> and <expan>
Abbreviations and the expanded form of an abbreviation.
<damage>, <gap>, <space>, <unclear>, and <supplied>
To encode damage in the manuscript, a gap due to loss or illegibility, a partially illegible section of text, or a section of white space. The <supplied> element encodes words and/or letters supplied by the editor.
<figure> and <figDesc>
To mark the point at which a graphic or diagram occurs in the manuscript.
<foreign>
Identifies a word in a different language from the surrounding text.
<handShift>
Indicates a change of scribe in the text.
<head>
Heading of a workdiary or entry.
<hi>
To encode text which is typographically distinct from the body of the text or inline text.
<lb>
To mark up a line break in a prose section or heading.
<lg> and its child <l>
To mark up groups of lines and individual lines of verse.
<list> and its child <item>
To encode lists.
<pb>
Denotes a page break.
<table>, <row>, <cell>
To encode material ordered in the form of a table.
<rs>
To mark up items to be linked to external indexes, such as the bibliography and register of place names.

Image digitisation

The Royal Society Workdiaries manuscripts were digitised by HEDS; the British Library Workdiary, BL Additional MS 4293, was digitised by Reproductions at the British Library. Royal Society manuscript images were digitised at a resolution of 600dpi in lossless TIFF format. British Library manuscripts were digitised at 300dpi, also in TIFF format.

Image preparation

The images were batch optimised for the web as compressed JPEG files, 700 pixels wide and at 72dpi, using Adobe Photoshop CS. A filename for each image was assigned according to its archival reference and to match the references encoded in the XML files, e.g. 'bp027_0001r', which refers to Boyle Papers 27, folio 1 recto. Copyright statements were consistently inserted at the foot of each scanned image according to the following format: British Library Additional MS 4293, fols. 50-53. By permission of the British Library. The Robert Boyle Workdiaries. © 2004 The Royal Society.
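The file-naming convention for Boyle Papers images can be decoded with a short Python sketch. Only the single example 'bp027_0001r' is documented here, so the rest is assumption: that 'v' marks a verso, that the volume and folio are always plain digits, and that the function name `describe_image` (hypothetical) need not handle British Library images, whose naming is not described.

```python
import re

# Assumed pattern: 'bp' + volume + '_' + folio + 'r' (recto) or 'v' (verso).
IMAGE_NAME = re.compile(r"^bp(\d+)_(\d+)([rv])$")
SIDES = {"r": "recto", "v": "verso"}


def describe_image(name):
    """Expand an image filename stem into a readable archival reference."""
    match = IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised image name: {name!r}")
    volume, folio, side = match.groups()
    # int() strips the zero padding used in the filenames.
    return f"Boyle Papers {int(volume)}, folio {int(folio)} {SIDES[side]}"


print(describe_image("bp027_0001r"))
```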
