Default Transcription Guidelines

Every new community is established with a default document type definition (DTD), JavaScript and CSS files, which control how you transcribe and how the transcription appears in the editor's 'preview' window. See the entry 'Default files' for information on these pre-provided files.

This section defines a set of transcription guidelines designed to work with the default files. It is a subset of the Text Encoding Initiative (TEI) P5 guidelines, and any transcription made according to these guidelines will be TEI-conformant.

Text structure

By default, Textual Communities expects transcriptions to be contained in a TEI <text> element, made up of <front>, <body> and <back> elements. <body> is compulsory and should normally contain the main transcription; <front> and <back> are optional, and might be used (for example) to encode the introductory and end matter of a document.

<front>, <body> and <back> may contain either <div1> or <div> elements (as well as all the other elements the TEI permits here). In a standard implementation, <div> and <div1> might contain the common TEI text-holding elements: <head>, <p> and <ab> (for prose) or <l> (for poetry, often grouped within <lg> for stanzas). We support all these elements in our default encoding, and many more: in essence, all the elements listed in the TEI P5 chapter 'Elements available in all TEI documents'. Thus, we could encode the following text fragment:

         <div n="Chapter 1">
              <head n="h1">This is a heading</head>
              <p n="p1">This is a paragraph</p>
         </div>

See "Encoding of entities" for the importance of the 'n' attributes, as used on the <div>, <head> and <p> elements here. In the terminology of the Textual Communities project, all the elements in this fragment (and, indeed, all content-holding elements in the TEI system) are potentially 'entities': that is, elements which contain a single communicative act. For the importance of 'communicative act' in the Textual Communities system, see "Entities and communicative acts".

Document structure

The texts we transcribe all exist on physical objects, most commonly on printed or manuscript pages, with the pages themselves sometimes divided into columns and almost always into lines. Textual Communities uses the standard TEI elements for page, column and line to represent these: <pb/>, <cb/> and <lb/>.

Notice the difference between the encoding of the <p> element above ('<p n="p1">This is a paragraph</p>') and these document structure elements. The <p> element has a 'start tag' (<p n="p1">) and an 'end tag' (</p>), and you can place content between them: in this case, the sentence 'This is a paragraph'. The document structure elements, however, combine the start and end tag into one: thus <pb />. They are 'empty' elements: they may have attributes (<pb n="1" />) but cannot contain any text. Instead, they are used as 'milestones', inserted into the stream of text to mark a page break, a column break or a line break. Thus, one might add page-break and line-break information to the sample above, as follows:

        <pb n="1r" facs="1r.jpg" />
        <lb />
        <div n="Chapter 1">
              <head n="h1">This is a
                     <pb n="1v" facs="1v.jpg" />
                     <lb />heading</head>
              <lb />
              <p n="p1">This is a paragraph</p>
        </div>

That is: the text starts on page 1r, with the words 'This is a' beginning a new line on the page. A new page (1v) starts after these words, with 'heading' on a line of its own and 'This is a paragraph' on the following line.
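To see how milestones work in practice, here is a minimal Python sketch, using the standard xml.etree.ElementTree module, that walks the sample above and cuts the running text at each <pb/> and <lb/>. It is our own illustration, not the code Textual Communities runs; the function name 'lines_by_page' is hypothetical, and the sample is wrapped in a <body> root element because an XML parser needs a single root.

```python
import xml.etree.ElementTree as ET

# The milestone sample from above, wrapped in a root element so it parses.
SAMPLE = """<body>
<pb n="1r" facs="1r.jpg"/>
<lb/>
<div n="Chapter 1">
  <head n="h1">This is a
    <pb n="1v" facs="1v.jpg"/>
    <lb/>heading</head>
  <lb/>
  <p n="p1">This is a paragraph</p>
</div>
</body>"""


def lines_by_page(root):
    """Return (page, line text) pairs, cutting the text stream at <pb/> and <lb/>."""
    result = []
    page = None
    current = []  # text fragments seen since the last milestone

    def flush():
        # Collapse whitespace; skip "lines" that hold only layout whitespace.
        text = " ".join(" ".join(current).split())
        if text:
            result.append((page, text))
        current.clear()

    def walk(elem):
        nonlocal page
        if elem.tag in ("pb", "lb"):
            flush()  # a milestone ends the current line
            if elem.tag == "pb":
                page = elem.get("n")
        if elem.text:
            current.append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text after a child continues this element's stream
                current.append(child.tail)

    walk(root)
    flush()
    return result


for page, line in lines_by_page(ET.fromstring(SAMPLE)):
    print(page, line)
```

The <div>, <head> and <p> elements do not interrupt the walk at all: only the empty milestones decide where pages and lines begin. Note that this sketch joins fragments with spaces, so it ignores the break="no" convention described under 'Line breaks'.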

One might ask: why not make the <pb>, <cb> and <lb> elements content-containing too? This would give encoding as follows:

        <pb n="1r" facs="1r.jpg"><lb>
        <div n="Chapter 1">
           <head n="h1">This is a </lb></pb>
              <pb n="1v" facs="1v.jpg"><lb>heading</lb></head>
              <lb><p n="p1">This is a paragraph</p></lb>
        </div></pb>

However, the rules of XML forbid this approach: in XML, elements cannot 'overlap'. Here, the first <pb> element overlaps with the <div> element (and its contents), so that the document structure is <pb><div>...</pb>...</div>. That is, the <pb> element opens, then the <div>, and the <pb> closes before the <div> does. One can avoid this problem by making one of the two sets of elements (either the 'text' set or the 'document' set) empty, so that it does not contain anything. The TEI makes this easy by declaring elements such as <pb />, <cb /> and <lb /> as empty, and that is how we use them.
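Any XML parser will reject overlapping elements. The short Python sketch below, our own illustration using the standard xml.etree.ElementTree module (the helper name 'is_well_formed' is hypothetical), parses a shortened container-style fragment and its milestone equivalent.

```python
import xml.etree.ElementTree as ET

# Container-style page: <pb> opens before <head> but closes inside it, so they overlap.
OVERLAPPING = '<div n="Chapter 1"><pb n="1r"><head n="h1">This is a </pb>heading</head></div>'

# Milestone-style page: <pb/> is empty, so it cannot overlap anything.
MILESTONE = '<div n="Chapter 1"><pb n="1r"/><head n="h1">This is a <pb n="1v"/>heading</head></div>'


def is_well_formed(xml):
    """True if the string parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(OVERLAPPING))  # False: mismatched end tag </pb>
print(is_well_formed(MILESTONE))    # True
```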

Line breaks

The most common situation is a simple line break occurring between words: use a plain <lb />.

Where the line break occurs within a word, with nothing in the source to show that it falls within the word: use <lb break="no" />.

Where the line break occurs within a word, with a hyphen or other device used to indicate that it falls within the word: use <lb rend="hyphen" break="no" />.

The use of the 'break="no"' attribute tells the processor (and hence the collation system) that the letters each side of the <lb> element belong to the same word.  Thus, 'line<lb break="no"/>break' is understood as 'linebreak'; 'line<lb />break' is understood as 'line break'.  'line<lb break="no" rend="hyphen"/>break' is rendered as 'line-break' in preview, and understood as 'linebreak' in the collation.
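The rule above can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is a hypothetical illustration, not the Textual Communities processor: the 'preview' flag is our own stand-in for the difference between the Preview window and the collation, and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def flatten(elem, preview=False):
    """Serialize an element's text, treating <lb/> as a word boundary unless break="no"."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "lb":
            if child.get("break") != "no":
                parts.append(" ")  # ordinary line break: the words on each side are separate
            elif preview and child.get("rend") == "hyphen":
                parts.append("-")  # show the scribe's hyphen in preview only
        else:
            parts.append(flatten(child, preview))
        parts.append(child.tail or "")
    return "".join(parts)


def words(xml, preview=False):
    return flatten(ET.fromstring(xml), preview).split()


print(words('<l>line<lb break="no"/>break</l>'))                       # ['linebreak']
print(words('<l>line<lb/>break</l>'))                                 # ['line', 'break']
print(words('<l>line<lb break="no" rend="hyphen"/>break</l>'))        # ['linebreak']
print(words('<l>line<lb break="no" rend="hyphen"/>break</l>', True))  # ['line-break']
```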

Text appearance: italic, bold, underlining, underdotting, etc. (<hi>)

Now that we have the basic structure of the text encoded, we will want to add detail about the text we are transcribing; in particular, we want to encode the appearance of the text on the page. We use the standard TEI <hi> element for this, with the following values of its 'rend' attribute:

  • italic: <hi rend="ital">italic</hi>
  • bold: <hi rend="bold">bold</hi>
  • superscript: <hi rend="sup">superscript</hi>
  • underdot: <hi rend="ud">underdot</hi>
  • strike through: <hi rend="strike">strike through</hi>
  • underline: <hi rend="ul">underline</hi>

You can combine values in a single 'rend' attribute, separated by spaces, as follows:

  • bold italic: <hi rend="bold italic">bold italic</hi>
  • italic strike through: <hi rend="strike italic">italic strike through</hi>

See also 'Preview, TEI rend and HTML classes'.

Abbreviations, expansions, corrections, regularized spellings

By default, Textual Communities supports the system prescribed by the P5 version of the TEI guidelines (post-2007), using the <choice> element:

  • Abbreviations/expansions: for 'with' -- <choice><abbr>wt</abbr><expan>with</expan></choice> OR w<choice><am>t</am><ex>ith</ex></choice>
  • Corrected/uncorrected text: for 'friend', mis-spelt 'freind' -- <choice><sic>freind</sic><corr>friend</corr></choice>
  • Regularized/original spelling text: for 'friend', regularized from 'freend' -- <choice><reg>friend</reg><orig>freend</orig></choice>

For concision, the <choice> element may be omitted.  Thus, abbreviated 'with' can be represented by w<am>t</am><ex>ith</ex>, using <am> to indicate the mark of abbreviation, <ex> the expansion.  Note that you cannot place a <hi> element within <am>. If you want to indicate that the 't' is superscript, use <am rend="sup">t</am>.

If you use this system, the default preview will give you a choice between 'Diplomatic' and 'Edited' views, toggling between the text with abbreviations, uncorrected and in original spelling, and the text with abbreviations expanded, corrected and in regularized spelling. (Textual Communities, like TEI P5, does not support the older 'Janus'-style elements, e.g. w<abbr expan="ith">t</abbr>.)
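The toggle between the two views can be sketched as a simple filter over the text: the diplomatic view drops <expan>, <ex>, <corr> and <reg>, while the edited view drops <abbr>, <am>, <sic> and <orig>. The Python sketch below, using the standard xml.etree.ElementTree module, is our own illustration of that idea, not the Preview code; the names 'HIDDEN' and 'render' are hypothetical. Because it filters by element name, it handles both the full <choice> form and the concise form without <choice>.

```python
import xml.etree.ElementTree as ET

# Elements hidden in each view: the diplomatic view drops expansions, corrections and
# regularizations; the edited view drops abbreviations, errors and original spellings.
HIDDEN = {
    "diplomatic": {"expan", "ex", "corr", "reg"},
    "edited": {"abbr", "am", "sic", "orig"},
}


def render(elem, view):
    """Return the text of elem as it would read in the given view."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in HIDDEN[view]:
            parts.append(render(child, view))
        parts.append(child.tail or "")  # the tail follows the child, so keep it either way
    return "".join(parts)


line = ET.fromstring(
    "<l>w<am>t</am><ex>ith</ex> my "
    "<choice><sic>freind</sic><corr>friend</corr></choice> and "
    "<choice><reg>friend</reg><orig>freend</orig></choice></l>"
)
print(render(line, "diplomatic"))  # wt my freind and freend
print(render(line, "edited"))      # with my friend and friend
```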

Unreadable, unclear, supplied or damaged text

  • <gap quantity="4" unit="chars" reason="illegible" />: you cannot read the text at all (here, four characters are unreadable); <gap quantity="4" unit="lines" reason="illegible" /> marks four unreadable lines
  • <unclear reason="damage">damaged text</unclear>: you can read the text, but there is some degree of uncertainty about the transcription
  • <supplied reason="illegible">supplied text</supplied>: the text is supplied by the transcriber
  • <damage agent="water"><gap quantity="4" unit="chars" /></damage>: the document is damaged, and you cannot read it at this point
  • <damage agent="water"><unclear>damaged text</unclear></damage>: the document is damaged, but you can read it with some degree of certainty
  • <damage agent="water"><supplied>supplied text</supplied></damage>: the document is damaged, and you have supplied some text at this point

Space

To indicate empty space in the source text (for example, a space left for a word to be filled in later), use <space quantity="1" unit="chars"/>.

The default value of "unit" is "chars", so one could simply write <space quantity="1"/>. To indicate an empty space extending over a number of lines, use unit="lines": for example, <space quantity="1" unit="lines"/>.
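A processor reading <space/> therefore only needs to fall back to "chars" when 'unit' is absent. The Python sketch below, using the standard xml.etree.ElementTree module, shows that default; the function name 'space_extent' is hypothetical, and the sketch assumes 'quantity' is always given.

```python
import xml.etree.ElementTree as ET


def space_extent(xml):
    """Return (quantity, unit) for a <space/> element; unit defaults to "chars".

    Assumes the quantity attribute is present, as in all the examples above.
    """
    space = ET.fromstring(xml)
    return int(space.get("quantity")), space.get("unit", "chars")


print(space_extent('<space quantity="1" unit="chars"/>'))  # (1, 'chars')
print(space_extent('<space quantity="1"/>'))               # (1, 'chars')
print(space_extent('<space quantity="1" unit="lines"/>'))  # (1, 'lines')
```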

Editorial notes

<note type="ed" resp="PMR">An editorial note</note>

Editorial notes will appear as footnotes in the preview window, with a hypertext link to the note appearing in the text.

Tables

Textual Communities supports the standard TEI <table> element, with <row> and <cell> elements. In Preview, these are mapped to the HTML <table>, <tr> and <td> elements. The TEI attributes 'cols' and 'rows', which indicate that a cell occupies a given number of columns or rows, are mapped to the HTML 'colspan' and 'rowspan' attributes. Thus:

           <table><row><cell cols="2">occupies two columns</cell></row></table>

will be rendered in Preview as <table><tr><td colspan="2">occupies two columns</td></tr></table>.

The default CSS file used by Preview provides special renditions for the td (=cell) element, selected with the 'rend' attribute on <cell>:

  • circsmall: puts the cell in a small circle (usually, one row and one column)
  • circlarge: puts the cell in a large circle (usually, two rows and two columns, declared using the cols and rows attributes)
  • squareborder: places the cell in a rectangular bordered box.

Thus, <cell cols="2" rows="2" rend="circlarge"> places the cell in a large circle, extending over two columns and rows.
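The table mapping described above can be sketched as follows, using Python's standard xml.etree.ElementTree module. This is our own illustration, not the Preview code: the names 'ATTRIBUTE_MAP' and 'tei_table_to_html' are hypothetical, the conversion of 'rend' to 'class' follows the section 'Preview, TEI rend and HTML classes', and only plain-text cells are copied.

```python
import xml.etree.ElementTree as ET

# TEI cell attribute -> HTML td attribute.
ATTRIBUTE_MAP = {"cols": "colspan", "rows": "rowspan", "rend": "class"}


def tei_table_to_html(tei):
    """Convert a TEI <table>/<row>/<cell> fragment to an HTML <table>/<tr>/<td> string."""
    html = ET.Element("table")
    for row in ET.fromstring(tei).findall("row"):
        tr = ET.SubElement(html, "tr")
        for cell in row.findall("cell"):
            td = ET.SubElement(tr, "td")
            for tei_name, html_name in ATTRIBUTE_MAP.items():
                if tei_name in cell.attrib:
                    td.set(html_name, cell.get(tei_name))
            td.text = cell.text  # this sketch copies plain text only, not nested markup
    return ET.tostring(html, encoding="unicode")


print(tei_table_to_html('<table><row><cell cols="2">occupies two columns</cell></row></table>'))
# <table><tr><td colspan="2">occupies two columns</td></tr></table>
```

The same function turns <cell cols="2" rows="2" rend="circlarge"> into <td colspan="2" rowspan="2" class="circlarge">, which is what lets the default CSS class draw the large circle. (Attribute order in the output relies on Python 3.8 or later.)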

Marginalia

In left or right margins:

  • <note place="margin">Marginalia</note>: text written in the right margin of the source text
  •  <note place="margin-left">Marginalia</note>: left margin
  •  <note place="margin-right">Marginalia</note>: right margin

Preview will attempt to place the marginal text in the left or right margin, beside the adjacent source text.

In top or bottom margins:

  • <note place="tm">Marginalia</note>: text written in the top margin of the source text, centre
  • <note place="tl">Marginalia</note>: top margin, left
  • <note place="tr">Marginalia</note>: top margin, right
  • <note place="bm">Marginalia</note>: bottom margin, centre
  • <note place="bl">Marginalia</note>: bottom margin, left
  • <note place="br">Marginalia</note>: bottom margin, right

Preview will place top or bottom marginal text above or below the source text.

Running headers, footers, catchwords, signatures

We use the <fw> ('forme work' -- a forme is a body of type assembled for printing) element for these:

  • <fw type="sig" place="bm">Signature</fw>: a signature, in the bottom margin, centre
  • <fw type="header" place="tm">Header</fw>: a header, in the top margin, centre
  • <fw type="catch" place="br">Catchword</fw>: a catchword, bottom right
  • <fw type="footer" place="bl">Footer</fw>: a footer, bottom left
  • <fw type="pageNum" place="tr">1</fw>: a page number, top right
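Marginalia and forme work share the same 'place' vocabulary, so a processor can sort both with a single lookup table. The Python sketch below, using the standard xml.etree.ElementTree module, groups <note> and <fw> text into top, bottom and side bands. It is a hypothetical illustration: the band and alignment labels and the names 'PLACES' and 'group_by_band' are our own, not part of Textual Communities.

```python
import xml.etree.ElementTree as ET

# place value -> (band, alignment), following the marginalia and forme-work lists above.
PLACES = {
    "margin": ("side", "right"),
    "margin-left": ("side", "left"),
    "margin-right": ("side", "right"),
    "tm": ("top", "centre"), "tl": ("top", "left"), "tr": ("top", "right"),
    "bm": ("bottom", "centre"), "bl": ("bottom", "left"), "br": ("bottom", "right"),
}


def group_by_band(xml):
    """Collect <note> and <fw> text into top, bottom and side margin bands."""
    bands = {"top": [], "bottom": [], "side": []}
    for elem in ET.fromstring(xml).iter():  # document order
        place = elem.get("place")
        if elem.tag in ("note", "fw") and place in PLACES:
            band, alignment = PLACES[place]
            bands[band].append((alignment, elem.text))
    return bands


page = ('<div><fw type="header" place="tm">Header</fw>'
        '<p>Some text<note place="margin-left">A gloss</note></p>'
        '<fw type="catch" place="br">Catchword</fw></div>')
print(group_by_band(page))
```

Notes without a 'place' attribute, such as editorial notes, are left out of every band.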

Scribal changes

Where a scribe has changed the text, we want to record:

  • Exactly how the source text appears
  • How the text is to be read before and after the change

We use the TEI <app> element, with <rdg> elements:

    <app>
        <rdg type="lit"><hi rend="ud">original</hi><hi rend="il">\changed/</hi></rdg>
        <rdg type="orig">original</rdg>
        <rdg type="c0">changed</rdg>
    </app>

  • <rdg type="lit">: encode how the text of this sequence appears
  • <rdg type="orig">: record how the text read before the change
  • <rdg type="c0">: record how the text read after the change

Type=c0: a change by the original scribe, made during the original writing.

Type=c1: a change by corrector 1 (and likewise c2, c3, c4, etc. for further correctors).

Note that we can chain a series of revisions together using this mechanism. Suppose the original scribe leaves a space, a corrector supplies a reading, and a second corrector deletes it and provides an alternative between the lines:

    <app>
        <rdg type="lit"><space quantity="8"/><hi rend="strike">supplied</hi><hi rend="il">\changed/</hi></rdg>
        <rdg type="orig"></rdg>
        <rdg type="c1">supplied</rdg>
        <rdg type="c2">changed</rdg>
    </app>

Preview will create an option menu which allows you to see the text with any one of the layers of readings declared in the type attribute. By default, it presents the 'lit' rendition of the text, with the other views (here, 'orig', 'c1' and 'c2') available from the option menu at the top of the Preview screen.
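The option menu amounts to listing the 'type' values of the <rdg> elements and picking one. Here is a minimal Python sketch of that idea with the standard xml.etree.ElementTree module, using the chained-revision example above; it is our own illustration, not the Preview code, and the names 'layers' and 'reading' are hypothetical.

```python
import xml.etree.ElementTree as ET

# The chained-revision example from above (raw string, because of the backslash).
APP = r"""<app>
  <rdg type="lit"><space quantity="8"/><hi rend="strike">supplied</hi><hi rend="il">\changed/</hi></rdg>
  <rdg type="orig"></rdg>
  <rdg type="c1">supplied</rdg>
  <rdg type="c2">changed</rdg>
</app>"""


def layers(app):
    """List the reading layers, in document order, as an option menu would."""
    return [rdg.get("type") for rdg in app.findall("rdg")]


def reading(app, layer="lit"):
    """Return the plain text of one layer; 'lit' is the default view.

    itertext() drops markup, so the 'lit' text loses its strike-through and
    interlinear formatting in this sketch.
    """
    for rdg in app.findall("rdg"):
        if rdg.get("type") == layer:
            return "".join(rdg.itertext())
    raise KeyError(f"no <rdg type={layer!r}> in this <app>")


app = ET.fromstring(APP)
print(layers(app))  # ['lit', 'orig', 'c1', 'c2']
for layer in layers(app):
    print(layer, repr(reading(app, layer)))
```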

Note that we do NOT recommend use of the TEI <add>, <del> and <subst> elements. These confuse appearance with meaning, whereas the suggested encoding (originally developed by Barbara Bordalejo for Shaw's edition of Dante's Commedia) cleanly separates meaning from appearance.

Preview, TEI rend and HTML classes

In the Preview view, Textual Communities automatically maps the universal TEI 'rend' attribute to a 'class' attribute on the element. Thus, TEI <p rend="center"> becomes <p class="center">. The default CSS file used by Preview defines the following classes:

 .italic {   font-style: italic;}
 .ud {    border-bottom: dotted;}
 .ital {   font-style: italic;}
 .ul {    text-decoration: underline;}
 .ol {    text-decoration: overline;}
 .strike {    text-decoration: line-through;}
 .bold {    font-weight:bold;}
 .b {    font-weight:bold;}
 .left {text-align: left;}
 .sup  {    font-size: 50%; vertical-align: top;}
 .superscript  {    font-size: 50%; vertical-align: top;}
 .right {text-align: right;}
 .justify {text-align: justify;}
 .il {position: relative;  display: inline-block; width: 0; font-size: 12px;  white-space: nowrap; margin-top: 15px; top: -14px;} 
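The rend-to-class step can be sketched as a tree copy that renames one attribute, using Python's standard xml.etree.ElementTree module. This is a hypothetical illustration ('rend_to_class' is our own name) that leaves element names unchanged, exactly as in the <p rend="center"> example above; it makes no claim about how Preview converts TEI elements to HTML. It also shows why combined values work: rend="bold italic" becomes class="bold italic", and an HTML class attribute holding two space-separated names matches both the .bold and the .italic rules.

```python
import xml.etree.ElementTree as ET


def rend_to_class(elem):
    """Copy a TEI tree, renaming every rend attribute to class (tag names unchanged)."""
    attrib = {("class" if name == "rend" else name): value for name, value in elem.attrib.items()}
    copy = ET.Element(elem.tag, attrib)
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        copy.append(rend_to_class(child))
    return copy


tei = ET.fromstring('<p rend="center">Some <hi rend="bold italic">bold italic</hi> text</p>')
print(ET.tostring(rend_to_class(tei), encoding="unicode"))
# <p class="center">Some <hi class="bold italic">bold italic</hi> text</p>
```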