First transcription page (2)

Line 33:

We ignore the dot over the y in all cases (purely decorative)

Line 52:

You transcribe: pheb<am>ȝ</am><ex>us</ex>.  I think this is a capital P -- but the treatment of the abbreviation is perfect

Line 53:

You transcribe: plesa<am>n̄</am><ex>un</ex>ce. Not quite right! see the special treatment of u/n abbreviation in the quick start guide.  This should be plesa<am rend="n̄">ıı̄</am><ex>un</ex>ce (the abbreviation is of two minims with a macron, appearing as n̄).

Line 54:

You have: gou<am>̉</am><ex>er</ex>nance.  The first abbreviation is correct, but the macron over the final minims needs the u/n treatment: gou<am>̉</am><ex>er</ex>na<am rend="n̄">ıı̄</am><ex>un</ex>ce

Line 55:

You transcribe: grace.  But there is abbreviation of /ra/ here.  So g<am>ᵃ</am><ex>ra</ex>ce (also enbrace on the next line)

Line 56:

Quite correctly, you ignore the tail on /it/, as it is attached to the t and therefore decorative. However, you transcribe woot/ instead of woot /. Our practice is always to separate the virgule from the preceding or following word when it stands apart in the manuscript and is hence punctuation, not decoration.

Line 65:

As per our guidelines: the first letter of each line is always a capital (except in a very few manuscripts). So this should be Ȝet


First transcription page

line 142

You transcribe as my<am>̄</am><ex>n</ex>.  Well spotted! Yes, this very likely is an abbreviation of n over y.  Perfect

However: note the space between the two words.  Not my<am>̄</am><ex>n</ex>herte but my<am>̄</am><ex>n</ex> herte

line 144

You have: q<am>ᵈ</am><ex>uod</ex>. Perfect!

line 149

You miss this.  Here is a classic case of the treatment of u/n + macron. The final letter+macron could actually be u+macron (with the u written as an n) abbreviating -un-.  Hence, this transcription:

reputacio<am rend="n̄">ıı̄</am><ex>un</ex>.  Looks bizarre! but see the discussion in this blog. The same is true of comparison, in the next line, and Scorpion/confusion, later on.

line 152

You have: wyff/  However, we separate the mark of punctuation from the preceding letter, so wyff / (see on line 176 below)

line 160

You have: þ<am>̉</am><ex>er</ex>(0309)-Inne.  Almost correct! but we don't need the (0309) -- I've removed that from the quick start page now. And we don't need the modern hyphen.  One cannot be sure whether there are two words or one here: either  þ<am>̉</am><ex>er</ex>Inne or  þ<am>̉</am><ex>er</ex> Inne would be correct.  Similarly the next line: y-slayn should be just y slayn

line 176

here, the tail/virgule is definitely attached to the final t of the preceding word.  In that case, we treat as decoration and ignore.


New transcribers!

A key part of the design of this new phase of the project is that we would open it up, so that instead of all our transcribers coming from academic partnerships (such as those we have had in the past at Brigham Young, or New York University, or Münster, etc), they can now come from anywhere.  Especially, we want to let anyone who wants to contribute, from anywhere at all, join the project. That is the whole reason we designed Textual Communities the way we did (which only took five years' work -- lol, as they say).

Now, we have taken a great step toward making this actually happen.  On Saturday 18 April I gave a talk at the one-day annual Chaucer Canada conference at the University of Toronto. Effectively, this was the official launch of this new, Canadian-based and Belgium-partnered phase of the project: the first time I have stood in public and said, this is what we are doing.  Those who know me know that I hate the all-too-common digital humanities "prototype game": where you stand in public and show a mock-up of something you might like to make, in some dream or other, and pretend you might actually do it. So we did not tell people about this until it was really working and ready.

The reaction was all I could have asked for.  People were blown away (as they should be!) by Colin Gibbings' marvellous performance as Geoffrey Chaucer. And lots of people expressed curiosity about the crowd-sourcing approach.  Especially, Daniele Cybulski approached me and said she would like to do an article on us.  And indeed she did, and the result, "Be a part of Chaucer's Tale", is really excellent.

Following publication of this article, several people wrote to me and asked about becoming part of the project.  Welcome, then, Stephen Yeager, Kris Kobold, Ken Fasano! More are coming...


Second USask meeting -- macrons and more

On Friday 21 November we had, as promised, a marathon discussion of transcription principles, followed by a transcription session as we tried to make the system work. Present were: Kyle Dase, Murray Melymick, Peter Robinson (the irrelevant), Brendan Swalm, Aaron Thacker, Adam Vazquez, Megan Wall. Barbara Bordalejo skyped in from Leuven for a time.

From left: Peter Robinson, Murray Melymick, Megan Wall, Adam Vazquez (gesturing), Kyle Dase, Brendan Swalm, Aaron Thacker (photo by Jon Bath)

PR began by outlining the problems we have with abbreviations, and (more generally) the shifts in thinking underlying our movement towards a new transcription policy. He focussed on final u/n+macron, explaining the following three cases:

  1. In the vast majority of instances, the n+macron simply means final n. So (thousands and thousands of times) in words like on, upon, slepen, and spellings with final -oun, like condicioun, gypoun, etc etc
  2. In a few cases, u+macron (which might actually appear as n+macron) is certainly an abbreviation of final n.  Thus in the adverb "doun", spelt dou+macron (which might actually appear as don+macron)
  3. In a large number of cases, final u/n+macron might or might not indicate u+n. Thus "condicion+macron", where the final letter might be n, in which case the macron indicates nothing (case 1) or might be u, in which case the macron indicates final n (case 2)

There appear to be four options for dealing with this range of situations:

  1. Ignore all macrons. But this would not deal with cases in 2, where abbreviation is present.
  2. Ignore all macrons except those in case 2.  But this would give the completely misleading impression that scribes only use u/n macron to indicate abbreviation
  3. Transcribe all macrons the same way.  This would be simple and consistent, and represent what is in the manuscripts, but would tell the reader little except that there are a lot of macrons in the manuscript.
  4. Figure out some way of transcribing the manuscripts which takes account of the different situations.

It appears that 4 is the way to go. That is: we want to distinguish the three cases we outline above: when n+macron means just n; when it definitely indicates abbreviation; when it might indicate abbreviation.  There is a further complication, which Barbara has argued we incorporate in our transcription.  The two letters n/u are commonly written as two minims.  These two minims are written in one of three ways:

  1. They are joined at the top, in which case they look like an n;
  2. They are joined at the bottom, in which case they look like a u;
  3. They are joined at neither top nor bottom, in which case they look simply like two minims.

In a perfect world, we might find case 1 every time the context demands "n", case 2 every time the context demands "u", and we would never see case 3. This is not a perfect world. A glance over any page of any manuscript shows that we find everywhere two minims which look like "n" where the context demands "u", two minims which look like "u" where the context demands "n", and two minims which look like neither. Usually we do what transcribers have always done: we transcribe by context, so if the context demands "n" we transcribe it as "n" even if what is written is clearly a "u". We make a similar decision in cases of e/o and y/thorn, which in some scribes are often completely indistinguishable.

One could argue that we should give this information, about how EVERY n/u is written, in our transcription.  This would take immense resources: it would slow up transcription, and it would likely increase our error rate as transcribers look closely at every u/n and fail to see other things. However, there is a strong case for recording exactly how the minims are written in the instances where there is clear ambiguity: case three above, and (possibly) case two above.

Here is how it could work in case three. Here's what we see in the manuscript (Hg):

This could either be u+macron, which would be an abbreviation of final n, or n+macron, which would indicate nothing. And it appears to be written as two minims, joined neither at top nor bottom. We could encode this something like:


In Bo1, the same word appears as:

This we could encode as:

<am rend="n̄">ıı̄</am><ex>un</ex>

In Fi, we see:

This we could encode as:

<am rend="ū">ıı̄</am><ex>un</ex>

There are alternative ways of achieving the same encoding. One might use, instead of the constructs <am rend="ū">ıı̄</am> and <am rend="n̄">ıı̄</am>, simply <am>ū</am> and <am>n̄</am>.  While more compact than the use of the "rend" attribute, this has the problem of asserting that what is written is "u" or "n", where our point is that this is two minims joined together at the top or bottom: a subtle but critical distinction. Alternatively, Barbara suggests use of the TEI <glyph> and <g> mechanism. This has two parts:

  • <glyph>: which defines a character, or combination of characters, not available on the standard unicode/etc character set.  Hence: <glyph xml:id="umac"><glyphName>Two minims joined at the base with a macron above</glyphName></glyph>
  • <g>: which inserts the character, thus: <am><g ref="#umac">ū</g></am><ex>un</ex>

Note that all three methods (<am rend="ū">ıı̄</am>; <am>ū</am>; <am><g ref="#umac">ū</g></am>) are completely interchangeable, with no loss of information in conversion from one to the other.
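To illustrate that interchangeability, here is a minimal sketch in Python of a loss-free round trip between the "rend" form and the compact form. The function names are illustrative only, not part of any project tool:

```python
import re

# Two dotless i's with a combining macron: the "two minims" glyph
MINIMS = "ıı̄"

def to_compact(xml: str) -> str:
    """Rewrite <am rend="X">ıı̄</am> as <am>X</am>."""
    return re.sub(r'<am rend="([^"]+)">' + MINIMS + r'</am>',
                  r'<am>\1</am>', xml)

def to_rend(xml: str, glyphs=("n̄", "ū")) -> str:
    """Rewrite <am>X</am> back to <am rend="X">ıı̄</am>, for the minim glyphs."""
    for g in glyphs:
        xml = xml.replace(f"<am>{g}</am>", f'<am rend="{g}">{MINIMS}</am>')
    return xml

line = 'reputacio<am rend="n̄">ıı̄</am><ex>un</ex>'
assert to_rend(to_compact(line)) == line  # round trip loses nothing
```

Since each form can be generated mechanically from the others, the choice between them is a matter of transcriber convenience, not of information recorded.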

The arguments for using this mechanism with case 3 above seem clear: one might expect analysis of the distribution of the graphetes to cast light on the ambiguity. It could be argued that we should use this mechanism too in case 2, for purposes of comparing the distribution of these graphetes with case 3. On the other hand, we may argue that the advantages of using this mechanism for case 1, where the macron is simply redundant, do not warrant the effort of encoding these many thousands of cases.
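Once the graphete is recorded in a "rend" attribute, that distributional analysis becomes mechanical. A sketch (the function name is illustrative, and the sample is an invented fragment, not a real transcript):

```python
import re
from collections import Counter

def tally_minim_glyphs(transcript: str) -> Counter:
    """Count the rend values recorded on <am> elements: e.g. how often
    the minims are joined at the top (n̄) versus the bottom (ū)."""
    return Counter(re.findall(r'<am rend="([^"]+)">', transcript))

# Invented fragment, just to show the shape of the output
sample = ('...<am rend="n̄">ıı̄</am><ex>un</ex>... '
          '...<am rend="ū">ıı̄</am><ex>un</ex>... '
          '...<am rend="n̄">ıı̄</am><ex>un</ex>...')
print(tally_minim_glyphs(sample))  # e.g. Counter({'n̄': 2, 'ū': 1})
```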

Exactly what are we transcribing?

In the course of the marathon transcription meeting, several participants asked searching questions about exactly what it is we are doing. Kyle, Aaron and Adam, particularly, focussed on the authority of our transcripts, and their relationship to the manuscripts we are transcribing. So here are some thoughts. The fundamental document here is the description of what we are trying to do written by myself and Elizabeth Solopova, first published in the first Project Occasional Papers in 1993 and available at 

In the meeting, I drew everyone's attention to the key account of the encoding of the Commedia manuscripts, written by Barbara Bordalejo and available at (user name and password DRCUSASK). Here is the crucial sentence, where Barbara describes "the text of the document":

In this article, I use the phrase the “text of the document” to refer to the sequence of marks present in the document, independently of whether these represent a complete, meaningful text. That is: the reader sees a sequence of letters, occurring in various places in relation to each other (perhaps between the lines or within the margins) and carrying various markings (perhaps underdottings or strikethroughs). These make up what I here refer to as the text of the document.

That is: our first task as transcribers is to record the potentially meaningful marks in the document: together these constitute what she calls "the text of the document". Note that this requires an initial filtering.  We will ignore those marks we do not regard as "potentially meaningful". Accordingly we do not transcribe tails on letters, bars through h and l, or a multitude of decorative marks and other scribal writings. We considered regarding macrons as not potentially meaningful too: but decided in the event that these are potentially meaningful and so must be transcribed. 

However, our transcription does more than simply record potentially meaningful marks.  What we are presenting is not just marks but an act of communication: a "text". Accordingly (adapting the Robinson/Solopova phrasing), we "decode" as well as "encode". There are three fundamental elements in any act of communication: the message the original author (or scribe) intended; the medium used for the communication; the reader/listener/audience receiving the communication. In our case, we do not have any access to Chaucer or his scribes: one of the three elements is missing.  However, we do have the medium, in the actual manuscript pages and their digital images.  And we have ourselves. Accordingly, we may never claim that we are presenting "what Chaucer intended" or "what the scribe meant". We may claim that we are presenting our interpretation of those potentially meaningful marks. 

Accordingly, our transcriptions are not statements about what Chaucer or his scribes intended.  They are our attempts to create acts of communication which might be useful to ourselves and others. This is how Solopova and I put it:

Transcription for the computer is a fundamentally interpretative activity, composed of a series of acts of translation from one system of signs (that of the manuscript) to another (that of the computer). Accordingly, our transcripts are best judged on how useful they will be for others, rather than as an attempt to achieve a definitive transcription of these manuscripts. Will the distinctions we make in these transcripts and the information we record provide a base for work by other scholars? How might our transcripts be improved, to meet the needs of scholars now and to come? 

And later:

Transcription is both decoding and encoding; the text in the computer system will not be the same as the text of the primary source. Accordingly, transcription of a primary textual source cannot be regarded as an act of substitution, but as a series of acts of translation from one semiotic system (that of the primary source) to another semiotic system (that of the computer). Like all acts of translation, it must be seen as fundamentally incomplete and fundamentally interpretative.

After all these years, these formulations appear to put it well. In essence, our aims are heuristic and pragmatic.  Like a translation, we do not aim to be complete, or to act as a surrogate for the original. We aim to be useful. We mean several things by "useful". First, we want to be useful for ourselves:

  •  For the collation of all the manuscript and incunable versions. To be efficient, we need to encode each line, using the TEI/XML <l n="x"> mechanism. This will allow us (or anyone else) to locate all versions of (say) line 1 of the General Prologue of the Tales in all the witnesses. 
  • It is helpful, as we transcribe, to be able to see the transcription and original page image line by line. Hence, we include page, column and line information, using the TEI/XML <pb/> <cb/> and <lb/> elements.  Later, Textual Communities will align text and images much more closely, so we can see individual lines in transcript and image.
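As a sketch of how the <l n="x"> numbering supports collation across witnesses: here is a minimal Python example using the standard library's ElementTree. The witness texts and sigla are invented purely for illustration, and the fragments are plain XML without the TEI namespace:

```python
import xml.etree.ElementTree as ET

# Invented miniature witnesses: real transcripts are full TEI documents
witnesses = {
    "A": '<body><pb n="1r"/><lb/><l n="1">first line, witness A</l></body>',
    "B": '<body><pb n="1r"/><lb/><l n="1">first line, witness B</l></body>',
}

def line_in_all(n, docs):
    """Return {witness siglum: text of line n}, None where the line is absent."""
    out = {}
    for siglum, xml in docs.items():
        el = ET.fromstring(xml).find(f'.//l[@n="{n}"]')
        out[siglum] = None if el is None else el.text
    return out

print(line_in_all("1", witnesses))
```

The same lookup, run over all the witnesses, is what delivers "all versions of line 1 of the General Prologue" to the collation.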

However: we want other people to make the best use they can of our transcripts. For example: for our purposes, most (if not all) of the information about exactly how manuscripts are written is superfluous to the central goal of our research, determining the stemmatic relations of the manuscripts. So, all the precise information about how this word or that is spelt, or exactly what is abbreviated and how, is removed in the collation process. But representation of the exact spelling of the manuscripts is crucial for many other kinds of analysis. Accordingly, we do include far more information about the spellings than we need for our own immediate purposes. Thus:

  • We now identify all cases where we think there is abbreviation, record the abbreviation, and provide the expansion: thus "<am>ꝑ</am><ex>per</ex>fect".  This is a significant shift in this phase of the project: we used to show only the mark of abbreviation, without saying whether there was an expansion or not (thus, "ꝑfect"). We made this change because we believe transcriptions which show both original spelling (as we have always done) and also expand abbreviations so they are easier to read will be more useful.
  • We have always included, and will continue to include, information about manuscript layout: ornamental capitals, running heads, scribal notes, catchwords, etc. Again, we have little use for this information ourselves.  But other scholars are interested in this.
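Because each abbreviation now carries both the mark (<am>) and its expansion (<ex>), a diplomatic view and an expanded reading view can both be generated mechanically from one transcript. A minimal sketch (function names are illustrative, not project tooling):

```python
import re

def diplomatic(s):
    """Original spelling: keep the abbreviation marks, drop the expansions."""
    s = re.sub(r'<ex>[^<]*</ex>', '', s)
    return re.sub(r'<am(?:\s[^>]*)?>([^<]*)</am>', r'\1', s)

def reading(s):
    """Expanded text: drop the marks, keep the expansions."""
    s = re.sub(r'<am(?:\s[^>]*)?>[^<]*</am>', '', s)
    return re.sub(r'<ex>([^<]*)</ex>', r'\1', s)

word = '<am>ꝑ</am><ex>per</ex>fect'
print(diplomatic(word))  # ꝑfect
print(reading(word))     # perfect
```

This is the sense in which the double encoding is "more useful": one transcription, two readable presentations.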

At one point in the meeting on Friday 21 November, Kyle Dase produced a very sharp one sentence summary of what we are trying to do.  Here is my guess as to what he said: that our transcripts give our sense of how these manuscripts might be usefully read.  (Please correct my memory, Kyle).

First USask meeting

The first USask team meeting was held on Friday 10 October, in the DRC in USask/Volmolenlaan 9, Leuven. Present were: Kailey Christianson, Kyle Dase, Murray Melymick, Megan Wall, Adam Vazquez (in the DRC), Barbara Bordalejo and Peter Robinson (Leuven).
Brief minutes...
PR warned of the dangers to relationships of obsessive discussion of manuscript spacing and abbreviation marks. BB and PR then proceeded to demonstrate these dangers with an animated discussion of the problems of transcribing final u/n with a macron. (is this an abbreviation of final n, so that the minims are really a u, not an n? or just a decorative mark, so the n is really an n? or a bird? or a plane?). To be continued.
BB will assign additional pages for transcription to all present. All will register problems, questions, etc, using this bulletin board.
Next meeting -- DRC Friday 17 October.
