Managing Whitespace in Transcriptions

2024-03-20

In diplomatic and edited transcriptions, the default EpiDoc whitespace handling may not always produce the expected results. Problems arise in several situations, the most common of which is a space that appears before or after a phrase-level element. XML treats whitespace differently depending on whether it occurs inside an element that is defined to contain only subelements and no free text, or inside an element whose definition allows a mixture of text and other elements. In the first case, whitespace around the contained elements is ignored. In the second case, spaces and line breaks are treated as actual text, and a run of them will appear in formatted output as a single space. This is complicated by the way that editors such as oXygen XML Editor wrap lines: lines that are wrapped on screen for readability only appear to be separated, and no space or linefeed character is actually inserted between them.

In the examples below, expan takes the subelements abbr and ex as well as free text. In the second example, the whitespace before and after the subelements results in a space appearing in the output.

        <expan><abbr>Aug</abbr><ex>ustus</ex></expan>     

Transformation using the EpiDoc Reference stylesheets:

  • Default (Panciera) style: Aug(ustus)

        <expan>
            <abbr>Aug</abbr>
            <ex>ustus</ex>
        </expan>

Transformation using the EpiDoc Reference stylesheets:

  • Default (Panciera) style: Aug (ustus)

The following example has no spaces around any of the parts of the abbreviation. However, if there were any linefeeds inside expan, they would be preserved and would appear as spaces in the output.

        <expan>Καρ<ex>ανίδι</ex></expan>     

Transformation using the EpiDoc Reference stylesheets:

  • Duke Databank style: Καρ(ανίδι)
(DDbDP: bgu.1.154)

If spaces appear in formatted EpiDoc output where they don't belong, the first thing to do is to check the XML source files for spaces or linefeeds that have been inserted during the encoding process, often for readability.
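One way to run such a check is with a small stylesheet of your own. The following is a minimal sketch and is not part of the EpiDoc Reference stylesheets. It assumes the source file is in the TEI namespace, bound here to the prefix tei, and the wording of the report line is purely illustrative. It lists every expan that has a whitespace-only text node among its children.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

    <!-- Produce a plain-text report rather than XML -->
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/">
        <!-- Select each expan with a direct child text node made up only of whitespace -->
        <xsl:for-each select="//tei:expan[text()[normalize-space(.) = '']]">
            <xsl:text>Whitespace inside expan: </xsl:text>
            <!-- Show the content of the element so that it can be located in the source -->
            <xsl:value-of select="normalize-space(.)"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

This sketch only catches whitespace that stands alone between tags. A space at the start or end of a stretch of real text, such as a space typed directly after Καρ and before the ex tag, is part of a longer text node and would need a separate test.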

In some cases, for example when encoding inscriptions in Hebrew or Arabic, which use a right-to-left writing system, it is necessary to use linefeeds before tags in order to preserve directionality. When it is important to preserve whitespace exactly as it is entered, you must declare the <xsl:preserve-space> element at the top level of your XSLT stylesheet and list in its elements attribute the elements to which it applies. You can then manage the whitespace explicitly during the formatting process.
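A declaration of this kind might look like the following. This is a minimal sketch: the choice of ab and expan as the elements to protect is hypothetical and is not taken from the EpiDoc Reference stylesheets, and the TEI namespace is assumed to be bound to the prefix tei.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

    <!-- Illustrative only: remove whitespace-only text nodes from every element... -->
    <xsl:strip-space elements="*"/>

    <!-- ...except inside the elements named here, where they are kept as entered.
         The element names are hypothetical; list the ones your project needs. -->
    <xsl:preserve-space elements="tei:ab tei:expan"/>

    <!-- The templates of the stylesheet would follow here -->
</xsl:stylesheet>
```

Both declarations affect only text nodes that consist entirely of whitespace. Because an XSLT processor keeps such nodes unless told otherwise, xsl:preserve-space matters mainly where an xsl:strip-space declaration would otherwise remove them, as in this sketch.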

Responsibility for this section

  1. Elli Mylonas, author
  2. Gabriel Bodard, author

EpiDoc version: 9.6

Date: 2024-03-20