TEI/EpiDoc provides mechanisms for encoding languages and scripts (writing systems) as they relate to the contents of an EpiDoc file and to the text(s) described and transcribed therein. In doing so, we make use of Internet standards for the identification of these languages and scripts. This portion of the Guidelines addresses all relevant aspects.
TEI and EpiDoc follow the best current practice outlined in the Network Working Group's RFC 5646: Tags for Identifying Languages, which establishes Internet-wide norms for identifying languages. The RFC and its supporting documents define a syntax for short strings of characters (‘language tags’) that serve as unique identifiers for any desired combination of language and script. These tags are composed of ‘subtags’ for the language itself, the writing system (script), and regional and dialectal variation. The RFC also establishes a process for the registration and maintenance of these subtags by the Internet Assigned Numbers Authority (IANA).
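To make the subtag structure concrete, here is a minimal Python sketch that splits a tag such as "grc-Latn" into its language, script, region, variant and private-use parts. It is a simplification of the RFC 5646 section 2.1 grammar and assumes only the common shape language[-script][-region][-variant...][-x-...]; it does not handle extlang or extension subtags or grandfathered tags, and it checks syntax only, not whether a subtag actually appears in the IANA registry. The names `split_language_tag` and `LanguageTag` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Simplified subtag patterns after RFC 5646, section 2.1. Not covered:
# extlang subtags, extensions (e.g. "-u-..."), and grandfathered tags.
LANGUAGE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")
PRIVATE = re.compile(r"[A-Za-z0-9]{1,8}")


class LanguageTag(NamedTuple):
    language: Optional[str]  # None when the whole tag is private use ("x-...")
    script: Optional[str]
    region: Optional[str]
    variants: List[str]
    private_use: List[str]


def split_language_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'grc-Latn' into subtags (syntax check only)."""
    parts = tag.split("-")
    i = 0
    language = script = region = None
    variants: List[str] = []
    if parts[0].lower() != "x":
        if not LANGUAGE.fullmatch(parts[0]):
            raise ValueError(f"bad language subtag in {tag!r}")
        language, i = parts[0], 1
        # The optional subtags can only appear in this fixed order.
        if i < len(parts) and SCRIPT.fullmatch(parts[i]):
            script, i = parts[i], i + 1
        if i < len(parts) and REGION.fullmatch(parts[i]):
            region, i = parts[i], i + 1
        while i < len(parts) and VARIANT.fullmatch(parts[i]):
            variants.append(parts[i])
            i += 1
    private_use: List[str] = []
    if i < len(parts):
        # Anything left over must be a private-use sequence "x-...".
        if parts[i].lower() != "x" or i + 1 == len(parts):
            raise ValueError(f"unsupported or malformed subtag in {tag!r}")
        private_use = parts[i + 1:]
        if not all(PRIVATE.fullmatch(p) for p in private_use):
            raise ValueError(f"bad private-use subtag in {tag!r}")
    return LanguageTag(language, script, region, variants, private_use)
```

For example, `split_language_tag("grc-Latn")` yields language "grc" and script "Latn", while a tag with two script subtags, such as "grc-Latn-Grek", is rejected.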
A valid EpiDoc file must make use of subtags recorded in the IANA Language Subtag Registry. Many EpiDoc creators will already be familiar with some of these codes from other digital projects, for example:
When the IANA registry does not provide appropriate codes, an EpiDoc project may devise "private use subtags", provided that they are defined within the EpiDoc file itself as outlined in the following paragraph and that they conform syntactically to the specifications in RFC 5646, sections 2.1: Syntax and 4.6: Considerations for Private Use Subtags. For example, the Campā Inscriptions team determined that the two Cham language subtags (cja = Western Cham and cjm = Eastern Cham) and the associated script subtag (Cham) were substantively different from the ancient Cham language and script represented in the inscriptions. The team therefore created the private use subtag "x-oldcam-latn-ci" and gave it the project-specific meaning "Old Cam language in Old Cam script transliterated in Latin characters." Whenever possible, EpiDoc projects and practitioners should seek to register new subtags with the IANA for the benefit of others; the registration procedure is set out in RFC 5646, Section 3.5.
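Because a private use subtag only has meaning inside the file that defines it, a project may want to check that every private-use tag it uses is actually declared. The Python sketch below assumes that each such tag is declared by a TEI `<language ident="...">` element inside `<langUsage>` in the header; it gathers the tags used in `xml:lang`, `mainLang` and `otherLangs` and reports any private-use tag without a matching declaration. The function names are hypothetical, and the test for "private use" is a simple string check, not a full RFC 5646 parser.

```python
import xml.etree.ElementTree as ET
from typing import Set

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang


def is_private_use(tag: str) -> bool:
    """True if the tag is, or ends in, an RFC 5646 private-use sequence."""
    t = tag.lower()
    return t.startswith("x-") or "-x-" in t


def undeclared_private_tags(tei_xml: str) -> Set[str]:
    """Private-use tags used in the file but not declared in <langUsage>."""
    root = ET.fromstring(tei_xml)
    declared = {
        lang.get("ident", "").lower()
        for usage in root.iter(f"{TEI}langUsage")
        for lang in usage.iter(f"{TEI}language")
    }
    used: Set[str] = set()
    for el in root.iter():
        if el.get(XML_LANG):
            used.add(el.get(XML_LANG))
        if el.tag == f"{TEI}textLang":
            # mainLang holds one tag; otherLangs holds a whitespace-separated list
            used.update(filter(None, [el.get("mainLang")]))
            used.update((el.get("otherLangs") or "").split())
    # RFC 5646 treats language tags as case-insensitive, so compare in lower case.
    return {t for t in used if is_private_use(t) and t.lower() not in declared}
```

An empty result means every private-use tag in the file has a declaration; registered tags such as "grc" or "en" are never reported.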
tba
tba
The primary and secondary language(s) of an inscription should be declared in the textLang element in the TEI header: the primary language is given in the mainLang attribute, and any other languages are listed in the otherLangs attribute. These values should refer only to the languages and writing systems found in the inscribed text itself, not to those used elsewhere in the edition (for example, in translations or commentary). If the script differs from the default script of the language (e.g., Greek transliterated into Latin characters), this should also be indicated with the appropriate language tag where one is available (e.g., "grc-Latn").
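A minimal Python sketch of the textLang declaration, built with the standard `xml.etree.ElementTree` module: one tag goes in mainLang and any further tags are joined with spaces in otherLangs. The function `make_text_lang` and its optional prose `note` are hypothetical conveniences, and the example inscription is invented; where the element sits in the header is left to the surrounding EpiDoc template.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def make_text_lang(main_lang: str, other_langs: Iterable[str] = (), note: str = "") -> ET.Element:
    """Build <textLang> for the languages of the inscribed text only."""
    el = ET.Element(f"{{{TEI_NS}}}textLang", {"mainLang": main_lang})
    others = list(other_langs)
    if others:
        # otherLangs takes one or more tags separated by whitespace
        el.set("otherLangs", " ".join(others))
    if note:
        el.text = note  # optional human-readable description
    return el


# An invented Latin inscription containing Greek written in Latin characters:
print(ET.tostring(make_text_lang("la", ["grc-Latn"], "Latin, with some Greek in Latin script"),
                  encoding="unicode"))
```

Translation languages are deliberately not passed to this function, in keeping with the rule that textLang describes only the text itself.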
The language(s) of a translation should not be included in the textLang element; instead, they should be declared with an xml:lang attribute on the <div type="translation"> or on the <div type="textpart"> elements nested within it.
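As an illustration of where translation languages live instead, this Python sketch collects the xml:lang values from every `<div type="translation">` and the divs nested inside it, in document order and without duplicates. The function name `translation_languages` is hypothetical, and the sketch assumes TEI-namespaced input.

```python
import xml.etree.ElementTree as ET
from typing import List

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def translation_languages(tei_xml: str) -> List[str]:
    """xml:lang values on translation divs and the textpart divs inside them."""
    root = ET.fromstring(tei_xml)
    langs: List[str] = []
    for div in root.iter(f"{TEI}div"):
        if div.get("type") != "translation":
            continue
        # Element.iter() yields the element itself first, then its descendants.
        for d in div.iter(f"{TEI}div"):
            lang = d.get(XML_LANG)
            if lang and lang not in langs:
                langs.append(lang)
    return langs
```

Languages found this way belong on the translation divs and should not be copied into textLang.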
A single language should be indicated as the default for a text by giving it in the mainLang attribute of the textLang element. The default script for a given language is assumed unless otherwise indicated (e.g., Greek is assumed to be written in the Greek alphabet). When transitions between languages and/or scripts are limited in scope (e.g., a single word or short phrase in a language other than the one declared in the mainLang attribute in the header), the word or words should be enclosed in a foreign element (see Multi-Language Texts), and the language and/or script may be identified with an xml:lang attribute on that element. In the following brief example from the US Epigraphy Project, the inscription begins in Latin but transitions to Greek written in the Latin script:
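As a supplementary illustration (with invented markup, not the US Epigraphy Project text), this Python sketch lists every foreign element inside the edition together with its xml:lang value, returning None where no language has been declared. That can help spot foreign passages still missing a language tag. The function name `foreign_passages` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def foreign_passages(tei_xml: str) -> List[Tuple[str, Optional[str]]]:
    """(text, xml:lang) for each <foreign> inside <div type="edition">."""
    root = ET.fromstring(tei_xml)
    found: List[Tuple[str, Optional[str]]] = []
    for div in root.iter(f"{TEI}div"):
        if div.get("type") != "edition":
            continue
        for foreign in div.iter(f"{TEI}foreign"):
            # itertext() also collects text inside nested markup; collapse whitespace
            text = " ".join("".join(foreign.itertext()).split())
            found.append((text, foreign.get(XML_LANG)))  # None if undeclared
    return found
```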
If longer passages of the text are in different languages and/or scripts, the default language may be further specified with an xml:lang attribute on the <div type="edition">, and sections in other languages or scripts should be marked by adding an xml:lang attribute to the elements that contain them (e.g. ab, lg, seg, or <div type="textpart">). (Compare the examples given at Multi-Language Texts under point 2.) Shifts in script can be indicated in the same way: a hypothetical inscription written entirely in Greek but including passages of Greek transliterated into the Roman alphabet would declare its default language ("grc") in the xml:lang attribute of the edition, and the transliterated passages would carry an xml:lang attribute with the value "grc-Latn". Where a change of hand accompanies the change of script and/or language and is discernible, it may also be desirable to mark it with the handShift element; further details about the script can be recorded in the handNote elements of the manuscript description. Compare the following example from the US Epigraphy Project, which begins in Latin and then gives a translation of the text in Greek:
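As a supplementary illustration (again with invented markup, not the US Epigraphy Project text), this Python sketch shows how software can work out which language applies at any point. Under XML 1.0, xml:lang applies to the element that carries it and to all its descendants unless a descendant sets its own value. So a single "grc" on the edition div covers every textpart except one that declares "grc-Latn". The function name `effective_languages` and its `default` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root: ET.Element, default: Optional[str] = None) -> Dict[ET.Element, Optional[str]]:
    """Map every element to the xml:lang value in force for it.

    xml:lang is inherited by descendants until one of them overrides it.
    """
    result: Dict[ET.Element, Optional[str]] = {}

    def walk(el: ET.Element, inherited: Optional[str]) -> None:
        lang = el.get(XML_LANG, inherited)  # own value wins over the inherited one
        result[el] = lang
        for child in el:
            walk(child, lang)

    walk(root, default)
    return result
```

Applied to an edition div marked "grc" whose second textpart carries xml:lang="grc-Latn", the ab in the first textpart resolves to "grc" and the ab in the second to "grc-Latn".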