The concepts behind EpiDoc bring together, for epigraphers, traditional and entirely modern editorial methods and conventions.
Over the last century, epigraphers have wrestled with the issues involved in representing non-verbal information within their written texts. Until the end of the 19th century publishers could be expected to produce a facsimile of the text, but this became increasingly rare, and publishers showed no parallel willingness to provide a full photographic record of every text. The conventions painstakingly developed to indicate missing text, abbreviations and the like have been more or less generally agreed since the 1930s, and overlap to some extent with those used in papyrology and palaeography. All epigraphers have had to deal with the issues involved in moving these conventions to an electronic environment (finding a font which permits underdotting, for example), but most of us have now adjusted to these new constraints.
The difficulty of rendering such conventions and, in particular, Greek characters in consistent fonts and on the Web has tended to delay the publication of full epigraphic texts online; instead, enormously rich search collections have been created, most notably:
See also: Conformance (EpiDoc Compatibility).
All these developments have therefore been determined by the state of the existing technologies as they have evolved over the 20th century. The object of EpiDoc is to exploit new and rich technologies for the traditional purposes of epigraphy. Many of the processes described above have involved struggling against the technological standards of the day (in print publication, for example) in order to accommodate as many of our requirements as possible. Over the period it has become steadily more difficult to persuade conventional publishers to meet our requirements for inserting meta-textual information, except at prohibitive expense. At the same time, expectations as to the volume of information which should accompany a text have risen greatly; in addition to information about the physical circumstances of an inscription, photographic illustration has become standard.
In the last 15 years scholars generally have been dealing with similar requirements to incorporate meta-data within texts in their electronic form, and tools have emerged which make this increasingly easy and make the results increasingly valuable. The word-processing software that has been familiar since the 1980s allows us to control the formatting of our texts, using markup which by now is invisibly embedded by the software. The more demanding requirements of large-scale document collections—legal papers, industrial documentation, commercial publishing—led in the 1980s to the investigation of ways to insert a wider range of information and instructions within electronic texts. At first the emphasis was on inserting formatting instructions, but there soon emerged methods of including more complex semantic information concerning document structure and even content. A simple example is marking up a book title as a title, rather than simply marking it as being italicized. The use of this more abstract markup permits a separation of structure and presentation, where structure is comparatively fundamental to the document's genre, while presentation may be varied depending on the form of publication. In a way, this shift represents a return to an earlier mode in which authors dealt with the substance of the text and all details of presentation were handled in the publishing process—a distinction which has been lost in the days of camera-ready copy.
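To make the contrast concrete, the same phrase might be encoded presentationally or structurally along these lines (an illustrative sketch using TEI element names; any comparable vocabulary would serve):

    <!-- presentational markup: records only that the words are italicized -->
    <hi rend="italic">Res Gestae Divi Augusti</hi>

    <!-- structural markup: records that the words are a title; italics
         (or any other rendering) can be applied at publication time -->
    <title>Res Gestae Divi Augusti</title>

Only the second encoding allows a search for titles, since italics are also used for emphasis, foreign words and much else.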
The protocols which emerged from this latter effort were standardized in 1986 as the Standard Generalized Markup Language (SGML), and were more recently given a simpler and more flexible form for use on the World Wide Web as XML, the Extensible Markup Language. XML is now widely used by scholars in a broad range of humanistic disciplines to capture, represent and preserve research materials for many different purposes.
The attractions of XML for epigraphers are therefore considerable. For example: missing material can be marked up as such, and then presented within square brackets, while a search can be instructed to interrogate only text which has not been marked as missing (that is, only definitely attested terms). Uncertain letters can be marked up as such, and a decision made later as to whether to render them with an underdot or in some other way. Words can be lemmatised during editing, to create indices which grow as the collection grows. What is important at this juncture, however, is to repeat the 'Leyden' exercise; that is, to agree electronic equivalents of the various sigla that we use. This is valuable, firstly, simply to save time and trouble; but consistency, without imposing uniformity, continues to be valuable in itself. Not only does it support the user, as on the printed page; documents edited in this way and published electronically will also be exploitable together, even if they have been prepared separately.
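By way of illustration, a single fragmentary word might be encoded along the following lines (a sketch using standard TEI/EpiDoc elements; the word and its lemma are chosen purely for the example):

    <!-- lemma recorded for indexing; one letter read with uncertainty;
         the ending restored by the editor -->
    <w lemma="δῆμος">δ<unclear>ή</unclear>μ<supplied reason="lost">ου</supplied></w>

From such a file one process can print δή̣μ[ου] with the conventional sigla, another can build an index under the lemma δῆμος, and a search of securely attested text can simply ignore the contents of the supplied element.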
The need for agreed standards is not limited to epigraphy. Since 1987 an international consortium of scholars, principally in the humanities, has been working to develop and refine a set of guidelines for describing the structure and content of documents. This endeavour has produced an encoding language, realized in XML and known by the name of the group: TEI, the Text Encoding Initiative.
The Text Encoding Initiative is a research effort aimed at defining an encoding language that encompasses the needs of humanities scholars very generally. There are two essential goals motivating the development of the TEI. The first is to enable scholars to represent their research materials in digital form using a descriptive language that mirrors the kinds of analytical terms and concepts that are familiar and essential to humanistic scholarship. The second goal is to enable scholars to share the resulting materials intelligibly, by using a common descriptive language.
We can think of the TEI encoding language as resembling a human language: a core of shared terms at the center, surrounded by less widely shared vocabularies including local usage, specialized terminology, and other variations. At the core of the TEI are the common terms and concepts that are broadly shared by scholars in most disciplines: features like paragraphs, generic textual divisions, headings, lists, and so forth. More specialized elements are grouped together according to their applications: for instance, elements for detailed encoding of names, elements for representing features of manuscripts, elements for capturing the structure of dictionaries, and so forth. The TEI is intentionally organized into modules in this way, so that scholars working in specific disciplinary areas may use only the modules relevant to their work, and omit the others. The TEI can thus achieve a great deal of breadth without burdening individual scholars and projects with the necessity of mastering a very large domain, much of which is only relevant to other disciplines. On the contrary, the TEI encoding language can be very directly targeted at a specific domain or task, and can be limited to what is essential to the individual project's work.
Like a human language, TEI can be used in a way that draws on a rich and nuanced vocabulary, with detailed encoding that describes a great many textual phenomena, but it can also be used very simply, using only a few essential concepts that describe only the most basic textual facts: sections, headings, paragraphs. The more detailed the encoding, the more one can do with it, but factors such as time, cost, available staff, and local expertise may place constraints on the level of detail that is feasible.
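A minimal TEI document using only these core concepts might look like this (a bare sketch; any real project would record much fuller metadata in the header):

    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <titleStmt><title>A sample document</title></titleStmt>
          <publicationStmt><p>Unpublished draft.</p></publicationStmt>
          <sourceDesc><p>Born digital.</p></sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <div>
            <head>A heading</head>
            <p>A paragraph of text.</p>
            <list>
              <item>The first item of a list.</item>
              <item>The second item.</item>
            </list>
          </div>
        </body>
      </text>
    </TEI>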
In addition to providing an encoding system that scholars can use in its original state, the TEI also provides a way for scholarly projects to define custom versions of the TEI language which include modifications that are necessary to support specific local needs. Because these custom versions operate within the overall TEI framework, they can use its broadly shared core of terms and concepts, thereby avoiding unnecessary work in reinventing these. And because the TEI provides a common framework for creating and describing customizations, these can be shared easily and meaningfully. As a result, groups of scholars in particular disciplines can articulate the specific goals and methods which characterize their work, and the differences which distinguish them from others working in related fields. Instead of mutually unintelligible approaches, different projects can produce results whose differences reflect real disagreements rather than mere random divergence.
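Such customizations are themselves written in TEI, in a form generally known as an ODD ('One Document Does it all'): a schema specification selects the modules a project needs and may remove or constrain individual elements. Very roughly, and purely as an illustration (this is not the actual EpiDoc customization):

    <schemaSpec ident="myProject" start="TEI">
      <!-- include the infrastructure and the modules the project actually uses -->
      <moduleRef key="tei"/>
      <moduleRef key="header"/>
      <moduleRef key="core"/>
      <moduleRef key="textstructure"/>
      <moduleRef key="transcr"/>
      <!-- remove elements the project will never need, e.g. those for encoding drama -->
      <elementSpec ident="sp" mode="delete"/>
      <elementSpec ident="stage" mode="delete"/>
    </schemaSpec>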
Within this framework, the EpiDoc community has been working, since 2000, to develop a custom version of the TEI Guidelines to support the particular needs of epigraphers. The idea was launched by Tom Elliott, an ancient historian at the University of North Carolina at Chapel Hill; the aim is both to make the fullest possible use of the work that has already been done, and also to ensure that texts which happen to be inscribed are handled in a manner consistent with that used for other texts, and not distanced from them. The EpiDoc customization removes irrelevant elements from the main body of the TEI, and it adds provisions for the specific kinds of transcription, analysis, description, and classification that are essential for epigraphic work. The result is a simple yet powerful language which can be used to mark all of the significant features of inscriptions and also represent the accompanying information about the epigraphic object itself.
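In outline, an EpiDoc file therefore pairs a TEI header describing the object, its findspot and its history with a series of typed divisions for the text and the surrounding discussion (a simplified sketch of the conventional arrangement):

    <text>
      <body>
        <div type="edition" xml:lang="grc">
          <!-- the transcription, marked up for lost, unclear and abbreviated text -->
        </div>
        <div type="apparatus"><!-- critical apparatus --></div>
        <div type="translation"><!-- translation --></div>
        <div type="commentary"><!-- discussion --></div>
        <div type="bibliography"><!-- previous publications --></div>
      </body>
    </text>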
To accompany the EpiDoc encoding language, the EpiDoc community has also produced a set of encoding guidelines and software tools, as well as documentation which describes how to use the encoding language, the tools, and the other elements of the EpiDoc method. The goal is to establish a framework that is easy to learn and use, even for scholars with no technical background or support. This may sound improbable, but the enterprise is of the same order as learning to mark up a standard epigraphic text with the existing series of sigla.
The group has worked to develop expressions for all the agreed epigraphic conventions. They have expanded this guidance to address the various fields which may be presented in an epigraphic publication, including:
See further: Document Structure.
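To give a flavour of the correspondence between the familiar sigla and their electronic equivalents, a few typical mappings are shown below (illustrative examples only; the Guidelines themselves give the authoritative treatment):

    [αβγ]   text lost and restored by the editor   <supplied reason="lost">αβγ</supplied>
    α(βγ)   abbreviation expanded                   <expan>α<ex>βγ</ex></expan>
    α̣       letter read but uncertain               <unclear>α</unclear>
    [...]   three lost letters                      <gap reason="lost" quantity="3" unit="character"/>
    {αβγ}   superfluous text                        <surplus>αβγ</surplus>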
Further areas of active exploration include interoperability: a software tool, the Chapel Hill Electronic Text Converter (CHETC), has already been developed to convert texts in conventional epigraphic markup into EpiDoc XML. Other areas involve the use of authoritative lexica; for example, the Inscriptions of Aphrodisias project is working closely with the Lexicon of Greek Personal Names to ensure full coverage and consistent usage.
The work, led by Dr. Elliott, has been undertaken by various individual scholars working in close collaboration, and in regular contact with the wider profession. They have drawn on the experience of an established EpiDoc project, the Vindolanda Tablets Online, and of two current projects: the US Epigraphy Project (USEP, supported by Brown, Princeton and Rutgers Universities) and the Inscriptions of Aphrodisias Project (InsAph, supported by the Arts and Humanities Research Council). The generous support of the AHRC made possible the intensive workshop in March 2006 at which these guidelines were refined.