Attributes | lemma | provides a lemma (base form) for the word, typically uninflected and serving both as an identifier (e.g. in dictionary contexts, as a headword), and as a basis for potential inflections. | Status | Optional | Datatype | teidata.text | <w lemma="wife">wives</w> | <w lemma="Arznei">Artzeneyen</w> |
| lemmaRef | provides a pointer to a definition of the lemma for the word, for example in an online lexicon. | Status | Optional | Datatype | teidata.pointer | <w type="verb" lemma="hit" lemmaRef="http://www.example.com/lexicon/hitvb.xml">hitt<m type="suffix">ing</m> </w> |
| pos | (part of speech) indicates the part of speech assigned to a token (i.e. information on whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP, etc.). | Status | Optional | Datatype | teidata.text | The German sentence ‘Wir fahren in den Urlaub.’ tagged with the Stuttgart-Tuebingen-Tagset (STTS). <s> <w pos="PPER">Wir</w> <w pos="VVFIN">fahren</w> <w pos="APPR">in</w> <w pos="ART">den</w> <w pos="NN">Urlaub</w> <w pos="$.">.</w> </s> | The English sentence ‘We're going to Brazil.’ tagged with the CLAWS-5 tagset, arranged inline (with significant whitespace). <p><w pos="PNP">We</w><w pos="VBB">'re</w> <w pos="VVG">going</w> <w pos="PRP">to</w> <w pos="NP0">Brazil</w><pc pos="PUN">.</pc></p>
| The English sentence ‘We're going on vacation to Brazil for a month!’ tagged with the CLAWS-7 tagset and arranged sequentially. <p> <w pos="PPIS2">We</w> <w pos="VBR">'re</w> <w pos="VVG">going</w> <w pos="II">on</w> <w pos="NN1">vacation</w> <w pos="II">to</w> <w pos="NP1">Brazil</w> <w pos="IF">for</w> <w pos="AT1">a</w> <w pos="NNT1">month</w> <pc pos="!">!</pc> </p> |
| msd | (morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features). | Status | Optional | Datatype | teidata.text | <ab> <w pos="PPER" msd="1.Pl.*.Nom">Wir</w> <w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w> <w pos="APPR" msd="--">in</w> <w pos="ART" msd="Def.Masc.Akk.Sg">den</w> <w pos="NN" msd="Masc.Akk.Sg">Urlaub</w> <pc pos="$." msd="--">.</pc> </ab> |
| join | when present, provides information on whether the token in question is adjacent to another, and if so, on which side. | Status | Optional | Datatype | teidata.text | Legal values are: |
- no: the token is not adjacent to another
- left: there is no whitespace on the left side of the token
- right: there is no whitespace on the right side of the token
- both: there is no whitespace on either side of the token
- overlap: the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream
| The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join. <s> <pc join="right">"</pc> <w join="left">Friends</w> <w>will</w> <w>be</w> <w join="right">friends</w> <pc join="both">.</pc> <pc join="left">"</pc> </s> Note that a project may decide to indicate the lack of whitespace in only one direction, or to do so non-redundantly. The existing proposal is the broadest possible, on the assumption that we adopt the "streamable view", where all the information on the current element needs to be represented locally. | The English sentence ‘We're going on vacation.’ tagged with the CLAWS-5 tagset and arranged sequentially, on the assumption that only the lack of preceding whitespace is indicated. <p> <w pos="PNP">We</w> <w pos="VBB" join="left">'re</w> <w pos="VVG">going</w> <w pos="PRP">on</w> <w pos="NN1">vacation</w> <pc pos="PUN" join="left">.</pc> </p> | Note | The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012. |
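The way join values determine whitespace can be illustrated with a short script. The following is a minimal Python sketch, not part of the attribute definition: the function name detokenize and the helper local_name are hypothetical, and the sketch assumes that tokens are non-nested <w> and <pc> elements, that a token with no join attribute (or with join="no") is separated from its neighbours by a single space, and that join="overlap" is not resolved and is treated the same way. It accepts both the redundant and the one-directional marking styles shown above, since a gap is closed if either neighbouring token asks for it.

```python
import xml.etree.ElementTree as ET

# Values of @join that suppress whitespace on a given side of a token.
NO_SPACE_AFTER = {"right", "both"}
NO_SPACE_BEFORE = {"left", "both"}


def local_name(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]


def detokenize(xml_fragment):
    """Rebuild running text from <w>/<pc> tokens using their @join values.

    Assumptions of this sketch: tokens are not nested inside one another;
    a missing @join, join="no" and join="overlap" all yield a single space.
    """
    root = ET.fromstring(xml_fragment)
    tokens = [el for el in root.iter()
              if isinstance(el.tag, str) and local_name(el.tag) in ("w", "pc")]
    pieces = []
    previous_join = None
    for index, token in enumerate(tokens):
        join = token.get("join")
        # The gap is closed if either neighbour marks it as closed.
        glued = previous_join in NO_SPACE_AFTER or join in NO_SPACE_BEFORE
        if index > 0 and not glued:
            pieces.append(" ")
        # itertext() also collects text inside child elements such as <m>.
        pieces.append("".join(token.itertext()).strip())
        previous_join = join
    return "".join(pieces)


SENTENCE = """<s>
  <pc join="right">"</pc> <w join="left">Friends</w> <w>will</w> <w>be</w>
  <w join="right">friends</w> <pc join="both">.</pc> <pc join="left">"</pc>
</s>"""

print(detokenize(SENTENCE))  # prints: "Friends will be friends."
```

Because the whitespace between elements is ignored and only join is consulted, the same function works for the sequential arrangement; it would not be appropriate for the inline arrangement, where whitespace in the source is itself significant.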