This document describes the lexicography module of the Lexicon Model for Ontologies (lemon) as a result of the work of the Ontology Lexicon community group (OntoLex). The module is targeted at the representation of dictionaries and any other linguistic resource containing lexicographic data, and addresses structures and annotations commonly found in lexicography. This module operates in combination with the lemon core module, referred to as OntoLex.
The RDF file with the OntoLex lemon lexicography module can be found at http://www.w3.org/ns/lemon/lexicog
There are a number of ways that one may participate in future developments of this report:
The lemon model provides a core vocabulary (OntoLex) to represent linguistic information related to ontology and vocabulary elements. The model follows the principle of semantics by reference in the sense that the semantics of a lexical entry is expressed by reference to an individual, class or property defined in an ontology.
The current version of lemon (as an outcome of the OntoLex group, sometimes referred to as OntoLex-lemon in the literature) as well as its previous version (lemon [1]) have been increasingly used in the context of work in dictionaries and lexicographical data to convert existing lexicographic information into the standards and formats of the Semantic Web. Such preliminary experiences comprise monolingual [2], bilingual [3], and multilingual [4] dictionaries, as well as diachronic [5], dialectal [6], and etymological ones [7], among others. The added value of using linked data technologies in lexicography and its implications for the micro and macro structure of dictionaries have been explored as well by several authors [e.g., 8, 9].
After analysing the literature, the proposers of this module perceived a strong need for reaching some agreement that allows for a better and more inter-operable migration of existing dictionaries into linked data [10]. For illustration, the Oxford Global Language ontology [11] has its own notion of dictionary entry materialised in the ogl:Entry class, while in [4] the ad-hoc kd:dictionaryEntry relation was introduced in the conversion of the multilingual Global Series of KDictionaries. In other words, different researchers introduced their own modelling solutions to account for similar notions. Interoperability is a key issue in linked data technologies, thus building a common space in which these concepts can be agreed upon and commonly defined is a logical next step. The OntoLex community is the natural forum to accomplish this for several reasons:
The main goal of this module is to complement the lemon core module, OntoLex, and to overcome its limitations when modelling lexicographic information as linked data in a way that is agnostic to the underlying lexicographic view and minimises information loss.
The scope of the model is two-fold:
In terms of applying the module, we propose the following best practice or "rule of thumb" when representing a dictionary as linked data:
OntoLex lexicography module:
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .
OntoLex (core) model and other lemon modules:
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
Other models:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
The class lime:Lexicon has been used in the literature as a natural way of modelling a collection of dictionary entries. However, there are certain situations (see [10]) in which there is no one-to-one mapping between the entries in a dictionary, or lexicographic resource in general, and the lexical entries in a lemon lexicon (lime:Lexicon). For instance, some dictionary entries contain information about their translations and synonyms, though the latter might not have their own dictionary entry in the source dictionary. Still, such translations and synonyms can be treated as "first class citizens" in the RDF graph by representing them as lemon lexical entries with ontolex:LexicalEntry. In such situations, there is a disparity between the ensuing lemon lexicon and the original dictionary, as the linked data representation would include entries not encoded in the original resource as dictionary entries.
Thus, the class Lexicographic Resource is intended to represent the original resource, in order to keep track of the collection of lexicographic entries contained in it. This does not replace, but complements, the use of lime:Lexicon, which will include the lexical entries explicitly declared in the resource along with other ones (e.g., synonyms, translations).
SubClassOf: void:Dataset, dc:language min 1
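The disparity between the original dictionary and the ensuing lexicons can be sketched as follows. This is only an illustration under stated assumptions: a hypothetical bilingual dictionary whose single entry is monastery and which lists monasterio as a translation without giving it an entry of its own. All identifiers in the default namespace (http://example.org/) are invented for the sketch and are not part of the module.

```turtle
@prefix : <http://example.org/> .   # hypothetical namespace, for illustration only
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# LEXICOGRAPHIC RESOURCE: mirrors the source, which has one dictionary entry
:myBilingualDictionary a lexicog:LexicographicResource ;
    dc:language "en", "es" ;
    lexicog:entry :monastery_entry .

:monastery_entry a lexicog:Entry .

# LEXICONS: the translation is a lexical entry in its own right,
# although the source dictionary gives it no entry
:myEnglishLexicon a lime:Lexicon ;
    lime:language "en" ;
    lime:entry :monastery_n_en .

:mySpanishLexicon a lime:Lexicon ;
    lime:language "es" ;
    lime:entry :monasterio_n_es .

:monastery_n_en a ontolex:LexicalEntry .
:monasterio_n_es a ontolex:LexicalEntry .
```

In this sketch the lexicographic resource lists one entry while the two lexicons together hold two lexical entries, which is the situation the Lexicographic Resource class is meant to keep track of.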
The class Entry is introduced to represent the entry as it is encoded in a lexicographic resource, e.g. a dictionary entry, thereby reflecting the arrangement decided upon by the lexicographer (supported by the occurrences of the word or phrase in the corpus used for dictionary compilation). This class is thus intended to be the counterpart of the lemon lexical entry (ontolex:LexicalEntry) in the linked data representation of a dictionary, and fulfills a structural function.
An entry is a structural element that represents a lexicographic article or record as it is arranged in a source lexicographic resource. As such, it supports the description of lexical entries or senses according to the lexicographic micro-structure, decided upon during a lexicographic resource compilation process.
SubClassOf: Lexicographic Component
The property entry relates a lexicog:LexicographicResource to a lexicog:Entry.
Domain: LexicographicResource
Range: Entry
A simple example of an entry belonging to a lexicographic resource in English is given below. In this example, a lexicographic record for the word animal groups (under the same dictionary entry) descriptions for the lexical entries animal (n.), and animal (adj.), which both share phonetic representation and etymology. The example has been extracted from the American Heritage Dictionary [16]:
an·i·mal n.
1. Any of numerous multicellular eukaryotic organisms of the kingdom Metazoa (or Animalia) [...]
2. An animal organism other than a human, especially a mammal.
[...]
adj.
1. Relating to, characteristic of, or derived from an animal or animals, especially when not human: animal cells; animal welfare.
2. Relating to the physical as distinct from the rational or spiritual nature of people: animal instincts and desires.
In this example, the dictionary provides a single dictionary entry for the word animal, comprising its noun and adjective definitions under one structure. This is not compliant with the definition of ontolex:LexicalEntry, which requires a single part of speech. Therefore, a lime:Lexicon accounting for the lexical information in this dictionary will contain two lexical entries, one per part of speech. On the other hand, an instance of the Lexicographic Resource class will gather entities as they were presented in the original dictionary in order to keep track of the original representation. In this simplified example, animal will be the only entry of the lexicographic resource:
# LEXICOGRAPHIC RESOURCE
:myDictionary a lexicog:LexicographicResource ;
    dc:language "en" ;
    lexicog:entry :animal_entry .
:animal_entry a lexicog:Entry .
# LEXICON
:myLexicon a lime:Lexicon ;
    lime:language "en" ;
    lime:entry :animal_n, :animal_adj .
:animal_n a ontolex:LexicalEntry .
:animal_adj a ontolex:LexicalEntry .
A lexicographic component is a structural element in the lexicographic resource that represents either a lexicographic record or any other sub-structure to refer to senses, sense groups, or nested entries. Lexicographic components do not necessarily describe lexical data but can often fulfill a grouping function to gather other components that do. Entry is a particular subclass of Lexicographic Component used to represent the main "entry point" in the dictionary, i.e., the headword or the root of the lexicographic record. The lexicographic components can describe lexical senses (ontolex:LexicalSense) or lexical entries (ontolex:LexicalEntry), depending on the arrangement in the original resource. Components can in turn be arranged in a specific order and/or hierarchy or just be declared to be part of an entry.
A lexicographic component is a structural element that represents the (sub-)structures of lexicographic articles providing information about entries, senses or sub-entries. If desired, lexicographic components can be arranged in a specific order and/or hierarchy.
SubClassOf: owl:Thing
# LEXICOGRAPHIC RESOURCE
:myDictionary a lexicog:LexicographicResource ;
    dc:language "en" ;
    lexicog:entry :animal_entry .
:animal_entry a lexicog:Entry ;
    rdfs:member :animal_n_comp, :animal_adj_comp .
:animal_n_comp a lexicog:LexicographicComponent .
:animal_adj_comp a lexicog:LexicographicComponent .
# LEXICON
:myLexicon a lime:Lexicon ;
    lime:language "en" ;
    lime:entry :animal_n, :animal_adj .
:animal_n a ontolex:LexicalEntry .
:animal_adj a ontolex:LexicalEntry .
The property lexicog:describes relates a lexicographic component to an element that represents the actual information provided by that component in the lexicographic resource. In most cases, this information will be lexical, and hence the object of the property will be an instance of ontolex:LexicalEntry or ontolex:LexicalSense.
Domain: LexicographicComponent
Range: owl:Thing
# LEXICOGRAPHIC RESOURCE
:myDictionary a lexicog:LexicographicResource ;
    dc:language "en" ;
    lexicog:entry :animal_entry .
:animal_entry a lexicog:Entry ;
    rdfs:member :animal_n_comp, :animal_adj_comp .
:animal_n_comp a lexicog:LexicographicComponent .
:animal_adj_comp a lexicog:LexicographicComponent .
# LEXICON
:myLexicon a lime:Lexicon ;
    lime:language "en" ;
    lime:entry :animal_n, :animal_adj .
:animal_n a ontolex:LexicalEntry .
:animal_adj a ontolex:LexicalEntry .
# LEXICOGRAPHIC RESOURCE - LEXICON RELATIONS
:animal_n_comp lexicog:describes :animal_n .
:animal_adj_comp lexicog:describes :animal_adj .
Note that this only states that there is a structure (entry) which contains sub-structures (components) which describe lexical elements, in this case, lexical entries. If this arrangement is not present in the dictionary, or there is no need to represent it in RDF (e.g., animal (adj.) and animal (n.) appear in different records), then instantiating lexicog:Entry without sub-components suffices to keep track of the collection of lexicographic entries contained in the resource.
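The alternative without sub-components can be sketched as follows, assuming a hypothetical dictionary that lists animal (n.) and animal (adj.) as two separate records. Because Entry is a subclass of Lexicographic Component, the sketch attaches lexicog:describes directly to each entry. The default namespace and the identifiers :animal_n_entry and :animal_adj_entry are assumptions made for illustration.

```turtle
@prefix : <http://example.org/> .   # hypothetical namespace, for illustration only
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# LEXICOGRAPHIC RESOURCE: two independent records, no sub-components
:myDictionary a lexicog:LexicographicResource ;
    dc:language "en" ;
    lexicog:entry :animal_n_entry, :animal_adj_entry .

# Each entry describes its lexical entry directly
:animal_n_entry a lexicog:Entry ;
    lexicog:describes :animal_n .

:animal_adj_entry a lexicog:Entry ;
    lexicog:describes :animal_adj .

# LEXICON side
:animal_n a ontolex:LexicalEntry .
:animal_adj a ontolex:LexicalEntry .
```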
# LEXICOGRAPHIC RESOURCE
:myDictionary a lexicog:LexicographicResource ;
    dc:language "en" ;
    lexicog:entry :animal_entry .
:animal_entry a lexicog:Entry ;
    rdfs:member :animal_n_comp, :animal_adj_comp .
:animal_n_comp
    rdf:_1 :animal_n_sense_1_comp ;
    rdf:_2 :animal_n_sense_2_comp .
:animal_adj_comp
    rdf:_1 :animal_adj_sense_1_comp ;
    rdf:_2 :animal_adj_sense_2_comp .
:animal_n_comp a lexicog:LexicographicComponent .
:animal_adj_comp a lexicog:LexicographicComponent .
:animal_n_sense_1_comp a lexicog:LexicographicComponent .
:animal_n_sense_2_comp a lexicog:LexicographicComponent .
:animal_adj_sense_1_comp a lexicog:LexicographicComponent .
:animal_adj_sense_2_comp a lexicog:LexicographicComponent .
# LEXICON
:myLexicon a lime:Lexicon ;
    lime:language "en" ;
    lime:entry :animal_n, :animal_adj .
:animal_n a ontolex:LexicalEntry .
:animal_adj a ontolex:LexicalEntry .
:animal_n_sense_1 a ontolex:LexicalSense .
:animal_n_sense_2 a ontolex:LexicalSense .
:animal_adj_sense_1 a ontolex:LexicalSense .
:animal_adj_sense_2 a ontolex:LexicalSense .
# LEXICOGRAPHIC RESOURCE - LEXICON RELATIONS
:animal_n_comp lexicog:describes :animal_n .
:animal_adj_comp lexicog:describes :animal_adj .
:animal_n_sense_1_comp lexicog:describes :animal_n_sense_1 .
:animal_n_sense_2_comp lexicog:describes :animal_n_sense_2 .
:animal_adj_sense_1_comp lexicog:describes :animal_adj_sense_1 .
:animal_adj_sense_2_comp lexicog:describes :animal_adj_sense_2 .
# LEXICOGRAPHIC RESOURCE
:myDictionary a lexicog:LexicographicResource ;
    dc:language "en" ;
    lexicog:entry :animal_entry .
:animal_entry a lexicog:Entry ;
    rdfs:member :animal_n_comp, :animal_adj_comp .
:animal_n_comp
    rdf:_1 :animal_n_sense_1_comp ;
    rdf:_2 :animal_n_sense_2_comp .
:animal_adj_comp
    rdf:_1 :animal_adj_sense_1_comp ;
    rdf:_2 :animal_adj_sense_2_comp .
:animal_n_comp a lexicog:LexicographicComponent .
:animal_adj_comp a lexicog:LexicographicComponent .
:animal_n_sense_1_comp a lexicog:LexicographicComponent .
:animal_n_sense_2_comp a lexicog:LexicographicComponent .
:animal_adj_sense_1_comp a lexicog:LexicographicComponent .
:animal_adj_sense_2_comp a lexicog:LexicographicComponent .
# LEXICON (lexical entries and lexical senses)
:myLexicon a lime:Lexicon ;
    lime:language "en" ;
    lime:entry :animal_n, :animal_adj .
:animal_n a ontolex:LexicalEntry .
:animal_adj a ontolex:LexicalEntry .
:animal_n_sense_1 a ontolex:LexicalSense .
:animal_n_sense_2 a ontolex:LexicalSense .
:animal_adj_sense_1 a ontolex:LexicalSense .
:animal_adj_sense_2 a ontolex:LexicalSense .
# LEXICON (forms and lexical concepts)
:animal_form a ontolex:Form ;
    ontolex:writtenRep "animal"@en .
:animal_n ontolex:lexicalForm :animal_form ;
    lexinfo:partOfSpeech lexinfo:noun ;
    ontolex:sense :animal_n_sense_1 ;
    ontolex:sense :animal_n_sense_2 .
:animal_adj ontolex:lexicalForm :animal_form ;
    lexinfo:partOfSpeech lexinfo:adjective ;
    ontolex:sense :animal_adj_sense_1 ;
    ontolex:sense :animal_adj_sense_2 .
:animal_n_1_concept a ontolex:LexicalConcept ;
    skos:definition "Any of numerous multicellular eukaryotic organisms of the kingdom Metazoa (or Animalia)"@en .
:animal_n_2_concept a ontolex:LexicalConcept ;
    skos:definition "An animal organism other than a human, especially a mammal."@en .
:animal_adj_1_concept a ontolex:LexicalConcept ;
    skos:definition "Relating to, characteristic of, or derived from an animal or animals, especially when not human: animal cells; animal welfare."@en .
:animal_adj_2_concept a ontolex:LexicalConcept ;
    skos:definition "Relating to the physical as distinct from the rational or spiritual nature of people: animal instincts and desires."@en .
:animal_n_sense_1 ontolex:isLexicalizedSenseOf :animal_n_1_concept .
:animal_n_sense_2 ontolex:isLexicalizedSenseOf :animal_n_2_concept .
:animal_adj_sense_1 ontolex:isLexicalizedSenseOf :animal_adj_1_concept .
:animal_adj_sense_2 ontolex:isLexicalizedSenseOf :animal_adj_2_concept .
# LEXICOGRAPHIC RESOURCE - LEXICON RELATIONS
:animal_n_comp lexicog:describes :animal_n .
:animal_adj_comp lexicog:describes :animal_adj .
:animal_n_sense_1_comp lexicog:describes :animal_n_sense_1 .
:animal_n_sense_2_comp lexicog:describes :animal_n_sense_2 .
:animal_adj_sense_1_comp lexicog:describes :animal_adj_sense_1 .
:animal_adj_sense_2_comp lexicog:describes :animal_adj_sense_2 .
# LEXICOGRAPHIC RESOURCE
:myDictionary a lexicog:LexicographicResource ;
    dc:language "en" ;
    lexicog:entry :animal_entry .
:animal_entry a lexicog:Entry ;
    lexicog:subComponent :animal_n_comp ;
    lexicog:subComponent :animal_adj_comp .
:animal_n_comp a lexicog:LexicographicComponent .
:animal_adj_comp a lexicog:LexicographicComponent .
# LEXICON
:myLexicon a lime:Lexicon ;
    lime:language "en" ;
    lime:entry :animal_n, :animal_adj .
:animal_n a ontolex:LexicalEntry .
:animal_adj a ontolex:LexicalEntry .
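The same property can plausibly be nested one level further so that sense-level components hang below the part-of-speech components. The following is a minimal sketch, assuming that lexicog:subComponent may link any lexicographic component to the components nested directly below it. The default namespace is hypothetical, and the sketch does not encode any order among sibling components.

```turtle
@prefix : <http://example.org/> .   # hypothetical namespace, for illustration only
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .

# Level 1: the entry groups one component per part of speech
:animal_entry a lexicog:Entry ;
    lexicog:subComponent :animal_n_comp, :animal_adj_comp .

# Level 2: the noun component groups its sense components
:animal_n_comp a lexicog:LexicographicComponent ;
    lexicog:describes :animal_n ;
    lexicog:subComponent :animal_n_sense_1_comp, :animal_n_sense_2_comp .

:animal_adj_comp a lexicog:LexicographicComponent ;
    lexicog:describes :animal_adj .

# Level 3: each sense component describes one lexical sense
:animal_n_sense_1_comp a lexicog:LexicographicComponent ;
    lexicog:describes :animal_n_sense_1 .

:animal_n_sense_2_comp a lexicog:LexicographicComponent ;
    lexicog:describes :animal_n_sense_2 .

# LEXICON side
:animal_n a ontolex:LexicalEntry ;
    ontolex:sense :animal_n_sense_1, :animal_n_sense_2 .
:animal_adj a ontolex:LexicalEntry .
:animal_n_sense_1 a ontolex:LexicalSense .
:animal_n_sense_2 a ontolex:LexicalSense .
```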
air n. [...]
2. An impression of a quality or manner given by someone or something. [...]
2.1 (airs) An annoyingly affected and condescending manner.
[...]
Some nouns occur only in plural form (e.g. amends, scissors), the so-called pluralia tantum, and they are usually defined in their own dictionary entry. However, lexicographic resources vary in how they treat cases in which the noun also occurs in singular form. For instance, Oxford Living Dictionaries lists glasses in an entry separate from glass [18], whereas the Merriam-Webster Dictionary treats it as a sub-sense of a sense of the lemma glass, with the annotation "glasses plural" [19].
Some dictionaries provide grammatical annotations indicating varying gender in the senses of the same dictionary entry. For example, in Spanish, the lemma policía can be feminine or masculine when denoting a police officer, but only feminine when denoting the police force or administration. Depending on the criteria followed for headword selection during the compilation of the resource, e.g. etymology, in these cases we may find two independent entries or one single entry with senses having different gender restrictions.
The lexicog:FormRestriction class is intended to provide a way to specify the set of grammatical features of a lexical entry when used in a specific ontolex:LexicalSense in cases in which it does not allow for all of those reflected in the lexical forms provided. Any external catalogue can be used for this purpose. For example, in Spanish, ratón 'mouse', in its meaning of "animal", has a masculine and a feminine form. In its meaning of computer device, small rock or biceps (in Costa Rica), it only occurs as masculine, and therefore these senses would receive a FormRestriction.
There are cases in which a specific ontolex:LexicalSense does not allow for all the available ontolex:Form(s) of the ontolex:LexicalEntry. In those cases, the class FormRestriction represents (a set of) grammatical features of the ontolex:Form(s) in which that sense occurs. The sense does not occur in forms whose features do not match with those of such a set.
SubClassOf: owl:Thing
The property restrictedTo relates a LexicalSense to a FormRestriction when a lexicographic resource provides information about the specific morphological features of the ontolex:Form in that sense.
Domain: LexicalSense
Range: FormRestriction
:air_n a ontolex:LexicalEntry ;
    ontolex:sense :air_n_sense_2, :air_n_sense_2_1 ;
    ontolex:canonicalForm :air_n_form ;
    ontolex:otherForm :airs_n_form .
:air_n_form a ontolex:Form ;
    ontolex:writtenRep "air"@en ;
    lexinfo:number lexinfo:singular .
:airs_n_form a ontolex:Form ;
    ontolex:writtenRep "airs"@en ;
    lexinfo:number lexinfo:plural .
:air_n_sense_2 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :air_n_sense_2_lc .
:air_n_sense_2_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :air_n_sense_2_1_lc ;
    lexicog:restrictedTo :air_n_formRes .
:air_n_formRes a lexicog:FormRestriction ;
    lexinfo:number lexinfo:plural .
:air_n_sense_2_lc a ontolex:LexicalConcept ;
    skos:definition "An impression of a quality or manner given by someone or something"@en .
:air_n_sense_2_1_lc a ontolex:LexicalConcept ;
    skos:definition "An annoyingly affected and condescending manner"@en .
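The Spanish ratón case mentioned in the description of FormRestriction could be modelled along the same lines, restricting by gender instead of number. This is a sketch under assumptions: the default namespace and all identifiers are hypothetical, the feminine written form "ratona" is assumed for illustration, and only two of the senses are shown.

```turtle
@prefix : <http://example.org/> .   # hypothetical namespace, for illustration only
@prefix lexicog: <http://www.w3.org/ns/lemon/lexicog#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .

:raton_n a ontolex:LexicalEntry ;
    ontolex:canonicalForm :raton_form_m ;
    ontolex:otherForm :raton_form_f ;
    ontolex:sense :raton_sense_animal, :raton_sense_device .

:raton_form_m a ontolex:Form ;
    ontolex:writtenRep "ratón"@es ;
    lexinfo:gender lexinfo:masculine .

# Feminine form: written representation assumed for this sketch
:raton_form_f a ontolex:Form ;
    ontolex:writtenRep "ratona"@es ;
    lexinfo:gender lexinfo:feminine .

# The "animal" sense allows both forms, so it carries no restriction
:raton_sense_animal a ontolex:LexicalSense .

# The "computer device" sense occurs only in the masculine form
:raton_sense_device a ontolex:LexicalSense ;
    lexicog:restrictedTo :raton_formRes .

:raton_formRes a lexicog:FormRestriction ;
    lexinfo:gender lexinfo:masculine .
```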
The class UsageExample represents a textual example of the usage of a sense in a given lexicographic record. A usage example can group several string values, in which case they will encode the same meaning. Thus, if such values are expressed in different languages, they can be interpreted as translations.
SubClassOf: rdf:value min 1 xsd:string, owl:Thing
The property usageExample relates an ontolex:LexicalSense with a lexicog:UsageExample.
Domain: LexicalSense
Range: UsageExample
monastery n (monk's residence) [English] monasterio nm [Spanish]
We visited a Buddhist monastery deep in a jungle.
Visitamos un monasterio budista en medio de la selva.
We first show how to represent the monolingual information only (English).
# Entries
:monastery_n_en a ontolex:LexicalEntry ;
    ontolex:sense :monastery_n_en_sense .
# Senses
:monastery_n_en_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :monastery_n_en_sense_concept ;
    lexicog:usageExample :monastery_n_en_sense_ex .
# Concepts
:monastery_n_en_sense_concept a ontolex:LexicalConcept ;
    skos:definition "monk's residence"@en .
# Examples
:monastery_n_en_sense_ex a lexicog:UsageExample ;
    rdf:value "We visited a Buddhist monastery deep in a jungle."@en .
# Entries
:monastery_n_en a ontolex:LexicalEntry ;
    ontolex:sense :monastery_n_en_sense .
:monasterio_n_es a ontolex:LexicalEntry ;
    ontolex:sense :monasterio_n_es_sense .
# Senses
:monastery_n_en_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :monastery_n_en_sense_concept ;
    lexicog:usageExample :monastery_n_en_sense_ex .
:monasterio_n_es_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :monastery_n_en_sense_concept .
# Concepts
:monastery_n_en_sense_concept a ontolex:LexicalConcept ;
    skos:definition "monk's residence"@en .
# Translations
:monastery_n_en_sense-monasterio_n_es_sense-tr a vartrans:Translation ;
    vartrans:source :monastery_n_en_sense ;
    vartrans:target :monasterio_n_es_sense .
# Examples
:monastery_n_en_sense_ex a lexicog:UsageExample ;
    rdf:value "We visited a Buddhist monastery deep in a jungle."@en ;
    rdf:value "Visitamos un monasterio budista en medio de la selva."@es .
A lexicographic resource represents a collection of lexicographic entries (lexicog:Entry) in accordance with the lexicographic criteria followed in the development of that resource.