IATE Data fields explained

The IATE data structure follows a concept-oriented approach. Monolingual and multilingual information on each aspect of a concept is expressed on three interrelated levels of a terminological entry: the language-independent level, the language level and the term level.

 

The IATE download file contains the following data fields:

Language Independent Level:

·         Entry ID

·         Subject domain

·         Domain note

Language Level:

·         Language code

Term Level:

·         Term

·         Term type

·         Reliability code

·         Evaluation

     

 

The entry ID

 

The entry ID is the unique identifier of each concept entered in IATE. In the TBX file this information is included in the “termEntry” element. The numeric ID is prefixed with ‘IATE-’, e.g.:

<termEntry id="IATE-33032">
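As a minimal sketch of reading this identifier, the Python snippet below parses a single “termEntry” element with the standard library and strips the ‘IATE-’ prefix. The helper name entry_number is hypothetical and is not part of IATE or TBX.

```python
import xml.etree.ElementTree as ET

PREFIX = "IATE-"


def entry_number(term_entry: ET.Element) -> int:
    """Return the numeric part of an entry id such as 'IATE-33032'."""
    raw_id = term_entry.get("id", "")
    if not raw_id.startswith(PREFIX):
        raise ValueError(f"unexpected entry id: {raw_id!r}")
    return int(raw_id[len(PREFIX):])


# Sample element taken from the description above.
sample = ET.fromstring('<termEntry id="IATE-33032"></termEntry>')
print(entry_number(sample))  # prints 33032
```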

 

Languages

In the download file, languages are represented by their ISO language code, e.g. <langSet xml:lang="bg"> indicates a Bulgarian term. Here is the list of all language codes used in IATE:

Language        Language Code
English         en
Bulgarian       bg
Croatian        hr
Czech           cs
Danish          da
German          de
Greek           el
Spanish         es
Estonian        et
Finnish         fi
French          fr
Irish           ga
Hungarian       hu
Italian         it
Lithuanian      lt
Latvian         lv
Maltese         mt
Dutch           nl
Polish          pl
Portuguese      pt
Romanian        ro
Slovak          sk
Slovene         sl
Swedish         sv
Latin           la
Multilingual    mul

 

The language code “mul” (“multilingual”) identifies codes or signs that are language-independent (e.g. ISO codes, chemical formulae, certain acronyms and abbreviations).
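The sketch below shows one way to read the language codes of an entry in Python. It assumes that the “langSet” elements sit somewhere inside the “termEntry” element; the sample entry and the helper name language_codes are illustrative only. Note that ElementTree exposes the xml:lang attribute under the XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative entry; a real entry carries more content in each langSet.
SAMPLE = """<termEntry id="IATE-33032">
  <langSet xml:lang="bg"></langSet>
  <langSet xml:lang="en"></langSet>
  <langSet xml:lang="mul"></langSet>
</termEntry>"""


def language_codes(term_entry):
    """Return the language code of every langSet below the entry, in document order."""
    return [lang_set.get(XML_LANG) for lang_set in term_entry.iter("langSet")]


codes = language_codes(ET.fromstring(SAMPLE))
print(codes)  # ['bg', 'en', 'mul']
```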

 

 

Subject domain

Concepts in IATE are linked to specific domains, i.e. the fields of knowledge in which the concept is used. Based on EuroVoc - a multidisciplinary thesaurus covering the activities of the EU - IATE offers 21 subject domains with 2 hierarchically linked sub-levels. The biggest domain clusters are “education and communication” (260 000 concepts), “industry” (240 000), “transport” (160 000) and “law” (120 000). Note that a concept can be linked to several domains.

In the TBX file the domains are represented by their numeric identifiers, e.g.:

        <descrip type="subjectField">6621001, 6826001</descrip>

Please consult the following file for a complete list of domain identifiers and domain names: IATE Domain codes.
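Because a concept can be linked to several domains, the “subjectField” value may hold a comma-separated list. The following Python sketch splits it into individual identifiers. The helper name domain_ids is hypothetical, and the identifiers are kept as strings so that they can be looked up unchanged in the domain code list.

```python
import xml.etree.ElementTree as ET


def domain_ids(descrip):
    """Split the text of a subjectField element into a list of domain identifiers."""
    if descrip.get("type") != "subjectField":
        raise ValueError("not a subjectField element")
    text = descrip.text or ""
    # Drop surrounding whitespace and ignore empty pieces.
    return [part.strip() for part in text.split(",") if part.strip()]


sample = ET.fromstring('<descrip type="subjectField">6621001, 6826001</descrip>')
ids = domain_ids(sample)
print(ids)  # ['6621001', '6826001']
```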

 

Domain Note

The domain note gives more specific information on the context in which a concept is used.

In the TBX file the domain note is part of the <descripGrp>, e.g.:

<descripGrp>

           <descrip type="subjectField">4826002</descrip>

           <note>Aviation</note>

 </descripGrp>
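A minimal Python sketch for reading the domain identifier together with its note from such a group is shown below. The helper name domain_and_note is an assumption made for illustration, and the note is treated as optional.

```python
import xml.etree.ElementTree as ET

# Sample group taken from the description above.
SAMPLE = """<descripGrp>
  <descrip type="subjectField">4826002</descrip>
  <note>Aviation</note>
</descripGrp>"""


def domain_and_note(descrip_grp):
    """Return (domain text, note text or None), or None if the group has no subjectField."""
    descrip = descrip_grp.find("descrip[@type='subjectField']")
    if descrip is None:
        return None
    note = descrip_grp.find("note")
    note_text = note.text.strip() if note is not None and note.text else None
    return (descrip.text or "").strip(), note_text


result = domain_and_note(ET.fromstring(SAMPLE))
print(result)  # ('4826002', 'Aviation')
```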

 

Term type

Term

One word or a set of words which designate a defined concept in a particular language, or a name.

Abbreviation

Abbreviation, acronym, initialism, contraction or truncation

Phrase

Used for phraseological units that it would be difficult to call “terms” but which nevertheless have a standard translation - and must therefore always be translated the same way - or which repeatedly occur in our texts and pose real translation problems.

Formula

Chemical formulae, mathematical and other scientific expressions, to be written wherever possible in line with the prevailing international standards.

Short Form

For example: the common name of an agreement or the short, unofficial name of a country, etc.; any accepted shorter version of a title or of a name, e.g.:

Term: "United Nations Convention on Contracts for the International Sale of Goods"

Short form: "Vienna Sales Convention"

 

In the TBX file this information is encoded in a term note, e.g.:

<termNote type="termType">fullForm</termNote>

The following table provides a correspondence between the term types used in IATE and the TBX attributes:

IATE term type    TBX value
Term              fullForm
Abbreviation      abbreviation
Phrase            phraseologicalUnit
Formula           formula
Short Form        shortForm
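The correspondence above can be expressed as a small lookup, sketched here in Python. The names TERM_TYPE_LABELS and iate_term_type are hypothetical, and matching the TBX value case-insensitively is an assumption made so that capitalisation variants resolve to the same term type.

```python
import xml.etree.ElementTree as ET

# TBX values (lower-cased for comparison) mapped to the IATE term types.
TERM_TYPE_LABELS = {
    "fullform": "Term",
    "abbreviation": "Abbreviation",
    "phraseologicalunit": "Phrase",
    "formula": "Formula",
    "shortform": "Short Form",
}


def iate_term_type(term_note):
    """Return the IATE term type for a termType note, or None if it does not apply."""
    if term_note.get("type") != "termType":
        return None
    value = (term_note.text or "").strip().lower()
    return TERM_TYPE_LABELS.get(value)


note = ET.fromstring('<termNote type="termType">fullForm</termNote>')
print(iate_term_type(note))  # Term
```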

 

Reliability code

IATE uses four codes to indicate the reliability of terms.

Code (Description): Explanation

1 (Reliability not verified): Automatically assigned to terms entered by non-native speakers.

2 (Minimum reliability): Automatically assigned to terms entered or updated by native speakers.

3 (Reliable): Manually assigned by a terminologist following a reliability assessment. Reliable terms should satisfy at least one of the following criteria:

·         be obtained from a trusted source;

·         be agreed by a representative body of same-language terminologists;

·         be the common designation of the concept in its field.

N.B. This code was automatically assigned to many entries, regardless of their previous validation status, following the merger of existing databases to create IATE. Therefore some entries marked as ‘reliable’ are not necessarily so.

4 (Very reliable): Manually assigned following a reliability assessment. Very reliable terms are:

·         well-established and widely accepted by experts as the correct designation, or

·         confirmed by a trusted and authoritative source, in particular a reliable written source.

 

Encoding in the TBX file: <descrip type="reliabilityCode">3</descrip>
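As an illustration, the Python sketch below reads a reliability code and checks it against a minimum value. The helper names and the default threshold of 3 are assumptions for this example, not an IATE recommendation; as noted above, code 3 does not guarantee that a term was manually validated.

```python
import xml.etree.ElementTree as ET

# Descriptions of the four reliability codes listed above.
RELIABILITY = {
    1: "Reliability not verified",
    2: "Minimum reliability",
    3: "Reliable",
    4: "Very reliable",
}


def reliability(descrip):
    """Return (code, description) for a reliabilityCode element."""
    if descrip.get("type") != "reliabilityCode":
        raise ValueError("not a reliabilityCode element")
    code = int((descrip.text or "").strip())
    if code not in RELIABILITY:
        raise ValueError(f"unknown reliability code: {code}")
    return code, RELIABILITY[code]


def meets_threshold(descrip, minimum=3):
    """True if the code is at least `minimum` (the default of 3 is an assumption)."""
    return reliability(descrip)[0] >= minimum


sample = ET.fromstring('<descrip type="reliabilityCode">3</descrip>')
print(reliability(sample))      # (3, 'Reliable')
print(meets_threshold(sample))  # True
```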

 

Evaluation of the term

IATE uses four codes to indicate the evaluation of terms.

Preferred

A term may be marked as ‘preferred’ because it is intrinsically better than other terms, or simply to ensure consistency in EU texts.

Admitted

A term which is correct, but for which better synonyms exist.

Deprecated

A term which is widely used, and is therefore likely to appear in documents, but which should not be used, and should be changed when editing a text.

Obsolete

A term which was previously used to denote a concept, but is no longer in use (e.g. the old ‘Bank Identifier Code’ is now called the ‘Business Identifier Code’).

 

In the TBX file this information is encoded in a term note, e.g.:

<termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>   

The following table provides a correspondence between the evaluation codes used in IATE and the TBX attributes:

IATE evaluation    TBX value
Preferred          preferredTerm-admn-sts
Admitted           admittedTerm-admn-sts
Deprecated         deprecatedTerm-admn-sts
Obsolete           supersededTerm-admn-sts
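The following Python sketch translates the TBX value back into the IATE evaluation and flags terms that an editor would normally avoid. The names EVALUATION_LABELS, evaluation and should_be_avoided are hypothetical, and treating both ‘Deprecated’ and ‘Obsolete’ terms as terms to avoid is an assumption derived from the descriptions above.

```python
import xml.etree.ElementTree as ET

# TBX administrativeStatus values mapped to the IATE evaluation codes.
EVALUATION_LABELS = {
    "preferredTerm-admn-sts": "Preferred",
    "admittedTerm-admn-sts": "Admitted",
    "deprecatedTerm-admn-sts": "Deprecated",
    "supersededTerm-admn-sts": "Obsolete",
}


def evaluation(term_note):
    """Return the IATE evaluation for an administrativeStatus note, or None."""
    if term_note.get("type") != "administrativeStatus":
        return None
    return EVALUATION_LABELS.get((term_note.text or "").strip())


def should_be_avoided(term_note):
    """True for deprecated or obsolete terms (an assumption of this sketch)."""
    return evaluation(term_note) in {"Deprecated", "Obsolete"}


note = ET.fromstring(
    '<termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>'
)
print(evaluation(note))         # Deprecated
print(should_be_avoided(note))  # True
```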