Glossary

Data model

An informational representation of the subject area, including the structure of objects and the relationships between them. The Unidata data model contains the following main objects: Entities, Lookup entities, Relations, and Classifications.

Entity

A representation of the characteristics of the described object in the form of Records, each of which has a set of Attributes. An Entity supports the creation of Relations and complex attribute structures, and does not use Code attributes. Entity Records can be classified. Main application: storage of Master data (e.g., information about customers or contractors).

Lookup entity

A representation of the characteristics of the described object in the form of Records, each of which has a set of Attributes. Each Record contains a Code attribute. Relations are not available. Main application: storage of Reference data (e.g., dictionaries, regulations, standards).

Cleanse function

A function used to perform specified actions on data. Cleanse functions make it possible to perform Validation and Data enrichment. By implementation, there are 3 types: Standard, Composite (consists of several Standard functions), and Third-party (a JAR file, implemented separately).
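
The idea of a Composite function built from several Standard ones can be sketched in a few lines. This is only an illustration: the names `trim`, `upper_case`, and `compose` are hypothetical stand-ins and are not part of the Unidata API.

```python
from typing import Callable, Iterable

# Hypothetical type: a cleanse function maps one string value to another.
CleanseFn = Callable[[str], str]


def trim(value: str) -> str:
    """Stand-in for a Standard function: strip surrounding whitespace."""
    return value.strip()


def upper_case(value: str) -> str:
    """Stand-in for a Standard function: convert the value to upper case."""
    return value.upper()


def compose(functions: Iterable[CleanseFn]) -> CleanseFn:
    """Build a Composite function that applies the given functions in order."""
    steps = list(functions)

    def composite(value: str) -> str:
        for step in steps:
            value = step(value)
        return value

    return composite


# A composite made of two standard steps.
normalize_code = compose([trim, upper_case])
```

Calling `normalize_code("  dk ")` runs both steps in sequence and returns `"DK"`.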

Classifier

Basic unit of classification. A systematic description of the concepts of any area of knowledge or human activity. A structure consisting of main and subordinate Nodes, which reflects a variety of concepts and properties. Classifier nodes can contain attributes, which makes it possible to extend the attribute structure of records when they are assigned to a particular class.
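
A rough sketch of a classifier as a tree of nodes that carry attributes is given below. The `Node` class and `attributes_for` helper are hypothetical, and the sketch assumes that a subordinate node also receives the attributes of its main (parent) nodes, which the definition above does not state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical classifier node with its own attribute names."""
    name: str
    attributes: list[str] = field(default_factory=list)
    parent: Optional["Node"] = None


def attributes_for(node: Node) -> list[str]:
    """Collect attribute names from the root node down to the given node.

    Assumption for illustration: attributes are inherited from parent nodes.
    """
    chain = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current)
        current = current.parent
    collected: list[str] = []
    for item in reversed(chain):  # root first, given node last
        collected.extend(item.attributes)
    return collected


vehicles = Node("Vehicles", ["manufacturer"])
cars = Node("Cars", ["body_type"], parent=vehicles)
```

Under that inheritance assumption, a record classified under `cars` would be extended with both `manufacturer` and `body_type`.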

Record

A description of a real-world object, consisting of a set of that object's property values (attributes). E.g., if the entity describes a set of car attributes, then the record is a description of a particular Audi A7 with attribute values specific to that car.

Attribute
  1. Feature of an object. A record consists of a number of attributes; each attribute is represented by the name of the feature and its value (e.g., “Country of origin - Denmark”).

  2. Part of the data model objects: Entities/Lookup entities, Relations, Classifiers. There are several attribute types, each of which holds its own kind of data.

Simple attribute

An attribute containing a single value. E.g. gender, age, weight.

Complex attribute

An attribute that groups a number of Simple attributes of various types (e.g., Contacts > phone number, e-mail).

Array attribute

An attribute that can contain 2 or more values of the same type. E.g. a list of addresses.
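
The three attribute kinds defined above (Simple, Complex, Array) can be pictured together in one record. The sketch below uses plain Python dataclasses. The class and field names are invented for illustration and do not reflect how Unidata stores records.

```python
from dataclasses import dataclass, field


@dataclass
class Contacts:
    """Complex attribute: groups several Simple attributes."""
    phone_number: str
    email: str


@dataclass
class CustomerRecord:
    # Simple attributes: a single value each.
    name: str
    age: int
    # Complex attribute: a nested group of Simple attributes.
    contacts: Contacts
    # Array attribute: several values of the same type.
    addresses: list[str] = field(default_factory=list)


record = CustomerRecord(
    name="Jane Doe",
    age=42,
    contacts=Contacts(phone_number="+45 0000 0000", email="jane@example.com"),
    addresses=["Address 1", "Address 2"],
)
```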

Code attribute

Attribute used to identify records in the Lookup entity.
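
As a small illustration of identifying records by code, the sketch below indexes the records of a hypothetical "Country" Lookup entity by their `code` field. The field names are made up, and rejecting duplicate codes is an assumption of this sketch.

```python
# Hypothetical records of a "Country" Lookup entity.
countries = [
    {"code": "DK", "name": "Denmark"},
    {"code": "DE", "name": "Germany"},
]


def index_by_code(records: list[dict]) -> dict:
    """Map each code value to its record.

    Assumption for illustration: a code identifies exactly one record.
    """
    index = {}
    for record in records:
        code = record["code"]
        if code in index:
            raise ValueError(f"duplicate code: {code!r}")
        index[code] = record
    return index


by_code = index_by_code(countries)
```

With this index, `by_code["DK"]` retrieves the record for Denmark.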

Data quality

The degree to which the data is valid: how many errors, discrepancies, contradictions, etc. it contains.

Validation
  1. Data verification for conformity with the specified quality requirements in order to identify meaningless, uninformative, incorrect or erroneous data. (E.g., removal of clearly fake email addresses.)

  2. Applying Data quality rules to a record attribute and checking the attribute value for compliance with the rules.
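
A minimal sketch of a check in the spirit of both senses above: one attribute value is tested against a rule, and the result is a list of errors rather than a change to the data. The function name, the record layout (a plain dict), and the deliberately simple e-mail pattern are all assumptions made for illustration.

```python
import re

# Deliberately simple pattern for illustration; not a full e-mail grammar.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(record: dict, attribute: str = "email") -> list[str]:
    """Return a list of error messages; an empty list means the value passed."""
    value = record.get(attribute)
    if not value:
        return [f"{attribute}: value is missing"]
    if not EMAIL_PATTERN.match(value):
        return [f"{attribute}: '{value}' is not a plausible e-mail address"]
    return []
```

The record itself is never modified here. Changing the data is the job of Data enrichment.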

Data enrichment

Applying Data quality rules to a record attribute in order to change the data so that it is filled in a uniform manner or supplemented with new information (attributes).
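
Both effects, uniform formatting and supplementing with a new attribute, can be sketched as follows. The `COUNTRY_CODES` table, the attribute names, and the dict-based record are hypothetical.

```python
# Hypothetical lookup table used to supplement the record.
COUNTRY_CODES = {"denmark": "DK", "germany": "DE"}


def enrich_country(record: dict) -> dict:
    """Return a copy of the record with a normalized country and, if known, its code."""
    enriched = dict(record)  # leave the input record untouched
    country = enriched.get("country")
    if not country:
        return enriched
    # Uniform format: collapse whitespace and use title case.
    normalized = " ".join(country.split()).title()
    enriched["country"] = normalized
    # Supplement with a new attribute when the country is known.
    code = COUNTRY_CODES.get(normalized.lower())
    if code is not None:
        enriched["country_code"] = code
    return enriched
```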

Data cleansing

An action taken on records that converts data to a single standard format based on specified criteria. General term for Data enrichment and Validation.

Data quality rule

A set of data processing actions that applies previously created data processing functions to individual record attributes or their specific values. There are 2 possible modes: Validation and Data enrichment. Rules can be assigned to an Entity, Lookup entity, or Classifier.

Etalon record

A record created by combining all records from various sources and/or by applying data quality rules. Such a record is considered relevant and free of errors, discrepancies, etc.
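
One way to picture the combining step is an attribute-by-attribute merge in which the value from the higher-weighted source wins. This is an assumed strategy for illustration only, and the text above does not say how Unidata actually merges records. `build_etalon` and the record layout are hypothetical.

```python
def build_etalon(originals: list[dict], weights: dict[str, int]) -> dict:
    """Merge source records into one record.

    Each original is {"source": <name>, "attributes": {...}}.
    Assumption for illustration: for each attribute, the non-empty value
    from the source with the highest weight wins.
    """
    etalon: dict = {}
    # Process low-weight sources first so higher-weight values overwrite them.
    ordered = sorted(originals, key=lambda o: weights.get(o["source"], 0))
    for original in ordered:
        for name, value in original["attributes"].items():
            if value is not None:
                etalon[name] = value
    return etalon


originals = [
    {"source": "crm", "attributes": {"name": "ACME Ltd", "phone": None}},
    {"source": "erp", "attributes": {"name": "Acme", "phone": "123"}},
]
etalon = build_etalon(originals, weights={"crm": 10, "erp": 5})
```

In this example, `name` comes from the higher-weighted `crm` source, while `phone` is taken from `erp` because `crm` has no value for it.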

Original record

Full information about the record. It may contain the Record cards with which the current record was combined, attribute values, relations, and system information.

Unit

Description of the measured quantities and parameters (e.g., amount, currency, weight).

Base unit

The unit of measure to which all measured quantities are converted, according to specified rules, in order to bring them to a common form (e.g., the base unit of weight is the kilogram).
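
A small sketch of conversion to a base unit, using the kilogram example above. The factor table and the `to_base_unit` helper are illustrative assumptions and not part of the platform.

```python
# Hypothetical conversion factors to the base unit of weight (kilogram).
TO_KILOGRAM = {"kg": 1.0, "g": 0.001, "t": 1000.0}


def to_base_unit(value: float, unit: str) -> float:
    """Convert a weight given in `unit` to kilograms."""
    try:
        factor = TO_KILOGRAM[unit]
    except KeyError:
        raise ValueError(f"unknown weight unit: {unit!r}") from None
    return value * factor
```

Once every value is expressed in the base unit, weights recorded in different units can be compared directly.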

Data source
  1. Third-party information systems that contain data to be used by the Unidata platform as source data.

  2. Description of third-party systems in the Unidata platform. Each source has a name, a weight, and a description. This description is used later in other platform operations (e.g., when setting up Data quality rules).