ABOUT DATA DICTIONARIES AND SUCH

April 1997



Burton G. Parker

Paladin Integration Engineering









1 Introduction

1.1 Background

At the January meeting of the Council of Standards Organizations (CSO), discussion of the ITS data dictionary environment and the ITS Data Dictionary Standard being developed under the auspices of the IEEE resulted in a request by the CSO for a white paper on data dictionaries, data registries, 'core' data, and other data management related topics. This paper is in answer to that request.

1.2 Purpose

The purpose of this paper is to provide the reader with a non-technical understanding of the nature of a data registry, a data dictionary, other related data management tools, and core data, and of their relationships in the ITS data dictionary environment. The nature of data, metadata, and their relationships is also described.

2 Data

Crucial to discussions of data management and data management tools is the establishment of a common understanding of the nature of data.

English language dictionaries define data in terms such as "…known facts or things used as a basis for inference or reckoning…"1, typically (in modern usage) operated upon or manipulated by computers, or "…factual information used as a basis for discussion, reasoning, or calculation"2. In data management, data is defined as "(1) (ISO) A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means."3

Thus, data are sets of basic factual information. A convenient way to understand data important to data management is in terms of three categories of data: Data Assets, Data Engineering Assets, and Data Management Assets4.

2.1 Data assets

"Data needed to accomplish enterprise missions and functions…".4 Data assets include data elements, business rules, data records, files, databases, data warehouses, reports, computer screen displays, and the like. Such data includes information about employees, products, marketing, finance, and the like. It is used for understanding, and making decisions about, the business activities of the enterprise.

2.2 Data engineering assets

"Data about data (metadata) needed to accomplish data engineering activities…".4 Data engineering assets include information about the data assets of the enterprise, such as data element definitions, data models, database designs, data dictionaries, repositories, and the like. It is used for understanding, designing, and building the data environment of the enterprise.

2.3 Data management assets

"Data needed to establish and control enterprise data-oriented activities…".4 Data management assets include information about goals, policies, standards, plans, budgets, metrics, and the like. It is used to guide, create, and maintain the data management infrastructure of the enterprise. These sorts of data (meta-meta-data) are, at least as yet, beyond the scope of any ITS data dictionary and are, therefore, not discussed further in this paper.

2.4 "Core" data

Sometimes, certain data is considered "core" data of the enterprise. Two different understandings of what core data consists of are common in data management practice.

One sense of core data is that of certain business data elements of interest to all functional components of the enterprise. Such data would be a subset of data in the data asset category. An example of core data of this type might be a set of data about the customers of an enterprise that is shared in common across all activities of the enterprise.

Another sense of core data is that of the standardized or preferred forms for representing data. Such data are NOT data elements, but forms of representation of data elements. Thus, these data are, from the above definitions, data in the data engineering asset category. An example of this sense of core data would be the way the enterprise elects to represent ANY data element, including the representation of date or time, e.g., YYYYMMDD for date and HHMMSS for time.
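The second sense can be illustrated with a minimal sketch, assuming the YYYYMMDD and HHMMSS forms from the example above. The names VALUE_DOMAINS and conforms are hypothetical, introduced only for illustration; the sketch shows how a preferred representation form might be recorded once and then applied to any data element that uses it.

```python
from datetime import datetime

# Hypothetical table of preferred representation forms. The keys and the
# strptime patterns are illustrative assumptions, not part of any ITS standard.
VALUE_DOMAINS = {
    "date": "%Y%m%d",  # YYYYMMDD
    "time": "%H%M%S",  # HHMMSS
}


def conforms(domain: str, value: str) -> bool:
    """Return True if value is written in the preferred form for domain."""
    pattern = VALUE_DOMAINS[domain]
    try:
        parsed = datetime.strptime(value, pattern)
    except ValueError:
        return False
    # Round-trip the value so that short forms such as "199741" are rejected.
    return parsed.strftime(pattern) == value
```

Under these assumptions, the value "19970415" conforms to the date form, while "1997-04-15" does not, even though both denote the same day.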

Both uses of the term core data occur in the ITS community, depending upon which ITS group one is talking with, although the latter sense of 'core data' appears more prevalent.

A RECOMMENDATION: Use the term "explicit value domains" for standard or preferred forms of representing types or classes of data elements, and use the term "common data" for business data elements shared across ITS functional areas.

3 Metadata management

Modern data management practice depends upon managing the metadata of the enterprise, i.e., the data engineering assets. Metadata management facilitates creation and use of quality data. Older practices of directly managing only instances of data (data element values) have proven laborious, inefficient, and less than effective. Example: Correcting instance values of data elements in data stores is typically a temporary solution, since the next update of the data store is likely to reintroduce 'bad' data. Alternatively, correcting the metadata about the data elements as to preferred source, format, etc. will lead to correction of 'bad' data at its source and thus prevent 're-infection' of the data store.
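The contrast drawn in this example can be sketched in a few lines of Python. The element name vehicle_id, its format rule, and the source names are all hypothetical; the point is only that a rule held as metadata is applied on every update, so that 'bad' values are stopped before they reach the data store instead of being repaired afterwards.

```python
import re

# Hypothetical metadata about one data element: its preferred source and format.
METADATA = {
    "vehicle_id": {
        "preferred_source": "fleet_registry",
        "pattern": r"[A-Z]{3}\d{4}",
    },
}


def load(store, updates, metadata):
    """Apply updates to store, accepting only values that satisfy the metadata.

    Each update is a tuple (element name, value, source). Rejected updates are
    returned so that they can be corrected where they originate.
    """
    rejected = []
    for element, value, source in updates:
        rule = metadata[element]
        from_preferred = source == rule["preferred_source"]
        well_formed = re.fullmatch(rule["pattern"], value) is not None
        if from_preferred and well_formed:
            store.setdefault(element, []).append(value)
        else:
            rejected.append((element, value, source))
    return rejected


store = {}
rejected = load(
    store,
    [
        ("vehicle_id", "ABC1234", "fleet_registry"),
        ("vehicle_id", "abc-12", "legacy_feed"),
    ],
    METADATA,
)
```

Because the rule lives in the metadata and not in the data store, the same check runs again on the next update without any further manual correction.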

A number of data management tools are available to facilitate metadata management: DBMSs, data dictionaries, data directories, data encyclopedias, data registries, and repositories. While commercially available tools such as these rarely limit their functionality solely to one of these conceptual categories, the categories are discussed individually to illustrate the essential purposes of each type of tool.

3.1 Database management system (DBMS)

A DBMS is a software environment specifically designed to assist in the design, construction, operation, use, and tuning of data stores (e.g., databases). Examples of DBMS software include Oracle, DB2, and Ingres (to mention only a few among the many). The basic purpose of a DBMS is to organize, store, and provide access to instance values of data elements. A DBMS is a tool used by database administrators (DBAs).

3.2 Data dictionary

A data dictionary is a software tool specifically designed to assist in the documentation of data element syntax (representational form) and some semantics (usually only name and definition), independently from the data store in which their instance values are stored. A data dictionary may be a stand-alone tool (such as PC Dictionary), or a component of a DBMS. A data dictionary is a tool typically used by database administrators (DBAs) (if it is part of a DBMS) or a data administrator (if it is a stand-alone tool).
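As a minimal sketch of what such a tool records, a data dictionary entry might be modeled as follows. The class DictionaryEntry, the function describe, and the sample elements are hypothetical; they assume only what is stated above, namely that an entry carries a name, a definition, and a representational form, and holds no instance values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One data element as documented in a (hypothetical) data dictionary."""

    name: str            # semantics: what the element is called
    definition: str      # semantics: what the element means
    representation: str  # syntax: the form its values take


ENTRIES = [
    DictionaryEntry(
        "incident_date",
        "Calendar date on which an incident was reported.",
        "YYYYMMDD",
    ),
    DictionaryEntry(
        "incident_time",
        "Time of day at which an incident was reported.",
        "HHMMSS",
    ),
]

# The dictionary is keyed by element name and is independent of any data store.
DATA_DICTIONARY = {entry.name: entry for entry in ENTRIES}


def describe(name: str) -> str:
    """Return a one-line description of the named data element."""
    entry = DATA_DICTIONARY[name]
    return f"{entry.name} [{entry.representation}]: {entry.definition}"
```

Note that nothing in the sketch says where the element is stored or what values it currently holds; those concerns belong to other tools.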

3.3 Data directory

A data directory is a software tool specifically designed to assist in the documentation of "where-used" data element locations, independently from the data stores in which their instance values are stored. A data directory may be a stand-alone tool (such as Easy View), or a component of a DBMS. Stand-alone tools typically provide mappings of which databases contain a particular data element, or in other words, logical locations of data elements. Data directories that are components of a DBMS typically provide physical addresses of where data elements are located in a database. A data directory is a tool typically used by database administrators (DBAs) (if it is part of a DBMS) or a data administrator (if it is a stand-alone tool).
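The logical "where-used" mapping of a stand-alone data directory can be sketched as a simple lookup structure. The class DataDirectory, its methods, and the database names are hypothetical, and physical addressing, as provided by a DBMS component, is deliberately left out.

```python
from collections import defaultdict


class DataDirectory:
    """Hypothetical logical directory: which databases contain which elements."""

    def __init__(self):
        self._where_used = defaultdict(set)

    def record_use(self, element: str, database: str) -> None:
        """Note that the named database contains the named data element."""
        self._where_used[element].add(database)

    def where_used(self, element: str) -> list:
        """Return the databases containing the element, in sorted order."""
        return sorted(self._where_used.get(element, set()))


directory = DataDirectory()
directory.record_use("incident_date", "incident_db")
directory.record_use("incident_date", "archive_db")
directory.record_use("link_speed", "traffic_db")
```

A query such as directory.where_used("incident_date") then answers the "where-used" question without touching any of the databases themselves.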

3.4 Data encyclopedia

A data encyclopedia is a software tool specifically designed to provide both data dictionary and at least logical data directory functionality to the user. Additionally, it will typically provide further contextual information about data elements, such as data element classification structures, thesaurus or glossary capabilities, mappings of data elements to data models and the data models themselves, and the like. Note, however, that the data encyclopedia is very narrowly focused: toward particular system(s) or applications. A data encyclopedia may be a stand-alone tool, or (more typically) a component of a Computer Aided Software Engineering (CASE) tool, such as ADW, Bachman, or IEF tools. The data encyclopedia is not focused upon enterprise-wide representations of information structures of the enterprise. A data encyclopedia is a tool typically used by a data analyst or data administrator involved in system data modeling and database development.

3.5 Data element registry

A data element registry is a software tool specifically designed to combine capabilities of the data dictionary, logical data directory, and specific contextual documentation capabilities of the data encyclopedia for the purpose of facilitating enterprise-wide understanding, standardization, and reuse of enterprise data elements, as well as facilitating interchange of enterprise data elements among enterprise information trading partners. A data element registry is a tool typically used by data administrators involved in system data modeling and database development within a coherent, overall, enterprise data environment.

The data element registry is a relatively new concept developed by an ISO/IEC JTC1 standards body, and formalized in the ISO 11179 standard, Specification and Standardization of Data Elements. This standard provides the most comprehensive set of metadata requirements available for documenting data elements and for progressing specific data elements as enterprise standard or preferred data elements. No commercially available data element registries yet exist. Data element registries are being developed by organizations such as Bellcore, the Environmental Protection Agency, and the U.S. Census Bureau.

The data element registry addresses a major fundamental portion of the data management requirement. However, a general consensus is developing in the ANSI accredited standards committee X3L8, Data Representation (which held editorship for five of the six parts of ISO 11179), that ISO 11179 concepts of the data element registry need to be extended to that of a data registry. The data registry is intended to document and register not only data elements but their reusable components as well, and, additionally, provide structured access to the semantic content of the registry. Such reusable components include data element concepts and data element value domains (both of which are subjects of current X3L8 working papers), and also (potentially) object classes, property classes, and generic property domains. The extension of data element registry concepts is proceeding within the overall framework of the X3L8 Metamodel for Management of Sharable Data, ANSI X3.285-1997, which explains and defines terms.
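The idea of registering reusable components alongside data elements can be sketched as follows. This is an illustration under stated assumptions, not the X3L8 metamodel: the class names, attribute names, and sample entries are hypothetical, and the sketch assumes only that a data element pairs a data element concept with a value domain, each of which is registered once and may be shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Hypothetical reusable component: an object class and one of its properties."""

    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """Hypothetical reusable component: a named form for representing values."""

    name: str
    form: str


@dataclass(frozen=True)
class DataElement:
    """A data element built from one concept and one value domain."""

    concept: DataElementConcept
    domain: ValueDomain


class DataRegistry:
    """Registers data elements together with their reusable components."""

    def __init__(self):
        self.concepts = set()
        self.domains = set()
        self.elements = []

    def register(self, concept: DataElementConcept, domain: ValueDomain) -> DataElement:
        # Sets keep each shared component registered only once.
        self.concepts.add(concept)
        self.domains.add(domain)
        element = DataElement(concept, domain)
        self.elements.append(element)
        return element

    def elements_using(self, domain: ValueDomain) -> list:
        """Return every registered data element that reuses the given domain."""
        return [e for e in self.elements if e.domain == domain]


calendar_date = ValueDomain("calendar date", "YYYYMMDD")
registry = DataRegistry()
registry.register(DataElementConcept("incident", "report date"), calendar_date)
registry.register(DataElementConcept("work zone", "start date"), calendar_date)
```

In this sketch both data elements share the single registered value domain, which is the kind of component reuse the extended data registry is intended to document.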

3.6 Repository

A repository is a software tool specifically designed to document and maintain all informational representations of the enterprise and its activities, that is, data-oriented representations, software representations, systems representations, hardware representations, etc. Repositories typically include the functionality of data dictionaries, data directories, and data encyclopedias, but they necessarily also extend dictionary, directory, and encyclopedia functionality to the documentation of the enterprise's existing and planned applications, systems, process models, hardware environment, organizational structure, strategic plans, implementation plans, and all other IT and business representations of the informational aspects of the enterprise. Commercially available repositories include Rochade, MSP, Platinum, and Transtar offerings.

4 ITS 'data dictionary' environment

As the ITS data dictionary environment is evolving, three distinct 'levels' of data dictionaries are discernible: application systems' data dictionaries, functional area data dictionaries, and The ITS 'data dictionary'.

4.1 Application data dictionary

The application systems of ITS have and manipulate data relevant to the purposes of the systems. Typically, the data elements used in each system will be recorded and documented in a data dictionary of some sort. Normally, such a data dictionary will contain only data elements used in achieving the functions of the system. Often the application data dictionary will be part of the DBMS software platform upon which the application's database is built. The purpose of these data dictionaries is to understand and document the data elements of the systems with which they are associated.

4.2 Functional area data dictionaries

In several functional areas of the ITS, efforts are underway to record and document the data elements of the application systems that accomplish the purposes of the functional area. Examples of these data dictionaries include the Traffic Management Data Dictionary and Advanced Public Transportation Systems Data Dictionary. These dictionaries contain data element information relevant to the functional area, from all or most of the application systems supporting the functions in the functional area. The purpose of these data dictionaries is to understand and document the data elements pertinent to the functional area, promote reuse of data elements among functional area systems, and facilitate data interchange among the application systems of the functional area.

4.3 The ITS Data Dictionary/Registry

The need to record and document data elements across ITS functional areas has been identified. The ITS data dictionary, with data registry capabilities, will serve this need. This dictionary will contain data element information from all the functional areas. The purpose of this data dictionary is to understand and document the data elements pertinent to the entire ITS enterprise and to promote standardization or preferential identification of, and reuse of, data elements across functional areas. In short, The ITS Data Dictionary/Registry will facilitate data interchange among the application systems of all the functional areas of ITS.

4.4 Inter-relationships among ITS data dictionary components

The current vision of the ITS-DD/MST working group is that the application level data dictionaries (those closely associated with application systems of ITS developers and vendors) are the originating sources of all ITS data elements. These data elements are anticipated to be collected and documented within appropriate functional area data dictionaries in order that they may be shared, especially across applications of a functional area, but also AMONG the various functional area data dictionaries.

To achieve this goal, these application-oriented data elements, as recorded and documented within appropriate functional area data dictionaries, will be collected and documented in a more semantically robust form in The ITS Data Dictionary/Registry. Having these data elements from all functional area data dictionaries (and, therefore, from all relevant application level data dictionaries) in a central ITS Data Dictionary/Registry will promote the understanding and reuse of appropriate data elements across ALL functional areas of the ITS enterprise. This objective will be facilitated by the semantic structures of The ITS Data Dictionary/Registry. Such structures should not be expected in the functional area data dictionaries, which are oriented more toward the data dictionary and data directory functionality described above. The ITS Data Dictionary/Registry is expected to include the functionality of the 'data encyclopedia' and 'data registry' described above, as well as some of the functionality of the 'repository' described above, at least as it relates to data. Thus, The ITS Data Dictionary/Registry will be the key to inter-operability and interchange of data among ITS application systems.
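The flow of data elements through the three levels can be sketched as two successive roll-ups. The function roll_up, the application names, and the data element names are hypothetical; only the two functional area names echo the examples given earlier. The sketch tracks names and sources only and omits the richer semantic documentation expected at the top level.

```python
def roll_up(dictionaries):
    """Merge several dictionaries into one, remembering each element's sources.

    dictionaries maps a source name to the set of data element names it holds.
    The result maps each data element name to the set of sources that hold it.
    """
    merged = {}
    for source, element_names in dictionaries.items():
        for name in element_names:
            merged.setdefault(name, set()).add(source)
    return merged


# Level 1: application data dictionaries, grouped by functional area.
traffic_apps = {
    "signal_control": {"link_speed", "signal_phase"},
    "incident_mgmt": {"link_speed", "incident_type"},
}
transit_apps = {
    "bus_dispatch": {"vehicle_location", "link_speed"},
}

# Level 2: each functional area dictionary collects its applications' elements.
functional_areas = {
    "traffic_management": set(roll_up(traffic_apps)),
    "public_transportation": set(roll_up(transit_apps)),
}

# Level 3: the central dictionary/registry collects from all functional areas.
its_registry = roll_up(functional_areas)

# Elements held by more than one functional area are candidates for reuse.
shared = sorted(name for name, areas in its_registry.items() if len(areas) > 1)
```

Here link_speed is the only element found in both functional areas, so it is the one that a central registry would flag for standardization and reuse across areas.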

References

  1. The Oxford Dictionary and Thesaurus, Oxford University Press, 1993, [ISBN 0-89577-894-7]
  2. Compton's Interactive Encyclopedia, SoftKey Multimedia, Inc, 1996.
  3. American National Standard for Information Systems-Dictionary for Information Systems, ANSI X3.172-1990, American National Standards Institute, 1990.
  4. Parker, B., and L. Chambless, D. Duvall, D. Satterthwaite, D. Smith, Data Management Capability Maturity Model, MITRE, MP 95W0000089, March 1995.
