At the January meeting of the Council of Standards Organizations
(CSO), discussion of the ITS data dictionary environment and the
ITS Data Dictionary Standard being developed under the auspices
of the IEEE resulted in a request by the CSO for a white paper
on data dictionaries, data registries, 'core' data, and other
data management related topics. This paper is in answer to that
request.
Crucial to discussions of data management and data management
tools is the establishment of a common understanding of the nature
of data.
English language dictionaries define data in terms such as "known
facts or things used as a basis for inference or reckoning"1,
typically (in modern usage) operated upon or manipulated by computers,
or "factual information used as a basis for discussion,
reasoning, or calculation"2. In data management,
data is defined as "(1) (ISO) A representation of facts,
concepts, or instructions in a formalized manner suitable for
communication, interpretation, or processing by humans or by automatic
means."3
Thus, data are sets of basic factual information. A convenient
way to understand data important to data management is in terms
of three categories of data: Data Assets, Data Engineering Assets,
and Data Management Assets4.
Data Assets: "Data needed to accomplish enterprise missions and
functions".4 Data assets include data elements, business rules, data
records, files, databases, data warehouses, reports, computer
screen displays, and the like. Such data includes information
about employees, products, marketing, finance, and the like. It
is used for understanding, and making decisions about, the business
activities of the enterprise.
Data Engineering Assets: "Data about data (metadata) needed to
accomplish data engineering activities".4 Data engineering assets include
information about the data assets of the enterprise, such as
data element definitions, data models, database designs, data
dictionaries, repositories, and the like. It is used for understanding,
designing, and building the data environment of the enterprise.
Data Management Assets: "Data needed to establish and control
enterprise data-oriented activities".4 Data management assets include
information about goals, policies, standards, plans, budgets,
metrics, and the like. It is used to guide, create, and maintain
the data management infrastructure of the enterprise. These sorts
of data (meta-meta-data) are, at least as yet, beyond the scope
of any ITS data dictionary and are, therefore, not discussed further
in this paper.
Sometimes, certain data is considered "core" data of
the enterprise. Two different understandings of what core data
consists of are common in data management practice.
One sense of core data is that of certain business data elements
of interest to all functional components of the enterprise. Such
data would be a subset of data in the data asset category. An
example of core data of this type might be a set of data about
the customers of an enterprise that is shared in common across
all activities of the enterprise.
Another sense of core data is that of the standardized or preferred
forms for representing data. Such data are NOT data elements
but forms of representation of data elements. Thus, these data
are, from the above definitions, data in the data engineering
asset category. An example of this sense of core data would be
the way the enterprise elects to represent ANY data element that
includes a date or time, e.g., YYYYMMDD for date and
HHMMSS for time.
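
As a minimal sketch of this second sense of core data, the following Python fragment treats the YYYYMMDD and HHMMSS forms as shared representation rules that any date or time data element can reuse. The names DATE_FORM, TIME_FORM, to_standard_forms, and parse_standard_forms are hypothetical and are introduced here only for illustration.

```python
from datetime import datetime

# Hypothetical enterprise-wide representation forms (assumed for illustration).
DATE_FORM = "%Y%m%d"  # YYYYMMDD
TIME_FORM = "%H%M%S"  # HHMMSS


def to_standard_forms(moment):
    """Render a datetime as a (date, time) pair in the preferred forms."""
    return moment.strftime(DATE_FORM), moment.strftime(TIME_FORM)


def parse_standard_forms(date_text, time_text):
    """Read a (date, time) pair written in the preferred forms back into a datetime."""
    return datetime.strptime(date_text + time_text, DATE_FORM + TIME_FORM)
```

Because the forms are defined once and applied to every data element that carries a date or time, they behave as metadata about representation rather than as data elements in their own right.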
Both uses of the term core data occur in the ITS community, depending
upon which ITS group one is talking with, although the latter
sense of 'core data' appears more prevalent.
A RECOMMENDATION: Use the term "explicit value domains"
for standard or preferred forms for representing types or classes
of data element representations and use the term "common
data" for business data elements shared across ITS
functional areas.
Modern data management practice depends upon managing the metadata
of the enterprise, i.e., the data engineering assets. Metadata
management facilitates creation and use of quality data. Older
practices of directly managing only instances of data (data element
values) have proven laborious, inefficient, and less than effective.
Example: Correcting instance values of data elements in data stores
is typically a temporary solution, since the next update of the
data store is likely to reintroduce 'bad' data. Alternatively,
correcting the metadata about the data elements as to preferred
source, format, etc. will lead to correction of 'bad' data at
its source and thus prevent 're-infection' of the data store.
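
The contrast drawn in this example can be sketched in Python under some loudly stated assumptions: the element name incident_date, the source names, and the functions accept_update and load are all hypothetical. The point of the sketch is only that a rule recorded once in metadata is applied on every update, whereas a hand-corrected instance value is overwritten by the next load.

```python
import re

# Hypothetical metadata for one data element: preferred source and format.
ELEMENT_METADATA = {
    "incident_date": {
        "preferred_source": "dispatch_system",
        "format": r"\d{8}",  # YYYYMMDD
    },
}


def accept_update(element, source, value, metadata=ELEMENT_METADATA):
    """Return True only if the value comes from the preferred source in the preferred format."""
    rule = metadata[element]
    from_preferred_source = source == rule["preferred_source"]
    well_formed = re.fullmatch(rule["format"], value) is not None
    return from_preferred_source and well_formed


def load(updates, store):
    """Apply updates to the store, skipping any that the metadata rules reject."""
    rejected = []
    for element, source, value in updates:
        if accept_update(element, source, value):
            store[element] = value
        else:
            rejected.append((element, source, value))
    return rejected
```

In this arrangement, fixing the metadata rule changes what every future update is allowed to write, which is the sense in which 're-infection' of the data store is prevented.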
A number of data management tools are available to facilitate
metadata management: DBMSs, data dictionaries, data directories,
data encyclopedias, data registries, and repositories. While commercially
available tools such as these rarely limit their functionality
solely to one of these conceptual categories, the categories are
discussed individually to illustrate the essential purposes of
each type of tool.
A DBMS is a software environment specifically designed to assist
in the design, construction, operation, use, and tuning of data
stores (e.g., databases). Examples of DBMS software include Oracle,
DB2, and Ingres (to mention only a few among the many). The basic
purpose of a DBMS is to organize, store, and provide access to
instance values of data elements. A DBMS is a tool used by database
administrators (DBAs).
A data dictionary is a software tool specifically designed to
assist in the documentation of data element syntax (representational
form) and some semantics (usually only name and definition), independently
from the data store in which their instance values are stored.
A data dictionary may be a stand-alone tool (such as PC Dictionary),
or a component of a DBMS. A data dictionary is a tool typically
used by database administrators (DBAs) (if it is part of a DBMS)
or a data administrator (if it is a stand-alone tool).
A data directory is a software tool specifically designed to assist
in the documentation of "where-used" data element locations,
independently from the data stores in which their instance values
are stored. A data directory may be a stand-alone tool (such as
Easy View), or a component of a DBMS. Stand-alone tools typically
provide mappings of which databases contain a particular data
element, or in other words, logical locations of data elements.
Data directories that are components of a DBMS typically provide
physical addresses of where data elements are located in
a database. A data directory is a tool typically used by database
administrators (DBAs) (if it is part of a DBMS) or a data administrator
(if it is a stand-alone tool).
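
The difference between a data dictionary and a logical data directory can be illustrated with a small Python sketch. It is not modeled on any of the tools named above; the element names, database names, and the helpers where_used and elements_in are hypothetical. The dictionary holds syntax and basic semantics, while the directory holds only "where-used" locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """What a data dictionary records: name, definition, and representational form."""
    name: str
    definition: str
    representation: str


# Data dictionary: element name -> syntax and basic semantics (hypothetical entries).
DATA_DICTIONARY = {
    "vehicle_id": DictionaryEntry(
        "vehicle_id", "Identifier assigned to a transit vehicle", "CHAR(8)"
    ),
    "stop_time": DictionaryEntry(
        "stop_time", "Scheduled time at a stop", "HHMMSS"
    ),
}

# Data directory: element name -> logical locations (databases) where it is used.
DATA_DIRECTORY = {
    "vehicle_id": {"fleet_db", "schedule_db"},
    "stop_time": {"schedule_db"},
}


def where_used(element_name):
    """Return the databases that contain the element, in sorted order."""
    return sorted(DATA_DIRECTORY.get(element_name, set()))


def elements_in(database):
    """Return the elements recorded as being stored in the given database."""
    return sorted(name for name, dbs in DATA_DIRECTORY.items() if database in dbs)
```

Neither structure holds instance values; both describe data elements independently of the data stores that contain them.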
A data encyclopedia is a software tool specifically designed to
provide both data dictionary and at least logical data directory
functionality to the user. Additionally, it will typically provide
additional contextual information about data elements, such as
data element classification structures, thesaurus or glossary
capabilities, mappings of data elements to data models and the
data models themselves, and the like. Note, however, that the
data encyclopedia is very narrowly focused: toward particular
system(s) or applications. A data encyclopedia may be a stand-alone
tool, or (more typically) a component of a Computer-Aided Software
Engineering (CASE) tool, such as ADW, Bachman, or IEF tools. The
data encyclopedia is not focused upon enterprise-wide representations
of information structures of the enterprise. A data encyclopedia
is a tool typically used by a data analyst or data administrator
involved in system data modeling and database development.
A data element registry is a software tool specifically designed
to combine capabilities of the data dictionary, logical data directory,
and specific contextual documentation capabilities of the data
encyclopedia for the purpose of facilitating enterprise-wide understanding,
standardization, and reuse of enterprise data elements, as well
as facilitating interchange of enterprise data elements among
enterprise information trading partners. A data element registry is a
tool typically used by data administrators involved in system
data modeling and database development within a coherent, overall,
enterprise data environment.
The data element registry is a relatively new concept developed
by an ISO/IEC JTC1 standards body, and formalized in the ISO 11179
standard, Specification and Standardization of Data Elements.
This standard provides the most comprehensive set of metadata
requirements available for documenting data elements and for progressing
specific data elements as enterprise standard or preferred data
elements. No commercially available data element registries yet
exist. Data element registries are being developed by organizations
such as Bellcore, the Environmental Protection Agency, and the
U.S. Census Bureau.
The data element registry addresses a major fundamental portion
of the data management requirement. However, a general consensus
is developing in the ANSI accredited standards committee X3L8,
Data Representation (which held editorship for five of the six
parts of ISO 11179), that ISO 11179 concepts of the data element
registry need to be extended to that of a data registry. The data
registry is intended to document and register not only data elements
but their reusable components as well, and, additionally, provide
structured access to the semantic content of the registry. Such
reusable components include data element concepts and data element
value domains (both of which are subjects of current X3L8 working
papers), and also (potentially) object classes, property classes,
and generic property domains. The extension of data element registry
concepts is proceeding within the overall framework of the X3L8
Metamodel for Management of Sharable Data, ANSI X3.285-1997,
which explains and defines terms.
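
One way to picture the extension from a data element registry to a data registry is the following Python sketch, in which data element concepts and value domains are registered as reusable components and data elements are composed from them. This is an illustrative simplification and not the X3L8 metamodel itself; the class names, fields, and example entries are assumptions made for the sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Reusable meaning: an object class paired with one of its properties."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """Reusable representation: a named set of permitted value forms."""
    name: str
    pattern: str  # regular expression describing the permitted form

    def permits(self, value):
        return re.fullmatch(self.pattern, value) is not None


@dataclass(frozen=True)
class DataElement:
    """A data element composed from a concept and a value domain."""
    name: str
    concept: DataElementConcept
    domain: ValueDomain


class DataRegistry:
    """Registers data elements and gives access by their reusable components."""

    def __init__(self):
        self.elements = {}

    def register(self, element):
        if element.name in self.elements:
            raise ValueError("already registered: " + element.name)
        self.elements[element.name] = element

    def elements_using(self, domain):
        """Names of all registered elements that reuse the given value domain."""
        return sorted(n for n, e in self.elements.items() if e.domain == domain)


# Hypothetical entries: two elements sharing one value domain.
DATE_YYYYMMDD = ValueDomain("date_yyyymmdd", r"\d{8}")
registry = DataRegistry()
registry.register(DataElement(
    "vehicle_departure_date",
    DataElementConcept("Vehicle", "Departure Date"),
    DATE_YYYYMMDD,
))
registry.register(DataElement(
    "incident_report_date",
    DataElementConcept("Incident", "Report Date"),
    DATE_YYYYMMDD,
))
```

Registering the value domain once and reusing it across elements is what allows the registry to answer questions such as which data elements share a representation.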
A repository is a software tool specifically designed to document
and maintain all informational representations of the enterprise
and its activities, that is, data-oriented representations, software
representations, systems representations, hardware representations,
etc. Repositories typically include the functionality of data
dictionaries, data directories, and data encyclopedias, but they
necessarily also extend dictionary, directory, and encyclopedia
functionality to the documentation of the enterprise's existing
and planned applications, systems, process models, hardware environment,
organizational structure, strategic plans, implementation plans,
and all other IT and business representations of the informational
aspects of the enterprise. Commercially available repositories
include Rochade, MSP, Platinum, and Transtar offerings.
As the ITS data dictionary environment is evolving, three distinct
'levels' of data dictionaries are discernible: application systems'
data dictionaries, functional area data dictionaries, and The
ITS 'data dictionary'.
The application systems of ITS have and manipulate data relevant
to the purposes of the systems. Typically, the data elements used
in each system will be recorded and documented in a data dictionary
of some sort. Normally, such a data dictionary will contain only
data elements used in achieving the functions of the system. Often
the application data dictionary will be part of the DBMS software
platform upon which the application's database is built. The purpose
of these data dictionaries is to understand and document the data
elements of the systems with which they are associated.
In several functional areas of the ITS, efforts are underway to
record and document the data elements of the applications systems
that accomplish the purposes of the functional area. Examples
of these data dictionaries include the Traffic Management Data
Dictionary and Advanced Public Transportation Systems Data Dictionary.
These dictionaries contain data element information relevant to
the functional area, from all or most of the application systems
supporting the functions in the functional area. The purpose of
these data dictionaries is to understand and document the data
elements pertinent to the functional area, promote reuse of
data elements among functional area systems, and facilitate data
interchange among the application systems of the functional area.
The need to record and document data elements across ITS functional
areas has been identified. The ITS data dictionary, with data
registry capabilities, will serve this need. This dictionary will
contain data element information from all the functional areas.
The purpose of this data dictionary is to understand and document
the data elements pertinent to the entire ITS enterprise and to promote
standardization or preferential identification of, and reuse of,
data elements across functional areas. In short, The ITS Data
Dictionary/Registry will facilitate data interchange among the
application systems of all the functional areas of ITS.
The current vision of the ITS-DD/MST working group is that the application level data dictionaries (those closely associated with application systems of ITS developers and vendors) are the originating sources of all ITS data elements. These data elements are anticipated to be collected and documented within appropriate functional area data dictionaries in order that they may be shared, especially across applications of a functional area, but also AMONG the various functional area data dictionaries.
To achieve this goal, these application-oriented data elements, as recorded and documented within appropriate functional area data dictionaries, will be collected and documented in a more semantically robust form in The ITS Data Dictionary/Registry. Having these data elements from all functional area data dictionaries (and, therefore, from all relevant application level data dictionaries) in a central ITS Data Dictionary/Registry will promote the understanding and reuse of appropriate data elements across ALL functional areas of the ITS enterprise. This objective will be facilitated by the semantic structures of The ITS Data Dictionary/Registry. Such structures should not be expected to be present in the functional area data dictionaries, since those dictionaries are oriented more toward the data dictionary and data directory functionality described above. The ITS Data Dictionary/Registry is expected to include the functionality of the 'data encyclopedia' and 'data registry' described above, as well as certain of the functionality of the 'repository' as described above, at least as it relates to data. Thus, The ITS Data Dictionary/Registry will be the key to interoperability and interchange of data among ITS application systems.
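
The flow of data elements through the three levels can be sketched in Python as follows. The application names, element names, and function names are hypothetical, and the sketch records only which applications and functional areas use each element; it does not attempt the richer semantic structures expected of The ITS Data Dictionary/Registry.

```python
from collections import defaultdict

# Hypothetical application-level data dictionaries, grouped by functional area.
# Each application dictionary maps a data element name to its definition.
APPLICATION_DICTIONARIES = {
    "traffic_management": {
        "signal_control_app": {
            "link_id": "Identifier of a roadway link",
            "signal_phase": "Current phase of a traffic signal",
        },
        "incident_app": {
            "link_id": "Identifier of a roadway link",
            "incident_time": "Time an incident was reported",
        },
    },
    "public_transportation": {
        "scheduling_app": {
            "vehicle_id": "Identifier of a transit vehicle",
            "link_id": "Identifier of a roadway link",
        },
    },
}


def build_functional_area_dictionary(applications):
    """Collect elements from one functional area, noting which applications use each."""
    area = defaultdict(set)
    for app_name, elements in applications.items():
        for element_name in elements:
            area[element_name].add(app_name)
    return dict(area)


def build_its_registry(areas):
    """Collect elements from all functional areas, noting which areas use each."""
    registry = defaultdict(set)
    for area_name, applications in areas.items():
        for element_name in build_functional_area_dictionary(applications):
            registry[element_name].add(area_name)
    return dict(registry)


def shared_across_areas(registry):
    """Elements used in more than one functional area: candidates for common data."""
    return sorted(name for name, areas in registry.items() if len(areas) > 1)
```

In this sketch, an element that appears in more than one functional area surfaces at the top level as a candidate for standardization and reuse.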
References