A. ITS SYSTEM AND SUBSYSTEM-RELATED
DOCUMENTS
1. ITS Architecture documents
(softcopies):
1.1 Executive Summary [EXECSUM.DOC]
Note: A preliminary review of
this document was conducted for general background purposes only.
The preliminary review was deemed sufficient.
1.2 National Architecture Vision [NARVSN.DOC]
Note: A preliminary review of
this document was conducted for general background purposes only.
The preliminary review was deemed sufficient.
The ITS architecture is based
on 29 user services, with 19 subsystems in four major areas. It
includes a logical architecture, a physical architecture, and
a mapping. Subsystem interconnections are modeled via data flows.
1.3 Logical Architecture
[LAVOL1.DOC, La_v1c.doc, La_v2.doc, La_v3.doc]
This is an important document.
Although it deals with DD/MST contents, it provides a perspective
on the nature of ITS data and interchange among ITS subsystems.
It is also a potential source for examples for the guidelines
document. La_v3.doc contains the (full) ITS-level data dictionary.
The ITS-level DD dealt with supporting
subsystem-subsystem interfaces. One use of this global DD is for
control, somewhat also as a design specification, and somewhat
also strictly for information or communication purposes. Its form
varies from informal (text-based glossaries) to formal (CASE models).
The CASE models were produced using Cadre's TEAMWORK product.
This allows export of CDIF files, which can then be imported into
RDBMSs such as MS Access. The overall structure is: Transaction;
Data Flow; Data Element. However, the contents are not always
limited to data elements. They address three views:
Control, Process and Data. [Note: That implies constraints/rules,
process models and data/information models]. There are 300 processes
in ITS, with 80 DFD's and one global DD. There are 2500 'entries'
in the DD: these define data elements that are interchanged
between two or more of the processes. [Note: These are not
being reviewed in detail as part of this effort, as these represent
DD content, not structure. However, cursory reviews have been
performed in conjunction with reviews of the ITS subsystem documents
noted below, to ensure that the varying nature of ITS data is
taken into account when defining the meta structures for the ITS
DD standard.]
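Note: The Transaction; Data Flow; Data Element structure described above can be sketched roughly as nested records. This is only an illustrative model, not the actual TEAMWORK/CDIF schema; all class names, field names and sample values below are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    name: str
    definition: str


@dataclass
class DataFlow:
    name: str
    source_process: str
    sink_process: str
    elements: List[DataElement] = field(default_factory=list)


@dataclass
class Transaction:
    name: str
    flows: List[DataFlow] = field(default_factory=list)

    def element_names(self) -> List[str]:
        """Distinct data element names used by this transaction, in first-seen order."""
        seen: List[str] = []
        for flow in self.flows:
            for element in flow.elements:
                if element.name not in seen:
                    seen.append(element.name)
        return seen


# Hypothetical content, for illustration only (not taken from the ITS DD).
vehicle_id = DataElement("vehicle_id", "Identifier of a vehicle")
timestamp = DataElement("timestamp", "Time at which the data was captured")
request = DataFlow("probe_request", "Process A", "Process B", [vehicle_id])
reply = DataFlow("probe_reply", "Process B", "Process A", [vehicle_id, timestamp])
probe = Transaction("probe_exchange", [request, reply])

print(probe.element_names())
```

A DD entry shared by two or more processes shows up here as one DataElement instance referenced from several data flows.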
1.4 Standards Development
Plan (SDP) [primary document is SDP.DOC].
This document provides a good
background on the areas for standardization within ITS and the
purpose and benefits of doing standardization.
"3. A review of the Logical
and Physical Architecture data dictionaries identify elements
that span multiple physical interfaces and services. These elements
are candidates for foundational standards."
"The approach that has garnered
the most attention and analysis to date is to review each of the
subsystem interfaces defined by the Physical Architecture as potential
candidates for standardization."
"To provide separable sets
of data elements for use by standards organizations, a set of
standards packages has been developed. Eleven packages have been
defined that cover most near-term applications...These packages
select specific interfaces from the architecture that have common
data elements. For example, map databases and location information
are used in multiple places in the architecture. One would expect
that they would all conform to the same standard. A standards package
focuses on these data elements wherever they appear in the architecture."
Standards Requirements Packages
[see 1.5 below] have been produced as part of the architecture
as input to the standardization process. The SRPs contain data
dictionary element (DDE) definitions and sizes. These are described
as follows:
"Abstracts that describe
the composite and primitive logical architecture data elements.
This is a brief description of each data item and its use. For
a more in depth examination of a data flow and the functions that
use it, it is necessary to refer to the logical architecture documentation."
Several relevant standards (existing
and in-work) are noted on pages 31-63, including the ITS subsystems
to which they apply. Page 32 notes several data dictionary (content)-related
standards projects.
Towards the end of the document
is a table of standards projects, noting the umbrella ITS DD and
MST as being of high-priority and early need.
1.5 Standard Requirements
Package(s) (SRPs) [SRP1 to 11.DOC, plus two other documents]
These documents are basically
a repackaging of material extracted from the ITS architecture
and arranged into logical groupings for standardization of short-term,
high priority areas of the ITS. From the perspective of the DD
and MST standards projects, little or no new information is added
beyond the architecture documents (primarily the logical architecture,
which is of most interest to the DD and MST projects). Data dictionary
element (DDE) definitions are included in the SRPs. These obviously
deal with DD and MST content issues, and can complement the logical
architecture in terms of providing a sense of the nature of the
DD/MST data elements and the nature of subsystem interchange,
and perhaps as a source for examples for the guidelines.
1.6 Traceability Tables from Logical to Physical Architecture [TRACE.DOC].
Note: This is not really relevant,
as physical architecture is not an issue here, and I did not download
any documents related directly to the physical architecture.
2. Commercial Vehicle Information
Systems and Networks (CVISN) Data Dictionary preliminary, 29 February
1996 - (1 hardcopy). Note: This is the one completed ITS subsystem
data dictionary project.
Typical EAR (per clause 1.4.3
on section 1 page 10), supplemented with data flows (and therefore
some level of processes/functions). The data flows have attributes,
too. Note: Does the standard for ITS DDs need to allow mapping
of CS data concepts (within an IS) to particular application systems/applications,
not just processes (the processes are CS and are currently the
subject of the data flows)? The relationships in the data model
have the full suite of cardinalities. Emphasis is on the attributes
(i.e., data elements), which they define as consisting of "a
definition and the characteristics of the data." There are
also codes representing types of valid contents (i.e., 'system
tables' of types as values for a data element such as "Accident
Parameters", which would include a one or two character code
for the accident type). The tables are stated as having attributes,
too. Naming standards were applied to data elements. Basic data
administration meta data such as source, format, synonyms, etc. are kept.
The data element description is a text block. Popkin Software's
CASE tool (System Architect) was used. Composites are not addressed, only
atomic data elements. The DD notes, using asterisks, whether
a data element is a direct/exact use of X12 or D20, versus a specialization/modification/extension
(but does not define the nature of the specialization, etc. if it
is not an exact usage).
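Note: The attribute-centered entries described above (a definition plus characteristics, administrative meta data, and an optional table of valid codes) might look roughly like the following. This is a sketch under assumptions: the class names, field names, and the accident type codes are hypothetical and are not taken from the CVISN DD.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CodeTable:
    """A 'system table' of valid coded values for a data element."""
    name: str
    codes: Dict[str, str]  # code -> meaning


@dataclass
class AtomicDataElement:
    name: str
    definition: str          # free-text block, as in the reviewed DD
    data_format: str
    source: str
    synonyms: List[str] = field(default_factory=list)
    code_table: Optional[CodeTable] = None

    def is_valid(self, value: str) -> bool:
        """Accept any value when no code table applies, else only listed codes."""
        if self.code_table is None:
            return True
        return value in self.code_table.codes


# Hypothetical codes, for illustration only.
accident_types = CodeTable(
    "accident_type_codes",
    {"01": "Collision with vehicle", "02": "Rollover"},
)
accident_type = AtomicDataElement(
    name="accident_type",
    definition="Coded type of the accident.",
    data_format="one or two characters",
    source="hypothetical source system",
    synonyms=["crash_type"],
    code_table=accident_types,
)

print(accident_type.is_valid("02"), accident_type.is_valid("99"))
```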
The DD contents are voluminous.
There are a lot of synonyms, as well as 'repackaging' of the same
basic information into slightly different data structures. There
are some confusing entries. For example: CV_border_clearance_data
is stated as having the purpose of conveying identification numbers
for the carrier, vehicle, driver and manifest to the roadside.
However, the listing of data elements/attributes for this data
structure does not contain this information.
Since it is based on application
systems and reports and the like, one can speculate that it reflects
more of the flavor of COBOL copylib layouts than a true logical,
or even integrated physical, database structure.
Data structures and data flows
have a listing of data elements separated by commas or pluses
at the beginning, followed by an alphabetical listing of the data
elements as attributes on the data structures/data flows. I assume
the sequence of the comma-delimited list is significant: do these
map directly to standard message sets (at least for the data flows)?
The data models contain entities
with attributes just like the data structures/data flows only
without the separate delimited versus alpha listings (just a sequential
listing with one data element per line). These appear to be 'chunkier',
more generic entities versus the packages of more specific data
implied by the data structures and data flows.
3. Advanced Public Transportation
Systems (APTS) documents:
3.1 Advanced Public Transportation Systems (APTS): Evaluation Guidelines (softcopy)
Note: This document is not particularly
relevant. It deals mostly with surveying and test specification,
but provides some useful background on APTS, and its discussion of APTS
data gives a flavor of the nature of the data and provides a potential
source of examples from the APTS domain.
3.2 Federal Transit Administration
(FTA) - National Transit Geographical Information Systems (GIS)
Manual (softcopy), includes Spatial Data Features Definitions
as precursor to an APTS data dictionary
This document is a good source
for understanding the nature of spatial data that must be able
to be described (at the meta level) and managed in various ITS
DDs (although this particular application is for transit use specifically).
It is based on the FGDC standard.
"Metadata describe the content,
quality, contacts, conditions and other characteristics of a data
set. Many information specialist say that in the computer age,
data is not power, "metadata" is. More than any other
standard, the metadata provides a "roadmap" to information
in a data set. The metadata provides information on the organization
of, maintenance of and investment in data, data catalogs, access
paths, and data transfer. A metadata document on geospatial data
helps people who use geospatial data find the data they need and
determine how best to use that data. To this end, the Federal
Geographic Data Committee (FGDC) developed a Content Standard
for Digital Geospatial Metadata to facilitate access to data inventoried
in the National Geospatial Data Clearinghouse. This standard provides
a format to catalog information about geospatial data sets. Appendix
D contains an adaptation of the Content Standard for Digital Transit
Geospatial Metadata reflecting transit domain considerations.
The appendix describes and explains elements of the FGDC Metadata
standard, and augments the FGDC version with guidelines for transit
features, themes and databases. Metadata documents created for
each data set in the NTG will advance appropriate use of the data,
and those created for data sets submitted to the NTG will improve
the data's integration with the NTG."
It deals mostly with the schemas
required to represent and interchange spatial/GIS data (i.e.,
it is primarily content-related as far as the DD/MST goes), but
also deals to some degree with the meta structures used for those
schemas.
"Referential Integrity refers
to the accuracy, validity or correctness of the data in meeting
the constraints and rules defined by the internal structure and
content of the data. These rules apply to multiple levels of data
and data base management systems. Tabular data are constrained
by data domain and type rules, and data base management system
rules. The relational database model must be normalized, and referential
integrity constraints ensure that a unique primary key exists
for each table and no foreign key is unmatched to a primary key.
General integrity constraints associated with geographical data
relate to data quality, such as ensuring fundamental relationships
in the graph structure. For example, the sum of the degrees of
the graph vertices equals twice the number of edges, where the
vertex is a node and the edge is the line between two nodes."
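Note: The integrity rules quoted above can be checked mechanically. The following is a minimal sketch, assuming tabular keys are given as plain lists and that each node record stores its own degree; the function names and sample data are hypothetical and are not part of the FTA manual.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def has_unique_primary_keys(keys: Iterable[Hashable]) -> bool:
    """True when no primary key value occurs more than once."""
    keys = list(keys)
    return len(keys) == len(set(keys))


def unmatched_foreign_keys(primary_keys: Iterable[Hashable],
                           foreign_keys: Iterable[Hashable]) -> List[Hashable]:
    """Foreign key values that do not match any primary key."""
    known = set(primary_keys)
    return [fk for fk in foreign_keys if fk not in known]


def degree_sum_is_consistent(recorded_degrees: Dict[str, int],
                             edges: List[Tuple[str, str]]) -> bool:
    """Check that the recorded node degrees sum to twice the number of edges."""
    return sum(recorded_degrees.values()) == 2 * len(edges)


# Hypothetical data: three stops, and route records that reference stops.
stop_ids = [1, 2, 3]
route_stop_refs = [1, 3, 4]
print(has_unique_primary_keys(stop_ids))
print(unmatched_foreign_keys(stop_ids, route_stop_refs))

# Hypothetical graph: A - B - C, with the degree stored on each node.
degrees = {"A": 1, "B": 2, "C": 1}
print(degree_sum_is_consistent(degrees, [("A", "B"), ("B", "C")]))
print(degree_sum_is_consistent(degrees, [("A", "B")]))  # an edge record is missing
```

The degree check only detects that node and edge records disagree; it does not say which record is wrong.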
Requirements to support transaction
specifications are also discussed.
3.3 APTS Map Database User
Requirements Specification, includes Spatial Features and
Feature Types (hardcopy)
This document gives a good flavor
for the disparity of data to be described and managed in a data
dictionary even just within public transit. There is financial
data like fare collection, and geospatial data like "where
in the world is the bus stop I'm looking for." The geospatial
data raises some questions about what level(s) of this they expect
to put in (various) ITS data dictionaries, and whether that affects
the meta-data structures and relations they need. They talk for
example (on page 5) about a Geographic Layer with Spatial Object
that has examples: Point, Piece, Polygon, Path and Plane; a Transit
Concept Layer that has Spatial Feature with examples: Access Point,
Segment, Transit Route; and a Transit Application Layer that has
Included Term: Bus Stop, Fixed Route, Bicycle Path. They address
Functions, as well as Features (which are entities or objects
in everybody else's terminology) and Attributes, and the mapping
between Functions and Features+Attributes.
3.4 Final Draft White Paper
on The National ITS System Architecture: Data Dictionary for Transit
Use (hardcopy).
This white paper was written
by Sandia, and essentially states that the overall ITS data dictionary
portion of the ITS architecture did not take into account their early
work on transit-related data, and does not meet the 'requirements'
for APTS. They are very implementation-oriented, and appear to
want common definitions and objects that they can easily reuse
during actual physical system development and implementation,
particularly to support real-time messaging for PT fare collection,
locationing, etc. They complain that the ITS DD (which is part
of the ITS Logical Architecture) addresses only portions of the
'technical definition' of data [elements] (such as size, data
rate, etc.) and nothing in the area of how data are physically
transferred (media, protocol, frequency bands, software standards).
It is apparent from looking at
the logical-to-physical comparison for transit-related data elements
that there are major differences. One difference they point out
is genericity versus specialization: they want common, general
data elements at the physical level, whereas the logical architecture
contains all sorts of specializations of data elements to reflect
their specific use in various data flows (which correspond to
detailed transactions to support particular services). There are
also naming convention problems (see page 14 for an example),
and there are cases where at least some of the logical data elements
that are mapped to a single, common physical data element don't
appear to be a good (or at least intuitive) match at all.
4. Traffic Management Data Dictionary (TMDD)
Note: The first internal draft
of the actual TMDD is due in January, but may not be available
in 'public form' until March. In lieu of this material, the following
TMDD-related documents in existence at this time were reviewed:
4.1 Strategic Plan for a Data
Dictionary for Automated Traffic Management Systems (ATMS)
- Some relevant excerpts from the Strategic Plan:
"One of the most important
aspects when designing and implementing systems is the uniform
understanding of the terminology being used. A data dictionary
provides a unique identification and description of the data elements
used in the transmission and communication of messages between
computer systems. For each data element there normally is a description,
size estimate, and listing and description of its critical attributes.
Some dictionaries will include other features such as description
of origin, timing requirements, and valid entries for the data
element.
Perhaps surprisingly, there is
some inconsistency as to the name of an individual data dictionary
entry and at what information level it exists. At various times,
these entries are referred to as data flows, messages, message
sets, tables, data elements, and/or objects. It is commonly understood
that a data dictionary holds data elements, but what appears to
not be always understood is at what level these elements are being
defined. This confusion has been recognized by others and Allan
Kirson is preparing an excellent paper defining terminology and
hierarchical structure for this area of message exchange. For
this report, a data dictionary entry will be called a data element
and will normally represent the smallest variable data unit definable.
That is, a data element will normally consist at the level in
a message where it cannot be further decomposed or subdivided.
The TMDD does not necessarily
seek to identify data terms used only internally within a proprietary
system or its database. Also, it is important to emphasize a data
dictionary in itself does not seek to determine a database design,
file structure or any method of internal storage in a system.
Listing of Data Element Description
for Traffic Records Systems - ANSI D20.1
* Name
* Short Name
* Abbreviation
* Definition
* Sources
* Uses
* Type of Data Element (Basic or Composite)
* Type of Representation (Name, Abbreviation, Code, Numeric Value)
* Type of Character(s) (Numeric, Alphanumeric, Alphabetic, Special)
* Length (Fixed or Variable and Number of Characters)
* Synonyms
* Other Characteristics
* Source of Data Representation
* Description of Data Items (Name of Item, Abbreviation, Code,
Definition)
The information in Table 1 is
an example of the structure of a data dictionary. It again demonstrates
that a data dictionary can generally be described as having the
following features:
* A definition of the term similar to a glossary.
* A listing of the various attributes or properties of the particular data element.
* A definition of how attributes
are coded and the length or size property of the coded data element."
Note: This is basic EAR, with
the attributes being largely extensional and most of the semantics,
including relationships and typing buried in the text-based definition.
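Note: The ANSI D20.1 descriptor list quoted above maps naturally onto a flat record. The sketch below is one possible reading of that list, not a rendering of the standard itself; the field names, the way length is split into two fields, and the sample element are assumptions. It also illustrates the point made in the note above: relationships and most semantics still sit in the free-text definition.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List


class ElementType(Enum):
    BASIC = "basic"
    COMPOSITE = "composite"


class Representation(Enum):
    NAME = "name"
    ABBREVIATION = "abbreviation"
    CODE = "code"
    NUMERIC_VALUE = "numeric value"


class CharacterType(Enum):
    NUMERIC = "numeric"
    ALPHANUMERIC = "alphanumeric"
    ALPHABETIC = "alphabetic"
    SPECIAL = "special"


@dataclass
class DataItem:
    name: str
    abbreviation: str
    code: str
    definition: str


@dataclass
class DataElementDescription:
    name: str
    short_name: str
    abbreviation: str
    definition: str                 # free text; most semantics live here
    sources: List[str]
    uses: List[str]
    element_type: ElementType
    representation: Representation
    character_type: CharacterType
    fixed_length: bool
    length: int
    synonyms: List[str] = field(default_factory=list)
    other_characteristics: str = ""
    representation_source: str = ""
    data_items: List[DataItem] = field(default_factory=list)

    def missing_descriptors(self) -> List[str]:
        """Names of descriptors left empty (empty string, empty list or None)."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in ("", [], None)]


# Hypothetical entry, for illustration only.
body_type = DataElementDescription(
    name="Vehicle Body Type", short_name="Body Type", abbreviation="VBT",
    definition="General configuration of the vehicle body.",
    sources=["registration record"], uses=["traffic records"],
    element_type=ElementType.BASIC, representation=Representation.CODE,
    character_type=CharacterType.ALPHABETIC, fixed_length=True, length=2,
    synonyms=["body style"],
    data_items=[DataItem("Bus", "BUS", "BU", "Passenger bus")],
)
print(body_type.missing_descriptors())
```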
"Focusing on purer forms
of TMDD's finds they exist primarily for specific traffic management
systems which have been designed and developed by major traffic
systems firms. Specific examples were examined in visits to PB
Farradyne, Loral AeroSys and JHK & Associates. These individual
data dictionaries were developed to support the data flow with
their system database. They generally are organized in a hierarchy
consisting of a set of tables for some function, with the tables
made up of individual definitions. In database terminology this
structured definition is called a schema. In this structure, one
set of tables could, for example, be grouped under "Device
Modules" with individual tables prepared for detector stations,
ramp control, camera control, etc. To get a sense of size, the
PB Farradyne Mist System contains over 130 individual tables with
each table containing from as few as two defined attributes to
as many as 20. An example of the table for LINK DEF is shown in
Figure 2 and is one of the longer tables with 19 entries defining
specific attributes. In addition to the exact attribute name and
its description (definition), each attribute is described by its
data type which is based on a structured format chosen by the
system designer. Examination of the data dictionary or schema
for other systems shows a similar structure consisting of tables
for specific message sets. For example, the schema for the Loral
AeroSys system database also includes a table for LINK but is
significantly larger as it contains 32 attributes which are defined.
Inspection will show, however, part of the increase is due to
the system designers including environmental information in their
table. (Figure 3). This comparison of how similar terms are handled
by different system designers provides insight into the specific
work that must be performed during the TMDD development. First,
a common set of data elements must be identified; second, the
specific definition of each data element must be established;
and then, third, a table or list of the necessary attributes must
be established. This demonstrates that consolidation and coordination
of terminology and even system structure will be a considerable
part of the TMDD development activity."
Note: This brings up the need
to clearly allow for ES, IS and CS in the DD, but to stress that
physical implementation (IS) should not be the primary issue except
in cases of very static, hardwired interchange among homogeneous
systems. For performance, this will likely need to be the case,
for example, for real-time messaging. But you don't get information
interchange unless the CS (read, the semantics) are interchanged
between the systems, too.
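Note: The first consolidation step quoted above (identifying a common set of data elements across vendor schemas) amounts to comparing attribute lists. The sketch below assumes each LINK table is reduced to a set of attribute names; the names shown are hypothetical and are not taken from the actual vendor schemas. Matching on names alone is only a starting point, since the same name can carry different definitions in two systems.

```python
from typing import Dict, Iterable, List


def compare_tables(table_a: Iterable[str], table_b: Iterable[str]) -> Dict[str, List[str]]:
    """Split attribute names into shared ones and those unique to each table."""
    a, b = set(table_a), set(table_b)
    return {
        "common": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Hypothetical attribute names for two vendors' LINK tables.
link_system_a = ["link_id", "length", "lane_count", "speed_limit"]
link_system_b = ["link_id", "length", "lane_count", "pavement_temperature", "visibility"]

result = compare_tables(link_system_a, link_system_b)
print(result["common"])
print(result["only_b"])
```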
"The ITS data dictionary
developed in the NA program is substantial and consists of approximately
2500 entries. A rough estimate is that approximately 600 of these
entries are related to ATMS."
Note: The messaging format used
is based on the OSI seven-layer architecture, and uses ASN.1,
as noted below:
"The STMF incorporates
a structure based on a standardized procedure for structuring
message formats of various communications protocols known as Abstract
Syntax Notation One (ASN.1). Basically, it defines data terms
which it calls 'Objects' in five fields as follows:
Object name - A textual name
and an identifier for the object type.
Syntax - The abstract syntax
of the object, i.e. how it is built.
Definition - A textual description
of the meaning of the object type.
Access - The object can be read-only, read-write, write-only, or not
accessible.
Status - Support is either mandatory,
optional or obsolete."
Note: Again the semantics are
essentially only a text-string within a message.
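Note: The five-field object description quoted above can be sketched as a small record type. This is an illustration only, not the STMF definition; the class names, the split of the object name into a textual name and an identifier, and the sample object (including its placeholder identifier) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"
    WRITE_ONLY = "write-only"
    NOT_ACCESSIBLE = "not-accessible"


class Status(Enum):
    MANDATORY = "mandatory"
    OPTIONAL = "optional"
    OBSOLETE = "obsolete"


@dataclass(frozen=True)
class ObjectDefinition:
    name: str          # textual name of the object type
    identifier: str    # identifier for the object type
    syntax: str        # abstract syntax, i.e. how the object is built
    definition: str    # free-text meaning; the only carrier of semantics
    access: Access
    status: Status

    def can_write(self) -> bool:
        return self.access in (Access.READ_WRITE, Access.WRITE_ONLY)

    def can_read(self) -> bool:
        return self.access in (Access.READ_ONLY, Access.READ_WRITE)


# Hypothetical object; the identifier is a placeholder, not a registered value.
ramp_rate = ObjectDefinition(
    name="rampMeterRate",
    identifier="1.3.9999.1",
    syntax="INTEGER (0..1800)",
    definition="Metering rate of the ramp in vehicles per hour.",
    access=Access.READ_WRITE,
    status=Status.OPTIONAL,
)
print(ramp_rate.can_read(), ramp_rate.can_write())
```

Everything except the definition field can be machine-checked; the meaning itself stays in the text string, which is the limitation noted above.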
4.2 Summary document (softcopy)
- Brief background.
4.3 The TMDD Prototype: An Analysis of ISO 11179 for the Development of ITS Standard Data Dictionaries (dated January 27, 1997)
Note: Refer also to the analysis
of the IS 11179 standard documented in the companion notes on
existing and emerging standards.
5. Data for Decision Requirements
for Transportation Systems (softcopy)
The need for intermodal data
analysis indicated in this report is a general driver for providing
a DD standard that would enable information interchange. That
interchange needn't be limited to intra-ITS systems, but could also
extend to the systems of related external organizations
such as DoE, EPA, DoD, DoA and the Census Bureau.
"What is lacking is a systemwide
framework and capacity to integrate and compare data on a more
consistent basis over time to track system performance and determine
where the transportation system is headed."
It was noted that more 'demand-oriented
data' is needed to indicate areas of weakness and where to focus
investments.
"An effective data support
system has two essential components. First, the data should be
organized in a framework keyed to the broad subject areas of interest.
Second, analytic capability is critical to ensure that the data
are translated into information that is useful for policy analysis.
The latter is particularly important for understanding qualitative
changes that are not readily measured or, if they do appear in
time series data, are reflected too late for policy makers to
take action."
"The time and cost of collecting
and integrating data, as well as the need for systematic and reliable
monitoring over time, work against constant modification of data
bases. Thus, NTPMS is best structured not by issues, which tend
to be transient, but by major attributes of the transportation
system, which fall into four broad categories - supply, demand,
performance, and impacts."
"At a minimum, a brief description
of the data items, their sources, and methods of collection should
be provided. A summary of key trends and changes in trends would
also be appropriate, as would a discussion of the quality and
limits of the data."
"The biggest gap in DOT's
multimodal data programs is in flow data. Flow data refer to information
on passenger and freight volumes from origin to final destination
by trip purpose, distance, mode, and passenger and freight characteristics."
"The Statement of National
Transportation Policy identified safety as the top departmental
priority (DOT 1990, 7), yet the data to monitor the safety and
security of the system across all transportation modes are inadequate."
Note: This is due to different
levels of details, different measures, lack of correlation with
volume (e.g., flow) statistics, etc. Things like performance versus
condition reporting are also made difficult because of a lack
of correlation and because of differing underlying assumptions
used in statistical models. Are these assumptions ever part of
a DD, or are they strictly part of stand-alone statistical analysis
and simulation systems? If data warehousing is a technology to
be used in the future, ITS DDs will need to have such information.
It was noted that a lot of the
data envisioned as being collected by in-vehicle systems and related
traffic control systems is intended for systems management. The
DoT would also like to have access to this information for broader
statistical analysis purposes. Use of data collected by the private
sector is also desired.
Appendices provide a listing
of existing databases, data programs and reports within the agency.
6. NATO Industrial Advisory
Group, Subgroup 52 (NIAG/SG52), Allied Naval Engineering Publication
(ANEP)-51, NATO Naval Combat System Information Catalogue, Volume
0 - Introduction and Volume 4 - Message Construction Standard
At the level of Volume 0, there
appear to be many similarities in the objectives and overall
structuring of approaches between ANEP and the ITS DD standard
and guidelines and MST standard. For example, the notions of 'formally
and unambiguously' defining data structures and data elements
and the [generic] messages they get packaged into are found in
both projects. ASN.1 is the expression mechanism, with MS Access
serving at least as an interim analytical tool, if not a limited
data dictionary or registry [Note: Vol. 0 talks about standardization
of data elements and messages, particularly looking towards genericity
to reduce unnecessary differences and hence reduce ambiguity].
There are also some obvious differences in terms of the breadth
and general nature of the data being interchanged; ANEP data might
well be fairly analogous to the emergency and traffic management
data within the ITS, but doesn't deal with commercial/business
data elements, financial data feeds, etc. that are more complex
and don't map as directly to the low-level basic data types of
ASN.1 (e.g., Integer, Real, String, etc.).
Based on reviewing Volume 0,
the most relevant detailed document to review appeared to be
Volume 4: Message Construction Standard. This was stated as providing
a "formal and unambiguous syntax for the definition of messages
and data structures (components of messages)". Some of the
basic data elements are also stated as being 'formally defined'.
However, Volume 4 itself makes it clear that 'formally
defined' is really only 'formally syntactically defined'. The
basic data types are really only the eight most primitive ASN.1
data types (i.e., they don't really even make use of the full
set of ASN.1 basic types). Additionally, while ASN.1 may be formal
and unambiguous in its syntax (since the syntax and at least aspects
of the associated grammar are defined in specifications that are
well-controlled standards documents), that does not in any way
mean that the [semantic] definitions one specifies using ASN.1
are necessarily formal or unambiguous. That would be presuming
the nature of content based on the nature of its form. "C"
is a formal and unambiguously defined programming language, but
that does not mean someone cannot write bad or completely nonsensical
programs in it, and while they may compile and execute, they don't
produce any meaningful output. Saying, for example, that Message
37, Incident, is a set with members Incident Type (VisibleString),
Date (VisibleString -- YYYYMMDD), ShipNumber (Integer), NatureOfIncident
(VisibleString) and Report (Sequence of Events (VisibleString))
doesn't tell a human user explicitly about what an incident is,
what a ship is, what's an acceptable description of the nature
of an incident, what constitutes an event versus a part of an
event or a description of an event. A message of this nature can
be parsed and read into a receiving application, but unless that
application already 'knows' what those data elements mean and
has the same expected meanings for them as the sending application,
then there is no assurance of nonambiguity. Given that, appropriate
questions to ask would include: do these standards at least include
a set of procedures for defining data elements in a consistent
manner using, for example, structured English; do they have a
set of real 'building blocks'? In this case, the answers
seem to be 'no', and that is apparently sufficient in this case,
as the messages are simple enough and the data structures and
data elements are at a low enough level that a direct mapping
to ASN.1 basic data types provides recipients of the message all
that is necessary (in this case, syntax) to act upon the message
or otherwise make use of it in the receiving application's context.
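Note: The gap between syntactic and semantic definition can be shown with a small sketch of the Message 37 example. It uses plain Python rather than an ASN.1 toolkit; the class, the checking functions and the sample values are assumptions made for illustration. VisibleString is approximated here as printable ASCII including space, and the YYYYMMDD remark is treated as a comment that the syntax does not enforce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    incident_type: str        # VisibleString
    date: str                 # VisibleString -- YYYYMMDD (comment only)
    ship_number: int          # Integer
    nature_of_incident: str   # VisibleString
    report: List[str]         # Sequence of Events (VisibleString)


def is_visible_string(value) -> bool:
    """Approximate VisibleString: printable ASCII characters and space."""
    return isinstance(value, str) and all(0x20 <= ord(ch) <= 0x7E for ch in value)


def is_syntactically_valid(msg: Incident) -> bool:
    """Check member types only; nothing here looks at meaning."""
    return (
        is_visible_string(msg.incident_type)
        and is_visible_string(msg.date)
        and isinstance(msg.ship_number, int)
        and not isinstance(msg.ship_number, bool)
        and is_visible_string(msg.nature_of_incident)
        and isinstance(msg.report, list)
        and all(is_visible_string(event) for event in msg.report)
    )


# Hypothetical messages: one plausible, one meaningless.
plausible = Incident("COLLISION", "19970127", 12, "Minor hull damage",
                     ["Contact with pier", "Damage inspected"])
nonsense = Incident("blue", "yesterday", -5, "??", ["", ""])

print(is_syntactically_valid(plausible))
print(is_syntactically_valid(nonsense))
```

Both messages pass the syntactic check, so a receiver learns nothing about whether the sender shares its understanding of an incident, a ship or an event.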