Introduction to and Overview of the CDF Project
This page presents an introduction to the P1622 common data format project. It starts with a discussion of the proprietary nature of current election equipment data formats and the resultant problems this creates, and then discusses the advantages to using a common data format and how this improves efficiency, reduces cost, and increases the overall transparency of election equipment and, ultimately, elections. The discussion concludes with a high-level overview of the P1622 CDF format and project plan for accomplishing a comprehensive CDF standard.
- Why the Need for a CDF for Election Equipment?
- Advantages to Using a CDF
- P1622 CDF Example
- P1622's Plan for Building CDF Capability
Why the Need for a CDF for Election Equipment?
The need for a common data format is analogous to the need for a common language that lets people and economies share the best of their ideas, products, and services. A language used exclusively by a few isolates its speakers from what the rest of the world has to offer. As the demand for and use of technology in elections increase, election officials are using a variety of new products that must be able to talk directly with each other (i.e., share data) or with a common host in order to be integrated into the overall election administration process. Since the "data language" used by these products tends to be proprietary and doesn’t communicate with products from another manufacturer, election officials can be limited to the voting system product line available from the manufacturer they already have a relationship with.
In the elections marketplace, this has several disadvantages:
- Election officials can be "locked" into a single manufacturer’s product line by decisions made years ago in their jurisdiction when the needs were different. The cost of converting the jurisdiction’s entire product line to another manufacturer, and the top-to-bottom change in procedures this often requires, could be prohibitive.
- Election officials may not have an opportunity to shop for the most appropriate product to meet their needs if they are limited to using only the products offered by their current manufacturer.
- Smaller companies that might focus on a single product can be locked out because of the lack of interoperability with manufacturer product lines.
- The addition of newer technologies such as online blank ballot distribution or tablets is complicated because each project must either define its own data format or use a format that is proprietary to an existing manufacturer.
Without a CDF, duplication of effort, greater risk of getting things wrong, and re-invention of the wheel can occur. Several examples suffice:
A jurisdiction’s voter registration system and candidate filing system may both contain the information needed by its ballot layout system when it comes time to create a new election. These are actually "families" of equipment, and due to a lack of a CDF between the two families, jurisdictions that import/export information between them have commonly had to expend their own resources to create a sort of "translation service" between the two that is unique to that jurisdiction’s situation. Or, some jurisdictions may have to duplicate their data entry by re-creating the information in the ballot layout system more or less manually. Obviously, a CDF between these families of equipment would largely eliminate this duplication of effort and reduce the potential for mistakes.
During an election, vote-capture devices such as DREs, POS, and CCOS (Direct-Record Electronic, Precinct Optical Scanner, and Central-Count Optical Scanner, respectively) communicate the votes they gather to a common EMS (Election Management System), where the votes are tabulated and reported. Before the election, the EMS communicates to these devices a complex set of ballot configurations that define the ballot styles for the votes each device will gather. Typically, the EMS that creates the ballot styles and communicates them to the devices is the same system that receives the votes gathered by the devices. Without a common data format, and given that the communication protocol, data structure, and data elements are unique to each manufacturer, the DRE, POS, and CCOS in a jurisdiction will generally be products developed, marketed, and/or integrated by the same manufacturer that created the EMS. This scenario, combined with the lack of a CDF, creates a number of issues:
- If a jurisdiction wants to use a different DRE, or POS, or CCOS than the one provided by its manufacturer, it may be very costly to integrate the product.
- The same issue occurs with EMSs; a state must use the same manufacturer throughout the state or else must import/export tabulated results to intermediate formats.
A number of states, motivated by the MOVE (Military and Overseas Voter Empowerment) Act's requirements and the need to deliver blank ballots to voters located overseas in ample time for the election, are now fielding blank ballot distribution systems (BBDS) to allow voters to download their respective ballots; some states are also experimenting with using tablets (e.g., the iPad) to assist in blank ballot delivery. These technologies each need to "speak" to other parts of the voting system; for example, the BBDS needs to access ballot information stored in the EMS and voter/precinct information stored in the VRDB (Voter Registration Database). However, without a common data format, the data format either needs to be invented for that particular project, or it will be the proprietary format of one of the manufacturer partners, e.g., the data format used by the EMS. The latter typically requires a business arrangement between the project developer and the owner of the proprietary format.
Advantages to Using a CDF
There are numerous advantages to using a common data format in election equipment and associated software and systems. Perhaps the most important is that it makes the devices easier to use, deploy, and understand. Ultimately, it can make them less complex to administer, and the resultant reduced complexity may then lead to greater trust in the voting devices. Thus, a CDF is foundational in a number of ways for improving current voting systems and for making it possible to develop new voting technologies in an efficient, orderly way. It follows from the previous discussion that use of a CDF brings the following advantages:
1. Anyone can build or sell a device; no manufacturer gets locked out of the market
Support of a CDF in manufacturer equipment results in interoperability of data format and permits new manufacturers to sell equipment to states or jurisdictions where they were formerly locked out. It allows small manufacturers to build one-off devices as opposed to having to build a complete suite of products.
2. Election officials are empowered to buy whatever devices best suit their needs
When there is interoperability, election officials can then shop for the devices that best suit the needs of the voters, regardless of manufacturer. If, for example, a particular accessible voting device is deemed better for a particular state, election officials can now use this system regardless of who has manufactured it.
3. Software and new system developers can write applications that make use of the CDF
The P1622 CDF is a freely available open standard, and developers/integrators of new equipment and software can use it to interface with other manufacturers' equipment. This prevents the continual "re-invention of the wheel" that occurs when new systems must develop their own format.
4. Elections can be audited and analyzed and archived more easily
Voting devices store a number of data elements that are important and useful for election audit and analysis, e.g., event logs, but these items are sometimes not easily accessible due to their proprietary formats. When a CDF export format is provided for a class of voting devices, manufacturers can then build in the export capability for these elements. In other words, "Build it and they will come."
5. Device certification is possible
The EAC certifies voting systems, that is, complete systems of devices to run an election. Certifications can be as expensive as $2 to $4 million and may take several years. If a state wishes to use a new device in a certified voting system, it may "break" the certification because of the resultant changes that would need to occur in order to make the new device operate with the other devices in the system. Thus, states will avoid having to re-certify as much as possible. With interoperability, however, a device itself could be certified and used in an existing voting system without breaking the certification.
6. Voting equipment testing is easier with common formats and imports/exports
When devices have a common import/export format capability, tests can be made more uniform and devices can more easily be tested against common collections of data. Outputs from devices can be analyzed with more consistency.
7. The transparency of the equipment is greater, ultimately, and more trust of the equipment is possible
This last point emphasizes that a voting device becomes more "transparent" when it is possible to export all of its data and more easily analyze its workings. This doesn't necessarily mean that a device incorporating CDF capability is automatically more secure, but it does indicate that the device is less complex to analyze and understand, and this can therefore lead towards greater trust. Reduced complexity and greater transparency are especially important when dealing with voting devices that are connected to the Internet, e.g., BBDS.
P1622 CDF Example
P1622 voted at a meeting in February 2011 to use the Organization for the Advancement of Structured Information Standards' (OASIS) EML (Election Markup Language) international standard as a basis for its common data format development, with a goal of feeding changes back into the OASIS EML standards effort. EML has been used in parts of the United States for a number of years, notably for statewide election results reporting by California and Virginia, as well as for other aspects of elections operations in other states. It has been used more widely in various parts of Europe and in Australia and is a requirement in the Council of Europe's voting recommendations for e-voting. It is a large collection of XML schemas that address virtually every aspect of data interchange for elections operations. At the March 2011 meeting, the U.S. manufacturers present unanimously voted for use of EML, to some extent because the election equipment market has become international and requests for EML capability have been increasing.
Below we present an example of a P1622-modified EML file, the EML "520" file used for election results reporting and, among other applications, state "roll-ups" on Election Day (note: the most current version of the OASIS EML 520 may differ from this example in some ways). Briefly, the use case for state roll-ups would involve a U.S. state whose counties (a) aggregate votes reported from precincts in the county and (b) report those aggregated vote totals upward to the state, as follows:
- At each precinct when the polls close, precinct tabulators export an EML 520 file containing aggregated vote totals and transmit the file to the county's central office.
- At the county's central office, the EML 520 files received from the precincts are imported into an EMS, where all precinct vote totals are then tabulated.
- The EMS at the county's central office exports an EML 520 file containing aggregated vote totals from all its precincts and transmits it to the state's central office.
- The state imports the EML 520 files received from each of its counties into an EMS, and then exports a final EML 520 file containing aggregated vote totals from all its counties and makes this available to the public/media. The file contains the vote totals for the entire state, and then shows the vote totals reported by each of its counties.
The snippet below is taken from the final, state-level EML 520 file described in the last step above:
<EventIdentifier Id="2012 November General" />
<!-- Aggregated ballot counts go here for the entire state -->
<!-- Aggregated ballot and vote counts go here for this county -->
<ReportingUnitIdentifier Id="Georgia County">
<ContestIdentifier Id="346" />
<CandidateIdentifier Id="12S" DisplayOrder="1">
This example file is easier to follow if one views <Election>, <ReportingUnits>, <ReportingUnitIdentifier>, and <Contest> as containers for information that are nested hierarchically. <Election> contains the vote totals for this state's "2012 November General" election, <ReportingUnits> contains the vote totals for each county (here, a "reporting unit" corresponds to a county, and the county shown is "Georgia"), and <Contest> contains the vote totals for contest ID 346 (presumably, the county has associated "346" with the contest name/information located in a different file).
The <Selection> container identifies information about one candidate running in contest 346; in this case it's Leo Patrick; he received 17231 votes and he is the winner.
In a more complete version of this example file, there would be additional <Selection> containers for the other candidates in contest ID 346, and there would be additional <Contest> containers for the other contests on the ballot. Lastly, there would be additional <ReportingUnitIdentifier> containers in the file, one each for the other counties in the state.
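The nesting described above can be sketched in Python. This is a minimal illustration under stated assumptions, not schema-valid EML 520: the event name, "Georgia County", contest 346, candidate 12S, and the 17231 votes come from the example above, while the second candidate, the second county, the remaining vote counts, the <ValidVotes> element name, and the function names are invented here for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; NOT schema-valid EML 520.
# From the text: event name, "Georgia County", contest 346, candidate 12S, 17231 votes.
# Invented for illustration: candidate 14T, "Example County", all other vote
# counts, and the ValidVotes element name.
SAMPLE = """\
<Election>
  <EventIdentifier Id="2012 November General" />
  <ReportingUnits>
    <ReportingUnitIdentifier Id="Georgia County">
      <Contest>
        <ContestIdentifier Id="346" />
        <Selection>
          <CandidateIdentifier Id="12S" DisplayOrder="1" />
          <ValidVotes>17231</ValidVotes>
        </Selection>
        <Selection>
          <CandidateIdentifier Id="14T" DisplayOrder="2" />
          <ValidVotes>15000</ValidVotes>
        </Selection>
      </Contest>
    </ReportingUnitIdentifier>
    <ReportingUnitIdentifier Id="Example County">
      <Contest>
        <ContestIdentifier Id="346" />
        <Selection>
          <CandidateIdentifier Id="12S" DisplayOrder="1" />
          <ValidVotes>1000</ValidVotes>
        </Selection>
        <Selection>
          <CandidateIdentifier Id="14T" DisplayOrder="2" />
          <ValidVotes>2000</ValidVotes>
        </Selection>
      </Contest>
    </ReportingUnitIdentifier>
  </ReportingUnits>
</Election>
"""


def totals_by_unit(xml_text):
    """Walk the nested containers; return {county: {(contest_id, candidate_id): votes}}."""
    root = ET.fromstring(xml_text)
    totals = {}
    for unit in root.iter("ReportingUnitIdentifier"):
        unit_totals = totals.setdefault(unit.get("Id"), {})
        for contest in unit.iter("Contest"):
            contest_id = contest.find("ContestIdentifier").get("Id")
            for selection in contest.iter("Selection"):
                candidate_id = selection.find("CandidateIdentifier").get("Id")
                unit_totals[(contest_id, candidate_id)] = int(selection.findtext("ValidVotes"))
    return totals


def roll_up(per_unit):
    """Sum the per-county totals into statewide totals, as in the state roll-up."""
    state = {}
    for unit_totals in per_unit.values():
        for key, votes in unit_totals.items():
            state[key] = state.get(key, 0) + votes
    return state


if __name__ == "__main__":
    per_unit = totals_by_unit(SAMPLE)
    print(per_unit["Georgia County"])
    print(roll_up(per_unit))
```

Because every county reports in the same structure, the state-level roll-up reduces to a simple sum over identical keys, with no per-manufacturer translation step.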
P1622's Plan for Building CDF Capability
Achieving the goal of election device-to-device interoperability, using a CDF in part as the means, is complicated and requires the involvement and cooperation of many parties. Given this, P1622 has developed a project plan that consists of a series of small, focused standards that are limited in scope to "slices" of election data and that are modeled after typical use cases used in system design. This P1622 strategy for use case standard development involves addressing the “easier” aspects of a CDF that involve the least amount of device-to-device interoperability and that have the highest odds of early success, and then subsequently working towards those aspects that are more difficult but achieve more device-to-device interoperability. Addressing these standards in parallel as opposed to serially allows more flexibility and capability to take advantage of external assistance or collaboration with other interested parties or coalitions, most notably the Pew VIP effort.
P1622 has developed a project plan that contains four levels of use case standards, focusing on the endpoints of the voting system at level one and working towards greater aspects of device interoperability at levels 2, 3, and 4:
Level One Use Case Standards
Level One focuses on inputs into the voting system and exports from the voting system and can be regarded as addressing the “low-hanging fruit,” that is, data that does not necessarily require device interoperability and that is relatively straightforward to place in a CDF. The devices addressed are VRDBs and EMSs, both of which generally permit imports/exports in a comma-separated value (CSV) format. Translators could thus be built from CSV to the CDF initially, and manufacturers could later add direct EML import/export capability.
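A CSV-to-CDF translator of the kind mentioned above might look roughly like the following Python sketch. The CSV column names (county, contest_id, candidate_id, votes), the <ValidVotes> element name, the second candidate's row, and the function name are assumptions made for illustration; the output is a simplified nested structure, not schema-valid EML.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical EMS export; the column names and the 14T row are invented.
SAMPLE_CSV = """\
county,contest_id,candidate_id,votes
Georgia County,346,12S,17231
Georgia County,346,14T,15000
"""


def csv_to_results_xml(csv_text, event_id):
    """Translate flat CSV rows into nested county > contest > selection XML."""
    election = ET.Element("Election")
    ET.SubElement(election, "EventIdentifier", Id=event_id)
    units = ET.SubElement(election, "ReportingUnits")
    unit_nodes = {}      # county name -> its container element
    contest_nodes = {}   # (county, contest_id) -> its <Contest> element
    for row in csv.DictReader(io.StringIO(csv_text)):
        county = row["county"]
        unit = unit_nodes.get(county)
        if unit is None:
            unit = ET.SubElement(units, "ReportingUnitIdentifier", Id=county)
            unit_nodes[county] = unit
        key = (county, row["contest_id"])
        contest = contest_nodes.get(key)
        if contest is None:
            contest = ET.SubElement(unit, "Contest")
            ET.SubElement(contest, "ContestIdentifier", Id=row["contest_id"])
            contest_nodes[key] = contest
        selection = ET.SubElement(contest, "Selection")
        ET.SubElement(selection, "CandidateIdentifier", Id=row["candidate_id"])
        # int() rejects malformed counts early instead of copying them through.
        ET.SubElement(selection, "ValidVotes").text = str(int(row["votes"]))
    return election


if __name__ == "__main__":
    root = csv_to_results_xml(SAMPLE_CSV, "2012 November General")
    print(ET.tostring(root, encoding="unicode"))
```

A translator like this sits outside the EMS, which is why it requires no device interoperability: it only needs the CSV export that the system already provides.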
Level Two Use Case Standards
Level Two addresses data that is more complicated to export in some cases, e.g., event log data or cast vote record data, as well as data that may allow limited device interoperability, e.g., blank ballot information, minus state-specific formatting, that can be used by BBDS.
Level Three Use Case Standards
Level Three focuses specifically on EMS export of information to dynamically build blank ballots with state-specific formatting, known as ballot definition data. This would permit other voting equipment to accept this information and display ballots according to state requirements. Practically speaking, the state-specific information might be contained in templates or configuration files that may work in tandem with a CDF, but if made part of the CDF specification, this would permit interoperability among EMSs and voting equipment including BBDS, vote-capture devices, ballot marking devices, and ballot printers.
Level Four Use Case Standards
Level Four involves EMS export of other voting device configuration data (memory card configuration, other device initialization information) to achieve interoperability among voting devices in general. At the base level within the EMS, the machine configuration data that is downloaded to the vote-capture devices (e.g., DREs) could be provided to vote-capture devices from other manufacturers. Level Four provides data elements required to achieve a greater degree of interoperability, such as device configuration parameters.
The use cases are written for both technical and non-technical audiences to the extent possible. Each use case standard contains an overview of the use case, with a model of the election data involved, expressed in UML (Unified Modeling Language). UML is useful for enumerating the various data elements and how they relate to each other and then showing these mappings in a visual diagram, and UML can also be used to generate newer versions of the XML schemas or for other data formats (such as YAML or JSON, two other data interchange formats). The use case standard also contains annexes documenting the relevant EML schemas and worked examples of the EML files.
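The idea of one data model feeding several interchange formats can be illustrated with a small Python sketch. The two classes below stand in for a fragment of a UML model; their names and fields, the <ValidVotes> element name, and the XML layout are hypothetical and are not taken from the P1622 use case standards.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SelectionTotal:
    # Hypothetical model class: one candidate's total in a contest.
    candidate_id: str
    votes: int


@dataclass
class ContestTotal:
    # Hypothetical model class: a contest and its per-candidate totals.
    contest_id: str
    selections: List[SelectionTotal]


def to_json(contest):
    """Serialize the model to JSON."""
    return json.dumps(asdict(contest), sort_keys=True)


def to_xml(contest):
    """Serialize the same model to a simplified XML layout (not schema-valid EML)."""
    node = ET.Element("Contest")
    ET.SubElement(node, "ContestIdentifier", Id=contest.contest_id)
    for sel in contest.selections:
        selection = ET.SubElement(node, "Selection")
        ET.SubElement(selection, "CandidateIdentifier", Id=sel.candidate_id)
        ET.SubElement(selection, "ValidVotes").text = str(sel.votes)
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    contest = ContestTotal("346", [SelectionTotal("12S", 17231)])
    print(to_json(contest))
    print(to_xml(contest))
```

Both outputs are derived from the same definitions, so a change to the model propagates to every format generated from it.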