November 30, 2017, is International Digital Preservation Day (Twitter hashtag #IDPD17). The National Archives is participating in this worldwide initiative to promote digital preservation by talking about its work with electronic records. Today’s post comes from Ted Hull (Electronic Records Division), Leslie Johnston (Digital Preservation), and John Martinez (Policy and Standards Team).
The National Archives and Records Administration has a long history of working with born-digital electronic records—as far back as 1970. An integral part of this work is the issuance of extensive guidance on all aspects of Federal electronic records eligible for transfer to NARA, including media types, file formats, and metadata.
The development of this guidance is now complemented by a greater focus inside NARA on digital preservation. NARA has recently issued its first agency-wide digital preservation strategy. We are also developing file format preservation plans that align with the guidance issued to agencies to inform both the processing and the preservation of NARA’s electronic records holdings.
Here’s an introduction to the lifecycle for one category of electronic records—textual—from start to preservation.
It Starts with Guidance
The Records Management Policy and Outreach Program, Policy and Standards Team, in the Office of the Chief Records Officer for the U.S. Government develops and maintains guidance on the requirements for the transfer of electronic records to the National Archives. The two main NARA Bulletins related to electronic records transfer are Bulletin 2014-04, Revised Format Guidance for the Transfer of Permanent Electronic Records, and Bulletin 2015-04, Metadata Guidance for the Transfer of Permanent Electronic Records.
Bulletin 2014-04 specifies which file formats are acceptable when transferring permanent electronic records to NARA. An appendix lists the acceptable formats for different categories of electronic records.
Bulletin 2015-04 defines the minimum set of metadata elements that must accompany transfers of permanent electronic records to the National Archives. Its appendix describes the required minimum elements.
Both of the Bulletins were developed to provide agencies with guidance to ensure that the electronic records they transfer are compatible with NARA processing and preservation needs. In the case of the format guidance, acceptable formats were selected based on whether a format was:
- widely adopted
- an open format
- well documented
- supported by current software
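The four criteria above can be read as a checklist. The following is a minimal Python sketch of that idea, not NARA’s actual evaluation process. The `FormatProfile` class, the function names, and the two sample formats are hypothetical, and treating the criteria as all-or-nothing is an assumption; the guidance only says that formats were selected based on them.

```python
from dataclasses import dataclass

# The four selection criteria named in the list above.
CRITERIA = (
    "widely_adopted",
    "open_format",
    "well_documented",
    "supported_by_current_software",
)


@dataclass(frozen=True)
class FormatProfile:
    """Hypothetical assessment of one file format against the criteria."""
    name: str
    widely_adopted: bool
    open_format: bool
    well_documented: bool
    supported_by_current_software: bool


def unmet_criteria(profile):
    """Return the names of the criteria the format does not satisfy."""
    return [c for c in CRITERIA if not getattr(profile, c)]


def meets_all_criteria(profile):
    """Assumption: a format passes only if every criterion holds."""
    return not unmet_criteria(profile)


# Illustrative assessments only; these are not NARA evaluations.
open_example = FormatProfile("example-open-format", True, True, True, True)
legacy_example = FormatProfile("example-legacy-format", False, False, True, False)

for profile in (open_example, legacy_example):
    print(profile.name, meets_all_criteria(profile), unmet_criteria(profile))
```

Returning the list of unmet criteria, rather than a bare yes or no, makes it easier to see why a format might need further review.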
The metadata elements and terms selected for the metadata guidance are a subset of the Dublin Core Metadata Element Set v1.1, which provides generic, repeatable, human-readable elements that can be applied to any electronic record. This approach was taken to provide agencies with a comprehensible standard that could be mapped easily to existing metadata, and to provide NARA processing and preservation staff with information to enable NARA to appropriately manage, preserve, and provide access to electronic records for as long as needed.
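To illustrate what mapping existing metadata to Dublin Core can look like, here is a minimal Python sketch. The 15 element names come from the Dublin Core Metadata Element Set v1.1. The agency field names, the mapping, and the `required` list passed to `missing_elements` are hypothetical; the elements NARA actually requires are the ones described in the appendix to Bulletin 2015-04.

```python
# The 15 elements of the Dublin Core Metadata Element Set v1.1.
DCMES_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})

# Hypothetical mapping from one agency system's field names to Dublin Core.
AGENCY_TO_DC = {
    "doc_title": "title",
    "author": "creator",
    "coauthor": "creator",
    "created_on": "date",
    "file_id": "identifier",
    "access_restrictions": "rights",
}


def to_dublin_core(agency_record, mapping=AGENCY_TO_DC):
    """Translate an agency record into Dublin Core elements.

    Values are collected in lists because Dublin Core elements are repeatable.
    Fields with no mapping are skipped in this sketch.
    """
    dc_record = {}
    for field, value in agency_record.items():
        element = mapping.get(field)
        if element is None:
            continue
        if element not in DCMES_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core v1.1 element")
        dc_record.setdefault(element, []).append(value)
    return dc_record


def missing_elements(dc_record, required):
    """Return the required elements that the record does not contain."""
    return sorted(set(required) - set(dc_record))
```

For example, `to_dublin_core({"doc_title": "Annual Report", "author": "Office A"})` yields a record with `title` and `creator` entries, and `missing_elements` can then flag any element from a chosen minimum set that is still absent.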
In developing and maintaining this electronic records guidance, the Policy and Standards Team works with NARA custodial units such as the Electronic Records Division, Processing Branch to identify additions or other changes to the specified file formats or metadata elements. The suitability of formats, file transfer problems, and emerging formats or record types are examined to determine where to take further action. When guidance is first drafted or revised, it is made available to all of NARA to ensure that all internal stakeholders can review and react before it is issued to agencies.
The Policy and Standards Team also works with agencies to hear and react to questions, issues, or suggestions regarding the guidance. The Records Management Policy and Outreach Program convenes agency-NARA collaborations such as the Federal Records Management Council and the Electronic Records Management Working Group, which are venues where discussions about the interpretation, suitability, and recommendations regarding the guidance can be raised.
The blog of the Chief Records Officer of the U.S. Government, Records Express, also provides information and clarification about electronic records guidance. A series of posts on metadata has provided further examination of specific aspects of the guidance, as well as the role of metadata in electronic records management.
How NARA Processes Records
One of the primary responsibilities of NARA’s Electronic Records Division, Processing Branch, is working with federal agencies on the transfer of electronic records series scheduled for permanent retention in the National Archives.
The division tracks 1,750 permanent series and in FY 2017 received transfers against 113 of them, or about 6 percent. In 2009, the process of transferring physical and legal custody of agency records became automated with the Electronic Records Archives (ERA) Transfer Request (TR) and Legal Transfer Instrument (LTI) business objects. These records come to NARA through direct offers and involve archivists negotiating acceptable electronic records formats and media with agencies.
With the publication of NARA Bulletin 2014-04: Revised Format Guidance for the Transfer of Permanent Electronic Records, the range of acceptable formats increased 10-fold over those accepted under the previous guidance and regulation (36 CFR 1235), issued in 2003.
Federal agencies transfer many types of electronic records to the division, including structured data files, geospatial data, websites, and born-digital textual records including reports, publications, and email. These records range from fully open, unrestricted records to National Security–classified records up to the Top Secret, compartmentalized level.
Another responsibility of the Processing Branch staff is to review the formats received and contents of the approved transfers using a variety of systems and tools. Processing born-digital electronic records involves carrying out a systematic series of actions to prepare the electronic records in an accession for verification and preservation. Based on the accession project plan, processing may involve digital file arrangement, records restriction identification, file transformation, and other tasks using the division’s internal systems and computer tools or applications.
Staff verify the records received by comparing the content of the electronic records with the records disposition schedule and agency documentation. Verification can involve content review using the Archival Electronic Records Inspection and Control (AERIC) utility for structured data files and text, or a combination of manual and automated review using computer tools or applications. Because of the increased scope and variety of formats designated in Bulletin 2014-04, the division established a “Tools Group” in FY 2013 that actively seeks COTS (commercial off-the-shelf) and custom software to support review and processing.
Preservation involves staging the files for ingest to the repository, submitting files for preservation, and confirming that the files are successfully ingested and available in the digital repository. With the deployment of the Electronic Records Archives (ERA) in 2009, the division moved from a tape-based preservation paradigm supported by the Archival Preservation System (APS) to ingesting an accession’s digital files into the ERA digital repository. During ingest to ERA, critical metadata documenting the file format is captured from the files.
Over its 47-year history, the electronic records custodial program has received 4,317 transfers of born-digital records and completed 3,944, with 373 in the processing queue at the end of FY 2017. The total volume accessioned amounts to 400 terabytes. In FY 2017, the division received 276 new transfers and completed 247 accessions and 17 pre-accessions.
The definition of acceptable file formats for transfer and preservation also guides the division’s work in making these records available to the public. The division uses two platforms for making its holdings available: the Access to Archival Databases (AAD) system for record-level search and retrieval from structured data and text files, and the National Archives Catalog for download of full files. AAD makes accessible 66 series with 174 million records, and the Catalog holds 257 series, 117,000 files, and over 2 terabytes of accessioned records.
What Does This Mean for Preservation?
The development of guidance for agencies and the work of processing both inform and are informed by digital preservation needs. Digital preservation is most successful when it’s considered from the very beginning of the lifecycle, at the creation of the records.
The guidance that NARA provides to agencies about file formats must be in line with preservation sustainability factors for file formats—are the formats widely adopted, non-proprietary, open formats that are well documented and still supported by current software that can read them? When records are transferred, they are validated to ensure that they are uncorrupted and, if possible, meet NARA’s format guidance. During processing, NARA archivists take care to make sure the significant properties of the records are unchanged so their content can be preserved.
But preservation doesn’t end once the records are stored in NARA’s preservation repository—their condition is constantly monitored to ensure that the files remain unchanged. If issues are found, preservation actions are taken based on internal guidance documented in File Format Preservation Plans.
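One common way to confirm that stored files remain unchanged is a fixity check: record a checksum for each file, then recompute and compare it later. The Python sketch below shows the idea using SHA-256 from the standard library. It is an illustration under assumptions, not a description of NARA’s repository; the function names and the manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's relative path to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(directory, manifest):
    """List files that were altered, added, or removed since the manifest."""
    current = build_manifest(directory)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

A manifest built at ingest can be stored alongside the files; running `changed_files` against it on a schedule returns an empty list as long as nothing has changed, and any file it names is a candidate for a preservation action.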
Formats that are appropriate for preservation will always be under review, which will in turn inform the ongoing guidance issued for record creation and transfer and the tools used in records processing. It’s an ongoing cyclical activity to ensure that records remain accessible in the long run; all of NARA’s work is aimed at both public access and preservation.