October is American Archives Month and today is Electronic Records Day! We’re celebrating the work of archivists and the importance of archives with a series of blog posts about electronic records. Today’s post comes from Sam McClure, Electronic Records Program Officer in the Office of the Archivist.
With more than 12 billion pages of textual materials, 600,000 reels of motion picture film, 18 million maps and charts, 400,000 sound and video records, 9 million aerial photographs, 17.6 million still pictures and posters, 550,000 artifacts, and 20 billion records in our electronic holdings, the scale of the National Archives’ holdings is difficult to grasp.
Any category of records in our holdings can be daunting to consider, whether because of the sheer volume of material, the legal requirements governing access, or the difficulty of finding specific records related to a research interest.
However, electronic records combine these challenges in a unique way.
Many discussions of electronic records focus on email. But email is by no means the only type of electronic record that NARA takes in—we ingest digital images, databases of many kinds, basic office files in directory structures, and countless other formats.
Email is far from the most complex of the records that we ingest and index. It is, however, a useful lens through which to examine several specific challenges the National Archives faces with electronic records.
The scale of electronic records can be massive. One email series from the George W. Bush administration, Exchange Email, consists of more than 180 million emails.
Archivists at the George W. Bush Presidential Library estimate that the average content of each message (including the contents of any attached files) is equivalent to five pages of printed material.
In other words, the 180 million messages in this series amount to approximately 900 million pages of textual material. The combined holdings of all of NARA’s Presidential Libraries total just under 700 million pages of textual records, so this one series exceeds the textual records of all the Presidential Libraries put together.
The formats of electronic records present challenges as well.
To continue with the email example: while emails themselves often arrive in standard email file formats, their attachments come in an array of formats, including word processing files, spreadsheets, slide presentations, image files, and databases. For recent records, the vast majority of these files remain accessible using standard desktop software.
However, as technology continues to evolve, we cannot assume that future software will remain compatible with these file formats. We will therefore need ways to preserve these files and their associated metadata, and to keep them accessible, over time.
The contents of all these emails (including their attachments) are searchable. With so much information available, archivists who search these millions of emails in response to Freedom of Information Act (FOIA) or other access requests frequently contend with result lists of many thousands of emails.
In many cases, statutory authorities that govern access to emails require archivists to review the messages to identify and protect sensitive information in a variety of categories (e.g., unwarranted invasions of personal privacy) before making the emails available to the public.
Given the large number of emails responsive to access requests, searching for responsive records and then reviewing them is a mammoth undertaking.
To meet the challenges these records present, we must continue developing new capabilities: improving our ability to search our records, supporting the management of our digital content so that it remains accessible over time, and deploying tools that help us reduce the backlog of electronic records awaiting access review.
The volume and complexity of emails make them a useful example, but the issues these records present are not unique within NARA’s holdings.
The National Archives has been providing public access to previously sensitive electronic messages for many years. And while the many file formats in this email series require ongoing monitoring so that the records stay accessible, they are only a small sample of the file formats found in NARA’s holdings.
Similarly, while the millions of emails in this series present archivists at the George W. Bush Library with a daunting task, there are millions of other sensitive electronic records in the National Archives requiring access review, and many millions more will arrive with the transfer of Presidential records at the end of the Obama administration.
Indeed, as the National Archives and the rest of the federal government work to implement the Managing Government Records Directive, with its emphasis on managing electronic records electronically, the transfer of the Obama administration’s electronic Presidential records will be only one of a growing wave of massive transfers to NARA of electronic content documenting the actions of the federal government.
The National Archives is currently developing a new framework called ERA 2.0 (Electronic Records Archives 2.0). We are working to create a processing environment and a digital repository that will help us meet the challenges of scale, accessibility, and review.
By providing a structure that supports a broad range of software tools, ERA 2.0 will allow us to apply new technologies to the issues we currently face, deploy new tools as we encounter new types of records, and find better ways to preserve our records and make them available.
Given the challenges we face with electronic records, a successful ERA 2.0 is our best chance to ensure that we can fulfill our role in preserving the records of government.