October is American Archives Month! We’re celebrating the work of archivists and the importance of archives with a series of blog posts highlighting our “Archives Across America.” Today’s post comes from Elizabeth Lanier, an archivist at the George W. Bush Presidential Library and Museum in Dallas, Texas.
The George W. Bush Presidential Library and Museum holds 70 million pages of textual materials, in addition to vast quantities of audiovisual and electronic records.
While that is a large amount of material, only a portion has been made available to the public for research, either through the systematic processing of specific collections or through the Freedom of Information Act (FOIA). In this post, a “FOIA request” means a grouping of records from across the library’s holdings that has been processed in response to an individual’s request on a particular topic.
I lead the library’s textual digitization initiative. In other words, I work to get all processed paper records scanned and placed online. This is an area I’ve always been passionate about because it connects people around the world with records when and where it’s convenient for them.
Many times, visiting an institution in person is too expensive or too time-consuming. At the library, we strive to digitize each FOIA request that has been released. Some requests contain only a few folders, while others consist of 50 or more boxes. So, the amount of work required can vary dramatically from FOIA request to FOIA request.
Digitization can be a time-consuming process. Once a box is selected for scanning, the records themselves must be analyzed. Are pages torn or fragile? Are documents single or double-sided? The characteristics of the records themselves determine how they will be scanned. Additionally, what are the saved files going to be named? How are they going to be organized? We use a file-naming convention that links numeric codes to each collection and series within our holdings. The files are housed in digital folders that mirror the physical organization of the records.
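To make the idea of a file-naming convention concrete, here is a minimal Python sketch. It is an illustration only, not the library's actual scheme: the numeric codes, the zero-padding widths, the `box_`/`folder_` labels, and the function names are all hypothetical. It shows how numeric collection and series codes could drive both the file names and a folder layout that mirrors the physical arrangement.

```python
from pathlib import PurePosixPath


def folder_path(collection_code: int, series_code: int, box: int, folder: int) -> PurePosixPath:
    """Build a digital folder path that mirrors collection > series > box > folder.

    All codes and padding widths are hypothetical, chosen only for illustration.
    """
    return PurePosixPath(
        f"{collection_code:03d}",
        f"{series_code:03d}",
        f"box_{box:03d}",
        f"folder_{folder:03d}",
    )


def page_filename(collection_code: int, series_code: int, box: int, folder: int, page: int) -> str:
    """Name one scanned page so the file name alone identifies where it came from."""
    return (
        f"{collection_code:03d}-{series_code:03d}-"
        f"{box:03d}-{folder:03d}-{page:04d}.tif"
    )


if __name__ == "__main__":
    # Page 15 of folder 2 in box 7, under made-up collection 12 and series 3.
    print(folder_path(12, 3, 7, 2) / page_filename(12, 3, 7, 2, 15))
```

Zero-padding keeps files in page order when sorted alphabetically, which is one reason conventions like this tend to use fixed-width numbers.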
Next, scanning begins. We use a high-speed scanner with a document feeder for the vast majority of our materials. Any items identified as fragile during the records analysis stage must be scanned on a flatbed scanner instead. Each scanned page is saved as a TIFF image file.
Then, a PDF document containing images of all pages in a folder is created as an access copy. The PDF must also go through Optical Character Recognition (OCR), which adds a text layer so the file can be searched and read by assistive technology such as screen readers. Finally, data must be collected about every scanned file. These steps must be repeated for every folder in a FOIA request.
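The per-folder bookkeeping described above could be sketched as follows. This is a hedged illustration, not the library's actual tooling: the `FolderRecord` fields, the `choose_scanner` helper, the example request identifier, and the CSV layout are all assumptions made up for the example. It shows one way to record which scanner a folder was routed to and to gather the collected data into a spreadsheet-friendly form.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class FolderRecord:
    """Data collected about one scanned folder (field names are hypothetical)."""
    foia_request: str
    box: int
    folder_title: str
    page_count: int
    double_sided: bool
    scanner: str      # "feeder" or "flatbed"
    ocr_done: bool


def choose_scanner(fragile: bool) -> str:
    """Fragile items go to the flatbed; everything else uses the document feeder."""
    return "flatbed" if fragile else "feeder"


def to_csv(records: list[FolderRecord]) -> str:
    """Serialize folder records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(FolderRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    record = FolderRecord(
        foia_request="EXAMPLE-0001",   # made-up identifier
        box=1,
        folder_title="Sample folder",
        page_count=12,
        double_sided=False,
        scanner=choose_scanner(fragile=False),
        ocr_done=True,
    )
    print(to_csv([record]))
```

Keeping one record per folder matches the way the steps repeat folder by folder, and a flat table like this is easy to hand off to another system for upload.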
Once a request has been fully scanned, it can be placed online for the world to see. We place holdings online in two places: 1) the National Archives Catalog and 2) the George W. Bush Library’s Digital Library. As the lead for this project, I coordinate with the staff at the National Archives Catalog to ensure our files can be uploaded into their system. This process uses the data that was collected about each digitized folder. Finally, for each request I build individual web pages on our library’s website under the Digital Library.
While this work can be tedious, it also has a satisfying rhythm to it. I tell everyone I train that, as you create files, it is important to visualize the end result: replicating the experience of viewing records in person in the research room. We have many interns and volunteers help us with this initiative. Their assistance is always greatly appreciated.
In my opinion, digitization is at the heart of NARA’s strategic goal to make access happen.
Visit the National Archives American Archives Month web page for more information about our events and activities throughout the month.
Join us on October 4 on Twitter for #AskAnArchivist Day when staff from across the nation, including Elizabeth Lanier, will be answering questions and talking about what it’s like to be an archivist at the National Archives.
And don’t miss our #AskAnArchivist follow-up sessions every Tuesday in October from 1 p.m. to 2 p.m. EDT. Follow us on Twitter @usnatarchives for more information.