US national archives to use artificial intelligence to manage digital records

The National Archives and Records Administration (NARA), which is responsible for preserving U.S. federal government records, manages millions of digital files. To make this data easier to search while reducing the manual tagging of files, the agency plans to use artificial intelligence. The main goal is to automate its document management processes.

Artificial intelligence to manage millions of digital documents

The U.S. National Archives catalog currently contains more than 120 million digital documents, along with archival metadata and other types of material. Paradoxically, the catalog's search function for locating a specific document is rudimentary, and metadata tagging still has to be done by hand.

From this observation, three key questions emerged: How can NARA make document search easier? How can it record metadata more efficiently and promptly? How can it ensure the integrity of its data?

Through a Request for Information (RFI), NARA gathered concrete input on several topics: identifying and solving data-related problems, building AI solutions for search and metadata tagging, the potential licensing costs of those solutions, and the storage of digital documents.

Metadata tagging and search functionality

NARA then held an information day to explain how it intends to integrate AI and machine learning into two of its projects: customizing the catalog's search function and automating metadata tagging.

The event also covered the difficulties that could arise in carrying out these two projects. Among the issues raised: results dominated by a single source, which makes it hard to find documents drawn from multiple sources; missing results when the search keyword is not the exact term used in the document; and homonymous keywords, the example given being President Truman versus the USS Harry S. Truman aircraft carrier.
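
The first difficulty, results clustering around a single source, is commonly mitigated in search systems by diversifying the ranked list. The Python sketch below is not NARA's method; it assumes each hit is a `(record_id, source)` pair and that the input list is already sorted by relevance. The function name `diversify_by_source` and the sample sources are hypothetical. It cycles through sources in round-robin fashion so that each one contributes at most one result per pass.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

# A hypothetical search hit: (record_id, source). Input is assumed relevance-sorted.
Hit = Tuple[str, str]


def diversify_by_source(hits: Iterable[Hit]) -> List[Hit]:
    """Round-robin interleave ranked hits so no single source dominates the top."""
    # Group hits by source, keeping each source's internal relevance order.
    # Dicts preserve insertion order, so sources stay in order of first appearance.
    queues: Dict[str, Deque[Hit]] = {}
    for hit in hits:
        queues.setdefault(hit[1], deque()).append(hit)

    result: List[Hit] = []
    # Take one hit from each remaining source per pass until all queues are empty.
    while queues:
        for source in list(queues):
            result.append(queues[source].popleft())
            if not queues[source]:
                del queues[source]
    return result


if __name__ == "__main__":
    ranked = [
        ("a1", "Agency A"),
        ("a2", "Agency A"),
        ("a3", "Agency A"),
        ("b1", "Agency B"),
        ("c1", "Agency C"),
    ]
    print([record_id for record_id, _ in diversify_by_source(ranked)])
```

With this input, the three sources each appear once among the first three results (`a1`, `b1`, `c1`) before the remaining Agency A hits follow. A production system would more likely blend relevance scores with a diversity penalty rather than interleave strictly, but the round-robin version shows the idea in its simplest form.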

NARA also wants to automate metadata tagging so that it no longer depends on staff time and avoids the human errors that manual tagging can introduce. Machine learning technologies are therefore being considered for this automation. The resulting system would identify useful metadata when records are acquired and apply tags as that metadata is recorded.
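
NARA has not described a specific model, so the following is only a minimal sketch of what "tagging at the time of acquisition" could look like with a simple supervised classifier: a multinomial naive Bayes model with add-one (Laplace) smoothing, trained on a few labeled examples and applied to each incoming record. The tag names (`presidential`, `military`), the training sentences, and the `NaiveBayesTagger` and `ingest` names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic word tokens only.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical multinomial naive Bayes tagger with Laplace smoothing."""

    def __init__(self) -> None:
        self.tag_counts: Counter = Counter()  # documents per tag
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # word counts per tag
        self.vocab: set = set()

    def train(self, examples: List[Tuple[str, str]]) -> None:
        for text, tag in examples:
            self.tag_counts[tag] += 1
            for word in tokenize(text):
                self.word_counts[tag][word] += 1
                self.vocab.add(word)

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.tag_counts.values())
        vocab_size = len(self.vocab)
        best_tag, best_score = None, -math.inf
        for tag, doc_count in self.tag_counts.items():
            # log P(tag) + sum over words of log P(word | tag), add-one smoothed.
            score = math.log(doc_count / total_docs)
            tag_total = sum(self.word_counts[tag].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[tag][word]
                score += math.log((count + 1) / (tag_total + vocab_size))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


def ingest(record_id: str, text: str, tagger: NaiveBayesTagger) -> Dict[str, Optional[str]]:
    # Hypothetical acquisition step: attach a predicted subject tag as the record arrives.
    return {"id": record_id, "subject": tagger.predict(text)}


if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    tagger.train([
        ("President Truman signed the executive order at the White House", "presidential"),
        ("The president addressed Congress on the budget", "presidential"),
        ("The aircraft carrier left port with its naval air wing", "military"),
        ("Navy ship deployed to the Pacific fleet", "military"),
    ])
    print(ingest("rec-001", "Carrier deployed with the fleet", tagger))
```

Naive Bayes is chosen here only because it fits in a few lines and needs no external libraries; a real archival pipeline would be trained on far more labeled records and would likely use a stronger model.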

Translated from Les archives nationales américaines utiliseront l’intelligence artificielle pour gérer leurs documents numériques