The Intelligence Advanced Research Projects Activity (IARPA), an organization within the U.S. Office of the Director of National Intelligence (ODNI), has a mission “to push the boundaries of science to develop solutions that enable the IC (Intelligence Community) to do its job better and more effectively for national security.” HIATUS is one of its research programs; it aims to attribute the authorship of a text, and to protect authors’ privacy, through human-explainable algorithms.
IARPA runs research programs and delivers the results to its IC clients, which deploy the resulting technologies themselves. The four main research areas in which it invests are artificial intelligence, quantum computing, machine learning and synthetic biology.
The HIATUS program: Human Interpretable Attribution of Text Using Underlying Structure
Whether spoken or written, language differs from one person to another: the way words and sentences are organized, and what they contain, can reveal who spoke or wrote them.
Timothy McKinnon, the manager of the HIATUS program, told Nextgov in an interview:
“For a little bit of context, it’s like if you had 100 different people, and you asked them to describe something simple – like how to open a door – in two sentences or one sentence, you’d probably get about 100 different answers. Each person sort of has their own idiosyncrasies as an author that are potentially used by authorship attribution systems.”
Every day, a large volume of text is written by anonymous authors, human or machine. Timothy McKinnon points out that most of these documents contain linguistic features that can be used to identify who wrote them. Understanding the same features also makes it possible to protect authors’ identities when attribution could put them at risk.
He explains:
“With attribution, we identify stylistic features. So it’s things like word placement and syntax that can identify who wrote a given text. Think of it as your written fingerprint. What are the characteristics that make your handwriting unique? Then the technology would be able to identify that fingerprint against a corpus of other documents and compare whether they came from the same author. On the privacy side, the technology would find ways to alter the text so that it no longer looks like a person’s handwriting.”
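As a rough illustration of the “written fingerprint” idea described above, the following Python sketch builds a style profile from the relative frequencies of a few common function words and attributes an unknown text to the candidate author whose profile is most similar. It is a toy example under stated assumptions, not the method used by HIATUS: the word list, the function names (style_profile, cosine, attribute) and the sample sentences are all hypothetical, and real attribution systems rely on far richer features and much more text.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words used as style markers.
FUNCTION_WORDS = ["the", "a", "of", "to", "and", "in",
                  "that", "it", "you", "then", "just", "simply"]


def style_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_texts):
    """Return (best_author, scores) for a dict mapping author -> sample text."""
    target = style_profile(unknown_text)
    scores = {author: cosine(target, style_profile(sample))
              for author, sample in known_texts.items()}
    return max(scores, key=scores.get), scores


# Two invented authors describing how to open a door, in different styles.
known = {
    "author_a": "You just turn the handle and then you simply push.",
    "author_b": "The opening of a door consists of the rotation of the handle.",
}
unknown = "You just grab the knob and then you simply pull."

best, scores = attribute(unknown, known)
print(best, scores)
```

Because each dimension of the profile is a named word, the resulting score can be traced back to specific habits of the writer, which is the kind of explainability the program is after. The privacy side would work in the opposite direction, rewriting a text until its profile no longer resembles its author’s.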
Currently, there are three ways to attribute a text to its author. Linguistic experts can analyze the text by hand. Classical machine learning methods, such as logistic regression or Bayesian models, can also be used, but according to Timothy McKinnon these first two approaches do not work for all texts. The third option is a neural language model, but in his view such models are not sufficiently explainable.
He states:
“The problem with these models is that even though they are very, very fast and they work very well, we don’t really understand what’s going on inside. They are very complex.
And so what HIATUS is trying to do, among other things, is to find out some of the reasons behind the behavior of these models, so that when we do authorship attribution or confidentiality, we’re able to really understand why the system behaves the way it does, and be able to verify that it’s not detecting false information and that it’s doing the right thing.”
The HIATUS program therefore aims to develop new human-usable systems for attributing authorship and protecting the privacy of authors by identifying and exploiting explainable linguistic fingerprints in different languages. It is expected to last 42 months, from September 30, 2022 to about March 29, 2026. The BAA (Broad Agency Announcement, the call for proposals) was published last February 25.
Translated from HIATUS, le programme d’IARPA (Intelligence Advanced Research Projects Activity), pour authentifier et protéger les auteurs de texte