Voice technologies have boomed in recent years, bringing innovative solutions and research projects with numerous applications, but also raising the issue of data protection. It is in this context that the Voice Lab and Datafunding have launched a data funding campaign whose main objective is to collect 2,000 hours of voice data.
This initiative invites the French to donate their voice data in order to co-construct the voice technologies of tomorrow, open and at the service of all, and to put voice at the service of citizen artificial intelligence. Through this collection available on the Datafunding website, users can share their voice data and thus help research, as well as the entire digital voice ecosystem and innovation in France, Europe and the French-speaking world.
“The message we want to convey to Google Home, Alexa or Siri users is this: the data in your voice assistant belongs to you. Donate it to French research!” explains Karel Bourgois, President of the Voice Lab.
“Today, more and more people want devices with which they can really communicate naturally. In order to develop such systems it is necessary to have access to free voice data. This is why we are launching a citizen data funding campaign that respects personal data. Each participant is invited to share the voice data stored by GAFAM in a French marketplace! This will enable the French voice ecosystem to develop sovereign technologies independent of the digital giants GAFAM and BATX. This citizen act will support the French voice ecosystem and the development of a diverse, dynamic and sovereign voice technology.”
User data is valuable. The Voice Lab wants to make citizens aware that they retain full ownership of the data collected by voice assistant services such as Google Home, Alexa or Siri. The Voice Lab now allows citizens to decide who can use their voice data and for what purpose.
Voice assistants on connected loudspeakers or other voice channels such as the telephone have become essential. They are based on speech recognition engines driven by very large sets of data collected from users during their interactions, to which only a few large companies have access. Thanks to the Voice Lab’s data funding operation, users can now share their voice data and thus help research, as well as the entire digital voice ecosystem and innovation in France, Europe and the French-speaking world.
The Voice Lab and Datafunding have set up a website that lets each user very simply give their consent for the Voice Lab to retrieve their voice data from the voice assistant providers. This tool allows consent to be managed, and automatically revoked if the user decides to stop contributing to the Voice Lab.
Once collected, the data are anonymized and standardized to contribute to corpora and create independent and contributory transcription models.
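The flow described above (consent, revocation, then anonymization and standardization) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the Voice Lab's or Datafunding's actual system: the `ConsentRegistry` class, the `pseudonymize` and `prepare_record` helpers, the salted hash (which is pseudonymization, a weaker guarantee than full anonymization) and the text normalization are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRegistry:
    """Hypothetical in-memory consent store (illustration only)."""
    _granted: dict = field(default_factory=dict)  # user_id -> time consent was given

    def grant(self, user_id: str) -> None:
        self._granted[user_id] = datetime.now(timezone.utc)

    def revoke(self, user_id: str) -> None:
        # After revocation, no further data is prepared for this user.
        self._granted.pop(user_id, None)

    def has_consent(self, user_id: str) -> bool:
        return user_id in self._granted


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace the user id with a salted hash (assumed technique, not the real one)."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def prepare_record(registry: ConsentRegistry, user_id: str, transcript: str, salt: str):
    """Return a corpus-ready record, or None if the user has not consented."""
    if not registry.has_consent(user_id):
        return None
    return {
        "speaker": pseudonymize(user_id, salt),
        # Assumed standardization step: lowercase and collapse whitespace.
        "text": " ".join(transcript.lower().split()),
    }
```

In this sketch, a record is produced only while consent is on file; once `revoke` is called, `prepare_record` returns `None` for that user.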
The main objective is to share data and to design speech recognition models for the French language to feed various automatic speech transcription services, including open source ones. The engines thus enable the various players (academic, private or public) to create voice services or products independently of GAFAM. French laboratories will thus be able to overcome many obstacles, particularly those related to the volume and diversity of available data.
The mission of the Voice Lab is to create an alternative to existing solutions and business models in order, on the one hand, to guarantee the privacy of users and, on the other hand, to prevent economic dependence on GAFA and BATX for voice interfaces, which will be the interfaces of the future.
The Voice Lab (levoicelab.org) specifies that this data will be hosted in strict compliance with privacy and personal data protection laws and that the primary objective is to ensure that user data will be used in accordance with the values of AI Ethics by Design, such as respect, transparency, fairness, security, equity and control.
This campaign is an opportunity to communicate to the general public and to raise awareness of this initiative among citizens.
Some examples of products and services that can be created by Voice Lab members using this data:
- Speech recognition applications are essential in many business areas and are also the basis for advances in terms of accessibility (visual and hearing disabilities, electronics);
- Creation of on-board voice assistants;
- Comprehension of telephone conversations with a telephone advisor;
- Creation of voice assistants for telephones or connected loudspeakers;
- Help for the hearing-impaired;
- Subtitling and media monitoring.
Translated from Données vocales : Le Voice Lab et Datafunding lancent une campagne de data funding