How does Google seek to fight unwanted content with artificial intelligence?

May 6, 2021

In 2020, Google announced the introduction of new artificial intelligence tools in its services to fight spam. According to the Mountain View firm’s teams, these solutions can block up to 99% of unwanted content. About 40 billion spam pages are discovered every day among the billions of sites and emails explored and indexed. Let’s take a look at the techniques Google uses to eradicate spam and malicious content.

Several indexing systems to fight against spam

First, the multinational has designed AI-based systems capable of detecting undesirable content when a user explores web pages or other content (such as emails). If these systems detect content that seems undesirable, they do not include it in the index used to provide search results.

Next, the indexing model analyzes the content that has been included in that index and checks whether it can be considered spam. If so, this content will not appear in the search results or in the user’s mailbox. The different stages of spam detection actually work like a funnel:

The closer you get to the bottom of the funnel, the more systems there are to counteract spam. The first technology used is the Googlebot crawler, a robot that crawls websites link by link for the purpose of indexing. Spam that is not caught at this stage moves from “crawled spam” to “indexed spam”. The second phase applies the indexing model mentioned above to what has been indexed. After that, only manual action remains to detect potential spam.
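The funnel described above can be pictured as three successive filters. The following is only a minimal sketch in Python under strong assumptions: it is not Google’s implementation, and every name in it (Page, crawl_filter, index_filter, run_funnel) as well as the two toy heuristics are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def crawl_filter(page: Page) -> bool:
    """Stage 1 (crawl time): hypothetical check for an obvious spam phrase."""
    return "free money" not in page.text.lower()


def index_filter(page: Page) -> bool:
    """Stage 2 (index time): hypothetical keyword-stuffing check.

    A page is rejected if it is empty or if a single word makes up
    more than half of all its words.
    """
    words = page.text.lower().split()
    if not words:
        return False
    most_repeated = max(words.count(word) for word in set(words))
    return most_repeated / len(words) <= 0.5


def run_funnel(pages: list[Page], manually_flagged: set[str]) -> list[Page]:
    """Apply the three stages in order and return the pages left at the end."""
    crawled = [p for p in pages if crawl_filter(p)]    # drops "crawled spam"
    indexed = [p for p in crawled if index_filter(p)]  # drops "indexed spam"
    # Stage 3: manual action removes whatever human reviewers have flagged.
    return [p for p in indexed if p.url not in manually_flagged]
```

Each stage only sees what the previous one let through, which is why the volume of content shrinks toward the bottom of the funnel.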

Beyond spam, a question of security

Google says it wanted to go further by addressing user data protection in the context of online scams and fraud. These scams take many forms and are very often reported by users. Thanks to these reports, the systems developed by Google have been trained to detect potentially fraudulent sites and then analyze their true nature in the same way as for spam, as shown in the illustration below:

In addition, spam is becoming more and more prevalent because of website hacking. Spammers use stolen content to add pages with fraudulent links that can redirect you to fake sites asking for bank details, login information or personal data. Even more seriously, hackers can trick you into downloading malware that takes control of your computer or Google account.

Google’s systems are designed to remove any content related to these scams as quickly as possible. By fighting these practices, the Mountain View firm has developed its AI technologies and says it has improved some of its services, especially the classification and indexing of information related to product purchases.

Translated from Comment Google cherche à lutter contre les contenus indésirables grâce à l’intelligence artificielle ?