Focus on PanGu-Alpha, the language model built with 25 billion more parameters than GPT-3

GPT-3, OpenAI’s language model available since July 2020, was at the time of its announcement the largest language model ever trained, with 175 billion parameters learned from some 45 terabytes of text data. For comparison, GPT-2, OpenAI’s previous language model, had been trained with “only” 1.5 billion parameters. A research team from the multinational Huawei has announced the development of a language model comparable to GPT-3. Named PanGu-Alpha, the model reportedly contains up to 200 billion parameters, 25 billion more than OpenAI’s model.

The challenge of PanGu-Alpha: a model trained with 200 billion parameters

PanGu-Alpha was unveiled as part of a publication by one of the research teams at the Chinese company Huawei. About 40 researchers helped write the paper and contributed to the implementation of the project, which aims to build a language model containing up to 200 billion parameters, trained on 1.1 terabytes of ebooks, encyclopedic articles, news, social media posts, and web pages.

Large language models like GPT-3 learn to write text from the billions of examples available on the internet. Like the OpenAI model, PanGu-Alpha is first pre-trained on unlabeled text and then fine-tuned for a particular task.
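
As a rough illustration of this two-stage pattern (not Huawei’s actual MindSpore code), the sketch below uses the Hugging Face transformers library with GPT-2 as a stand-in checkpoint: the pre-trained weights are loaded, then training continues on task-specific text with the same next-token-prediction objective.

```python
# Minimal sketch of the "pre-train, then fine-tune" pattern described above.
# GPT-2 is used as a stand-in model; PanGu-Alpha itself was built on MindSpore.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # load pre-trained weights

# Fine-tuning step: keep the same next-token-prediction objective, but feed
# task-specific text (the optimizer loop and dataset are omitted for brevity).
batch = tokenizer("Task-specific training example.", return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()   # gradients that an optimizer step would then apply
```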

The researchers leveraged the MindSpore framework to develop and test the model. A cluster of 2,048 Huawei Ascend 910 AI processors, each delivering 256 teraflops of computing power, was required to build the tool. The research team collected nearly 80 terabytes of raw data from public datasets.
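
To put the cluster’s scale in perspective, a quick back-of-the-envelope calculation (assuming the 256-teraflops figure is the per-chip peak) gives the aggregate peak throughput:

```python
# Aggregate peak compute of the training cluster described above
# (assumption: 256 TFLOPS is the peak throughput of a single Ascend 910 chip).
num_chips = 2048
tflops_per_chip = 256

total_tflops = num_chips * tflops_per_chip
print(f"{total_tflops:,} TFLOPS ≈ {total_tflops / 1000:.0f} PFLOPS")
# -> 524,288 TFLOPS ≈ 524 PFLOPS
```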

They then filtered this data by removing documents containing less than 60% Chinese characters or fewer than 150 characters, as well as advertisements and similar noise. The Chinese text was then converted to simplified Chinese. One notable difference is the number of tokens on which PanGu-Alpha and GPT-3 were trained: roughly 499 billion for the American model versus 40 billion for the Chinese model.
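
As a hypothetical sketch of what such a document filter might look like (the 60% and 150-character thresholds come from the figures above; the Chinese-character test is a simplification based on the CJK Unified Ideographs range):

```python
# Hypothetical document filter following the thresholds described above:
# keep a document only if it is at least 150 characters long and at least
# 60% of its characters are Chinese (simplified here to the CJK Unified
# Ideographs Unicode range).
def is_chinese_char(ch: str) -> bool:
    return "\u4e00" <= ch <= "\u9fff"

def keep_document(text: str,
                  min_length: int = 150,
                  min_chinese_ratio: float = 0.60) -> bool:
    if len(text) < min_length:
        return False
    chinese = sum(is_chinese_char(ch) for ch in text)
    return chinese / len(text) >= min_chinese_ratio
```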

Experimentation and scientific progress: a model almost equivalent to GPT-3

The researchers tested their new model and found that it was particularly effective at writing poetry, fiction, and dialogue, and at summarizing fairly long texts. However, a group evaluating the model’s performance judged that 10% of the outputs it produced were not of high quality. In addition, the researchers found that some of PanGu-Alpha’s creations contained illogical, repetitive, or irrelevant sentences.

Like GPT-3, the Chinese model cannot remember previous conversations and lacks the ability to learn concepts through further conversation or to ground actions and entities in real-world experience. And while PanGu-Alpha appears quite impressive in terms of performance, the model is not a scientific breakthrough in itself, according to Guy Van den Broeck, assistant professor of computer science at the University of California, Los Angeles.

As for the environmental impact, PanGu-Alpha’s carbon footprint has not been made clear, but it is likely just as “substantial” as, if not greater than, that of models of a similar size.

What future for language models? Performance and ethical issues

An article published last February by researchers from OpenAI and Stanford University examined the capabilities, limitations, and societal impact of large language models such as GPT-3 and PanGu-Alpha. It was written by Alex Tamkin, Miles Brundage, Jack Clark, and Deep Ganguli. The paper states that major developers of language models such as OpenAI or Huawei only have a six-to-nine-month head start before others are able to replicate the same type of model.

The experts behind the publication made several recommendations to address the negative consequences of language models:

  • Passing laws requiring companies to disclose when a text has been generated by an AI
  • Training a separate model that acts as a filter for content generated by a language model
  • Deploying a suite of bias tests to evaluate models before allowing certain individuals or the general public to use them
  • Avoiding certain specific use cases

This last point is worth noting. The main fears in the West about models like PanGu-Alpha concern discrimination against groups such as the Uyghurs. Last month, the BBC interviewed a Chinese software engineer reportedly involved in developing AI tools to assess the emotions of Uyghurs under duress.

Meanwhile, the Middlebury Institute of International Studies’ Center on Terrorism and Extremism claims, in a publication authored by Kris McGuffie and Alex Newhouse, that GPT-3 can reliably generate “informative and influential” text capable of driving people toward radical ideologies. These are two potential misuse scenarios that AI researchers will have to address to avoid controversy…something OpenAI appears to have started doing.

Translated from Focus sur PanGu-Alpha, le modèle de langage élaboré avec 25 milliards de paramètres de plus que GPT-3