Meta’s research arm has opened up a number of its internal artificial intelligence (AI) projects to the wider research community in a bid to help it improve its AI models.
Fundamental AI Research (Fair) is the open science research group at the social media company. It comprises between 500 and 600 people across Europe and North America, and is focused on solving core problems in AI.
Fair has recently released several new research artefacts it hopes will enable the research community to innovate, explore and discover new ways to apply AI at scale.
These include Chameleon, which provides a unified architecture for text and image input and output; multi-token prediction for training language models to predict multiple future words at once; and AudioSeal, an audio watermarking technique.
Looking at the role of Fair in the social media giant’s business, Joëlle Pineau, Meta’s vice-president of AI research, says, “We’re not necessarily the team that brings those innovations into product. We’re squarely focused on solving AI.”
Fair shares its research publicly, along with code bases, data sets, models, training recipes and safety guides. While the group is focused on fundamental innovation, its research is shared internally with Meta’s applied research team which, Pineau says, takes a model such as the new Chameleon, works out how to move it beyond proof of concept, and collaborates with Meta’s product teams to turn it into a product.
“Over the years, a number of our innovations have made it into products,” says Pineau. “If you’ve seen the Meta glasses – smart glasses – the AI model that it runs came out of our research. The first Llama model came out of our research lab. But as Llama 2 and 3 are product-focused, they are developed by Meta’s generative AI [GenAI] team, which is more of an applied research team.”
New open models
Meta Chameleon tokenises both text and images into a single, shared token space. According to the company, this unified approach makes the model easier to design, maintain and scale. Application areas include generating creative captions for images or using a mix of text prompts and images to create an entirely new scene.
With Chameleon, Pineau says the model uses text and images to reason about specific properties. “We’ve trained Chameleon up to about 30 billion parameters, which is much smaller than, for example, models like Llama, GPT and so on,” she says. “But we have a proof of concept which works up to a particular size.
“Applied research teams have the ability either to scale it up more or make it work with different types of data, and under different constraints,” says Pineau.
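To make the unified approach concrete, here is a minimal sketch of the early-fusion idea behind Chameleon, in which text tokens and discrete image tokens share one id space so a single transformer can consume and emit both. The tokeniser stand-ins, vocabulary sizes and offsets below are illustrative assumptions, not Meta’s implementation.

```python
from typing import List

TEXT_VOCAB_SIZE = 65_536      # assumed size of the text vocabulary
IMAGE_CODEBOOK_SIZE = 8_192   # assumed size of a discrete image codebook

def encode_text(text: str) -> List[int]:
    # stand-in for a real BPE text tokeniser
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def encode_image(pixels: bytes) -> List[int]:
    # stand-in for a VQ-style image tokeniser that maps patches to
    # codebook indices; offset them past the text vocabulary so both
    # modalities live in one shared id space
    return [TEXT_VOCAB_SIZE + (b % IMAGE_CODEBOOK_SIZE) for b in pixels]

# A mixed prompt becomes one flat token sequence the model attends
# over, and generation can emit ids from either range.
sequence = encode_text("A caption for: ") + encode_image(b"\x01\x02\x03")
print(sequence)
```

The pay-off of a single id space is that one set of transformer weights can reason over, and in principle generate, both modalities, which is the unified property the company highlights.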
The second piece of research Meta has made public is multi-token prediction, a new approach to training language models. Most modern large language models (LLMs) have a simple training objective: predicting the next word. While this approach is simple and scalable, Meta says it is also inefficient, requiring several orders of magnitude more text than children need to reach the same degree of language fluency.
Pineau says multi-token prediction was directly inspired by the work on code generation. “There’s an opportunity to generate many tokens eventually in a structured way, not just in a linear way,” she says.
“Whereas classic LLMs just generate one word after the other, producing a linearisation of the output tokens, for code, many people don’t write one token at a time. You write the code structure, then you write some of the sub-structures, and then you resolve the details in terms of the structures, and you go back and forth at different levels of abstraction as you’re building up the code.”
This, she adds, is much more complex than the linear approach used in LLMs.
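As an illustration of the training objective, here is a minimal PyTorch sketch of multi-token prediction: a shared trunk feeds several output heads, each trained to predict a token further into the future. The head design, sizes and loss weighting are assumptions for clarity rather than Meta’s published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """n_future independent linear heads over a shared trunk's hidden
    states; head i is trained to predict the token i + 1 steps ahead."""

    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_dim) -> (n_future, batch, seq, vocab)
        return torch.stack([head(hidden) for head in self.heads])

def multi_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # Average cross-entropy across heads; head i's logits at position t
    # are matched against the token at position t + i + 1.
    losses = []
    for i, head_logits in enumerate(logits):
        targets = tokens[:, i + 1 :]                  # drop the first i+1 tokens
        preds = head_logits[:, : targets.shape[1]]    # drop the trailing overhang
        losses.append(F.cross_entropy(
            preds.reshape(-1, preds.shape[-1]), targets.reshape(-1)))
    return torch.stack(losses).mean()

# Toy usage with fake trunk outputs and token ids
heads = MultiTokenHeads(hidden_dim=64, vocab_size=100)
hidden = torch.randn(2, 16, 64)
tokens = torch.randint(0, 100, (2, 16))
loss = multi_token_loss(heads(hidden), tokens)
```

One practical upside reported for this style of training is that the extra heads can be reused at inference time to draft several tokens ahead, rather than decoding strictly one word after another.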
Closed versus open debate
When asked whether there is a place for closed AI models, Pineau says an AI model should only be made open when safeguards are in place to ensure it does not pose undue risk.
“In the case of our Chameleon model, we chose after going through a risk analysis not to release the image generation capabilities,” she says. “The model is able to generate images, but we felt the safety measures are not mature enough.”
Another model developed by Fair, which has been published but not released, is one for voice synthesis. “Within a few seconds of a voice recording from an individual, we can essentially generate speech that mimics someone’s voice to the point that it is misleading,” says Pineau.
In this case, she says, no authentication tools yet exist that can reliably distinguish the AI-generated voice from a genuine recording. Meta has, however, been researching authentication, and AudioSeal’s watermarking technique was shared with a small cohort of academic researchers for third-party examination. The watermarking is not yet mature enough to give Meta’s researchers the confidence to release the voice synthesis model publicly, but AudioSeal itself has been designed specifically for the localised detection of AI-generated speech.
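The “localised” part is the key design point: rather than one verdict per clip, the detector scores short frames, so an AI-generated span inside a longer recording can be pinpointed. The toy sketch below illustrates that idea with an additive key and a correlation detector; AudioSeal itself uses a learned generator/detector pair, so everything here is an illustrative stand-in rather than its actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME = 1_600                       # 0.1 s frames at an assumed 16 kHz rate
key = rng.standard_normal(FRAME)    # shared secret pattern (toy watermark)

def embed(frames: np.ndarray, strength: float = 0.1) -> np.ndarray:
    # add a weak copy of the key to each frame (strength exaggerated here)
    return frames + strength * key

def frame_scores(audio: np.ndarray) -> np.ndarray:
    # per-frame correlation with the key; high scores mark watermarked spans
    n = len(audio) // FRAME
    return audio[: n * FRAME].reshape(n, FRAME) @ key / FRAME

audio = rng.standard_normal(FRAME * 10)      # ten frames of "real" speech
seg = audio[3 * FRAME : 6 * FRAME].reshape(3, FRAME)
audio[3 * FRAME : 6 * FRAME] = embed(seg).ravel()   # mark frames 3-5 only

print(np.round(frame_scores(audio), 3))      # frames 3-5 score ~0.1, rest ~0
```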
So, should you give them a go? While the new AI models Meta has released are open, Pineau says there is a reasonable learning curve in figuring out how to get them to work – but people who routinely work with models available via Hugging Face should be in a position to get up and running relatively easily.
“We have people who take a model such as Llama and fine-tune it, and within 48 hours there’s a fine-tuned version available that shows up on some of the [AI model] leaderboards,” she says. “It really depends on your level of proficiency.”
Getting started doesn’t require high-end hardware. She says that in some cases, models are made available in different sizes: the smaller models can run on a single graphics processing unit and are easier to get started with. “The bigger models require more knowledge of distributed systems to get the required level of performance,” says Pineau.
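For readers who want to try that path, a minimal sketch of pulling a small open model from Hugging Face and generating text on a single GPU might look like the following. The model id is a placeholder (gated checkpoints need approval), and the exact arguments depend on your transformers version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute your checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision fits a 7B model on one GPU
    device_map="auto",           # place weights on the available GPU(s)
)

inputs = tokenizer("Open research matters because", return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Larger checkpoints are where the distributed-systems knowledge Pineau mentions comes in, since their weights must be sharded across several GPUs to reach the required level of performance.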