Pioneering: Google Launches Its New Multimodal Gemini AI Model

Alphabet unveiled the first version of its next-generation AI model, Gemini, on December 6. Development was led by Google DeepMind, with CEO Sundar Pichai overseeing the effort.

Gemini is the first model to beat human experts on MMLU (Massive Multitask Language Understanding), one of the most widely used benchmarks for evaluating language-model performance; Gemini Ultra reportedly scored about 90%, edging past the roughly 89.8% human-expert baseline. Gemini can produce text and images together, reason about visual inputs across languages, and generate code from a variety of inputs.
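For context, MMLU scores a model's accuracy on multiple-choice questions spanning 57 subjects. The minimal sketch below shows that scoring loop in outline; `ask_model` is a hypothetical placeholder for any model call, not Gemini's API, and the sample question is invented for illustration.

```python
# Illustrative MMLU-style scoring loop (not Google's evaluation harness).
def ask_model(question: str, choices: list[str]) -> int:
    """Hypothetical stand-in for a model call; returns the chosen answer index."""
    return 0  # placeholder answer

def mmlu_accuracy(items: list[dict]) -> float:
    """Accuracy = fraction of multiple-choice items answered correctly."""
    correct = sum(
        ask_model(item["question"], item["choices"]) == item["answer"]
        for item in items
    )
    return correct / len(items)

sample = [{
    "question": "Which planet is known as the Red Planet?",
    "choices": ["Venus", "Mars", "Jupiter", "Saturn"],
    "answer": 1,  # index into choices
}]
print(f"accuracy: {mmlu_accuracy(sample):.0%}")
```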

Sundar Pichai, Google’s CEO, claims that Gemini outperforms OpenAI’s ChatGPT, pointing to its results on a series of benchmarks that gauge AI performance across a range of text- and image-based tasks.

In addition to being multimodal, Gemini is designed to be scalable and efficient. Its architecture allows it to be integrated quickly with existing tools and APIs, making it a potent engine for future AI development. Exposing the model through these interfaces encourages cooperation and progress within the AI community, quickening the pace of advancement and helping Gemini reach its full potential.
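As a concrete illustration of that integration path, the sketch below calls Gemini through Google’s `google-generativeai` Python SDK, assuming the package is installed and an API key has been obtained; the prompt and the key placeholder are invented for the example.

```python
# Minimal sketch of calling Gemini via the google-generativeai SDK.
# Assumes: `pip install google-generativeai` and an API key from Google.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real credential
model = genai.GenerativeModel("gemini-pro")  # the Pro tier described below

response = model.generate_content("Summarize the Gemini model family in one sentence.")
print(response.text)
```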

Gemini initially launched in three versions: Ultra, the largest; Pro, a medium-sized variant; and Nano, a much smaller and more efficient version. Gemini Pro will power Google’s Bard, a chatbot that functions similarly to ChatGPT, while Gemini Nano will run on-device on Google’s Pixel 8 Pro smartphone.

Reactions on social media have been mixed; some users have noted remarkable results, while others have pointed to persistent hallucinations. Although Gemini is clearly a very advanced AI system, Melanie Mitchell, an artificial intelligence researcher at the Santa Fe Institute in New Mexico, said, “It’s not obvious to me that Gemini is actually substantially more capable than GPT-4.”

Google DeepMind’s Gemini family of multimodal large language models succeeds LaMDA and PaLM 2; the name alludes to NASA’s Project Gemini. The models are decoder-only Transformers, modified for efficient training and inference on TPUs. Images may be supplied at varying resolutions, while video is input as a sequence of images. Audio is sampled at 16 kHz and then converted into a sequence of tokens by the Universal Speech Model.
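To make that input pipeline concrete, the sketch below mimics the preprocessing the paragraph describes: video is flattened into a sequence of frames, and 16 kHz audio is chunked and mapped to discrete token ids. Every function and constant here is an invented illustration; in particular, the hash-based tokenizer is a stand-in for USM’s learned tokenizer, not Google’s code.

```python
# Conceptual sketch of multimodal input preparation (illustrative only).
import numpy as np

AUDIO_SAMPLE_RATE = 16_000  # Gemini's speech front end samples audio at 16 kHz

def video_to_frames(video: np.ndarray, stride: int = 10) -> list[np.ndarray]:
    """Treat video as an image sequence by keeping every `stride`-th frame."""
    return [video[i] for i in range(0, len(video), stride)]

def audio_to_tokens(waveform: np.ndarray, frame_len: int = 400) -> list[int]:
    """Chunk 16 kHz audio and map each chunk to a discrete token id.
    A real system would use a learned tokenizer; hashing is a placeholder."""
    chunks = [waveform[i:i + frame_len]
              for i in range(0, len(waveform) - frame_len + 1, frame_len)]
    return [int(abs(hash(c.tobytes())) % 32_000) for c in chunks]

audio = np.zeros(AUDIO_SAMPLE_RATE, dtype=np.float32)  # one second of silence
video = np.zeros((30, 64, 64, 3), dtype=np.uint8)      # 30 dummy frames
print(len(video_to_frames(video)), "frames,",
      len(audio_to_tokens(audio)), "audio tokens")
```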

Before Gemini’s release, its team conducted model impact assessments to identify, evaluate, and document the key social benefits and potential harms associated with developing the advanced Gemini models. Drawing on known and anticipated effects, a set of “model policies” was created to guide model development and evaluation, and a thorough suite of evaluations compared the Gemini models against those policy areas and other key risk areas identified in the impact assessments.

To reduce safety risks, mitigations were also applied at the model’s data layer and through instruction tuning, and attribution, closed-book response generation, and hedging were used to minimize hallucinations. Google announced that it will provide Gemini Ultra test results to the US federal government in compliance with President Joe Biden’s October Executive Order 14110.

Google has released a technical report that developers can read to learn more about Gemini.
