Google LLC today released the first model in the Gemini 2.0 artificial intelligence family, an experimental version of Gemini 2.0 Flash designed to become the foundation for generative AI agents and assistants.
Gemini 2.0 Flash builds on Gemini 1.5 Flash, a lightweight workhorse large language model optimized for speed and efficiency. Google noted that 2.0 Flash outperforms Gemini 1.5 Pro, its largest and most complex AI model, on some key benchmarks, while running at twice the speed.
Like its predecessor, the model accepts inputs such as images, video and audio, and it has been updated to support multimodal outputs such as natively generated images mixed with text and text-to-speech audio. To make it a better foundation for assistants, Google also enabled it to use external tools such as Google Search, code execution and third-party functions.
Gemini 1.5 Flash was popular with developers, and 2.0 Flash is now available as an experimental model to early-access partners via the Gemini application programming interface through Google AI Studio and Vertex AI, the Google Cloud platform for training and deploying models. General availability is planned for January.
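For developers with access, calling the experimental model looks much like calling earlier Gemini models. The following is a minimal sketch using Google's generative AI Python SDK; the "gemini-2.0-flash-exp" model identifier and the API key placeholder are illustrative assumptions, not details confirmed in the announcement.

```python
# Minimal sketch (assumed SDK usage and model ID, not from the announcement).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder key

# "gemini-2.0-flash-exp" is an assumed identifier for the experimental model.
model = genai.GenerativeModel("gemini-2.0-flash-exp")

response = model.generate_content("Summarize what an AI agent is in two sentences.")
print(response.text)
```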
Starting today, users can test the experimental 2.0 Flash model via a dropdown menu in the Gemini chat assistant on desktop and mobile web. It will be available in the Gemini mobile app soon, and the company said it will be coming to more Google products as well.
Testing prototype agents and assistants
Putting Gemini 2.0 Flash to work, Google said its teams have been exploring several prototype products that build on the model's new features, focusing on generative AI agent and assistant capabilities.
AI agents are pieces of intelligent software that can work proactively on behalf of human users, gathering information and using tools to achieve goals. For example, unlike current assistants, which are purely conversational and limited to answering questions and summarizing information, an AI agent would be able to go out and complete tasks such as shopping or purchasing tickets.
“Gemini 2.0 Flash’s native user interface action-capabilities, along with other improvements like multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences,” Google said about the update.
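The "compositional function-calling" and "native tool use" Google mentions surface to developers as tools passed to the model. As a hedged illustration under the same assumptions as above, the sketch below registers an invented placeholder function, find_cheapest_ticket, as a tool using the Python SDK's automatic function calling; the model ID and the function itself are not part of Google's announcement.

```python
# Hedged sketch of tool use / function calling (assumed SDK usage and model ID).
import google.generativeai as genai

def find_cheapest_ticket(origin: str, destination: str) -> dict:
    """Hypothetical third-party function the model may choose to call."""
    # A real agent would query a ticketing API; this stub returns fixed data.
    return {"origin": origin, "destination": destination, "price_usd": 42.0}

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder key

# Register the Python function as a tool; the SDK derives its schema
# from the function signature and docstring.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[find_cheapest_ticket])

# With automatic function calling, the SDK executes the tool call and feeds the
# result back to the model before returning the final text answer.
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("Find me the cheapest ticket from Boston to Chicago.")
print(reply.text)
```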
Google introduced Project Astra, an initiative to develop a universal AI assistant, at Google I/O 2024 in May. Astra is capable of natural-sounding speech conversations with users and answering questions about the world.
With the addition of Gemini 2.0, Astra can interact with Google Search to retrieve information, Lens to identify objects and Maps to understand local areas. The team also improved its ability to remember things, allowing it to recall details from conversations such as reminders, where a user wants to go, phone numbers and lock codes. This also enables users to personalize the assistant.
Also thanks to Gemini 2.0, Astra can switch between multiple languages mid-conversation. The same capability also makes it better at understanding accents and uncommon words, which can cause trouble even for many speech-recognition AI models today.
Google said it is working on bringing these AI assistant capabilities to testers on more devices, such as hands-free glasses. The company is also expanding the number of trusted testers who have access to Astra.
Another AI agent prototype Google is building with Gemini 2.0 Flash is Project Mariner, which lets the model surf the web for users. It can take control of the browser and understand information on the screen, including elements such as links, text, code, buttons and forms, in order to navigate web pages.
Currently in testing, it works as a Chrome extension that can complete some tasks for users while keeping the human in the loop. In a demonstration, Google had Mariner go through a Google Sheet of company names and the names of people and prompted the AI model to find their contact emails. The model then took over the browser to go to the websites, find email addresses and finally display the information it found.
Each step of the way, the model displayed its reasoning, and the user could watch it in action and even interrupt it if necessary. Because users could prompt the model to go grocery shopping on e-commerce websites or purchase tickets, Google researchers said it will not finalize purchases without direct human interaction, though it can be tasked with going through the motions of finding items and loading up carts.
Jules: an experimental agent for developers
Jules is an experimental AI-powered coding agent that uses Gemini 2.0 and can work on its own to handle tedious tasks through direct integration with a GitHub codebase, based on prompts from a developer.
“It’s very good at bug fixes, small features, things like that, you can almost think of it like a junior engineer and you’re there directing it,” Kathy Korevec, director of product management at Google Labs, told SiliconANGLE in an interview.
Jules exists as a standalone application that takes a GitHub repository and creates its own copy to work on. Once it’s given a “task,” which is what Google calls the prompt from the developer, it generates a plan to produce the bug fixes or code changes and then provides that to the user to see what it intends to do. From there, it begins a multi-step process of fixing and coding to make the appropriate changes.
At any time during the process, the developer can interrupt it or change its plan to redirect it. Jules may also revise the plan itself if it runs into issues, updating code dependencies or modifying entire files as it goes. When it's finished, it waits for the developer to approve the code changes and then prepares a pull request so they can be merged back into the GitHub repository.
“I didn’t become a software engineer because I dream every day about fixing bugs, that wasn’t my ambition,” said Korevec. “I want to build really cool, creative apps. What’s nice about Jules is that I can say ‘Hey, go fix these bugs for me.’”
Certainly, Korevec added, some engineers love fixing bugs, but they don't want to migrate code from one version to another or handle other similarly tedious tasks. The impetus behind Jules was to let developers get to the work they want to do while unleashing the agent on the busywork they don't.
Jules is currently available to a small set of trusted testers and will become available to a larger number of interested developers in early 2025, Google said.