Google's Gemini 1.5 Pro can now listen

Google’s update to Gemini 1.5 Pro gives the model ears. It can now listen to uploaded audio files and extract information from sources like earnings calls or the audio tracks of videos, without needing to refer to a written transcript.

During its Google Next event, Google also announced it will make Gemini 1.5 Pro available to the public through Vertex AI, its platform for building AI applications. Gemini 1.5 Pro was first announced in February.

This new version of Gemini Pro, which is supposed to be the middleweight model of the Gemini family, already outperforms the biggest and most powerful model, Gemini Ultra. Gemini 1.5 Pro can understand complex instructions and eliminates the need to fine-tune models, Google claims.

Gemini 1.5 Pro is not available to people without access to Vertex AI and AI Studio. Right now, most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and able to understand commands, it is not as fast as Gemini 1.5 Pro.

Gemini 1.5 Pro is not the only large AI model from Google receiving an update. Imagen 2, the text-to-image model that helps power Gemini’s image-generation abilities, will also gain inpainting and outpainting, which let users add or remove elements from images. Google also made its SynthID digital watermarking feature available for all images created through Imagen models. SynthID adds a watermark that is invisible to the viewer and marks an image’s provenance when it is checked with a detection tool.

Many of Imagen’s new features, notably inpainting and outpainting, are already part of other text-to-image models such as Stability AI’s Stable Cascade and Getty’s Generative AI by iStock, not to mention widely available to consumers on new Samsung Galaxy phones.

Google says it’s also publicly previewing a way to ground its AI responses with Google Search so they answer with the latest information. That’s not always a given with the responses produced by large language models, and sometimes the gap is deliberate: Google has intentionally kept Gemini from answering questions related to the 2024 US election.

Gemini was also recently criticized for generating historically inaccurate images of people.


