You can now prompt ChatGPT with pictures and voice commands

Most of OpenAI’s changes to ChatGPT involve what the AI-powered bot can do: questions it can answer, information it can access, improved underlying models. This time, though, it’s tweaking the way you use ChatGPT itself. The company is rolling out a new version of the service that lets you prompt the AI bot not just by typing sentences into a text box, but also by speaking aloud or uploading a picture. The new features will roll out to paying ChatGPT subscribers over the next two weeks, and everyone else will get them “soon after,” according to OpenAI.

The voice chat part is pretty familiar: you tap a button and speak your question, ChatGPT converts it to text and feeds it to the large language model, gets an answer back, converts that back to speech, and speaks the answer out loud. It should feel just like talking to Alexa or Google Assistant, only — OpenAI hopes — the answers will be better thanks to the improved underlying tech. Most virtual assistants are being rebuilt to rely on LLMs, it appears; OpenAI’s just ahead of the game.

OpenAI’s excellent Whisper model does a lot of the speech-to-text work, and the company is rolling out a new text-to-speech model it says can generate “human-like audio from just text and a few seconds of sample speech.” You’ll be able to choose ChatGPT’s voice from five options, but OpenAI seems to think the model has vastly more potential than that. OpenAI is working with Spotify to translate podcasts into other languages, for instance, all while keeping the sound of the podcaster’s voice. There are lots of interesting uses for synthetic voices, and OpenAI could be a big part of that industry.

But the fact that you can build a capable synthetic voice with just a few seconds of audio also opens the door to all kinds of problematic use cases. “These capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud,” the company says in a blog post announcing the new features. The model isn’t available for broad use for precisely that reason, OpenAI says: access will be tightly controlled and restricted to specific use cases and partnerships.

The image search, meanwhile, is a bit like Google Lens. You snap a photo of whatever you’re interested in, and ChatGPT will try to suss out what you’re asking about and respond accordingly. You can also use the app’s drawing tool to help make your query clear, or speak or type questions to go along with the image. This is where ChatGPT’s back-and-forth nature is helpful: rather than doing a search, getting the wrong answer, and then doing another search, you can prompt the bot and refine the answer as you go. (This is a lot like what Google is doing with multimodal search, too.)

Obviously, image search has its potential issues too. One is what could happen when you prompt a chatbot about a person: OpenAI says it has deliberately limited ChatGPT’s “ability to analyze and make direct statements about people” both for accuracy and privacy reasons. That means one of the most sci-fi visions for AI — the ability to look at someone and say, “who is that?” — isn’t coming anytime soon. Which is probably a good thing.

Almost a year after ChatGPT’s initial launch, OpenAI still seems to be figuring out how to give its bot more features and capabilities without creating new sets of problems and downsides. With these releases, the company has tried to walk that line by deliberately capping what its new models can do. But that approach won’t work forever. As more people use voice control and image search, and as ChatGPT inches closer to being a truly multimodal, useful virtual assistant, it’ll get harder and harder to keep the guardrails on.
