OpenAI’s Operator agent helped me move, but I had to help it, too

OpenAI gave me a week to test its new AI agent, Operator, a system that can perform tasks independently on the internet.

Operator is the closest thing I’ve seen to the tech industry’s vision of AI agents: systems that can automate the boring parts of life and free us up to do what we really love. But judging from my experience with OpenAI’s agent, a truly “autonomous” AI system is still out of reach.

OpenAI trained a new model to power Operator, one that combines the visual understanding of GPT-4o with the reasoning capabilities of o1.

The model appears to work well for basic tasks. I watched Operator click buttons, navigate website menus, and fill out forms. The AI succeeds at taking actions independently, and it runs much faster than the web-using agents I’ve seen from Anthropic and Google.

However, during my trial I found myself helping OpenAI’s agent more than I wanted to. I felt like I was coaching Operator through each problem, when what I wanted was to push certain tasks off my plate completely.

Frequently during the test, I had to answer questions, grant permissions, fill in my personal information, and help the agent when it got stuck.

To put it in car terms, using Operator is like driving with cruise control. You can occasionally take your foot off the pedal and let the car drive itself, but it’s far from full-fledged autopilot.

In fact, OpenAI says Operator’s frequent pauses are by design.

Like the AI that powers chatbots such as OpenAI’s ChatGPT, the AI that powers Operator cannot work independently for long periods of time, and it is prone to the same kinds of hallucinations. For that reason, OpenAI does not want to give the system too much decision-making power or sensitive user information. That may be a safe choice on OpenAI’s part, but it limits Operator’s usefulness.

That said, OpenAI’s first agent is an impressive proof of concept, and an interface that lets AI use the front end of any website. However, to create truly independent AI systems, tech companies need to build more reliable AI models that do not require this much steering.

A little too “hands-on”

My Operator trial coincided with the week I was moving apartments, so I had OpenAI’s agent help with the logistics of the move.

I asked Operator to help me buy a new parking permit. OpenAI’s agent told me “certainly,” then opened a browser window on my computer screen.

Operator then searched for San Francisco parking permits in its browser and took itself to the correct city website, and even the right page.

You can use the rest of your computer while Operator is working, which is something Google’s Project Mariner cannot claim. That’s because OpenAI’s agent isn’t actually working on your computer; it’s running off in the cloud somewhere.

Operator’s interface. Image credits: Maxwell Zeff and OpenAI

For the parking permit, I had to give Operator permission to start various processes multiple times. It also stopped to ask me to fill in my personal information on forms, including my name, phone number, and email address. Sometimes Operator got lost, too, and I had to take control of its browser and get the agent back on track.

In another test, I asked Operator to make a reservation at a Greek restaurant. To its credit, Operator found a good place in my area at an affordable price. But I had to answer more than half a dozen questions along the way.

Several steps of making a reservation with Operator. Image credits: Maxwell Zeff and OpenAI

If you need to intervene more than six times just to book a reservation through an AI agent, at what point is it easier to do it yourself? That’s a question I often asked myself while testing Operator.

Agents as a platform

In some tests, I came across websites that blocked Operator for one reason or another. For example, I tried to book an electrician through TaskRabbit, but OpenAI’s agent told me it had encountered an error and asked if it could use an alternative service instead. Expedia, Reddit, and YouTube also blocked the AI agent from accessing their platforms.

However, other services are welcoming Operator with open arms. Instacart, Uber, and eBay worked with OpenAI on Operator’s launch, allowing the agent to navigate their websites on behalf of humans.

These businesses are preparing for a future in which a subset of user interactions is handled by AI agents.

“Customers use Instacart through various entry points,” said Daniel Danker, Instacart’s chief product officer, in an interview with TechCrunch. “Operator is potentially seen as one of these entry points.”

Having OpenAI’s agent use Instacart’s website on someone’s behalf might seem to separate Instacart from its customers. But Danker says Instacart wants to meet customers wherever they are.

“We are really bullish about our belief that agent systems, like OpenAI’s, will have a big impact on how consumers interact with digital properties,” said Nitzan Mekel-Bobrov, eBay’s chief AI officer, in an interview with TechCrunch.

Even if AI agents become more popular, Mekel-Bobrov expects users will keep coming to eBay’s website, saying “online destinations are not going anywhere.”

Trust issues

I had some trouble trusting Operator after it hallucinated several times and nearly cost me hundreds of dollars.

For example, I asked the agent to find a parking garage near my new apartment. It ended up suggesting two garages that it said were a few minutes’ walk away.

Operator hallucinating the distance to a parking garage. Image credits: Maxwell Zeff and OpenAI

In addition to being outside my price range, the garages were actually quite far from my apartment. One was a 20-minute walk away and the other was a 30-minute walk. It turned out that Operator had entered the wrong address.

This is exactly why OpenAI doesn’t give the agent access to credit card numbers, passwords, or email. If OpenAI hadn’t let me intervene here, Operator would have wasted hundreds of dollars on a parking spot I didn’t need.

Hallucinations like this are a major obstacle to useful autonomous agents, the kind that can actually take troublesome tasks off your plate. You can’t trust an agent that is prone to making basic mistakes, especially mistakes with real-world consequences.

With Operator, OpenAI appears to have built some impressive tools that let AI systems browse the web. However, these tools won’t amount to much until the underlying AI can reliably do what users ask of it. Until then, humans will be stuck helping agents, rather than the other way around. And that defeats the purpose.
