OpenAI gave us a week to test its new AI agent, Operator, a system that can perform tasks independently on the internet.
Operator is the closest thing I’ve seen to the tech industry’s vision of AI agents: systems that can automate the boring parts of life and free us to do what we really love. But judging from my experience with OpenAI’s agent, a truly “autonomous” AI system is still out of reach.
OpenAI trained a new model to power Operator, one that combines the visual understanding of GPT-4o with o1’s reasoning capabilities.
The model appears to be suitable for basic tasks. I watched Operator click buttons, navigate website menus, and fill out forms. The AI succeeds in taking actions independently, and it runs much faster than the web-based agents I’ve seen from Anthropic and Google.
However, during my trial I found myself helping OpenAI’s agent more than I wanted to. I felt like I was coaching Operator through each problem, when what I wanted was to push certain tasks off my plate completely.
Frequently during my tests, I had to answer questions, grant permissions, fill out my personal information, and help the agent when it got stuck.
In car terms, using Operator is like driving with cruise control: occasionally you can take your foot off the pedal and let the car drive itself, but it is far from full-fledged autopilot.
In fact, OpenAI says that Operator’s frequent pauses are by design.
The AI powering Operator, much like the AI powering chatbots such as OpenAI’s ChatGPT, cannot work independently for long periods of time and is prone to the same kinds of hallucinations. OpenAI therefore does not want to give the system too much decision-making power or sensitive user information. That may be the safe choice for OpenAI, but it reduces Operator’s practicality.
That said, OpenAI’s first agent is an impressive proof of concept, and an interface, for an AI that can use the front end of any website. However, to create truly independent AI systems, tech companies need to build more reliable AI models that do not require this much steering.
A little too “hands-on”
My Operator trial coincided with the week I was moving apartments, so I had OpenAI’s agent help me with the logistics.
I asked Operator to help me buy a new parking permit. OpenAI’s agent told me “certainly,” then opened a browser in a window on my computer screen.
Operator then searched for a San Francisco parking permit in its browser and navigated to the correct city website, and even the right page.
You can use the rest of your computer while Operator is working, which is something that cannot be said of Google’s Project Mariner. That is because OpenAI’s agent isn’t actually working on your computer; it runs somewhere in the cloud.

For the parking permit, I had to give Operator permission to start various processes multiple times. It also stopped to ask me to fill out my personal information in forms, including my name, phone number, and email address. Sometimes Operator got lost, too, forcing me to take control of its browser and get the agent back on track.
In another test, I asked Operator to make a reservation at a Greek restaurant. To its credit, Operator found a good place in my area at an affordable price. But I had to answer more than half a dozen questions throughout the flow.

If you need to intervene more than six times just to make a reservation through an AI agent, at what point is it easier to do it yourself? That’s a question I often asked myself while testing Operator.
Agent-as-a-platform
In some tests, I came across websites that blocked Operator for one reason or another. For example, I tried to book an electrician using TaskRabbit, but OpenAI’s agent told me it had encountered an error and asked if it could use an alternative service instead. Expedia, Reddit, and YouTube have also blocked the AI agent from accessing their platforms.
However, other services are welcoming Operator with open arms. Instacart, Uber, and eBay worked with OpenAI on the launch of Operator, allowing the agent to navigate their websites on behalf of humans.
These businesses are preparing for a future in which a subset of user interactions is driven by AI agents.
“Customers use Instacart through various entry points,” said Daniel Danker, Chief Product Officer of Instacart, in an interview with TechCrunch. “Operator is potentially seen as one of these entry points.”
Having OpenAI’s agent use Instacart’s website on someone’s behalf might seem to distance Instacart from its customers. But Danker says Instacart wants to meet customers wherever they are.
“We are really bullish in our belief that agent systems, like OpenAI’s, will have a big impact on how consumers interact with digital properties,” said Nitzan Mekel-Bobrov, eBay’s chief AI officer, in an interview with TechCrunch.
Even as AI agents become more popular, Mekel-Bobrov expects users will always come to eBay’s website, saying “online destinations are not going anywhere.”
Trust issues
I had some issues trusting Operator after it hallucinated several times and almost cost me hundreds of dollars.
For example, I asked the agent to find a parking garage near my new apartment. It ended up suggesting two garages that it said were a few minutes’ walk away.

Besides being outside my price range, the garages were actually quite far from my apartment. One was a 20-minute walk away and the other a 30-minute walk. It turned out that Operator had entered the wrong address.
This is exactly why OpenAI doesn’t give the agent access to credit card numbers, passwords, or email. If OpenAI hadn’t let me intervene here, Operator would have wasted hundreds of dollars on a parking spot that I didn’t need.
Hallucinations like this are a significant obstacle to useful autonomous agents, the kind that could take troublesome tasks off your plate. You can’t trust an agent that is prone to making basic mistakes, especially mistakes with real-world consequences.
With Operator, OpenAI appears to have built some impressive tools that let AI systems browse the web. However, these tools won’t amount to much until the underlying AI can reliably do what users ask of it. Until then, humans will be stuck helping agents, rather than the other way around. And that defeats the point.