
Most "AI Agents"
Aren't

And that's fine. But knowing which one you're building changes how you should build it.

Draft AP · May 11, 2026

For the last two weeks, people at my company have been building AI agents. Or at least, that's what they've been calling them. People across sales, engineering, data, finance, and operations, all given the same tools and the same mandate: build something useful with AI.

The vast majority of them built automations and called them agents. Not because they were confused, but because the vocabulary that would help them see the difference doesn't exist yet. And the difference matters. The two things have different failure modes, different costs, and different ceilings.

What People Actually Built

The same pattern showed up over and over. A trigger (daily cron, a Slack mention, a manual invocation) kicks off a sequence: pull data from predetermined sources, run it through an LLM for synthesis or classification, post the output to a predetermined destination. A daily sales briefing that scans call transcripts and emails a summary. A doc updater that polls merged PRs three times a day. A contract review tool that classifies redlines and generates talk tracks.

Useful stuff. Some of them save 5-10 hours a week. But they're automations with an AI step, not agents. Remove the trigger and they stop existing. They don't decide what to do next. They do what they were told to do, in the order they were told to do it, and the LLM makes one of those steps smarter than a regex or a rule engine would have.
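
To make that shape concrete, here's a minimal sketch in Python. Every helper is a hypothetical stand-in, not anyone's actual stack:

```python
def fetch_transcripts(since: str) -> list[str]:
    """Stand-in for a predetermined source (CRM export, call recordings)."""
    return ["call with Acme...", "call with Globex..."]

def summarize(transcripts: list[str]) -> str:
    """Stand-in for the one LLM step in the pipeline."""
    return f"Summary of {len(transcripts)} calls"

def post_to_slack(channel: str, message: str) -> None:
    """Stand-in for a predetermined destination."""
    print(channel, message)

def daily_sales_briefing() -> None:
    # A cron trigger calls this, the steps run in a fixed order, and the
    # process exits. No goal persists between runs; the schedule is the system.
    transcripts = fetch_transcripts(since="yesterday")
    briefing = summarize(transcripts)
    post_to_slack("#sales", briefing)
```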

A handful were different. A fleet manager that routes requests across specialist agents and makes its own decisions about access control. An engineering assistant that takes a ticket queue at 4:30 PM and, by morning, has investigated each issue, chosen an approach, written code, and opened draft PRs with passing CI. An onboarding coach that stays with a new hire for 90 days, adjusting what it teaches based on where the person struggles.

These aren't just fancier or more complex versions of the same workflows. They hold a goal. If you removed their triggers, they'd still know what they were supposed to be doing.

Triggers vs. Inception Points

The clearest way I can draw the line: an automation has a trigger, an agent has an inception point.

A trigger is an external event that activates a dormant system. A cron tick, a webhook, a new row in a table. The system runs its steps and goes back to sleep. Trigger, execute, done.

An inception point is closer to hiring someone. You give them context about what they're responsible for, and they're on. They might watch for events, but that's a choice they're making in service of their goal, not a trigger that defines their existence.

Nobody sends you a webhook every morning that says "be a PM now." You wake up already holding your responsibilities. You decide what to pay attention to, what to ignore, when to act, when to wait. What counts as a meaningful signal is your call.

The design decision that matters: who controls the path?

The industry calls this a spectrum. I think that's descriptively true and practically useless. It hides the fork. At some point in your design, you make this choice, and everything downstream follows from it.

Test what you're building

Answer three questions about the system you're working on.

1. Does it hold a goal that persists beyond any single execution?
An automation runs and finishes. An agent is on. It has a responsibility, not just a task.

2. Does it decide its own path?
Not "does it use an LLM" but does it choose which tools to call, which sources to check, which actions to take?

3. Would it survive a trigger swap?
If you replaced the cron schedule with a Slack mention, would the system still understand its purpose? Or would it break because the trigger is the system?

Most "AI agents" in production today fail all three. And that's fine. A well-built automation that saves five hours a week is worth more than a conceptually pure agent that breaks in production. But knowing which one you're building changes how you should build it.

If you're building an automation

01. Keep the LLM step contained

Define exactly what goes in and what comes out. Use structured output (JSON, typed schemas) so the rest of your workflow doesn't have to parse free text. The fewer decisions the LLM makes, the more predictable your system is.
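
Here's a minimal sketch of what "contained" can look like, using a plain dataclass as the typed contract. The field names are invented for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class RedlineCall:
    clause: str
    risk: str        # expected: "low" | "medium" | "high"
    talk_track: str

def parse_llm_output(raw: str) -> RedlineCall:
    """The only place free text is allowed in. Anything that doesn't
    match the schema fails here, not three steps downstream."""
    data = json.loads(raw)           # raises ValueError if not valid JSON
    result = RedlineCall(**data)     # raises TypeError if keys don't match
    if result.risk not in {"low", "medium", "high"}:
        raise ValueError(f"unexpected risk value: {result.risk!r}")
    return result
```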

02. Make everything around the LLM deterministic

The data pull, the routing, the output destination, the error handling: all regular code. If your LLM step fails or returns garbage, the system should know what to do without asking the LLM for help.
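
A sketch of that shape, with hypothetical helpers standing in for the real pull, fallback, and delivery steps:

```python
def fetch_rows() -> list[dict]:
    return [{"deal": "Acme", "stage": "redline"}]    # deterministic pull

def summarize_with_llm(rows: list[dict]) -> str:
    raise TimeoutError("model call timed out")       # simulate a bad day

def render_plain_fallback(rows: list[dict]) -> str:
    return f"{len(rows)} open deals (rule-based fallback, no LLM)"

def alert_oncall(message: str) -> None:
    print("ALERT:", message)

def deliver(summary: str) -> None:
    print("DELIVERED:", summary)

def run_briefing() -> None:
    rows = fetch_rows()                              # deterministic
    try:
        summary = summarize_with_llm(rows)           # the contained LLM step
    except (ValueError, TimeoutError):
        # The system recovers without asking the LLM for help:
        # a rule-based rendering goes out, and a human gets flagged.
        summary = render_plain_fallback(rows)
        alert_oncall("LLM step failed; fallback briefing sent")
    deliver(summary)                                 # deterministic destination

run_briefing()
```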

03. Monitor it like any other integration

Track latency, failure rate, and output shape. You don't need agent-grade observability. You need the same monitoring you'd put on a third-party API call, because that's what it is.
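
In practice this can be as small as a decorator around the LLM step. Here log_metric is a stand-in for whatever metrics client you already run:

```python
import functools
import time

def log_metric(step: str, status: str, seconds: float, out_chars: int) -> None:
    # Stand-in for StatsD, CloudWatch, or even a structured log line.
    print(f"{step} status={status} latency={seconds:.2f}s out_chars={out_chars}")

def monitored(fn):
    """Latency, failure rate, output size: the same things
    you'd track on any third-party API call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            log_metric(fn.__name__, "ok", time.monotonic() - start, len(str(result)))
            return result
        except Exception:
            log_metric(fn.__name__, "error", time.monotonic() - start, 0)
            raise
    return wrapper

@monitored
def summarize(text: str) -> str:
    return "summary"  # stand-in for the actual model call
```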

04. Use evals, but keep them simple

Pick 20-30 representative inputs, run them through the LLM step, and check whether the outputs are acceptable. Do this before every prompt change or model upgrade. A spreadsheet works.
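
A minimal harness really is just a list of cases and a CSV. The inputs and checks below are invented placeholders:

```python
import csv

# In practice: 20-30 representative inputs pulled from real traffic,
# each paired with a cheap acceptability check.
CASES = [
    ("Classify: customer asks to cap liability at 1x fees.",
     lambda out: "liability" in out.lower()),
    ("Classify: customer requests net-90 payment terms.",
     lambda out: "payment" in out.lower()),
]

def run_evals(llm_step) -> None:
    """Run this before every prompt change or model upgrade."""
    with open("eval_results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["input", "output", "pass"])
        for prompt, check in CASES:
            output = llm_step(prompt)
            writer.writerow([prompt, output, check(output)])

# run_evals(my_llm_step)  # then open the CSV and eyeball the failures
```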

05. Resist the urge to add more AI

When the automation doesn't handle an edge case, the instinct is to add another LLM step. Usually the better answer is a conditional branch or a validation rule. Every LLM step you add multiplies your variance.
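
For example, a guard clause in front of the LLM step often does what a second LLM step was about to be asked to do. classify_with_llm here is a hypothetical stand-in:

```python
def classify_with_llm(body: str) -> str:
    return "billing"  # stand-in for the actual model call

def route_ticket(ticket: dict) -> str:
    body = ticket.get("body", "").strip()
    if not body:
        return "skip: empty ticket"              # a branch, not another prompt
    if len(body) > 20_000:
        return "escalate: too long for auto-triage"
    return classify_with_llm(body)               # LLM only where judgment is needed
```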

06. Know when you're outgrowing automation

If you keep adding LLM steps to handle cases the workflow can't predict, or if you want the system to "figure out" what to do, you're building toward an agent. That's a different architecture, not a bigger automation. Recognize the fork.

If you're building an agent

01. Start with the goal, not the tools

Before you connect a single API, write down what the agent is responsible for and what "done well" looks like. The engineering ticket assistant's goal isn't "use GitHub and Jira." It's "turn this ticket queue into draft PRs that a human can review by morning."

02. Define what the agent should refuse

Boundaries matter more than capabilities. What data should it not access? What actions need human approval? What's out of scope? The fleet manager builder said the hardest problem wasn't routing, it was access-permission design. Start there.
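
One way to make boundaries concrete: express them as data and enforce them in code, before any tool call goes through. The tool names below are hypothetical:

```python
ALLOWED_TOOLS = {"read_ticket", "run_ci", "open_draft_pr"}
NEEDS_APPROVAL = {"merge_pr", "message_customer"}

def queue_for_review(tool: str, args: dict) -> str:
    print(f"queued for human review: {tool}({args})")
    return "pending"

def run_tool(tool: str, args: dict) -> str:
    print(f"executing: {tool}({args})")
    return "done"

def execute(tool: str, args: dict, approved: bool = False) -> str:
    if tool not in ALLOWED_TOOLS | NEEDS_APPROVAL:
        # Out of scope is a hard no, regardless of what the model reasoned.
        raise PermissionError(f"{tool} is out of scope for this agent")
    if tool in NEEDS_APPROVAL and not approved:
        return queue_for_review(tool, args)      # escalate, don't act
    return run_tool(tool, args)
```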

03. Graduate trust deliberately

Start with the agent proposing actions and a human approving every one. Once you've seen enough good proposals, let it execute certain categories autonomously and escalate others. Don't skip this step because it feels slow.
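
One way to implement that graduation is trust as configuration, promoted category by category rather than promised in a prompt. The action names are invented:

```python
# Each action starts at "propose". You promote it to "auto" only after
# watching enough good proposals, and you can demote it just as easily.
TRUST_LEVELS = {
    "label_ticket": "auto",      # proved itself; executes directly
    "open_draft_pr": "propose",  # still needs a human yes on each one
    "close_ticket": "propose",
}

def dispatch(action: str, payload: dict) -> str:
    mode = TRUST_LEVELS.get(action, "propose")   # unknown actions stay cautious
    if mode == "auto":
        return f"executed {action}"
    return f"proposed {action}, awaiting approval"
```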

04. Instrument for failure from day one

Context rot, tool failures, eval blindness, observability gaps, nondeterminism. You will hit all five. Build the instrumentation before you need it, not after users tell you something is broken.

05. Design your UX around inconsistency

The agent will give different answers to the same question on different days. That's not a bug you can fix. Show confidence levels. Let users see the reasoning. Make it easy to ask "why did you do that?"

06. Treat the agent's judgment as a first draft

The best agent systems pair AI judgment with human review. The contract review agent doesn't approve deals. It proposes a classification and a talk track, and a person decides. As the agent proves itself, you widen the aperture. But you never stop checking.

Why This Isn't Just Semantics

A fair question at this point: if the job gets done, who cares what you call it? If the daily briefing saves five hours a week, does it matter whether it's technically an "agent" or an "automation"?

I don't think this is just vocabulary polishing. Two real things are at stake.

The first is that people should be able to speak clearly about what they're building. When everyone calls everything an agent, the word loses meaning, and it gets harder to have useful conversations about architecture, production readiness, or what to build next. If you can't name the thing you're building, you can't reason about its tradeoffs.

The second is more practical and it's the one I keep seeing: people are applying AI to steps in their process that are objective, structured, and deterministic, and then spending hours troubleshooting the unpredictability they introduced.

Someone told me the other day that they asked their agent to only message them on Mondays and Fridays. It messaged them on a Wednesday. That's not an agent problem. That's a scheduling rule. "Send on Monday and Friday" is a deterministic constraint, and it should be handled by deterministic code, not by an LLM that's interpreting the instruction from a system prompt each time it runs. A cron job will never message you on Wednesday. An LLM might, because it's probabilistic, and "only on Mondays and Fridays" is just another string in its context window.

This pattern comes up constantly. People embed objective constraints inside agent prompts, and when the agent violates them, they try to fix it with more prompt engineering. But the fix isn't a better prompt. The fix is moving that constraint out of the LLM and into code. Let the AI handle the parts that actually need interpretation. Let code handle the parts that don't.
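
In the scheduling case, the fix is a few lines of ordinary code. compose_update_with_llm and send are hypothetical stand-ins:

```python
import datetime

SEND_DAYS = {0, 4}  # Monday and Friday, as weekday() numbers

def compose_update_with_llm() -> str:
    return "LLM-written update"  # the part that genuinely needs interpretation

def send(message: str) -> None:
    print("sent:", message)

def maybe_send_update() -> None:
    # The constraint lives in code. A Wednesday send is impossible here,
    # no matter how the model reads its system prompt today.
    if datetime.date.today().weekday() not in SEND_DAYS:
        return
    send(compose_update_with_llm())
```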

That's the practical reason this distinction matters. It's not about naming things correctly for the sake of it. It's about putting AI where it helps and keeping it out of places where a simple conditional would be more reliable.