Premise Ventures
What We Look For in Agentic Startups
A framework for evaluating AI agent companies at pre-seed and seed. Written for founders, but honest enough to be useful to anyone trying to understand what separates real agent products from everything else.
Is this an agent, or is it a wrapper?
The most important question we ask when evaluating an agentic startup is also the simplest: is this product actually doing the work, or is it just organizing the work for a human to do? The answer determines almost everything else about how we evaluate the company.
A wrapper is a product that makes an existing capability easier to access. It might have a great interface, smart defaults, and a useful workflow. But the human is still in the loop for every meaningful decision. The software is a tool. A real agent product is different. It takes a goal, breaks it into steps, executes those steps, handles failures, and delivers an outcome. The human sets the destination. The agent drives.
This distinction matters because wrappers and agents have fundamentally different business models, defensibility profiles, and failure modes. Wrappers compete on UX and distribution. Agents compete on reliability and trust. Wrappers can be replicated quickly by anyone with access to the same underlying model. Agents accumulate proprietary data, learned preferences, and institutional memory that compounds over time.
We are not opposed to wrappers. Some of the most successful software companies in history were wrappers around infrastructure that someone else built. But we are specifically looking for companies where the agent is the product, not the interface. If the core value proposition requires a human to complete the task, we are probably not the right investor.
What we look for technically at pre-seed and seed.
At pre-seed, we are not evaluating a product. We are evaluating a technical founder's understanding of the problem space. The signal we are looking for is depth of insight, not breadth of features. A founder who can explain exactly why the current generation of models fails at their specific task, and what architectural choices they are making to compensate, is more interesting to us than a founder with a polished demo that hides the hard parts.
The technical questions we ask at pre-seed are: How does your agent handle ambiguous instructions? What happens when a tool call fails mid-task? How do you prevent the agent from taking irreversible actions it should not take? How do you know when the agent is confident versus guessing? Founders who have thought carefully about these questions have usually built something real. Founders who have not have usually built a demo.
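One way a founder might answer the irreversible-action question is a reversibility gate on tool calls: classify each tool by whether its effects can be undone, and require explicit human approval before anything destructive runs. The sketch below is purely illustrative (the tool names, `ActionGate`, and the default-deny policy are all assumptions, not any particular framework's API):

```python
# Illustrative sketch: gate tool calls by reversibility.
# Tool names and class names here are hypothetical.

REVERSIBLE = {"search", "read_file", "draft_email"}
IRREVERSIBLE = {"send_email", "delete_file", "submit_payment"}

class ActionGate:
    def __init__(self, approve_fn):
        # approve_fn asks a human for sign-off and returns True/False
        self.approve_fn = approve_fn

    def allow(self, tool_name: str) -> bool:
        if tool_name in REVERSIBLE:
            return True                         # safe to run autonomously
        if tool_name in IRREVERSIBLE:
            return self.approve_fn(tool_name)   # human in the loop
        return False                            # unknown tools: default-deny
```

The design choice worth noting is the last line: a tool the gate has never heard of is blocked, not allowed. Founders who default-allow unknown actions have usually not yet had an agent do something irreversible in production.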
At seed, the bar shifts from insight to evidence. We want to see that the agent is actually completing tasks in production, not just in controlled demos. The technical signals that matter most at seed are: task completion rate in real user sessions, error recovery behavior when things go wrong, latency and cost per completed task, and the ratio of human interventions to autonomous completions over time. If that last number is trending toward zero, you are building something real.
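The seed-stage signals above are straightforward to compute once session logs exist. A minimal sketch, assuming hypothetical field names for the session records:

```python
# Hedged sketch of the seed-stage metrics: completion rate,
# interventions per completed task, and cost per completed task.
# The record schema ('completed', 'interventions', 'cost_usd') is assumed.

def agent_metrics(sessions):
    total = len(sessions)
    completed = sum(1 for s in sessions if s["completed"])
    interventions = sum(s["interventions"] for s in sessions)
    return {
        "completion_rate": completed / total,
        # trending toward zero is the signal that matters
        "intervention_ratio": interventions / max(completed, 1),
        "cost_per_completion": sum(s["cost_usd"] for s in sessions) / max(completed, 1),
    }
```

The point is less the arithmetic than the discipline: if a team cannot produce these numbers from real user sessions, they are measuring demos, not production behavior.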
One thing we pay close attention to that many founders undervalue: the quality of the agent's uncertainty communication. An agent that says "I'm not sure, here's what I know and what I don't" is more trustworthy than an agent that confidently produces wrong answers. Technical founders who have built explicit uncertainty handling into their architecture are usually thinking about the right things.
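Explicit uncertainty handling can be as simple as a structured answer type that refuses to present low-confidence output as fact. A sketch under assumed names and an assumed confidence threshold:

```python
# Illustrative only: an answer object that surfaces what the agent
# knows and what it does not, instead of a bare string.
# The 0.8 threshold and field names are assumptions.

from dataclasses import dataclass, field

@dataclass
class AgentAnswer:
    answer: str
    confidence: float                    # 0.0 to 1.0, however it is estimated
    unknowns: list = field(default_factory=list)

    def render(self, threshold: float = 0.8) -> str:
        if self.confidence >= threshold and not self.unknowns:
            return self.answer
        gaps = "; ".join(self.unknowns) or "unspecified gaps"
        return f"I'm not sure. What I know: {self.answer} What I don't know: {gaps}."
```

How the confidence number is produced is the hard part, and it varies by architecture. What the sketch shows is the contract: uncertainty is a first-class field in the output, not an apology bolted onto a string.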
How we evaluate the agent-human interaction model.
The interaction model is where most agentic startups make their most consequential early design decisions, and where we see the most variation in quality. The question is not just "how does the human talk to the agent?" It is "how much does the human need to be involved, and at what points?"
The best agentic products we have seen share a common property: they are designed around a specific trust calibration. They know exactly which decisions users want to make themselves and which decisions users want the agent to make. They do not try to automate everything. They automate the right things, and they surface the right things for human review. This calibration is not a surface-level product decision. It comes from a deep understanding of the user's workflow, risk tolerance, and definition of a good outcome.

Arcade.dev, one of our portfolio companies, is a good example of this. They are building authorization infrastructure for AI agents, which is essentially a framework for specifying exactly what an agent is allowed to do on behalf of a user. The insight is that the interaction model is not just about UX. It is about trust boundaries. Who can the agent act as? What can it do? What requires explicit approval? These are not UI questions. They are architectural questions, and the founders who treat them as such are the ones building durable companies.
We are skeptical of interaction models that require constant human supervision. If the user has to watch every step, the agent is not saving time. It is just moving the work from doing to watching. The interaction models we find most compelling are the ones where the human sets a goal, the agent executes, and the human reviews an outcome rather than a process. The agent earns more autonomy over time as it demonstrates reliability.
What "power user" means in the context of AI agents.
We use the term "power users" deliberately, and it is worth explaining what we mean. We do not mean users who use the product a lot. We mean users who have a deep enough understanding of their own workflow that they can specify what they want precisely enough for an agent to execute it reliably. Power users are not necessarily technical. They are domain experts who know their work well enough to delegate it.
This matters because the hardest part of building agentic software is not the agent. It is the specification problem. Getting a user to articulate what they want precisely enough for an agent to do it without constant correction requires either a very sophisticated natural language interface or a user who is already good at specifying tasks. Power users are the latter. They are the early adopters who make agentic products work before the interfaces are good enough to work for everyone.
Gobi, our agentic map company, is a good example of this dynamic. Their early users are not casual map users. They are people who have a specific, complex spatial reasoning task to accomplish and enough domain knowledge to specify it clearly. The agent can execute because the user can specify. As the interface improves, the user base expands. But the product is built on a foundation of power users who proved the core value proposition.
When we evaluate an agentic startup, we want to understand who the power user is. Not the eventual mainstream user. The person who will use this product in the first six months and tell their colleagues it changed how they work. If the founder cannot describe that person in specific, concrete terms, they usually have not found product-market fit yet.
How to think about defensibility in the agent layer.
The most common objection to agentic startups is that they are not defensible. The underlying models are commoditizing. The interfaces are easy to replicate. A larger company could build the same thing with more resources. We hear this concern from other investors regularly, and we think it misunderstands where the value in agentic software actually accumulates.
Defensibility in the agent layer comes from three sources, and the best agentic companies are building all three simultaneously. The first is proprietary data. Every task an agent completes generates signal about what works, what does not, what the user actually wanted versus what they said they wanted, and how to handle edge cases. This data is not available to competitors. It compounds over time. A company with two years of production agent data has a meaningful advantage over a company starting from scratch, even if the underlying models are identical.
The second source of defensibility is workflow integration. Agents that are deeply embedded in a user's existing tools and processes are hard to displace. Not because switching is technically difficult, but because the agent has learned the user's preferences, integrated with their existing systems, and accumulated context that would be lost in a switch. This is the same dynamic that makes enterprise software sticky, applied to agents.
The third source is trust. This is the hardest to quantify and the most durable. Users who have learned to trust an agent, who have seen it handle failures gracefully and recover from mistakes, who have built their workflow around its capabilities, do not switch lightly. Trust is not a feature. It is a relationship. And relationships take time to build and are expensive to replicate. The agentic companies we are most excited about are the ones that understand they are in the trust business, not the software business.
What a strong technical demo looks like at the earliest stages.
We have seen hundreds of agentic demos. The ones that stand out share a few properties that are worth describing explicitly, because most founders get this wrong.
The best demos show the agent doing something that would take a human a meaningful amount of time to do manually. Not ten seconds. Not a task that is already automated by existing tools. A real task that a real user would actually delegate if they trusted the agent to do it. The demo should make the investor feel the time savings viscerally, not just understand them intellectually.
The best demos also show failure. Not catastrophic failure. Graceful failure. The agent encounters an ambiguity, flags it, asks for clarification, and continues. The agent makes a mistake, detects it, corrects it, and explains what happened. This is counterintuitive. Most founders want to show the happy path. But investors who back agentic software know that the happy path is not the product. The failure handling is the product. Show us how your agent behaves when things go wrong, and we will tell you whether we trust it.
The demos that lose us are the ones that are clearly scripted to avoid edge cases. We have learned to probe. We will ask the founder to try a slightly different input, or to interrupt the agent mid-task, or to give it an ambiguous instruction. If the demo falls apart under light pressure, the product is not ready. If it handles the pressure gracefully, we are interested.
One more thing: the best demos are run by the technical founder, not the sales founder. We want to see the person who built the agent use it. We want to see them explain what is happening under the hood as it runs. We want to see them comfortable with the uncertainty of a live demo. That comfort, or lack of it, tells us more about the product than any slide deck.
Specifically what Premise looks for.
We invest $500K to $3M at pre-seed and seed in technical founders building AI-native tools and agentic software for power users. That sentence contains most of our criteria, but it is worth unpacking.
Technical founders means founders who built the thing. Not founders who hired someone to build it. Not founders who are planning to hire a technical co-founder. Founders who can explain every architectural decision in the product and who will be writing code for at least the next two years. This is not a bias against non-technical founders in general. It is a specific requirement for the kind of agentic software we back, where the technical decisions made in the first year determine whether the product is defensible in year five.
AI-native means the product could not exist without the current generation of AI capabilities. Not "we added AI to an existing workflow." Not "we use AI to make our product better." The AI is the product. The agent is the core value proposition. If you removed the AI, there would be nothing left.
Power users means you know who your early adopter is and why they are the right person to prove the concept. It means you have talked to them, built for them, and have evidence that they find the product genuinely useful. Not "interesting." Not "promising." Useful enough to change how they work.
If that describes what you are building, we want to hear from you. Send a brief note to [email protected] explaining what you are building, who you are building it for, and what you have learned so far. We read everything and respond to what resonates.