Mathieu Mafille

Quoven — Building the Evidence Layer for Product Decisions

9 min read

At the end of my last article, I said I was finally ready to talk about Quoven — a project I’ve been building on the side for months. It’s now live at quoven.io, so this is the build-in-public breakdown: what it does, why I built it, and the technical decisions that make it more than a wrapper around an LLM.

Image: Quoven, the evidence layer for product decisions (AI-generated image via Google Nano Banana 2)


The problem: product decisions made on hunches

Every product decision starts the same way. Should we build PDF export? Is there real demand for a privacy-focused Slack alternative? Should we pivot toward enterprise? And almost every time, the answer comes from the same places: a few loud customers, a gut feeling, and whatever happened to surface in last week’s standup.

The evidence is out there — people complain about your competitors on Reddit, rave or rant on G2, ask for exactly your feature on Hacker News, describe their workflow in their own words on X. But gathering it is a slog. You’d need to hire a researcher or spend two days yourself reading threads and tabulating what you find. So most teams don’t, and the decision ships on vibes.

Quoven is my answer to that. Its tagline says it plainly: the evidence layer for product decisions.

What Quoven does

You write a brief: a freeform sentence or two describing the decision you’re weighing. You optionally drop in some reference links. You hit submit. A few minutes later you get back a structured, fully-sourced dossier.

It handles four kinds of brief: validating a new idea, prioritizing a feature, analyzing a competitor, or pressure-testing a pivot. The output isn’t a wall of text — it’s a dossier you can act on, with every claim traceable to a real source on the public web.

The core idea: a research pipeline, not a chatbot wrapper

This is the part I care most about, because it’s the difference between something useful and “I asked ChatGPT and pasted the answer.”

Quoven runs a multi-stage pipeline, not a single prompt. When a brief comes in, it goes through four stages:

1. Route

A first model call classifies the brief: which of the four use cases is this, what angles matter, and — crucially — which metrics to score. Each use case has a catalogue of possible metrics (the “idea” path alone has thirteen), and the router picks the four to six that actually fit your question. That keeps every later prompt focused instead of scoring fifteen irrelevant dimensions.
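To make the "four to six metrics" rule concrete, here is a minimal sketch of a guard that could sit behind the router call. It is not Quoven's actual code: the type names, the `validateRoute` function, and every metric key in the catalogue are invented for illustration, and the lists are shortened (the real "idea" catalogue has thirteen entries).

```typescript
// Hypothetical shapes: the four use cases follow the article,
// the metric keys are invented and the lists are shortened.
type UseCase = "idea" | "feature" | "competitor" | "pivot";

interface RouteResult {
  useCase: UseCase;
  metrics: string[];
}

const CATALOGUE: Record<UseCase, readonly string[]> = {
  idea: ["demand", "willingness_to_pay", "competition", "urgency", "market_size", "switching_cost"],
  feature: ["request_volume", "workaround_pain", "churn_risk", "effort", "segment_fit"],
  competitor: ["sentiment", "pricing_complaints", "feature_gaps", "support_quality", "momentum"],
  pivot: ["segment_demand", "sales_cycle", "compliance_load", "existing_fit", "competition"],
};

// Validate the router's pick: known metrics only, no duplicates, four to six.
function validateRoute(route: RouteResult): RouteResult {
  const allowed = new Set(CATALOGUE[route.useCase]);
  const unique = Array.from(new Set(route.metrics));
  const unknown = unique.filter((m) => !allowed.has(m));
  if (unknown.length > 0) {
    throw new Error(`Unknown metrics for ${route.useCase}: ${unknown.join(", ")}`);
  }
  if (unique.length < 4 || unique.length > 6) {
    throw new Error(`Expected 4 to 6 metrics, got ${unique.length}`);
  }
  return { useCase: route.useCase, metrics: unique };
}
```

Rejecting a bad route up front means a confused router fails loudly before any research budget is spent.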

2. Research (fan-out)

This is where the depth comes from. Rather than one big “go research this” call, Quoven fans out into parallel probes, each scoped to a research question and a platform axis — Reddit, Hacker News, G2, Capterra, the App Stores, X, the open web. The model has live web search tools, so each probe goes and reads the actual current internet. On the basic tier that’s four parallel probes; on the advanced tier it’s up to twenty axis-scoped probes plus aggregators.
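The fan-out can be sketched with `Promise.allSettled`, which waits for every probe and reports failures individually instead of rejecting the whole batch. This is an illustration under assumptions: `Probe`, `fanOut`, and the injected `runProbe` callback (standing in for the model call with web search tools) are hypothetical names, and the real pipeline runs inside an orchestrator rather than a single function.

```typescript
// Hypothetical probe shape: one research question scoped to one platform axis.
interface Probe {
  question: string;
  axis: string; // e.g. "reddit", "hackernews", "g2"
}

interface ProbeResult {
  probe: Probe;
  findings: string[];
}

interface FanOutOutcome {
  succeeded: ProbeResult[];
  failed: { probe: Probe; reason: string }[];
}

// runProbe stands in for the model call with live web search.
async function fanOut(
  probes: Probe[],
  runProbe: (probe: Probe) => Promise<string[]>,
): Promise<FanOutOutcome> {
  // allSettled never rejects, so one failing probe cannot sink the batch.
  const settled = await Promise.allSettled(probes.map((p) => runProbe(p)));
  const outcome: FanOutOutcome = { succeeded: [], failed: [] };
  settled.forEach((result, i) => {
    const probe = probes[i];
    if (result.status === "fulfilled") {
      outcome.succeeded.push({ probe, findings: result.value });
    } else {
      outcome.failed.push({ probe, reason: String(result.reason) });
    }
  });
  return outcome;
}
```

Keeping the failed probes in the result, rather than discarding them, is what lets a later stage decide how much to trust a partial run.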

3. Deduplicate

Probes overlap: the same Reddit thread shows up in three of them. So before synthesis, Quoven normalizes every cited URL (lowercase the host, strip www, drop tracking parameters and URL fragments) and merges duplicates into one global source list, keeping the best title and excerpt it saw. On the advanced tier this runs twice, within each aggregator and then across them, so the final stage always sees one flat, clean list of sources.
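Here is a minimal sketch of that normalization and merge using the standard `URL` API. Two things are assumptions on my part: the list of tracking parameters (`utm_*`, `fbclid`, `gclid`), and reading "best" title and excerpt as "longest". The `Source` shape and function names are hypothetical.

```typescript
interface Source {
  url: string;
  title: string;
  excerpt: string;
}

// Assumed tracking parameters; the real list may differ.
const TRACKING_PARAMS = /^(utm_[a-z_]+|fbclid|gclid)$/i;

function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.hash = ""; // drop the fragment
  url.hostname = url.hostname.toLowerCase().replace(/^www\./, "");
  // Copy the keys first so deleting does not disturb the iteration.
  for (const key of Array.from(url.searchParams.keys())) {
    if (TRACKING_PARAMS.test(key)) url.searchParams.delete(key);
  }
  return url.toString();
}

function dedupeSources(sources: Source[]): Source[] {
  const byUrl = new Map<string, Source>();
  for (const source of sources) {
    const url = normalizeUrl(source.url);
    const seen = byUrl.get(url);
    if (!seen) {
      byUrl.set(url, { ...source, url });
      continue;
    }
    // "Best" is assumed to mean "longest" in this sketch.
    if (source.title.length > seen.title.length) seen.title = source.title;
    if (source.excerpt.length > seen.excerpt.length) seen.excerpt = source.excerpt;
  }
  return Array.from(byUrl.values());
}
```

Because `dedupeSources` takes and returns the same shape, running it once per aggregator and again on the combined output gives the two-pass behavior described for the advanced tier.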

4. Synthesize

A final call compiles everything into the structured report: verdict, confidence, the selected metrics, and sections where each claim points back to specific source IDs. Those IDs become the [1], [2] footnote-style citations you see in the dossier.

The whole thing is orchestrated as an async job, so the web app stays snappy and the heavy work happens in the background with retries and observability.

The stack, and why each piece

I’m a fan of honest build-in-public, so here’s the actual stack — no “a leading cloud provider” hand-waving:

| Layer | Choice | Why |
| --- | --- | --- |
| Web app | Next.js 16 + Tailwind v4 + shadcn/ui | Full-stack React, App Router, fast to build a polished UI |
| Orchestration | Trigger.dev v4 | Built-in retries, fan-out, realtime status streaming, and a run timeline I can actually debug |
| AI | Grok (xAI) via the Vercel AI SDK | Native live web search tools, essential when the whole product is “go read the current internet” |
| Database | Neon Postgres | Serverless Postgres over HTTP, source of truth for everything |
| ORM | Drizzle | Type-safe schema, migrations, and shared types |
| Auth | Clerk | Auth I don’t want to build myself |
| Billing | Polar | Native usage-based billing and cost insights, not just flat subscriptions |
| Email | Resend | React Email templates + an automation builder I can mirror in code |
| Analytics | PostHog | Product events and error tracking, configured cookieless |

It all lives in a single pnpm monorepo — the web app, the worker, and shared packages for the database schema, email templates, and Zod schemas. Shared types across the boundary mean the brief I validate in the UI is the exact shape the worker consumes. No drift.

A few decisions I’m proud of

Cost ceilings, enforced mid-pipeline

LLM bills are how indie SaaS dies. Every run has a hard cost ceiling — $1 on basic, $5 on advanced — checked partway through the pipeline. And the check runs in strict mode: if it can’t read the spend so far, it aborts the run rather than risk a $50 runaway. I’d rather drop one legitimate analysis than wake up to a surprise invoice. Every model call also writes a usage row (model, tokens, tool calls, cost in USD), so I can see exactly where the money goes.
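The strict check can be sketched as a small guard that fails closed. The ceilings come from the numbers above; everything else is an assumption for illustration: `assertUnderCeiling`, `CostCeilingError`, and the injected `readSpendUsd` callback (imagined as a sum over the usage rows for a run) are hypothetical names.

```typescript
type Tier = "basic" | "advanced";

// Ceilings from the article, in USD.
const COST_CEILING_USD: Record<Tier, number> = { basic: 1, advanced: 5 };

class CostCeilingError extends Error {}

// readSpendUsd is a hypothetical lookup that sums the usage rows for a run.
async function assertUnderCeiling(
  tier: Tier,
  readSpendUsd: () => Promise<number>,
): Promise<number> {
  let spent: number;
  try {
    spent = await readSpendUsd();
  } catch (cause) {
    // Strict mode: an unreadable spend is a reason to stop, not to continue.
    throw new CostCeilingError(`Could not read spend, aborting run: ${String(cause)}`);
  }
  if (!Number.isFinite(spent) || spent >= COST_CEILING_USD[tier]) {
    throw new CostCeilingError(`Spend of ${spent} USD is at or over the ${tier} ceiling`);
  }
  return spent;
}
```

The important branch is the `catch`: a lenient version would log the error and carry on, which is exactly the path that leads to a runaway bill.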

Citations that are real

The product promise is verifiable evidence, so the citation plumbing matters. Inside the prompts, sources use short IDs like s1, s2 so the model can cite concisely. At persist time, each unique source becomes a database row with a real UUID, and the report’s claims are rewritten to point at those UUIDs. The frontend joins claims against sources to render the numbered citations. The original short ID is kept in metadata for traceability. The upshot: every claim in a dossier links to a source you can open and check.
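A rough sketch of that rewrite step, assuming hypothetical shapes for sources and claims. The ID generator is injected so the example stays deterministic; in a real persist step it could be something like `() => crypto.randomUUID()`. Throwing on a claim that cites an unknown short ID is my own choice here, not something the pipeline is known to do.

```typescript
interface PromptSource {
  shortId: string; // "s1", "s2", ... as used inside the prompts
  url: string;
}

interface Claim {
  text: string;
  sourceIds: string[];
}

interface SourceRow {
  id: string;
  url: string;
  metadata: { shortId: string }; // original short ID kept for traceability
}

function persistCitations(
  sources: PromptSource[],
  claims: Claim[],
  newId: () => string,
): { rows: SourceRow[]; claims: Claim[] } {
  const idByShortId = new Map<string, string>();
  const rows: SourceRow[] = sources.map((s) => {
    const id = newId();
    idByShortId.set(s.shortId, id);
    return { id, url: s.url, metadata: { shortId: s.shortId } };
  });
  const rewritten: Claim[] = claims.map((claim) => ({
    text: claim.text,
    sourceIds: claim.sourceIds.map((shortId) => {
      const id = idByShortId.get(shortId);
      // Assumption: an uncited source is an error, not something to drop silently.
      if (!id) throw new Error(`Claim cites unknown source ${shortId}`);
      return id;
    }),
  }));
  return { rows, claims: rewritten };
}

// Footnote numbers follow the order of the persisted rows: [1], [2], ...
function footnoteNumbers(rows: SourceRow[], claim: Claim): number[] {
  return claim.sourceIds.map((id) => rows.findIndex((r) => r.id === id) + 1);
}
```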

Degraded mode instead of silent failure

If one of the four probes fails, Quoven doesn’t pretend everything’s fine and it doesn’t throw the whole run away. It proceeds with what it has, but caps confidence at 60% and says so in the bottom line. Honest partial results beat a confident lie. And if a run does fail after retries, credits are refunded idempotently — you never pay for a dossier you didn’t get.
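Both behaviors are small enough to sketch. The 60% cap is from the text above; the `Verdict` fields, the wording of the note, and the in-memory `Map` standing in for a refund ledger are assumptions made to keep the example self-contained.

```typescript
// The cap comes from the article; the field names are assumptions.
const DEGRADED_CONFIDENCE_CAP = 0.6;

interface Verdict {
  confidence: number; // 0 to 1
  bottomLine: string;
}

// Cap confidence and say so whenever at least one probe failed.
function applyDegradedMode(verdict: Verdict, failedProbes: number, totalProbes: number): Verdict {
  if (failedProbes === 0) return verdict;
  return {
    confidence: Math.min(verdict.confidence, DEGRADED_CONFIDENCE_CAP),
    bottomLine:
      `${verdict.bottomLine} Note: ${failedProbes} of ${totalProbes} probes failed, ` +
      `so confidence is capped at 60%.`,
  };
}

// Idempotent refund keyed by run ID. The Map is a stand-in for a ledger table
// with a uniqueness guarantee on the run ID.
function refundOnce(refunds: Map<string, number>, runId: string, credits: number): boolean {
  if (refunds.has(runId)) return false; // already refunded: a retry is a no-op
  refunds.set(runId, credits);
  return true;
}
```

Keying the refund on the run ID is what makes a retried refund job safe: the second attempt finds the existing entry and does nothing.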

Analytics that are cookieless until you opt in

PostHog starts in memory-only persistence — no cookies, no localStorage — so anonymous analytics work out of the box without asking anyone for anything. Only if a visitor accepts the consent banner does it switch to localStorage+cookie and turn on (masked) session replay; decline, and it stays cookieless with replay off. Everything is reverse-proxied through a first-party path so ad blockers don’t eat the events, autocapture is limited to interactive elements, and sensitive containers — briefs, dossiers, form fields — are masked out. Auth still needs its strictly-necessary cookies via Clerk, but the optional analytics layer is genuinely consent-gated rather than on-by-default.
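As a sketch of the consent switch, here is a pure function that maps consent state to analytics options. The option names (`api_host`, `persistence`, `disable_session_recording`) are written from my understanding of posthog-js and are worth checking against its current configuration docs; the `/ingest` proxy path, the `Consent` type, and the function name are hypothetical.

```typescript
type Consent = "unknown" | "accepted" | "declined";

interface AnalyticsOptions {
  api_host: string;
  persistence: "memory" | "localStorage+cookie";
  disable_session_recording: boolean;
}

// Only an explicit "accepted" unlocks cookies and session replay;
// "unknown" and "declined" both stay memory-only with replay off.
function analyticsOptions(consent: Consent): AnalyticsOptions {
  const accepted = consent === "accepted";
  return {
    api_host: "/ingest", // hypothetical first-party reverse-proxy path
    persistence: accepted ? "localStorage+cookie" : "memory",
    disable_session_recording: !accepted,
  };
}
```

Deriving the options from consent in one place makes the default obvious: anything short of an explicit opt-in resolves to the cookieless configuration.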

No transactions, careful SQL

Neon’s HTTP driver doesn’t support multi-statement transactions, which sounds like a limitation until it forces discipline. Anything that touches a credit balance — a debit, a refund — is a single Postgres CTE statement that updates the balance and appends to an audit ledger atomically. Check constraints (credits_balance >= 0) guard against logic bugs. The ledger is append-only, so I can always reconstruct how an account got where it is.
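A sketch of what such a single-statement debit could look like, using a data-modifying CTE in Postgres. Only `credits_balance` comes from the text above; the `accounts` and `credit_ledger` tables, their columns, and the injected `execute` function (standing in for whatever the database driver exposes) are assumptions.

```typescript
// One statement: the UPDATE and the ledger INSERT succeed or fail together.
// Table and column names other than credits_balance are assumptions.
const DEBIT_CREDITS_SQL = `
WITH debited AS (
  UPDATE accounts
     SET credits_balance = credits_balance - $2::int
   WHERE id = $1
     AND credits_balance >= $2::int
  RETURNING id, credits_balance
)
INSERT INTO credit_ledger (account_id, delta, reason, balance_after)
SELECT id, -($2::int), $3, credits_balance
  FROM debited
RETURNING balance_after
`;

// Hypothetical query runner: takes SQL text and positional parameters.
type Execute = (text: string, params: unknown[]) => Promise<{ balance_after: number }[]>;

// Returns the new balance, or null when the account could not cover the debit.
async function debitCredits(
  execute: Execute,
  accountId: string,
  credits: number,
  reason: string,
): Promise<number | null> {
  if (!Number.isInteger(credits) || credits <= 0) {
    throw new Error("credits must be a positive integer");
  }
  const rows = await execute(DEBIT_CREDITS_SQL, [accountId, credits, reason]);
  return rows.length > 0 ? rows[0].balance_after : null;
}
```

If the balance is too low, the `UPDATE` matches no row, the `INSERT ... SELECT` has nothing to insert, and the caller gets `null` with no ledger entry written. The check constraint stays as a backstop in case the `WHERE` clause is ever edited incorrectly.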

What’s actually hard

The model isn’t the hard part anymore — frontier models with web search are genuinely good at reading the internet. The hard part is everything around it: keeping costs bounded, making failures graceful, deduplicating messy real-world sources, and earning trust by making every claim checkable. A demo that works once is easy. A pipeline that behaves on the brief that confuses the router, the probe that times out, and the source that’s behind a paywall — that’s the work.

The other lesson: structure beats scale. A focused four-probe pipeline with good routing and clean sourcing produces a more trustworthy dossier than one giant prompt ever could. The moat isn’t the model. It’s the methodology around it.

Try it

Quoven is live at quoven.io. If you’re a founder validating an idea, a PM prioritizing a roadmap, or anyone about to make a product call you’d rather not make on a hunch — point it at your decision and see what the evidence says. I’d genuinely love feedback from this audience.

Wrapping up

Quoven started as a personal itch: I was tired of making product calls on vibes. It turned into a real exercise in building a production AI product where the AI is the easy part and the engineering around it — cost, failure modes, sourcing, trust — is the actual product. Building it in the open has been the most fun I’ve had with a side project in a while.

Thanks for reading! If you have questions about Quoven or how it’s built, reach out via email or LinkedIn.

Next up, I’m writing about Claude Fable 5 (codename Mythos) — Anthropic’s newest model, what’s genuinely new about it, and how it changes the way I build things like Quoven. If you’ve been wondering whether the latest frontier jump actually matters in practice, you’ll want that one. See you in two weeks!

