Existing Product Enhancements
Integrate AI features into your existing products to deliver more value to users — without disrupting what already works.
What are existing product enhancements?
Existing product enhancements are AI features added to a software product that is already in production — without rebuilding it. The goal is to make the product noticeably better for users and more defensible against AI-native competitors, while preserving the workflows, integrations, and data your customers already depend on.
The mistake we see most often is treating "add AI" as a rewrite. It almost never is. A well-designed AI integration sits next to the existing system, calls into it through a thin internal API, and shows up to users as a feature — not a new product they have to relearn.
Key terms used on this page:
- Retrieval-Augmented Generation (RAG): A pattern where an LLM answers using your data, fetched at query time from a vector database or search index, rather than from its training set.
- Embedding: A vector representation of text, image, or other content used for semantic search and recommendations.
- Eval (evaluation suite): An automated test set that scores AI output quality on representative inputs every time the prompt, model, or retrieval changes.
- Fallback path: The non-AI behavior the product reverts to when the model is unavailable, low-confidence, or wrong.
- Sidecar service: An AI service deployed alongside the legacy app, exposed through a small internal API, with no changes required to the core codebase.
How does AI feature integration actually work?
We use a sidecar pattern by default. The existing product keeps running; the AI capability lives in a separate, well-bounded service that the product calls when it needs intelligence.
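To make that boundary concrete, here is a minimal sketch of a sidecar endpoint, assuming Python, FastAPI, and the OpenAI SDK. The route and the summarize capability are hypothetical stand-ins, and a production service would also carry auth, observability, and a fallback path:

```python
# Minimal sidecar sketch: the legacy product calls this thin internal endpoint;
# its own codebase never imports an LLM SDK or holds a model API key.
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

class SummarizeRequest(BaseModel):
    text: str

class SummarizeResponse(BaseModel):
    summary: str

@app.post("/internal/ai/summarize", response_model=SummarizeResponse)
def summarize(req: SummarizeRequest) -> SummarizeResponse:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Summarize the text in two sentences."},
            {"role": "user", "content": req.text},
        ],
    )
    return SummarizeResponse(summary=completion.choices[0].message.content or "")
```

The product calls `/internal/ai/summarize` like any other internal API, which is what keeps the integration a feature rather than a rewrite.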
1. Audit and opportunity mapping (1–2 weeks) — We review the codebase, the data model, and the user analytics to find where AI moves a real metric. Output: a ranked list of candidate features with rough effort, expected impact, and risk.
2. Spike and evaluation (2–3 weeks) — We build a thin prototype of the top candidate, evaluate it on real production-like data, and stress-test it against the failure modes that matter (latency, cost, accuracy, compliance). Output: a go/no-go with numbers, not vibes.
3. Production build (4–10 weeks) — We ship the feature behind a flag, with observability, an evaluation suite, a fallback path, and rollout controls. Output: feature in production, instrumented.
4. Validate and expand (ongoing) — We measure the metric we said we'd move, iterate on prompts and retrieval, and identify the next candidate from the original audit. Output: a compounding portfolio of AI features in your product.
A focused single-feature engagement runs 6 to 14 weeks. Multi-feature programs run in 90-day cycles.
How does smart search and retrieval-augmented generation (RAG) work?
Smart search and RAG are the most-requested enhancements, and the most frequently bungled. The standard pattern:
- Indexing: Content (docs, products, listings, knowledge base, support tickets) is chunked, embedded, and stored in a vector database (Pinecone, Turbopuffer, pgvector, Weaviate) — usually alongside a traditional keyword index for hybrid retrieval.
- Retrieval: At query time, the user's question is embedded and the top-k most relevant chunks are fetched. A re-ranker (Cohere Rerank, Voyage, or open-source) reorders them.
- Generation: The top chunks plus the user query are sent to an LLM with a prompt that instructs it to cite the chunks it uses and refuse to answer if the chunks don't contain the answer.
- Evaluation: A held-out set of real questions with expected answers, scored on factual accuracy, citation correctness, and refusal behavior. Run on every prompt or model change.
The two most common failure modes are skipping evaluation (the feature ships, gets praised in demos, and quietly hallucinates in production) and skipping hybrid retrieval (vector search alone misses exact-match queries that keyword search would catch).
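Here is a minimal sketch of hybrid retrieval plus grounded generation, assuming Python, numpy, and the OpenAI SDK. The in-memory cosine similarity and naive keyword boost are stand-ins for a real vector database, keyword index, and re-ranker:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def index_chunks(chunks: list[str]) -> np.ndarray:
    # Indexing step: one embedding per pre-chunked piece of content.
    return np.stack([embed(c) for c in chunks])

def hybrid_retrieve(query: str, chunks: list[str], vectors: np.ndarray, k: int = 4) -> list[str]:
    q = embed(query)
    semantic = (vectors @ q) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    # Keyword boost: catches exact-match queries that vector search alone misses.
    keyword = np.array([sum(w in c.lower() for w in query.lower().split()) for c in chunks])
    scores = semantic + 0.1 * keyword
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str, chunks: list[str], vectors: np.ndarray) -> str:
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(hybrid_retrieve(query, chunks, vectors)))
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Answer using only the numbered chunks and cite them like [0]. "
                "If the chunks do not contain the answer, say you don't know.")},
            {"role": "user", "content": f"Chunks:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return completion.choices[0].message.content or ""
```

The citation and refusal instructions in the system prompt are exactly what the evaluation step scores.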
How do you add a natural language interface to an existing product?
Most products that add a chat interface do it badly: the interface is a blank text box users don't know what to ask of, the model has no access to the user's actual data, and the answers are generic. A useful natural language interface requires three things the underlying product has to expose:
- Tools the model can call. Your existing API endpoints, exposed as tool definitions to the LLM (OpenAI function calling, Anthropic tool use, or an MCP server). The model becomes a router over your real product capabilities.
- Context the model can use. The user's identity, current view, recent activity, and relevant data — included in the system prompt or fetched on demand.
- Guardrails. Rate limits, scope restrictions (the assistant can only act on this user's data), and a refusal path for out-of-scope requests.
When these three are in place, the natural language interface stops being a chatbot and starts being a faster way to use the product. When they are not, it becomes a support liability.
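A sketch of the first two requirements with the OpenAI SDK (the invoice endpoint, its parameters, and the scoping are hypothetical; Anthropic tool use and MCP follow the same shape):

```python
import json
from openai import OpenAI

client = OpenAI()

def get_user_invoices(user_id: str, status: str = "open") -> list[dict]:
    # Stand-in for an existing API endpoint. Note user_id comes from the
    # authenticated session, never from the model (scope guardrail).
    return [{"id": "inv_123", "status": status, "amount": 42.00}]

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_user_invoices",
        "description": "List the current user's invoices, optionally filtered by status.",
        "parameters": {
            "type": "object",
            "properties": {"status": {"type": "string", "enum": ["open", "paid"]}},
        },
    },
}]

def assist(session_user_id: str, message: str) -> str:
    messages = [
        # Context: identity and scope live in the system prompt, not in the model's hands.
        {"role": "system", "content": f"You are the in-product assistant for user {session_user_id}. "
                                      "Refuse requests outside billing and account data."},
        {"role": "user", "content": message},
    ]
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=TOOLS)
    call = (resp.choices[0].message.tool_calls or [None])[0]
    if call is None:
        return resp.choices[0].message.content or ""
    args = json.loads(call.function.arguments)
    result = get_user_invoices(session_user_id, **args)  # the model routes; the session scopes
    messages += [resp.choices[0].message,
                 {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=TOOLS)
    return final.choices[0].message.content or ""
```

The model never sees credentials or other users' identifiers; it only chooses which of your real capabilities to invoke.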
When should you NOT add AI to an existing product?
We turn down enhancement engagements that fall into any of these patterns:
- The product has a deeper problem the AI is being asked to paper over. A confusing UX, a broken core workflow, or a missing data model is not fixed by adding generative features on top.
- The data isn't there. AI features are only as good as the content, signals, or history they have access to. Products with sparse data should fix the data layer before adding AI.
- The "AI" is a marketing requirement, not a user one. Features added because the board demanded an AI story rarely get used and rarely retain.
- The unit economics don't work. A free-tier user generating 50 LLM calls a day at frontier-model prices will erase your margin. We model this before launch, not after; see the back-of-envelope sketch after this list.
- The compliance posture isn't ready. In regulated products (finance, legal, HR), AI features need data-handling and audit-logging work that some teams aren't prepared to do.
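The back-of-envelope version of that unit-economics check fits in a few lines; every number below is an illustrative assumption, not a quote:

```python
# Back-of-envelope margin check. All inputs are illustrative assumptions.
calls_per_user_per_day = 50           # the free-tier user from the example above
tokens_in, tokens_out = 1_500, 400    # assumed prompt and completion sizes
price_in, price_out = 3.00 / 1e6, 15.00 / 1e6  # assumed frontier-model $/token

cost_per_call = tokens_in * price_in + tokens_out * price_out
cost_per_user_per_month = 30 * calls_per_user_per_day * cost_per_call
print(f"${cost_per_user_per_month:.2f} per free user per month")  # $15.75 on these assumptions
```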
We'd rather lose the engagement than ship an AI feature that hurts the product.
Should you build, buy, or partner for AI features?
The build / buy / partner decision is sharper for product enhancements than for greenfield AI work, because you're integrating into an existing codebase with an existing user base and an existing margin profile. Honest comparison:
| Option | Best for | Speed | Differentiation | Cost (3-year TCO) | Lock-in |
|---|---|---|---|---|---|
| Buy a SaaS AI add-on (Glean, Sana, Hex, Algolia AI) | Internal-facing features (employee search, embedded analytics dashboards) where the AI doesn't need to be your moat | Days–weeks | Low — your competitors get the same product | Recurring, scales with seats / volume | High — vendor owns the prompt, model choice, and roadmap |
| Self-build with API access (OpenAI, Anthropic, Google direct) | Differentiated features where the AI is part of the product's identity, and you have an engineering team that can evaluate and operate AI | 8–24 weeks per feature | Highest — you own the prompt, retrieval, model choice | High upfront, moderate recurring | Low |
| Open-source / self-hosted (Llama, Mistral, Qwen) | Cost-sensitive features at high volume, strict data residency, or specialized fine-tuning | 12–28 weeks per feature plus infra | High | High upfront and operational, low per-call | Lowest |
| Partner with a specialist (our model) | Product teams that have an existing app, want the differentiation of self-built AI, and don't want to spend a year hiring an AI team | 6–14 weeks per feature | High — built on your data and product | Predictable, paid back in retention or pricing | Low — you own the code |
What does an existing product enhancement engagement look like with us?
Most engagements start with a 1 to 2 week audit — we read the code, the schema, and the analytics, and produce a ranked list of AI features with effort, impact, and risk. Audit deliverables are useful even if you don't continue with us.
The build phase runs 6 to 14 weeks for a focused feature. We work in your repo, on a feature branch, in your stack — Python, TypeScript, Go, Ruby, whatever the existing product is built in. We don't introduce a new framework just because we like it. AI workloads sit on OpenAI, Anthropic, or self-hosted open-source behind a thin routing layer, with evaluations, observability, and a fallback path from day one.
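A sketch of what that routing-plus-fallback boundary can look like; the function names are illustrative stand-ins for your AI path and the product's existing behavior:

```python
import logging

logger = logging.getLogger("ai_router")

def keyword_search(query: str) -> list[str]:
    # Stand-in for the product's existing, non-AI search.
    return [f"keyword result for {query!r}"]

def rag_search(query: str, timeout_s: float) -> list[str]:
    # Stand-in for the AI path behind the routing layer (retrieval, re-ranking, generation).
    raise TimeoutError("model unavailable")

def smart_search(query: str) -> list[str]:
    try:
        results = rag_search(query, timeout_s=2.0)
        if results:
            return results
        logger.info("AI path returned nothing; using fallback")
    except Exception:
        logger.exception("AI path failed; using fallback")
    # The product degrades to its pre-AI behavior, not to an error page.
    return keyword_search(query)
```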
We default to a senior pair — one engineer focused on the AI service, one focused on the integration points and product UX — plus a designer for the user-facing surface. Your team owns code review and the merge. Post-launch, we stay engaged for the first 4 to 8 weeks to tune prompts, retrieval, and unit economics on real traffic.
What does an existing product enhancement cost?
Realistic ranges for the engagements we run:
- Audit and feature ranking: USD 15,000 to USD 40,000, fixed fee, 1 to 2 weeks.
- Single-feature build (e.g., RAG-powered search, smart recommendations, NL interface): USD 60,000 to USD 180,000, 6 to 14 weeks.
- Multi-feature program (3–5 features over 6–9 months): USD 250,000 to USD 750,000, structured in 90-day cycles.
- Post-launch retainer (prompt tuning, eval expansion, cost optimization): USD 8,000 to USD 25,000 per month.
Sub-USD 25,000 builds exist, but they almost always skip evaluation, observability, or the fallback path — and they tend to be the features that quietly hallucinate in production.
For pricing on related services, see our Pricing page.
Frequently asked questions about existing product enhancements
Can you add AI to our product without rewriting it?
Almost always, yes. The pattern is an AI service layer that lives next to the existing application, exposes a small set of internal APIs, and is called from the parts of the product where it adds value. The legacy code keeps running; the AI is additive.
Should we use OpenAI directly, or a SaaS like Glean, Sana, or Hex?
If the AI feature is the differentiator (your search, your recommendations, your generative workflows), build directly on OpenAI, Anthropic, or an open-source model — you need to own the prompt, the data flow, and the model choice. If the feature is internal productivity (knowledge search for employees, dashboards), a packaged SaaS like Glean or Hex is often a better buy.
How long does it take to ship an AI feature into an existing product?
Six to fourteen weeks for a focused feature with real evaluation, observability, and a non-AI fallback. Quick demos can ship in a sprint; production-grade features that won't embarrass you take longer.
What happens to our existing product team during the engagement?
They stay in the loop and own the merge. We work in their repo, on a feature branch, with code reviews from their senior engineers. The goal is for the AI feature to be maintainable by your team after we leave, not a black box only we understand.
How do you handle hallucinations and bad AI outputs in a live product?
Three layers: an evaluation suite that runs on every prompt change, a runtime confidence-and-fallback path so the AI can refuse or defer, and clear UI affordances that let users correct or report bad outputs. Hallucinations are a product problem, not just a model problem.
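A minimal sketch of the first layer, assuming a RAG-style answer function; the cases, refusal markers, and pass threshold are all illustrative:

```python
# Held-out eval cases: real questions, expected content, expected refusals.
CASES = [
    {"q": "What is the refund window?", "must_contain": "30 days", "should_refuse": False},
    {"q": "What is the CEO's home address?", "must_contain": None, "should_refuse": True},
]
REFUSAL_MARKERS = ("i don't know", "cannot answer", "not in the provided")

def run_evals(answer_fn) -> float:
    passed = 0
    for case in CASES:
        out = answer_fn(case["q"]).lower()
        refused = any(m in out for m in REFUSAL_MARKERS)
        if case["should_refuse"]:
            passed += refused
        else:
            passed += (not refused) and case["must_contain"].lower() in out
    return passed / len(CASES)

# Gate the deploy in CI, rerun on every prompt, model, or retrieval change:
# assert run_evals(answer) >= 0.95  # `answer` is your feature's entry point
```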
Will the AI features make our product slower?
Only if they're built poorly. We design AI features with explicit latency budgets — async where possible, streaming for long generations, smaller models for hot paths, and aggressive caching. Most user-visible AI features land at p95 under 1.5 seconds.
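For long generations, streaming is the largest perceived-latency win. A sketch with the OpenAI SDK (Anthropic's SDK streams similarly; the prompt is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # a smaller model on the hot path
    messages=[{"role": "user", "content": "Summarize this support thread: ..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        # In product code, forward each delta to the client over SSE or a WebSocket.
        print(delta, end="", flush=True)
```

The first tokens appear in a few hundred milliseconds even when the full generation takes several seconds.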
How do we price AI features to customers?
Three patterns work: bundle into existing tiers if marginal cost is low and the feature drives retention; price per usage (calls, tokens, generations) if marginal cost is meaningful; or create a new tier for power users who get the AI features at higher quotas. We model the unit economics with you before the feature ships.
Ready to Transform Your Business with AI?
Let's discuss how our AI solutions can drive growth, reduce costs, and create competitive advantages for your organization.
Schedule a Consultation