When do I actually need custom AI vs an off-the-shelf tool?
When the off-the-shelf vendor cannot reach your data, cannot follow your process, or cannot be evaluated against your real cases. The honest answer most weeks is: stay on the SaaS. We build custom AI when there is a hard reason - your knowledge lives behind a VPN, your workflow is multi-step with branching, your customers expect an answer that no public model has been trained on, or a SaaS would lock you into a per-seat curve that does not scale. We will tell you on the intro call if the build does not pass that bar.
What is an MCP server and why would we want one?
Model Context Protocol is the open standard Anthropic shipped for letting AI models talk to your tools - your CRM, your warehouse, your repo, your internal admin app - in a typed, auditable way. An MCP server is the integration that exposes a specific tool to any MCP-aware client (Claude, Cursor, Continue, etc.) without writing bespoke glue code for each model. The build delivers the MCP servers your team will lean on for the next two years, with auth, scopes, and audit logging built in.
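To make that concrete, here is a minimal sketch of one such server using the official MCP Python SDK's FastMCP helper (`pip install mcp`); the CRM tool and its in-memory data are hypothetical stand-ins for a real internal system:

```python
# Minimal MCP server exposing one tool. Names and data are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm")

# Stand-in for the real CRM client; in production this would be an authed
# API call wrapped with scope checks and audit logging.
FAKE_CRM = {"jane@acme.example": {"name": "Jane Doe", "plan": "enterprise"}}

@mcp.tool()
def lookup_customer(email: str) -> dict:
    """Fetch a customer record by email from the internal CRM."""
    return FAKE_CRM.get(email, {"error": "not found"})

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; any MCP-aware client can connect
```

One server per tool means the same CRM integration serves Claude today and whatever MCP-aware client your team adopts next, with no rework.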
How is your RAG different from a vector-search demo?
Most RAG demos work on a clean PDF. Production RAG fails on the messy reality - PDFs that are scans, Notion pages with stale frontmatter, Confluence sprawled across three spaces, support tickets with screenshots. The build invests in the boring layer: ingest with metadata extraction, chunking tuned to your content, evals on real questions your team actually asks, and reranker plus citation enforcement so the model cannot answer without showing the source. Hallucination rate is measured, not assumed.
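As one concrete piece of that layer, here is a minimal sketch of citation enforcement; the retriever and model client are hypothetical injected dependencies, and the citation format would be tuned to the real pipeline:

```python
import re

def answer_with_citations(question: str, retriever, llm) -> str:
    """Refuse to return an answer that does not cite a retrieved chunk."""
    chunks = retriever.search(question, top_k=8)   # hypothetical retriever
    context = "\n\n".join(f"[{c.id}] {c.text}" for c in chunks)
    prompt = (
        "Answer using ONLY the sources below. Cite every claim as [id].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    answer = llm.complete(prompt)                  # hypothetical model client
    cited = set(re.findall(r"\[([\w-]+)\]", answer))
    known = {c.id for c in chunks}
    if not cited or not cited <= known:
        # No citations, or a citation to a chunk we never retrieved: refuse.
        return "I can't answer that from the indexed sources."
    return answer
```

The refusal path is the point: an answer with no traceable source never reaches the user, which is also what makes hallucination rate measurable in the eval suite.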
RPA on legacy systems - is that not a maintenance nightmare?
It can be. The build minimises that surface: we use APIs everywhere they exist; RPA is the last-resort bridge to systems with no machine-readable interface (SAP GUIs, hospital portals, government filings, ancient internal apps). Every RPA flow is wrapped with screenshot diffing, structured retries, and a watchdog that pages an owner when the upstream UI changes. Where possible, we add a real API in front of the RPA flow so callers do not depend on screen scraping directly.
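A sketch of that wrapper pattern, with hypothetical capture and paging helpers standing in for the real RPA stack; a production diff would use perceptual hashing rather than the exact hash shown here:

```python
import hashlib
import time

def guarded_rpa_run(flow, capture_screen, page_owner, baseline_hash: str,
                    max_attempts: int = 3) -> bool:
    """Run an RPA flow with screenshot diffing, structured retries,
    and a watchdog page when the upstream UI drifts."""
    for attempt in range(1, max_attempts + 1):
        screen = capture_screen()  # hypothetical: returns screenshot bytes
        if hashlib.sha256(screen).hexdigest() != baseline_hash:
            # UI no longer matches the known-good baseline: stop before acting.
            page_owner("UI drift detected; halting RPA flow")  # hypothetical pager
            return False
        try:
            flow()  # the actual click-through against the legacy system
            return True
        except Exception as exc:
            time.sleep(2 ** attempt)  # exponential backoff between retries
            if attempt == max_attempts:
                page_owner(f"RPA flow failed after {attempt} attempts: {exc}")
    return False
```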
How do you stop an agent going off the rails in production?
Three layers. (1) Eval suite: hundreds of real cases run on every prompt or model change, with regression alerts. (2) Production guardrails: tool-call allowlists, structured output validation, refusal on out-of-policy cases, dollar caps on action loops. (3) Observability: every run logged with prompt, tool calls, and outcome - searchable and replayable. The agent does not get to do things you have not authorised, and you can see exactly what it did when something looks off.
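A sketch of the guardrail layer around tool calls; the tool names, cost estimates, and cap are hypothetical placeholders:

```python
ALLOWED_TOOLS = {"search_kb", "create_ticket"}  # hypothetical allowlist
MAX_SPEND_USD = 5.00                            # per-run dollar cap

class GuardrailViolation(Exception):
    pass

class GuardedRun:
    def __init__(self):
        self.spend = 0.0
        self.log = []  # replayable record of every tool call in the run

    def call_tool(self, name: str, args: dict, est_cost_usd: float):
        if name not in ALLOWED_TOOLS:
            raise GuardrailViolation(f"tool {name!r} not on the allowlist")
        if self.spend + est_cost_usd > MAX_SPEND_USD:
            raise GuardrailViolation("dollar cap reached; halting action loop")
        self.spend += est_cost_usd
        self.log.append({"tool": name, "args": args, "cost": est_cost_usd})
        # ... dispatch to the real tool here ...

run = GuardedRun()
run.call_tool("search_kb", {"q": "refund policy"}, est_cost_usd=0.02)
```

The log feeds the observability layer: because every call is recorded before dispatch, a run can be searched and replayed even when it was halted mid-loop.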
Where do the data and models live?
In infrastructure you control. Agents, MCP servers, vector stores, and RPA workers deploy in your cloud account (AWS, GCP, Azure). Models can be Anthropic's Claude, OpenAI's models, or self-hosted (Llama, Mistral) depending on data sensitivity and cost profile - chosen per workload, not pre-baked. Production model usage is billed by the provider directly to your account. Your data does not train external models.
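In practice that per-workload choice is just configuration; a sketch with hypothetical workload names and deliberately unpinned model identifiers:

```python
# Hypothetical routing table: provider and model are chosen per workload by
# data sensitivity and cost, not hard-coded into the agents themselves.
MODEL_ROUTES = {
    "support_answers": {"provider": "anthropic",   "model": "claude"},   # quality-sensitive
    "pii_extraction":  {"provider": "self_hosted", "model": "llama"},    # data never leaves the VPC
    "bulk_tagging":    {"provider": "self_hosted", "model": "mistral"},  # cost-sensitive volume
}

def model_for(workload: str) -> dict:
    """Resolve which provider and model a workload should run on."""
    return MODEL_ROUTES[workload]
```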
What happens after the build?
Three options: (1) take the repo, evals, and runbooks and run it internally, (2) keep us on retainer for prompt tuning, eval expansion, model upgrades, and outage response, (3) scope a follow-on build (a second agent, a wider RAG, a customer-facing version). No pressure to continue, no vendor lock-in.