1. What actually ships vs. what gets hyped
Twitter will tell you there are 47 must-have AI tools this quarter. Your project will tell you a different story. The tools I actually open every day — the ones that earn their monthly invoice — number fewer than ten. Everything else is a distraction.
This playbook is the filtered list. It's built from my own subscriptions, six published in-depth reviews on KPBoards, and the kinds of Slack threads senior engineers actually have at 10pm on a Tuesday. Take what works, skip what doesn't, and come back in six months — the list will have changed.
2. The senior-engineer core loop
Every AI stack for production work covers the same five layers. Pick one tool per layer, stop optimizing, and ship.
- IDE — the tool you write code in. Non-negotiable. One choice.
- Research — the tool you ask questions in. Used daily.
- Media — video, voice, audio. Only matters if you create content.
- Design — visuals, thumbnails, diagrams. Optional.
- Governance — cost caps, alerts, audit. Nobody talks about this. You will regret skipping it.
Most engineers stack four tools per layer and end up paying $400/mo for overlap and indecision. Pick one. Ship.
3. IDE layer: Cursor vs. Claude Code vs. Copilot
This is the decision that shapes your day. I use both Claude Code and Cursor, and the split is deliberate.
| Tool | Strength | Weakness | Best for |
|---|---|---|---|
| Claude Code | Long-context reasoning, agentic multi-file edits, terminal-native | Costs stack up on large codebases without caching | Large refactors, architecture work, multi-file migrations |
| Cursor | Best-in-class IDE ergonomics, fast inline edits, shared chat history | Completion quality depends heavily on the underlying model picked | Day-to-day feature work, tight inner loops, pair-style edits |
| GitHub Copilot | Lowest friction, zero setup, great for team-wide standardization | Reasoning ceiling is lower — struggles on complex multi-file tasks | Boilerplate, tests, small refactors, teams on GitHub |
My own routing rule: Cursor for the inner loop, Claude Code for the harder outer loop. Feature work, form validation, a single-component refactor — Cursor. Three-file migration, a stubborn bug that crosses server and client, a real rewrite — Claude Code. Copilot I've installed on team repos where consistency matters more than ceiling.
I reviewed this space in detail at Windsurf vs. Cursor for production React/Next.js — Windsurf is a real contender and worth trying if Cursor ever drifts.
4. Research layer: ChatGPT vs. Perplexity
I run both daily and they occupy different slots. ChatGPT is the brainstorm partner — loose, generative, fast. Perplexity is the research assistant — cited, structured, boring in the best way.
When a teammate asks me "hey, should we use pgvector or Pinecone?", I open Perplexity. It quotes pricing pages, vendor docs, and benchmarks. When I'm writing a 1-pager and need to generate ten options, I open ChatGPT.
Full six-month comparison at ChatGPT vs. Perplexity for research in 2026.
5. Media layer: Descript, HeyGen, ElevenLabs
Only relevant if you publish. If you ship features and never record, skip this section. For the rest of us, the split:
- Descript — the editor. I treat video and podcast audio as a text-editable document and it saves me hours per week. (4-month review)
- HeyGen — avatar video. Great for corporate training and faceless YouTube in multiple languages. (3-month review)
- ElevenLabs — voice cloning and narration. My bilingual faceless YouTube channel runs on it. (6-month review)
6. Design layer: Midjourney vs. DALL-E
For thumbnails, marketing imagery, and one-off illustrations, I run both and pick per task. Midjourney still wins on style, DALL-E still wins on following text instructions verbatim and handling non-English text inside the image.
10-round head-to-head at Midjourney V7 vs. DALL-E 4.
7. Cost discipline — the part everyone skips
Every senior engineer I know has at least one horror story: a stuck agent, a leaked key, a retry loop that burned the month's API budget in an afternoon. Providers give you soft alerts and overage warnings, not actual stops.
My rules, learned the expensive way:
- One API key per project. Never share keys across products. Billing goes from "who spent this?" to "project X is over."
- Hard caps per key, not just soft alerts. If the provider doesn't support it, put the key behind a proxy that does.
- Slack alerts at 50/80/100%. The 100% alert should rotate the key to read-only, not just notify.
- Audit per-model cost weekly. It's depressing how often gpt-4-turbo is doing work gpt-4o-mini could handle for 1/10 the cost.
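The threshold rules above fit in a few lines of code. Here's a minimal sketch of a per-key budget guard — the Slack webhook call and the actual key rotation are stubbed behind a callback, since those depend on your own infrastructure; the class name and API are my invention, not any provider's:

```python
# Minimal per-key budget guard: fires alerts at 50/80/100% of a monthly cap,
# and flips the key to read-only at 100% (a hard stop, not just a notification).
# The on_alert callback is where you'd post to Slack and rotate the real key.

THRESHOLDS = (0.5, 0.8, 1.0)

class BudgetGuard:
    def __init__(self, monthly_cap_usd, on_alert):
        self.cap = monthly_cap_usd
        self.spent = 0.0
        self.fired = set()          # thresholds already alerted this month
        self.on_alert = on_alert    # callback: (fraction, spent, cap) -> None
        self.read_only = False

    def record_spend(self, usd):
        """Call after every billed request; returns True while the key is still writable."""
        self.spent += usd
        for t in THRESHOLDS:
            if self.spent >= t * self.cap and t not in self.fired:
                self.fired.add(t)
                self.on_alert(t, self.spent, self.cap)
                if t >= 1.0:
                    self.read_only = True
        return not self.read_only

alerts = []
guard = BudgetGuard(100.0, lambda t, s, c: alerts.append(f"{int(t*100)}%: ${s:.2f} of ${c:.2f}"))
guard.record_spend(60.0)   # crosses 50%
guard.record_spend(45.0)   # crosses 80% and 100% -> key goes read-only
```

The point of the in-process guard is latency: a provider-side alert email arrives after the damage; this check runs before the next request goes out.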
Heads up: I'm building aicostcap.dev — a small tool that gives you real per-project hard caps, alerts, and a multi-provider dashboard. Founding-member pricing is locked for the first 100 waitlist signups.
8. What I tried and killed
Just as important as what I keep. In the last 12 months I ejected:
- LangChain (production) — Too much indirection for too little value. Direct Anthropic + OpenAI SDKs plus a thin routing layer wins.
- Autonomous agent frameworks — Looked great in demos, broke in real codebases. Claude Code agents are the exception — scoped tightly enough to work.
- "AI productivity suite" apps — Most are a ChatGPT wrapper with a subscription. The real tools (Linear, Notion, Granola) ship AI faster than startups can.
- Generic AI writing tools — I write in my voice or I don't ship. Generic output sounds like generic output.
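For context on the "thin routing layer" that replaced LangChain: it's nothing exotic — one function that maps crude task traits to a model name, with the provider SDK calls sitting behind it. A sketch under my own assumptions (the model names and the complexity heuristic are illustrative, not a standard):

```python
# A thin model-routing layer: one function, no framework.
# Model names are illustrative -- swap in whatever your providers currently offer.

CHEAP = "small-fast-model"       # boilerplate, extraction, short answers
STRONG = "large-reasoning-model" # multi-file refactors, architecture questions

def route(task: str, files_touched: int = 1, needs_deep_reasoning: bool = False) -> str:
    """Pick a model from observable task traits instead of a framework abstraction."""
    if needs_deep_reasoning or files_touched > 2 or len(task) > 2000:
        return STRONG
    return CHEAP
```

The value is that the routing logic is grep-able and unit-testable, which is exactly what gets buried under a framework's chain abstractions.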
9. A one-page summary you can hand to your team
If you only remember one block from this playbook:
- IDE: Claude Code for the outer loop, Cursor for the inner loop. Pick one if you must.
- Research: ChatGPT for brainstorm, Perplexity for anything that needs a citation.
- Media (optional): Descript + ElevenLabs + HeyGen.
- Design (optional): Midjourney + DALL-E, route per task.
- Governance (mandatory): per-project API keys, hard caps, Slack alerts at 50/80/100%.
- Delete quarterly. Keep the stack small. The inventory is the cost.
Want the full reviews behind each of these picks? Start with the tools matrix or browse the review archive.
FAQ
How do you actually pick between Claude Code and Cursor day-to-day?
Claude Code wins when the task spans many files or needs real reasoning — migrations, redesigns, debugging flaky state. Cursor wins when the task is narrow and you just want a fast inner loop. I have both installed and bounce between them; the cost of context-switching is lower than the cost of using the wrong tool for the job.
Is Perplexity actually worth $20/mo on top of ChatGPT Plus?
For engineers doing real research — evaluating libraries, reading RFCs, comparing architectures — yes. Perplexity cites sources and refuses to hallucinate as confidently as ChatGPT. For everyday brainstorming and writing, ChatGPT wins. Most senior engineers end up paying for both and routing deliberately.
What is the single biggest mistake teams make with the AI stack?
Not setting hard spending caps per project or per API key. Providers give you soft alerts and overage warnings, not real stops. One stuck agent overnight is a $500+ bill. Treat API spend like any other production cost — budgets, alerts, and a kill switch.
How often does this stack change?
Major shifts roughly every 6–9 months. Model releases, new IDE features, and pricing changes reset the analysis. I re-evaluate the full stack quarterly and publish updated reviews on KPBoards.
What did you try and kill?
LangChain for production — too much indirection for too little value. Generic AI "productivity suite" apps that wrap ChatGPT — the actual tools (Linear, Notion, Granola) got AI features built in faster. Most autonomous agent frameworks — they looked great in demos and broke in real codebases.