Inheriting a Vibe-Coded Codebase: The 2026 Cleanup Playbook

Zane

May 17, 2026

10 min read

#open-source

TL;DR

Inherited a vibe-coded codebase? Here's the 7-step cleanup order I'd run before touching a single feature.

Step 1: secrets first. Exposed .env files and committed API keys are the most common finding, and the only one that bleeds money in real time.
Step 2: auth. Row-level security is usually off, role checks live on the client, and JWT verification is skipped on at least one route.
Step 3: data layer. Expect N+1 queries, missing indexes, and zero pagination on the list views.
Best for: Senior engineers, agency owners, and founders who just inherited (or shipped) a working-but-fragile AI-built app.

A senior engineer posted a thread to r/ClaudeCode in April called "Inherited a 3-month old repo from a Vibe Engineer. Wrote the most satisfying PR in my career." It pulled 6,815 upvotes and 647 comments in about a week. The top reply was one line: "i felt that in my soul."

If you've ever opened a repo someone built in a hurry with an AI assistant, you already know the feeling. The app works. The deploy is green. And the code is a fever dream of inline secrets, abandoned helpers, three different ways of fetching the same data, and one auth check that lives on the client.

I run vibecoding.app, which is the directory of agencies that do this rescue work for a living. I've read most of their case studies. I've also done a few of these myself on small contracts. The bad news: every inherited vibe-coded codebase has roughly the same problems, in roughly the same order. The good news: that means there's a playbook.

This is that playbook.

The 6,815-Upvote Thread {#the-6815-upvote-thread}

Quick context for why this is even a conversation. Three Reddit threads in April and May 2026 went near-viral in the AI-coding subs, all variations on the same theme:

r/ClaudeCode: "Inherited a 3-month old repo from a Vibe Engineer" (6,815 upvotes, 647 comments).
r/vibecoding: "If you're about to launch, read this first" (1,182 upvotes).
r/ClaudeAI: "Vibe Coding vs Production reality" (3,500 upvotes).

The audience in all three is the same: someone shipped or inherited an AI-built app and they're trying to figure out what to fix first. Simon Willison wrote a great post around the same time on vibe coding vs agentic engineering that drew the distinction more carefully than most people online were doing. Worth reading before you start any of this.

I'll spare you my version of the rant. The point: this is now a common enough situation that you need a process for it, not vibes about vibes.

What 'Vibe-Coded' Actually Means in 2026 {#what-vibe-coded-actually-means}

Andrej Karpathy coined the term "vibe coding" in early 2025 to describe a specific workflow: chatting with an AI, accepting most of what it writes, not reading every line. He meant it as a description, not a slur.

A year later the term has stretched. People use "vibe-coded" to mean any codebase where the original author didn't carefully review the AI's output. That covers Cursor and Claude Code projects shipped by solo founders, Lovable and Bolt and v0 apps that got into production without a senior engineer, and weekend hackathon repos that somehow ended up serving real customers. Anthropic's been gently pushing the framing that careful AI-assisted engineering is a separate discipline. They're right. Most of the rescue work I see is on the careless end of the spectrum, not the careful one.

The audit playbook below assumes the worst version. If your inherited codebase looks better than this, great, skip the steps that don't apply.

The 7-Step Cleanup Order {#the-7-step-cleanup-order}

Rank by blast radius, not by what's easiest. Money first. Trust second. Performance third.

Step 1: Audit the Secrets

This is non-negotiable and it's the only step with a clock attached. The most common findings, in order of how often I see them:

A .env or .env.local file committed to git history.
An OpenAI or Anthropic API key pasted directly into a frontend file.
A Supabase service_role key in a client component because "anon was throwing RLS errors."
A Stripe secret key in a server route that's also exposed via the Next.js client bundle.
AWS access keys committed to a config file with a comment like // TODO move to env.

What to do, in this order:

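Run gitleaks detect or trufflehog git file://. against the repo so the scan covers git history, not just the working tree (trufflehog filesystem . only sees the files currently on disk). Both are free, both are fast.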
Find every leaked secret. Rotate it in the provider dashboard immediately, before you do anything else. Don't wait until you "have time to fix it properly."
Add real environment variables. For Supabase keys specifically, see our security vulnerabilities playbook for the right server-vs-client split.
Add the leaked-secret patterns to a pre-commit hook so it can't happen again.

The reason this is step 1 and not step 4: an exposed Stripe key can be used to drain an account in minutes. An exposed OpenAI key gets picked up by bots within hours and rings up four-figure bills. Everything else can wait an afternoon. Secrets cannot.
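To make the pre-commit idea concrete, here is a minimal sketch of a pattern check you could run over staged file contents. The function name findSecrets and the three rules are assumptions for illustration only; they are far less thorough than the rule sets in gitleaks or TruffleHog, so treat this as a stopgap alongside those tools, not a replacement.

```typescript
// Illustrative rules only (assumed for this sketch); real scanners ship far more.
const SECRET_PATTERNS: { rule: string; pattern: RegExp }[] = [
  { rule: "awsAccessKeyId", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { rule: "stripeLiveSecretKey", pattern: /\bsk_live_[0-9a-zA-Z]{16,}\b/ },
  { rule: "privateKeyBlock", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
];

interface SecretHit {
  file: string;
  line: number; // 1-based line number within the file
  rule: string;
}

// Scan one file's text and report which rule matched on which line.
function findSecrets(file: string, text: string): SecretHit[] {
  const hits: SecretHit[] = [];
  text.split("\n").forEach((content, index) => {
    for (const { rule, pattern } of SECRET_PATTERNS) {
      if (pattern.test(content)) {
        hits.push({ file, line: index + 1, rule });
      }
    }
  });
  return hits;
}
```

A hook would call this for each staged file and block the commit if any hits come back. It only stops new leaks; anything already in history still has to be rotated.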

Step 2: Audit the Auth

The second most common rescue finding, and the one that does the most reputational damage when it fails. Symptoms:

Supabase RLS disabled on at least one table. Often disabled on every table, with a single "we'll add policies later" comment in the migration.
Role checks (if (user.role === 'admin')) on the client side only, with no server-side equivalent.
JWT verification missing on API routes that mutate data.
Auth middleware that returns next() on every path because the original developer couldn't figure out the regex.
User IDs read from the request body instead of the verified session token.

The fix order:

Turn RLS back on for every table. Yes, every one. The grith.ai team has a great post on the security bugs AI doesn't write that covers why this is the most-skipped step.
Write the policies. Start with the tightest possible version: "user can read their own rows, user can write their own rows." Open it up later if you have to.
Move every role check to the server.
Add a single auth middleware that runs on every API route by default, with explicit opt-outs for the few public endpoints.
Replace every user_id read from the body with user_id read from the validated session.

Don't add multi-factor auth or session rotation yet. Get the basics right first.
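The last three fixes in that list are easier to picture in code. What follows is a framework-agnostic sketch, not any particular library's middleware API: the ApiRequest and Session shapes, the PUBLIC_PATHS entries, and the verifyToken callback are all hypothetical stand-ins for whatever your framework and auth provider actually give you.

```typescript
// Hypothetical shapes; a real app would use its framework's request/session types.
interface Session {
  userId: string;
  role: "user" | "admin";
}

interface ApiRequest {
  path: string;
  headers: Record<string, string | undefined>;
  body: Record<string, unknown>;
}

type AuthResult =
  | { ok: true; session: Session | null } // session is null only on public routes
  | { ok: false; status: 401 };

// Explicit opt-outs: any path not listed here requires a verified session.
const PUBLIC_PATHS = new Set<string>(["/api/health", "/api/webhooks/stripe"]);

// verifyToken is an assumed stand-in for the real JWT or session verification.
function authenticate(
  req: ApiRequest,
  verifyToken: (token: string) => Session | null,
): AuthResult {
  if (PUBLIC_PATHS.has(req.path)) {
    return { ok: true, session: null };
  }
  const header = req.headers["authorization"] ?? "";
  const token = header.startsWith("Bearer ") ? header.slice("Bearer ".length) : "";
  const session = token ? verifyToken(token) : null;
  if (session === null) {
    return { ok: false, status: 401 };
  }
  return { ok: true, session };
}

// The owner of a write comes from the verified session; a user_id in the body is ignored.
function ownerIdFor(session: Session, _body: Record<string, unknown>): string {
  return session.userId;
}

// Role checks run on the server, against the verified session.
function isAdmin(session: Session): boolean {
  return session.role === "admin";
}
```

The design choice that matters is the default: a new route is protected unless someone adds it to the opt-out list on purpose, which is the reverse of the "returns next() on every path" failure mode.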

Step 3: Audit the Data Layer

Once auth is fixed you can start looking at performance. The pattern here is consistent across every AI-built codebase I've seen:

N+1 queries on every list view, because the AI added await fetchUser(id) inside a .map() and called it a day.
Zero database indexes beyond the primary keys.
No pagination on any endpoint that returns a list.
SELECT * on tables that have 40 columns when the UI shows 4.
A getAll() function called from the homepage that scans an entire table on every render.

The fix: open up your database's slow query log, sort by total time, and start at the top. Most of the wins are in the first ten queries. Add indexes, batch the N+1s, add cursor-based pagination, and tighten the column selection.
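Two of those fixes, batching the N+1s and cursor-based pagination, are small enough to sketch. The User and Post shapes and the fetchUsersByIds callback below are hypothetical; the callback stands in for a single query such as WHERE id IN (...), and the in-memory paginate function only mirrors what the SQL would do.

```typescript
interface User {
  id: string;
  name: string;
}

interface Post {
  id: number;
  authorId: string;
  title: string;
}

// Hypothetical data-access function: one round trip for many ids,
// instead of one query per row.
type FetchUsersByIds = (ids: string[]) => Promise<User[]>;

// Batched replacement for posts.map((p) => fetchUser(p.authorId)).
async function attachAuthors(posts: Post[], fetchUsersByIds: FetchUsersByIds) {
  const uniqueIds = Array.from(new Set(posts.map((p) => p.authorId)));
  const users = await fetchUsersByIds(uniqueIds);
  const byId = new Map<string, User>();
  for (const user of users) {
    byId.set(user.id, user);
  }
  return posts.map((p) => ({ ...p, author: byId.get(p.authorId) ?? null }));
}

// Cursor pagination over rows already sorted by ascending id.
// The SQL equivalent is roughly: WHERE id > cursor ORDER BY id LIMIT limit.
function paginate(sortedPosts: Post[], cursor: number | null, limit: number) {
  const after = cursor;
  const remaining =
    after === null ? sortedPosts : sortedPosts.filter((p) => p.id > after);
  const page = remaining.slice(0, limit);
  const hasMore = remaining.length > page.length;
  const nextCursor = hasMore && page.length > 0 ? page[page.length - 1].id : null;
  return { page, nextCursor };
}
```

The client sends back nextCursor to get the following page, and a null cursor means there is nothing left. Unlike offset pagination, the database never has to skip over rows it has already served.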

Step 4: Audit the LLM Calls

This one's new in 2026 and most cleanup playbooks haven't caught up. If the app calls any LLM API (OpenAI, Anthropic, Gemini, anything via OpenRouter), check for:

Rate limiting. Usually missing entirely.
Spend caps. Usually missing entirely.
Retries with exponential backoff. Almost always missing; the AI wrote a single await openai.chat.completions.create(...) and moved on.
Timeouts. The default in most SDKs is 10 minutes. That's not a default you want.
Streaming where it matters. Long responses without streaming kill the perceived performance.
Prompt-injection guards on user input.
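For the first item on that list, a per-user limiter at the application layer can be very small. This is a minimal in-memory, fixed-window sketch; createRateLimiter is a hypothetical helper, and a deployment with more than one server instance would need a shared store rather than a local Map.

```typescript
// Fixed-window limiter: at most maxCalls per user in each windowMs window.
// State lives in process memory, so this only works for a single instance.
function createRateLimiter(maxCalls: number, windowMs: number) {
  const windows = new Map<string, { start: number; count: number }>();

  // Returns true and records the call if it is allowed, false if the user is over the limit.
  // `now` is injectable so the behaviour can be tested without waiting.
  return function allow(userId: string, now: number = Date.now()): boolean {
    const current = windows.get(userId);
    if (current === undefined || now - current.start >= windowMs) {
      windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= maxCalls) {
      return false;
    }
    current.count += 1;
    return true;
  };
}
```

Call allow(userId) before every LLM request and return a 429 when it says no. The numbers are yours to pick; something like 20 calls per minute per user is an arbitrary starting point, not a recommendation from any provider.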

Set a hard monthly spend cap in the provider dashboard. Add a per-user rate limit at the application layer. Wrap every LLM call in a retry helper with a 30-second timeout. If you're shipping Claude or GPT features to end users, the spend cap is the single most important fix.
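A retry helper along those lines might look like the sketch below. The names withRetry and RetryOptions are hypothetical, and the helper is deliberately simple: it retries on any error, where a production version would probably retry only on timeouts, 429s, and 5xx responses, and would add jitter to the delay.

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first one
  baseDelayMs: number; // wait before the first retry; doubles on each retry
  timeoutMs: number; // per-attempt timeout
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `call` with a per-attempt timeout and exponential backoff between attempts.
// `call` receives an AbortSignal so the underlying request can be cancelled on timeout.
async function withRetry<T>(
  call: (signal: AbortSignal) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    const controller = new AbortController();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        controller.abort();
        reject(new Error(`attempt timed out after ${options.timeoutMs}ms`));
      }, options.timeoutMs);
    });
    try {
      // Whichever settles first wins: the call itself or the timeout.
      return await Promise.race([call(controller.signal), timeout]);
    } catch (error) {
      lastError = error;
      if (attempt < options.retries) {
        await sleep(options.baseDelayMs * 2 ** attempt);
      }
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

With the 30-second timeout from above, the options would be something like { retries: 3, baseDelayMs: 500, timeoutMs: 30_000 }, where the retry count and base delay are my own placeholder values. Pass the signal through to the fetch or SDK call inside `call` so a timed-out request is actually cancelled instead of left running.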

Step 5: Audit the Validation

Zod (or any runtime schema validator) is one of those things AI tools forget unless you specifically prompt for it. The symptoms: free-text fields where enums should be, dates stored as strings, IDs that are sometimes numbers and sometimes strings, JSON payloads accepted from the client without parsing.

Add a Zod schema (or equivalent) to every API route's input. Add one to every external API response you parse. Tighten the database column types to match. This is a half-day of work that prevents a year of weird bugs.
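To show what "a schema on every input" buys you, here is a dependency-free sketch of the "or equivalent" option: a hand-written parser for one route. In a real project a Zod schema would replace most of this with a few lines. The create-task payload, its field names, and parseCreateTask are all hypothetical.

```typescript
// Hypothetical payload for a "create task" route; field names are illustrative.
const STATUSES = ["todo", "doing", "done"] as const;
type Status = (typeof STATUSES)[number];

interface CreateTaskInput {
  title: string;
  status: Status; // an enum, not free text
  dueDate: Date; // a real Date, not a string
  projectId: number; // always a number, never "42"
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Turns an untrusted request body into a typed value, or a list of errors.
function parseCreateTask(raw: unknown): ParseResult<CreateTaskInput> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["body must be an object"] };
  }
  const body = raw as Record<string, unknown>;
  const errors: string[] = [];

  const rawTitle = body["title"];
  const title = typeof rawTitle === "string" ? rawTitle.trim() : "";
  if (title.length === 0) errors.push("title must be a non-empty string");

  const status = STATUSES.find((s) => s === body["status"]);
  if (status === undefined) errors.push(`status must be one of: ${STATUSES.join(", ")}`);

  const rawDue = body["dueDate"];
  const dueDate = typeof rawDue === "string" ? new Date(rawDue) : new Date(NaN);
  if (Number.isNaN(dueDate.getTime())) errors.push("dueDate must be a parseable date string");

  const projectId = body["projectId"];
  if (typeof projectId !== "number" || !Number.isInteger(projectId)) {
    errors.push("projectId must be an integer");
  }

  if (errors.length > 0 || status === undefined || typeof projectId !== "number") {
    return { ok: false, errors };
  }
  return { ok: true, value: { title, status, dueDate, projectId } };
}
```

The route handler returns a 400 with the error list when ok is false, and everything past that point works with the typed value instead of the raw body.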

Step 6: Audit the Tests

Assume there are zero real tests. There's usually a __tests__ folder with two skeleton files the AI generated to look thorough. Open them. Most of the assertions will be expect(true).toBe(true), or they'll test the framework instead of the application code.

Don't try to backfill 80% coverage. Pick the 10 user flows that, if they break, you'd hear about within an hour. Write integration tests for those. Add one regression test for every bug you find during the audit. Stop there for now. You can build out coverage when you start adding features again.

The vibe debugging guide covers the AI-assisted approach to test-writing if you want a faster way through this step.

Step 7: Audit the Deployment

Last step, because by now the urgent stuff is already locked down. Check:

Environment variables actually wired up in Vercel/Render/Railway/Fly. Not just declared in the code.
CORS configured for the actual production origin, not *.
A real error tracker (Sentry, Highlight, anything) wired up.
Logs flowing somewhere you can search.
A health check endpoint that pings the database.
Backups actually running on the database. Don't trust the default.
A robots.txt and a sane Cache-Control policy.

This is also a good time to set up preview deploys if there aren't any, because the rest of the cleanup will go faster when you can review changes in a real environment.
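Two items on that checklist, the health check that pings the database and the CORS allow-list, fit in a short sketch. Both functions are hypothetical helpers: pingDb is an assumed callback (for example, one that runs SELECT 1 through the app's database client), and wiring them into a route depends on the framework.

```typescript
interface HealthReport {
  status: "ok" | "degraded";
  db: "up" | "down";
  latencyMs: number;
}

// Reports "degraded" if the database ping fails or takes longer than timeoutMs.
async function checkHealth(
  pingDb: () => Promise<void>,
  timeoutMs: number = 2000,
): Promise<HealthReport> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("db ping timed out")), timeoutMs);
  });
  try {
    await Promise.race([pingDb(), timeout]);
    return { status: "ok", db: "up", latencyMs: Date.now() - started };
  } catch {
    return { status: "degraded", db: "down", latencyMs: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}

// CORS: echo the request origin back only if it is on an explicit allow-list, never "*".
// A null result means the Access-Control-Allow-Origin header should not be set at all.
function allowedOrigin(origin: string | undefined, allowList: string[]): string | null {
  return origin !== undefined && allowList.includes(origin) ? origin : null;
}
```

The health route would return a 200 when status is "ok" and a 503 otherwise, so the uptime monitor notices a dead database and not just a dead web server.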

The Tools You Actually Want {#the-tools-you-actually-want}

For the audit itself, you don't need much. My standard kit:

Claude Code for the structured read pass. Point it at the repo, ask it for a security and architecture audit, and have it produce a markdown report. Anthropic's tool is currently the best at this kind of long-context codebase reasoning.
Cursor for the diff-by-diff fixes. Once you know what to change, Cursor's inline edit flow is the fastest way to grind through it.
Semgrep or Snyk for the security scan. Free tiers cover the basics.
Gitleaks or TruffleHog for the secret scan. Free, fast, run them first.
Sentry for the observability gap. The free tier is enough to get started.
Lovable if you need to rebuild the UI layer while keeping the backend. The Lovable-to-codebase flow has gotten genuinely good in 2026 for greenfield surfaces. (More on what Lovable is good at in our code quality playbook.)

You don't need all of these. Most rescues I've seen use Claude Code plus one diff tool plus one security scanner. Total tool spend: under $100/month.

DIY vs Hire It Out {#diy-vs-hire-it-out}

The honest answer depends on three things: how many users the app has, whether you have a senior engineer on staff, and whether you have exposed secrets right now.

Do it yourself if:

The app is pre-launch or under 100 users, with no payment processing yet.
You're a senior engineer with capacity for a 30-day part-time effort.
You're comfortable rotating keys, writing RLS policies, and reading SQL query plans.

Hire an agency if:

The app has paying customers and downtime costs you money.
You inherited the codebase as a non-technical founder.
You see leaked secrets right now and need them rotated in the next hour.
You're an investor doing due diligence on an acquisition target.

If you're in the second bucket, the vetted teams in our agencies directory all do this kind of work. Filter by "security audit" or "full-stack rescue" for the right shortlist. Quotes typically come in between $4k and $25k for a small-app rescue, depending on the size of the codebase and the urgency. Agency owners reading this: list your firm via the agency advertising page.

The 30-Day Plan {#the-30-day-plan}

If you only take one thing from this post, take this calendar. Run it in order. Don't skip ahead.

Week 1: Triage. Read every file. Map the routes. Map the database schema. Run gitleaks, and if it turns up live keys, rotate them right away rather than waiting for week 2. List every external service the app talks to. Write a one-page risk register. No code changes yet.

Week 2: Secrets and auth. Rotate every leaked key on day one. Turn on RLS. Write the policies. Move role checks server-side. Add the auth middleware. This is the highest-stakes week; don't rush it.

Week 3: Data layer. Slow query log. Indexes. Pagination. N+1 fixes. Column tightening. Spend caps on every external API. Validation schemas on every input.

Week 4: Tests and observability. Ten integration tests on the highest-stakes flows. Sentry wired up. Logs flowing. Health check live. A README that the next engineer can actually use.

End of week 4, you have a boring codebase. That's the goal. Exciting codebases are usually the broken ones.

FAQ {#faq}

What does "vibe-coded" actually mean in 2026?

The term comes from Andrej Karpathy in early 2025. He described a workflow of chatting with an AI and accepting most of its output without reading every line. By 2026 it's a catch-all for any codebase shipped without careful review, whether it came out of Cursor, Claude Code, Lovable, Bolt, v0, or any other AI-first tool.

What's the single biggest risk in an inherited vibe-coded codebase?

Committed secrets. Exposed API keys (OpenAI, Stripe, Supabase service role, AWS) are the most common finding and the only ones that can drain money in real time. Rotate first, refactor second.

How long does a real codebase rescue take?

For a small SaaS under 30k lines, plan on four weeks: triage, then secrets and auth, then the data layer, then tests and observability. Bigger codebases scale roughly linearly because the test coverage gap grows with the size of the codebase.

Should I rewrite or refactor a vibe-coded app?

Refactor. Rewrites are almost always the wrong call when the app already has users, because you lose the implicit product decisions baked into the existing code. Keep the schema, keep the routes, keep the auth (once fixed). Replace the styling layer if you must, but don't throw out the business logic.

What tools should I use for the audit?

Claude Code for the read pass. Cursor for the edits. Semgrep or Snyk for security. Gitleaks for secrets. Sentry for observability. Total cost: under $100/month for a small team.

When should I hire a rescue agency instead of doing it myself?

When the app already has paying users, when you don't have a senior engineer on staff, or when you see exposed secrets right now. Our agencies directory lists vetted teams that specialise in this kind of work.

Can I just ask Claude or Cursor to clean it up for me?

Partially. AI tools are excellent at the read pass and the mechanical fixes (adding Zod schemas, writing missing tests, refactoring N+1s). They're poor at the architectural calls: what to keep, what to throw out, how to sequence the cleanup. A human still has to drive.

If you inherited a vibe-coded codebase and you want a vetted agency to run the rescue for you, browse the vibecoding.app agencies directory or apply to list your firm. If you'd rather do it yourself, start with the vibe-debugging pillar for the deeper technical breakdowns of each step.

The work isn't glamorous. The first PR is, though.

Written by

Zane

AI Tools Editor

AI editorial avatar for the Vibe Coding team. Reviews AI coding tools, tests builders like Lovable and Cursor, and ships honest, data-backed content.
