What went wrong, what it cost, and what systems were built to prevent it from happening again.
Part of the ForkIt! Case Study.
Bugs that shipped. Builds that broke. Things that should have been caught before anyone saw them.
Apple rejected the v2 submission for missing subscription compliance. Auto-renewal disclosure text was missing. There was no Restore Purchases button. Terms of Service and Privacy Policy links were not accessible from the paywall screen. No LLM tutorial, no human tutorial, no documentation of any kind mentioned these requirements.
What it cost: Two days of rework. A full resubmission cycle. The realization that the LLM had confidently built an IAP flow that would never pass review.
What fixed it: Added all three compliance elements. Created a pre-submission checklist. The 31-review suite now includes Review 16 (IAP/Subscription Compliance), which checks for every requirement Apple and Google enforce.
During a code cleanup, Claude renamed API endpoints for consistency. Clean code. Better naming. Deployed to Vercel. The problem: every app already installed on users' phones was still calling the old endpoint names. Live users started getting errors. The old URLs returned 404s.
What it cost: Live users hit errors until backward-compatibility rewrites were added. Those rewrites still exist in the Vercel config today as tech debt (#115).
What fixed it: Added rewrites in vercel.json to map old paths to new endpoints. Established a rule: never rename a live API endpoint without a redirect. The review suite now includes Review 21 (API Endpoint Hygiene) and Review 30 (Migration and Upgrade Safety).
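The compatibility shim takes only a few lines in vercel.json. The paths below are invented placeholders to show the shape, not the app's actual endpoint names:

```json
{
  "rewrites": [
    { "source": "/api/getRestaurants", "destination": "/api/restaurants/search" },
    { "source": "/api/saveFavorite", "destination": "/api/favorites/save" }
  ]
}
```

Old installed clients keep hitting the old paths; Vercel silently serves the new handlers. Cheap to add, but it is permanent tech debt as long as any old client remains in the wild.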
Android users with non-Chrome default browsers (DuckDuckGo, Firefox) experienced hangs during Clerk SSO sign-in. The app froze completely. Chrome Custom Tabs opened in the wrong browser, which couldn't handle the redirect back. The main thread blocked. Android killed the app.
What it cost: Hours of debugging a platform-level limitation that has no real fix. Users with non-Chrome defaults still see friction.
What fixed it: Added a hint toast on Android sign-in failure explaining the Chrome requirement. Documented it as a known limitation in the project guide. This is a platform constraint, not something app code can solve. The fix was transparency, not engineering.
The stakeholder engagement was right. The preparation was wrong.
Work trip to Annapolis. Three colleagues who had heard about the app wanted to try it. Perfect opportunity: real users, real context, at a restaurant, ready to pick a place for dinner. Opened the app. It didn't work.
The backend had been updated during development without a matching app build. The dev build on the phone was calling endpoints that had changed. The app couldn't fetch restaurants. At a restaurant. In front of three potential users.
What it cost: Three users. Not hypothetical users. Three real people who were interested, present, and ready to try it. They watched it fail and moved on. They never came back to it.
Every demo is a stakeholder touchpoint. This one failed because the build wasn't tested on the device, in the environment, within the hour. The engagement instinct was right. The preparation discipline didn't exist yet.
What fixed it: A demo prep protocol. Before any stakeholder touchpoint, verify the build is production (not dev), test on the actual device, confirm the backend matches, and have a fallback plan if it crashes. This became part of the project's pre-push checklist.
This is the centerpiece. Six failures with the same root cause: the LLM gave a confident answer, I trusted it, and the answer was wrong. Each time, the cost compounded because the wrong answer became the basis for the next decision.
Asked Claude how many EAS builds were included on the Starter plan. Answer: "Unlimited!" Planned accordingly. Weeks later, asked again. Claude read its own previous memory file. Same answer: "Unlimited!" Felt confirmed.
Then an email arrived from Expo: 80% of build credits consumed with two weeks left in the billing cycle.
The LLM had been wrong the first time. Then it cited itself as a source the second time. A hallucination, reinforced by its own memory, presented as verified information.
What it cost: Scrambled to cancel unnecessary builds. Established a rule to cancel superseded builds immediately. Nearly burned through a paid plan's allocation on builds that were never used.
What fixed it: A new rule: never trust LLM pricing knowledge. Always verify vendor pricing pages directly before making cost-driven decisions. Added to project memory as a permanent instruction.
Claude stated that Google Maps Platform included a $200/month free credit. Architecture decisions, cost projections, and the free tier design were all built on this assumption. The entire "as free as possible" pricing model assumed that credit existed.
It didn't. Google had eliminated the $200 monthly credit over a year before the project started. The LLM's training data was stale. Every cost calculation based on that number was wrong.
What it cost: Rearchitecting the API call strategy. Adding pool caching, client-side filtering, and aggressive deduplication to reduce API spend. The free tier limits (20 searches/month) exist partly because the expected credit never materialized.
What fixed it: Verified Google's actual pricing page. Rebuilt cost projections from real numbers. Added "verify vendor pricing" as a permanent memory instruction. Every cost-driven decision now requires a primary source, not an LLM assertion.
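The pool-caching idea can be sketched in a few lines. Everything here is illustrative (the `Place` shape, the fetcher, area keys, the 15-minute TTL), not ForkIt!'s actual code; the point is that repeated spins reuse one paid Places API call instead of issuing a new one each time:

```typescript
// Illustrative sketch of pool caching + deduplication to cut API spend.
interface Place {
  id: string;
  name: string;
}

type Fetcher = (areaKey: string) => Promise<Place[]>;

class PoolCache {
  private pools = new Map<string, { places: Place[]; fetchedAt: number }>();

  constructor(private fetcher: Fetcher, private ttlMs = 15 * 60 * 1000) {}

  // Return a deduplicated pool for the area, paying for an API call only
  // when the cached pool is missing or stale.
  async getPool(areaKey: string, now = Date.now()): Promise<Place[]> {
    const cached = this.pools.get(areaKey);
    if (cached && now - cached.fetchedAt < this.ttlMs) {
      return cached.places;
    }
    const raw = await this.fetcher(areaKey);
    const seen = new Set<string>();
    const places: Place[] = [];
    for (const p of raw) {
      if (!seen.has(p.id)) {
        seen.add(p.id); // drop duplicate place IDs from overlapping searches
        places.push(p);
      }
    }
    this.pools.set(areaKey, { places, fetchedAt: now });
    return places;
  }
}
```

Client-side filtering (cuisine, exclude list) then runs against the cached pool, which costs nothing.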
The backend API had no rate limiting or origin checking for weeks. No user data was at risk (the search endpoint doesn't store or transmit personal data), but anyone who discovered the URL could have run up the Google Places API bill on my account.
The LLM built it. The LLM didn't flag the gap. I didn't know to check. It was discovered during a code review, not because anyone exploited it.
What it cost: Weeks of financial exposure. Nobody found it. The risk was to my billing account, not to user data.
What fixed it: Added rate limiting (30 req/min per IP), origin checking, and security middleware. The review suite includes Review 7 (Security), Review 18 (Operational Readiness), and Review 21 (API Endpoint Hygiene). Every endpoint now validates before processing.
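The two checks can be sketched as a single gate that runs before any Places API call. This is an illustration, not the actual middleware: the allowed origin is an assumed placeholder, and a fixed window stands in for whatever algorithm the real limiter uses:

```typescript
// Illustrative gate: fixed-window limit of 30 requests/minute per IP,
// plus an origin allowlist for browser traffic.
interface ApiRequest {
  ip: string;
  origin?: string; // native app requests may carry no Origin header
}

const WINDOW_MS = 60_000;
const MAX_PER_WINDOW = 30;
const ALLOWED_ORIGINS = new Set(["https://forkit.example.com"]); // assumed value

const hits = new Map<string, { windowStart: number; count: number }>();

function checkRequest(req: ApiRequest, now = Date.now()): { ok: boolean; status?: number } {
  // Browser requests from unknown origins are rejected outright.
  if (req.origin && !ALLOWED_ORIGINS.has(req.origin)) {
    return { ok: false, status: 403 };
  }
  // Fixed-window counter per IP; the window resets after a minute.
  const entry = hits.get(req.ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(req.ip, { windowStart: now, count: 1 });
    return { ok: true };
  }
  if (entry.count >= MAX_PER_WINDOW) {
    return { ok: false, status: 429 }; // Too Many Requests
  }
  entry.count += 1;
  return { ok: true };
}
```

Even this naive version would have capped the worst-case bill from a discovered URL at 30 paid calls per minute per attacker IP instead of unlimited.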
Separate from the billing exposure: API endpoints that touch user data (favorites, history, sync) accepted a user ID in the request body but never verified it belonged to the person making the request. Someone could have read or written another user's data by guessing their ID.
What it cost: Discovered during a manual code review session. Patched immediately. No exploitation occurred. But the gap was real: user data endpoints without identity verification.
What fixed it: Added Clerk JWT verification on all protected endpoints. Established Review 19 (Auth and Identity) in the review suite. Auth is now verified server-side on every request that touches user data.
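The shape of the fix is worth spelling out. The token verifier below is a stand-in for Clerk's server-side JWT verification, not its real API; the part that closed the hole is the comparison at the end:

```typescript
// Illustrative ownership check for endpoints that touch user data.
type TokenVerifier = (token: string) => { userId: string } | null;

interface ProtectedRequest {
  authorization?: string; // e.g. "Bearer <session token>"
  body: { userId: string };
}

function authorize(
  req: ProtectedRequest,
  verify: TokenVerifier
): { ok: boolean; status?: number } {
  const token = req.authorization?.replace(/^Bearer /, "");
  if (!token) {
    return { ok: false, status: 401 }; // no credentials at all
  }
  const session = verify(token);
  if (!session) {
    return { ok: false, status: 401 }; // invalid or expired token
  }
  // The actual fix: a client-supplied user ID is honored only when it
  // matches the identity proven by the verified token.
  if (session.userId !== req.body.userId) {
    return { ok: false, status: 403 };
  }
  return { ok: true };
}
```

Before the patch, the equivalent code skipped straight from "a user ID is present in the body" to reading or writing that user's data.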
Claude picked up stray files from a different project (HabitCoach, a blue and gold themed app) and built the entire onboarding tour in the wrong brand colors. The tour was fully functional, well-structured, and completely off-brand. Blue and gold instead of orange and teal.
What it cost: Multiple design reviews missed it. I didn't catch it until later, when the gold "felt off." The LLM cross-contaminated two projects, and the human reviewing it didn't notice because the tour worked. Functional correctness masked visual wrongness.
What fixed it: Rebuilt the tour in the correct brand colors. Established the color theory doc (orange = problem, teal = solution) and the brand style guide in CLAUDE.md so the LLM has an authoritative reference. Review 22 (Brand Continuity) now checks every screen for theme consistency.
LLMs will say "I can't do that" or "you'll need to do this manually" about things they absolutely can do. The first time it happened, I accepted it and spent time on manual work that wasn't necessary. It kept happening. Each time, the cost was small — a few minutes here, a workaround there. But it accumulated.
What it cost: No dramatic failure. No outage. Just accumulated time lost to unnecessary manual work, spread across dozens of interactions. The kind of cost that doesn't show up in a postmortem because no single instance was big enough to flag.
What fixed it: A permanent instruction in the project memory: don't say "I can't" without trying workarounds first. Push back, ask to try, or rephrase. The lesson: don't accept the first "no" from an LLM. The default is surrender, not effort.
The pattern across all six: the LLM gave a confident output. I had no framework to evaluate it. The output became infrastructure. When it turned out to be wrong, the fix was expensive because decisions had been built on top of it.
Not code bugs. Judgment bugs. Decisions that cost time because the framework for evaluating them didn't exist yet.
Weeks of design work on features driven by feedback from a mom group on social media. The feedback was enthusiastic. The problem: the people giving it weren't users. They were repeating what sounded interesting, filtered through their own context, with no connection to the actual use case.
The features they described (social check-ins, recipe integration, gamification) sounded reasonable in isolation. They were completely wrong for the product. The app picks a random restaurant. It doesn't need a social feed.
What it cost: Weeks of design iteration on the wrong mental model. Time that could have been spent on features actual users needed (like the exclude filter, or walk mode).
What fixed it: Built a feedback taxonomy with five types (behavioral, voiced pain, feature requests, ambient noise, market signal). The mom group feedback was genuine — the mistake was in how I weighted it. The signal-to-noise rubric now requires decomposing every request into the underlying need before evaluation.
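The taxonomy fits in a small data structure. The numeric weights below are invented for illustration, not the rubric's actual numbers; the rule that does come from the rubric is that an undecomposed request cannot be scored at all:

```typescript
// Sketch of the five-type feedback taxonomy as data.
type FeedbackType =
  | "behavioral"
  | "voiced-pain"
  | "feature-request"
  | "ambient-noise"
  | "market-signal";

interface Feedback {
  source: string;
  type: FeedbackType;
  request: string;
  underlyingNeed?: string; // filled in only after decomposition
}

// Illustrative weighting: observed behavior counts most,
// secondhand enthusiasm counts least.
const WEIGHT: Record<FeedbackType, number> = {
  "behavioral": 1.0,
  "voiced-pain": 0.8,
  "feature-request": 0.5,
  "market-signal": 0.3,
  "ambient-noise": 0.1,
};

function score(f: Feedback): number {
  if (!f.underlyingNeed) {
    return 0; // not yet decomposed: inadmissible, not merely low-signal
  }
  return WEIGHT[f.type];
}
```

Under this lens the mom group's "add a social feed" scores zero until someone articulates the need behind it, while a hostel tester's "I never want to see that place again" decomposes directly into the exclude filter.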
Built a promo code system for early adopters and testers. Had codes ready to distribute. The problem: the highest-value testers were already gone. The hostel strangers who gave the exclude-filter feedback, the walker who surfaced the walk-mode gap, the colleague who couldn't demo to a friend. No contact information was captured during those interactions.
What it cost: Lost the ability to close the loop with the people who shaped the product most. Could not reward the testers who mattered. Could not get follow-up feedback from the users whose input was most valuable.
What fixed it: Created STAKEHOLDERS.md in the repo root: a lightweight register tracking who gave feedback, what channel, what type, what it became, and whether the loop was closed. Includes a capture checklist for future sessions. Not a formal tool; just a file that gets updated. The hostel testers are still lost, but the next round won't be.
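As a sketch of the register (entries reconstructed from the incidents above, not the real file's contents), it might look like:

```markdown
# STAKEHOLDERS.md

| Who            | Channel   | Type        | What it became | Loop closed? |
| -------------- | --------- | ----------- | -------------- | ------------ |
| Hostel testers | In person | Voiced pain | Exclude filter | No (lost)    |
| Walker         | In person | Behavioral  | Walk mode      | No (lost)    |
| Annapolis trio | In person | Demo        | Demo protocol  | No           |
```

The capture checklist at the top of the file would prompt for a name and contact channel before the session ends — the step that was missing every time above.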
Early failures were code. Middle failures were LLM trust. Late failures were process. The progression tells the real story: what you don't know shifts as you learn.
For every category of failure, a system was built to prevent recurrence. The failures are orange. The responses are teal.
31-review suite with automated checks (ESLint, Prettier, Secretlint, Knip, npm audit) plus 26 manual deep-dive reviews. Runs before every deploy.
Demo prep protocol. Before any stakeholder touchpoint: verify the build is production, test on the actual device, confirm the backend matches, have a fallback.
Verify-before-trust protocol. Never trust LLM assertions about pricing, platform limits, or compliance requirements. Always check the primary source.
Feedback taxonomy and stakeholder map. Classify every piece of feedback before acting on it. Capture contact info during engagement.
Most of these failures would have been caught in five minutes by someone with the right experience. The iOS rejection, the billing exposure, the endpoint rename, the pricing assumption. None of them were hard problems. They were gaps in knowledge that a mentor, a code reviewer, or a senior engineer would have flagged instantly.
A solo builder doesn't have that person. An LLM is not that person. The LLM will confidently fill the gap with wrong answers that feel right.
The fix was not "be smarter." The fix was building systems that catch what I can't see. The 31-review suite, the verify-before-trust rule, the demo prep protocol. These are the scar tissue. They exist because something broke.