Changelog

Every notable change to Pitch Ready — what shipped, when, and why. Security, reliability, observability, and product features grouped by release.

Production-readiness pass 2 — security, scoring trust, and your data rights

#2026-06-10-production-readiness-pass-2

A full-codebase audit followed by remediation across security, AI cost controls, scoring integrity, and data rights. Scores are now deterministic and evidence-verified, and account deletion / data export are self-serve.

  • Security

    Server-side request forgery on website imports fully closed

    URL imports now validate every resolved address at connection time (DNS rebinding and redirect tricks no longer work), follow redirects manually with re-validation, and cap response size.
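The address check described above can be sketched roughly as follows. This is a minimal illustration, not the shipped implementation: `isBlockedIPv4` and `allAddressesAllowed` are hypothetical names, only IPv4 is covered, and the blocked ranges are the usual private, loopback, and link-local blocks.

```typescript
// Hypothetical helper: true when an IPv4 address must not be fetched.
function isBlockedIPv4(address: string): boolean {
  const parts = address.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p))) {
    return true; // unparseable input is rejected rather than trusted
  }
  const octets = parts.map((p) => Number(p));
  if (octets.some((n) => n > 255)) return true;
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  return (
    a === 0 || // "this network"
    a === 10 || // 10.0.0.0/8 private
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 carrier-grade NAT
    (a === 169 && b === 254) || // link-local, includes cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) || // 192.168.0.0/16 private
    a >= 224 // multicast and reserved
  );
}

// Every address a hostname resolves to must pass, and the same check
// would be repeated for each redirect hop before connecting.
function allAddressesAllowed(resolved: string[]): boolean {
  return resolved.length > 0 && resolved.every((ip) => !isBlockedIPv4(ip));
}
```

Checking every resolved address, rather than only the first, is what defeats a hostname that resolves to both a public and an internal address.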

  • Security

    Sign-in endpoints strictly rate limited

    Magic-link sends are capped per IP at the NextAuth endpoint itself, closing an email-bombing bypass of the earlier limit.

  • Security

    Upload confirmation can only attach your own files

    Storage paths are verified against your own upload prefix and real object sizes are re-checked server-side.

  • Reliability

    Identical pitches now score identically

    All scoring AI calls run at temperature 0. Re-submitting the same deck and video produces the same scores — previously they could drift run to run.

  • Reliability

    Every quoted line of evidence is verified against your transcript

    Quotes shown as "From your pitch" (vertebrae evidence, rewrites, component extracts, PL thesis evidence) are checked against what you actually said; anything unverifiable is dropped instead of shown.
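One way to picture the quote check is a normalized substring match. The sketch below is an assumption about the general approach, not the actual matcher: `normalize` and `verifyQuotes` are hypothetical names, and the real check may be fuzzier or stricter.

```typescript
// Lowercase, strip punctuation, and collapse whitespace so that small
// formatting differences do not cause a false mismatch.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Keep only the quotes that actually occur in the transcript.
function verifyQuotes(quotes: string[], transcript: string): string[] {
  const haystack = normalize(transcript);
  return quotes.filter((quote) => {
    const needle = normalize(quote);
    return needle.length > 0 && haystack.includes(needle);
  });
}
```

A quote that the model paraphrased or invented fails the match and is filtered out, which mirrors the "dropped instead of shown" behavior.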

  • Reliability

    PL Demo Day scoring is transparent about degradation

    If the investment-thesis evaluation can't run, the report now says so explicitly instead of silently scoring without the Web3 Thesis Fit dimension. The dimension score is also derived from its section scores rather than the model's own aggregate.

  • Reliability

    Stuck analyses recover automatically

    Double-clicks can no longer start duplicate pipelines, a failed queue send rolls your assessment back so you can retry, and an hourly reaper fails any run stuck over two hours instead of leaving an eternal progress bar.
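The reaper's selection rule can be sketched as a simple filter. The `Run` shape, its status values, and `findStuckRuns` are hypothetical; only the two-hour threshold comes from the entry above.

```typescript
interface Run {
  id: string;
  status: "processing" | "completed" | "failed";
  startedAt: Date;
}

const STUCK_AFTER_MS = 2 * 60 * 60 * 1000; // two hours

// Return the runs an hourly job would mark as failed.
function findStuckRuns(runs: Run[], now: Date): Run[] {
  return runs.filter(
    (run) =>
      run.status === "processing" &&
      now.getTime() - run.startedAt.getTime() > STUCK_AFTER_MS
  );
}
```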

  • Added

    Delete your account — for real

    Settings → Account now has permanent account deletion (all files, analyses, and profile data, hard-deleted) and a one-click JSON export of everything we hold about you.

  • Added

    Avatar upload, reduce-motion, and assessment pagination

    Profile pictures are live, the reduce-motion preference now actually reduces motion, and long assessment lists paginate.

  • Fixed

    Honest settings and privacy policy

    Removed notification toggles that had no sender behind them; the privacy policy now discloses 90-day frame-sample retention and describes deletion accurately. Analytics only load after cookie consent.

  • Operations

    Dependency security updates

    Next.js, jspdf, and the AWS SDK updated past their open advisories; per-user daily AI budgets added across all coaching and generation endpoints.

Analysis pipeline — robustness hardening

#2026-04-28-robustness-hardening

A three-track follow-up to the truth-in-pipeline initiative. The pipeline went from "honest about what it measures" to "resilient under failure modes that actually happen" — schema-validated AI outputs, cost & abuse guardrails, and idempotent state writes.

  • Reliability

    Zod schemas on every Claude parse boundary

    All four analyzers (PL evaluator, visual analyzer, video content, deck) now validate Claude responses against Zod schemas (PlEvaluationSchema, VisualAnalysisSchema, VideoAnalysisV2Schema with .passthrough(), DeckAnalysisV2Schema). New parseAndValidateClaudeJson helper and ClaudeOutputValidationError class. Validation failures throw a typed error that withRetry retries (Claude shape errors are fixed on the second attempt ~70-80% of the time per Anthropic guidance). Defensive null guards in calculator-v2.buildWeb3ThesisFitDimension via safeSection(): pre-schema DB rows or schema-bypass paths now degrade gracefully instead of crashing the report renderer.
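The validate-then-retry pattern can be sketched without any dependencies. To keep the example self-contained, a hand-written shape check stands in for a Zod schema; `OutputValidationError`, `ScoreOutput`, `parseAndValidate`, and `withValidationRetry` are hypothetical names and do not reflect the real helpers' signatures.

```typescript
class OutputValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "OutputValidationError";
    Object.setPrototypeOf(this, OutputValidationError.prototype);
  }
}

// Assumed output shape, for illustration only.
interface ScoreOutput {
  score: number;
  summary: string;
}

// Parse raw model text and reject anything that is not the expected shape.
function parseAndValidate(raw: string): ScoreOutput {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new OutputValidationError("response is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new OutputValidationError("expected a JSON object");
  }
  const record = data as Record<string, unknown>;
  const score = record["score"];
  const summary = record["summary"];
  if (typeof score !== "number" || typeof summary !== "string") {
    throw new OutputValidationError("unexpected field types");
  }
  return { score, summary };
}

// Retry only on shape errors; anything else propagates immediately.
async function withValidationRetry(
  call: () => Promise<string>,
  maxAttempts = 2
): Promise<ScoreOutput> {
  let lastError: unknown = new OutputValidationError("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return parseAndValidate(await call());
    } catch (err) {
      if (!(err instanceof OutputValidationError)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}
```

Using a dedicated error type is what lets the retry wrapper distinguish "the model returned the wrong shape" from network or auth failures that should not be retried the same way.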

  • Reliability

    Whisper transcript quality gate

    New validateTranscript helper detects empty results, repeated-phrase hallucinations (Whisper's classic silence-as-"uh uh uh" failure), suspiciously low word density (silent microphone), and language mismatches. Pipeline marks the transcription stage failed on severity=fail so content + PL eval don't get fed nonsense; the analysis_quality.transcript_quality block surfaces in the report so founders see "we couldn't hear your audio" instead of an unexplained low score. 7 new tests cover the heuristic edges.
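A rough sketch of three of these heuristics (empty result, repetition, low word density) follows. The thresholds, the severity assigned to each case, and the name `checkTranscript` are assumptions made for illustration; language-mismatch detection is left out.

```typescript
type Severity = "ok" | "warn" | "fail";

interface TranscriptCheck {
  severity: Severity;
  reason?: string;
}

function checkTranscript(text: string, durationSeconds: number): TranscriptCheck {
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return { severity: "fail", reason: "empty" };

  // Repetition: one word making up most of the transcript suggests a
  // hallucinated filler loop. The 50% threshold is an assumed value.
  const counts = new Map<string, number>();
  for (const w of words) counts.set(w, (counts.get(w) ?? 0) + 1);
  const topCount = Math.max(...Array.from(counts.values()));
  if (words.length >= 10 && topCount / words.length > 0.5) {
    return { severity: "fail", reason: "repetition" };
  }

  // Word density: very few words per minute suggests a silent microphone.
  // The 30 words/minute floor is an assumed value.
  if (durationSeconds > 0) {
    const wordsPerMinute = words.length / (durationSeconds / 60);
    if (wordsPerMinute < 30) return { severity: "warn", reason: "low_word_density" };
  }
  return { severity: "ok" };
}
```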

  • Security

    Per-user daily analysis quota

    Configurable via MAX_ANALYSES_PER_USER_PER_DAY env (default 20). Counts non-draft assessments started in the last 24h. Returns 429 daily_quota_exceeded with used/limit numbers in the response. Bounds cost-burn from a compromised user account regardless of IP rotation — the per-IP rate limit (10/min) was insufficient on its own to prevent a determined attacker from running ~600 pitches/hour at $90-120/hr in API cost.
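The arithmetic behind the figure above: 10 requests/min × 60 min = 600 pitches/hour, and $90-120/hr over 600 pitches implies roughly $0.15-0.20 per pitch. The quota check itself might look like the sketch below, where the row shape, status values, and `checkDailyQuota` are hypothetical and only the default of 20 and the 24-hour window come from the entry.

```typescript
interface AssessmentRow {
  status: "draft" | "processing" | "completed" | "failed";
  analysisStartedAt: Date | null;
}

interface QuotaResult {
  allowed: boolean;
  used: number;
  limit: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Count non-draft assessments started in the last 24 hours for one user.
function checkDailyQuota(rows: AssessmentRow[], now: Date, limit = 20): QuotaResult {
  const used = rows.filter(
    (row) =>
      row.status !== "draft" &&
      row.analysisStartedAt !== null &&
      now.getTime() - row.analysisStartedAt.getTime() < DAY_MS
  ).length;
  return { allowed: used < limit, used, limit };
}
```

A caller would translate `allowed: false` into the 429 response and include `used` and `limit` in the body.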

  • Reliability

    Inngest idempotency on the analysis run

    Analyze route now sends events with id: analyze:{assessmentId}:{analysis_started_at}. Inngest dedupes within its 24h window — a network blip, double-click, or HTTP-layer retry can no longer trigger duplicate $0.20 pipelines. Existing completion + failure events already had idempotency keys; now the inbound trigger does too.
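The key format above is deterministic, so repeated sends for the same run collapse to one event. The sketch below builds the key and imitates window-based deduplication in memory; Inngest performs the real dedupe on its side, and `analyzeEventId` and `InMemoryDeduper` are hypothetical names used only to show the effect.

```typescript
// Same assessment + same start timestamp always yields the same id.
function analyzeEventId(assessmentId: string, analysisStartedAt: string): string {
  return `analyze:${assessmentId}:${analysisStartedAt}`;
}

// Stand-in for a queue that ignores a repeated id inside a time window.
class InMemoryDeduper {
  private readonly windowMs: number;
  private readonly firstSeen = new Map<string, number>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the event should be processed, false if it is a duplicate.
  accept(id: string, nowMs: number): boolean {
    const seenAt = this.firstSeen.get(id);
    if (seenAt !== undefined && nowMs - seenAt < this.windowMs) return false;
    this.firstSeen.set(id, nowMs);
    return true;
  }
}
```

Because the start timestamp is part of the key, a deliberate re-analysis with a new `analysis_started_at` gets a fresh id and is not suppressed.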

  • Observability

    Token + cost tracking per assessment

    Migration 015 adds assessments.token_usage jsonb + cost_cents numeric(10,2). New CostCollector class threads through every analyzer via an optional recordUsage callback; pipeline aggregates and persists at save. Sonnet pricing table ($3 input / $15 output per MTok) lets us compute per-assessment cost without hitting Anthropic's API. Composite index on (user_id, analysis_started_at) enables fast cost-per-user dashboards. Whisper excluded — it bills per audio-minute, not per token.
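As a worked example of the pricing arithmetic: a call with 20,000 input tokens and 4,000 output tokens costs 0.02 × $3 + 0.004 × $15 = $0.12, or 12 cents. A minimal sketch of that calculation follows; `costCents` is a hypothetical name, and the real CostCollector may aggregate differently.

```typescript
// Prices from the entry above, in USD per million tokens.
const USD_PER_MTOK = { input: 3, output: 15 };

// Cost of one call in cents, rounded to two decimals to fit numeric(10,2).
function costCents(inputTokens: number, outputTokens: number): number {
  const usd =
    (inputTokens / 1e6) * USD_PER_MTOK.input +
    (outputTokens / 1e6) * USD_PER_MTOK.output;
  return Math.round(usd * 100 * 100) / 100;
}

// Sum several calls into a per-assessment total.
function totalCostCents(calls: Array<{ inputTokens: number; outputTokens: number }>): number {
  const total = calls.reduce((sum, c) => sum + costCents(c.inputTokens, c.outputTokens), 0);
  return Math.round(total * 100) / 100;
}
```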

  • Operations

    R2 frame-samples lifecycle cron

    New Inngest cron cleanupFrameSamples runs Sunday 04:23 UTC with 90-day retention. Lists R2 objects under frame-samples/ via paginated ListObjectsV2, batch-deletes via DeleteObjectsCommand (up to 1000 keys per request). Logs scanned/deleted/bytesFreed for visibility. Without this, ~2 MB/pitch in extracted PNG frames would accumulate forever.
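The retention filter and the 1000-key batching can be sketched independently of the storage client. `StoredObject`, `expiredKeys`, and `toBatches` are hypothetical names; each batch would then be passed to one delete request.

```typescript
interface StoredObject {
  key: string;
  lastModified: Date;
}

const RETENTION_DAYS = 90;
const MAX_KEYS_PER_DELETE = 1000;

// Keys of objects older than the retention window.
function expiredKeys(objects: StoredObject[], now: Date): string[] {
  const cutoff = now.getTime() - RETENTION_DAYS * 24 * 60 * 60 * 1000;
  return objects.filter((o) => o.lastModified.getTime() < cutoff).map((o) => o.key);
}

// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```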

  • Reliability

    Re-analysis idempotency (DELETE-before-INSERT)

    Pipeline now wipes existing assessment_scores + findings rows for the assessment before inserting the new run's results. Closes a duplicate-row accumulation gap exposed by the audit: any path that re-triggered the pipeline (Inngest retry, admin reset, future re-analyze feature) would otherwise fan-out V1+V2 score rows alongside each other, corrupting report queries. assessment_history is intentionally preserved — each run is a meaningful iteration. Extracted as testable helper clearAssessmentDerivedRows; 4 regression tests lock in the pattern.

  • Reliability

    Orphan frame cleanup on visual-stage failure

    When analyzeVisualDelivery fails after frame upload, the inline catch now best-effort-deletes the R2 frames + clears the uploads.video_frame_samples manifest before returning. Without this, frames squat in R2 for up to 90 days waiting for the cleanup cron, accumulating storage cost on every failed run.

  • Security

    Soft-delete enforcement on iteration history

    assessment-history.ts:getAssessmentIterations traversed parent_assessment_id chains without filtering deleted_at — a soft-deleted assessment could surface its descendants (and vice versa) in the iteration timeline. Both the entry-point query and the parent-walk query now filter deleted_at IS NULL. Other suspect routes (compare, investor-view, shared/[token]) audited and confirmed already protected.

  • Observability

    Pipeline stage write failure visibility

    Fire-and-forget pipeline_stage writes used to swallow errors silently, leaving the UI showing stale progress with no operator visibility. Stage write failures now call Sentry.captureMessage with a pipeline-stage tag, so a stuck progress bar shows up in observability instead of being invisible.

  • Quality

    Test suite expanded from 62 to 98 tests

    16 schema validation tests, 7 Whisper transcript-quality tests, 8 cost tracker tests, 4 idempotency regression tests, 1 calculator defensive-null test. Coverage targets the new failure modes the audit found, so a future regression that re-introduces the silently-discarded-error pattern will fail noisily in CI.

Analysis pipeline — truth-in-pipeline initiative

#2026-04-28-truth-in-pipeline

A five-phase audit and remediation of the analysis pipeline. Closes the gap between what the platform claims to measure and what the code actually produces, then deepens PL Demo Day evaluation with a dedicated investment-thesis dimension.

  • Added

    Real video frame analysis (Phase B)

    FFmpeg-driven frame sampler extracts up to 8 PNG frames per pitch (opening, closing, plus the start of each transcript segment >15s). All frames are stored under frame-samples/{assessmentId}/ in R2 and analyzed by Claude 3.5 Sonnet vision in a single call. Replaces the prior behavior where eye_contact_notes, energy_level, framing, lighting, background, and attire were all fabricated by Claude from transcript text. Wrapped behind ENABLE_VIDEO_FRAME_ANALYSIS env flag (default true in prod, false in dev/CI). Adds ~$0.04 per pitch and 3-8s of latency.
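The sampling rule (opening, closing, plus the start of each long segment, capped at 8) can be sketched as a timestamp picker. How the cap is applied when there are many long segments is not stated above, so this sketch assumes the earliest long segments win; `pickFrameTimestamps` and the one-second closing offset are also assumptions.

```typescript
interface Segment {
  start: number; // seconds
  end: number; // seconds
}

function pickFrameTimestamps(
  segments: Segment[],
  durationSeconds: number,
  maxFrames = 8
): number[] {
  const opening = 0;
  const closing = Math.max(0, durationSeconds - 1); // just before the end
  const picks: number[] = [opening];
  if (closing !== opening) picks.push(closing);

  // Add the start of each segment longer than 15 seconds until the cap is hit.
  for (const segment of segments) {
    if (picks.length >= maxFrames) break;
    const isLong = segment.end - segment.start > 15;
    if (isLong && !picks.includes(segment.start)) picks.push(segment.start);
  }
  return picks.sort((x, y) => x - y);
}
```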

  • Added

    PL Demo Day investment-thesis evaluation (Phase C)

    New 7th scoring dimension — web3_thesis_fit (15%) — added when targetEvent=pl_demo_day, with the existing 6 dimensions scaled down proportionally so the total stays 100%. PL evaluation runs a dedicated Claude call covering web3 necessity, tokenomics, protocol-vs-application classification, community traction, composability, capital efficiency, and founder track record. Sourced from the PL_Pitch_Mastery.md Graph Paper Capital workshop notes and the PL Demo Day Playbook (Dec 2025). General mode keeps the legacy 6-dimension shape — no web3-specific scoring.
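Proportional scaling means each existing weight is multiplied by 1 − 0.15 = 0.85, so a dimension that was 20% becomes 17% and the total stays at 100%. A small sketch, using placeholder dimension names and weights rather than the real values in weights-v2.ts:

```typescript
// Add a new dimension at `weight` and shrink the others proportionally.
function addDimension(
  base: Record<string, number>,
  name: string,
  weight: number
): Record<string, number> {
  const scale = 1 - weight;
  const result: Record<string, number> = {};
  for (const key of Object.keys(base)) {
    result[key] = (base[key] ?? 0) * scale;
  }
  result[name] = weight;
  return result;
}

// Placeholder weights that sum to 1; the real ones are not shown here.
const exampleBase: Record<string, number> = {
  dimA: 0.2,
  dimB: 0.2,
  dimC: 0.2,
  dimD: 0.15,
  dimE: 0.15,
  dimF: 0.1,
};

const exampleWithThesisFit = addDimension(exampleBase, "web3_thesis_fit", 0.15);
```

Because every original weight is scaled by the same factor, the relative importance of the six existing dimensions is unchanged.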

  • Added

    Server-side metadata extraction (Phase A)

    PDFs run through pdf-lib at upload-confirm time to extract authoritative page_count + first-page dimensions. Videos run through ffprobe (via @ffprobe-installer/ffprobe) over a presigned URL — fast, no full-buffer download. Replaces the prior behavior where the client could under-report page_count (dodging the PL single-page critical-error cap) or claim a 4-minute video was 3 minutes. Falls back to client-provided values on extraction failure.

  • Added

    Materialized benchmark distributions (Phase E)

    New benchmark_distributions table seeded with synthetic baselines (sample_size=0, is_synthetic=true). The /api/benchmarks route now serves materialized rows by mode (PL Demo Day vs general) instead of recomputing live aggregates per request. Inngest cron benchmarks/refresh runs nightly at 03:17 UTC and flips a bucket to is_synthetic=false once it has 30+ real completed assessments. Report UI now discloses 'Based on N=X completed PL Demo Day assessments' or 'Estimated baseline' so percentile rankings are never silently presented as data-backed when they aren't.
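The synthetic-to-real flip and the disclosure text can be sketched as two small functions. The `Bucket` shape, `refreshBucket`, `disclosure`, and the wording used for the general mode are assumptions; the 30-assessment threshold and the two PL Demo Day strings come from the entry above.

```typescript
interface Bucket {
  mode: "pl_demo_day" | "general";
  sampleSize: number;
  isSynthetic: boolean;
}

const MIN_REAL_SAMPLES = 30;

// What a nightly refresh might do once the real completed count is known.
function refreshBucket(bucket: Bucket, completedCount: number): Bucket {
  return {
    mode: bucket.mode,
    sampleSize: completedCount,
    isSynthetic: completedCount < MIN_REAL_SAMPLES,
  };
}

// Text shown next to percentile rankings in the report.
function disclosure(bucket: Bucket): string {
  if (bucket.isSynthetic) return "Estimated baseline";
  const label = bucket.mode === "pl_demo_day" ? "PL Demo Day" : "general";
  return `Based on N=${bucket.sampleSize} completed ${label} assessments`;
}
```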

  • Fixed

    Fabricated visual delivery removed

    Before this release, Claude was prompted to assess eye_contact_notes, posture, attire, and energy_level from the transcript alone — observations it has no way to verify. The unified-assessment.ts content prompt now either receives a VisualDeliveryContext block from the visual analyzer (real frame observations) or an explicit instruction to leave eye_contact_notes / energy_level null. No more invented visual delivery.

  • Fixed

    PDF page_count silently defaulting to 1

    Pipeline previously read deckUpload.page_count || 1, which meant every multi-page deck was treated as single-page — silently dodging the PL Demo Day critical-error cap (deck_multiple_pages, score capped at 5). Page count is now extracted server-side at upload-confirm and the cap fires correctly.

  • Fixed

    Misnamed prompt files renamed (Phase C.1)

    prompts/yc-criteria.ts → demo-day-best-practices.ts (it never contained YC-specific evaluation — only generic Demo Day craft). prompts/pl-criteria.ts → pl-technical-criteria.ts (technical specs only — content evaluation now lives in the new pl-evaluation-criteria.ts module). The Vertebrae four-question framework is now properly attributed to Geoff Ralston (Y Combinator) in code comments, the analysis prompt, and the report UI.

  • Quality

    V1 dead code deleted

    Removed runAnalysisPipeline (V1, ~320 lines), src/lib/scoring/calculator.ts (V1 calculator, ~620 lines), and the V1 analyzeVideoContent / analyzeDeck functions. The production callers (analyze route + Inngest worker) were already on V2; deletion reduces audit surface without behavior change.

  • Quality

    Calibration regression tests

    Two new Vitest tests in calculator-v2.test.ts lock in scoring on representative pitch fixtures. 'Strong-but-not-perfect PL pitch' must land in [7.5, 9.5] with every dimension ≥6. 'Weak pitch with 2 critical errors' must cap at ≤5 with errors_detected.length ≥ 2. These guard against silent score drift when prompts or weights change.

  • Operations

    Marketing copy aligned to code (Phase D)

    /about page: replaced fabricated 30/25/25/20 evaluation buckets with the real 6 dimensions and weights from src/lib/scoring/weights-v2.ts. Replaced unsupported '500+ Pitches Analyzed' / '85% Avg. Score Improvement' with capability stats (6+ Scoring Dimensions, 8 Frames Sampled per Video). 'Official PL Demo Day requirements' → 'PL Demo Day technical and content requirements (per PL Network's published Demo Day Playbook)'. Privacy policy now discloses extracted video frame storage. Landing animated-sections + /assessments/new mode picker updated to mention real visual analysis and the Web3 Thesis Fit dimension.

  • Observability

    Visual + PL evaluation Sentry breadcrumbs

    visual-analyzer.ts and pl-evaluator.ts now add Sentry breadcrumbs in the analysis category, with the messages visual-analysis-complete and pl-evaluation-complete. Filter by these in Sentry to monitor cost / failure rates without mixing them into general analysis traces.

Deploy stabilization

#2026-04-28-deploy-stabilization

Hotfixes resolving a chain of Vercel build failures uncovered after our first production push. End-to-end verified live: edge rate limiter active, healthcheck returning 200, landing p95 ~740ms.

  • Fixed

    Vercel deploy chain unblock

    Three coordinated fixes: bumped inngest ^3.49.1 → ^3.54.0 to satisfy Vercel's marketplace CVE check; pinned @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner to exact 3.967.0 because newer versions transitively pulled ESM-only @nodable/entities@2.1.0 which broke Next.js page-data collection with ERR_REQUIRE_ESM; updated vercel.json installCommand to npm ci --legacy-peer-deps to bypass an irrelevant @sveltejs/kit peerOptional conflict.

  • Fixed

    E2E test fixes from first CI run

    MVP banner test now matches the actual copy. Get-Started + About navigation tests use page.waitForURL with a longer timeout to handle dev-mode first-compile latency. Playwright output directories added to .gitignore.

Production-hardening sprint

#2026-04-27

Multi-phase production-readiness arc covering security, reliability, observability, quality, performance, and operational readiness. The platform graduated from MVP-banner-only to defensibly production-ready.

  • Security

    OAuth account-linking lockdown

    Removed allowDangerousEmailAccountLinking: true from Google + GitHub providers. Eliminates a silent-account-merge vector via OAuth same-email.

  • Security

    AI provider opt-out signaling

    Added X-Pitch-Ready-Origin: commercial-api headers to all OpenAI / Anthropic SDK clients across the analysis pipeline and chat / script / roadmap / investor-view routes. Documents the commercial-API stance explicitly for auditors and abuse monitoring.

  • Security

    Explicit S3 server-side encryption

    PutObjectCommand now sets ServerSideEncryption: AES256 with a matching x-amz-server-side-encryption header on the upload client so the SigV4 signature validates. Defense in depth on top of R2's default at-rest encryption.

  • Security

    Privacy policy data-retention disclosure

    Added an explicit "no automatic deletion" section. Encryption claim clarified to AES-256. Last-updated date refreshed.

  • Security

    Vulnerability disclosure policy

    SECURITY.md added with disclosure email, safe-harbor language, and scope guidelines.

  • Reliability

    Distributed rate limiter (Upstash)

    Replaced the in-memory Map with @upstash/ratelimit sliding-window backed by Vercel KV. Sliding window prevents fixed-window boundary bursts. Fail-open with Sentry warning so a Redis hiccup doesn't take down the API. ephemeralCache for in-process repeat-rejection speedup.
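To illustrate why a sliding window blunts boundary bursts, here is an in-memory sketch of the common sliding-window counter approximation, which weights the previous fixed window by how much of it still overlaps the sliding window. It is not the library's code and has no Redis backing; `SlidingWindowCounter` is a hypothetical name.

```typescript
class SlidingWindowCounter {
  private readonly limit: number;
  private readonly windowMs: number;
  private currentWindow = -1;
  private currentCount = 0;
  private previousCount = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request at `nowMs` is allowed.
  allow(nowMs: number): boolean {
    const windowIndex = Math.floor(nowMs / this.windowMs);
    if (windowIndex !== this.currentWindow) {
      // Carry the count over only if we moved to the very next window.
      this.previousCount = windowIndex === this.currentWindow + 1 ? this.currentCount : 0;
      this.currentCount = 0;
      this.currentWindow = windowIndex;
    }
    const elapsedFraction = (nowMs % this.windowMs) / this.windowMs;
    const estimated = this.previousCount * (1 - elapsedFraction) + this.currentCount;
    if (estimated >= this.limit) return false;
    this.currentCount += 1;
    return true;
  }
}
```

With a limit of 10 per minute, a client that spends all 10 requests in the last second of one minute gets almost nothing in the first second of the next, whereas a fixed window would hand it a fresh 10.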

  • Reliability

    Edge rate limiting in middleware

    120/min per-identity ceiling on /api/* (excluding /api/auth and /api/inngest). Identity hierarchy: authenticated userId > IP. Rejections happen at the edge before reaching the route function — saves invocation cost during abuse.

  • Reliability

    Mock-fallback fail-fast

    /api/assessments/[id]/analyze now returns HTTP 503 in production when AI keys are missing instead of silently mocking. Mock requires an explicit MOCK_ANALYSIS=true opt-in (dev only). Boot fails loudly on Vercel if OpenAI / Anthropic / KV env vars are missing.

  • Reliability

    Sentry tag consistency

    Every /api/* catch block now calls Sentry.captureException with a route tag. Filterable by route in Sentry Issues for faster triage.

  • Observability

    Structured logging foundation

    Adopted pino. Production JSON output, dev-mode pretty-print. Service tag on every line. Redaction for authorization / cookie / password / token paths. Migrated all server console calls in the analysis pipeline, Inngest functions, and 18+ API routes to structured logger calls — Vercel logs are now filterable by route, userId, assessmentId, and stage.

  • Observability

    Healthcheck endpoint

    GET /api/health pings Supabase + Upstash in parallel. Returns 200 ok or 503 degraded with a sanitized checks object. No auth — designed for external uptime monitors. Cache-Control: no-store. Includes the deploy SHA for correlation.
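The parallel-probe pattern can be sketched as below. The probe functions are injected so the example stays self-contained; `summarize`, `runHealthChecks`, and the report shape are hypothetical, and the real endpoint also includes the deploy SHA.

```typescript
type Probe = () => Promise<void>;

interface HealthReport {
  httpStatus: 200 | 503;
  status: "ok" | "degraded";
  checks: Record<string, "ok" | "fail">;
}

// Reduce raw pass/fail results to a sanitized report with no error details.
function summarize(results: Record<string, boolean>): HealthReport {
  const checks: Record<string, "ok" | "fail"> = {};
  let healthy = true;
  for (const name of Object.keys(results)) {
    const passed = results[name] === true;
    checks[name] = passed ? "ok" : "fail";
    if (!passed) healthy = false;
  }
  return healthy
    ? { httpStatus: 200, status: "ok", checks }
    : { httpStatus: 503, status: "degraded", checks };
}

// Run all probes concurrently; a rejected probe counts as a failed check.
async function runHealthChecks(probes: Record<string, Probe>): Promise<HealthReport> {
  const names = Object.keys(probes);
  const outcomes = await Promise.all(
    names.map((name) => {
      const probe = probes[name];
      if (probe === undefined) return Promise.resolve(false);
      return probe().then(
        () => true,
        () => false
      );
    })
  );
  const results: Record<string, boolean> = {};
  names.forEach((name, i) => {
    results[name] = outcomes[i] === true;
  });
  return summarize(results);
}
```

Converting each rejection to `false` before `Promise.all` means one failing dependency cannot mask the result of the other.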

  • Observability

    SLO targets documented

    docs/SLOs.md: latency p50/p95/p99 per endpoint, availability targets per surface (99.9% landing, 99.5% upload, 99.0% AI-dependent), error rate (<1% 5xx), error budget arithmetic, breach-response order of operations.

  • Observability

    k6 load testing

    Three scenarios (landing burst, healthcheck flood, rate-limit verification). Weekly Monday cron via GitHub Actions runs against production with 10% smoke load. Manual dispatch supports full load against preview deploys. Thresholds map directly to SLO numbers.

  • Observability

    Operational runbook

    docs/RUNBOOK.md: deploy rollback, rate-limiter KV flush, Inngest stuck-analysis re-trigger, OAuth-binding reset for lost-email-access users, incident-response-by-symptom matrix, quarterly Supabase backup-recovery drill (<30 min RTO target), Sentry/Vercel alert wiring.

  • Added

    Email assessment notifications

    Inngest events fire on analysis completion (gated on user opt-in) and failure (always sends — failure isn't promotional content). Inline-styled HTML + plain text email templates. Idempotency keys on every event prevent duplicate emails on retry.

  • Added

    DB-persisted user preferences

    Replaced localStorage notification settings with profiles table columns. Preferences now sync across devices. Migration 011 added email_on_complete, email_weekly_digest, email_product_updates, browser_notifications, and reduce_motion (the last one provisioned for an upcoming accessibility feature).

  • Quality

    Vitest unit test suite

    47 tests across 5 lib files (rate-limit, helpers, env, retry, scoring calculator). 78% line coverage on the tested surface with thresholds enforced in CI. Vitest 3 + jsdom + @vitest/coverage-v8.

  • Quality

    Multi-browser end-to-end testing

    Playwright now runs across chromium, firefox, and webkit in CI as a parallel matrix. @axe-core/playwright scans landing / login / privacy for serious or critical WCAG 2.1 AA violations.

  • Quality

    CI/CD pipeline

    GitHub Actions workflow with four jobs (lint+typecheck, unit tests, build, e2e × 3 browsers). Dependabot configured for weekly grouped npm updates and monthly Actions updates. Vercel deploy config explicit in vercel.json.

  • Quality

    All ESLint errors resolved

    Fixed three errors including a latent stale-closure bug in the assessment creation flow. MVP banner refactored to useSyncExternalStore for SSR-safe + cross-tab sync. Final state: 0 errors, 37 unused-import warnings (safe to leave).

  • Performance

    Bundle hygiene

    Removed unused html2canvas direct dep (PDF export uses jspdf programmatically). Added framer-motion to optimizePackageImports for tree-shaken motion components.

  • Performance

    Image optimization config

    next/image remotePatterns allowlist for YouTube thumbnails, Supabase storage, and Cloudflare R2. Prevents the image-optimization endpoint from being abused as an open proxy.

  • SEO

    JSON-LD structured data

    Root layout embeds Organization + WebApplication schema in a @graph. Helps search engines render rich snippets for the platform.

  • Operations

    Authentication & account recovery documented

    README now describes the passwordless-only auth stance and links to the manual recovery procedure in RUNBOOK.md for users who lose email access.

Earlier

Pre-2026-04-27 platform iterations

#earlier

Summarized from commit history. Detail is high-level — these features predate the changelog.

  • Added

    Pitch analysis pipeline

    File upload via presigned R2 URLs; Whisper transcription with FFmpeg fallback for >25MB videos; Claude V2 deck-vision + content analysis; 6-dimension scoring calculator with vertebrae framework; iteration tracking via parent_assessment_id.

  • Added

    Reports + exports

    Score gauge, breakdown components, vertebrae checklist, score simulator, roadmap-to-ready, script generator, archetype coaching, benchmark display. Markdown + PDF export.

  • Added

    Claude chat assistant

    Three modes — assistant, coach, investor — with company-profile-aware prompts. Sonnet for everyday assistance, Opus for deeper coaching and investor simulation.

  • Added

    Assessment management + comparison

    CRUD with soft-delete. Side-by-side comparison view showing dimension deltas, finding diffs, and vertebrae status changes between iterations.

  • Added

    Investor view + benchmarking

    Claude-generated FOMO indicators, red flags, predicted investor questions, and verdict. Rule-based fallback when generation fails. Aggregate score benchmarking endpoint.

  • Added

    Public sharing

    Token-based assessment shares with 30-day expiry. Read-only /shared/[token] view for sending reports to advisors or investors.

  • Added

    Authentication

    NextAuth.js v4 with magic-link (Resend SMTP) plus Google + GitHub OAuth. Supabase adapter. 30-day JWT session.

  • Added

    Demo mode

    DEMO_MODE=true env flag enables a credentials provider for local testing without infrastructure. Mocked assessment data for UI development.

  • Added

    Settings, About, Privacy, Terms pages

    Full account settings panel (profile, notifications, appearance, accessibility). Static legal + about pages.

  • Added

    Light/dark theme

    next-themes integration. Defaults to light mode. Landing page uses an amber accent scope that doesn't bleed into other pages.