Changelog

Every notable change to Pitch Ready — what shipped, when, and why. Security, reliability, observability, and product features grouped by release.

June 10, 2026

Production-readiness pass 2 — security, scoring trust, and your data rights

#2026-06-10-production-readiness-pass-2

A full-codebase audit followed by remediation across security, AI cost controls, scoring integrity, and data rights. Scores are now deterministic and evidence-verified, and account deletion / data export are self-serve.

Security
Server-side request forgery on website imports fully closed
URL imports now validate every resolved address at connection time (DNS rebinding and redirect tricks no longer work), follow redirects manually with re-validation, and cap response size.
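Below is a minimal TypeScript sketch of the per-address check. It assumes the importer resolves the hostname itself and refuses to connect if any resolved IP is non-public. `isBlockedAddress` and its range list are illustrative assumptions, not the shipped code, and the list is deliberately incomplete. The same check has to run again on every manually followed redirect. The socket should also connect to the exact IP that was validated, so that a second DNS answer cannot swap it out.

```typescript
// Hypothetical SSRF guard: true if a resolved IP literal must not be fetched.
// Covers common IPv4 private/reserved ranges plus IPv6 loopback, unique-local,
// link-local and dotted IPv4-mapped addresses. Not an exhaustive list.

function parseIPv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  // NaN fails both comparisons, so malformed parts reject the whole address.
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

function isBlockedIPv4(octets: readonly number[]): boolean {
  const [a = 0, b = 0] = octets;
  return (
    a === 0 || // "this" network
    a === 10 || // RFC 1918 private
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // carrier-grade NAT
    (a === 169 && b === 254) || // link-local, incl. cloud metadata 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 private
    (a === 192 && b === 168) || // RFC 1918 private
    a >= 224 // multicast and reserved
  );
}

function isBlockedAddress(ip: string): boolean {
  const v4 = parseIPv4(ip);
  if (v4) return isBlockedIPv4(v4);

  const lower = ip.toLowerCase();
  if (!lower.includes(":")) return true; // neither IPv4 nor IPv6: fail closed

  // IPv4-mapped IPv6 such as ::ffff:10.0.0.1 hides an IPv4 address.
  const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/.exec(lower);
  if (mapped) {
    const inner = parseIPv4(mapped[1] ?? "");
    return inner === null || isBlockedIPv4(inner);
  }
  return (
    lower === "::" || // unspecified
    lower === "::1" || // loopback
    /^f[cd]/.test(lower) || // fc00::/7 unique local
    /^fe[89ab]/.test(lower) // fe80::/10 link-local
  );
}
```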
Security
Sign-in endpoints strictly rate limited
Magic-link sends are capped per IP at the NextAuth endpoint itself, closing an email-bombing bypass of the earlier limit.
Security
Upload confirmation can only attach your own files
Storage paths are verified against your own upload prefix and real object sizes are re-checked server-side.
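Here is a sketch of the ownership check. It assumes each user's objects live under a hypothetical `uploads/<userId>/` prefix; the real key layout may differ. A plain `startsWith` is not enough on its own, so the helper also rejects `..` and empty segments.

```typescript
// Hypothetical key layout: every user's objects live under `uploads/<userId>/`.
function isOwnUploadPath(userId: string, storagePath: string): boolean {
  if (!/^[A-Za-z0-9_-]+$/.test(userId)) return false; // refuse unexpected id shapes
  const prefix = `uploads/${userId}/`;
  if (!storagePath.startsWith(prefix)) return false;

  const rest = storagePath.slice(prefix.length);
  const segments = rest.split("/");
  // Reject traversal, empty segments and backslashes so the key cannot
  // point outside the user's prefix once something normalizes it.
  return (
    rest.length > 0 &&
    !rest.includes("\\") &&
    segments.every((s) => s !== "" && s !== "." && s !== "..")
  );
}
```

The size re-check is the other half of the fix. It compares the object's real size, as reported by a HEAD request (`ContentLength` in the S3 API), with the size the client claimed, instead of trusting the request body.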
Reliability
Identical pitches now score identically
All scoring AI calls run at temperature 0. Re-submitting the same deck and video produces the same scores — previously they could drift run to run.
Reliability
Every quoted line of evidence is verified against your transcript
Quotes shown as "From your pitch" (vertebrae evidence, rewrites, component extracts, PL thesis evidence) are checked against what you actually said; anything unverifiable is dropped instead of shown.
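One way such a check can work is sketched below, under stated assumptions. The quote and the transcript are both lower-cased, stripped of punctuation, and whitespace-collapsed. The quote must then appear in the transcript as a contiguous run of whole words. The function names and the three-word minimum are hypothetical, and normalization is ASCII-only for brevity.

```typescript
// Hypothetical normalization: lower-case, unify curly apostrophes, replace
// punctuation with spaces, collapse whitespace. ASCII-only for brevity.
function normalizeForMatch(text: string): string {
  return text
    .toLowerCase()
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[^a-z0-9'\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

const MIN_QUOTE_WORDS = 3; // hypothetical: shorter "quotes" match too easily

function isQuoteInTranscript(quote: string, transcript: string): boolean {
  const q = normalizeForMatch(quote);
  if (q === "" || q.split(" ").length < MIN_QUOTE_WORDS) return false;
  // Pad with spaces so the quote must match whole words, not word fragments.
  return ` ${normalizeForMatch(transcript)} `.includes(` ${q} `);
}

// Drop any evidence item whose quote cannot be found in the transcript.
function keepVerifiedQuotes<T extends { quote: string }>(items: T[], transcript: string): T[] {
  return items.filter((item) => isQuoteInTranscript(item.quote, transcript));
}
```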
Reliability
PL Demo Day scoring is transparent about degradation
If the investment-thesis evaluation can't run, the report now says so explicitly instead of silently scoring without the Web3 Thesis Fit dimension. The dimension score is also derived from its section scores rather than the model's own aggregate.
Reliability
Stuck analyses recover automatically
Double-clicks can no longer start duplicate pipelines, a failed queue send rolls your assessment back so you can retry, and an hourly reaper fails any run stuck over two hours instead of leaving an eternal progress bar.
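The reaper's core is a time-based filter. The sketch below assumes hypothetical status names and an in-memory list. In production this would more likely be a single conditional UPDATE in the database, run by the hourly cron.

```typescript
type RunStatus = "draft" | "queued" | "processing" | "completed" | "failed";

interface AnalysisRun {
  id: string;
  status: RunStatus;
  startedAt: Date;
}

const STUCK_AFTER_MS = 2 * 60 * 60 * 1000; // two hours

// Runs still in flight after the cutoff are treated as stuck. The hourly
// cron would mark each one "failed" so the UI stops showing progress.
function findStuckRuns(runs: AnalysisRun[], now: Date): AnalysisRun[] {
  const cutoff = now.getTime() - STUCK_AFTER_MS;
  return runs.filter(
    (r) => (r.status === "queued" || r.status === "processing") && r.startedAt.getTime() < cutoff,
  );
}
```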
Added
Delete your account — for real
Settings → Account now has permanent account deletion (all files, analyses, and profile data, hard-deleted) and a one-click JSON export of everything we hold about you.
Added
Avatar upload, reduce-motion, and assessment pagination
Profile pictures are live, the reduce-motion preference now actually reduces motion, and long assessment lists paginate.
Fixed
Honest settings and privacy policy
Removed notification toggles that had no sender behind them; the privacy policy now discloses 90-day frame-sample retention and describes deletion accurately. Analytics only load after cookie consent.
Operations
Dependency security updates
Next.js, jspdf, and the AWS SDK updated past their open advisories; per-user daily AI budgets added across all coaching and generation endpoints.

April 28, 2026

Analysis pipeline — robustness hardening

#2026-04-28-robustness-hardening

A three-track follow-up to the truth-in-pipeline initiative. The pipeline went from "honest about what it measures" to "resilient under failure modes that actually happen" — schema-validated AI outputs, cost & abuse guardrails, and idempotent state writes.

Reliability
Zod schemas on every Claude parse boundary
All four analyzers (PL evaluator, visual analyzer, video content, deck) now validate Claude responses against Zod schemas (PlEvaluationSchema, VisualAnalysisSchema, VideoAnalysisV2Schema with .passthrough(), DeckAnalysisV2Schema). The new parseAndValidateClaudeJson helper throws a typed ClaudeOutputValidationError on failure, and withRetry retries it (per Anthropic guidance, Claude shape errors fix themselves on the second attempt roughly 70-80% of the time). Defensive null guards added to calculator-v2.buildWeb3ThesisFitDimension via safeSection() mean that pre-schema DB rows and schema-bypass paths now degrade gracefully instead of crashing the report renderer.
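Here is a dependency-free sketch of how the parse boundary and the retry could fit together. The names come from this entry, but the signatures are assumptions. A plain type guard stands in for the Zod schemas; with Zod, the guard would be a `schema.safeParse(value).success` check. This version of `withRetry` retries only on the typed validation error and gives up after two attempts.

```typescript
class ClaudeOutputValidationError extends Error {
  constructor(message: string, readonly raw: string) {
    super(message);
    this.name = "ClaudeOutputValidationError";
    // Keep instanceof working when compiled to ES5.
    Object.setPrototypeOf(this, ClaudeOutputValidationError.prototype);
  }
}

// Stand-in for a Zod schema: any type guard can play that role in this sketch.
type Guard<T> = (value: unknown) => value is T;

function parseAndValidateClaudeJson<T>(raw: string, isValid: Guard<T>): T {
  // Models often wrap JSON in a json code fence; strip it before parsing.
  const body = raw.replace(/^\s*`{3}(?:json)?\s*/i, "").replace(/\s*`{3}\s*$/, "");
  let parsed: unknown = undefined;
  try {
    parsed = JSON.parse(body);
  } catch {
    throw new ClaudeOutputValidationError("response is not valid JSON", raw);
  }
  if (!isValid(parsed)) {
    throw new ClaudeOutputValidationError("response does not match the schema", raw);
  }
  return parsed;
}

// This sketch retries only validation failures, up to `attempts` tries in total.
async function withRetry<T>(fn: () => Promise<T>, attempts = 2): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof ClaudeOutputValidationError;
      if (!retryable || attempt >= attempts) throw err;
    }
  }
}
```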
Reliability
Whisper transcript quality gate
New validateTranscript helper detects empty results, repeated-phrase hallucinations (Whisper's classic silence-as-"uh uh uh" failure), suspiciously low word density (silent microphone), and language mismatches. Pipeline marks the transcription stage failed on severity=fail so content + PL eval don't get fed nonsense; the analysis_quality.transcript_quality block surfaces in the report so founders see "we couldn't hear your audio" instead of an unexplained low score. 7 new tests cover the heuristic edges.
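The sketch below simplifies three of these heuristics: empty output, a looping trigram, and words-per-minute density. Language detection is omitted. `validateTranscriptSketch` and every threshold are hypothetical stand-ins for the real `validateTranscript`.

```typescript
type Severity = "ok" | "warn" | "fail";

interface TranscriptCheck {
  severity: Severity;
  issues: string[];
}

// Hypothetical thresholds; the production cut-offs may differ.
const MIN_WORDS_PER_MINUTE = 40; // conversational speech is usually far above this
const MAX_REPEAT_SHARE = 0.5; // one trigram filling half the transcript = a loop

function validateTranscriptSketch(text: string, durationSeconds: number): TranscriptCheck {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  if (words.length === 0) return { severity: "fail", issues: ["empty transcript"] };

  const issues: string[] = [];
  let severity: Severity = "ok";

  // Repeated-phrase hallucination (e.g. "uh uh uh ..."): share of the top trigram.
  if (words.length >= 6) {
    const counts = new Map<string, number>();
    for (let i = 0; i + 2 < words.length; i++) {
      const trigram = words.slice(i, i + 3).join(" ");
      counts.set(trigram, (counts.get(trigram) ?? 0) + 1);
    }
    const top = Math.max(...Array.from(counts.values()));
    if (top / (words.length - 2) >= MAX_REPEAT_SHARE) {
      issues.push("repeated-phrase hallucination");
      severity = "fail";
    }
  }

  // Very few words per minute suggests a silent or muted microphone.
  const wpm = words.length / Math.max(durationSeconds / 60, 1 / 60);
  if (wpm < MIN_WORDS_PER_MINUTE) {
    issues.push(`low word density (${Math.round(wpm)} wpm)`);
    if (severity === "ok") severity = "warn";
  }
  return { severity, issues };
}
```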
Security
Per-user daily analysis quota
Configurable via MAX_ANALYSES_PER_USER_PER_DAY env (default 20). Counts non-draft assessments started in the last 24h. Returns 429 daily_quota_exceeded with used/limit numbers in the response. Bounds cost-burn from a compromised user account regardless of IP rotation — the per-IP rate limit (10/min) was insufficient on its own to prevent a determined attacker from running ~600 pitches/hour at $90-120/hr in API cost.
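Here is a sketch of the quota decision. It assumes the route has already loaded the start times of the user's non-draft assessments. The function names are hypothetical. When `allowed` is false, the route would respond with 429 and a body such as `{ error: "daily_quota_exceeded", used, limit }`.

```typescript
interface QuotaResult {
  allowed: boolean;
  used: number;
  limit: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Count analyses started in the trailing 24 hours and compare to the limit.
function checkDailyQuota(startedAt: Date[], now: Date, limit = 20): QuotaResult {
  const since = now.getTime() - DAY_MS;
  const used = startedAt.filter((t) => t.getTime() > since).length;
  return { allowed: used < limit, used, limit };
}

// Parse MAX_ANALYSES_PER_USER_PER_DAY, falling back to 20 on missing or bad input.
function readLimitFromEnv(raw: string | undefined, fallback = 20): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}
```

For scale, 10 requests/min per IP is 10 × 60 = 600 runs per hour. The $90-120/hr figure above corresponds to roughly $0.15-0.20 per run.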
Reliability
Inngest idempotency on the analysis run
Analyze route now sends events with id: analyze:{assessmentId}:{analysis_started_at}. Inngest dedupes within its 24h window — a network blip, double-click, or HTTP-layer retry can no longer trigger duplicate $0.20 pipelines. Existing completion + failure events already had idempotency keys; now the inbound trigger does too.
Observability
Token + cost tracking per assessment
Migration 015 adds assessments.token_usage jsonb + cost_cents numeric(10,2). New CostCollector class threads through every analyzer via an optional recordUsage callback; pipeline aggregates and persists at save. Sonnet pricing table ($3 input / $15 output per MTok) lets us compute per-assessment cost without hitting Anthropic's API. Composite index on (user_id, analysis_started_at) enables fast cost-per-user dashboards. Whisper excluded — it bills per audio-minute, not per token.
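The sketch below shows the accumulation and the cost arithmetic. `CostCollectorSketch` is hypothetical, and its API may not match the real `CostCollector`. The prices are the Sonnet rates quoted above. For example, 30,000 input and 3,000 output tokens cost (30,000 × $3 + 3,000 × $15) / 1,000,000 = $0.135, or 13.5 cents.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Sonnet list prices quoted in this entry, in USD per million tokens.
const USD_PER_MTOK = { input: 3, output: 15 } as const;

// Hypothetical collector; analyzers receive `record` as their recordUsage callback.
class CostCollectorSketch {
  private readonly byStage = new Map<string, Usage>();

  record = (stage: string, usage: Usage): void => {
    const prev = this.byStage.get(stage) ?? { inputTokens: 0, outputTokens: 0 };
    this.byStage.set(stage, {
      inputTokens: prev.inputTokens + usage.inputTokens,
      outputTokens: prev.outputTokens + usage.outputTokens,
    });
  };

  totalCents(): number {
    let usd = 0;
    this.byStage.forEach((u) => {
      usd += (u.inputTokens * USD_PER_MTOK.input + u.outputTokens * USD_PER_MTOK.output) / 1_000_000;
    });
    // Cents rounded to two decimals, matching a numeric(10,2) column.
    return Math.round(usd * 10_000) / 100;
  }
}
```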
Operations
R2 frame-samples lifecycle cron
New Inngest cron cleanupFrameSamples runs every Sunday at 04:23 UTC with 90-day retention. It lists R2 objects under frame-samples/ via paginated ListObjectsV2, batch-deletes them via DeleteObjectsCommand (up to 1000 keys per request), and logs scanned/deleted/bytesFreed for visibility. Without this, ~2 MB/pitch in extracted PNG frames would accumulate forever.
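The selection and batching logic can be sketched without the AWS SDK. `planFrameDeletes` is a hypothetical helper that works on one page of listing results. In the real job, each batch would become one `DeleteObjectsCommand` with `Delete: { Objects: keys.map((Key) => ({ Key })) }`. Pagination would follow `NextContinuationToken` until `IsTruncated` is false.

```typescript
interface ListedObject {
  key: string;
  lastModified: Date;
  size: number;
}

const RETENTION_DAYS = 90;
const MAX_KEYS_PER_DELETE = 1000; // S3 DeleteObjects accepts at most 1000 keys per request

// Hypothetical core of the cleanup job: pick expired frame samples from one
// listing page, split them into delete batches, and total the bytes freed.
function planFrameDeletes(objects: ListedObject[], now: Date) {
  const cutoff = now.getTime() - RETENTION_DAYS * 24 * 60 * 60 * 1000;
  const expired = objects.filter(
    (o) => o.key.startsWith("frame-samples/") && o.lastModified.getTime() < cutoff,
  );
  const batches: string[][] = [];
  for (let i = 0; i < expired.length; i += MAX_KEYS_PER_DELETE) {
    batches.push(expired.slice(i, i + MAX_KEYS_PER_DELETE).map((o) => o.key));
  }
  const bytesFreed = expired.reduce((sum, o) => sum + o.size, 0);
  return { scanned: objects.length, batches, bytesFreed };
}
```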
Reliability
Re-analysis idempotency (DELETE-before-INSERT)
Pipeline now wipes existing assessment_scores + findings rows for the assessment before inserting the new run's results. This closes a duplicate-row accumulation gap exposed by the audit: any path that re-triggered the pipeline (Inngest retry, admin reset, future re-analyze feature) would otherwise fan out V1+V2 score rows alongside each other, corrupting report queries. assessment_history is intentionally preserved, because each run is a meaningful iteration. The logic is extracted into a testable helper, clearAssessmentDerivedRows, and 4 regression tests lock in the pattern.
Reliability
Orphan frame cleanup on visual-stage failure
When analyzeVisualDelivery fails after frame upload, the inline catch now best-effort-deletes the R2 frames + clears the uploads.video_frame_samples manifest before returning. Without this, frames squat in R2 for up to 90 days waiting for the cleanup cron, accumulating storage cost on every failed run.
Security
Soft-delete enforcement on iteration history
assessment-history.ts:getAssessmentIterations traversed parent_assessment_id chains without filtering deleted_at — a soft-deleted assessment could surface its descendants (and vice versa) in the iteration timeline. Both the entry-point query and the parent-walk query now filter deleted_at IS NULL. Other suspect routes (compare, investor-view, shared/[token]) audited and confirmed already protected.
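The sketch below walks the parent chain in memory and applies the soft-delete filter at every hop. The types and names are hypothetical; the real function queries the database. A soft-deleted row ends the walk. The `seen` set guards against accidental cycles in `parent_assessment_id`.

```typescript
interface AssessmentRow {
  id: string;
  parentAssessmentId: string | null;
  deletedAt: Date | null;
}

// Both the entry lookup and every parent hop skip soft-deleted rows,
// mirroring a `deleted_at IS NULL` filter on each query.
function getIterationChain(rows: Map<string, AssessmentRow>, startId: string): AssessmentRow[] {
  const chain: AssessmentRow[] = [];
  const seen = new Set<string>();
  let current = rows.get(startId);
  while (current && current.deletedAt === null && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current);
    current = current.parentAssessmentId ? rows.get(current.parentAssessmentId) : undefined;
  }
  return chain.reverse(); // oldest iteration first
}
```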
Observability
Pipeline stage write failure visibility
Fire-and-forget pipeline_stage writes used to swallow errors silently, leaving the UI showing stale progress with no operator visibility. Stage write failures now call Sentry.captureMessage with a pipeline-stage tag, so a stuck progress bar shows up in observability instead of going unnoticed.
Quality
Test suite expanded from 62 to 98 tests
16 schema validation tests, 7 Whisper transcript-quality tests, 8 cost tracker tests, 4 idempotency regression tests, 1 calculator defensive-null test. Coverage targets the new failure modes the audit found, so a future regression that re-introduces the silently-discarded-error pattern will fail noisily in CI.

April 28, 2026

Analysis pipeline — truth-in-pipeline initiative

#2026-04-28-truth-in-pipeline

A five-phase audit and remediation of the analysis pipeline. Closes the gap between what the platform claims to measure and what the code actually produces, then deepens PL Demo Day evaluation with a dedicated investment-thesis dimension.

Added
Real video frame analysis (Phase B)
FFmpeg-driven frame sampler extracts up to 8 PNG frames per pitch (opening, closing, plus the start of each transcript segment >15s). All frames are stored under frame-samples/{assessmentId}/ in R2 and analyzed by Claude 3.5 Sonnet vision in a single call. Replaces the prior behavior where eye_contact_notes, energy_level, framing, lighting, background, and attire were all fabricated by Claude from transcript text. Wrapped behind ENABLE_VIDEO_FRAME_ANALYSIS env flag (default true in prod, false in dev/CI). Adds ~$0.04 per pitch and 3-8s of latency.
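Here is a sketch of the timestamp selection described above. Several details are assumptions, not the shipped sampler: the helper name, the 1-second offsets that avoid black first and last frames, and keeping the earliest segments when more than six qualify.

```typescript
interface Segment {
  start: number; // seconds
  end: number; // seconds
}

const MAX_FRAMES = 8;
const LONG_SEGMENT_S = 15;

// Opening and closing frames, plus the start of each segment longer than
// 15 s, capped at 8 frames in total and returned in chronological order.
function pickFrameTimestamps(durationS: number, segments: Segment[]): number[] {
  const opening = Math.min(1, durationS / 2);
  const closing = Math.max(durationS - 1, opening);
  const middle = segments
    .filter((s) => s.end - s.start > LONG_SEGMENT_S)
    .map((s) => s.start)
    .filter((t) => t > opening && t < closing)
    .slice(0, MAX_FRAMES - 2);
  return Array.from(new Set([opening, ...middle, closing])).sort((a, b) => a - b);
}
```

Each timestamp could then be extracted with an FFmpeg call such as `ffmpeg -ss <t> -i input.mp4 -frames:v 1 frame.png`. Placing `-ss` before `-i` makes the seek fast.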
Added
PL Demo Day investment-thesis evaluation (Phase C)
New 7th scoring dimension — web3_thesis_fit (15%) — added when targetEvent=pl_demo_day, with the existing 6 dimensions scaled down proportionally so the total stays 100%. PL evaluation runs a dedicated Claude call covering web3 necessity, tokenomics, protocol-vs-application classification, community traction, composability, capital efficiency, and founder track record. Sourced from the PL_Pitch_Mastery.md Graph Paper Capital workshop notes and the PL Demo Day Playbook (Dec 2025). General mode keeps the legacy 6-dimension shape — no web3-specific scoring.
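The proportional rescaling is simple arithmetic. Each existing weight w becomes 0.85 × w, so the original six dimensions shrink to 85% in total, and 85% + 15% = 100%. The sketch below uses placeholder dimension names and weights; the real values live in src/lib/scoring/weights-v2.ts.

```typescript
type Weights = Record<string, number>;

// Scale the existing dimensions down proportionally and add the thesis dimension.
function addThesisDimension(base: Weights, thesisWeight = 0.15): Weights {
  const scale = 1 - thesisWeight; // existing dimensions keep 85% of their share
  const out: Weights = {};
  for (const [name, weight] of Object.entries(base)) out[name] = weight * scale;
  out["web3_thesis_fit"] = thesisWeight;
  return out;
}
```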
Added
Server-side metadata extraction (Phase A)
PDFs run through pdf-lib at upload-confirm time to extract authoritative page_count + first-page dimensions. Videos run through ffprobe (via @ffprobe-installer/ffprobe) over a presigned URL — fast, no full-buffer download. Replaces the prior behavior where the client could under-report page_count (dodging the PL single-page critical-error cap) or claim a 4-minute video was 3 minutes. Falls back to client-provided values on extraction failure.
Added
Materialized benchmark distributions (Phase E)
New benchmark_distributions table seeded with synthetic baselines (sample_size=0, is_synthetic=true). The /api/benchmarks route now serves materialized rows by mode (PL Demo Day vs general) instead of recomputing live aggregates per request. Inngest cron benchmarks/refresh runs nightly at 03:17 UTC and flips a bucket to is_synthetic=false once it has 30+ real completed assessments. Report UI now discloses 'Based on N=X completed PL Demo Day assessments' or 'Estimated baseline' so percentile rankings are never silently presented as data-backed when they aren't.
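Here is a sketch of the flip rule and the disclosure copy. The types and function names are hypothetical; the 30-sample threshold and the label wording come from this entry.

```typescript
interface BenchmarkBucket {
  mode: "pl_demo_day" | "general";
  sampleSize: number; // real completed assessments behind the bucket
  isSynthetic: boolean;
}

const MIN_REAL_SAMPLES = 30;

// Nightly refresh step: switch a bucket to real data once it has 30+ samples.
function refreshBucket(bucket: BenchmarkBucket, realCompleted: number): BenchmarkBucket {
  return realCompleted >= MIN_REAL_SAMPLES
    ? { ...bucket, sampleSize: realCompleted, isSynthetic: false }
    : bucket;
}

// Never present a synthetic baseline as data-backed.
function disclosureLabel(b: BenchmarkBucket): string {
  if (b.isSynthetic) return "Estimated baseline";
  const who = b.mode === "pl_demo_day" ? "PL Demo Day" : "general";
  return `Based on N=${b.sampleSize} completed ${who} assessments`;
}
```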
Fixed
Fabricated visual delivery removed
Before this release, Claude was prompted to assess eye_contact_notes, posture, attire, and energy_level from the transcript alone — observations it has no way to verify. The unified-assessment.ts content prompt now either receives a VisualDeliveryContext block from the visual analyzer (real frame observations) or an explicit instruction to leave eye_contact_notes / energy_level null. No more invented visual delivery.
Fixed
PDF page_count silently defaulting to 1
Pipeline previously read deckUpload.page_count || 1, which meant every multi-page deck was treated as single-page — silently dodging the PL Demo Day critical-error cap (deck_multiple_pages, score capped at 5). Page count is now extracted server-side at upload-confirm and the cap fires correctly.
Fixed
Misnamed prompt files renamed (Phase C.1)
prompts/yc-criteria.ts → demo-day-best-practices.ts (it never contained YC-specific evaluation — only generic Demo Day craft). prompts/pl-criteria.ts → pl-technical-criteria.ts (technical specs only — content evaluation now lives in the new pl-evaluation-criteria.ts module). The Vertebrae four-question framework is now properly attributed to Geoff Ralston (Y Combinator) in code comments, the analysis prompt, and the report UI.
Quality
V1 dead code deleted
Removed runAnalysisPipeline (V1, ~320 lines), src/lib/scoring/calculator.ts (V1 calculator, ~620 lines), and the V1 analyzeVideoContent / analyzeDeck functions. All production callers (the analyze route and the Inngest worker) were already on V2, so the deletion reduces audit surface without changing behavior.
Quality
Calibration regression tests
Two new Vitest tests in calculator-v2.test.ts lock in scoring on representative pitch fixtures. 'Strong-but-not-perfect PL pitch' must land in [7.5, 9.5] with every dimension ≥6. 'Weak pitch with 2 critical errors' must cap at ≤5 with errors_detected.length ≥ 2. These guard against silent score drift when prompts or weights change.
Operations
Marketing copy aligned to code (Phase D)
/about page: replaced fabricated 30/25/25/20 evaluation buckets with the real 6 dimensions and weights from src/lib/scoring/weights-v2.ts. Replaced unsupported '500+ Pitches Analyzed' / '85% Avg. Score Improvement' with capability stats (6+ Scoring Dimensions, 8 Frames Sampled per Video). 'Official PL Demo Day requirements' → 'PL Demo Day technical and content requirements (per PL Network's published Demo Day Playbook)'. Privacy policy now discloses extracted video frame storage. Landing animated-sections + /assessments/new mode picker updated to mention real visual analysis and the Web3 Thesis Fit dimension.
Observability
Visual + PL evaluation Sentry breadcrumbs
visual-analyzer.ts and pl-evaluator.ts add breadcrumbs in the analysis category, with the messages visual-analysis-complete and pl-evaluation-complete. Filter on these in Sentry to monitor cost and failure rates without mixing them into general analysis traces.

April 28, 2026

Deploy stabilization

#2026-04-28-deploy-stabilization

Hotfixes resolving a chain of Vercel build failures uncovered after our first production push. End-to-end verified live: edge rate limiter active, healthcheck returning 200, landing p95 ~740ms.

Fixed
Vercel deploy chain unblock
Three coordinated fixes: bumped inngest ^3.49.1 → ^3.54.0 to satisfy Vercel's marketplace CVE check; pinned @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner to exact 3.967.0 because newer versions transitively pulled ESM-only @nodable/entities@2.1.0 which broke Next.js page-data collection with ERR_REQUIRE_ESM; updated vercel.json installCommand to npm ci --legacy-peer-deps to bypass an irrelevant @sveltejs/kit peerOptional conflict.
Fixed
E2E test fixes from first CI run
MVP banner test now matches the actual copy. Get-Started + About navigation tests use page.waitForURL with a longer timeout to handle dev-mode first-compile latency. Playwright output directories added to .gitignore.

April 27, 2026

Production-hardening sprint

#2026-04-27

A multi-phase production-readiness arc covering security, reliability, observability, quality, performance, and operational readiness. The platform graduated from MVP-banner-only to defensibly production-ready.

Security
OAuth account-linking lockdown
Removed allowDangerousEmailAccountLinking: true from Google + GitHub providers. Eliminates a silent-account-merge vector via OAuth same-email.
Security
AI provider opt-out signaling
Added X-Pitch-Ready-Origin: commercial-api headers to all OpenAI / Anthropic SDK clients across the analysis pipeline and chat / script / roadmap / investor-view routes. Documents the commercial-API stance explicitly for auditors and abuse monitoring.
Security
Explicit S3 server-side encryption
PutObjectCommand now sets ServerSideEncryption: AES256 with a matching x-amz-server-side-encryption header on the upload client so the SigV4 signature validates. Defense in depth on top of R2's default at-rest encryption.
Security
Privacy policy data-retention disclosure
Added an explicit "no automatic deletion" section. Encryption claim clarified to AES-256. Last-updated date refreshed.
Security
Vulnerability disclosure policy
SECURITY.md added with disclosure email, safe-harbor language, and scope guidelines.
Reliability
Distributed rate limiter (Upstash)
Replaced the in-memory Map with @upstash/ratelimit sliding-window backed by Vercel KV. Sliding window prevents fixed-window boundary bursts. Fail-open with Sentry warning so a Redis hiccup doesn't take down the API. ephemeralCache for in-process repeat-rejection speedup.
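The in-memory sketch below shows why a sliding window resists boundary bursts. It uses the common two-window approximation: the previous fixed window's count is weighted by how much of that window still overlaps the sliding window. This is an illustration, not Upstash's code. It has no Redis backing, and the class name is hypothetical.

```typescript
// In-memory sliding-window approximation: estimate = previous window's count
// weighted by its remaining overlap, plus the current window's count.
class SlidingWindowSketch {
  private counts = new Map<string, { window: number; current: number; previous: number }>();

  constructor(private readonly limit: number, private readonly windowMs: number) {}

  allow(key: string, nowMs: number): boolean {
    const window = Math.floor(nowMs / this.windowMs);
    let entry = this.counts.get(key);
    if (!entry || entry.window < window - 1) {
      entry = { window, current: 0, previous: 0 }; // no recent history
    } else if (entry.window === window - 1) {
      entry = { window, current: 0, previous: entry.current }; // roll forward one window
    }
    this.counts.set(key, entry);

    const elapsed = (nowMs % this.windowMs) / this.windowMs; // fraction of current window used
    const estimate = entry.previous * (1 - elapsed) + entry.current;
    if (estimate >= this.limit) return false;
    entry.current += 1;
    return true;
  }
}
```

Take a limit of 10/min, with ten requests at 0:59 followed by a burst at 1:01. The sliding window lets only one more request through. A fixed one-minute window would admit ten more, i.e. 20 requests within two seconds.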
Reliability
Edge rate limiting in middleware
120/min per-identity ceiling on /api/* (excluding /api/auth and /api/inngest). Identity hierarchy: authenticated userId > IP. Rejections happen at the edge before reaching the route function — saves invocation cost during abuse.
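Here is a sketch of the key selection, with hypothetical `user:` / `ip:` key prefixes. In Next.js middleware, the pathname would come from `request.nextUrl.pathname`. How the user ID and client IP are obtained at the edge is left as an assumption.

```typescript
const EXCLUDED_PREFIXES = ["/api/auth", "/api/inngest"];

// Returns the identity to rate-limit on, or null when the path is not limited.
function edgeRateLimitKey(pathname: string, userId: string | null, ip: string | null): string | null {
  if (!pathname.startsWith("/api/")) return null;
  // Match whole path segments so "/api/authz" is still limited.
  if (EXCLUDED_PREFIXES.some((p) => pathname === p || pathname.startsWith(p + "/"))) return null;
  if (userId) return `user:${userId}`; // authenticated identity wins over IP
  return `ip:${ip ?? "unknown"}`;
}
```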
Reliability
Mock-fallback fail-fast
/api/assessments/[id]/analyze now returns HTTP 503 in production when AI keys are missing instead of silently mocking. Mock requires an explicit MOCK_ANALYSIS=true opt-in (dev only). Boot fails loudly on Vercel if OpenAI / Anthropic / KV env vars are missing.
Reliability
Sentry tag consistency
Every /api/* catch block now calls Sentry.captureException with a route tag. Filterable by route in Sentry Issues for faster triage.
Observability
Structured logging foundation
Adopted pino. Production JSON output, dev-mode pretty-print. Service tag on every line. Redaction for authorization / cookie / password / token paths. Migrated all server console calls in the analysis pipeline, Inngest functions, and 18+ API routes to structured logger calls — Vercel logs are now filterable by route, userId, assessmentId, and stage.
Observability
Healthcheck endpoint
GET /api/health pings Supabase + Upstash in parallel. Returns 200 ok or 503 degraded with a sanitized checks object. No auth — designed for external uptime monitors. Cache-Control: no-store. Includes the deploy SHA for correlation.
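Here is a sketch of the parallel probe logic. It assumes each probe is a function that resolves on success and rejects on failure; all names are hypothetical. Only `ok`/`error` leaves the function, so internal error messages are never exposed to the unauthenticated caller. A production version would also put a timeout on each probe and send `Cache-Control: no-store` on the response.

```typescript
type CheckResult = "ok" | "error";

interface HealthResponse {
  status: 200 | 503;
  body: { status: "ok" | "degraded"; checks: Record<string, CheckResult>; sha: string };
}

// Run every probe in parallel and reduce each outcome to a sanitized flag.
async function runHealthcheck(
  probes: Record<string, () => Promise<unknown>>,
  sha: string,
): Promise<HealthResponse> {
  const results = await Promise.all(
    Object.entries(probes).map(async ([name, probe]): Promise<[string, CheckResult]> => {
      try {
        await probe();
        return [name, "ok"];
      } catch {
        return [name, "error"];
      }
    }),
  );
  const checks: Record<string, CheckResult> = {};
  for (const [name, result] of results) checks[name] = result;
  const healthy = results.every(([, result]) => result === "ok");
  return {
    status: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "degraded", checks, sha },
  };
}
```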
Observability
SLO targets documented
docs/SLOs.md: latency p50/p95/p99 per endpoint, availability targets per surface (99.9% landing, 99.5% upload, 99.0% AI-dependent), error rate (<1% 5xx), error budget arithmetic, breach-response order of operations.
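The sketch below shows the error-budget arithmetic, assuming a 30-day window; docs/SLOs.md may use a different one. At 99.9%, the budget is 43.2 minutes of downtime. At 99.5% it is 216 minutes (3.6 hours), and at 99.0% it is 432 minutes (7.2 hours).

```typescript
// Minutes of allowed unavailability for an availability target over a window.
// The 30-day default is an assumption, not necessarily what docs/SLOs.md uses.
function errorBudgetMinutes(availability: number, windowDays = 30): number {
  if (availability <= 0 || availability > 1) throw new RangeError("availability must be in (0, 1]");
  return (1 - availability) * windowDays * 24 * 60;
}

for (const target of [0.999, 0.995, 0.99]) {
  console.log(`${(target * 100).toFixed(1)}% -> ${errorBudgetMinutes(target).toFixed(1)} min per 30 days`);
}
```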
Observability
k6 load testing
Three scenarios (landing burst, healthcheck flood, rate-limit verification). Weekly Monday cron via GitHub Actions runs against production with 10% smoke load. Manual dispatch supports full load against preview deploys. Thresholds map directly to SLO numbers.
Observability
Operational runbook
docs/RUNBOOK.md: deploy rollback, rate-limiter KV flush, Inngest stuck-analysis re-trigger, OAuth-binding reset for lost-email-access users, incident-response-by-symptom matrix, quarterly Supabase backup-recovery drill (<30 min RTO target), Sentry/Vercel alert wiring.
Added
Email assessment notifications
Inngest events fire on analysis completion (gated on user opt-in) and failure (always sends — failure isn't promotional content). Inline-styled HTML + plain text email templates. Idempotency keys on every event prevent duplicate emails on retry.
Added
DB-persisted user preferences
Replaced localStorage notification settings with profiles table columns. Preferences now sync across devices. Migration 011 added email_on_complete, email_weekly_digest, email_product_updates, browser_notifications, and reduce_motion (the last one provisioned for an upcoming accessibility feature).
Quality
Vitest unit test suite
47 tests across 5 lib files (rate-limit, helpers, env, retry, scoring calculator). 78% line coverage on the tested surface with thresholds enforced in CI. Vitest 3 + jsdom + @vitest/coverage-v8.
Quality
Multi-browser end-to-end testing
Playwright now runs across chromium, firefox, and webkit in CI as a parallel matrix. @axe-core/playwright scans landing / login / privacy for serious or critical WCAG 2.1 AA violations.
Quality
CI/CD pipeline
GitHub Actions workflow with four jobs (lint+typecheck, unit tests, build, e2e × 3 browsers). Dependabot configured for weekly grouped npm updates and monthly Actions updates. Vercel deploy config explicit in vercel.json.
Quality
All ESLint errors resolved
Fixed three errors including a latent stale-closure bug in the assessment creation flow. MVP banner refactored to useSyncExternalStore for SSR-safe + cross-tab sync. Final state: 0 errors, 37 unused-import warnings (safe to leave).
Performance
Bundle hygiene
Removed unused html2canvas direct dep (PDF export uses jspdf programmatically). Added framer-motion to optimizePackageImports for tree-shaken motion components.
Performance
Image optimization config
next/image remotePatterns allowlist for YouTube thumbnails, Supabase storage, and Cloudflare R2. Prevents the image-optimization endpoint from being abused as an open proxy.
SEO
JSON-LD structured data
Root layout embeds Organization + WebApplication schema in a @graph. Helps search engines render rich snippets for the platform.
Operations
Authentication & account recovery documented
README now describes the passwordless-only auth stance and links to the manual recovery procedure in RUNBOOK.md for users who lose email access.

Earlier

Pre-2026-04-27 platform iterations

#earlier

Summarized from commit history. Detail is high-level — these features predate the changelog.

Added
Pitch analysis pipeline
File upload via presigned R2 URLs; Whisper transcription with FFmpeg fallback for >25MB videos; Claude V2 deck-vision + content analysis; 6-dimension scoring calculator with vertebrae framework; iteration tracking via parent_assessment_id.
Added
Reports + exports
Score gauge, breakdown components, vertebrae checklist, score simulator, roadmap-to-ready, script generator, archetype coaching, benchmark display. Markdown + PDF export.
Added
Claude chat assistant
Three modes — assistant, coach, investor — with company-profile-aware prompts. Sonnet for everyday assistance, Opus for deeper coaching and investor simulation.
Added
Assessment management + comparison
CRUD with soft-delete. Side-by-side comparison view showing dimension deltas, finding diffs, and vertebrae status changes between iterations.
Added
Investor view + benchmarking
Claude-generated FOMO indicators, red flags, predicted investor questions, and verdict. Rule-based fallback when generation fails. Aggregate score benchmarking endpoint.
Added
Public sharing
Token-based assessment shares with 30-day expiry. Read-only /shared/[token] view for sending reports to advisors or investors.
Added
Authentication
NextAuth.js v4 with magic-link (Resend SMTP) plus Google + GitHub OAuth. Supabase adapter. 30-day JWT session.
Added
Demo mode
DEMO_MODE=true env flag enables a credentials provider for local testing without infrastructure. Mocked assessment data for UI development.
Added
Settings, About, Privacy, Terms pages
Full account settings panel (profile, notifications, appearance, accessibility). Static legal + about pages.
Added
Light/dark theme
next-themes integration. Defaults to light mode. Landing page uses an amber accent scope that doesn't bleed into other pages.
