Phase 0 · Foundation Assessment — BEA-11 LLM-Optimized Product Copy (AEO)

This Foundation Assessment Document is the permanent base layer for all downstream work. It is established before analysis, and every later phase builds on its criteria, constraints, and assumptions. Skeptical lens: AEO is a known magnet for rebranded-SEO theater, so the foundation is written to keep the single load-bearing question ("Is LLM-mediated discovery a material purchase channel for Beats earbuds at all?") visible and gateable from the start rather than buried.

1. Success criteria (measurable, time-bound)

[ ] Channel-materiality entry gate answered (precondition for everything else). Within 90 days of project start, a ~$15–25k channel-sizing study returns a defensible point estimate of the share of Beats purchase consideration that touches an LLM engine (ChatGPT / Perplexity / Gemini). Pass threshold: ≥2% of measured consideration touches an LLM, or a credible trend projects that share to cross 2% within 18 months. If the study returns below threshold, the project stops after the study. Metric: measured/estimated LLM-touched consideration %; timeframe: 90 days.
[ ] A clean, statistically defensible share-of-voice (SoV) baseline exists. Architecture A (Measurement-First Observatory) produces a stable 60–90-day SoV baseline for ≥50 priority queries across ≥2 engines. Its parser is validated at ≥0.90 F1 for brand presence, with F1 for competitor displacement reported separately, and SoV is reported with a confidence interval computed only after a variance-decomposition step (within-session noise separated from between-session drift). Metric: F1 scores + CI width vs. detectable-effect threshold; timeframe: 150 days from start (baseline work begins after the channel gate passes).
[ ] The instrument changes copy, not just dashboards (anti-theater criterion). Within 180 days of the SoV baseline going live, at least one documented copy/schema change has been made on a Beats-controlled surface in direct response to a "we lost this query to competitor X" backlog item. This runs under a pre-committed SLA ("lost query → ticket within 10 business days") owned by a named copy owner. Metric: count of backlog items resolved via an actual copy change (minimum ≥1, target ≥5 in the first 180 days); timeframe: 180 days.
[ ] Causal-lift go/no-go decision for Architecture B is made on evidence, not faith. Before any catalog-wide authoring commitment, a controlled before/after test on one pilot product family compares the probe-measured mention-rate change with its confidence interval: a change clearly outside the CI counts as lift, a change inside it counts as a null, and either outcome produces an explicit scale / don't-scale decision. Metric: pilot lift Δ vs. CI; timeframe: within one full engine-refresh cycle after baseline (≈9–12 months from start).
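Because the four success criteria are pass/fail, they can be written down as explicit checks before any data arrives. The sketch below is illustrative, not a committed implementation: the linear monthly trend used to judge "crossing 2% within 18 months", treating each probe session as one independent observation, the normal-approximation (z = 1.96) interval, and a weekend-only business-day calendar with no holidays are all assumptions, and every function name is hypothetical.

```python
import math
from datetime import date, timedelta
from statistics import mean, variance

# Criterion 1: channel-materiality entry gate.
PASS_THRESHOLD = 0.02  # >=2% of measured consideration touches an LLM
HORIZON_MONTHS = 18    # ...or a credible trend crossing 2% within 18 months


def channel_gate(point: float, monthly_slope: float, trend_credible: bool) -> bool:
    """True means proceed past the study; monthly_slope is a hypothetical linear-trend input."""
    if point >= PASS_THRESHOLD:
        return True
    projected = point + monthly_slope * HORIZON_MONTHS
    return trend_credible and projected >= PASS_THRESHOLD


# Criterion 2: parser validation. Brand presence is gated; displacement F1 is only reported.
BRAND_F1_MIN = 0.90


def f1(gold: list[bool], pred: list[bool]) -> float:
    """F1 = 2TP / (2TP + FP + FN) over paired gold/predicted labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def parser_passes(brand_f1: float) -> bool:
    return brand_f1 >= BRAND_F1_MIN


# Criterion 3: "lost query -> ticket within 10 business days" (weekends only, no holidays).
def sla_deadline(lost_on: date, business_days: int = 10) -> date:
    day, remaining = lost_on, business_days
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return day


# Criterion 4: pilot lift vs. CI, with per-session mention rates as the independent unit.
def pilot_decision(pre: list[float], post: list[float],
                   z: float = 1.96) -> tuple[float, float, float, str]:
    delta = mean(post) - mean(pre)
    se = math.sqrt(variance(pre) / len(pre) + variance(post) / len(post))
    low, high = delta - z * se, delta + z * se
    # Scale only when the whole interval sits above zero; a null or negative result does not.
    return delta, low, high, ("scale" if low > 0 else "don't scale")
```

Keeping the thresholds as named constants (2%, 18 months, 0.90 F1, 10 business days) keeps them pre-committed: moving a gate becomes a visible edit rather than a reinterpretation after the data comes in.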

2. Decision-maker profile (Three Ledgers)

Public ledger (what we pitch): "Beats has no visibility into how often AI engines recommend us versus Bose/Sony/Sonos when consumers ask for earbuds. We will build a measurement instrument to see it, then optimize our product copy in a structured, extractable way to win more of those AI recommendations — a centralized, AEO-scored copy library."
Shadow ledger (what the decision-maker actually optimizes for): Being demonstrably ahead of the "AI changes search" curve without an embarrassing, expensive, or brand-risky misstep on a flagship Apple property. The owner is optimizing for defensible, low-blast-radius progress they can show leadership — a credible early-mover story that survives a CFO's "what did this do to revenue?" and Apple comms' "could this become a story?" The true optimization is reputational and political safety plus a cheap option on a possibly-real channel — not maximum AEO lift.
True ledger (what will actually get built): Architecture A in a compliant, sampled (not daily-exhaustive), diagnostic form: a measurement observatory framed as buying information to de-risk a bigger decision, gated behind the channel-materiality study. B is built only if the channel gate passes and A proves causal lift on a pilot, and even then it ships first as a non-comparative, own-product-spec library. C is not built as a program; at most a cheap "canary-attribution-on-owned-surfaces" kernel survives, and only if Apple corporate comms holds a standing veto over it. Net: what gets built first is a ~$120–200k diagnostic, not a ~$565–750k copy program.
Owner / sponsor: Christina (owner per the project manifest). Ownership currently rests with a single analyst, a known bus-factor risk (pre-mortem #9); the foundation assumes a documented runbook and a cross-trained backup exist before A's value loop is relied upon.

3. Constraint inventory

Constraint	Type (hard / soft / assumed)	Notes
LLM vendor API ToS prohibit/throttle automated competitive benchmarking; compliant path is a sampled, attributed enterprise agreement	hard	Brandt (Tier 2), who has written such policies: daily-exhaustive probing (300–500 queries × 3 engines × N≥5 repeats) is squarely in abuse-detection territory, and enterprise terms cost ~3–5x the estimate. Forces a sampled, non-daily cadence, which reshapes A's cost and resolution.
Apple / Beats brand & marketing governance over live flagship product copy	hard	Centralized, conservative. No external/centralized copy library may feed live PDPs without a written, committed publish path (pre-mortem #12). Governance approval is a hard milestone, not a line item.
Apple legal posture on comparative advertising ("X% better than Sony")	hard	Rivera (Tier 1): comparative-claim allowlist is a 6–9-month process per claim family and resets when a comparator's product changes. Forces non-comparative own-product framing as B's base case.
Apple corporate-comms reputational exposure ("Apple is gaming AI" risk)	hard	Brandt + Okafor: covert tracers, shadow pages, and synchronized identical-phrasing syndication are corporate-comms liabilities that dwarf any earbud SoV gain; a comms veto is required before any C-style mechanism.
Engine non-determinism and non-stationarity (per-session, intra-day drift; retrieval/personalization injection)	hard (physical property of the surface)	Natarajan + Li: N repeats are not i.i.d.; CIs built on i.i.d. assumptions are too narrow → false trends. Requires variance-decomposition before any "mention rate moved" claim.
API surface ≠ consumer app surface (retrieval, ads, personalization, memory)	hard	Foundational systemic risk across all phases; the consumer truth lives in apps. Requires a periodic manual app-capture calibration set; API number must never be presented as consumer truth.
Beats PDP/CMS structured-data eligibility & crawlability (Apple-governed, likely closed/server-rendered)	assumed	Feld (Tier 1): integration + schema eligibility is the real cost ($320k) and the go/no-go for B; the probability of clean write access + eligibility on a flagship Apple surface in Y1 is likely <50%. Treated as assumed until a week-1 feasibility spike resolves it.
Budget for a diagnostic-first spend (~$120–200k) vs. full program (~$565–750k)	soft	Whitfield (Tier 3): the full program is "dead on arrival" with a CFO; a cheap diagnostic framed as de-risking research is fundable. The ask must be staged.
Headcount: single analyst-owner (Christina), no dedicated ML team	soft	A is feasible at 1.0–1.5 eng + 0.3 analyst FTE; B needs CMS integration engineering; C needs scarce ML talent + four-org coordination Beats lacks authority over. Headcount caps which architectures are reachable.
Schema-as-extraction-surface helps retrieval engines (Perplexity/Google) but ~0 for parametric-recall base ChatGPT	hard (mechanism asymmetry)	Feld: Principle 5 is not universal. AEO targets must be split by engine mechanism (retrieval-augmented vs. parametric-recall); the working copy levers differ per bucket.
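The non-stationarity constraint (and the variance-decomposition step named in the SoV baseline criterion) can be illustrated with a one-way random-effects decomposition. This is a minimal sketch, assuming each probe session issues the same query n times and records 0/1 brand presence; the equal-session-size design and the function name are hypothetical. It estimates the between-session variance component, the intra-class correlation (ICC), and the design effect 1 + (n − 1)·ICC, which approximately inflates the naive binomial standard error that an i.i.d. assumption would produce.

```python
import math
from statistics import mean


def decompose(sessions: list[list[int]]) -> dict[str, float]:
    """sessions: equal-length lists of 0/1 mention outcomes, one list per probe session."""
    k = len(sessions)
    n = len(sessions[0]) if sessions else 0
    if k < 2 or n < 2 or any(len(s) != n for s in sessions):
        raise ValueError("need >= 2 sessions of equal length >= 2")
    grand = mean(x for s in sessions for x in s)
    session_means = [mean(s) for s in sessions]
    # One-way ANOVA sums of squares: between sessions vs. within sessions.
    ss_between = n * sum((m - grand) ** 2 for m in session_means)
    ss_within = sum((x - m) ** 2 for s, m in zip(sessions, session_means) for x in s)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))  # within-session (noise) variance estimate
    var_between = max(0.0, (ms_between - ms_within) / n)  # between-session drift component
    total = var_between + ms_within
    icc = var_between / total if total > 0 else 0.0
    deff = 1 + (n - 1) * icc  # repeats inside a session are not independent draws
    naive_se = math.sqrt(grand * (1 - grand) / (k * n))  # what an i.i.d. assumption gives
    return {"rate": float(grand), "icc": icc, "design_effect": deff,
            "naive_se": naive_se, "adjusted_se": naive_se * math.sqrt(deff)}
```

If the interval built from `adjusted_se` is wider than the detectable-effect threshold, a "mention rate moved" claim would not be defensible, however narrow the naive interval looks.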

4. Scope boundary

In scope: (1) A channel-materiality study as the entry gate. (2) A compliant, sampled, diagnostic measurement instrument (Architecture A) producing a validated SoV baseline + competitor-displacement signal on queries relevant to Beats-controlled surfaces. (3) A non-comparative, own-product-spec authoring template + JSON-LD twin + discrepancy linter (Architecture B) on a single pilot product family, only after the channel gate passes and A proves causal lift. (4) A pre-committed downstream copy owner + SLA so the instrument changes copy. (5) Split of AEO targets by engine mechanism.
Out of scope: (1) Catalog-wide copy rewrite before a proven causal pilot. (2) Comparative ("better than Sony") claims absent a pre-cleared legal allowlist. (3) Covert canary tracers in live brand copy, non-public shadow product pages, synchronized multi-domain identical-phrasing syndication, and probe-as-RLHF (Architecture C as a program — near-disqualified by Tier 2 on reputational + scientific grounds). (4) Daily-exhaustive probing (ToS-noncompliant). (5) Presenting an API-derived number as the consumer truth. (6) Any claim that mention-rate equals revenue absent an established link.
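The ToS exclusion of daily-exhaustive probing is easiest to see as arithmetic. This sketch compares the daily-exhaustive volume named in the constraint inventory (300–500 queries × 3 engines × 5 repeats, run every day) with one hypothetical sampled cadence built from the baseline criterion's floor (50 priority queries × 2 engines × 5 repeats). The twice-weekly cadence and the 90-day window are illustrative assumptions, not a committed plan.

```python
def probe_calls(queries: int, engines: int, repeats: int, runs: int) -> int:
    """Total API calls: every query x every engine x N repeats, per run."""
    return queries * engines * repeats * runs


WINDOW_DAYS = 90  # assumed comparison window

# Daily-exhaustive (out of scope): one full run per day.
exhaustive_low = probe_calls(300, 3, 5, WINDOW_DAYS)
exhaustive_high = probe_calls(500, 3, 5, WINDOW_DAYS)

# Hypothetical sampled cadence: two runs per week for ~13 weeks.
sampled = probe_calls(50, 2, 5, 26)

if __name__ == "__main__":
    print(f"daily-exhaustive: {exhaustive_low:,}-{exhaustive_high:,} calls per {WINDOW_DAYS} days")
    print(f"sampled:          {sampled:,} calls per ~{WINDOW_DAYS} days")
    print(f"reduction:        {exhaustive_low / sampled:.0f}x-{exhaustive_high / sampled:.0f}x")
```

Under these assumptions the sampled plan issues roughly 31x to 52x fewer calls, which is the kind of volume a sampled, attributed enterprise agreement is meant to cover; the price is coarser time resolution, which the variance decomposition then has to account for.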
Recursion budget acknowledged: max 3 per phase; recursionCount is currently 0. Phase 5 already recommends exactly one targeted Phase-4 recursion (re-model A for a sampled cadence, re-weight B to the non-comparative base case, reclassify C downward) plus two new entry preconditions. That is well inside budget and is not a return to Foundation.

5. Assumption log

#	Assumption	Confidence (0–1)	Validated?
1	LLM-mediated discovery is (or is credibly becoming) a material purchase channel for Beats earbuds (the load-bearing premise of the whole project)	0.35	No — explicitly unvalidated; elevated from kill criterion to entry gate. Marsh argues earbuds are a low-consideration, high-impulse category, where LLM-assisted research behavior is weak.
2	Public LLM API outputs are a faithful-enough proxy for what consumers see in the apps	0.45	No — requires periodic manual app-capture calibration; >25% sustained divergence kills A's headline metric.
3	A compliant, sampled probing cadence still yields a statistically defensible SoV signal after variance-decomposition and observer-effect control	0.60	No — Natarajan (non-stationarity), Li (observer effect on RAG engines), Brandt (ToS forces sampling) all qualify this; testable in A's first 90 days.
4	Beats PDPs are structured-data-eligible and crawlable, with an accessible integration surface for a linter/auto-publish	0.40	No — Feld rates Y1 probability <50%; resolved only by a week-1 integration feasibility spike.
5	The three-layer template + schema twin causally lifts mention rate (not merely correlates), and does so across all three target engines	0.35	No — the core B bet; schema lift is real for retrieval engines and ~0 for parametric-recall ChatGPT (Feld); must be proven on a pilot before scaling.
6	Apple/Beats governance will grant a committed live-publish path to at least one Beats-controlled surface within ~3 months	0.40	No — dominant org risk; hard milestone (pre-mortem #12). Likely first granted on a non-flagship surface, if at all.
7	Canary phrases survive engine ingestion and are elicitable verbatim (Architecture C's attribution premise)	0.10	No — Li judges verbatim survival effectively zero given the alignment applied after the NYT litigation; the "distinctive fact pattern" fallback is unfalsifiable. Treated as near-dead.
8	Leadership/finance will accept "mention rate" as a meaningful, fundable metric — but only when framed as diagnostic information, not as a revenue-lift claim	0.65	Partially — Whitfield confirms the diagnostic framing is fundable (~$120–200k) while the program framing (~$565–750k) is not. The framing, not the metric, is the validated part.
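Assumption 2's kill condition (">25% sustained divergence") needs an operational definition before A's calibration set exists. A minimal sketch, assuming divergence is the relative gap between the API-derived and the manually captured app mention rate for the same query set, and that "sustained" means three consecutive calibration rounds; both definitions and the function names are hypothetical choices, not settled methodology.

```python
import math


def relative_divergence(api_rate: float, app_rate: float) -> float:
    """|API - app| / app: hypothetical definition of API-vs-app divergence."""
    if app_rate == 0:
        return math.inf if api_rate > 0 else 0.0
    return abs(api_rate - app_rate) / app_rate


def headline_metric_killed(rounds: list[tuple[float, float]],
                           threshold: float = 0.25,
                           sustained_rounds: int = 3) -> bool:
    """True once divergence exceeds `threshold` in `sustained_rounds` consecutive rounds."""
    streak = 0
    for api_rate, app_rate in rounds:
        if relative_divergence(api_rate, app_rate) > threshold:
            streak += 1
            if streak >= sustained_rounds:
                return True
        else:
            streak = 0  # one in-tolerance round resets the streak
    return False
```

Requiring consecutive breaches keeps a single noisy calibration round from killing the metric, while a persistent gap (for example, the API reporting 40% where the app shows 30% for three rounds running) does.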

Gate met for 0-foundation: success criteria are defined (channel-materiality entry gate, validated SoV baseline, anti-theater copy-change criterion, and an evidence-based causal go/no-go for B, all measurable and time-bound); the constraint inventory is mapped across data/API ToS, Apple brand/legal/comms governance, engine non-determinism, API-vs-app fidelity, CMS/schema eligibility, budget, and headcount, with hard/soft/assumed typing; the decision-maker is identified via Three Ledgers (owner/sponsor Christina), with the true ledger reconciled to the validated A-first, diagnostic-first, channel-gated plan; and the assumption log records eight project-grounded assumptions with confidences. Action: set phaseGates.0-foundation = passed in manifest.json.
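The gate transition can be scripted so it fires only when every gate item is checked and the recursion budget holds. A sketch, assuming manifest.json has top-level `recursionCount` and `phaseGates` keys; that schema, the checklist keys, and the function name are hypothetical, inferred only from the field names used in this document.

```python
import json
from pathlib import Path

MAX_RECURSIONS_PER_PHASE = 3


def pass_foundation_gate(manifest_path: Path, checklist: dict[str, bool]) -> dict:
    """Set phaseGates['0-foundation'] = 'passed' only if all gate items are done."""
    manifest = json.loads(manifest_path.read_text())
    missing = [item for item, done in checklist.items() if not done]
    if missing:
        raise ValueError(f"gate not met; incomplete items: {missing}")
    if manifest.get("recursionCount", 0) > MAX_RECURSIONS_PER_PHASE:
        raise ValueError("recursion budget exceeded")
    manifest.setdefault("phaseGates", {})["0-foundation"] = "passed"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

Raising on any unchecked item makes a partially met gate fail loudly instead of being recorded as passed.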
