Phase 0 · Foundation Assessment — BEA-11 LLM-Optimized Product Copy (AEO)
The permanent base layer (Foundation Assessment Document) for all downstream work. Established before analysis; every later phase builds on these criteria, constraints, and assumptions. Skeptical lens: AEO is a known magnet for SEO-rebranded theater, so the foundation is written to make the single load-bearing question — "is LLM-mediated discovery a material purchase channel for Beats earbuds at all?" — visible and gateable from the start, not buried.
1. Success criteria (measurable, time-bound)
- [ ] Channel-materiality entry gate answered (precondition for everything else). A ~$15–25k channel-sizing study returns, within 90 days of project start, a defensible point estimate for the share of Beats purchase-consideration that touches an LLM engine (ChatGPT / Perplexity / Gemini). Pass threshold: ≥2% of measured consideration touches an LLM or a credible 18-month trend crossing 2%. If the study returns below threshold, the project does not proceed past the study itself. Metric: measured/estimated LLM-touched consideration %; timeframe: 90 days.
- [ ] A clean, statistically-defensible share-of-voice (SoV) baseline exists. Architecture A (Measurement-First Observatory) produces a 60–90-day stable SoV baseline for ≥50 priority queries across ≥2 engines, with a parser validated at ≥0.90 F1 for brand-presence and a separately-reported F1 for competitor-displacement, and SoV reported with a confidence interval that has passed a variance-decomposition step (within-session noise separated from between-session drift). Metric: F1 scores + CI width vs. detectable-effect threshold; timeframe: 150 days from start (post channel-gate).
- [ ] The instrument changes copy, not just dashboards (anti-theater criterion). Within 180 days of the SoV baseline going live, ≥1 documented copy/schema change has been made on a Beats-controlled surface in direct response to a "we lost this query to competitor X" backlog item, under a pre-committed "lost-query → ticket within 10 business days" SLA owned by a named copy owner. Metric: count of backlog items resolved via an actual copy change (target ≥5 in first 180 days); timeframe: 180 days.
- [ ] Causal-lift go/no-go decision for Architecture B is made on evidence, not faith. Before any catalog-wide authoring commitment, a controlled before/after on one pilot product family shows a probe-measured mention-rate change that is outside the confidence interval (lift or null), producing an explicit scale / don't-scale decision. Metric: pilot lift Δ vs. CI; timeframe: within one full engine-refresh cycle after baseline (≈9–12 months from start).
2. Decision-maker profile (Three Ledgers)
- Public ledger (what we pitch): "Beats has no visibility into how often AI engines recommend us versus Bose/Sony/Sonos when consumers ask for earbuds. We will build a measurement instrument to see it, then optimize our product copy in a structured, extractable way to win more of those AI recommendations — a centralized, AEO-scored copy library."
- Shadow ledger (what the decision-maker actually optimizes for): Being demonstrably ahead of the "AI changes search" curve without an embarrassing, expensive, or brand-risky misstep on a flagship Apple property. The owner is optimizing for defensible, low-blast-radius progress they can show leadership — a credible early-mover story that survives a CFO's "what did this do to revenue?" and Apple comms' "could this become a story?" The true optimization is reputational and political safety plus a cheap option on a possibly-real channel — not maximum AEO lift.
- True ledger (what will actually get built): Architecture A in a compliant, sampled (not daily-exhaustive), diagnostic form — a measurement observatory framed as buying information to de-risk a bigger decision — gated behind the channel-materiality study. B is built only if the channel gate passes and A proves causal lift on a pilot, and even then ships as a non-comparative, own-product-spec library first. C is not built as a program; at most a cheap "canary-attribution-on-owned-surfaces" kernel survives, and only with a standing Apple corporate-comms veto in hand. Net: what gets built first is a ~$120–200k diagnostic, not a ~$565–750k copy program.
- Owner / sponsor: Christina (per manifest
owner). Single analyst-owner today — a known bus-factor risk (pre-mortem #9); the foundation assumes a documented runbook and a cross-trained backup before A's value loop is relied upon.
3. Constraint inventory
| Constraint | Type (hard / soft / assumed) | Notes |
|---|---|---|
| LLM vendor API ToS prohibit/throttle automated competitive benchmarking; compliant path is a sampled, attributed enterprise agreement | hard | Brandt (Tier 2) wrote such policy: daily-exhaustive 300–500 × 3 × N≥5 probing is squarely in abuse-detection territory; enterprise terms cost ~3–5x estimate. Forces sampled, non-daily cadence — reshapes A's cost/resolution. |
| Apple / Beats brand & marketing governance over live flagship product copy | hard | Centralized, conservative. No external/centralized copy library may feed live PDPs without a written, committed publish path (pre-mortem #12). Governance approval is a hard milestone, not a line item. |
| Apple legal posture on comparative advertising ("X% better than Sony") | hard | Rivera (Tier 1): comparative-claim allowlist is a 6–9-month process per claim family and resets when a comparator's product changes. Forces non-comparative own-product framing as B's base case. |
| Apple corporate-comms reputational exposure ("Apple is gaming AI" risk) | hard | Brandt + Okafor: covert tracers, shadow pages, and synchronized identical-phrasing syndication are corporate-comms liabilities that dwarf any earbud SoV gain; a comms veto is required before any C-style mechanism. |
| Engine non-determinism and non-stationarity (per-session, intra-day drift; retrieval/personalization injection) | hard (physical property of the surface) | Natarajan + Li: N repeats are not i.i.d.; CIs built on i.i.d. assumptions are too narrow → false trends. Requires variance-decomposition before any "mention rate moved" claim. |
| API surface ≠ consumer app surface (retrieval, ads, personalization, memory) | hard | Foundational systemic risk across all phases; the consumer truth lives in apps. Requires a periodic manual app-capture calibration set; API number must never be presented as consumer truth. |
| Beats PDP/CMS structured-data eligibility & crawlability (Apple-governed, likely closed/server-rendered) | assumed | Feld (Tier 1): integration + schema-eligibility is the real cost ($320k) and the go/no-go for B; probability of clean write-access + eligibility on a flagship Apple surface in Y1 likely <50%. Treated as assumed until a week-1 feasibility spike. |
| Budget for a diagnostic-first spend (~$120–200k) vs. full program (~$565–750k) | soft | Whitfield (Tier 3): the full program is "dead on arrival" with a CFO; a cheap diagnostic framed as de-risking research is fundable. The ask must be staged. |
| Headcount: single analyst-owner (Christina), no dedicated ML team | soft | A is feasible at 1.0–1.5 eng + 0.3 analyst FTE; B needs CMS integration engineering; C needs scarce ML talent + four-org coordination Beats lacks authority over. Headcount caps which architectures are reachable. |
| Schema-as-extraction-surface helps retrieval engines (Perplexity/Google) but ~0 for parametric-recall base ChatGPT | hard (mechanism asymmetry) | Feld: Principle 5 is not universal. AEO targets must be split by engine mechanism (retrieval-augmented vs. parametric-recall); the working copy levers differ per bucket. |
4. Scope boundary
- In scope: (1) A channel-materiality study as the entry gate. (2) A compliant, sampled, diagnostic measurement instrument (Architecture A) producing a validated SoV baseline + competitor-displacement signal on Beats-controlled-relevant queries. (3) A non-comparative, own-product-spec authoring template + JSON-LD twin + discrepancy linter (Architecture B) on a single pilot product family only after the channel gate passes and A proves causal lift. (4) A pre-committed downstream copy-owner + SLA so the instrument changes copy. (5) Split of AEO targets by engine mechanism.
- Out of scope: (1) Catalog-wide copy rewrite before a proven causal pilot. (2) Comparative ("better than Sony") claims absent a pre-cleared legal allowlist. (3) Covert canary tracers in live brand copy, non-public shadow product pages, synchronized multi-domain identical-phrasing syndication, and probe-as-RLHF (Architecture C as a program — near-disqualified by Tier 2 on reputational + scientific grounds). (4) Daily-exhaustive probing (ToS-noncompliant). (5) Presenting an API-derived number as the consumer truth. (6) Any claim that mention-rate equals revenue absent an established link.
- Recursion budget acknowledged: max 3 per phase.
recursionCountis currently 0. Phase 5 already recommends exactly one targeted Phase-4 recursion (re-model A for sampled cadence, re-weight B to non-comparative base case, reclassify C downward) plus two new entry preconditions — well inside budget; not a return to Foundation.
5. Assumption log
| # | Assumption | Confidence (0–1) | Validated? |
|---|---|---|---|
| 1 | LLM-mediated discovery is (or is credibly becoming) a material purchase channel for Beats earbuds (the load-bearing premise of the whole project) | 0.35 | No — explicitly unvalidated; elevated from kill-criterion to entry gate. Marsh argues earbuds are low-consideration/high-impulse, where LLM research behavior is weak. |
| 2 | Public LLM API outputs are a faithful-enough proxy for what consumers see in the apps | 0.45 | No — requires periodic manual app-capture calibration; >25% sustained divergence kills A's headline metric. |
| 3 | A compliant, sampled probing cadence still yields a statistically defensible SoV signal after variance-decomposition and observer-effect control | 0.60 | No — Natarajan (non-stationarity), Li (observer effect on RAG engines), Brandt (ToS forces sampling) all qualify this; testable in A's first 90 days. |
| 4 | Beats PDPs are structured-data-eligible and crawlable, with an accessible integration surface for a linter/auto-publish | 0.40 | No — Feld rates Y1 probability <50%; resolved only by a week-1 integration feasibility spike. |
| 5 | The three-layer template + schema twin causally lifts mention rate (not merely correlates), and does so across all three target engines | 0.35 | No — the core B bet; schema lift is real for retrieval engines and ~0 for parametric-recall ChatGPT (Feld); must be proven on a pilot before scaling. |
| 6 | Apple/Beats governance will grant a committed live-publish path to at least one Beats-controlled surface within ~3 months | 0.40 | No — dominant org risk; hard milestone (pre-mortem #12). Likely first granted on a non-flagship surface, if at all. |
| 7 | Canary phrases survive engine ingestion and are elicitable verbatim (Architecture C's attribution premise) | 0.10 | No — Li judges verbatim survival effectively zero post-alignment after NYT litigation; the "distinctive fact pattern" fallback is unfalsifiable. Treated as near-dead. |
| 8 | Leadership/finance will accept "mention rate" as a meaningful, fundable metric — but only when framed as diagnostic information, not as a revenue-lift claim | 0.65 | Partially — Whitfield confirms the diagnostic framing is fundable (~$120–200k) while the program framing (~$565–750k) is not. The framing, not the metric, is the validated part. |
Gate met for 0-foundation: success criteria defined (channel-materiality entry gate, validated SoV baseline, anti-theater copy-change criterion, evidence-based B causal go/no-go — all measurable and time-bound); constraint inventory mapped across data/API-ToS, Apple brand/legal/comms governance, engine non-determinism, API-vs-app fidelity, CMS/schema eligibility, budget, and headcount with hard/soft/assumed typing; decision-maker identified via Three Ledgers (owner/sponsor Christina) with the true ledger reconciled to the validated A-first, diagnostic-first, channel-gated plan; assumption log with eight project-grounded assumptions and confidences recorded → set phaseGates.0-foundation = passed in manifest.json.