Phase 0 · Foundation Assessment — BEA-10 Translations for Amazon Assets
Establish the foundation before any analysis: this phase is the permanent base layer (FAD) for all future work. Build type: backend-automation. Owner: Jaclyn / Lisa. One-liner: colloquial (not literal) translations across 13 languages, tuned to the Beats brand glossary.
1. Success criteria (measurable, time-bound)
These are the conditions under which BEA-10 is considered a real win, not innovation theater. They are written to be falsifiable and to survive the Tier-2 challenges already surfaced downstream (Priya: auto-publish likely prohibited; Marcus: translation may not be the binding constraint; CFO: show the human-in-the-loop economics). Criteria 1–3 are the hard go/no-go bar; 4–5 are the structural preconditions every architecture shares.
- [ ] Criterion 1 — Operationalized "colloquial" rubric exists and discriminates. By end of a scoped 2–3 locale pilot (target ~90 days from kickoff), a per-locale, written acceptance rubric for "colloquial not literal" exists with reference exemplars, AND a standing native reviewer can blind-rate pipeline output vs. the incumbent agency baseline with inter-rater agreement ≥ 0.7 and a measurable, positive voice-uplift delta over baseline. Without this, the project's reason to exist is denominated in an undefined unit (flagged in every downstream phase).
- [ ] Criterion 2 — Quality/compliance bar met under human-in-the-loop economics. On the pilot locales, ≥ 95% of shipped segments pass the native-reviewer accept gate on first or second pass, brand-glossary term-drift = 0 (zero forbidden-tier violations reach a live listing), and the chosen architecture still produces a defensible payback or cost-neutral case with the auto-publish tier set to zero (humans review every customer-facing line). Measured over the pilot window (~90–120 days).
- [ ] Criterion 3 — Translation is confirmed to be a constraint worth optimizing. Within the first 2–4 weeks (a data pull, not a build), confirm from real Beats asset-volume and turnaround records that translation latency/quality — not upstream asset creation or the approval chain — is a genuine bottleneck or brand-risk surface. Go/no-go gate: if translation is demonstrably a third-order lever with no operational pain, the initiative is rescoped or shelved before spend.
- [ ] Criterion 4 — Scale & coverage proof. The pilot architecture demonstrably handles the real Amazon A+ content module format (not just flat bullets) for the pilot locales without layout overflow/reflow defects, and a credible, costed path to all 13 locales exists with the low-resource locale human-review floor explicitly budgeted (not modeled as "eventually automated"). Demonstrated by end of pilot.
- [ ] Criterion 5 — Auditability & provenance baseline. Every shipped segment carries a queryable trace (source version, glossary snapshot, model/prompt version, register tag, QA scores, reviewer sign-off) such that any in-market complaint is forensically traceable and any glossary regression is reproducible/rollback-able. In place before the first live publish.
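Criteria 1 and 2 set thresholds but do not say how to compute them. Those thresholds are inter-rater agreement ≥ 0.7, ≥ 95% of segments accepted on the first or second pass, and zero forbidden-tier violations live. The minimal scoring sketch below rests on two assumptions that the criteria do not state. First, each locale has two blind raters, for example the standing native reviewer plus a second native rater. Second, Cohen's kappa is the agreement statistic. All function names are hypothetical. The sketch covers only the quality half of Criterion 2, not the human-in-the-loop economics.

```python
from collections import Counter
from typing import Optional, Sequence

KAPPA_BAR = 0.7          # Criterion 1 agreement bar (assumed to be Cohen's kappa)
ACCEPT_RATE_BAR = 0.95   # Criterion 2: share accepted on first or second pass


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


def criterion1_met(rater_a: Sequence[str], rater_b: Sequence[str]) -> bool:
    return cohens_kappa(rater_a, rater_b) >= KAPPA_BAR


def criterion2_quality_met(
    accepted_on_pass: Sequence[Optional[int]], live_forbidden_hits: int
) -> bool:
    """accepted_on_pass[i] is the review pass (1, 2, 3, ...) on which shipped
    segment i was accepted, or None if it was never accepted."""
    if not accepted_on_pass:
        raise ValueError("no segments to score")
    early = sum(1 for p in accepted_on_pass if p is not None and p <= 2)
    return early / len(accepted_on_pass) >= ACCEPT_RATE_BAR and live_forbidden_hits == 0
```

Kappa is used here instead of raw percent agreement for a specific reason. Two raters who both mark most segments "accept" will agree often by chance alone, and kappa corrects for that. If the rubric uses ordinal scores rather than categories, a weighted kappa or Krippendorff's alpha may fit better.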
2. Decision-maker profile (Three Ledgers)
- Public ledger (what we pitch): "An AI-powered translation system that produces colloquial, on-brand Beats copy across 13 languages for Amazon, with the brand glossary enforced so brand terms never drift — faster and more consistent than ad-hoc agency translation." This is the headline that earns a slot in the one-year internal AI program.
- Shadow ledger (what the decision-maker actually optimizes for): The sponsor optimizes for defensible standing inside an AI initiative: a deliverable that is visibly AI (which rules out "we licensed memoQ and hired translators"), that does not produce a brand-embarrassment incident in a language no one on the central team can read (career risk), and that survives a year-end ROI defense to a skeptical CFO. The real optimization target is "a credible, non-embarrassing AI win I can defend," not maximum translation throughput.
- True ledger (what will actually get built): Given the downstream verdict (B recommended at 74/Refine; A as the de-risking baseline; C killed as a day-one commitment) and the Tier-2 reality that Apple/Beats almost certainly prohibits auto-publishing AI copy to live customer-facing listings, what actually gets built is an LLM pipeline with register-tiered drafting and dual-signal QA in which humans review every line. In effect, this is the "B-shaped" pipeline reframed as a quality / consistency / auditability tool, not a cost-automation tool. The cost-elimination story is the most likely casualty between pitch and build.
- Owner / sponsor: Jaclyn / Lisa (program owners). Authority gap flagged: Jaclyn/Lisa can own the linguistic-asset supply chain and pipeline build, but they do not have authority to make the upstream copywriting org or the 13 regional marketing teams change behavior. Any cross-org dependency (notably Architecture C) requires a VP-level sponsor named before spend.
3. Constraint inventory
| Constraint | Type (hard / soft / assumed) | Notes |
|---|---|---|
| Apple/Beats brand & legal policy on AI-generated customer-facing copy (likely mandates human review on every published line) | hard | Per Tier-2 (Priya), the "no-human-on-happy-path" auto-publish tier is probably already decided against. Treat as hard until written policy says otherwise; it re-denominates the value from cost-automation to quality/auditability. |
| Beats brand glossary must be enforced as canonical law (zero forbidden-term drift to live listings) | hard | The highest-transferability mechanism across all researched domains (0.95). Lock brand anchors with preferred / admitted / forbidden term tiers, but leave the surrounding syntax free so that enforcement does not manufacture the wooden literalness the project exists to kill. |
| 13 target locales, including low-resource markets with thin qualified-linguist pools | hard | Coverage is a stated requirement; low-resource locales degrade in both generation quality AND QA-proxy visibility — a permanent human-review floor is required, not optional. |
| Amazon A+ content structured-module format (not flat strings) | hard | Rich modules + per-locale character/length budgets (German expands ~30%, Japanese contracts); ingest/export must respect layout or listings break in-market. |
| Apple procurement / security / data-residency review for any third-party tooling or corpus | hard | Uploading unreleased product copy to third-party SaaS (CAT termbase, scraped reference corpora) must clear review; can stall any architecture at the tooling stage. Pre-clear, don't discover mid-build. |
| Reference-corpus IP / licensing for naturalness scoring | soft | Scraped consumer text may violate ToS/copyright; sourceable from licensed/owned/public-domain alternatives if legal blocks scraping. Loss degrades but does not kill the QA gate. |
| Headcount / standing native reviewer per locale | soft | A standing per-locale native reviewer + rubric is the non-negotiable precondition all validators agreed on; staffing it is a budget/sourcing question, negotiable in scope (e.g., managed LSP for the long tail) but not eliminable. |
| Amazon per-marketplace performance/return/sentiment data access | assumed | Required only for Architecture C's feedback loop; programmatic 13-marketplace ingestion is unconfirmed and may be curtailed by seller-data agreements or regulation. Assume unavailable until proven. |
| Translation is the binding operational constraint | assumed | The premise the entire initiative rests on (Tier-2 Marcus disputes it). Must be validated by a cheap data pull before any build — challengeable and pivotal. |
| LLM + per-locale style guide produces native-acceptable colloquial copy | assumed | Medium-high confidence for high-resource locales (FR, DE, ES, JA, PT-BR); lower for low-resource. Validate on pilot locales before promising 13. |
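Two rows above, the glossary constraint and the A+ layout constraint, reduce to checks that can run on every drafted segment before it reaches a reviewer. The minimal sketch below makes several assumptions. The glossary is a hypothetical per-locale snapshot with preferred / admitted / forbidden tiers, and its entries are illustrative rather than the real Beats glossary. The per-module character budgets are also hypothetical; the real limits come from Amazon's A+ module specifications. Only the forbidden tier blocks a segment, because preferred and admitted terms exist to steer drafting without freezing syntax. The default 1.3 expansion factor is the ~30% German rule of thumb from the table, and it serves only as a pre-draft warning.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical glossary snapshot: per-locale term tiers (illustrative entries only).
GLOSSARY: Dict[str, Dict[str, List[str]]] = {
    "de-DE": {
        "preferred": ["Beats Studio Pro"],
        "admitted": ["Studio Pro"],
        "forbidden": ["Beats Studio Professional"],
    },
}

# Hypothetical per-module character budgets, not Amazon's published limits.
CHAR_BUDGET: Dict[str, int] = {"headline": 80, "bullet": 200}


@dataclass
class Violation:
    kind: str       # "forbidden_term" or "over_budget"
    detail: str
    blocking: bool  # blocking violations must never reach a live listing


def lint_segment(locale: str, module: str, text: str) -> List[Violation]:
    """Check one drafted segment against forbidden terms and its length budget."""
    found: List[Violation] = []
    lowered = text.casefold()
    for term in GLOSSARY.get(locale, {}).get("forbidden", []):
        # Naive case-insensitive substring match; real matching would need
        # locale-aware normalization and tokenization (e.g. for Japanese).
        if term.casefold() in lowered:
            found.append(Violation("forbidden_term", term, blocking=True))
    budget = CHAR_BUDGET.get(module)
    # len() counts code points, which may differ from how Amazon counts characters.
    if budget is not None and len(text) > budget:
        found.append(Violation("over_budget", f"{len(text)} > {budget} chars", True))
    return found


def likely_to_overflow(source_en: str, module: str, expansion: float = 1.3) -> bool:
    """Pre-draft warning: would the expected target length exceed the budget?"""
    budget = CHAR_BUDGET.get(module)
    return budget is not None and len(source_en) * expansion > budget
```

For example, a 70-character English headline scaled by 1.3 comes to about 91 characters, which exceeds the hypothetical 80-character headline budget. The warning therefore fires before any German draft exists.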
4. Scope boundary
- In scope: Colloquial (not literal) translation of Beats Amazon retail assets — A+ content modules, enhanced bullets, titles, lifestyle captions — across the 13 target locales, anchored to an enforced Beats brand glossary; register-aware handling (headline/tagline vs. feature bullet vs. legal disclaimer vs. lifestyle caption); an automated QA stage (meaning-drift + naturalness signals) feeding a human-review queue; provenance/audit logging; and a 2–3 high-resource-locale pilot to validate the "colloquial" rubric, flag/accept rates, and human-in-the-loop economics before any commitment to all 13.
- Out of scope: Auto-publishing AI copy to live listings with no human review (assume prohibited); non-Amazon channels (retail packaging, web, in-app) — future option only; source English copy authoring and the source-discipline tooling (Architecture C's first loop — deferred, cross-org); in-market closed-loop glossary auto-evolution from Amazon performance signal (Architecture C — killed as day-one commitment, staged future option only); building bespoke translation models (use Claude + style guides + glossary validator, not custom MT training); and any commitment to a translation-drives-Amazon-revenue ROI claim until the constraint premise (Criterion 3) is validated.
- Recursion budget acknowledged: max 3 per phase.
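The in-scope provenance logging and the QA-to-human-review stage can share one record per segment. The minimal sketch below assumes a hypothetical `ProvenanceRecord` whose fields mirror the trace in Criterion 5, and the flag thresholds are also hypothetical. It encodes three positions this document already takes:

- Nothing auto-publishes. Every segment lands in a human queue, and flags only raise its priority.
- Perplexity can raise a flag but can never pass a segment.
- A same-model back-translation round-trip invalidates the meaning-drift signal rather than counting as green (assumption 5).

Comparing model names for equality is the crudest version of "model-independent"; a stricter check might compare model families.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

DRIFT_FLAG = 0.3         # hypothetical meaning-drift threshold (0 = no drift)
PERPLEXITY_FLAG = 80.0   # hypothetical naturalness-proxy threshold


@dataclass
class ProvenanceRecord:
    # Fields mirror the trace listed in Criterion 5; names are hypothetical.
    segment_id: str
    source_version: str
    glossary_snapshot: str
    drafting_model: str
    prompt_version: str
    register_tag: str  # e.g. "headline", "bullet", "legal", "caption"
    qa_scores: Dict[str, float] = field(default_factory=dict)
    qa_flags: List[str] = field(default_factory=list)
    reviewer_signoff: Optional[str] = None
    signed_at: Optional[str] = None


def route_for_review(rec: ProvenanceRecord, drift: float, perplexity: float,
                     backtranslation_model: str) -> str:
    """Attach QA signals to the record and pick a human-review queue."""
    rec.qa_scores = {"meaning_drift": drift, "perplexity": perplexity}
    rec.qa_flags = []
    if backtranslation_model == rec.drafting_model:
        # Same-model round-trip: a low drift score is not evidence of fidelity.
        rec.qa_flags.append("drift_signal_invalid")
    elif drift > DRIFT_FLAG:
        rec.qa_flags.append("meaning_drift")
    if perplexity > PERPLEXITY_FLAG:
        # Flag only: high perplexity may simply be brand-distinctive copy.
        rec.qa_flags.append("naturalness_check")
    # There is no auto-publish branch: every segment goes to a human.
    return "priority_review" if rec.qa_flags else "standard_review"


def sign_off(rec: ProvenanceRecord, reviewer: str) -> None:
    rec.reviewer_signoff = reviewer
    rec.signed_at = datetime.now(timezone.utc).isoformat()


def publishable(rec: ProvenanceRecord) -> bool:
    return rec.reviewer_signoff is not None
```

Because the QA scores, flags and sign-off all live on the same record as the glossary snapshot and the model/prompt versions, an in-market complaint can be traced from one object.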
5. Assumption log
| # | Assumption | Confidence (0–1) | Validated? |
|---|---|---|---|
| 1 | Beats/Apple brand & legal will require human review on every customer-facing published line (no auto-publish tier) | 0.80 | No — Tier-2 (Priya) asserts it is effectively decided; must be confirmed in writing at the Decision Gate. The whole value framing pivots on this. |
| 2 | "Colloquial brand voice" can be made measurable via a per-locale rubric + standing native reviewer (it is not fully reducible to a termbase) | 0.55 | No — unvalidated and named the project's single biggest exposure across all phases; a pilot precondition (Criterion 1). |
| 3 | Translation latency/quality is a genuine operational bottleneck or brand-risk surface (not a third-order lever dominated by price/availability/reviews/rank) | 0.40 | No — actively disputed (Tier-2 Marcus); a cheap data pull (Criterion 3) must confirm before any build. |
| 4 | The Beats brand glossary can be expressed as enforceable preferred/admitted/forbidden term tiers per locale | 0.75 | Partially — term-locking is the proven 0.95-transferability core; the colloquial voice layer is the unvalidated remainder. |
| 5 | Back-translation QA must use a model-independent path to be valid; same-model round-trips produce false greens (errors that pass undetected) | 0.85 | No — flagged (B12) and confirmed by Tier-1 (Hiroshi) as architectural, not tunable; must be written into the pilot design. |
| 6 | Corpus perplexity is a weak naturalness proxy that penalizes brand-distinctive copy (use for flagging only, never to auto-pass) | 0.70 | No — Tier-1 (Hiroshi/Sofia) expect proxy-to-human correlation below the r≈0.6 kill line for marketing register; to be measured on a labeled pilot set. |
| 7 | Jaclyn/Lisa lack authority over the copywriting org and the 13 regional teams; cross-org work needs a VP sponsor named before spend | 0.85 | No — structural authority gap; confirmed across the pre-mortem (C9) and Tier-2 (Priya). Determines whether C is even attemptable. |
| 8 | Apple procurement/security will gate third-party SaaS handling of unreleased product copy and scraped corpora | 0.70 | No — must be pre-cleared in Phase 4 rather than discovered mid-build (A6/B5). |
| 9 | An LLM + per-locale style guide yields native-acceptable colloquial output for high-resource locales; low-resource locales need a permanent human floor | 0.65 | No — to be measured on pilot locales; do not model low-resource locales as "eventually automated." |
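Assumption 6 is meant to be settled by measurement on a labeled pilot set. The minimal sketch below checks the r≈0.6 kill line and assumes two things. First, native reviewers give each segment a numeric naturalness rating. Second, Pearson's r is the statistic; the log does not name one, and Spearman's rank correlation may suit ordinal rubric scores better. The sketch negates perplexity so that "more natural" points the same way on both axes.

```python
import math
from typing import Sequence, Tuple

R_KILL_LINE = 0.6  # from assumption 6


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (spread_x * spread_y)


def perplexity_proxy_clears_kill_line(
    perplexities: Sequence[float], human_naturalness: Sequence[float]
) -> Tuple[bool, float]:
    """Correlate the proxy with human ratings; lower perplexity should mean more natural."""
    r = pearson_r([-p for p in perplexities], human_naturalness)
    return r >= R_KILL_LINE, r
```

The check should run per locale and per register, since the concern logged above is specific to marketing register.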
Gate: success criteria defined (5, measurable and time-bound), constraints mapped (10, typed), decision-maker identified (Three Ledgers + owner Jaclyn/Lisa with authority gap flagged), assumptions logged (9, confidence-scored) → set phaseGates.0-foundation = passed in manifest.json.
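The gate step above can be scripted so that the flag is set only when all four conditions are recorded as met. The minimal sketch below assumes manifest.json is a JSON object in which `phaseGates` is a nested object keyed by phase id. Under that reading, `phaseGates.0-foundation` is a nested key, not an array index, and "passed" is the stored value. The condition names are hypothetical. The update is a pure function of the loaded manifest, so other phases' gates stay untouched and the check is easy to test.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical names for the four conditions in the Phase 0 gate line.
GATE_CONDITIONS = (
    "success_criteria_defined",
    "constraints_mapped",
    "decision_maker_identified",
    "assumptions_logged",
)


def mark_foundation_passed(manifest: Dict[str, Any], conditions: Dict[str, bool]) -> Dict[str, Any]:
    """Return a copy of the manifest with phaseGates["0-foundation"] = "passed"."""
    unmet = [c for c in GATE_CONDITIONS if not conditions.get(c)]
    if unmet:
        raise ValueError(f"Phase 0 gate not met: {unmet}")
    updated = dict(manifest)
    gates = dict(updated.get("phaseGates", {}))  # keep other phases' gates as-is
    gates["0-foundation"] = "passed"
    updated["phaseGates"] = gates
    return updated


def update_manifest_file(path: str, conditions: Dict[str, bool]) -> None:
    """Load manifest.json (or start empty), apply the gate, and write it back."""
    p = Path(path)
    manifest = json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
    updated = mark_foundation_passed(manifest, conditions)
    p.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```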