Phase 1 — Foundations and responsible data use
You begin by mapping how raw operational data moves from source systems into analysis-ready tables, and why undocumented handoffs create silent errors downstream. Labs teach you to profile columns for completeness, spot skewed distributions before trusting averages, and write a short data card that lists refresh cadence, grain, and privacy constraints. Discussions connect spreadsheet habits to governance themes such as purpose limitation and retention, so you can explain trade-offs to legal and product partners without jargon. Peer reviews focus on reproducibility: every transformation you apply must be named, ordered, and reversible in principle, mirroring how production analytics teams audit work during incidents or regulatory reviews.
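As a rough illustration of the profiling lab described above, the sketch below checks one column for completeness and compares its mean with its median as a simple skew signal. The function name `profile_column` and the sample values are hypothetical and are not course materials. It assumes missing values are represented as `None`.

```python
from statistics import mean, median


def profile_column(values):
    """Return completeness and a rough skew signal for one column.

    Assumes missing entries are represented as None (an assumption of this sketch).
    """
    present = [v for v in values if v is not None]
    completeness = len(present) / len(values) if values else 0.0
    summary = {"completeness": completeness}

    numeric = [v for v in present if isinstance(v, (int, float))]
    if numeric:
        summary["mean"] = mean(numeric)
        summary["median"] = median(numeric)
        # A mean far above the median hints at right skew, so the
        # average alone may misrepresent a typical row.
        summary["mean_minus_median"] = summary["mean"] - summary["median"]
    return summary


# Hypothetical order values with two missing entries and one extreme outlier.
order_values = [12.0, 15.0, None, 14.0, 13.0, 950.0, None, 16.0]
profile = profile_column(order_values)
print(profile)
```

In this made-up column, a quarter of the values are missing and a single outlier pulls the mean far above the median, which is the kind of pattern worth noting in a data card before anyone quotes an average.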
Phase 2 — From business questions to measurable work
This segment trains you to translate fuzzy stakeholder language into testable hypotheses, explicit metrics, and acceptance criteria before opening any tool. You practice breaking requests into population definitions, time windows, and comparison groups, then identifying confounders that could invalidate a headline chart. Short exercises use realistic briefs from marketing and operations so you learn to push back politely when success metrics conflict or when data simply cannot support the claim being asked for. You also draft a lightweight measurement plan suitable for a weekly sync, including fallback visuals when the ideal dataset is delayed or incomplete.
Phase 3 — Prepare and process data with SQL
Here you leave one-off spreadsheet gymnastics for repeatable SQL workflows that scale with larger extracts. Lessons cover filtering, joins, aggregations, subqueries, and basic performance hygiene, such as selecting only the columns you need and choosing sensible grouping keys. You learn to catch join explosions early (row counts that multiply when a join key is not unique), document assumptions when imputing missing values, and version queries in a shared repository so teammates can diff changes over time. Case studies highlight how small syntax mistakes alter revenue or funnel numbers, reinforcing defensive querying habits and peer review before numbers reach leadership slides.
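One way to see how a join can silently inflate revenue is sketched below, using Python's built-in `sqlite3` module to run the SQL against an in-memory database. The table names (`orders`, `contacts`), columns, and rows are invented for illustration. The defensive habit shown is to compare a total before and after the join and to check whether the join key is unique on the side you expect.

```python
import sqlite3

# In-memory database with hypothetical tables; not real course data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, revenue REAL);
    CREATE TABLE contacts (customer_id INTEGER, email TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0);
    -- Customer 10 has two contact rows, so the join key is not unique.
    INSERT INTO contacts VALUES
        (10, 'a@example.com'), (10, 'b@example.com'), (11, 'c@example.com');
    """
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Revenue straight from the source table.
true_revenue = scalar("SELECT SUM(revenue) FROM orders")

# Revenue after joining to a table with duplicate keys: order 1 is counted twice.
joined_revenue = scalar(
    """
    SELECT SUM(o.revenue)
    FROM orders AS o
    JOIN contacts AS c ON o.customer_id = c.customer_id
    """
)

# Defensive check: how many join keys appear more than once in contacts?
duplicate_keys = scalar(
    """
    SELECT COUNT(*) FROM (
        SELECT customer_id FROM contacts
        GROUP BY customer_id HAVING COUNT(*) > 1
    )
    """
)

print(true_revenue, joined_revenue, duplicate_keys)
```

Here the joined total exceeds the source total because one order matches two contact rows. Running the duplicate-key check before the join is a cheap way to catch the problem before the number reaches a slide.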
Phase 4 — Analyze in R and communicate with visuals
You gain just enough R to tidy messy exports, compute reproducible summaries, and iterate on exploratory plots without losing track of where each number came from. Emphasis stays on readable scripts, sensible factor handling, and commentary that explains why a chart supports a recommendation. Visualization modules stress accessibility, truthful scales, and choosing encodings that match the audience's data literacy, from executives who want directional clarity to specialists who ask for segment drill-downs. The unit closes with a structured narrative template (context, finding, uncertainty, and next step) so your final storyline stays defensible under skeptical questions.