End-to-end ML workflow
You practice framing prediction problems, building train and validation splits that respect time ordering when needed, and establishing naive baselines before reaching for complex models. Feature discussions emphasize leakage paths through target encoding, future information sneaking into joins, and why cross-validation must mirror deployment constraints. Documentation habits begin early: every experiment notes data version, random seed, and evaluation metrics so results stay comparable.
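As a minimal sketch of the split-then-baseline habit described above, the following holds out the most recent rows for validation and scores a predict-the-training-mean baseline. The helper names (`time_ordered_split`, `mean_baseline_mae`) and the toy `day`/`sales` rows are assumptions made for illustration, not course materials.

```python
from statistics import mean


def time_ordered_split(rows, time_key, valid_fraction=0.2):
    """Sort rows by time and hold out the most recent fraction for validation."""
    if not 0 < valid_fraction < 1:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    ordered = sorted(rows, key=lambda r: r[time_key])
    n_valid = max(1, round(len(ordered) * valid_fraction))
    cut = len(ordered) - n_valid
    return ordered[:cut], ordered[cut:]


def mean_baseline_mae(train, valid, target_key):
    """Mean absolute error of predicting the training mean for every validation row."""
    prediction = mean(r[target_key] for r in train)
    return mean(abs(r[target_key] - prediction) for r in valid)


# Hypothetical toy data: ten days of steadily rising sales.
rows = [{"day": d, "sales": 10 + d} for d in range(10)]
train, valid = time_ordered_split(rows, "day", valid_fraction=0.2)
baseline_mae = mean_baseline_mae(train, valid, "sales")
print(len(train), len(valid), baseline_mae)
```

Any later model has to beat `baseline_mae` on the same validation rows before its added complexity is worth discussing.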
Model families for tabular data
Lessons compare linear models, regularized regression, and tree-based ensembles, and examine when added complexity buys only marginal gains. You interpret feature importance cautiously, knowing that correlated features can distort rankings. Hyperparameter topics stay practical: reasonable search ranges, validation curves, and stopping rules that avoid endless tuning. Labs use small, interpretable datasets so intuition develops before cloud-scale tooling.
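One possible form of a stopping rule is a patience counter: stop searching once several consecutive candidates fail to improve the validation score by a meaningful margin. The sketch below assumes a hypothetical `score_fn` that returns a validation score (higher is better); the toy scoring function stands in for a real train-and-validate step.

```python
def search_with_patience(candidates, score_fn, patience=3, min_delta=1e-3):
    """Try candidates in order; stop after `patience` consecutive non-improvements."""
    best_candidate, best_score = None, float("-inf")
    stale = 0   # consecutive candidates that did not improve the best score
    tried = 0   # how many candidates were actually evaluated
    for candidate in candidates:
        score = score_fn(candidate)
        tried += 1
        if score > best_score + min_delta:
            best_candidate, best_score = candidate, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_candidate, best_score, tried


def toy_validation_score(depth):
    """Hypothetical validation curve that peaks at depth 4 (illustration only)."""
    return 0.9 - 0.01 * (depth - 4) ** 2


best_depth, best_score, tried = search_with_patience(
    range(1, 21), toy_validation_score, patience=3
)
print(best_depth, round(best_score, 3), tried)
```

Here the search evaluates 7 of the 20 candidate depths before stopping, which is the point of the rule: the budget is set before tuning begins rather than after results start to look good.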
Evaluation and deployment-aware metrics
You study accuracy pitfalls on imbalanced targets, precision-recall trade-offs, calibration for probability outputs, and threshold selection tied to business costs. Scenarios involve asymmetric error costs, such as fraud detection or proxies for medical triage. Visualization exercises plot ROC-style summaries so that conclusions do not hinge on a single operating point.
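To make threshold selection tied to business costs concrete, here is a small sketch that scans a grid of thresholds and keeps the one with the lowest total cost. The cost values, labels, and predicted probabilities are invented for illustration; in practice the costs would come from the business context.

```python
def total_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at one threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 0:
            cost += cost_fp
        elif not predicted_positive and label == 1:
            cost += cost_fn
    return cost


def best_threshold(y_true, y_prob, cost_fp, cost_fn, grid=None):
    """Return the lowest-cost threshold on the grid (ties go to the smallest)."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return min(grid, key=lambda t: total_cost(y_true, y_prob, t, cost_fp, cost_fn))


# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.03, 0.12, 0.22, 0.37, 0.41, 0.58, 0.72, 0.91]

# Missing a positive is ten times worse than a false alarm (fraud-like setting).
t_miss_costly = best_threshold(y_true, y_prob, cost_fp=1, cost_fn=10)
# A false alarm is ten times worse than a miss.
t_alarm_costly = best_threshold(y_true, y_prob, cost_fp=10, cost_fn=1)
print(t_miss_costly, t_alarm_costly)
```

The same scores yield a lower threshold when misses are expensive and a higher one when false alarms are expensive, which is why the threshold belongs to the deployment decision rather than to the model.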
Ethics, monitoring, and human oversight
The final block introduces fairness considerations, subgroup checks, model cards, and governance habits for production monitoring. Discussions cover the feedback loops that arise when models influence future data, and the situations in which human review must remain mandatory. You leave with a checklist for responsible handoffs to engineering teams who will own serving and alerts.
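A subgroup check can be as simple as computing one metric per group and looking at the gap. The sketch below uses recall, with made-up labels, predictions, and group names; which metric and which gap size matter is a judgment call that depends on the application.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Recall (true positive rate) per group, for groups with at least one positive."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for label, pred, group in zip(y_true, y_pred, groups):
        if label == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


# Hypothetical data: two groups, "A" and "B".
y_true = [1, 1, 0, 1, 1, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

recalls = recall_by_group(y_true, y_pred, groups)
recall_gap = max(recalls.values()) - min(recalls.values())
print(recalls, round(recall_gap, 3))
```

A gap like this one is a prompt for investigation and human review, not a verdict; small groups produce noisy estimates, so group sizes should be reported alongside the metric.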
Applied lab extension
Optional deeper dives let you compare two algorithms on the same dataset with honest reporting of compute cost and interpretability trade-offs. You rehearse explaining model limitations to product managers who want certainty you cannot ethically provide. Supplementary notebooks introduce cross-validation nuances for small samples without pushing you into heavy mathematical proofs. Facilitators emphasize documentation habits so teammates can reproduce your splits and random seeds months later. Peer code reviews focus on clarity of probability outputs and on calling out data leakage risks before merge requests land in shared repositories. You may also present your findings in a lunch-and-learn format.
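One way to handle small samples while keeping splits reproducible is repeated k-fold with a fixed seed: reshuffle, re-split, and average over repeats so that no single lucky partition drives the comparison. The generator below is a sketch under that assumption, with a hypothetical name and default parameters; it only produces index lists and leaves model fitting to the caller.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=3, seed=0):
    """Yield (train, valid) index lists for seeded, repeated k-fold splitting."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)  # a local generator keeps the splits reproducible
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal shuffled indices into n_splits folds of near-equal size.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k, valid in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, valid


splits = list(repeated_kfold_indices(12, n_splits=4, n_repeats=2, seed=42))
print(len(splits))  # 4 folds x 2 repeats = 8 train/validation pairs
```

Scoring both algorithms on the same `splits` list, and recording the seed next to the results, lets a teammate regenerate the identical partitions later.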