i had a fuel-burn model that looked great, and the reason it looked great was a bug.
the task was estimating how much fuel a flight burns, with a model trained on a season of european flight data. my gradient-boosted model was beating the baselines comfortably in cross-validation and i was a step away from calling it done. then i looked at how i was splitting the data.
i was splitting rows at random. that sounds fine until you remember what a row is. flights from the same aircraft, the same route, the same week share almost everything. scatter them randomly across train and validation folds and the model gets to peek at near-identical flights while it's being scored. it isn't learning to predict fuel burn. it's learning to recognise neighbours it has already seen. the cross-validation number wasn't measuring generalisation. it was measuring leakage.
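here's a toy sketch that makes the leak concrete. it uses synthetic data, not the european flight data, and it swaps the gradient-boosted model for a deliberately crude 1-nearest-neighbour regressor, because a pure lookup model makes the memorisation obvious. each hypothetical group stands in for an aircraft-route pair with its own hidden fuel offset that the feature can't explain. every name, size and noise level here is an assumption. under a random row split, a validation flight's nearest training row is almost always its own sibling. under a group split it can't be.

```python
import random
import statistics

def make_flights(n_groups=30, per_group=8, seed=0):
    """synthetic stand-in for flight data (hypothetical). each group (think
    aircraft + route) has one route-length feature and a hidden fuel offset,
    so flights within a group are near-duplicates of each other."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        distance = rng.uniform(0, 100)  # group-level feature
        offset = rng.gauss(0, 10)       # group-specific effect the feature can't explain
        for _ in range(per_group):
            x = distance + rng.gauss(0, 0.01)            # tiny within-group jitter
            y = 5 * distance + offset + rng.gauss(0, 1)  # toy "fuel burn"
            rows.append((g, x, y))
    return rows

def nn_mae(train, val):
    """1-nearest-neighbour regressor: predict each validation y from the
    training row with the closest feature value, return mean absolute error."""
    errors = []
    for _, x, y in val:
        _, _, y_hat = min(train, key=lambda r: abs(r[1] - x))
        errors.append(abs(y - y_hat))
    return statistics.mean(errors)

def random_split(rows, frac=0.2, seed=1):
    """shuffle rows and cut: siblings from one group land on both sides."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - frac))
    return shuffled[:cut], shuffled[cut:]

def group_split(rows, frac=0.2, seed=1):
    """hold out whole groups: no group appears in both train and validation."""
    rng = random.Random(seed)
    groups = sorted({g for g, _, _ in rows})
    rng.shuffle(groups)
    held_out = set(groups[: int(len(groups) * frac)])
    train = [r for r in rows if r[0] not in held_out]
    val = [r for r in rows if r[0] in held_out]
    return train, val

rows = make_flights()
print("random split MAE: ", round(nn_mae(*random_split(rows)), 2))
print("grouped split MAE:", round(nn_mae(*group_split(rows)), 2))
```

the exact numbers depend on the seeds; what matters is the gap. the random split rewards the model for finding a sibling, and the grouped split removes the sibling.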
the fix was to split by group and by time instead of at random. related flights stay together on one side of the split, and the validation set always covers a later period than the training set, the way it would in production, where you predict the future and never the past. the moment i did that, the score got worse. noticeably worse. that drop was the model telling me the truth for the first time.
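a minimal sketch of that kind of split, assuming each flight row carries a sortable time-block field (a hypothetical `week` here) and that the block is the unit that must stay together. it's an expanding-window, forward-chaining scheme: every fold trains on all earlier blocks and validates on the next one. the function and field names are illustrative, not the original pipeline. scikit-learn's `GroupKFold` and `TimeSeriesSplit` each enforce one of the two constraints; this sketch enforces both for the simple case where the group is the time block. holding out entire aircraft or routes as well would be a stricter variant.

```python
from collections import defaultdict

def grouped_time_splits(rows, block_key, min_train_blocks=1):
    """hypothetical forward-chaining splitter.

    rows: list of dicts. block_key: a sortable time-block field such as a
    week number. all rows in a block land on the same side of every split,
    and each validation block is strictly later than every training block.
    yields (train_indices, val_indices).
    """
    by_block = defaultdict(list)
    for i, row in enumerate(rows):
        by_block[row[block_key]].append(i)  # indices kept in input order
    blocks = sorted(by_block)
    for k in range(min_train_blocks, len(blocks)):
        train = [i for b in blocks[:k] for i in by_block[b]]  # all earlier blocks
        val = list(by_block[blocks[k]])                       # the next block only
        yield train, val

# toy rows (not real flights): only the week matters for the split
flights = [{"week": w} for w in [2, 1, 3, 1, 2]]
for train_idx, val_idx in grouped_time_splits(flights, "week"):
    print(train_idx, val_idx)
# prints:
# [1, 3] [0, 4]
# [1, 3, 0, 4] [2]
```

an expanding window mimics retraining on everything seen so far; a fixed-size sliding window is the usual alternative.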
this is the part worth saying out loud. the worse number was the better result. the leaky setup was flattering, and flattering is dangerous, because it ships. a model that looks brilliant on a broken split goes to production and quietly underperforms on every flight it has never seen, and by then it's a debugging problem instead of a design one.
the tradeoff is that grouped, chronological validation is pessimistic. it throws away easy correlations you could have leaned on, and it makes your numbers worse and your write-up less impressive. that's the point. a worse number you trust is worth more than a better one you don't, and the only number worth optimising is the one that survives contact with data the model has never met.
i kept the boring split. the model got less impressive and more real. that's a trade i'll take every time.