A Grammar of Machine Learning Workflows: Rejecting Data Leakage at Call Time

arXiv CS, Tuesday 02 June 2026, 04:00 UTC. By Simon Roth.

Key Points

arXiv:2603.10742v4 (announce type: replace)

Abstract: Data leakage has been identified in 648 published papers across 30 scientific fields. The knowledge to prevent it has existed for over a decade; the problem persists because the tools do not enforce what the textbooks teach. This paper presents a grammar (eight typed primitives connected by a directed acyclic graph with four hard constraints) that makes the most damaging leakage types structurally unrepresentable within the grammar's scope. The core mechanism is a terminal assessment gate: the first call-time-enforced evaluate/assess boundary documented in the peer-reviewed ML methodology literature (to my knowledge, as of May 2026), backed by a specification precise enough for independent reimplementation. A companion landscape study across 2,047 datasets grounds the constraints in measured effect sizes. Two reference implementations (Python, R) are available.
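The abstract does not spell out how the terminal assessment gate works, so the following is only a minimal sketch of what a call-time-enforced evaluate/assess boundary could look like. It assumes that "evaluate" means repeatable scoring on validation data and "assess" means a single, final scoring on held-out data. The names `LeakageError`, `HeldOutSet`, `evaluate`, and `assess` are hypothetical and are not taken from the paper's reference implementations.

```python
class LeakageError(RuntimeError):
    """Hypothetical error: raised when a call would cross the gate twice."""


class HeldOutSet:
    """Hypothetical wrapper for test data that may be assessed exactly once."""

    def __init__(self, features, labels):
        self._features = list(features)
        self._labels = list(labels)
        self._assessed = False

    def assess(self, predict):
        # The check runs when the call is made, so a second assessment
        # fails immediately instead of depending on reviewer discipline.
        if self._assessed:
            raise LeakageError("held-out set has already been assessed")
        self._assessed = True
        hits = sum(predict(x) == y for x, y in zip(self._features, self._labels))
        return hits / len(self._labels)


def evaluate(predict, features, labels):
    """Repeatable accuracy on validation data; no gate applies (assumption)."""
    hits = sum(predict(x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Usage: a toy parity "model" scored once on a toy held-out set.
holdout = HeldOutSet([0, 1, 2, 3], [0, 1, 0, 1])
final_score = holdout.assess(lambda x: x % 2)  # first call is permitted
try:
    holdout.assess(lambda x: x % 2)  # second call is rejected at call time
except LeakageError as err:
    print("rejected:", err)
```

The point of the sketch is the design choice the abstract describes: the rule is enforced by the tool at the moment of the call, not left to convention. The paper's actual grammar, with its eight typed primitives and four hard constraints, is broader than this single check.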

Originally published by arXiv CS.
