AUC

Related Articles from SNS

AUCp: Pseudo-AUC for Inference Model Selection with Unlabeled Validation Data in Abnormality Detection

arXiv:2606.08742v1 Announce Type: new Abstract: Abnormality detection is a crucial yet challenging task in medical image analysis. Distinguishing abnormalities from normal data by learning to reconstruct normal-only data alleviates the reliance on labeled datasets. However, many studies, even if unsupervised, rely on a labeled validation set to select the best model for inference from multiple training iterations.

arXiv CS 1d ago

Cheap Reward Hacking Detection

arXiv:2606.08893v1 Announce Type: new Abstract: A small transformer encoder is trained to map Terminal-Wrench trajectories onto a unit sphere where embedding distance approximates the $L_1$ distance between reward and metadata signals. A linear probe on top of that embedding detects reward hacking on the cleaned test split with AUC $0.9467$ and TPR@5%FPR $0.8296$, matching the TW sanitized LLM-as-judge AUC ($0.9510$ on the cleaned split) and exceeding its TPR@5%FPR ($0.7130$ vs $0.8296$) on...

arXiv CS 1d ago
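The two metrics quoted in the abstract above, AUC and TPR@5%FPR, can be computed from binary labels and detector scores. The following is a minimal sketch in plain Python, not the paper's evaluation code: the function names `roc_auc` and `tpr_at_fpr` and the toy data are hypothetical, and the TPR is taken at the best achievable threshold without interpolating between ROC points.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(labels, scores, max_fpr=0.05):
    """Highest TPR of any rule `score >= t` whose FPR does not exceed max_fpr."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0  # a threshold above every score gives TPR = 0 at FPR = 0
    for t in sorted(set(scores)):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= t for p in pos) / len(pos)
            best = max(best, tpr)
    return best


# Hypothetical toy data: 1 = positive class, 0 = negative class.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_tpr = tpr_at_fpr(toy_labels, toy_scores, max_fpr=0.05)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation would scale better on real test splits.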

Which Leakage Types Matter? A Quantitative Landscape Across 2,047 Benchmark Datasets

arXiv:2604.04199v2 Announce Type: replace Abstract: Twenty-eight within-subject counterfactual experiments across 2,047 iid tabular datasets, plus a boundary experiment on 129 temporal datasets, measure the severity of four data leakage classes in machine learning. Class I (estimation: fitting scalers on full data) is negligible: all nine conditions produce $|{\Delta}AUC| \leq 0.005$. Class II (selection: peeking, seed cherry-picking) is substantial: the measured effect is consistent with...

arXiv CS 8d ago
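The Class I condition named in the abstract above (fitting scalers on the full data) can be illustrated by comparing scaler statistics estimated with and without the test rows. This is a minimal sketch with hypothetical helper names and toy values; it does not reproduce the paper's experimental protocol and makes no claim about the size of the resulting effect.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Return (mean, std) for standardization; std falls back to 1.0 if zero."""
    mu = mean(values)
    sigma = pstdev(values)
    return mu, (sigma if sigma > 0 else 1.0)


def transform(values, params):
    """Standardize values with previously fitted (mean, std)."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature column, split into train and test rows.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky protocol: statistics are estimated on train and test together.
leaky_params = fit_scaler(train + test)

# Clean protocol: statistics are estimated on the training rows only.
clean_params = fit_scaler(train)

test_leaky = transform(test, leaky_params)
test_clean = transform(test, clean_params)
```

Training one model per protocol and differencing the two test AUCs would give a per-dataset estimate of the kind of AUC difference the abstract reports.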

SIRT7 regulates dosage compensation and safeguards the female X chromosome

Abstract: Sirtuins are deacetylases implicated in stress responses and longevity in mammals [1,2]. Although their differential impact on disease for the two sexes has been noted [3,4,5,6,7], the underlying reasons are unclear. Here, using Sirt7 as a model in mice, we examine the mechanisms leading to sex differences and find that Sirt7−/− female mice have decreased fitness throughout their lifespan.

Nature 18h ago

Early Prediction of Liver Cirrhosis Up to Two Years in Advance: A Machine Learning Study Benchmarking Against the FIB-4 and APRI Scores

Announce Type: replace Abstract: Objective: Develop and evaluate machine learning (ML) models for predicting incident liver cirrhosis (LC) one and two years prior to diagnosis using routinely collected electronic health record (EHR) data and benchmark their performance against the FIB-4 and APRI clinical scores. Methods: We conducted a retrospective cohort study using de-identified EHR data from a large academic health system. XGBoost models were developed for 1- and 2-year prediction...

arXiv CS 8d ago

Quantizing Intent: Cross-Domain Semantic IDs from Organic Activity for Industrial Ranking

arXiv:2606.01396v1 Announce Type: new Abstract: Ads click-through rate (CTR) prediction is constrained by sparse user supervision: most users engage with ads infrequently while generating dense behavioral evidence in organic surfaces such as feed. Transferring these cross-domain signals into ads ranking is difficult due to domain mismatch, serving cost, and production complexity. We introduce cross-domain user Semantic IDs (SIDs) derived from organic feed activity and show that behavioral...

arXiv CS 8d ago

Structural Grid Descriptors Predict Within-Task Solver Success on ARC-AGI

arXiv:2606.09026v1 Announce Type: new Abstract: We ask whether structural properties of intermediate grid states predict whether a symbolic ARC-AGI solver will succeed, framed as a test of conditional mutual information I(X;Y|task) > 0. Across 44,800 runs spanning two architecturally distinct solvers (beam search and Stochastic DFS), 400 ARC tasks, 28 configurations per solver, and both training and evaluation splits, hand-crafted grid descriptors measured at 50% trajectory completion...

arXiv CS 1d ago

Subtle Injection for Ground-truth Inference of LLM Training Data

Announce Type: new Abstract: As large language models (LLMs) are increasingly trained on scraped web corpora without authorisation, content owners require forensic methods to prove that their documents were included in a model's training set. We propose SIGIL (Subtle Injection for Ground-truth Inference of LLM training data), a framework that embeds imperceptible canary sequences into protected text and code such that any LLM...

arXiv CS 2d ago

A Pathology Foundation Model for Gastric Cancer with Real-World Validation

arXiv:2606.04792v1 Announce Type: new Abstract: Gastric cancer remains a major cause of cancer mortality, yet its histological and molecular heterogeneity complicates diagnosis and risk stratification. General-purpose pathology foundation models (PFMs) often plateau on fine-grained endpoints central to gastric cancer care, and few have undergone rigorous prospective validation or clinical reader studies. We present GRACE, a Gastric-specific foundation model for Real-world Assessment and...

arXiv CS 6d ago

DeRes: Decoupling Residual Stability and Adaptivity for Scalable CTR Prediction

arXiv:2606.07980v1 Announce Type: new Abstract: Transformer-based CTR models face a growing bottleneck at the residual connection: under Pre-Norm, early user-interest signals are diluted layer by layer; the identity skip cannot forget stale interests; and each layer sees only its immediate predecessor, losing long-range cross-layer dependencies. Recent attention-based residual variants (AttnRes) address parts of this in language models, but drop the protective identity skip and have not been...

arXiv CS 1d ago