Sample Assignment
Related Articles from SNS
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
Announce Type: replace Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) typically samples multiple responses per prompt and assigns binary rewards based on individual correctness, yet the collective structure of the group output, specifically the distribution of errors, is largely discarded. We identify this as a missed opportunity: empirical analysis reveals that error diversity within a group is a strong predictor of training success, with problems eliciting diverse wrong...
Active Flow Expansion for Out-of-Distribution Discovery: from Theory to Molecules
arXiv:2606.08802v1 Announce Type: new Abstract: Standard flow and diffusion pre-training matches the distribution of available data (e.g., molecules), which often covers only a small fraction of the valid design space. In generative discovery, however, one aims to sample valid new-to-nature designs, assigned negligible probability under, and thus inaccessible to, standard models fitted to the observed data.
Class-Dependent Hybrid Data Augmentation for Multiclass Migraine Classification under Severe Class Imbalance
Announce Type: replace Abstract: We conducted a reproducibility-oriented re-evaluation of prior migraine classification studies, correcting for data leakage and metric bias. We then introduced (i) a clinically motivated aggregation of two hemiplegic subtypes following ICHD-3 §1.2.3, (ii) a class-dependent hybrid augmentation strategy that assigns generation methods based on per-class sample size, and (iii) the concept of fidelity asymmetry, motivating proportionally constrained growth as...
Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization
arXiv:2605.29198v2 Announce Type: replace Abstract: Group-advantage-based reinforcement learning methods, such as GRPO and DAPO, have demonstrated strong performance across diverse domains, including mathematical reasoning and text-to-image generation. However, their reliance on sample-level rewards introduces a key limitation as uniform credit assignment across all tokens fails to capture fine-grained, token-level contributions. To address this issue, we propose Guidance Contrastive Policy...
Semantic-decoupled Spatial Partition Guided Point-supervised Oriented Object Detection
arXiv:2506.10601v2 Announce Type: replace Abstract: Given its ability to reduce annotation costs, weakly supervised learning based on single-point annotations has emerged as a research focus in oriented object detection. Compared with the classical teacher-student paradigm, the simple model paradigm (e.g., PointOBB-v2) can substantially further reduce resources required for training while ensuring strong performance.
High-Precision APT Malware Attribution with Out-of-Scope Resilience
arXiv:2606.03523v1 Announce Type: new Abstract: Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups.
Improving Diffusion Planners by Self-Supervised Action Gating with Energies
arXiv:2603.02650v2 Announce Type: replace Abstract: Diffusion planners are a strong approach for offline reinforcement learning, but they can fail when value-guided selection favours trajectories that score well yet are locally inconsistent with the environment dynamics, resulting in brittle execution. We propose Self-supervised Action Gating with Energies (SAGE), an inference-time re-ranking method that penalises dynamically inconsistent plans using a latent consistency signal. SAGE trains...
A Reproducible UAV-Assisted VANET Dataset Generator for Fragmentation Risk Analysis in Intelligent Transportation Systems
arXiv:2606.01488v1 Announce Type: new Abstract: Vehicular Ad Hoc Networks (VANETs) are a key component of Intelligent Transportation Systems, enabling cooperative communication among vehicles and between vehicles and roadside infrastructure. However, their highly dynamic topology makes them vulnerable to network fragmentation, particularly in highway scenarios, low-density traffic conditions, localized accident zones, and communication-stressed environments. Although Unmanned Aerial Vehicles...
Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition
Announce Type: new Abstract: Large language models for code generation often need to use APIs that are absent from their pretraining data. This requires more than recalling a function name: models must coordinate signatures, module paths, input-output contracts, semantics, and executable usage patterns. Existing novel-API benchmarks are typically static, rely on coarse pass/fail metrics, or use synthetic APIs that may not reflect real library evolution.
MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop
arXiv:2601.22900v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning across domains, but outcome-only scalar rewards are often sparse and uninformative. This limitation is especially severe for failed samples, where scalar rewards indicate only that a solution is incorrect without explaining why the reasoning breaks down. In this paper, we leverage richer verbal feedback to guide RLVR on failed samples and convert...