
Sample Assignment

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning

Replacement submission. Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) typically samples multiple responses per prompt and assigns binary rewards based on individual correctness, yet the collective structure of the group output, specifically the distribution of errors, is largely discarded. We identify this as a missed opportunity: empirical analysis reveals that error diversity within a group is a strong predictor of training success, with problems eliciting diverse wrong...

arXiv CS 2d ago
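A minimal Python sketch of the setup this abstract describes: a group of responses to one prompt receives binary correctness rewards, which GRPO-style methods then normalize within the group. The abstract is truncated and does not say how error diversity is measured, so `error_entropy` (Shannon entropy over the distinct wrong final answers) is a hypothetical stand-in, not the authors' metric. The function names and the exact-string answer matching are likewise assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: (r_i - mean(r)) / (std(r) + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

def error_entropy(answers: List[str], correct: str) -> float:
    """Hypothetical error-diversity score (NOT the paper's metric):
    Shannon entropy, in nats, of the distinct wrong answers in the group.
    Returns 0.0 when there are no wrong answers or all wrong answers agree."""
    wrong = [a for a in answers if a != correct]
    if not wrong:
        return 0.0
    n = len(wrong)
    # sum of p * log(1/p) over the empirical distribution of wrong answers
    return sum((c / n) * math.log(n / c) for c in Counter(wrong).values())

if __name__ == "__main__":
    answers = ["42", "41", "42", "17", "40", "42", "39", "41"]  # 8 rollouts for one prompt
    rewards = [1.0 if a == "42" else 0.0 for a in answers]      # binary correctness reward
    print([round(a, 3) for a in group_advantages(rewards)])
    print(round(error_entropy(answers, "42"), 3))  # 1.332
```

With this normalization, a group in which every response receives the same reward (all correct or all wrong) yields zero advantage for every member, so the advantage term gives such groups no policy-gradient signal.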

Active Flow Expansion for Out-of-Distribution Discovery: from Theory to Molecules

arXiv:2606.08802v1, new submission. Abstract: Standard flow and diffusion pre-training matches the distribution of available data (e.g., molecules), which often covers only a small fraction of the valid design space. In generative discovery, however, one aims to sample valid new-to-nature designs, assigned negligible probability under, and thus inaccessible to, standard models fitted to the observed data.

arXiv CS 1d ago

Class-Dependent Hybrid Data Augmentation for Multiclass Migraine Classification under Severe Class Imbalance

Replacement submission. Abstract: We conducted a reproducibility-oriented re-evaluation of prior migraine classification studies, correcting for data leakage and metric bias. We then introduced (i) a clinically motivated aggregation of two hemiplegic subtypes following ICHD-3 §1.2.3, (ii) a class-dependent hybrid augmentation strategy that assigns generation methods based on per-class sample size, and (iii) the concept of fidelity asymmetry, motivating proportionally constrained growth as...

arXiv CS 5d ago

Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization

arXiv:2605.29198v2, replacement submission. Abstract: Group-advantage-based reinforcement learning methods, such as GRPO and DAPO, have demonstrated strong performance across diverse domains, including mathematical reasoning and text-to-image generation. However, their reliance on sample-level rewards introduces a key limitation as uniform credit assignment across all tokens fails to capture fine-grained, token-level contributions. To address this issue, we propose Guidance Contrastive Policy...

arXiv CS 9d ago
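The limitation this abstract names, uniform credit assignment, fits in a few lines of Python: in GRPO-style objectives each response's sample-level advantage is copied to every one of its tokens. The `reweight_tokens` helper is a hypothetical illustration of what token-level credit could look like (arbitrary per-token weights, normalized so each response keeps its total credit); it is not the method the paper proposes, whose details are cut off in the abstract.

```python
from typing import List

def broadcast_advantages(advantages: List[float], lengths: List[int]) -> List[List[float]]:
    """Sample-level credit assignment: every token of response i gets A_i."""
    return [[a] * n for a, n in zip(advantages, lengths)]

def reweight_tokens(token_advs: List[List[float]],
                    token_weights: List[List[float]]) -> List[List[float]]:
    """Hypothetical token-level variant (illustration only, not the paper's method):
    scale each token's advantage by a weight, normalizing the weights to mean 1
    within each response so the response's summed credit is unchanged.
    Assumes each response's weights have a positive mean."""
    out = []
    for advs, ws in zip(token_advs, token_weights):
        m = sum(ws) / len(ws)  # mean weight for this response
        out.append([a * w / m for a, w in zip(advs, ws)])
    return out

if __name__ == "__main__":
    tok = broadcast_advantages([1.0, -0.5], lengths=[3, 2])
    print(tok)  # [[1.0, 1.0, 1.0], [-0.5, -0.5]]
    print(reweight_tokens(tok, [[0.0, 1.0, 2.0], [1.0, 1.0]]))  # [[0.0, 1.0, 2.0], [-0.5, -0.5]]
```

In the broadcast version, a token that decided the outcome and a filler token receive identical credit; any token-level scheme has to supply the per-token weights from some additional signal.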

Semantic-decoupled Spatial Partition Guided Point-supervised Oriented Object Detection

arXiv:2506.10601v2, replacement submission. Abstract: Given its ability to reduce annotation costs, weakly supervised learning based on single-point annotations has emerged as a research focus in oriented object detection. Compared with the classical teacher-student paradigm, the simple model paradigm (e.g., PointOBB-v2) can substantially further reduce resources required for training while ensuring strong performance.

arXiv CS 5d ago

High-Precision APT Malware Attribution with Out-of-Scope Resilience

arXiv:2606.03523v1, new submission. Abstract: Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups.

arXiv CS 7d ago

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

arXiv:2603.02650v2, replacement submission. Abstract: Diffusion planners are a strong approach for offline reinforcement learning, but they can fail when value-guided selection favours trajectories that score well yet are locally inconsistent with the environment dynamics, resulting in brittle execution. We propose Self-supervised Action Gating with Energies (SAGE), an inference-time re-ranking method that penalises dynamically inconsistent plans using a latent consistency signal. SAGE trains...

arXiv CS 8d ago
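Inference-time re-ranking of the kind this abstract describes can be sketched generically: score each candidate plan by its value minus a penalty for dynamic inconsistency, then keep the best. This is an assumption-laden toy, not SAGE itself: the abstract's latent consistency signal is replaced by a hypothetical energy (the squared one-step error under an assumed known dynamics model s' = s + a), and `lam` is an illustrative penalty weight.

```python
from typing import Callable, List, Sequence, Tuple

# A plan is a sequence of (state, action, predicted_next_state) steps.
Step = Tuple[float, float, float]
Plan = Sequence[Step]

def rerank(plans: List[Plan],
           value_fn: Callable[[Plan], float],
           energy_fn: Callable[[Plan], float],
           lam: float = 1.0) -> List[Plan]:
    """Sort candidate plans best-first by value_fn(p) - lam * energy_fn(p)."""
    return sorted(plans, key=lambda p: value_fn(p) - lam * energy_fn(p), reverse=True)

def toy_value(plan: Plan) -> float:
    # Hypothetical value: the final predicted state (higher is better).
    return plan[-1][2]

def toy_energy(plan: Plan) -> float:
    # Hypothetical inconsistency energy: squared error against an assumed
    # known dynamics model s' = s + a, summed over all steps.
    return sum((s + a - s_next) ** 2 for s, a, s_next in plan)

if __name__ == "__main__":
    jumpy = [(0.0, 1.0, 3.0)]                    # value 3.0, energy 4.0
    smooth = [(0.0, 1.0, 1.0), (1.0, 1.0, 2.0)]  # value 2.0, energy 0.0
    print(rerank([jumpy, smooth], toy_value, toy_energy, lam=0.0)[0] is jumpy)   # True
    print(rerank([jumpy, smooth], toy_value, toy_energy, lam=1.0)[0] is smooth)  # True
```

With `lam=0.0` the ranking is pure value-guided selection and the dynamically inconsistent plan wins; a positive penalty weight flips the choice toward the plan that agrees with the dynamics.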

A Reproducible UAV-Assisted VANET Dataset Generator for Fragmentation Risk Analysis in Intelligent Transportation Systems

arXiv:2606.01488v1, new submission. Abstract: Vehicular Ad Hoc Networks (VANETs) are a key component of Intelligent Transportation Systems, enabling cooperative communication among vehicles and between vehicles and roadside infrastructure. However, their highly dynamic topology makes them vulnerable to network fragmentation, particularly in highway scenarios, low-density traffic conditions, localized accident zones, and communication-stressed environments. Although Unmanned Aerial Vehicles...

arXiv CS 8d ago

Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition

New submission. Abstract: Large language models for code generation often need to use APIs that are absent from their pretraining data. This requires more than recalling a function name: models must coordinate signatures, module paths, input-output contracts, semantics, and executable usage patterns. Existing novel-API benchmarks are typically static, rely on coarse pass/fail metrics, or use synthetic APIs that may not reflect real library evolution.

arXiv CS 7d ago

MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop

arXiv:2601.22900v2, replacement submission. Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning across domains, but outcome-only scalar rewards are often sparse and uninformative. This limitation is especially severe for failed samples, where scalar rewards indicate only that a solution is incorrect without explaining why the reasoning breaks down. In this paper, we leverage richer verbal feedback to guide RLVR on failed samples and convert...

arXiv CS 8d ago