
CoreSet


Related Articles from SNS

Coreset-Induced Conditional Velocity Flow Matching

Abstract: We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a...

arXiv CS 5d ago

Data-efficient flood depth prediction through domain-aware coreset selection and tabular foundation models

Abstract: Near-real-time flood depth prediction demands surrogate models that are accurate, fast, and transferable across watersheds. Supervised surrogates can match physics-based simulators in accuracy but need millions of training rows per watershed and cannot extrapolate beyond their original mesh. We propose a domain-aware coreset construction pipeline that conditions a tabular foundation model at inference time.

arXiv CS 5d ago

WildCat: Near-Linear Attention in Theory and Practice

arXiv:2602.10056v2 Abstract: We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast...

arXiv CS 8d ago
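
The WildCat abstract above describes attending over a small weighted coreset rather than all $n$ keys. The sketch below only illustrates that general idea and is not WildCat's method: the abstract is truncated before it names the selection procedure, so uniform random subsampling is swapped in as a placeholder. The function names, the uniform weights n/m, and the NumPy setting are all assumptions made for illustration.

```python
import numpy as np


def softmax_attention(Q, K, V, log_w=None):
    """Softmax attention; optional per-key log-weights are added to the logits."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    if log_w is not None:
        # Adding log(w_j) to a logit multiplies exp(logit_j) by the weight w_j.
        logits = logits + log_w
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V


def uniform_coreset_attention(Q, K, V, m, rng):
    """Attend over m uniformly sampled keys (placeholder selection, an assumption)."""
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    # Hypothetical weights: each kept key stands in for n/m original keys.
    log_w = np.full(m, np.log(n / m))
    return softmax_attention(Q, K[idx], V[idx], log_w)


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 3))

full = softmax_attention(Q, K, V)                    # scores all 16 keys per query
approx = uniform_coreset_attention(Q, K, V, 5, rng)  # scores only 5 keys per query
```

With m coreset keys, each query scores m keys instead of n, which is where the savings would come from. Constant weights like the ones used here cancel in the softmax normalization, so weighting only changes the output when a selection scheme assigns different weights to different keys.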

Towards Tight Bounds for Streaming Attention

Abstract: The attention mechanism is a cornerstone of modern transformer architectures. However, its expressive power comes at the cost of quadratic runtime and linear space usage. In particular, the classical transformer architecture explicitly stores all previously seen input elements (tokens) in order to generate the next one.

arXiv CS 2d ago

FlashbackCL: Mitigating Temporal Forgetting in Federated Learning

arXiv:2606.03939v1 Abstract: Federated Learning (FL) of foundation and edge models increasingly targets deployments where client data distributions drift over time, yet existing forgetting-mitigation methods assume each client's distribution is stationary. Flashback, the strongest recent FL method against cross-client (spatial) forgetting, uses monotonically accumulating per-class label counts as a knowledge proxy; this proxy becomes miscalibrated under temporal...

arXiv CS 7d ago

ALINC: Active Learning for Inductive Node Classification via Graph Sampling

Abstract: Active learning (AL) for node classification typically focuses on selecting the most informative nodes for annotation within one or a few large graphs (e.g., in social network analysis). However, in other domains, such as molecular chemistry or electronic design automation, datasets consist of thousands of independent graphs. In many of these inductive settings, annotating an individual node requires a full-graph analysis, which effectively yields the remaining...

arXiv CS 6d ago

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Abstract: AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings.

arXiv CS 5d ago

Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing

Abstract: Real-world datasets often contain spurious correlations that are not causally related to the target label. When such correlations dominate the majority of training samples, models tend to rely on them, leading to misclassification of minority samples that do not exhibit the same spurious patterns. While a potential approach is to select subsets of data to better represent the minority samples, this may require access to group labels, which are typically unknown.

arXiv CS 7d ago

FOSTER: First-order Dataset Distillation for Text-based Sequential Recommendation

arXiv:2605.30772v1 Abstract: Text-based sequential recommender systems, while greatly improving recommendation accuracy by incorporating item contexts, are undeniably more expensive to train. By condensing a large dataset into a compact set of synthetic samples for model training, dataset distillation offers a promising solution. However, its adoption in text-based sequential recommendation is non-trivial given the large pool of discrete items.

arXiv CS 9d ago

XSSR: Cross-Domain Self-Supervised Representative Selection for Efficient Annotation in Medical Image Segmentation

arXiv:2606.04301v1 Abstract: Acquiring labeled medical image data is resource-intensive and a challenge further exacerbated in cross-domain scenarios where source and target datasets differ in imaging equipment, population, or clinical site. This study introduces XSSR (Cross-Domain Self-Supervised Representative Selection), a framework designed to minimize annotation effort in the target domain while maintaining robust segmentation performance. XSSR comprises three stages:...

arXiv CS 6d ago