Continual Learning Bench

Related Articles from SNS

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

arXiv:2606.05661v1 Announce Type: new Abstract: Continual learning, the ability of AI systems to improve through sequential experience, has attracted substantial interest, but no high-quality benchmark exists to evaluate it. We introduce Continual Learning Bench (CL-Bench), the first difficult, expert-validated benchmark designed to measure whether LLM-based systems genuinely improve with experience.

arXiv CS 5d ago

Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning

arXiv:2507.12612v3 Announce Type: replace Abstract: Supervised fine-tuning performance for large language models depends strongly on how training budget is distributed across a heterogeneous set of tasks. In practice, mixtures are often fixed using simple heuristics (e.g., uniform or size-proportional sampling) that ignore task interactions, which can hurt transfer and waste budget on redundant sources. We introduce TaskPGM, a framework for learning continuous task mixtures via an...

arXiv CS 5d ago

Learning Task Mixtures from Task Affinities: A Probabilistic Graphical Model for Supervised Fine-Tuning

arXiv:2507.12612v4 Announce Type: replace Abstract: Supervised fine-tuning performance for large language models depends strongly on how training budget is distributed across a heterogeneous set of tasks. In practice, mixtures are often fixed using simple heuristics (e.g., uniform or size-proportional sampling) that ignore task interactions, which can hurt transfer and waste budget on redundant sources. We introduce TaskPGM, a framework for learning continuous task mixtures via an...

arXiv CS 1d ago

MemoGen: Can Past Experience Improve Future Text-to-Image Generation?

arXiv:2606.03243v1 Announce Type: new Abstract: Modern text-to-image models have achieved strong visual synthesis, yet remain unreliable when prompts require implicit visual constraints, relational reasoning, or external knowledge. Existing retrieval-augmented and agentic generation methods mitigate this issue by acquiring external knowledge, references, or refined prompts for the current request, yet they typically treat each generation as an isolated episode and do not systematically...

arXiv CS 7d ago

CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing Applications

arXiv:2512.15231v3 Announce Type: replace Abstract: The automated and intelligent processing of massive remote sensing (RS) datasets is critical in Earth observation (EO). Existing automated systems are normally task-specific, lacking a unified framework to manage diverse, end-to-end workflows--from data preprocessing to advanced interpretation--across diverse RS applications. To address this gap, this paper introduces CangLing-KnowFlow, a unified intelligent agent framework that integrates...

arXiv CS 5d ago

Allahabad HC takes 4 decades to decide on murder conviction appeal

The Supreme Court was extremely disappointed on Monday to find that a man, arrested at age 28 in November 1983 for shooting a person dead, waited more than four decades in the Allahabad high court for a decision on his appeal against his conviction and the life term awarded by a trial court. A visibly disturbed bench of Justices Prashant Kumar Mishra and AS Chandurkar wondered what innovative measures could be taken to free the clogged wheels of justice in the Allahabad HC, where...

Times of India 1d ago

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

arXiv:2606.01993v1 Announce Type: new Abstract: Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by agents. To bridge the gap between human-oriented guides and agent-executable skills, we formalize this problem as guide-to-skill learning: converting in-the-wild guides...

arXiv CS 8d ago

EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale

arXiv:2604.17406v3 Announce Type: replace Abstract: The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered specifically for Agentic Science at Scale.

arXiv CS 1d ago

Ishigaki-IDS: An Open-Weight Verifier-Aware Model for Information Delivery Specification Drafting in Building Information Modeling

Announce Type: new Abstract: Building Information Modeling (BIM) projects require information requirements to be described as machine-checkable Information Delivery Specification (IDS) files in order to verify whether building models contain the required attributes. However, IDS authoring remains a practical bottleneck: practitioners must handle domain vocabulary, strict XML schema constraints, and external validator conformance while also checking whether the requirement itself is correctly...

arXiv CS 1d ago

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing

arXiv:2606.05950v1 Announce Type: new Abstract: Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foundation models. However, most existing methods remain confined to single-turn settings, overlooking the more realistic scenario of multi-turn in-context editing, where users iteratively refine an image through a sequence of instructions.

arXiv CS 5d ago