Long

Related Articles from SNS

100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?

arXiv:2505.19293v2 Announce Type: replace Abstract: Long-context capability is considered one of the most important abilities of LLMs, as a truly long context-capable LLM enables users to effortlessly process many originally exhausting tasks -- e.g., digesting a long-form document to find answers vs. directly asking an LLM about it. However, existing real-task-based long-context evaluation benchmarks have two major shortcomings. First, benchmarks like LongBench often do not provide proper...

arXiv CS 6d ago

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization

arXiv:2505.11166v3 Announce Type: replace Abstract: Despite advances in pretraining with extended context sizes, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by data quality issues, training inefficiencies, and the lack of well-designed optimization objectives. To address these limitations, we propose a framework named Short-to-Long...

arXiv CS 6d ago

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

arXiv:2606.01629v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly used for long-form generation, reliably evaluating long-form outputs has become a critical challenge. LLM-as-a-judge offers a scalable alternative to human evaluation, yet its reliability in long-form output evaluation remains underexamined: existing meta-evaluation benchmarks focus mainly on short-form outputs. Compared with short-form evaluation, long-form evaluation is not merely a matter of...

arXiv CS 8d ago

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

Announce Type: replace Abstract: Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. In this work, we hypothesize that current limitations in reasoning stem, in part, from insufficient long-context capacity, motivated by empirical observations such as (1) higher context window length often leads to stronger reasoning performance, and (2) failed reasoning cases resemble failed long-context cases. To test...

arXiv CS 6d ago

Orthogonal Learner for Estimating Heterogeneous Long-Term Treatment Effects

arXiv:2604.00915v2 Announce Type: replace Abstract: Estimation of heterogeneous long-term treatment effects (HLTEs) is relevant for personalized decision-making in marketing, economics, and medicine, where short-term observational datasets are often combined with long-term observational datasets. However, HLTE estimation is challenging due to limited overlap in treatment assignments or in long-term outcomes for certain subpopulations, which can lead to unstable HLTE estimates with large...

arXiv CS 6d ago

Trump says Iran has taken too long to negotiate, will 'pay the price'

Iran has "taken too long to negotiate a deal that would have been great for them, now they will have to pay the price", says US President Donald Trump. US President Donald Trump said on Wednesday (Jun 10) Iran had taken too long to negotiate a deal and would now "have to pay the price", while Tehran said it would reassess diplomatic engagement with Washington after tit-for-tat strikes overnight. Iran launched missile and...

Channel News Asia 7h ago

T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences

arXiv:2406.00636v2 Announce Type: replace Abstract: In this paper, we address the challenging problem of long-term 3D human motion generation. Specifically, we aim to generate a long sequence of smoothly connected actions from a stream of multiple sentences (i.e., a paragraph). Previous long-term motion generation approaches were mostly based on recurrent methods, using previously generated motion chunks as input for the next step.

arXiv CS 2d ago

VideoWeaver: Evaluating and Evolving Skills for Agentic Long Video Generation

arXiv:2606.08091v1 Announce Type: new Abstract: Recent agent frameworks such as Claude Code, Codex, and OpenClaw are strong at tool use and orchestration, but whether they can handle long video generation, a long-horizon multimodal task, remains underexplored. Unlike earlier video agents whose pipeline is handcrafted, these frameworks can build and refine their own workflows. We introduce VideoWeaver, an agent harness and benchmark that evaluates and evolves skills for long video generation,...

arXiv CS 1d ago

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

arXiv:2605.21028v2 Announce Type: replace Abstract: Autoregressive long video generation often adopts bounded-memory streaming for efficiency, typically combining local windows for short-term continuity with static early-frame sinks as long-range anchors. However, this fixed allocation keeps early frames cached even when the current visual state has substantially diverged from them, while discarding potentially more relevant intermediate history. As a result, the retained long-range context...

arXiv CS 1d ago