GPT-4o mini
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Don't Ask the LLM to Track Freshness: A Deterministic Recipe for Memory Conflict Resolution
arXiv:2606.01435v1 Abstract: LLM-based memory systems increasingly maintain facts that evolve over time, where a recurring failure is conflict resolution: when a fact has multiple contradictory values, which should the agent return? MemoryAgentBench (MAB; Hu et al., 2026) makes this explicit in its FactConsolidation task: facts are numbered, the counterfactual has the higher serial, and agents are told newer facts have larger serials. Yet every published system...
Using Large Language Models to Support High Volume Application Review for an Undergraduate Research Program
Abstract: Undergraduate research programs such as the Summer Undergraduate Research Fellowship (SURF) at Purdue University receive thousands of applications every year, requiring significant time and effort for program staff to evaluate each submission consistently and within tight timelines. This work-in-progress paper describes the development and initial deployment of a large language model (LLM)-based tool to assist in the evaluation of approximately 1,200 student...
Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals
arXiv:2606.06460v1 Abstract: As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable from any other client). We propose a third mode: a lightweight, published in-band deny signal, the Recuse Signal, that a server emits over a...
Annotation of Positive vs Negative User Interactions for Social Sign Prediction
Abstract: Inferring the sign of social relationships from online interactions is a fundamental challenge in social network analysis. Existing approaches typically rely on sentiment analysis to label individual interactions as positive or negative, then aggregate these labels to assign a sign to the relationship.
Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach
Abstract: Large language models (LLMs) are increasingly deployed as "agents" for decision-making (DM) in interactive and dynamic environments. Yet, since they were not originally designed for DM, recent studies show that LLMs can struggle even in basic online DM problems, failing to achieve low regret or an effective exploration-exploitation tradeoff. To address this, we introduce Iterative Regret-Minimization Fine-Tuning (Iterative RMFT), a post-training procedure...
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Abstract: Vision-language models (VLMs) could power real-time assistants and autonomous agents, but they face a critical challenge: understanding near-infinite video streams without escalating latency and memory usage. Processing entire videos with full attention leads to quadratic computational costs and poor performance on long videos. Meanwhile, simple sliding window methods are also flawed, as they either break coherence or suffer from high latency due to redundant...