Reflect & Refine

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization

arXiv:2603.18388v2 Announce Type: replace Abstract: Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed,...

arXiv CS 1d ago

AbstRAG: Learning to Abstract for Retrieval Problems

arXiv:2606.09459v1 Announce Type: new Abstract: Retrieval-augmented generation often fails when the query, the document evidence, and the user's intent are expressed at different levels of abstraction. A query may ask about a class, a relation, or an event, while the document only states specific instances, indirect framings, or scoped formulations.

arXiv CS 1d ago

‘A slap-up meal for €12’: my search for the perfect old-school Turin tavern

Piòle are the Italian city’s working-class neighbourhood taverns. Of the few that survive, many have gone upmarket – but I was looking for the real deal and affordable home cooking. Turin is one of Italy’s most serious food cities, shaped by the culinary legacy of the House of Savoy and, more recently, the slow food movement – a reputation reflected in its historic cafes and restaurants, where meals can feel refined. But that’s only part of the picture.

The Guardian UK 9d ago

Reasoning without Gold Standards: A Proxy-Judge Theory of Autoformalization

Announce Type: new Abstract: Complex reasoning tasks increasingly require systems to produce outputs whose correctness cannot be judged by exact match against a single reference. Autoformalization (AF) is a representative example; it asks a model to translate informal mathematical or logical reasoning into a formally checkable object, yet expert-validated formalizations do not scale beyond toy cases and a single informal argument can admit many valid formal renderings. Progress therefore...

arXiv CS 1d ago

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

Announce Type: replace Abstract: Effective e-commerce risk management requires in-depth case investigations to identify emerging fraud patterns in highly adversarial environments. However, manual investigation typically requires analyzing the associations and couplings among multi-source heterogeneous data, a labor-intensive process that limits efficiency. While Large Language Models (LLMs) show promise in automating these analyses, their deployment is hindered by the complexity of risk...

arXiv CS 8d ago

SLMJury: Can Small Language Models Judge as Well as Large Ones?

arXiv:2606.07810v1 Announce Type: new Abstract: Large language models (LLMs) are widely used as judges for evaluating model outputs, but their high cost, latency, and opacity limit scalability. We introduce SLMJury, a framework for evaluating small language models (SLMs) as judges across two paradigms: closed-ended binary correctness and open-ended quality scoring. We benchmark 16 SLM judges (0.6B-14B parameters) from four model families across ten benchmarks: eight closed-ended tasks...

arXiv CS 1d ago

Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners

arXiv:2605.14709v2 Announce Type: replace Abstract: Recent unified models integrate multimodal understanding and generation within a single framework. However, an "understanding-generation gap" persists, where models can capture user intent but often fail to translate this semantic knowledge into precise pixel-level manipulation. This gap results in two bottlenecks in anything-to-image task (X2I): the attention entanglement bottleneck, where blind planning struggles with complex prompts, and...

arXiv CS 8d ago

How Software Engineering Students Use LLMs to Write Research Papers: An Experience Report

arXiv:2606.05114v1 Announce Type: new Abstract: Large language models are increasingly becoming part of software engineering education, including activities involving empirical software engineering and evidence synthesis. This paper reports an educational experience involving the integration of reflective LLM use into an empirical methods assignment in a third-year software architecture course. Students were asked to develop a short research paper using either a rapid review or a gray...

arXiv CS 6d ago

GenAutoML: An Agentic Framework for Dynamic Architecture Generation and Optimization in Time-Series Analysis

arXiv:2606.05860v1 Announce Type: new Abstract: Designing neural architectures for time-series forecasting and anomaly detection remains a resource-intensive task that often requires substantial domain expertise. Traditional Automated Machine Learning (AutoML) systems typically rely on static, predefined search spaces, limiting their ability to adapt to diverse data characteristics. We present GenAutoML, an agentic framework that leverages Large Language Models (LLMs) as neural architects to...

arXiv CS 5d ago