Reflect & Refine
Related Articles from SNS
Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization
arXiv:2603.18388v2 Announce Type: replace Abstract: Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed,...
AbstRAG: Learning to Abstract for Retrieval Problems
arXiv:2606.09459v1 Announce Type: new Abstract: Retrieval-augmented generation often fails when the query, the document evidence, and the user's intent are expressed at different levels of abstraction. A query may ask about a class, a relation, or an event, while the document only states specific instances, indirect framings, or scoped formulations.
‘A slap-up meal for €12’: my search for the perfect old-school Turin tavern
Piòle are the Italian city’s working-class neighbourhood taverns. Of the few that survive, many have gone upmarket – but I was looking for the real deal and affordable home cooking. Turin is one of Italy’s most serious food cities, shaped by the culinary legacy of the House of Savoy and, more recently, the slow food movement – a reputation reflected in its historic cafes and restaurants, where meals can feel refined. But that’s only part of the picture.
Reasoning without Gold Standards: A Proxy-Judge Theory of Autoformalization
Announce Type: new Abstract: Complex reasoning tasks increasingly require systems to produce outputs whose correctness cannot be judged by exact match against a single reference. Autoformalization (AF) is a representative example; it asks a model to translate informal mathematical or logical reasoning into a formally checkable object, yet expert-validated formalizations do not scale beyond toy cases and a single informal argument can admit many valid formal renderings. Progress therefore...
SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management
Announce Type: replace Abstract: Effective e-commerce risk management requires in-depth case investigations to identify emerging fraud patterns in highly adversarial environments. However, manual investigation typically requires analyzing the associations and couplings among multi-source heterogeneous data, a labor-intensive process that limits efficiency. While Large Language Models (LLMs) show promise in automating these analyses, their deployment is hindered by the complexity of risk...
SLMJury: Can Small Language Models Judge as Well as Large Ones?
arXiv:2606.07810v1 Announce Type: new Abstract: Large language models (LLMs) are widely used as judges for evaluating model outputs, but their high cost, latency, and opacity limit scalability. We introduce SLMJury, a framework for evaluating small language models (SLMs) as judges across two paradigms: closed-ended binary correctness and open-ended quality scoring. We benchmark 16 SLM judges (0.6B-14B parameters) from four model families across ten benchmarks: eight closed-ended tasks...
Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners
arXiv:2605.14709v2 Announce Type: replace Abstract: Recent unified models integrate multimodal understanding and generation within a single framework. However, an "understanding-generation gap" persists, where models can capture user intent but often fail to translate this semantic knowledge into precise pixel-level manipulation. This gap results in two bottlenecks in the anything-to-image (X2I) task: the attention entanglement bottleneck, where blind planning struggles with complex prompts, and...
How Software Engineering Students Use LLMs to Write Research Papers: An Experience Report
arXiv:2606.05114v1 Announce Type: new Abstract: Large language models are increasingly becoming part of software engineering education, including activities involving empirical software engineering and evidence synthesis. This paper reports an educational experience involving the integration of reflective LLM use into an empirical methods assignment in a third-year software architecture course. Students were asked to develop a short research paper using either a rapid review or a gray...
How Software Engineering Students Use LLMs to Write Research Papers: An Experience Report
arXiv:2606.05114v2 Announce Type: replace Abstract: Large language models are increasingly becoming part of software engineering education, including activities involving empirical software engineering and evidence synthesis. This paper reports an educational experience involving the integration of reflective LLM use into an empirical methods assignment in a third-year software architecture course. Students were asked to develop a short research paper using either a rapid review or a gray...
GenAutoML: An Agentic Framework for Dynamic Architecture Generation and Optimization in Time-Series Analysis
arXiv:2606.05860v1 Announce Type: new Abstract: Designing neural architectures for time-series forecasting and anomaly detection remains a resource-intensive task that often requires substantial domain expertise. Traditional Automated Machine Learning (AutoML) systems typically rely on static, predefined search spaces, limiting their ability to adapt to diverse data characteristics. We present GenAutoML, an agentic framework that leverages Large Language Models (LLMs) as neural architects to...