SatIR: Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching

arXiv CS Tuesday 09 June 2026, 04:00 UTC By Cyrus Zhou, Yufei Jin, Yilin Xu, Yu-Chiang Wang, Chieh-Ju Chao, Monica S. Lam 1 min read

Key Points

Announce Type: replace Abstract: Many important retrieval problems are not merely problems of semantic similarity, but problems of constraint satisfaction: a retrieved item should be topically relevant to a query and satisfy explicit requirements involving negation, temporal conditions, numeric thresholds, exceptions, ontological relations, and incomplete evidence. We study this challenge in clinical trial matching, a high-stakes test bed where a useful trial must both address a patient's...

arXiv:2604.08849v2 Announce Type: replace Abstract: Many important retrieval problems are not merely problems of semantic similarity, but problems of constraint satisfaction: a retrieved item should be topically relevant to a query and satisfy explicit requirements involving negation, temporal conditions, numeric thresholds, exceptions, ontological relations, and incomplete evidence. We study this challenge in clinical trial matching, a high-stakes test bed where a useful trial must both address a patient's medical needs and satisfy complex eligibility criteria. We propose SatIR, a scalable constraint-based retrieval method for clinical trial matching. SatIR converts trial eligibility criteria and summaries into formal constraints, then retrieves patient--trial pairs by executing these constraints over a database. The system combines Satisfiability Modulo Theories (SMT), relational algebra, medical ontology grounding, and large language models (LLMs): formal methods provide executable and inspectable matching, while LLMs convert ambiguous, incomplete, and implicit clinical information into explicit, controllable constraint representations. Across the SIGIR 2016 patient--trial collection and TREC-2022-RetrievalSubset, a benchmark derived from TREC 2022, SATIR consistently improves eligibility-aware retrieval over similarity-based baselines. Relative to TrialGPT-style retrieval, SATIR retrieves 32%--72% more relevant-and-eligible trials per patient on SIGIR 2016 and achieves $1.8$--$3.2\times$ higher eligible-trial recall on TREC-2022-RetrievalSubset. Retrieval is fast, requiring only 146 milliseconds per patient over 3,621 SIGIR trials.

SatIR (PERSON) Satisfiability Modulo Theories (ORG) SMT (ORG) TREC 2022 (ORG) TREC-2022-RetrievalSubset. (ORG)

Originally published by arXiv CS Read original →

SatIR: Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching

Related Stories

Colts' Pierce could be out well into training camp...

First ever guidelines released to screen for condition that impacts 9 in 10 Americans

Patients in A&E with non-urgent ailments may be told to come back later under NHS plans

Family of Belfast knife horror issue second appeal for calm amid fears he could lose other eye