Home › Business & Finance › Visual Template Inference for Data Extraction from Documents

Business & Finance

Visual Template Inference for Data Extraction from Documents

arXiv CS Tuesday 09 June 2026, 04:00 UTC By Yiming Lin, Mawil Hasan, Rohan Kosalge, Alvin Cheung, Aditya G. Parameswaran 1 min read

Key Points

arXiv:2501.06659v2 Announce Type: replace Abstract: Many templatized documents are programmatically generated from structured data following a visual template. Such documents include invoices, tax documents, financial reports, and purchase orders. Effective data extraction from these documents is crucial to support downstream analytical tasks. Current data extraction tools often struggle with complex document layouts, incur high latency and/or cost on large datasets, and require significant human effort. The key insight of our tool, TWIX, is to infer the underlying template used to create such documents, and then extract the data, rather than extracting directly from documents. To do so, TWIX first infers the underlying fields, such as columns of tabular portions or keys in co-located key-value pairs, by leveraging their consistent location patterns (e.g., two fields in the same template repeatedly co-occur within a fixed distance apart across multiple records). TWIX then assembles these fields into a template by enforcing visual constraints, such as vertically aligning table rows with their column headers for tabular regions, and horizontally aligning keys with their values for key-value pairs. TWIX then uses this inferred template to accurately and efficiently extract data from templatized documents at a low cost. On one benchmark with 34 diverse real-world datasets, TWIX outperforms state-of-the-art structured data extraction tools (Evaporate, Textract, and Azure Document Intelligence), and vision-based LLMs like GPT-4-Vision, by over 25% in precision and recall. Another benchmark with 30 large datasets demonstrates TWIX's scalability: it is 520X faster and 3,786X cheaper than the most competitive compared tool, for extracting data from large document collections with over 2000 pages.

Data Extraction (ORG) TWIX (ORG) Textract (PERSON) Azure Document Intelligence (ORG) GPT-4-Vision (ORG)

Originally published by arXiv CS Read original →

Visual Template Inference for Data Extraction from Documents

Related Stories

Warburg CEO Calls IPO Market ‘Broken’ Even Amid Giant Offerings

'Partners and friends’: Trade and defence top of agenda at EU-South Korea summit

Trump signs $70 billion immigration funding bill after months of delay

Pay what you wish: the restaurant where customers can eat for free – if their conscience lets them