Data Extraction

Related Articles from SNS

Visual Template Inference for Data Extraction from Documents

arXiv:2501.06659v2 Announce Type: replace Abstract: Many templatized documents are programmatically generated from structured data following a visual template. Such documents include invoices, tax documents, financial reports, and purchase orders. Effective data extraction from these documents is crucial to support downstream analytical tasks.

arXiv CS 1d ago

Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents

arXiv:2606.06242v1 Announce Type: new Abstract: Institutional documents contain substantial amounts of operational and analytical information embedded within figures and tables. Current approaches for extracting visual content from documents are largely built around generic document layout analysis, where figures and tables are treated as uniformly relevant document objects rather than semantically meaningful analytical artifacts. In this work, we introduce a benchmark dataset and evaluation...

arXiv CS 5d ago

Wind Turbine Maintenance Log Labelling Framework: LLM-Driven Data Correction and Enrichment via Semantic Extraction of Reliability Intelligence

Announce Type: new Abstract: As wind turbine fleets age, data-driven reliability engineering is essential to optimise their operation and maintenance for service life extension and levelised cost of energy reduction. Failure event descriptions within historical maintenance logs are a source of valuable reliability intelligence.

arXiv CS 9d ago

Peacemaker at ATE-IT: Automatic term extraction from Italian text for waste management data using encoder model

arXiv:2606.01469v1 Announce Type: new Abstract: The development of automatic term extraction has become increasingly important in modern technology. Automatic term extraction can be found in virtually every search engine that is currently available to users. Recent advancements have provided promising results for the extraction of automatic terms; however, accurate labeling is difficult because of several factors, such as the limited number of annotated documents available for training and...

arXiv CS 8d ago

Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan

arXiv:2606.09767v1 Announce Type: new Abstract: Neural machine translation for digitally low-resource Indigenous languages is often hindered by extreme data scarcity, prompting reliance on extractive web-scraping. To ensure data sovereignty, this study introduces a data synthesis methodology to bootstrap NMT models without scraping target-language parallel text. Focusing on Q'eqchi' Mayan, we transformed community-sourced dictionaries into a massive synthetic corpus, utilizing...

arXiv CS 1d ago

FitED: A User-Centric, Extensible Software Environment for Robust Peak-Profile and General Functional Data Fitting

Announce Type: replace Abstract: Reliable parameter extraction from experimental data is essential for quantitative analysis across spectroscopy, diffraction, photoluminescence, chromatography, microscopy, and time-resolved measurements. However, nonlinear fitting often remains difficult to reproduce, especially when complex models, correlated parameters, uncertain derived quantities, and user-dependent fitting choices are involved. We present FitED, a Python-based desktop application for...

arXiv Physics 9d ago

AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis

arXiv:2606.02403v2 Announce Type: replace Abstract: Systematic reviews rely on forest plots to synthesise quantitative evidence across biomedical studies, but generating them remains a fragmented and labour-intensive process. Researchers must interpret complex clinical texts, manually extract outcome data from trials, define appropriate interventions and comparators, harmonise inconsistent study designs, and carry out meta-analytic computations, typically using specialised software that...

arXiv CS 6d ago

Beyond One-shot: AI Agents for Learning in Field Experiments

arXiv:2606.02458v1 Announce Type: new Abstract: Organizations routinely run experiments for A/B testing, yet the data generated from one experiment is underutilized to inform subsequent intervention design. Significant barriers exist to extracting actionable knowledge from prior experimental data to inform new interventions. We study whether tool-augmented agentic AI can automatically learn from experimental data to generate new interventions in subsequent experiments.

arXiv CS 8d ago