Agentic Data Engineering

Related Articles from SNS

Exploring Autonomous Agentic Data Engineering for Model Specialization

arXiv:2605.30407v2 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based data curation methods primarily rely on human-designed workflows, leaving it unexamined whether LLMs can autonomously execute an end-to-end data engineering pipeline for model specialization. We formalize Autonomous Agentic Data Engineering, a...

arXiv CS 1d ago

Exploring Autonomous Agentic Data Engineering for Model Specialization

arXiv:2605.30407v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based data curation methods primarily rely on human-designed workflows, leaving it unexamined whether LLMs can autonomously execute an end-to-end data engineering pipeline for model specialization. We formalize Autonomous Agentic Data...

arXiv CS 9d ago

Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering

Computer Science > Software Engineering [Submitted on 20 Jan 2026] Abstract: LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing.

Hacker News 3d ago

TadA-Bench: A Million-Variant Benchmark for Future-Round Discovery Toward Agentic Protein Engineering

Announce Type: cross Abstract: AI for scientific discovery is entering an agentic era, where protein-engineering systems are expected to prioritize future wet-lab experiments rather than merely fit static measurements. We introduce TadA-Bench, a million-variant wet-lab replay benchmark from 31 TadA directed-evolution rounds for future-round discovery toward agentic protein engineering. TadA-Bench preserves the campaign chronology and defines a fixed-data replay task: given earlier...

arXiv CS 7d ago

Be Fair! Can Machine Learning Engineering Agents Adhere to Fairness Constraints?

Announce Type: new Abstract: Machine learning engineering (MLE) agents promise to automate end-to-end ML pipeline development from raw data and natural language instructions, potentially making ML accessible to non-technical domain experts. However, in sensitive and regulated domains, this abstraction creates a responsibility gap: end-users may lack visibility into design choices that affect correctness, robustness, fairness, and regulatory compliance.

arXiv CS 6d ago

Building a LangGraph pipeline for production data engineering

LangGraph is becoming the default framework for teams building agentic AI workflows. That is both a good thing and a problem. The good part: it has real production pedigree, is actively maintained, and is used by teams doing serious work.

Hacker News 10d ago

Embody4D: A Generalist Data Engine for Embodied 4D World Modeling

arXiv:2605.01799v2 Announce Type: replace Abstract: Embodied agents require robust and comprehensive 3D spatiotemporal representations to support spatial reasoning, manipulation understanding, and downstream decision making. However, existing robot data are typically captured from fixed or sparse viewpoints, providing only partial and view-dependent observations, which limits multi-view perception and generalization across viewpoints. Given the difficulty of collecting additional viewpoints...

arXiv CS 1d ago

Data Flow Control: Data Safety Policies for AI Agents

Announce Type: new Abstract: Agents increasingly generate SQL, orchestrate pipelines, and automate data analysis on behalf of users. While recent work improves query correctness, correctness is not safety. A query may be semantically valid yet violate regulatory, privacy, or business constraints that govern how data may be combined and released.

arXiv CS 5d ago

FORGE: Multi-Agent Graduated Exploitation and Detection Engineering

arXiv:2606.03453v1 Announce Type: new Abstract: Vulnerability disclosure volumes now far exceed organizational assessment capacity, yet three adjacent research communities (proof-of-concept generation, vulnerability prioritization, and detection rule engineering) operate largely in isolation. Existing automated exploit generation systems report binary pass/fail outcomes, discarding partial progress and producing no signal for the other two communities. This paper presents FORGE, a...

arXiv CS 7d ago

MicroGrowAgents: An Agentic AI System for Microbial Cultivation Engineering

Microbial cultivation optimization remains labor-intensive and inefficient, requiring extensive experimental screening to identify suitable growth conditions. Traditional one-factor-at-a-time approaches are particularly ineffective for exploring complex, multidimensional nutrient parameter spaces. We present MicroGrowAgents, an AI-driven, agent-based system that automates the design of optimized growth media through integration of knowledge graphs, metabolic modeling, and optimal...

bioRxiv 5d ago