AgriGov: A Structured Multilingual Dataset Curation for Indian Government Schemes for Farmers

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Mohsina Bilal and Gopakumar G.

Key Points

arXiv:2606.08272v1 (announce type: new). Abstract: AgriGov is a curated, trilingual (English-Hindi-Marathi) dataset designed to address the scarcity of domain-grounded multilingual resources for agricultural policies and farmer welfare schemes. We first collected data on 50 government schemes from trusted portals using automated scraping and structured it into predefined semantic fields (e.g., title, eligibility, application process, documents, exclusions). Translations were produced by a pipeline combining the Google Translate API, MarianMT, and human post-editing, yielding a domain-specific Hindi-Marathi dataset of approximately 2,100 source segments. To broaden coverage, we augmented this dataset with sentences from the Samanantar corpus, giving approximately 8,000 sentence-aligned Hindi-Marathi parallel pairs in total. The combined dataset is intended as a resource for fine-tuning machine translation models in this domain. AgriGov is designed for applications in domain-adaptive machine translation, question answering, information retrieval, and summarization. Its key contribution is a schema-driven, human-corrected multilingual alignment pipeline that ensures domain fidelity, provides provenance, and supports reproducible experiments, enabling retrieval-augmented applications for farmer-facing tools.
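To make the schema-driven structure concrete, here is a minimal Python sketch of how one scheme might be stored per language and paired field by field. It uses only the five semantic fields named in the abstract; the class name SchemeRecord, the function align_by_field, and the placeholder strings are assumptions made for illustration, not the authors' actual schema or pipeline.

```python
from dataclasses import dataclass, fields


@dataclass
class SchemeRecord:
    """One scheme in one language (hypothetical layout).

    The field names follow the examples listed in the abstract; the real
    dataset may use more fields or different names.
    """
    title: str
    eligibility: str
    application_process: str
    documents: str
    exclusions: str


def align_by_field(src: SchemeRecord, tgt: SchemeRecord) -> list[tuple[str, str, str]]:
    """Pair the same semantic field across two language versions of a scheme.

    Returns (field_name, source_text, target_text) tuples and skips any field
    that is empty on either side, so every pair is a usable parallel segment.
    """
    pairs = []
    for f in fields(SchemeRecord):
        s = getattr(src, f.name).strip()
        t = getattr(tgt, f.name).strip()
        if s and t:
            pairs.append((f.name, s, t))
    return pairs


# Placeholder text only; no real scheme content is reproduced here.
hindi = SchemeRecord(
    title="<Hindi title>",
    eligibility="<Hindi eligibility text>",
    application_process="<Hindi application steps>",
    documents="<Hindi document list>",
    exclusions="<Hindi exclusions>",
)
marathi = SchemeRecord(
    title="<Marathi title>",
    eligibility="<Marathi eligibility text>",
    application_process="<Marathi application steps>",
    documents="<Marathi document list>",
    exclusions="",  # missing on the target side, so this field is skipped
)

pairs = align_by_field(hindi, marathi)
for name, src_text, tgt_text in pairs:
    print(name, "|", src_text, "|", tgt_text)
```

Keeping the field name alongside each pair is one simple way to retain provenance: a translated segment can be traced back to the scheme and the field it came from.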


Originally published by arXiv CS.
