<aside>
Principal Authors: Michael Minkoff, Sivaramakrishnan Balasubramanian, Nupur Mishra, Kaustav Ghosal
</aside>
The authors gratefully acknowledge the generous time and insights shared by Jagadish Babu (COO, EkStep Foundation), JC Tripathi (Director, Agriculture, Wadhwani AI), Kumar Rajamani (Associate Director, CropIn), Praveen Pankajakshan (Chief AI Scientist, Urban Kisaan; Apex Committee member overseeing India’s AI Centres of Excellence; Advisory Committee Member, Harvard Data Science Initiative), Parameswaran Iyer, and Nikhil Toshniwal (AgRevolution / DeHaat). Their consultations, reviews, and comments substantially enriched the clarity and scope of this discussion paper. We also acknowledge the contributions of several additional experts who offered guidance and commentary but preferred to remain unnamed. Finally, sincere thanks to the DevGlobal and DevAfrique teams for their thoughtful review, comments, and support on final production.
| Abbreviation | Full Form/Definition |
|---|---|
| ACRE Africa | Agriculture and Climate Risk Enterprise – Provides weather-index-based insurance to smallholder farmers in Africa. |
| AI | Artificial Intelligence – Systems that simulate human intelligence, used in agriculture for personalized advice, diagnostics, and predictions. |
| API | Application Programming Interface – A set of tools and protocols that allow different software systems to communicate and exchange data. |
| APS | American Phytopathological Society – Referenced here in the context of image databases used for pest and disease identification. |
| CARE | Collective Benefit, Authority to Control, Responsibility, and Ethics – Ethical principles for data governance, especially for Indigenous and marginalized communities. |
| CGIAR | Consultative Group on International Agricultural Research – A global partnership of research institutions focused on food security and agriculture. |
| CRM | Customer Relationship Management – Systems used to manage interactions and data related to users (e.g., farmers). |
| DAAS | Digital Agriculture Advisory Services – Part of India’s AgriStack; it aims to serve as a federated data utility. |
| DBT | Direct Benefit Transfer – A system used in India to transfer subsidies or payments directly to beneficiaries’ bank accounts. |
| ETL | Economic Threshold Level – The pest population level at which control measures should be implemented to prevent crop loss. |
| FAIR | Findable, Accessible, Interoperable, Reusable – A set of principles that guide responsible data management and sharing. |
| FAO | Food and Agriculture Organization of the United Nations – An international agency working on hunger eradication and sustainable agriculture. |
| FCDO | Foreign, Commonwealth & Development Office (UK) – Cited for its evaluation of GODAN. |
| FPO | Farmer Producer Organization – A farmer-owned group that improves access to inputs, services, and markets. |
| GARDIAN | Global Agricultural Research Data Innovation & Acceleration Network – CGIAR’s platform for open agricultural data. |
| GODAN | Global Open Data for Agriculture and Nutrition – An initiative promoting open data for agriculture and nutrition. |
| ICAR | Indian Council of Agricultural Research – Apex body in India for agricultural education and research. |
| IDH | The Sustainable Trade Initiative – An organization supporting sustainable trade through partnerships and data sharing. |
| ILRI | International Livestock Research Institute – Research center focusing on livestock in developing countries. |
| INRIA | French Institute for Research in Computer Science and Automation – Referenced for its work on agriculture and digital technology. |
| IoT | Internet of Things – Network of connected physical devices that collect and exchange data (e.g., farm sensors). |
| IPCC SRCCL | Intergovernmental Panel on Climate Change Special Report on Climate Change and Land – A global scientific report on climate risks to land systems. |
| IVR | Interactive Voice Response – Automated phone system used for delivering agricultural advisories in local languages. |
| KYC | Know Your Customer – A process to verify the identity of users; used in linking farmers to financial services. |
| LMICs | Low- and Middle-Income Countries – Countries with lower income levels, where most smallholder farmers are located. |
| MIS | Management Information System – A tool or platform used to store and manage data, such as livestock health records. |
| NASA POWER | Prediction of Worldwide Energy Resources – A NASA dataset useful for weather-based agricultural planning. |
| NGO | Non-Governmental Organization – Non-profit groups that often provide agricultural training, services, and data. |
| NOAA | National Oceanic and Atmospheric Administration – Provides climate and weather data, used in agriculture. |
| OD4D | Open Data for Development – A global partnership to support open data initiatives. |
| PMFBY | Pradhan Mantri Fasal Bima Yojana – India’s flagship crop insurance scheme. |
| PM-KISAN | Pradhan Mantri Kisan Samman Nidhi – India’s income support scheme for farmers. |
| PoC | Proof of Concept – A small-scale pilot project used to test the feasibility of an idea before wider rollout. |
| PxD | Precision Development – A nonprofit organization delivering personalized agricultural advice via mobile platforms. |
| RAG | Retrieval-Augmented Generation – AI technique where a model retrieves documents before generating a response, improving accuracy. |
| SSA | Sub-Saharan Africa – A region of Africa south of the Sahara Desert. |
| UFSI | Unified Farmer Service Interface – A digital switchboard under AgriStack that connects databases across government services. |
| VISTAAR | Virtually Integrated System to Access Agricultural Resources – A federated, AI-powered digital advisory platform launched by India’s Ministry of Agriculture. |
Farmers worldwide are navigating increasingly complex challenges, with smallholders particularly affected by climate variability, pest and disease outbreaks, soil degradation, volatile markets, and growing demands to produce food sustainably. An estimated 733 million people faced hunger in 2023 alone, underscoring the fragility of current agricultural systems and their limited capacity to meet global nutritional needs (FAO et al., 2024). While traditional agricultural knowledge remains vital, it often falls short in addressing the interconnected nature of today’s risks (Thudumu & Fisher, 2025). The COVID-19 pandemic further exposed systemic vulnerabilities in food systems, reinforcing the urgency of building more resilient and inclusive agricultural frameworks (IARJSET, 2024; IPCC SRCCL Chapter 5, 2023). Responding to these pressures also calls for agricultural systems that are data-informed, and technological innovations are playing a growing role in equipping farmers to manage complexity and uncertainty. Among these innovations, Artificial Intelligence (AI) stands out for its ability to analyze large and varied datasets and generate tailored, location-specific advisories (Asolo et al., 2024). By integrating information on weather, pests, soils, and markets, AI supports better farm-level decisions while also advancing broader goals of climate resilience, resource efficiency, and equitable access to timely agricultural knowledge (Mana et al., 2024; Umar, 2023). AI-enabled applications, such as precision farming and data-driven advisory systems, contribute significantly to these goals by giving marginalized smallholder farmers access to valuable insights and resources. By narrowing information gaps, these tools help farmers optimize resource use, make informed decisions, and ultimately improve their productivity and livelihoods (Gikunda, 2024).
However, the potential of AI in agriculture depends heavily on overcoming structural barriers such as climate stress, resource degradation, weak and fragmented infrastructure, and limited access to digital tools (Mana et al., 2024). Scaling AI solutions for a resilient agricultural future will require integrated data systems, affordable digital services, and robust financial support (targeted credit, insurance, incentives, and subsidy schemes) to ensure widespread adoption and sustained impact. Effective agricultural AI also depends on trustworthy, curated, geo-tagged, multi-seasonal data with minimal noise; here, quality matters more than quantity.
Against this backdrop, this discussion paper explores the current landscape of agricultural data corpora, and opportunities to strengthen them, for inclusive, equitable, and impactful AI-driven advisory systems. It draws on use cases from India and Sub-Saharan Africa (SSA), a literature review, and a limited set of expert consultations. The paper highlights lessons from existing solutions and initiatives and recommends areas of focus for future implementation, investment, policy, and discussion. Throughout, it considers the broader set of stakeholders involved in effective corpus building and governance, including policymakers, academic institutions, financial providers, government agencies, and private sector actors.
Central to realizing the promise of AI-powered advisory solutions for small-scale agricultural producers is the development of robust, adaptive, context-aware, and unbiased federated data corpora. These corpora must integrate high-quality datasets that reflect local agricultural realities, ensuring location-specific relevance while preserving privacy through trusted intermediaries, clear governance frameworks, consent management, and privacy-preserving technologies. In a federated system, data remain with their original custodians, and analyses or model training run across multiple actors without central data pooling, which safeguards privacy and encourages participation (Kairouz et al., 2021). Building agricultural data corpora ideally involves inclusive collaboration across the agri-food system, from farmers and producer organizations to researchers, digital innovators, agronomists, and policymakers. Such multi-stakeholder engagement helps corpus-building efforts reflect diverse agro-ecological contexts and farming practices, fosters trust, embeds co-design, and supports the development of context-tailored AI tools (INRIA, 2022).
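To make the idea of computing across custodians without pooling raw data more concrete, the following is a minimal Python sketch of federated analytics, a simpler relative of the federated learning surveyed by Kairouz et al. (2021). It assumes that each hypothetical custodian (for example, an FPO or a research station) holds plot-level yield records and shares only a record count and a sum. The custodian names, yield figures, and the `min_count` suppression rule are illustrative assumptions, not features of any specific platform.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LocalSummary:
    """Aggregates a custodian agrees to share; raw records are never included."""
    count: int
    total: float


class Custodian:
    """Hypothetical data holder (e.g., an FPO or research station) whose raw records stay local."""

    def __init__(self, name: str, yields_t_per_ha: Iterable[float]):
        self.name = name
        self._records = list(yields_t_per_ha)  # raw plot records stay inside this node

    def summarize(self, min_count: int = 3) -> Optional[LocalSummary]:
        # Withhold very small groups so individual farms cannot be singled out.
        if len(self._records) < min_count:
            return None
        return LocalSummary(count=len(self._records), total=sum(self._records))


def federated_mean(custodians: Iterable[Custodian]) -> float:
    """Combine shared summaries into a pooled mean without moving raw data."""
    usable = [s for s in (c.summarize() for c in custodians) if s is not None]
    n = sum(s.count for s in usable)
    if n == 0:
        raise ValueError("no custodian shared enough records")
    return sum(s.total for s in usable) / n


# Illustrative yields in t/ha (made-up numbers for the example).
nodes = [
    Custodian("fpo_a", [2.0, 2.5, 3.0]),
    Custodian("research_station_b", [3.0, 3.5, 4.0, 4.5]),
    Custodian("small_coop_c", [6.0]),  # below min_count, so it shares nothing
]
print(round(federated_mean(nodes), 3))  # 3.214
```

Only the `LocalSummary` objects cross organizational boundaries. A production system would typically add safeguards such as secure aggregation or differential privacy, which this sketch omits.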
Data quality, relevance, and interpretability improve significantly when farmers, extension workers, cooperatives, farmer support organizations, and local institutions are involved not only as data providers but also as co-designers and sources of feedback within the system (Eastwood et al., 2019; Bronson, 2019). This participatory approach also strengthens trust, ownership, and inclusivity, addressing concerns about extractive data collection and use, and about the opaque algorithms behind AI advisory outputs. Interoperability standards such as open APIs and shared metadata protocols are key enablers of a federated approach, allowing diverse systems to exchange and interpret data meaningfully (FAO & ITU, 2022). Furthermore, adaptive governance frameworks anchored in fairness, transparency, and equitable benefit sharing help federated agricultural data corpora remain responsive to evolving climate conditions, policy shifts, and market trends (Stringer et al., 2020). Ultimately, such systems improve not only the precision of AI-driven advisories but also their practical applicability and long-term sustainability in real-world farm settings.
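As an illustration of how shared metadata protocols let heterogeneous systems interpret one another’s data, the sketch below maps records from two hypothetical sources onto a common metadata profile and validates them before exchange. The field names, the source names (`state_portal`, `ngo_app`), and the small crop vocabulary are assumptions made for this example; a real deployment would more likely build on an agreed community standard than on an ad hoc dictionary.

```python
from typing import Any, Dict, List

# Hypothetical minimal metadata profile agreed by all participating systems.
SHARED_FIELDS = {
    "dataset_id": str,
    "crop": str,
    "admin_region": str,
    "season": str,
    "license": str,
    "consent_obtained": bool,
}

# Hypothetical controlled vocabulary: "corn" and the Swahili "mahindi" both resolve to "maize".
CROP_TERMS = {"corn": "maize", "mahindi": "maize", "maize": "maize"}

# Hypothetical per-source mappings from local field names to the shared ones.
FIELD_MAPS = {
    "state_portal": {"id": "dataset_id", "crop_name": "crop", "district": "admin_region",
                     "season": "season", "licence": "license", "farmer_consent": "consent_obtained"},
    "ngo_app": {"uid": "dataset_id", "commodity": "crop", "county": "admin_region",
                "season_code": "season", "terms": "license", "consent": "consent_obtained"},
}


def to_shared(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename a source record's fields to the shared profile and normalize its crop term."""
    mapping = FIELD_MAPS[source]
    shared = {mapping[k]: v for k, v in record.items() if k in mapping}
    if isinstance(shared.get("crop"), str):
        term = shared["crop"].strip().lower()
        shared["crop"] = CROP_TERMS.get(term, term)
    return shared


def validate(meta: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record can be exchanged."""
    problems = []
    for field, expected in SHARED_FIELDS.items():
        if field not in meta:
            problems.append(f"missing {field}")
        elif not isinstance(meta[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    if "consent_obtained" in meta and meta["consent_obtained"] is not True:
        problems.append("consent not recorded as granted")
    return problems


ngo = to_shared("ngo_app", {"uid": "ke-001", "commodity": "Mahindi", "county": "Kitui",
                            "season_code": "SR-2024", "terms": "CC-BY-4.0", "consent": True})
portal = to_shared("state_portal", {"id": "in-042", "crop_name": "corn", "district": "example-district",
                                    "season": "kharif-2024", "farmer_consent": True})
print(ngo["crop"], validate(ngo))        # maize []
print(portal["crop"], validate(portal))  # maize ['missing license']
```

Both sources end up speaking the same vocabulary, and the validator blocks a record that lacks licensing information before it is shared.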
The Role of Localized Data. Maize farmers in drought-prone Karnataka require very different advice from their counterparts in humid Bihar, where fertile alluvial soils increase pest threats. Similarly, rice growers in West Bengal’s saline coastal belt face different challenges from wheat farmers in waterlogged Punjab. In Kenya, maize farmers coping with drought in Kitui have needs distinct from those in Trans-Nzoia, where high humidity and fertile soils increase susceptibility to fungal diseases. For AI tools to genuinely benefit farmers, the underlying data must therefore reflect these diverse agronomic conditions (Aroba & Rudolph, 2024; Munyao, 2024).
Developing a strong federated data corpus requires thoughtfully integrating globally trusted agricultural knowledge with locally contextualized insights. Many foundational elements, such as crop growth stages, pest and disease life cycles, soil health parameters, nutrient management protocols, integrated pest management (IPM) strategies, and climate-resilient farming practices, can be drawn from global best practices (CGIAR, 2021; World Bank, 2022). These can then be enriched with local inputs, including vernacular-language content (for example, through the Government of India’s Bhashini initiative), traditional practices, region-specific cropping calendars, and microclimatic data, which help fine-tune AI models to real-world farming conditions.
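One way to picture global knowledge enriched by local inputs is as layered configuration: a global layer supplies defaults, and a local layer overrides only what differs for a given place. The Python sketch below uses the standard library’s `collections.ChainMap` for this lookup order. The keys and values are hypothetical placeholders loosely based on the Kitui and Trans-Nzoia contrast described above, not agronomic recommendations.

```python
from collections import ChainMap
from typing import Dict, Optional

# Global layer: generic defaults (illustrative values only).
GLOBAL_MAIZE: Dict[str, object] = {
    "crop": "maize",
    "pest_management": "IPM: scout fields before any spray decision",
    "prioritize_drought_tolerant_varieties": False,
    "fungal_disease_scouting": "routine",
}

# Local layers: only the entries that differ for a place (hypothetical values).
LOCAL_LAYERS: Dict[str, Dict[str, object]] = {
    "Kitui": {"prioritize_drought_tolerant_varieties": True},
    "Trans-Nzoia": {"fungal_disease_scouting": "high priority"},
}


def advisory_profile(region: str) -> ChainMap:
    # Lookups check the local layer first, then fall back to the global layer.
    return ChainMap(LOCAL_LAYERS.get(region, {}), GLOBAL_MAIZE)


def source_of(profile: ChainMap, key: str) -> Optional[str]:
    """Report which layer supplied a value, to support human review."""
    for name, layer in zip(("local", "global"), profile.maps):
        if key in layer:
            return name
    return None


kitui = advisory_profile("Kitui")
print(kitui["prioritize_drought_tolerant_varieties"], source_of(kitui, "prioritize_drought_tolerant_varieties"))  # True local
print(kitui["fungal_disease_scouting"], source_of(kitui, "fungal_disease_scouting"))  # routine global
```

Keeping local layers sparse means a correction to global guidance reaches every region that has not deliberately overridden it, while `source_of` lets a reviewer see where each value came from.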
Critically, an expert consultation with an AI specialist working closely with government and private sector stakeholders highlighted the importance of viewing datasets as layered along the hierarchy Observation → Data → Information → Knowledge. Raw observations are structured into data, which are analyzed to generate meaningful information that ultimately forms actionable knowledge. This layering allows curated knowledge to be stored, retrieved, and continuously enriched with new insights, a process that requires ongoing human oversight (human-in-the-AI-loop) to ensure accuracy, trustworthiness, and local relevance.
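The Observation → Data → Information → Knowledge layering can be sketched as a small pipeline. In this hypothetical Python example, raw field notes about pest-trap counts are parsed into structured records, averaged per village and week, and turned into candidate advisory statements that are released only after a named reviewer approves them, mirroring the human-in-the-AI-loop step. The note format, the `ACTION_THRESHOLD` value, and the reviewer name are invented for illustration; an actual threshold would come from crop- and pest-specific guidance such as an Economic Threshold Level (ETL).

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Hypothetical action threshold (mean trap count); not an agronomic recommendation.
ACTION_THRESHOLD = 8


@dataclass(frozen=True)
class DataPoint:
    village: str
    week: int
    trap_count: int


@dataclass
class KnowledgeItem:
    statement: str
    evidence: float
    reviewed_by: Optional[str] = None  # stays None until an expert signs off


def to_data(observation: str) -> DataPoint:
    """Observation -> Data: parse a raw note such as 'village=A;week=12;trap_count=7'."""
    fields = dict(part.split("=", 1) for part in observation.split(";"))
    return DataPoint(fields["village"].strip(), int(fields["week"]), int(fields["trap_count"]))


def to_information(points: List[DataPoint], village: str, week: int) -> float:
    """Data -> Information: average trap counts for one village and week."""
    return mean(p.trap_count for p in points if p.village == village and p.week == week)


def to_knowledge(village: str, avg: float) -> KnowledgeItem:
    """Information -> Knowledge: turn a metric into a candidate advisory statement."""
    if avg >= ACTION_THRESHOLD:
        text = f"{village}: trap counts at or above action threshold; consider control measures"
    else:
        text = f"{village}: trap counts below action threshold; continue monitoring"
    return KnowledgeItem(text, avg)


class KnowledgeBase:
    """Stores candidate items; only expert-reviewed ones are released to advisories."""

    def __init__(self) -> None:
        self._items: List[KnowledgeItem] = []

    def add(self, item: KnowledgeItem) -> None:
        self._items.append(item)

    def approve(self, index: int, reviewer: str) -> None:
        self._items[index].reviewed_by = reviewer

    def published(self) -> List[KnowledgeItem]:
        return [i for i in self._items if i.reviewed_by is not None]


raw = ["village=A;week=12;trap_count=7", "village=A;week=12;trap_count=9",
       "village=A;week=12;trap_count=11", "village=B;week=12;trap_count=2",
       "village=B;week=12;trap_count=4"]
points = [to_data(r) for r in raw]
kb = KnowledgeBase()
for v in ("A", "B"):
    kb.add(to_knowledge(v, to_information(points, v, 12)))
kb.approve(0, reviewer="extension_officer_1")  # village B's item remains unpublished
print([item.statement for item in kb.published()])
```

Each layer is a separate function, so new observations can be re-processed and the knowledge store enriched over time, while the review gate keeps unvetted statements out of farmer-facing advisories.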