The government on Tuesday released a proposal inviting private firms, including startups, to contribute anonymised, non-personal datasets of their users to the AI Kosh platform.
The move aims to accelerate the development of artificial intelligence (AI) applications in India and to provide access to enough data for training large language models (LLMs), a senior government official told ET.
As per the plan, firms such as Google, Uber and PhonePe can contribute anonymised datasets of usage patterns on their websites without revealing the identity or any other personal details of their users.
“Several companies had reached out to the government with interest in contributing to AI Kosh, so the department decided to release a standard expression of interest (EoI) so that instead of these one-on-one dealings we can give the opportunity to everyone to contribute,” said the official, who did not wish to be identified.
Availability of data, especially current and India-specific, is important for training accurate local AI models on top of which applications can be developed.
So far, private companies such as Sarvam AI, Ola Krutrim and Eka Care have contributed their non-personal datasets to the platform. It also hosts datasets from several arms of the government, including census data, weather data from the meteorological department, and datasets from the ministries of agriculture and mines and the state of Telangana.

The IndiaAI Mission also signed a partnership with the Lok Sabha Secretariat and is in talks with state broadcasters Doordarshan and All India Radio for sharing archives.
“Datasets which may add a lot of value can be from companies like Google, healthcare datasets, doctor prescriptions, call centre datasets, etc; ultimately all conversational datasets will help in training the LLM,” the official said.
Other public institutions which are contributing datasets include Open Data Telangana, Indian Council of Medical Research, Digital India Bhashini Division and Ministry of Jal Shakti. Research organisations include Development Data Lab along with non-profits I-Hub for Robotics and Autonomous Systems Innovation Foundation.
Apart from 70-odd text-to-speech or generative models in Indic languages from AI4Bharat, an AI research lab at IIT Madras, AI Kosh today reportedly lists Microsoft’s smaller Phi series of models and a few specialised non-LLM models.
Google, Meta, Microsoft, Ola Krutrim, PhonePe and Uber did not comment on the development.
Currently, as per its website, AI Kosh offers 339 datasets and 159 AI models from 17 organisations across 15 sectors, and a library of use cases and toolkits.
As per the proposal document, the government has invited academic and research institutions, startups and companies along with non-profit and civil society organisations to contribute to the platform.
AI Kosh will not engage in data monetisation.
“Any data or dataset shared with AIKosh pursuant to this EOI shall be in compliance with the provisions of the Digital Personal Data Protection Act (DPDPA), the National Data Sharing and Accessibility Policy (NDSAP), and all other applicable laws, regulations, and policies of the Government of India governing data sharing, privacy, and security,” the document said.
The IndiaAI Datasets Platform (AI Kosh), launched on March 6 under the IndiaAI Mission with an allocation of Rs 199.55 crore, is designed to be a unified platform integrating datasets from diverse sources. These include existing government data platforms and, crucially, non-government data contributors.
The initiative, part of the larger IndiaAI Mission, seeks to provide researchers and developers with access to crucial datasets, driving advancements across various sectors.