Why India wants AI to be sovereign

Key Concepts

Indic AI: AI systems, chiefly Large Language Models (LLMs), trained specifically on Indian languages, including the 22 scheduled languages, to make AI accessible to non-English speakers.
Sovereign AI: The concept of building indigenous AI infrastructure (local data centers, local LLMs, and local data storage) to ensure national security and reduce dependence on foreign technology.
Parameters: The numerical weights an AI model learns during training; higher parameter counts generally correlate with greater model complexity and capability, at the cost of more compute.
Frontier Models: Highly advanced, large-scale AI models (e.g., GPT-4, Claude) developed by global tech giants.
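Data Sovereignty: The principle that data remains subject to the control and laws of the nation where it is generated; in India's framing, whoever creates the data holds the rights to it. This is a key pillar of India's national AI strategy.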

1. The India AI Impact Summit: Key Developments

The summit served as a major gathering for global AI leaders (OpenAI, Anthropic, Google, Mistral) and Indian policymakers. India is viewed as a critical market due to its nearly 1 billion internet users, who provide both a massive data pool for training and a significant subscription base.

Global Expansion: OpenAI announced new offices in Bangalore and Mumbai, adding to its existing Delhi presence. Anthropic launched its first India office in Bangalore.
Infrastructure Investment: Indian billionaires committed massive capital to AI infrastructure:
- Gautam Adani: Announced a $100 billion investment in data centers.
- Mukesh Ambani: Committed $110 billion toward data centers and general-purpose AI, emphasizing that these are strategic, non-speculative investments.
Government Funding: The Indian government has launched the "India AI Mission" with a $1.5 billion corpus, as well as the "Anusandhan Research Fund," which aims to disburse $5.5 billion for R&D.

2. The Push for "Indic AI"

The primary motivation for developing local LLMs is to democratize AI for the masses.

Linguistic Inclusion: India has 22 scheduled languages, and a large portion of its population is not proficient in English. Indic AI aims to let users (e.g., farmers) query systems in their native languages, such as Tamil or Bengali.
Mitigating Bias: A study by the MIT Center for Constructive Communication found that when users with limited English proficiency query LLMs, the responses can be condescending or lower in quality. Local models are designed to capture the linguistic nuances and cultural context that global models often miss.
Geopolitical Security: Officials argue that relying solely on American-built frontier models poses a risk to national intelligence and defense. If foreign firms were forced to cut off or restrict access, Indian systems built on those models could collapse.

3. Local Players vs. Global Giants

Two notable Indian entities have emerged to challenge global incumbents:

Sarvam AI: A VC-backed startup (investors include Lightspeed, Peak XV, and Kalaari Capital) that launched LLMs with 30 billion and 105 billion parameters.
BharatGen: A non-profit initiative fully funded by the government (a $140 million investment) that launched a 17-billion-parameter model.

Strategic Differentiators:

Efficiency over Size: Unlike global firms chasing massive frontier models, Indian players are building smaller, "frugal" models that are custom-made for specific local use cases.
Primary Data Collection: Indian firms are deploying field teams ("feet on the street") to collect primary data from libraries, out-of-print books, and community radio stations, capturing local dialects and cultural nuances.
Language Coverage: While global models typically support 10–12 Indian languages, local models aim to cover all 22 scheduled languages.

4. Challenges and Market Realities

Despite the nationalistic push, local companies face significant hurdles:

Capital Disparity: Global giants have vastly superior funding and are aggressively marketing in India (e.g., sponsoring the Indian Premier League). Local firms cannot match this level of spending.
Incumbent Adaptation: Global players are not standing still. Anthropic, for instance, has partnered with Karya to improve its primary data collection and linguistic accuracy in India.
The "Fiefdom" Problem: Global AI companies have already established a strong foothold in the Indian market. Breaking this dominance will require more than just sovereign sentiment; it will require sustained business viability.

Notable Quotes

Prime Minister Narendra Modi: "Whoever creates the data has rights over the data."
Rishi Bagla (CEO, BharatGen): "The frontier models are great, but they are like 100 PhDs rolled into one. My task doesn't need 100 PhDs... then why not build models that serve my use case?"

Synthesis

India is positioning AI as a "force multiplier" for public good, viewing it as critical infrastructure akin to energy. While the country is fostering a sovereign AI ecosystem to ensure linguistic inclusivity and geopolitical independence, the path forward is difficult. The success of local players like Sarvam and BharatGen will depend on their ability to provide superior local context and efficiency, as they cannot compete with the sheer financial and technical scale of global incumbents like OpenAI and Anthropic.