Mining the commons: AI extraction, Wikipedia, and the case for a multi-stakeholder settlement
Acknowledgements
Vasilis Kostakis acknowledges support from the Estonian Centre of Excellence in Energy Efficiency, ENER (grant TK230).
Wikipedia was built on a distinctive promise: if volunteers pooled their time, knowledge, and care, the result would be a shared resource no single actor could own or capture. In the age of generative AI, that commons is being mined at industrial scale by a handful of well-capitalised firms that convert volunteer labour into proprietary models worth billions.
The Wikimedia Foundation’s recent paid-access deals with Microsoft, Meta, Amazon, and others can be read narrowly as cost recovery (Murti, 2026). On those terms, monetising high-volume AI traffic looks like overdue common sense. But framed within the political economy of the digital commons, the deals look uncomfortably close to enclosure: the process, familiar from early modern land privatisation, in which a shared resource is fenced off for private gain. Since the start of the AI boom, Wikipedia has functioned less as a neutral knowledge resource and more as an involuntary data centre for Big Tech.
Automated requests to Wikimedia projects have grown exponentially (Mueller et al., 2025); bandwidth for downloading multimedia rose 50 percent since January 2024, with scraper bots accounting for 65 percent of resource-intensive traffic. A handful of AI actors have, in effect, turned a public knowledge commons into a private extraction layer.
Wikipedia’s financial model long relied on small reader donations to cover server costs. The AI boom disrupted this: scrapers became the heaviest users, yet – unlike teachers or readers – they returned no social value, while the firms behind them evaded costs and diverted traffic away from Wikipedia, leaving donors unwittingly subsidising proprietary AI training. The same companies that position themselves as responsible AI leaders chose to hammer Wikipedia through aggressive scraping rather than use its existing APIs or negotiate sustainable access. Only once the bandwidth crisis became unsustainable – and Jimmy Wales publicly urged Google, OpenAI, and others to stop scraping and start paying for the Enterprise API in November 2025 (Perez, 2025) – did the Foundation begin pursuing licensing deals to offset the server costs that scraping had imposed.
Commons-based peer production under pressure
Wikipedia has been the flagship of commons-based peer production (Benkler, 2006). The “bargain” has been simple but profound: contributors retain moral ownership of their work while pooling it under commons-oriented licences so that knowledge can circulate freely. No single actor is supposed to be able to fence the resource or dictate its terms.
Mass AI scraping breaks this deal – it may be legally permissible, but it is not ethically sound. Once edits are fed into opaque models, individual contributions become proprietary data points. Contributors can’t say “yes for public good, no for corporate use,” and no value flows back from the models to the commons. The implicit logic is that open inputs justify closed outputs.
This is why the licensing deals feel like more than cost recovery. They crystallise a deeper structural pattern: commons-based projects supply raw material and legitimacy, while AI platforms capture the margins. These unequal returns are not an unfortunate side-effect; they are structurally embedded whenever open infrastructures function as free extraction layers for closed systems.
Digital Public Goods and the risk of capture
In early 2025, Wikipedia was formally recognised by the Digital Public Goods Alliance as a Digital Public Good – an open, globally relevant digital resource aligned with the Sustainable Development Goals and governed in the public interest (Sophia & Hu, 2025). Digital Public Goods (DPGs) are supposed to be shielded from precisely this kind of capture. They require financing models commensurate with their public value, not models that make them fiscally dependent on their most extractive users. When the sustainability of a DPG hinges on a small oligopoly of AI firms, the risk turns political: agenda-setting and governance drift toward those who can threaten to walk away.
Wikipedia indisputably needs funding for infrastructure, security, legal defence, community support, and product development. Treating financial sustainability as taboo only risks burnout or corporate co-optation. The real question is: how, and by whom, should the digital commons be financed?
A useful starting point is to treat high-volume AI use not as ordinary traffic but as privileged commercial activity subject to a commons levy. AI firms are not merely “readers at scale”; they are converting a DPG into a core input of profit-making infrastructure. Pricing their access accordingly is not enclosure, but a minimal condition for reciprocity. What tips over into enclosure is when the terms of payment are opaque, concentrated, and negotiated without meaningful community oversight.
Mechanisms to strengthen the commons could include earmarking AI licensing revenue for under-resourced language editions, community governance, and contributor support; capping any single corporate client or sector’s share of Wikimedia’s budget to prevent capture; and pursuing public/philanthropic funding to diversify revenue.
Yet Wikimedia’s predicament converges with a wider crisis across commons-based production. Projects dependent on donations and grants are discovering how unreliable those streams can be: maintainers burn out, grants lapse, priorities shift, and critical infrastructure ends up running on unpaid labour until something breaks (Gooding, 2024). Even high-profile failures like Heartbleed – a 2014 bug in OpenSSL, a cryptographic library underpinning much of the internet that at the time survived on roughly $2,000 a year in donations – produced only short bursts of emergency funding (Roman, 2014), illustrating how contingent crisis-driven philanthropy really is. “Just get more grants” is not a strategy; it is a stopgap that leaves essential commons exposed to the moods of donors and news cycles.
Toward a multi-stakeholder settlement
Wikipedia and similar DPGs cannot sustain themselves on a fragile mix of donations, sporadic philanthropy, and ad-hoc corporate generosity. What’s needed is a multi-stakeholder settlement in which large-scale users of the commons take on long-term, structured obligations to sustain it: contractual funding through paid APIs and usage-based levies, formal recognition of DPGs as Digital Public Infrastructure to unlock multilateral co-financing, and a shift in philanthropy from one-off project grants to sustained core support for the institutions that maintain the commons.
With the DPGA now recognising DPG financing as a priority for collaborative action (DPGA Secretariat, 2026), a rare institutional alignment is emerging – one that could deliver what ad-hoc donations and opaque corporate deals have not. Whether Wikipedia stays commons-first or becomes a thin nonprofit shell for others’ AI roadmaps is a governance choice, and the window is narrowing.