Deep learning pioneer Yoshua Bengio, AI textbook author Stuart Russell, Taiwan’s ambassador-at-large Audrey Tang, and 25 other top scholars have jointly published a paper that systematically lays out 7 major threat patterns AI poses to democratic systems and social structures. The core argument: even if every model is perfectly “aligned” with human values, the scale effects of AI can still undermine democratic governance from within.
(Background: When I can’t prove that I’m not an AI, forensic experts suggest: let’s create a secret code with friends and family.)
(Additional context: Anthropic launches AI Impact Dashboard: enter your job, and find out how much AI has taken over your work in seconds.)
The paper, published on March 25, is titled “AI Poses Risks to Democratic and Social Systems,” and the author lineup is quite impressive. Alongside 2018 Turing Award winner Yoshua Bengio, UC Berkeley’s Stuart Russell, and Bernhard Schölkopf of the Max Planck Institute, it includes Audrey Tang of Oxford’s Institute for Ethics in AI, as well as heavyweight researchers from the University of Toronto, ETH Zurich, the University of Michigan, and other institutions.
This paper’s angle differs from most AI safety research, which concentrates on “model-level” issues such as hallucinations, toxic outputs, and refusal behaviors, or on more extreme “AI apocalypse” scenarios.
This paper argues that a whole category of risk has been overlooked: the “system-level” harm that large-scale AI deployment inflicts on social systems and democratic governance.
A model producing a single toxic output can be handled with alignment techniques; one million compliant, polite, policy-appropriate submissions, however, can paralyze a government’s capacity to process public comments, a problem that alignment alone cannot solve.
The paper breaks the threats AI poses to governance into 7 failure modes (T1 to T7), distributed along a “governance feedback loop”: society feeds signals into the system (political expression), the system processes those signals (public discourse), and the system feeds decisions back to society (legislation). AI can disrupt every link in this loop.
At the “public belief” end, there are two threats:
Belief Homogenization (T1): When most people think and write with similarly trained models, the diversity of public discourse is compressed, because post-training methods like RLHF systematically suppress the variety of viewpoints in model outputs (a measurement sketch follows the next item).
Belief Reinforcement (T2): Personalized AI assistants cater to users’ existing views, and long-term memory lets that catering accumulate into a self-confirming loop. Research cited in the paper shows that when GPT-4 has access to a user’s demographic data, the odds of it persuading the user to agree with its arguments rise by over 80%.
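One way to make the homogenization in T1 measurable is to score how similar many independently drafted texts on the same topic are to one another. The sketch below is purely illustrative and not from the paper; the Jaccard overlap measure and the sample answers are our own assumptions.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Lexical overlap between two texts (0 = disjoint, 1 = identical)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def homogenization(answers: list[str]) -> float:
    """Mean pairwise similarity across all answers to one prompt.
    A rising score over time would signal that discourse is converging."""
    pairs = list(combinations(answers, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical answers to the same civic question; if most people draft
# with similarly tuned models, this score drifts upward.
answers = [
    "we should fund public transit to cut emissions",
    "funding public transit is the best way to cut emissions",
    "cutting emissions requires better public transit funding",
]
print(f"homogenization score: {homogenization(answers):.2f}")
```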
At the “institutional processing” end, there are two risks:
Bureaucratic Congestion (T3): AI allows anyone to generate a large volume of unique, seemingly reasonable public opinion submissions at nearly zero cost, paralyzing the institution’s processing capability.
Cognitive Flooding (T4): The cost of generating credible content is far lower than the cost of verifying and correcting it, drowning the information ecosystem.
At the “institutional accountability” end, there are two threats as well:
Unreviewable Authority (T5): The opacity, scale, and access barriers of AI decision-making collectively crush existing oversight mechanisms.
Norm Centralization (T6): When governments procure advanced AI models, the developers’ value constraints come along with the models into public infrastructure, effectively transferring normative power from elected officials to a few developers.
Finally, Power Concentration (T7) runs through every link: AI simultaneously replaces human labor and participation in the economic, ideological, political, and military domains, weakening the leverage citizens use to check the system.
Historically, the concentration of power in one domain has usually been balanced by countervailing forces in others; what makes AI unique is that it can weaken civic leverage across all domains at once.
Audrey Tang contributes several key passages to the paper, arguing that rather than passively defending against the institutional shocks AI brings, it is better to fundamentally redesign the framework for participatory governance.
In response to bureaucratic congestion (T3), Tang proposes “structured deliberation platforms” as an alternative. These platforms use dimensionality-reduction techniques to aggregate public opinion, letting consensus emerge rather than letting the loudest voices dominate. Because participants vote on existing statements instead of freely submitting text, the system structurally rewards the aggregation of positions over divisive discourse, making it more resistant to flooding by synthetic content than open comment systems (a sketch of this aggregation step follows below).
Paired with sortition (randomly selected citizen panels), which establishes identity through “selection” rather than “self-nomination,” this makes large-scale impersonation structurally difficult.
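To make the aggregation step concrete, here is a minimal sketch in the spirit of platforms like Pol.is (used in Taiwan’s vTaiwan process): reduce each participant’s agree/disagree vote vector to a low-dimensional “opinion space,” cluster participants into opinion groups, and surface the statements that draw support across groups. The synthetic vote matrix, the cluster count, and the 0.6 consensus threshold are all illustrative assumptions, not the paper’s specification.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical vote matrix: rows = participants, columns = statements,
# entries in {-1, 0, 1} for disagree / pass / agree. Synthetic data.
votes = rng.choice([-1, 0, 1], size=(200, 12))
votes[:, 0] = 1                            # statement 0: broad agreement
votes[:100, 1], votes[100:, 1] = 1, -1     # statement 1: polarizing

# Step 1: dimensionality reduction — each participant's vote vector
# becomes a point in a 2-D opinion space.
coords = PCA(n_components=2).fit_transform(votes.astype(float))

# Step 2: cluster participants into opinion groups.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)

# Step 3: report statements supported by a majority of EVERY group —
# consensus surfaces; polarizing statements are filtered out.
for s in range(votes.shape[1]):
    support = [(votes[groups == g, s] == 1).mean() for g in (0, 1)]
    if min(support) > 0.6:                 # consensus threshold (arbitrary)
        print(f"statement {s}: support per group = "
              f"{support[0]:.2f}, {support[1]:.2f}")
```

Because attackers can only vote on existing statements, flooding the system at worst shifts cluster sizes; it cannot inject a million new texts that staff must each read.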
In response to cognitive flooding (T4), Tang cites a practical case: the “Humor Over Rumor” strategy during Taiwan’s COVID-19 pandemic, in which government agencies produced verified content within minutes of identifying misinformation, competing with fake news on speed and shareability rather than on removal.
In response to norm centralization (T6), Tang points out that the emerging research on “collective constitutional AI” has demonstrated that through deliberative processes, a representative public sample can draft an AI constitution, and the resulting models perform comparably on safety metrics while exhibiting less bias than baselines designed by developers.
The key is that this process should be federated, allowing different political systems to reasonably derive different normative priorities; a single constitution should not foreclose such variation.
The most concrete case in the paper appears in recommendation R7 (invest in deliberative governance infrastructure for AI).
In 2024, deepfake advertisements impersonating public figures spread widely on social media. Taiwan’s Ministry of Digital Affairs convened 447 randomly selected citizens for online discussions across 44 virtual deliberation rooms, and an AI dialogue engine synthesized their proposals the same day. Rather than taking a content-moderation approach, this citizens’ assembly focused on “regulating actors and actions”: joint platform liability for unauthorized deepfake ads, mandatory labeling of unsigned advertisements, and throttling of non-compliant services.
The resulting deepfake ban won cross-party support and led to a 94% decrease in impersonation ads within a year.
The paper pairs the core risks with 7 corresponding recommendations (R1 to R7).
The paper also directly addresses two common rebuttals. The first rebuttal is the belief that “society will adapt to AI.” However, the paper points out that while AI concentrates economic rents, it simultaneously erodes the political and organizational capacities that institutional self-correction relies on, and the speed of damage accumulation may outpace adaptation.
The second rebuttal is the idea that “AI alignment with society is sufficient.” The paper agrees that alignment is necessary but points out that certain failure modes (such as congestion attacks due to cost asymmetries and the weakening of civic leverage caused by labor substitution) will still occur even under perfect model alignment.
The conclusion of the paper mentions that institutional resilience does not need to be built from scratch; current civic tech initiatives have already proven that structured deliberation and participatory governance can operate at a national scale. However, deploying these tools in AI governance remains an open research challenge.