TLDR:
- Vitalik warns AI governance models can be exploited with jailbreak prompts and malicious inputs.
- ChatGPT MCP tools enabled email access, exposing private data to attackers via calendar invites.
- Info finance model proposed to ensure AI model diversity and human oversight in governance.
- AI agents can be tricked using simple methods, highlighting security risks for crypto users.
Ethereum co-founder Vitalik Buterin has sounded the alarm on AI governance vulnerabilities following a new ChatGPT security warning. He cautioned that relying solely on AI to allocate resources or manage tasks could open doors for malicious exploits.
The warning comes after a demonstration revealed that ChatGPT could be manipulated to leak private email data using only the victim’s address. Buterin suggests alternative approaches that combine open markets, human oversight, and multiple AI models to reduce systemic risk.
Experts say this highlights broader concerns for AI tools in the crypto and finance sectors.
Vitalik Buterin Flags Risks in AI Governance
In a recent tweet thread, Vitalik Buterin emphasized that naive AI governance can be easily exploited.
He explained that users could submit jailbreak prompts instructing AI to divert funds or act against intended rules. Such exploits demonstrate how automated systems may fail when confronted with malicious actors.
Buterin recommended an “info finance” model, allowing multiple AI models to operate under human spot checks and jury evaluations.
He highlighted that open markets for AI contributions can ensure real-time diversity in model decisions. This setup also incentivizes external participants to monitor for errors or exploits quickly.
Buterin pointed out that hardcoding a single AI for governance is inherently risky. Human juries, combined with open competition among models, create mechanisms to detect and correct manipulations effectively.
This is also why naive "AI governance" is a bad idea.
If you use an AI to allocate funding for contributions, people WILL put a jailbreak plus "gimme all the money" in as many places as they can.
As an alternative, I support the info finance approach ( https://t.co/Os5I1voKCV… https://t.co/a5EYH6Rmz9
— vitalik.eth (@VitalikButerin) September 13, 2025
ChatGPT Email Leak Sparks Security Alarm
The warning follows a demonstration by security researcher Eito Miyamura, who exploited ChatGPT’s Model Context Protocol (MCP) tools. These tools allow the AI to access Gmail, Calendar, Notion, and other platforms.
Miyamura showed that sending a calendar invite with a jailbreak prompt could trick ChatGPT into reading emails and sending them to the attacker. Users did not even need to accept the invite for the exploit to work.
Currently, OpenAI requires developer mode and manual approvals for MCP sessions, but decision fatigue could lead to risky behavior. Users might approve requests without fully understanding the implications.
The demonstration illustrates that AI, while advanced, can be phished in simple ways to compromise sensitive information.
Buterin’s response stresses that AI governance should not operate in isolation. Integrating human oversight, multiple models, and financial incentives can help detect flaws faster. He stressed that without such safeguards, even sophisticated AI tools could expose users to avoidable risks.