TLDR
- Reddit filed a lawsuit against Perplexity AI and three data scraping companies (Oxylabs, AWMProxy, and SerpApi) for allegedly stealing copyrighted user posts to train AI models.
- The lawsuit claims these entities masked their identities and disguised their scrapers to bypass Reddit’s protections and extract user content without permission.
- Perplexity denied the allegations and called Reddit’s lawsuit “extortion,” stating it only summarizes and cites public Reddit discussions rather than training AI models on the content.
- Reddit has been monetizing its user data through licensing agreements with companies like OpenAI and Google, with AI licensing deals making up nearly 10% of Reddit’s revenue as of February.
- This is Reddit’s second AI-related lawsuit after suing Anthropic in June, as the platform works to assert control over its user-generated content.
Reddit launched a lawsuit against artificial intelligence company Perplexity on Wednesday in New York federal court. The social media platform accused Perplexity of illegally scraping user posts to power its AI search engine.
The complaint also named three other defendants. These include Lithuanian data scraper Oxylabs, AWMProxy described as a former Russian botnet, and Texas-based startup SerpApi.
Reddit alleged these companies helped Perplexity collect its copyrighted content. The lawsuit claims they masked their identities and disguised their web scrapers as regular users to bypass Reddit’s technological protections.
Perplexity operates an AI-powered search engine that competes with Google and ChatGPT. The company denied Reddit’s allegations and accused the platform of extortion and opposing an open internet.
SerpApi told CNBC it strongly disagrees with the claims and plans to defend itself in court. CNBC was unable to reach Oxylabs and AWMProxy for comment.
Ben Lee, Reddit’s Chief Legal Officer, said AI companies are locked in an arms race for quality human content. This pressure has created what he called an “industrial-scale data laundering economy.”
The Battle Over User Data
Reddit hosts over 100,000 interest-based subreddit communities. The lawsuit states that Reddit user posts have become the most commonly cited source for AI-generated answers on Perplexity.
Reddit sent Perplexity a cease-and-desist letter about the alleged scraping. After receiving the letter, Perplexity increased the volume of citations to Reddit by forty times, according to the lawsuit.
AI researchers have noted that Reddit’s moderated conversations help AI chatbots produce more natural-sounding responses. The platform contains a large volume of authentic human discussions on various topics.
Reddit has worked to monetize its data pool through licensing agreements. The company has signed AI-related licensing deals with OpenAI and Google’s parent company Alphabet.
Perplexity Defends Its Practices
Perplexity posted a response on Reddit itself defending its actions. The company stated it does not train AI models on content but only summarizes and cites public Reddit discussions.
Perplexity claimed it is impossible to sign a license agreement given its business model. The company said that a year ago, Reddit demanded payment despite Perplexity lawfully accessing public Reddit data.
The AI company described the lawsuit as a show of force in Reddit’s training data negotiations with Google and OpenAI. Perplexity argued that Reddit is using litigation to strengthen its position in licensing talks.
Data licensing has become an important revenue source for Reddit. In February, Reddit’s COO Jen Wong told Adweek that AI licensing deals with Google and OpenAI made up nearly 10% of the company’s revenue.
This marks Reddit’s second AI-related lawsuit. The platform sued AI startup Anthropic in June over similar data scraping concerns.