Rise of ChatGPT Could Threaten Future AI Development, New Study Warns

As people rely more on AI tools like ChatGPT, human-generated content online is shrinking, posing potential risks for the future training of advanced AI models.

A recent study from Corvinus University of Budapest highlights a surprising paradox in the age of artificial intelligence: the success of AI chatbots like ChatGPT may be undermining the very foundation needed for training future AI systems. Led by Associate Professor Johannes Wachs and colleagues from University College London (UCL) and LMU Munich, the research shows a significant drop in human-generated content on platforms like Stack Overflow, a go-to knowledge hub for programmers, since the release of ChatGPT.

AI’s Dependence on Human Knowledge

Large Language Models (LLMs) like ChatGPT are trained on vast datasets of human-generated content, drawing on conversations, articles, discussions, and other online resources to learn language patterns, facts, and problem-solving approaches. This data is essential for LLMs to develop responses that feel natural and informative. However, with more people turning to ChatGPT and similar AI tools to answer questions, the need to ask and answer questions on forums and other online platforms is decreasing, resulting in fewer human-generated contributions online.

Professor Wachs’ team studied trends on Stack Overflow to assess how ChatGPT is changing online user behaviour. The results are striking: since the arrival of ChatGPT, fewer questions and answers have been posted by humans, and fewer people visit the site. The drop in activity means there is less human-generated data to help train future AI models, potentially setting up a vicious cycle in which AI tools rely increasingly on AI-generated data, which may degrade the quality of future models.

Decline in Content Creation: What the Numbers Reveal

Stack Overflow has long been a vital resource for programmers worldwide, providing solutions to coding problems and a space for discussion. Yet, since ChatGPT became available, many users are choosing it over the traditional Q&A format, leading to a marked drop in new contributions.

In their study, Wachs and his colleagues observed that this decline is not only in the number of questions and answers posted but also in the frequency of visits. As reliance on AI increases, there is less motivation for users to create new content, leading to a reduction in the variety and richness of data available for AI training. Professor Wachs explains that this decline could have far-reaching consequences for the future of AI.

“The decreased production of open data will limit the training of future models,” says Professor Wachs. “LLM-generated content itself is likely an ineffective substitute for training data generated by humans. Training an LLM on LLM-generated content is like making a photocopy of a photocopy, providing successively less satisfying results.”

The Quality Conundrum: Why AI Needs Human-Generated Content

LLMs are designed to learn from human interactions to produce responses that are accurate, insightful, and contextually appropriate. Human-generated content is rich in natural variation, including errors, unique perspectives, and cultural context, all of which help AI models grasp how language and knowledge are actually used. In contrast, AI-generated content tends to lack the originality and authenticity found in human discourse.

When AI models are trained on AI-generated data, they may start to produce responses that are increasingly repetitive, generic, or even misleading. Professor Wachs compares this process to “making a photocopy of a photocopy,” where the quality degrades with each iteration. Over time, this could result in AI models that struggle to offer the depth of insight or creativity that comes naturally to humans.
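
The photocopy analogy can be illustrated with a toy simulation. The sketch below is not the study's method and does not train a language model. It assumes a deliberately simple stand-in: "training" means estimating the mean and spread of a small dataset, and "generating" means drawing new samples from a normal distribution with those estimates. The function name, sample size, and number of generations are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_recursive_training(sample_size=10, generations=300, seed=0):
    """Toy model: each generation learns only from the previous one's output.

    'Training' here just means estimating a mean and a spread; 'generating'
    means sampling from a normal distribution with those estimates.
    Returns the spread (standard deviation) of the data at each generation.
    """
    rng = random.Random(seed)

    # Generation 0: a stand-in for human-written data, drawn from N(0, 1).
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = [statistics.pstdev(data)]

    for _ in range(generations):
        # "Train" on the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        # "Generate" the next dataset from the fitted model alone.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        spreads.append(statistics.pstdev(data))

    return spreads


if __name__ == "__main__":
    spreads = simulate_recursive_training()
    for g in (0, 10, 50, 100, 300):
        print(f"generation {g:3d}: spread = {spreads[g]:.6f}")
```

In this toy setting, small estimation errors compound from one generation to the next, and the spread of the data tends to shrink towards zero. That is a rough numerical analogue of each copy losing detail. Real LLM training is far more complex, so the sketch shows only the direction of the effect, not its size.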

Why Stack Overflow and Platforms Like It Are Critical

Platforms such as Stack Overflow are indispensable for AI development because they offer diverse, high-quality datasets created by people solving real-world problems. Programmers discussing code, troubleshooting issues, and sharing innovative solutions make Stack Overflow a unique repository of knowledge. The platform’s Q&A format, coupled with human oversight through voting and moderation, produces highly valuable data for AI training that can’t easily be replaced by content generated by LLMs.

When people choose ChatGPT over posting questions on Stack Overflow, this wealth of new knowledge is lost to a closed-loop system. With fewer human questions and solutions, the diversity and accuracy of the dataset available for training future models could suffer, diminishing the quality of future AI responses.

The Broader Implications: Could AI Tools Change the Internet as We Know It?

This trend poses an existential question for the internet itself. If people increasingly turn to AI tools for knowledge, what will happen to platforms built on human interaction, content creation, and knowledge sharing? Websites like Stack Overflow, Wikipedia, and Quora could see diminished engagement as more people rely on AI, leading to less dynamic and diverse online ecosystems.

As human interaction declines, the internet could become more of an echo chamber, with AI systems “learning” from each other in a self-referential cycle. Without fresh human perspectives, AI may not be able to provide the same quality of insights and creativity, particularly as it becomes reliant on data generated by earlier versions of itself.

A Call to Preserve Human Contribution in the Age of AI

In light of their findings, Professor Wachs and his colleagues call for a renewed focus on encouraging human interaction and knowledge exchange on online platforms. Maintaining a rich flow of human-generated content is essential, they argue, to ensure that AI models have the diverse and robust datasets needed for high-quality training.

This could mean developing policies or incentives to promote continued participation on forums and other knowledge-sharing sites, even as AI technology becomes more prevalent. Encouraging users to actively participate in online discussions and create content is vital to maintaining the quality of data available for AI and the diversity of perspectives that only humans can provide.

Looking to the Future: AI and Human Collaboration

Ultimately, the study highlights a crucial aspect of AI’s relationship with humanity: AI can enhance human productivity and offer new tools, but it also relies heavily on human input to maintain its relevance and accuracy. In this sense, AI and human knowledge creation are interdependent, with each playing a vital role in the other’s development.

To ensure that AI systems continue to improve, the research implies that society must actively support human-generated knowledge sources and keep them as essential parts of our digital lives. Rather than replacing human interaction, AI should be seen as a complement to it, offering support while still relying on human originality and problem-solving skills.

Balancing AI Use and Human Engagement Online

The findings from Corvinus University suggest that to sustain AI development, we need to strike a balance between using AI tools and continuing to engage in human-driven knowledge sharing. This could involve:

Education Campaigns: Informing users of the importance of contributing to knowledge-sharing sites, not just relying on AI answers.
Platform Incentives: Creating incentives for users to continue contributing to platforms like Stack Overflow and Quora, ensuring a steady flow of fresh human-generated content.
Policy Support: Developing regulations that promote the use of diverse datasets and discourage over-reliance on AI-generated data in model training.
Collaborative AI Models: Designing AI systems that actively encourage and even require human participation, fostering a more symbiotic relationship between AI and human users.

A Shared Responsibility for a Balanced Digital Future

As we continue to integrate AI into daily life, we have a collective responsibility to ensure that it serves as a tool that enhances, rather than replaces, human knowledge creation. By maintaining spaces for human discussion, questions, and creativity, we not only support AI development but also preserve the diversity, authenticity, and vibrancy of the internet.

Corvinus University’s research raises an important call to action: if we want AI to remain a valuable asset, we must actively nurture the human side of digital spaces. As Professor Wachs and his team emphasise, it’s essential that we recognise the value of human-generated content—not just as a training resource for AI, but as an irreplaceable part of our digital ecosystem.

To find out more about this study or the impact of AI on knowledge sharing, or to speak with Professor Wachs, please contact BlueSky Education on +44 (0) 1582 790709 or kyle@bluesky-pr.com