Wikipedia Grapples With Chatbots: Should It Allow Their Use For … – Techdirt

from the questions,-questions dept

There have been various chapters in the still-unfolding story of large language models (LLMs). First, people were amazed that systems like ChatGPT could write a sonnet about bananas in the style of Shakespeare, and in just a few seconds. Soon, though, they realized that chatbots' replies might be grammatically correct, but they were frequently peppered with false information that the system simply made up, often with equally fake references. We're now at the stage where many are starting to think through the deeper implications of using LLMs, with all their powers and flaws, and how they will affect current working (and living) practices. As a post on the Vice site explains, one group grappling with this issue is the Wikipedia community:

During a recent community call, it became apparent that there is a community split over whether or not to use large language models to generate content. While some people expressed that tools like OpenAI's ChatGPT could help with generating and summarizing articles, others remained wary.

Wikipedia already has a draft policy on how LLMs can be used when writing Wikipedia entries. The draft provides an excellent summary of some of the key problems of using chatbots, many of which will be faced by people in other domains. Here are the main points from the basic guidance section:

Do not publish content on Wikipedia obtained by asking LLMs to write original content or generate references. Even if such content has been heavily edited, seek other alternatives that don't use machine-generated content.

You may use LLMs as a writing advisor, i.e. asking for outlines, asking how to improve paragraphs, asking for criticism of text, etc. However, you should be aware that the information they give you can be unreliable and flat-out wrong. Use due diligence and common sense when choosing whether to incorporate the LLM's suggestions or not.

You may use LLMs for copyediting, summarization, and paraphrasing, but note that they may not properly detect grammatical errors or keep key information intact. Use due diligence and heavily edit the response. Don't hesitate to ask the LLM to correct deficiencies such as missing information in a summary or an unencyclopedic (e.g., promotional) tone.

You are responsible for making sure that using an LLM will not be disruptive to Wikipedia.

You must denote that an LLM was used in the edit summary.

LLM-created works are not reliable sources. Unless their outputs were published by reliable outlets with rigorous oversight, they should not be cited in our articles.
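
To make the copyediting guidance above concrete, here is a minimal sketch of the kind of assisted workflow it permits, using OpenAI's Python client. The model choice, prompt, and example text are illustrative assumptions on my part, not anything prescribed by the draft policy:

# A sketch of policy-compliant LLM use for copyediting: the model
# suggests a fix, and a human editor reviews it before anything is
# published. Assumes the openai Python package (v1+) with an API key
# in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

draft = "Teh Eiffel Tower, located in Paris, are one of most visited monument."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "You are a copyeditor. Fix grammar and spelling only; "
                    "do not add, remove, or change any facts."},
        {"role": "user", "content": draft},
    ],
)

suggestion = response.choices[0].message.content
print(suggestion)  # a human must still verify this against the sources

Note that the system prompt narrows the task to surface fixes; the draft's warning that models may not "keep key information intact" is exactly why the human review step at the end is not optional.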

It would be foolish to try to forbid Wikipedia contributors from using chatbots to help write articles: people would use them anyway, but would try to hide the fact. A ban would also be counterproductive. LLMs are simply tools, just like computers, and the real issue is not whether to use them, but how to use them properly. The guidelines listed above essentially amount to yes, you can use chatbots to help you write and improve your writing, but they should not be relied upon unquestioningly. That means human input and checking afterwards are indispensable. Also important is flagging up that LLMs were used in some way, so that users of Wikipedia know where information is coming from, and can be alert to possible problems arising from this fact.
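
On Wikipedia, that flagging happens in the edit summary. As a rough sketch of what disclosure looks like when editing programmatically, here is the pywikibot library saving a page with such a summary; the page title and summary wording are illustrative, and pywikibot needs a configured, authenticated account to run:

# Hypothetical example: appending reviewed, LLM-assisted text to a
# sandbox page while disclosing the tool in the edit summary.
# Assumes pywikibot is installed and configured (user-config.py).
import pywikibot

site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Wikipedia:Sandbox")  # illustrative target

page.text = page.text + "\n\nReviewed, LLM-assisted addition goes here."
page.save(summary="Copyedit with LLM assistance (GPT-3.5); manually reviewed")

The same disclosure works for manual edits, of course; the point is simply that the provenance of the text gets recorded where other editors will look for it.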

The Wikipedia draft policy concentrates on how LLMs' output might be used to create material for Wikipedia entries. The Vice article points out that there is another question, about whether there should be restrictions on how LLMs can use Wikipedia entries as part of the machine learning process:

The [Wikipedia] community is also divided on whether large language models should be allowed to train on Wikipedia content. While open access is a cornerstone of Wikipedia's design principles, some worry the unrestricted scraping of internet data allows AI companies like OpenAI to exploit the open web to create closed commercial datasets for their models. This is especially a problem if the Wikipedia content itself is AI-generated, creating a feedback loop of potentially biased information, if left unchecked.

That concern seems overblown. Low-quality training materials can cause chatbots to produce questionable or downright harmful outputs. An obvious way to counter that would be to encourage the use of high-quality input that has undergone some kind of fact checking. Wikipedia is one of the best and largest sources of such material, and in hundreds of languages. Provided the final Wikipedia policy on LLMs requires human checks on chatbot output, as proposed in the draft, the use of Wikipedia articles for training LLMs should surely be encouraged with the aim of making chatbots better for everyone.
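
It is worth noting how low the barrier already is: Wikipedia publishes full dumps, and preprocessed snapshots are a few lines of Python away. As a minimal sketch, assuming the Hugging Face datasets library and one of the Wikimedia snapshots it hosts (the date and language code below are just one published configuration):

# Load a preprocessed English Wikipedia snapshot from the Hugging
# Face Hub as raw training text; other dates and language editions
# are available. Assumes the datasets package is installed.
from datasets import load_dataset

wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train")

article = wiki[0]
print(article["title"])
print(article["text"][:300])  # raw article text, ready for a training pipeline

# Wikipedia's text is licensed CC BY-SA, so reuse carries attribution
# and share-alike obligations; that licensing tension is part of what
# worries the community about closed commercial datasets.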

Follow me @glynmoody on Mastodon.

Filed Under: chatbots, chatgpt, llms, wikipedia
Companies: openai
