AI text detectors aren't working. Is regulation the answer?

Tools developed to stamp out misconduct have been shown to be biased and inaccurate. Will AI creators themselves be forced to do it better?

Published on August 9, 2023
Last updated August 10, 2023
A man walks along Brighton Beach using a metal detector (Source: Getty Images)

More regulation could make it easier to detect whether academic writing has been generated by artificial intelligence, amid concerns that tools created for this purpose suffer from low accuracy rates and built-in biases.

Universities worldwide have embraced AI detectors to combat the rising concern that the likes of ChatGPT and its successor GPT-4 can help students cheat on assignments, although many remain wary as a growing body of evidence shows that the tools struggle in real-world scenarios.

In one paper, researchers based at universities across Europe concluded that "the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text". This followed another paper that showed that students whose second language was English were being disproportionately penalised because their vocabularies were more limited than those of native English speakers.

A third paper, from academics at the University of Maryland, confirmed the inaccuracy concerns and found that detectors could easily be outwitted by students using paraphrasing tools to rewrite text initially generated by large language models (LLMs).

One of that study's authors, Soheil Feizi, assistant professor of computer science, said the flaws in the tools had already had a "real-world impact", with many cases of students suffering "trauma" after being falsely accused of misconduct.

"The issue is that the 'AI detection camp' is quite powerful and is successful in muddying the water: they often evaluate their detection accuracy under unrealistic or very specific scenarios and don't report the full spectrum of false positive and detection rates," he added.
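
To make the trade-off Dr Feizi describes concrete, the sketch below (illustrative Python with made-up scores, not the output of any real detector) computes a false positive rate and a detection rate side by side at several thresholds; reporting only one of the two numbers can make a weak detector look strong.

```python
# Minimal sketch of detector evaluation (hypothetical data).
# A detector assigns each document a score in [0, 1], where higher
# means "more likely AI-generated". Both error rates matter: the
# false positive rate (human text wrongly flagged as AI) and the
# detection rate (AI text correctly flagged).

def evaluate(human_scores, ai_scores, threshold):
    """Return (false_positive_rate, detection_rate) at one threshold."""
    false_positives = sum(s >= threshold for s in human_scores)
    true_positives = sum(s >= threshold for s in ai_scores)
    return (false_positives / len(human_scores),
            true_positives / len(ai_scores))

# Hypothetical scores for essays of known provenance.
human_scores = [0.05, 0.20, 0.35, 0.55, 0.70]   # human-written
ai_scores = [0.40, 0.60, 0.75, 0.85, 0.95]      # AI-generated

# Report the full trade-off, not a single cherry-picked point.
for threshold in (0.3, 0.5, 0.7, 0.9):
    fpr, dr = evaluate(human_scores, ai_scores, threshold)
    print(f"threshold={threshold:.1f}  FPR={fpr:.0%}  detection={dr:.0%}")
```

In this toy example, the strictest threshold produces no false accusations but catches only one in five AI-written texts, while lenient thresholds do the opposite; a fair evaluation has to report both numbers across the whole range.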

One of the detectors Dr Feizi tested was the model created by OpenAI, the company behind ChatGPT, which was recently shelved in a move that many viewed as evidence that detection could not be done.

Turnitin, whose detector generally scored higher than most in the studies but did not prove infallible, recently revealed that its tool has already been used 65 million times.

Annie Chechitelli, the company's chief product officer, said the product was helping to maintain "fairness and consistency in classrooms" but was still "evolving"; the next step was to help educators better understand the numbers the detector produces and what they might indicate.

Swansea University was not yet using Turnitin's detector, according to Michael Draper, a professor of legal education who also serves as the university's academic integrity director.

He said he had "mixed feelings" about detection. "If you use a detection tool as a primary means of evidence when accusing a student of committing misconduct, then you are on a hiding to nothing," he said.

"But I think using it as a first step is legitimate. You can then have an exploratory conversation with a student in relation to their submission. Some may volunteer they have used AI, or it will become clear they can't adequately explain how they have arrived at their answer."

Professor Draper said universities should consider asking students to submit a "research trail" alongside their final draft to show their workings, which could form part of the assessment.

"These things can also be fabricated, but it is still a useful extra step in detection," he said. "Anyway, it would be beneficial for students to develop this skill."

AI detection was not going to go away, however, according to Professor Draper, who pointed to a recent voluntary commitment made in the US by many of the major companies creating LLMs to develop "robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system".
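
The commitment does not specify how watermarking would work, but one scheme discussed in recent research has the generator softly prefer a pseudo-random "green list" of tokens seeded by the preceding token; a detector re-derives the lists and tests whether green tokens appear far more often than chance allows. The Python below is an illustrative sketch of the detection side only, with a toy hash-based partition as an assumption, not any company's actual mechanism.

```python
import hashlib
import math

def green_list(prev_token, vocab, fraction=0.5):
    """Pseudo-randomly split the vocabulary, seeded by the previous token.
    A watermarking generator would softly boost these 'green' tokens;
    the detector re-derives the same split without access to the model."""
    green = set()
    for tok in vocab:
        digest = hashlib.sha256(f"{prev_token}|{tok}".encode()).digest()
        if digest[0] < fraction * 256:
            green.add(tok)
    return green

def watermark_z_score(tokens, vocab, fraction=0.5):
    """z-score for 'more green tokens than chance would predict'.
    Large positive values suggest watermarked text; values near zero
    are consistent with ordinary, unwatermarked writing."""
    n = len(tokens) - 1  # number of (previous token, token) pairs scored
    hits = sum(tokens[i] in green_list(tokens[i - 1], vocab, fraction)
               for i in range(1, len(tokens)))
    expected = fraction * n
    std = math.sqrt(fraction * (1 - fraction) * n)
    return (hits - expected) / std
```

Because verification needs only the seeding scheme, anyone could check a text cheaply. The catch, as the Maryland study showed for detectors generally, is that paraphrasing swaps tokens and steadily erodes the statistical signal.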

This, he said, would likely be followed by regulation if adequate detection methods were not produced voluntarily, in a "turning of the tide" against companies that "have a vested commercial interest in not having detection".

"There is increasing recognition that we need to have the ability to differentiate between AI- and human-written text for a number of ethical and legal reasons. It is in everyone's interest long term to know if something is AI generated or not," Professor Draper said.

"Some people say detection will never keep up. That's true when it's an independent company trying to second-guess what will happen next, but when you have a commitment from the AI companies themselves to create a means of detection, you are on a much stronger wicket."

Savvy and determined students would find ways around watermarking, but another issue was the blurring of the line between AI and human writing as chatbots become embedded in everyday programs, according to Mike Sharples, emeritus professor at the Open University's Institute of Educational Technology.

For example, "Copilot", Microsoft's soon-to-launch AI assistant, promises to be able to "shorten, rewrite or give feedback" on a user's written work.

"Rather than generating an entire essay with AI, students will just press the 'continue' button or equivalent when they get stuck," said Professor Sharples.

"Or use it to rewrite a section, or to suggest references. AI will become part of the workflow. It will become increasingly difficult for AI detectors to call out these 'AI-assisted' student assignments."

tom.williams@timeshighereducation.com

Reader's comments (5)

Like with elections, maybe there is something to be said for having to turn up on the day with pen and paper.

It's all very well looking at technical 'solutions' but surely it is more important that students learn that it is wrong to cheat? Students need clear and robust information as to what they should and should not be doing. This won't stop the truly dishonest, of course, but many students are penalised because they are not aware of all the nuances... last year I found that one student penalised for plagiarism was useless at paraphrasing, and I sat with them for a whole afternoon showing them how to rewrite source material in their own words.

Anyone with a reasonable English vocabulary can defeat Turnitin. Lord's Prayer: "The Good Lord is my shepherd, he leads me beside quiet waters". My version: "A wonderful deity is the keeper of my flock, he guides me to peaceful lakes and rivers". Not even God himself could detect any plagiarism there. Just hand out essay titles that require local study, e.g. of supermarkets, field boundaries, houses, towns, whatever you are teaching. Now see the students find something they can plagiarise on that.

What gives away those that are poorly researched and don't know what they are talking about is the context of their writing. For example, those claiming to have rewritten a part of the Lord's Prayer when actually it's a verse from the Psalms.

No, it is not the answer. There are a multitude of other things wrong with HE that could do with better regulation; students using ChatGPT is not one of them. Academics should wise up and think of better ways of assessing students, if indeed assessments are necessary.
