Why even AI finds it difficult to determine whether a text was written by AI

As text-generating artificial intelligence tools continue to spread, educational institutions, businesses and consumers want to know whether what they are reading was written by a human or a machine.

Establishing rules for the use of AI-generated content is relatively straightforward. Enforcing them, however, requires something far more difficult: the reliable detection of whether a text has been written by artificial intelligence, according to The Conversation.

Research shows that some people – especially those who regularly use AI tools – can accurately identify such texts. Under controlled conditions, groups of human evaluators can even outperform automated tools. However, this expertise is not widespread, and individual judgement often proves inconsistent. For this reason, organisations that require large-scale and uniform assessment turn to automated detectors.

The problem of detecting AI-generated texts

The basic logic of detection is simple: a text is fed into a tool (often itself AI-based), which assigns it a probability score indicating how likely it is to have been machine-generated. This score is then used for further decisions, such as the imposition of penalties.
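The two-stage logic described above can be sketched in a few lines of Python. The detector here is a placeholder stub, and the flagging threshold is an assumed policy choice, not part of any real tool:

```python
# Minimal sketch of the detection pipeline: a detector returns a
# probability that a text is machine-generated, and a policy layer turns
# that score into a decision. The detector below is a hypothetical stub,
# not a real model.

def detector_score(text: str) -> float:
    """Hypothetical detector: returns P(text is AI-generated) in [0, 1].
    A real detector would be a trained classifier; this stub only
    illustrates the interface."""
    return 0.5  # placeholder value for demonstration

def decide(score: float, flag_threshold: float = 0.9) -> str:
    """Map a probability score to an action. The threshold is an assumed
    policy parameter; choosing it trades false alarms against misses."""
    return "flag_for_review" if score >= flag_threshold else "no_action"

print(decide(0.95))  # scores above the threshold are flagged
print(decide(0.40))
```

Note that the output is a probability, not a verdict: everything downstream, including penalties, rests on where the threshold is set.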

In practice, however, this process relies on critical assumptions. Is it known which AI tools may have been used? Is there access to them? How large is the text sample? Is it a single piece of writing or a collection of texts over time? The answers to these questions largely determine what a detection tool can and cannot conclude.

Much also depends on whether the AI system that generated the text under examination deliberately embedded markers to facilitate later detection. So-called watermarks are signals invisible to the human reader, allowing anyone with the appropriate "key" to verify the origin of the text. This method, however, requires the cooperation of AI providers and is not always available.
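To make the idea concrete, here is a toy illustration of keyed watermark verification, loosely inspired by "green-list" watermarking schemes. Real schemes bias the model's token probabilities at generation time; this sketch only shows the verification side, and the hashing rule is an invented simplification:

```python
# Toy watermark verification: a keyed hash splits word pairs into a
# "green" set. Ordinary text lands in the green set about half the time;
# a watermarked generator would push the fraction well above that, which
# only the key holder can check. Illustrative sketch, not a real scheme.
import hashlib

def is_green(prev_word: str, word: str, key: str) -> bool:
    """A word is 'green' if a keyed hash of (previous word, word) is even.
    Without the key, the green set looks random to an observer."""
    digest = hashlib.sha256(f"{key}:{prev_word}:{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str, key: str) -> float:
    """Fraction of word transitions that fall in the keyed green set."""
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(a, b, key) for a, b in zip(words, words[1:]))
    return hits / (len(words) - 1)
```

An unwatermarked text should score near 0.5; a fraction far above that would be strong statistical evidence of the watermark, but only for someone holding the key.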

How AI detection tools work

The most common approach is based on the use of artificial intelligence itself. Large datasets of texts labelled as either human-written or AI-generated are collected, and a model is then trained to distinguish between the two. The process is similar to email spam filtering: the tool compares a new text with those it has examined in the past and decides which category it most closely matches. This method can work even without knowledge of the specific AI tool that produced the text, provided the training data is sufficiently diverse.
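The training-based approach can be sketched with a miniature naive Bayes classifier. The tiny labelled corpora below are invented placeholders; a real detector would train on large, diverse datasets and richer features:

```python
# Sketch of the training-based approach: learn word statistics for each
# labelled class, then classify a new text by which class's distribution
# fits it best (naive Bayes with add-one smoothing). The corpora are
# invented examples, not real training data.
import math
from collections import Counter

def train(texts):
    """Word counts and total word count for one class."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts, sum(counts.values())

def log_likelihood(text, model, vocab_size):
    """Smoothed log-probability of the text under a class model."""
    counts, total = model
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in text.lower().split())

human_texts = ["i honestly loved the quirky ending", "my dog ate my notes again"]
ai_texts = ["in conclusion it is important to note",
            "furthermore it is important to consider"]

vocab = {w for t in human_texts + ai_texts for w in t.lower().split()}
human_model, ai_model = train(human_texts), train(ai_texts)

def classify(text):
    h = log_likelihood(text, human_model, len(vocab))
    a = log_likelihood(text, ai_model, len(vocab))
    return "ai" if a > h else "human"

print(classify("it is important to note the following"))  # → ai
```

As the article notes, this only generalises while new texts resemble the training data, which is exactly the weakness discussed below.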

When, however, there is access to the language models themselves that generate text, a different strategy can be used: searching for statistical patterns linked to the way specific models produce language. If a model assigns an unusually high probability to an exact sequence of words, this may indicate that it generated the text itself. Finally, in the case of watermarked texts, the process shifts from detection to verification, based on information that does not derive solely from the text itself.
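The statistical strategy can also be sketched. Real detectors use the candidate model's own token probabilities (e.g. perplexity); here an invented bigram table stands in for the model, and the threshold is an illustrative assumption:

```python
# Sketch of likelihood-based detection: score how probable the candidate
# model finds the exact word sequence. Text the model generated tends to
# score unusually high. The bigram probabilities below are invented
# stand-ins for a real model's token probabilities.
import math

BIGRAM_LOGPROB = {  # assumed probabilities of the model under test
    ("the", "results"): math.log(0.20),
    ("results", "indicate"): math.log(0.15),
    ("indicate", "that"): math.log(0.30),
}
UNSEEN_LOGPROB = math.log(1e-4)  # fallback for pairs the model rarely produces

def avg_logprob(text: str) -> float:
    """Average per-transition log-probability under the toy model."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    return sum(BIGRAM_LOGPROB.get(p, UNSEEN_LOGPROB) for p in pairs) / len(pairs)

def likely_from_model(text: str, threshold: float = math.log(0.01)) -> bool:
    """Flag texts the model finds implausibly easy to predict.
    The threshold is an illustrative choice, not a calibrated value."""
    return avg_logprob(text) > threshold

print(likely_from_model("the results indicate that"))    # → True
print(likely_from_model("purple monkey dishwasher soup"))  # → False
```

The approach only works with access to the scoring model, which is why it breaks down for proprietary or frequently updated systems.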

The limitations of AI detection tools

None of these approaches is definitively superior. Training-based detectors lose accuracy when new texts diverge from their training data, which quickly becomes outdated as new AI models emerge. Continuously updating datasets is costly, and detectors inevitably lag behind the systems they are meant to identify.

Statistical methods depend on assumptions about how specific models operate or on access to their internal information. When models are proprietary or change frequently, these methods become unreliable. Watermarking, finally, requires cooperation from AI companies and is only effective when it has been enabled from the outset.

As society adapts to generative artificial intelligence, the rules and detection techniques will continue to evolve. However, it must be accepted that these tools will never be completely infallible.