The internet already runs on hashing technology
A common myth is that child sexual abuse material (CSAM) detection technologies use unprecedented techniques to identify content online. In fact, hash matching is the most tried and tested way of detecting known content, from malware and viruses to images and videos of child sexual abuse.
Hashing algorithms such as MD5 and SHA-256 are foundational tools used across digital systems worldwide. They help verify file integrity, secure passwords, synchronise databases and ensure that systems can compare data quickly and accurately without exchanging the files themselves.
At its simplest, hashing converts a file into a fixed-length mathematical signature that is, for practical purposes, unique to that file. If two signatures match, the underlying files can be treated as identical. That principle is not controversial and underpins much of modern computing.
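As a minimal sketch of this principle, the Python snippet below uses the standard library's hashlib module to compute SHA-256 signatures. The byte strings are made-up stand-ins for file contents, and the helper name sha256_hex is hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 signature of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the contents of three files.
original = b"example file contents"
exact_copy = b"example file contents"
altered = b"example file contents!"

# Identical bytes always produce the identical signature.
same = sha256_hex(original) == sha256_hex(exact_copy)

# Changing even a single byte produces a different signature.
different = sha256_hex(original) != sha256_hex(altered)
```

Note that a cryptographic hash like this only matches exact copies: any change to the file, however small, changes the signature.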
Hashing is used in CSAM detection to identify illegal images and videos that are already known to child protection organisations or law enforcement. The system does not view personal content. In essence, hash matching asks: “is this number in the database?” If it is, we know the content is an image of a child being sexually abused. Hashing is central to the efforts of the IWF, and many other hotlines, to tackle the spread of CSAM.
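The “is this number in the database” question can be sketched as a simple set lookup. This is an illustration only, not how any particular hotline or platform implements matching: the hash list here is built from placeholder bytes, and the names known_hashes and is_known are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 signature of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical hash list. In a real deployment this would be supplied by a
# hotline or similar body; here it is built from placeholder bytes.
known_hashes = {sha256_hex(b"placeholder for a known item")}


def is_known(upload: bytes, hash_list: set) -> bool:
    """Return True if the upload's signature appears in the hash list.

    Only the signature is compared; the content itself is not inspected.
    """
    return sha256_hex(upload) in hash_list


match = is_known(b"placeholder for a known item", known_hashes)
no_match = is_known(b"an unrelated holiday photo", known_hashes)
```

In this sketch, content that is not on the list produces no match and nothing further is learned about it.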
These technologies are not new
Another myth is that CSAM detection technologies are immature or unreliable. However, perceptual hashing technologies, which have been designed to identify near-duplicate images even if they have been cropped, resized or slightly altered, have existed for decades.
Technologies such as PhotoDNA were developed specifically to combat the persistent recirculation of known abuse imagery. They can recognise images even if they have been modified to evade detection.
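To illustrate the general idea of perceptual hashing, the sketch below implements a simple “average hash”. This is not PhotoDNA, whose algorithm is not described here; it is a much simpler, well-known technique swapped in for illustration. It assumes the image has already been reduced to a small grid of greyscale values, and the pixel grids and the threshold are invented for the example.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image's mean, else 0.

    `pixels` is a 2D list of greyscale values (0-255), assumed to be an
    image that has already been shrunk to a small fixed-size grid.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical 4x4 greyscale image: dark on the left, bright on the right.
image = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]

# The same image, slightly brightened (a small alteration).
brightened = [[min(255, value + 12) for value in row] for row in image]

# A different image: bright on top, dark on the bottom.
different = [
    [200, 210, 205, 215],
    [200, 210, 205, 215],
    [10, 20, 15, 25],
    [10, 20, 15, 25],
]

THRESHOLD = 2  # hypothetical: at most this many differing bits counts as a match

altered_distance = hamming_distance(average_hash(image), average_hash(brightened))
unrelated_distance = hamming_distance(average_hash(image), average_hash(different))
```

Here the brightened copy stays within the threshold while the unrelated image does not, which is the behaviour that lets perceptual hashes recognise modified copies where an exact hash would fail.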
Again, this is not new in the wider technology ecosystem. Similar forms of matching, detection and upload prevention are already used extensively in cybersecurity. Anti-virus software identifies malicious files using signature-based detection. Web browsers compare URLs against databases of harmful websites to protect users from phishing attacks and malware.
We accept these protections because they help keep digital environments safe. The principle behind CSAM detection is no different.
Why the myths persist
Misunderstanding exists partly because conversations about online safety often become entangled with broader anxieties about privacy, encryption, artificial intelligence, state surveillance and so-called “chat control”. Those are legitimate debates. Too often, however, nuanced technical discussions are replaced by slogans and worst-case hypotheticals. These obscure the reality of how detection systems work, and how safeguards can be enforced to ensure the technology does what it was designed to do.
“CSAM detection” is often spoken about as though it were a single technology. It isn’t – the ecosystem includes different tools designed for different purposes:
- detection of known CSAM through hash matching;
- detection of previously unknown abuse imagery;
- behavioural signals related to grooming and solicitation;
- reporting and moderation systems;
- human review and safeguarding processes.
In the video, I compare this to a “Swiss Cheese” model – just as individual slices of Swiss cheese have holes, no single detection method is perfect. However, when multiple distinct safeguards are stacked together, the “holes” are covered, preventing harmful content slipping through the system.
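The arithmetic behind the “Swiss Cheese” model can be sketched with a few lines of Python. The miss rates below are invented purely for illustration and do not describe any real system, and the calculation assumes the layers fail independently of one another, which real safeguards may not.

```python
from math import prod


def combined_miss_rate(miss_rates):
    """Probability that an item slips past every layer.

    Assumes each layer misses independently of the others, so the
    individual miss probabilities multiply.
    """
    return prod(miss_rates)


# Hypothetical per-layer miss rates, chosen only to illustrate the idea.
layers = {
    "hash matching": 0.30,
    "detection of unknown imagery": 0.20,
    "user reporting": 0.50,
}

# Each layer alone misses a lot; stacked, far less gets through all three.
overall = combined_miss_rate(layers.values())
```

With these made-up figures, the weakest single layer misses half of all items, yet only around 3% would pass through all three layers together.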
Public debate also overlooks the sheer scale of the problem. Every year, organisations like the Internet Watch Foundation assess and action vast quantities of child sexual abuse material. Those numbers reflect the limits of human capacity, not the true scale of abuse material circulating online. The amount detected is not the same as the amount that exists.