When Unicode Characters Masquerade as ASCII

June 4, 2025

Curl founder and lead developer Daniel Stenberg suggests methods for “Detecting Malicious Unicode.” The advice comes after human reviewers missed look-alike characters that had been swapped in for regular letters. We learn:

“In a recent educational trick, curl contributor James Fuller submitted a pull-request to the project in which he suggested a larger cleanup of a set of scripts. In a later presentation, he could show us how not a single human reviewer in the team nor any CI job had spotted or remarked on one of the changes he included: he replaced an ASCII letter with a Unicode alternative in a URL. This was an eye-opener to several of us and we decided we needed to up our game.”

Since such swaps cannot be detected by human eyeballs alone, special software is needed. Stenberg found GitHub’s abilities lacking, though apparently the organization is on the case. Fellow curl dev Victor Szakats found Gitea at least highlights “ambiguous Unicode characters,” but Stenberg wanted more than that. So he made a detection tool himself. He writes:

“We have implemented checks to help us poor humans spot things like this. To detect malicious Unicode. We have added a CI job that scans all files and validates every UTF-8 sequence in the git repository. In the curl git repository most files and most content are plain old ASCII so we can “easily” whitelist a small set of UTF-8 sequences and some specific files, the rest of the files are simply not allowed to use UTF-8 at all as they will then fail the CI job and turn up red. … The next time someone tries this stunt on us it could be someone with less good intentions, but now ideally our CI will tell us.”

Ideally. We think if these swaps are being identified by "researchers," cybersecurity vendors need to address the issue.

Cynthia Murrell, June 4, 2025

Comments

Got something to say?





  • Archives

  • Recent Posts

  • Meta