Captcha Systems and Webmentions

created Sep 3, 2018

Even though I rarely receive a Webmention (remote comment) on my website, I occasionally wonder about bots and spam related to Webmentions.

In my opinion, the Webmention is the only acceptable commenting system for a personal website. It's enough of a barrier to deter most human-based flamers, trolls, and spammers.

Users must create their replies on one of their own web presences. This would be considered the Source post. Users could create their webmention replies on their own personal websites or any public facing social media service. If users create their webmention replies on Facebook, then their posts cannot be restricted by privacy controls. The webmention replies must be publicly viewable in order for the webmention to be accepted.

When a commenter creates a webmention reply post, the post must include the URL of the post that the commenter is replying to. This other URL is called the Target URL.
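To make the Source and Target relationship concrete, here is a minimal Python sketch of the notification a sender submits once the reply post is published. It assumes the target site's webmention endpoint has already been discovered; build_webmention_request and the example URLs are hypothetical names used only for illustration. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint: str, source: str, target: str) -> Request:
    """Build (but do not send) a form-encoded POST carrying source and target."""
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical URLs: the reply lives on the commenter's site (source)
# and points at the post being replied to (target).
request = build_webmention_request(
    "https://you.example/webmention",
    source="https://me.example/replies/1",
    target="https://you.example/posts/42",
)
```

Passing the request to urllib.request.urlopen would actually send it.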

If the target URL is missing from the webmention reply post, then the target site will reject the webmention, according to the webmention spec.
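The check described above could be sketched roughly as follows in Python. This is a simplified illustration, not the code any particular site runs: it assumes the source page has already been fetched as HTML, and LinkCollector and source_mentions_target are hypothetical names. It only looks at href and src attributes for an exact match.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href and src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.add(value.strip())


def source_mentions_target(source_html: str, target_url: str) -> bool:
    """Return True only if the source HTML links to exactly target_url."""
    collector = LinkCollector()
    collector.feed(source_html)
    return target_url in collector.links
```

A receiver would reject the webmention whenever this returns False. A real implementation would also need to fetch the source safely (timeouts, size limits, redirects) before running the check.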

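According to the spec, the same source URL counts only once against the same target URL. If a commenter resubmits the same pair, the target site should update the existing webmention rather than add a new one.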
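To keep a repeated source and target pair from piling up as multiple comments, a receiver might key its stored webmentions by that pair. This is a minimal in-memory Python sketch; MentionStore and its methods are hypothetical names, and overwriting the stored content on a repeat (instead of rejecting it) is an assumption of the sketch.

```python
class MentionStore:
    """In-memory store holding at most one mention per (source, target) pair."""

    def __init__(self):
        self._mentions = {}

    def save(self, source: str, target: str, content: str) -> bool:
        """Store the mention. Return True if the pair is new, False if it replaced one."""
        key = (source, target)
        is_new = key not in self._mentions
        self._mentions[key] = content
        return is_new

    def count_for(self, target: str) -> int:
        """Number of distinct sources that have mentioned this target."""
        return sum(1 for (_, t) in self._mentions if t == target)
```

This only stops duplicates of one pair. It does nothing against a bot that generates thousands of distinct source pages.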

A nefarious actor could create a program that would make a list of every web page URL of the target site. Then the program could create thousands of HTML pages, and each of these source HTML pages would contain the list of URLs of the target site. Finally, the program would make thousands of webmention posts to every URL of the target site. Even doing this to only one URL of the target site would be irritating for the target site owner.

The above scenario requires some effort by the nefarious actor. Maybe the default webmention behavior is enough of a barrier to prevent this from occurring. Flooding one target URL page, however, seems more doable.

I should create a new test website that uses my Wren code and then create such a program to test. The bot program may try to submit thousands of webmentions as fast as the target site can handle them. The target site could implement some kind of throttling feature: it might accept a webmention only once every 60 seconds, or at a longer interval. Throttling could also be restricted to once every 5 minutes per domain name.

My test website is a message board, based entirely on webmentions. I limit receiving webmention posts to once every 60 seconds in total and once every five minutes per domain name. That still requires server-side code to execute and reject the webmention.
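The two limits described above could be sketched like this in Python. It is an illustration under stated assumptions, not the code running on the test site: WebmentionThrottle is a hypothetical name, state is kept in memory in a single process, and the domain is taken from the source URL's hostname.

```python
import time
from urllib.parse import urlsplit


class WebmentionThrottle:
    """Accept at most one webmention per global interval and one per domain interval."""

    def __init__(self, global_interval=60.0, domain_interval=300.0, clock=time.monotonic):
        self.global_interval = global_interval  # seconds between any two accepted mentions
        self.domain_interval = domain_interval  # seconds between mentions from one domain
        self.clock = clock
        self._last_any = None
        self._last_by_domain = {}

    def allow(self, source_url: str) -> bool:
        """Return True and record the time if this webmention may be accepted."""
        now = self.clock()
        domain = (urlsplit(source_url).hostname or "").lower()
        if not domain:
            return False  # no usable hostname in the source URL
        if self._last_any is not None and now - self._last_any < self.global_interval:
            return False  # global limit: too soon after the last accepted mention
        last = self._last_by_domain.get(domain)
        if last is not None and now - last < self.domain_interval:
            return False  # per-domain limit
        self._last_any = now
        self._last_by_domain[domain] = now
        return True
```

Rejected attempts are not recorded, so a flood does not push the next allowed time further out. The per-domain dictionary grows without bound in this sketch, and the check itself still costs a request and some server-side work for every rejected webmention.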

Even if the target site has barriers in place to eliminate flooding, the site could still become hard to reach, because simply making a webmention endpoint available exposes the server to CPU load from a flood of requests.

The IndieWeb has proposed Vouch as a means of reducing spam, flooding, etc. for a website that accepts webmentions.

The target site would need to implement the Vouch protocol or spec. The problem with Vouch is that it's such a high barrier that it would eliminate nearly all commenting possibilities for users who are not already in the clique of IndieWeb users. Vouch may be useful in niche situations, but I don't consider it workable across something like a "blogosphere."

At the IndieWeb chat, I have never seen anyone suggest a captcha setup as a deterrent to webmention spam and abuse. I'll have to search the IndieWeb wiki some more.

Sometimes, the idea of webmention moderating is suggested. I think that a few IndieWeb users do moderate their webmentions.

Vouch, moderating, throttling, captcha, etc. all sort of fall under the admin tax for managing a personal website even if the site owner uses a CMS-hosted solution. It's possible that some day, if not now, a hosted website could offer webmentions and moderating webmentions.

But if site owners desire to build a community on their websites by allowing some kind of commenting system, then they will have to accept some kind of admin-like tax for managing the comments or webmentions.

Many IndieWeb users do not accept webmentions for various reasons: they have no desire to engage in discussions on their personal websites; they don't want to create the code, or they don't know how to create the code, to accept, parse, and display webmentions; they don't want to rely on a third-party service to accept webmentions, which requires target site owners to use JavaScript to display the webmentions; they don't want their webmention endpoints to be attack vectors for spam and flooding; or they outsource their discussions to their Twitter accounts without backfeeding to their personal websites.

The only other possible commenting alternative that I would consider is email.

Here are a few captcha-related links. I like the idea of text-based captchas, but these could be subverted more easily by programs.

- Inaccessibility of CAPTCHA