created Sep 3, 2018
Even though I rarely receive a Webmention (remote comment) on my website, I occasionally wonder about bots and spam related to Webmentions.
In my opinion, the Webmention is the only acceptable commenting system for a personal website. It's enough of a barrier to deter most human-based flamers, trolls, and spammers.
Users must create their replies on one of their own web presences. This would be considered the Source post. Users could create their webmention replies on their own personal websites or any public-facing social media service. If users create their webmention replies on Facebook, then those posts must not be restricted by privacy controls. The webmention replies must be publicly viewable in order for the webmention to be accepted.
When a commenter creates a webmention reply post, the post must include the URL of the post that the person is replying to. This other URL is called the Target URL.
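As a rough illustration of what gets submitted, the Webmention spec has the sender POST two form-encoded parameters, source and target, to the target site's webmention endpoint. The sketch below only builds that request and does not send it. The function name and all three URLs are made-up placeholders, and endpoint discovery is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint, source, target):
    """Build (but do not send) the POST request for one webmention."""
    # The body carries exactly two form-encoded fields: source and target.
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URLs, for illustration only.
req = build_webmention_request(
    "https://target.example/webmention",
    "https://commenter.example/replies/1",
    "https://target.example/posts/42",
)
```

Passing `req` to `urllib.request.urlopen` would actually send it.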
If the target URL is missing from the webmention reply post, then the target site will reject the webmention, according to the webmention spec.
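A minimal sketch of that check, assuming the target site has already fetched the source page as an HTML string. The class and function names are hypothetical, and a real receiver might also want to normalize URLs before comparing them. This version looks only for an exact match in href and src attributes.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every href and src attribute value found in a document."""

    def __init__(self):
        super().__init__()
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.add(value)


def source_mentions_target(source_html, target_url):
    """Return True if the source document links to target_url exactly."""
    collector = LinkCollector()
    collector.feed(source_html)
    return target_url in collector.urls
```

A webmention whose source page fails this check would be rejected. Note that a page that only mentions the target URL as plain text, without a link, does not pass.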
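According to the spec, a source URL and target URL pair identifies a single webmention. If a commenter submits the same source URL to the same target URL again, the target site should update the existing webmention instead of adding a second one, so the same reply cannot pile up as duplicates on one page.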
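One way a receiver might enforce one mention per source and target pair is to key stored mentions on that pair, so a resubmission overwrites the earlier entry rather than adding a second one. This is an in-memory sketch with hypothetical names, not how any particular webmention implementation stores its data.

```python
from datetime import datetime, timezone


class MentionStore:
    """Keep at most one stored mention per (source, target) pair."""

    def __init__(self):
        self._mentions = {}

    def save(self, source, target, content):
        """Store a mention. Return True if it is new, False if it replaced one."""
        key = (source, target)
        is_new = key not in self._mentions
        self._mentions[key] = {
            "content": content,
            "received": datetime.now(timezone.utc),
        }
        return is_new

    def count_for(self, target):
        """Number of distinct source URLs that mention this target."""
        return sum(1 for (_, t) in self._mentions if t == target)
```

This stops one source page from appearing twice, but it does nothing against thousands of different source pages.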
A nefarious actor could create a program that would make a list of every web page URL of the target site. Then the program could create thousands of HTML pages, and each of these source HTML pages would contain the list of URLs of the target site. Finally, the program would make thousands of webmention posts to every URL of the target site. Even doing this to only one URL of the target site would be irritating for the target site owner.
The above scenario requires some effort by the nefarious actor. Maybe the default webmention behavior is enough of a barrier to prevent this from occurring. Flooding one target URL page, however, seems more doable.
I should create a new test website that uses my Wren code and then write such a program to test it. The bot program might try to submit thousands of webmentions as fast as the target site can handle them. To counter that, the target site could implement some kind of throttling: it might accept webmentions only once every 60 seconds, or at a longer interval, and it could further restrict each domain name to one webmention every 5 minutes.
My test website http://kleete.com is a message board, based entirely on webmentions. I limit receiving webmention posts to once every 60 seconds in total and once every five minutes per domain name. That still requires server-side code to execute and reject the webmention.
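The two limits could be sketched roughly like this. This is not the code running on my test site. The class name is hypothetical, the state lives only in memory, and I am assuming that "per domain name" means the host name of the source URL.

```python
import time
from urllib.parse import urlparse


class WebmentionThrottle:
    """Accept at most one webmention per global interval and per source domain."""

    def __init__(self, global_interval=60.0, domain_interval=300.0,
                 clock=time.monotonic):
        self.global_interval = global_interval    # seconds between any two accepts
        self.domain_interval = domain_interval    # seconds between accepts per domain
        self._clock = clock                       # injectable so it can be tested
        self._last_accepted = None
        self._last_by_domain = {}

    def allow(self, source_url):
        """Return True and record the time if this webmention may be accepted."""
        now = self._clock()
        domain = urlparse(source_url).hostname
        if not domain:
            return False  # no usable host name in the source URL
        if (self._last_accepted is not None
                and now - self._last_accepted < self.global_interval):
            return False  # too soon after the last accepted webmention
        last = self._last_by_domain.get(domain)
        if last is not None and now - last < self.domain_interval:
            return False  # this domain was accepted too recently
        self._last_accepted = now
        self._last_by_domain[domain] = now
        return True
```

Rejected requests do not reset either timer, so a flood cannot lock out everyone else indefinitely. The check itself still runs on every request, though, which is the cost mentioned above.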
Even if the target site has barriers in place to stop flooding, simply making a webmention endpoint available exposes the site to load: a flood of requests could consume enough CPU to limit access to the site.
The IndieWeb has proposed Vouch as a means of reducing spam, flooding, etc. for a website that accepts webmentions.
The target site would need to implement the Vouch protocol or spec. The problem with Vouch is that it's such a high barrier that it would nearly eliminate all commenting possibilities by users who were not already in the clique of IndieWeb users. Vouch may be useful in niche situations, but I don't consider it workable across something like a "blogosphere."
At the IndieWeb chat, I have never seen anyone suggest a captcha setup as a deterrent to webmention spam and abuse. I'll have to search the IndieWeb wiki some more.
Sometimes, the idea of webmention moderating is suggested. I think that a few IndieWeb users do moderate their webmentions.
Vouch, moderating, throttling, captcha, etc. all fall under the admin tax for managing a personal website, even if the site owner uses a CMS-hosted solution. It's possible that some day, if not now, a WordPress.com hosted website could offer webmentions and moderating webmentions.
But if site owners desire to build a community on their websites by allowing some kind of commenting system, then they will have to accept some kind of admin-like tax for managing the comments or webmentions.
The only other possible commenting alternative that I would consider is email.
If the email comment contains no link to the reply user's post, then the comment is a private comment, and most likely, no part of the reply email gets included on any page of my website.
If the reply user wishes to have his or her comment included at the bottom of one of my posts, then the reply user needs to include in the email a web link to the user's reply post (source URL), and that web reply post must contain the URL of the post on my site (target URL) that the user is replying to. Then I can decide whether to include all or excerpts of the reply post on my article page that's being replied to. On my article page, I would include a link to the source URL. This is obviously a manual process, but it's lo-fi. Email systems are mature and offer users methods of blocking and filtering bad email messages. And the target site owner does not need to implement any new code.
Here are a few captcha-related links. I like the idea of text-based captchas, but these could be subverted more easily by programs.
w3.org - Inaccessibility of CAPTCHA