Uh oh. This post became a long rant.
Hacker News comment from Jan 14, 2020, with my emphasis added. I did not write this comment.
People who choose to turn off JS are excluding themselves. JS is part of the web platform, and there are tons of amazing things it allows you to do.
More from the comment:
Back to the comment:
Similar private actions occur when accessing websites for banking, health care, tax prep, etc. These are applications that are accessed over the web, maybe unfortunately, but the interactions are private, hopefully. These web apps are not a part of the web of documents.
I started personal publishing on the web in 2001, and since then I have used HTML's simple textarea box to perform a ton of creates and updates to web posts. I still rely on the textarea box often for quick creates and updates.
But I added an auto-update feature to Tanager that accesses my CMS's API endpoint. Tanager sends and receives JSON. The default auto-update interval is every five minutes, but I can modify that timing within Tanager.
Anyway, back to the HN comment:
Now the main quote:
Our atrociously designed local newspaper website https://toledoblade.com is murdering the web.
I still believe that https://text.npr.org is the best designed media website.
The Blade should provide logged-in subscribers with a slightly enhanced version of text.npr.org's design. To hide its content behind a hard paywall, the Blade's server-side software would have to verify that the reader is logged in (a paying customer), which requires cookies.
Then the Blade's server-side system either retrieves the HTML file (article) from the file system outside of the document root and sends it to the logged-in reader, or the Blade's server-side software dynamically creates the article page on the fly, probably by pulling info from a database, and sends it to the reader's web browser.
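In rough Python, the first of those two approaches (serving a stored HTML file from outside the document root, but only to a logged-in reader) might look like this sketch. The session store, cookie name, and article directory are my assumptions, not the Blade's actual setup:

```python
from pathlib import Path

# Hypothetical session store mapping session-cookie values to subscriber IDs.
SESSIONS = {"abc123": "subscriber-42"}

# Articles live outside the web server's document root.
ARTICLE_DIR = Path("/var/articles")

def serve_article(cookies, slug):
    """Return (status, body) for a paywalled article request."""
    if SESSIONS.get(cookies.get("session")) is None:
        return 403, "Subscribers only."
    article = ARTICLE_DIR / (slug + ".html")
    if not article.exists():
        return 404, "No such article."
    return 200, article.read_text()
```

No JavaScript is involved anywhere: the cookie check and the file read happen on the server, and the reader receives a finished HTML document.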
In my opinion, both methods of sending content to readers qualify as a web of documents because, in simple terms, I'm reading articles.
I love this October 2019 article.
Here's the January 2020 HN thread that pointed to that post.
The author of the Web of Documents post suggested no cookies and no POST requests. I suppose that a user login form could use a GET request, but in order to maintain a login session, a cookie or cookies would be needed.
For READERS of sawv.org, yes, the restrictions outlined by the Web of Documents author would apply and work fine.
But unless toledoblade.com switches its funding model to something like how public radio works, where the content is free and open to all because some people donate money, the Blade will need to put its content behind a paywall.
Public radio, however, also receives taxpayer money, I think, and that should not occur with our local "newspaper" websites.
Fastmail is a web application. toledoblade.com should be a web of documents for logged-in subscribers, but that's not the case with the Blade's current web design.
My seasonal blog http://toledowinter.com that I operated for a few winters is a web of documents even though no static HTML files exist.
If program code needs to be executed on the client-side in order to display a document, then it's obviously a web app. I don't want to execute code in the web browser to display a document that contains mainly text. The rendering should have occurred on the server.
For toledowinter.com, a READER will receive a web page that the Nginx web server pulled from Memcached. If the page is not cached, then the Nginx web server will execute my server-side code, which pulls content from the database and applies a template to create the web page dynamically, and that page of content gets sent to the reader. If the page was dynamically generated, then the web page gets stuffed into Memcached, which means that if a reader refreshed the page, then the reader would see the cached version. If I update the page's contents, then the updated page gets stuffed into Memcached.
Using the caching server means that my server-side code executes less often, which means that the database is accessed less often, and the resulting page is downloaded faster.
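That read-through flow can be sketched in a few lines of Python. A plain dict stands in for Memcached here, and render_page is a placeholder for the real template-and-database work:

```python
cache = {}  # stands in for Memcached in this sketch

def render_page(path):
    """Placeholder for the real database query + template step."""
    return "<html><body>Article at %s</body></html>" % path

def get_page(path):
    page = cache.get(path)        # try the cache first
    if page is None:
        page = render_page(path)  # cache miss: do the expensive work once
        cache[path] = page        # stuff the finished page into the cache
    return page

def update_page(path):
    cache[path] = render_page(path)  # on edit, refresh the cached copy
```

Every request after the first one skips render_page entirely, which is why the server-side code executes less often and the database is touched less often.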
But all of that still qualifies toledowinter.com to support a web of documents, in my opinion, because pages can be accessed from command line utilities and then optionally converted to text/plain files.
curl http://toledowinter.com/7769/three-more-days-of-mild-temps-then-normal-temps | html2text -style pretty
The html2text program is an old utility that for some reason does not preserve the whitespace created by HTML paragraph tags. The resulting text shows paragraphs with no spacing between them. But it's still readable plain, raw text.
lynx --dump http://toledowinter.com/7769/three-more-days-of-mild-temps-then-normal-temps
That command displays the plain, raw text better. Paragraph spacing is preserved.
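The difference between the two tools comes down to whether the converter emits a blank line at each paragraph boundary. A minimal sketch of a converter that does, using Python's standard html.parser module (the class name and sample markup are mine):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text, inserting a blank line at each paragraph boundary."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("\n\n")  # preserve paragraph spacing

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts).strip()

parser = TextExtractor()
parser.feed("<p>Mild temps today.</p><p>Normal temps return Friday.</p>")
print(parser.text())
```

Drop the handle_starttag method and the paragraphs run together with no spacing, which is roughly what the old html2text output looks like.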
Of course, I can view toledowinter.com and websites that support the web of documents within text-based web browsers, such as Lynx, links, elinks, w3m, etc. And I can view the site within a limited GUI-based web browser, such as NetSurf.
The top of toledowinter.com's homepage that contains the image and site title looks wonky within NetSurf because of the CSS that I'm using, but the main content on the homepage still loads fine.
I like the idea that if the web content is not cURL-able, then that website does not support the open web.
Here's a tough one. Does cjr.org support the open web or the idea of the web of documents? The answer is half and half. cjr.org definitely uses an idiotic web design for a website that contains mainly text-based information. It's ridiculous.
If I view the article within NetSurf, then the page is blank, except for the orange bar.
If I view the article within elinks, then the text displays. I can read the article easily.
But for some reason, that same cjr.org article produces a "403 Forbidden" error message when I try to access it within the Lynx web browser. It also errors out when I use Lynx from the command line with the --dump option.
But this works:
curl https://www.cjr.org/politics/drudge-report-trump.php | html2text -style pretty
I can read the text fine after executing the above command.
Strange. Actually, it's absurd on cjr.org's part.
https://politico.com is another example of how modern web design is ruining the web. It should be a website that supports the web of documents, but it's not, or it only halfway supports the web of documents.
I wish that NetSurf contained an option to disable styling, like Firefox does.
The lynx --dump command works with the above Politico article page. That command produces easily readable plain text.
Modern web design is ruining the open web and/or the web of documents.
Back to our old, wretched friend toledoblade.com, which uses a wonderfully hideous modern web design.
This is a small editorial piece.
It's text. The editorial contains a little over 300 words and one large useless image. I know this because of how the Blade CMS constructs its web pages.
When I open Firefox in private mode and access the above editorial, part of the page displays, and then I receive a message about disabling my ad blocker. ???
It doesn't matter. The Blade's article content is not stored within HTML tags, such as the venerable <p> tag.
I'm a Blade paying customer, but I do not use any of the Blade's poor digital products, which includes the Blade's website and their apps.
I created my own web setup to read Blade articles.
Using the lynx --dump command to access the above Blade editorial produces text that only contains information about the website's navigation. No article content, of course.
The Blade newspaper has existed since the 1800s. It's stunning that the Blade has failed to display text on the web.
Organizations, such as the Blade, should be forced to provide the public or at least paying customers quarterly security and privacy audits of their websites.
webpagetest.org results for the above Blade editorial that contains about 350 words.
From: Dulles, VA - Chrome - Cable
1/31/2020, 4:16:37 PM
First View Fully Loaded:
Download time: 11.425 seconds
Web requests: 405 !!!
Bytes downloaded: 2,864 KB
Actually, "only" 2.8 megabytes for 350 words is small for the Blade and other newspaper websites today, which is sad.
The text/plain version of War and Peace is 3.2 megabytes. The printed version of that Tolstoy book contains over 1,000 pages.
My Blade web reading app displays content to me in a manner that is a slightly enhanced version of text.npr.org. And if I desire, I can use Firefox's and Safari's reader mode capability.
Only I use my Blade web reading app. Here's the webpagetest.org results for the same Blade editorial that is displayed humanely by my code.
From: Dulles, VA - Chrome - Cable
1/31/2020, 4:24:08 PM
First View Fully Loaded:
Download time: 0.595 seconds
Web requests: 2
Bytes downloaded: 4 KB
The minimal CSS that I use is contained within the HTML output that is dynamically generated on my server. 100 percent of the downloaded bytes went for HTML.
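A sketch of how such a self-contained page could be generated (the function name and style rules are illustrative placeholders, not the actual code and CSS from my server):

```python
def render_article(title, body_paragraphs):
    """Build one self-contained HTML page: CSS inlined, zero external requests."""
    style = "body { max-width: 40em; margin: 2em auto; font-family: serif; }"
    paras = "\n".join("<p>%s</p>" % p for p in body_paragraphs)
    return ("<!DOCTYPE html>\n<html><head><title>%s</title>\n"
            "<style>%s</style></head>\n<body><h1>%s</h1>\n%s\n</body></html>"
            % (title, style, title, paras))
```

One HTML response carries everything: no external stylesheet, no fonts, no scripts, so the browser makes one or two requests instead of hundreds.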
That's how a web page, an article from a media org, should display for paying customers.
A usable web of documents is easy to create. Maybe that's the problem. It's too easy. We need a complex web to justify the need for some tech people.
When I access the Blade's content, I'm looking to be informed and not abused.
I like this January 2020 post.
More comments from the above HN thread.
anyone who browses that way should expect that many sites won't work and will need to be manually whitelisted.
I never whitelist sites that don't work without JS unless the site is actually critical for some reason (doing so is too risky). I expect that this means some parts of the web effectively no longer exist for people like me, and accept that, but I wonder if the authors of these badly engineered pages really know that they're excluding people.
Rendering markdown client-side seems like a fine way to implement a web page.
Requiring readers to execute arbitrary code in order to read content seems like a terrible way to implement a web page.
Nor is it cheap: it requires every single reader to execute the same code, burning CPU over and over and over when it could be done once for all readers, by the server.
Yes, some people choose to browse with JS disabled, but anyone who browses that way should expect that many sites won't work and will need to be manually whitelisted.
Yes, you can require execute privileges in order to publish content, but anyone who publishes that way should expect that many people won't read what he writes.
JS can be a highly supported tool, but it isn't itself a primitive use-case. It is an enabler of use-cases and, to OP's point, JS for the sake of loading JS isn't useful if the content of the site really didn't need it.
How is JS related to a page with only text and hyperlinks?
Please render your markdown on your server and not my client. You're wasting my battery, man. Shame on you!
Hah. Reminds me of these posts.
Back to comments excerpted from the HN thread.
JS search is not generally a critical site function.
Rendering the goddamned content is.
If you can't at a minimum give me a title, byline, dateline, main body text and/or some level of summary or description of non-textual content (as with graphics, audio, video, or interactive elements), then you're failing.
(SPAs or web applications should at least provide context for understanding what the application is/does. I'm not calling for all functionality to be rendered in HTML, but sufficient context to determine WTF the site is about.)
Your "but I cannot implement search" is a strawman, and really doesn't address the core complaint.