The Moral Valence of Technical Decisions
This essay originally appeared on secure scuttlebutt at %3k6qAo85Q/1hjMW6xc3S0MNt+PsBCM00S354HeXOUco=.sha256
I just spent like an hour fiddling with a (very short) script in order to get the RSS it generates to work in somebody’s preferred RSS reader — one that, I guess, was using a standard XML parser (and therefore couldn’t handle ampersands in URLs) and that could only support RFC 822 dates (for some reason).
I have no idea if it worked. I can’t run their RSS reader, and all the readers I have can read this feed just fine. And when I search for documentation on the RSS format, the examples I see of well-formed RSS often contain the very things these clients apparently reject.
Basically: why is RSS so broken?
The answer, I think, is that it was created with a webtech mentality.
On the one hand, RSS uses XML — which, on top of being absurdly verbose and fiddly to read and write (and difficult to write an efficient or reliable parser for, and subject to strange failure modes), is not at all suited to an application whose primary purpose is to transmit URLs. This is because an extremely common character in URLs, the ampersand, must be escaped (one character expanding into five). Meanwhile, standard HTML escape sequences aren’t technically part of XML, so XML parsers won’t necessarily handle them! So, for the titles I automatically process out of the HTML of the linked documents, I would probably need to find the HTML escape sequences and use an exhaustive lookup table to replace them with equivalent XML escape sequences.
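The two problems above are easy to demonstrate. A minimal Python sketch (the URL and title here are invented for illustration): XML’s predefined entities cover only `&`, `<`, `>`, `'`, and `"`, so an HTML named entity like `&copy;` has to be unescaped to plain text before the string can be re-escaped for XML.

```python
from xml.sax.saxutils import escape
import html

# An ampersand in a URL balloons from one character to five in XML.
url = "https://example.com/feed?user=a&page=2"
print(escape(url))  # https://example.com/feed?user=a&amp;page=2

# HTML named entities are not XML entities: unescape the HTML first,
# then re-escape for XML, or an XML parser may choke on them.
title = "Fish &amp; Chips &copy; 2023"
plain = html.unescape(title)
print(plain)          # Fish & Chips © 2023
print(escape(plain))  # Fish &amp; Chips © 2023
```

Python’s `html.unescape` is, in effect, the “exhaustive lookup table” mentioned above: it knows every HTML named entity.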
On the other hand, no RSS client or RSS-generating application does any of this work until somebody complains, because webtech culture loves Postel’s law — which, put into practical terms, really means “just wing it, try to support whatever trash other applications emit, and if somebody can’t handle the trash you emit then it’s their problem”. No point in actually specifying the date format for RSS — everybody should just guess, and if some client can’t handle every possible date format then fuck ’em (unless your valuable user happens to prefer that client).
RSS could be made sensible. Imagine keeping the RSS structure but transmitting it over JSON: section names would be repeated half as often (with far fewer separators), and only quotes, backslashes, and control characters would ever need to be escaped. Imagine, furthermore, that we said that only ISO 8601 dates could be used. Suddenly, this becomes a format a child could bitbang and parse, and there are basically no barriers to using existing JSON parsers and generators either.
I mean, RSS is a product of its time. I understand why RSS didn’t use JSON — because JSON didn’t exist yet. But RSS could have used… TSV. Or CSV. Or a line-based key-value format with section separators. Using XML for anything should have been immediately and obviously a bad idea to any professional developer, even in the late ’90s, considering just how many problems one immediately encounters in trying to work with it.
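A line-based key-value format of the sort gestured at above can be parsed in a few lines. This particular format is invented here for illustration (tab-separated pairs, blank line between items), not any real standard:

```python
# One "key<TAB>value" pair per line; a blank line separates items.
raw = """\
title\tFirst post
link\thttps://example.com/posts/1
date\t2023-05-01

title\tSecond post
link\thttps://example.com/posts/2
date\t2023-05-02
"""

# The whole parser: split on blank lines, then on the first tab.
items = [
    dict(line.split("\t", 1) for line in block.splitlines())
    for block in raw.strip().split("\n\n")
]
print(items[0]["link"])  # https://example.com/posts/1
```

No escaping is ever needed unless a value itself contains a tab or newline — which URLs, titles, and dates essentially never do.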
I blame Dave Winer for a number of things about the state of tech (generally speaking, the embrace of webtech is the fault of false statements Winer made on his blog in the mid-’90s), but I also blame Dave Winer for inventing RSS and deciding to transport it over XML — a decision that probably doomed the format when, less than a decade later, the XML-fanboy-amateurs became actual professional developers and wept at the sight of what they, in the folly of their youths, had wrought. XML could not have had the success it did without being aided and abetted by people who ought to have known better.
Technology decisions have a genuine moral valence. Every temporary hack, as soon as more than one person uses it, becomes effectively permanent. This means that if you are creating something that multiple people will use, you are in a relationship of power over those people & part of your responsibility is to make sure you don’t harm them in the long run.
The decision to use XML for RSS led, in a fairly predictable way, to the demise of widespread RSS support — and so it was not merely a stupid decision but a morally bad one (since a better-designed format would not have died out so easily or become centralized in the hands of companies like Google that eventually decided to choke it). The decision to use XML for RSS led inevitably to both Google Reader and Google’s decision to kill Google Reader, and that has been a huge setback for the “open web” (which, while it was never really open — basically for exactly these reasons — has never been as close to open since).
Webtech consists mostly of these “morally bad” design decisions.
New web browsers cannot be written, nor can web browsers be maintained except by the three largest tech companies in the world, because Postel’s law (along with the IETF policy of “rough consensus and running code”) has doomed all web standards to being enormous baroque messes of corner cases that can only be navigated by the Chinese-army technique of throwing waves of cheap contractors at them. Since no single person can completely understand any W3C standard, no person can be sure that they are generating W3C-compliant code or markup, so they test using an existing dominant browser (or maybe two or three of them). Any new browser, even in the event that it happens to adhere to the W3C standard, is unlikely to behave exactly the same way as these dominant browsers, and so actually-existing code will look “broken” (even if it’s actually being interpreted correctly according to the standard). This is a moral failing: it leads inevitably to centralization.
Using hostnames as part of the identifier for a piece of data leads inevitably to centralization as well. Your host must be beefy enough to take whatever traffic it gets. You cannot rely upon the caches of your peers (like in bittorrent or IPFS). So you need to pay to rent a beefy machine, pay for a domain name so you can switch to a beefier machine or load balance between several, etc. It’s protocol-enforced rentseeking.
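The alternative to naming data by host is naming it by a hash of its bytes, as BitTorrent, IPFS, and Scuttlebutt do — the `%…sha256` id at the top of this essay is exactly this shape. A minimal Python sketch (the payload is invented; real Scuttlebutt hashes a whole signed message, not a bare string):

```python
import base64
import hashlib

# Content addressing: the identifier is derived from the data itself,
# so any peer holding a copy can serve it -- no hostname required.
data = b"hello, feed"
digest = hashlib.sha256(data).digest()
content_id = "%" + base64.b64encode(digest).decode() + ".sha256"
print(content_id)  # a Scuttlebutt-style id, same shape as the one above
```

Change one byte of the data and the id changes completely, which is also what makes peer caching safe: you can verify what you were handed without trusting who handed it to you.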
Using XML at all leads inevitably to centralization. XML is complex enough that parsing it properly is difficult — so you use a “proven” or “trusted” parser (never mind whether you like any of the existing parsers, or whether they are too heavy; you don’t have 10 free years to write your own). XML is bloated, so if you are using it for anything nontrivial, you need an even beefier machine to serve it.
HTTP is also bloated and over-complex in the same way. And don’t get me started on certificates and certificate chains. Or fucking DNS.
It doesn’t take a genius to come up with better formats than these for the same applications. These aren’t inherently-complex problems. These are relatively simple problems with obvious and straightforward solutions that the industry collectively ignored in favor of obviously-bad solutions that a couple famous people promoted.
Choosing an awkward and complex solution where a simple one will do is also a centralizing/rentseeking tactic. It’s a kind of gatekeeping-via-technical-debt.
If you write your application as a ten-line shell script, somebody who doesn’t know shell can still figure out how it works in an afternoon; if you write the same application as a ten-thousand-line Enterprise Java Microservice Architecture, not only will it be slower and worse but it will also be completely impenetrable to anybody who didn’t get a four-year degree and spend it imbibing java enterprise bullshit (and even then, it’ll take them six months to get even a vague idea of how it works and a couple years to be comfortable changing anything).