Non-consensual wisdom

Previously, Shane Greenup brought to my attention two very interesting software projects, with somewhat similar goals: his own rbutr (currently in beta testing), and Dan Whaley’s Hypothes.is (currently being planned and prototyped).

Rbutr (pronounced “rebutter”) allows its user base to link together web pages that rebut one another. These links eventually form conversation chains and webs that may span any number of websites, without needing or seeking the consent of the website owners. I, as a blogger, would have no control (or, at least, no veto) over rbutr links connecting my blog posts to someone else’s refutation of them, but these links would be available for any reader (who uses rbutr) to see and follow.

Hypothes.is has the even more ambitious goal of providing an “annotation layer” for the Internet. Any arbitrary passage of text (as well as other media types, including images, video and audio) within any web page may be adorned with a critical remark, visible to anyone else using the software, again without the consent of the website owner. It aims to be a fine-grained peer-review system for, well, everything.

The minds behind Hypothes.is are open about the fact that others have tried and failed (to varying extents) in the goal of creating a “web annotator”. However, they seem very determined to identify and learn from past mistakes. Perhaps the most important of these has been the lack of quality control in creating annotations. In a previous post I mentioned a similar project called Dispute Finder. I now gather that Dispute Finder’s database itself may have been overrun by misinformation. As one article explains:

Third, and most critical in my thinking, there will be [in Hypothes.is] an extensive reputation system to qualify and rank comments based on the expertise of the commenter. The lack of this was part of what doomed an earlier project called Dispute Finder. I thought for a while that it would evolve into the tool skeptics needed, but very quickly the data in that tool was awash in conspiracy theories and other nonsense, with no way provided to sort by quality.

Hypothes.is is bringing together a pool of experts to determine how to create a “reputation model” to prevent this sort of thing from happening again. After all, Wikipedia seems to manage commendably well to resist incursions from interest groups[1]. Even the Slashdot moderation system seems to successfully raise up interesting and insightful comments at the expense of mundane and simplistic ones. I feel that our collective intelligence, though sometimes disorganised, is often under-appreciated.

Projects like this might prove an attractive middle road between (a) the Internet as an anarchic incubator of (mis-)information, and (b) the Internet as an oppressively-sanitised, centrally-regulated newspaper. Join the dots, for instance, between Hypothes.is and the current debate over media regulation in Australia. Libertarian-minded newspapers and bloggers take furious offence at any suggestion that their activities should be overseen by The Government.

It would be hard to mount quite the same argument, with quite the same emotive imagery, against Hypothes.is or rbutr. While non-consensual, there is certainly no coercion involved — no fines, no censorship, no forced apologies, etc. There is nothing here that need be sanctioned by those in power. The system operates on a purely informative level. Affected websites are not required to do anything, and nobody is required to use the system in the first place. Such systems can only succeed if people choose to use them. That (presumably) will only happen as long as they meet a socially/psychologically acceptable level of reasonableness and transparency.

But neither Hypothes.is nor rbutr is a “toothless tiger”. It would surely be a blow to authors’ and editors’ egos and credibility to have third-party corrections publicly scribbled over their otherwise majestic prose. They would have to contend with new, publicly-known metrics that assess aspects of their intellectual integrity, not just “hits” and the like that demonstrate their popularity. They would no longer enjoy the same flexibility with the truth, considering that their errors may be almost immediately visible. Any third-party annotations could easily become the most attention-grabbing parts of an article, destroying at a glance whatever the original (accidental or deliberate) misinformation may have been.

As a result, there would surely be some backlash from tabloid newspapers and bloggers upon discovering that they no longer have absolute control over what their readers read when visiting their sites. They might even consider it a threat to their business model. Operators like Andrew Bolt certainly seem to make a career out of saying things that need to be corrected (while at the same time exhibiting extraordinary defensiveness).

If it works, Hypothes.is could initially make a lot of people very, very angry. There could be lawsuits — particularly of the defamation variety, I imagine — and that could be a problem for a non-profit organisation. But, if it gets that far, the idea of a peer-reviewed Internet has already won.

 

  1. That isn’t to say I’d rely on the quality of Wikipedia, necessarily. However, for a publicly-editable resource, it is curiously bereft of the kind of backhanded misinformation and puerile simplicity you find in many (even professional) online news articles or blog posts, and the outright lunacy you find in the comments section underneath (present company excepted, of course).

Curing viral misinformation

A great deal of mischief is caused, regularly, by viral misinformation. Factoids that support one side of any controversial issue are rapidly copied and pasted many times over (the “echo chamber”). By the time anyone manages to marshal the truth into a coherent response, it’s too late — the lie has convinced enough people for it to become self-reinforcing. Everyone can probably name some examples of this, particularly in day-to-day national politics.

I can’t help but quote a line commonly attributed to Churchill:

A lie gets halfway around the world before the truth has a chance to get its pants on.

(Given the Internet, this actually seems rather conservative.)

For me, the frenzied reaction in 2009 to the hacked CRU emails springs to mind. All manner of nefarious interpretations were placed on isolated snippets of private correspondence of climate scientists, before anyone in a position to understand the emails’ context (or at least the lack thereof) could conduct an honest evaluation. And in cases like this, the lies are often more complete than the truth, and certainly more interesting.

I don’t have an exact model of how this process unfolds. However, I suspect that, if we sat down and analysed a sample of propagated misinformation, we’d find that important parts of the original wording have largely been preserved, with very little paraphrasing. Misinformation only manages to propagate so fast because higher cognitive levels[1] are (probably) never reached in the initial hours of propagation. This means that the propagation of misinformation is largely a mechanical process (not a creative one), which places it within the reach of automated or semi-automated analysis.

To come to the point, we can and should devise a tool to automatically detect this misinformation, and build it into the web browser — a browser extension. It should highlight and annotate misinformation in any web page the user views, based on a regularly-updated database. There are a few sites already dedicated to correcting misinformation (Snopes, Skeptical Science, etc.), and they are certainly invaluable, but a greater prize is to have misinformation annotated without any immediate human effort at all.
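As a rough illustration of why near-verbatim copying makes this tractable, here is a minimal sketch in Python. It is not how Dispute Finder or any existing tool works. The function names, the `claim` field, the five-word window and the 0.6 threshold are all my own assumptions. It splits text into overlapping word sequences ("shingles") and measures how much of a known claim reappears in each paragraph of a page.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(claim, passage, size=5):
    """Fraction of the claim's shingles that also occur in the passage."""
    claim_shingles = shingles(claim, size)
    if not claim_shingles:
        return 0.0
    shared = claim_shingles & shingles(passage, size)
    return len(shared) / len(claim_shingles)


def find_matches(paragraphs, database, threshold=0.6, size=5):
    """Yield (paragraph_index, entry) where a paragraph largely repeats a claim.

    `database` is a hypothetical list of dicts, each with a "claim" string.
    """
    for index, paragraph in enumerate(paragraphs):
        for entry in database:
            if containment(entry["claim"], paragraph, size) >= threshold:
                yield index, entry


if __name__ == "__main__":
    # Invented example data, purely for illustration.
    database = [{"claim": "the moon is made entirely of green cheese and always has been"}]
    page = [
        "The moon orbits the Earth roughly once a month.",
        "My uncle says the moon is made entirely of green cheese and always has been, so there.",
    ]
    for index, entry in find_matches(page, database):
        print(f"paragraph {index} repeats: {entry['claim']}")
```

A scheme like this only catches text that has been copied with little change, which is exactly the case I suspect dominates in the first hours. Heavily paraphrased misinformation would slip straight past it.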

I’ve been toying with this idea for over a year, considering how to engineer communication between the browser extension and the database, how to provide flexibility in searching for different types of misinformation, while avoiding software security vulnerabilities, etc. (I should probably have written a prototype by now, but paid work took priority.)

It turns out, unsurprisingly, that others have considered some of these issues as well. The existing research tool Dispute Finder is very similar to what I’d envisaged. (It was well reported back in 2009, but clearly escaped my attention at the time.) However, that project has apparently ended, and its principal investigator Rob Ennals has moved on. The Firefox browser extension has been removed, so I haven’t seen it in action, and presumably the database is no longer available either. The project did get as far as conducting user evaluations of the software. Perhaps Dispute Finder was only intended to have a fixed lifetime, or perhaps the authors decided that the project was not sufficiently successful.

Skeptical Science has its own Firefox browser extension, but this is climate-change-specific, and so is most likely to be used by those who consciously and actively accept the reality of climate change. That’s not to say it isn’t useful, but its effects on public discourse are probably indirect.

A generic “lie detector” tool might have a disproportionately greater impact on public discourse compared to a domain-specific tool. The generic tool would cover a much greater array of misinformation, and as a result would probably also gain wider acceptance. For instance, at least some of those who don’t particularly care about or believe climate science may nonetheless choose to use the generic tool for its treatment of other issues. (Hard core denialists of any stripe may complain about the “anomalous” treatment of their pet topics. Such complaints might be a blessing in disguise, actually boosting awareness.)

In fact, there are really two pieces of software here: the browser extension itself and the database. Given an appropriate means of communication, they could be developed quite independently.
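To make "an appropriate means of communication" a little more concrete, here is one possible shape for it, sketched in Python. Everything in it is hypothetical: the JSON field names, the `Annotation` record and the exact-substring lookup are placeholders of my own, not the design of Dispute Finder or of anything I have built. The point is only that the two halves need to agree on a small message format and nothing else.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Annotation:
    start: int           # character offset where the flagged span begins
    end: int             # character offset just past the flagged span
    claim_id: str        # identifier of the matching database entry
    correction_url: str  # where the reader can find the correction


def encode_request(page_url, text):
    """Extension side: serialise the page text to be checked."""
    return json.dumps({"version": 1, "url": page_url, "text": text})


def handle_request(raw_request, claims):
    """Database side: reply with an annotation for each known claim found.

    `claims` is a hypothetical list of dicts with "id", "text" and
    "correction_url" keys. Matching here is plain case-sensitive substring
    search, standing in for whatever real detection method is used.
    """
    text = json.loads(raw_request)["text"]
    annotations = []
    for claim in claims:
        needle = claim["text"]
        if not needle:
            continue  # an empty claim would match everywhere
        position = text.find(needle)
        while position != -1:
            annotations.append(Annotation(
                position, position + len(needle),
                claim["id"], claim["correction_url"]))
            position = text.find(needle, position + len(needle))
    return json.dumps({"version": 1,
                       "annotations": [asdict(a) for a in annotations]})


def decode_response(raw_response):
    """Extension side: turn the reply back into Annotation records."""
    payload = json.loads(raw_response)
    return [Annotation(**item) for item in payload["annotations"]]
```

With a format like this fixed, the extension only needs to know how to send text and highlight the returned character ranges, and the database can change its matching method freely without the extension noticing.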

The source code for Dispute Finder (previously “Think Link”) seems to be available here. I still intend to write my own independently, because I have different views on the technical architecture, which I may elucidate in future. The research findings of the Dispute Finder / “Confrontational Computing” project are certainly worth pondering, though. It would be a waste to ignore the experience gained, and it seems too good an idea to give up on.

  1. Bloom’s taxonomy breaks cognition into distinct levels: knowledge, comprehension, application, analysis, synthesis and evaluation. The “knowledge” level is pure rote learning, while “evaluation” represents critical thinking.