Non-consensual wisdom

Previously, Shane Greenup brought to my attention two very interesting software projects, with somewhat similar goals: his own rbutr (currently in beta testing), and Dan Whaley’s Hypothes.is (currently being planned and prototyped).

Rbutr (pronounced “rebutter”) allows its user base to link together web pages that rebut one another. These links eventually form conversation chains and webs that may span any number of websites, without needing or seeking the consent of the website owners. I, as a blogger, would have no control (or, at least, no veto) over rbutr links connecting my blog posts to someone else’s refutation of them, but these links would be available for any reader (who uses rbutr) to see and follow.
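As a rough illustration of how such links could accumulate into chains spanning several sites, here is a minimal Python sketch. The URLs, the `rebuttals` store and the `add_rebuttal` and `chain_from` functions are all hypothetical. Nothing here reflects how rbutr is actually implemented.

```python
from collections import defaultdict

# Hypothetical store: maps a page URL to the URLs of pages that rebut it.
rebuttals = defaultdict(list)

def add_rebuttal(target_url, rebuttal_url):
    """Record that rebuttal_url rebuts target_url (no site owner involved)."""
    if rebuttal_url not in rebuttals[target_url]:
        rebuttals[target_url].append(rebuttal_url)

def chain_from(url):
    """Follow the first rebuttal of each page, giving one conversation chain."""
    chain, seen = [url], {url}
    while rebuttals.get(url):
        url = rebuttals[url][0]
        if url in seen:  # two pages may rebut each other; stop on a cycle
            break
        chain.append(url)
        seen.add(url)
    return chain

# Made-up pages on three different sites.
add_rebuttal("https://blog-a.example/post", "https://blog-b.example/reply")
add_rebuttal("https://blog-b.example/reply", "https://blog-c.example/counter")
print(chain_from("https://blog-a.example/post"))
```

The point of the sketch is only that the link data lives entirely outside the linked pages, which is what makes the scheme non-consensual.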

Hypothes.is has the even more ambitious goal of providing an “annotation layer” for the Internet. Any arbitrary passage of text (as well as other media types, including images, video and audio) within any web page may be adorned with a critical remark, visible to anyone else using the software, again without the consent of the web site owner. It aims to be a fine-grained peer-review system for, well, everything.

The minds behind Hypothes.is are open about the fact that others have tried and failed (to varying extents) in the goal of creating a “web annotator”. However, they seem very determined to identify and learn from past mistakes. Perhaps the most important of these has been the lack of quality control in creating annotations. In a previous post I mentioned a similar project called Dispute Finder. I now gather that Dispute Finder’s database itself may have been overrun by misinformation. As one article explains:

Third, and most critical in my thinking, there will be [in Hypothes.is] an extensive reputation system to qualify and rank comments based on the expertise of the commenter. The lack of this was part of what doomed an earlier project called Dispute Finder. I thought for a while that it would evolve into the tool skeptics needed, but very quickly the data in that tool was awash in conspiracy theories and other nonsense, with no way provided to sort by quality.

Hypothes.is is bringing together a pool of experts to determine how to create a “reputation model” to prevent this sort of thing from happening again. After all, Wikipedia seems to manage commendably well to resist incursions from interest groups[1]. Even the Slashdot moderation system seems to successfully raise up interesting and insightful comments at the expense of mundane and simplistic ones. I feel that our collective intelligence, though sometimes disorganised, is often under-appreciated.

Projects like this might prove an attractive middle road between (a) the Internet as an anarchic incubator of (mis-)information, and (b) the Internet as an oppressively-sanitised, centrally-regulated newspaper. Join the dots, for instance, between Hypothes.is and the current debate over media regulation in Australia. Libertarian-minded newspapers and bloggers take furious offence to any suggestion that their activities should be overseen by The Government.

It would be hard to mount quite the same argument, with quite the same emotive imagery, against Hypothes.is or rbutr. While non-consensual, there is certainly no coercion involved — no fines, no censorship, no forced apologies, etc. There is nothing here that need be sanctioned by those in power. The system operates on a purely informative level. Affected websites are not required to do anything, and nobody is required to use the system in the first place. Such systems can only succeed if people choose to use them. That (presumably) will only happen as long as they meet a socially/psychologically acceptable level of reasonableness and transparency.

But neither is Hypothes.is or rbutr a “toothless tiger”. It would surely be a blow to authors’ and editors’ egos and credibility to have third-party corrections publicly scribbled over their otherwise majestic prose. They would have to contend with new, publicly-known metrics that assess aspects of their intellectual integrity, not just “hits” and the like that demonstrate their popularity. They would no longer enjoy the same flexibility with the truth, considering that their errors may be almost immediately visible. Any third-party annotations could easily become the most attention-grabbing parts of an article, destroying at a glance whatever the original (accidental or deliberate) misinformation may have been.

As a result, there would surely be some backlash from tabloid newspapers and bloggers upon discovering that they no longer have absolute control over what their readers read when visiting their sites. They might even consider it a threat to their business model. Operators like Andrew Bolt certainly seem to make a career out of saying things that need to be corrected (while at the same time exhibiting extraordinary defensiveness).

If it works, Hypothes.is could initially make a lot of people very, very angry. There could be lawsuits — particularly of the defamation variety, I imagine — and that could be a problem for a non-profit organisation. But, if it gets that far, the idea of a peer-reviewed Internet has already won.

 

  1. That isn’t to say I’d rely on the quality of Wikipedia, necessarily. However, for a publicly-editable resource, it is curiously bereft of the kind of backhanded misinformation and puerile simplicity you find in many, even professional, online news articles or blog posts, and the outright lunacy you find in the comments section underneath (present company excepted, of course).

Curing viral misinformation

A great deal of mischief is caused, regularly, by viral misinformation. Factoids that support one side of any controversial issue are rapidly copied and pasted many times over (the “echo chamber”). By the time anyone manages to marshal the truth into a coherent response, it’s too late — the lie has convinced enough people for it to become self-reinforcing. Everyone can probably name some examples of this, particularly in day-to-day national politics.

I can’t help but quote Churchill:

A lie gets halfway around the world before the truth has a chance to get its pants on.

(Given the Internet, this actually seems rather conservative.)

For me, the frenzied reaction in 2009 to the hacked CRU emails springs to mind. All manner of nefarious interpretations were placed on isolated snippets of private correspondence of climate scientists, before anyone in a position to understand the emails’ context (or at least the lack thereof) could conduct an honest evaluation. And in cases like this, the lies are often more complete than the truth, and certainly more interesting.

I don’t have an exact model of how this process unfolds. However, I suspect that, if we sat down and analysed a sample of propagated misinformation, we’d find that important parts of the original wording have largely been preserved, with very little paraphrasing. Misinformation only manages to propagate so fast because higher cognitive levels[1] are (probably) never reached in the initial hours of propagation. This means that the propagation of misinformation is largely a mechanical process (not a creative one), which places it within the reach of automated or semi-automated analysis.

To come to the point, we can and should devise a tool to automatically detect this misinformation, and build it into the web browser — a browser extension. It should highlight and annotate misinformation in any web page the user views, based on a regularly-updated database. There are a few sites already dedicated to correcting misinformation (Snopes, Skeptical Science, etc.), and they are certainly invaluable, but a greater prize is to have misinformation annotated without any immediate human effort at all.
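If copied misinformation really does keep most of its original wording, then even crude text matching might catch a fair share of it. The following Python sketch shows one possible approach, and is not a description of Dispute Finder or any existing tool: it breaks text into overlapping four-word sequences and flags a database entry when most of its sequences turn up in the page. The function names, the threshold of 0.6 and the sample database entry are all assumptions made for illustration.

```python
import re

def shingles(text, n=4):
    """Return the set of n-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def find_matches(page_text, database, threshold=0.6, n=4):
    """Return (claim, correction) pairs whose wording largely survives in page_text."""
    page = shingles(page_text, n)
    matches = []
    for claim, correction in database.items():
        claim_shingles = shingles(claim, n)
        if not claim_shingles:
            continue  # claim shorter than n words; cannot be matched this way
        overlap = len(claim_shingles & page) / len(claim_shingles)
        if overlap >= threshold:
            matches.append((claim, correction))
    return matches

# Hypothetical database entry and page text, for illustration only.
database = {
    "the moon landings were filmed in a studio in nevada":
        "Placeholder correction text supplied by the database maintainers.",
}
page = "My uncle says the Moon landings were filmed in a studio, in Nevada!"
print(find_matches(page, database))
```

A matcher like this would miss any claim that has been properly paraphrased, which is exactly the case the argument above suggests is rare in the first hours of propagation.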

I’ve been toying with this idea for over a year, considering how to engineer communication between the browser extension and the database, how to provide flexibility in searching for different types of misinformation, while avoiding software security vulnerabilities, etc. (I should probably have written a prototype by now, but paid work took priority.)

It turns out — unsurprisingly — that others have considered some of these issues as well. The existing research tool Dispute Finder is very similar to what I’d envisaged. (It was well reported back in 2009, but clearly escaped my attention at the time). However, that project has apparently ended, and its principal investigator Rob Ennals has moved on. The Firefox browser extension has been removed, so I haven’t seen it in action, and presumably the database is no longer available either. The project did get as far as conducting user evaluations of the software. Perhaps Dispute Finder was only intended to have a fixed lifetime, or perhaps the authors decided that the project was not sufficiently successful.

Skeptical Science has its own Firefox browser extension, but this is climate-change-specific, and so is most likely to be used by those who consciously and actively accept the reality of climate change. That’s not to say it isn’t useful, but its effects on public discourse are probably indirect.

A generic “lie detector” tool might have a disproportionately greater impact on public discourse compared to a domain-specific tool. The generic tool would cover a much greater array of misinformation, and as a result would probably also gain wider acceptance. For instance, at least some of those who don’t particularly care about or believe climate science may nonetheless choose to use the generic tool for its treatment of other issues. (Hard core denialists of any stripe may complain about the “anomalous” treatment of their pet topics. Such complaints might be a blessing in disguise, actually boosting awareness.)

In fact, there are really two pieces of software here: the browser extension itself and the database. Given an appropriate means of communication, they could be developed quite independently.

The source code for Dispute Finder (previously “Think Link”) seems to be available here. I still intend to write my own independently, because I have different views on the technical architecture, which I may elucidate in future. The research findings of the Dispute Finder / “Confrontational Computing” project are certainly worth pondering, though. It would be a waste to ignore the experience gained, and it seems too good an idea to give up on.

  1. Bloom’s taxonomy breaks cognition into distinct levels: knowledge, comprehension, application, analysis, synthesis and evaluation. The “knowledge” level is pure rote learning, while “evaluation” represents critical thinking.

One brand to fool them all

As you might have realised, I work at Curtin University, formerly Curtin University of Technology (CUT), formerly – though conceivably somewhat apocryphally – Curtin University of New Technology (CU*T), formerly the Western Australian Institute of Technology (WAIT), formerly Perth Technical College, formerly Perth Technical School, formerly – and definitely more apocryphally – the New Holland Colonial Blacksmith and Breakfast Bar, formerly the East Gondwana School of Blunt Instruments.

We’re not good at names.

One of the most frustrating aspects of this particular institutional blight is how it plays out in the University’s ICT services. There are those who must spend long dark hours of their lives dreaming up grandiose names with which to inspire the huddled masses to come forth and be dazzled by yet another online service. The problem is that we have hundreds of such services, and it’s an act of cognitive warfare to suggest that we should memorise that many bizarre acronyms and cute but hideously overly-generic terms and the circumstances in which they must be applied. Without wishing to blame anyone in particular, it’s all getting a bit ridiculous.

You can see how and why it happens. The University’s ICT infrastructure has grown organically, bits and pieces being added over time with no real coordination. This is probably inevitable in a large, diverse organisation. The plethora of different ICT services resemble a market, with each different product competing for mind share. However, it’s not a market, and in theory we’re supposed to use all of the relevant services. So, when a new service is added or updated, it suddenly becomes Very Important that everyone bow down before the mighty ingenuity involved, and recognise the sudden urgency with which the new technology must be adopted. The next service to be added or updated after that requires the same thing, and so on. To make it happen, each of these new services can’t just be named – they must be branded. ICT services are not just provided at Curtin – they are, in the marketing sense, sold.

Several years ago, University management commissioned the “OASIS” website, with the aim of integrating all the disparate online services (and introducing our beloved Official Communication Channel). OASIS originally stood for “Online Access to Student Information Service” (a backronym, one presumes). Now, however, it doesn’t seem to stand for anything. It’s just a meaningless name, and thus is itself a perfect example of the problem at work.

OASIS was originally marketed as the “One Site to Rule Them All”, which it sort of is, but only at a very superficial level. There’s a lot of delegation involved, and the “ruling them all” bit only goes as far as logging in. Once you’re logged in, you still need to navigate a maze of services that are still essentially separate niche applications. The fact that these are not fully integrated, functionally and stylistically, is not the first problem. The first problem is what they’re all called.

Names of these services include “eVALUate”, “StudentOne”, “eStudent”, and “eAcademic”, among others. My point is perhaps more easily grasped by an outsider, for whom these names must seem rather useless as descriptions of what the systems actually do. Indeed they are – eVALUate has nothing to do with grading or student results, StudentOne is inaccessible to students, eStudent provides nothing that students will find useful on a regular basis, and eAcademic is actually used to access student information. (The true functions of these systems are, of course, better understood by Curtin staff and students than by outsiders, but only by being forced to use them.)

Now, there are many ways in which these services might be better integrated, but a not-insignificant amount of confusion and cognitive waste could be alleviated simply by coming up with names that actually make sense. By this, I mean intuitively obvious, not requiring large-scale internal marketing programs. The ICT branding we have at the moment is a complete waste of resources at every level. In my ever-humble opinion, all these services should have purely functional names. They should not stand out. They should not be cute, or cool, or inspiring, or grandiose. They should be simple, accurate, no-nonsense descriptions of the services provided. For instance:

  • OASIS should be called “Curtin online services”
  • eVALUate should be called “Course/teaching feedback”
  • StudentOne should be called “Student database”
  • eStudent should be called “View/update your enrolment details”
  • eAcademic should be called “View student details”

At least, that gives you some idea.

I don’t know how the mind of a marketing person might react to this. I’d hope that a good marketing person might recognise the merits of functional naming as a means of encouraging the use of ICT services.

Existence continuation

As you have doubtless deduced from my total failure to keep you entertained over the last month and a half, I have in fact been a little busy. Possible illusions to the contrary notwithstanding, my existence is not synonymous with that of my blog. (At least, not yet it isn’t. This may change later in life when my consciousness is uploaded. I’ll keep you posted.)

First, it may be worth noting that, after more than six years, my PhD thesis is about to be officially approved. I shall thus enter the PhD afterlife, my soul having been judged and marked, and corrections thereto proclaimed. I shall wander the Earth instilling great wisdom in anyone mildly curious about the nature and mechanics of comparisons between different software inspection strategies, for such has been the tiny sliver of human experience to which I have contributed.

Second, I have simultaneously rewritten and lectured a unit on C programming in UNIX, a feat probably not entirely without precedent, although this has been my first real lecturing experience. It is said that writing your own lecture notes gives you much better preparation for lecturing than reusing someone else’s, and this was probably true. In hindsight I would recommend having all this done before the semester begins, though in any event I didn’t have much choice. Over the course of 14 weeks, I developed 10 lectures, 9 tutorial worksheets and associated tutors’ notes, 3 tests (including one exam) plus marking guides, 3 mock tests plus answer guides, and 1 assignment plus marking guide. That’s 26-42 distinct documents (depending on how you count them), whose combined content would rival a PhD thesis (and I know). This was in addition to 24 hours/week of actual face-to-face teaching.

Third, (as any teacher or lecturer will be all too aware) after the teaching comes the marking. Exam marking is easy – there is no feedback to give, and a fairly constrained context for creative idiocy. You get to see what students do well, and what they do poorly, and what questions could be improved for next time. Prac report marking is also relatively easy, though collating them all at the end seems to lead to the conclusion among some students that they just weren’t going to be marked at all.

Assignment marking is hard. In my case, this is partly because programming assignments do not really constrain creative idiocy. There are many ways to write a program correctly, but infinitely more ways to do it badly. It’s not as mind numbing as (I imagine) essay marking must be. The very worst essays (such as the incoherent, smudged dribblings written in high school English Lit exams by none other than yours truly), must be positively soul destroying to read and mark. You can mark a programming assignment partly by running it, but nevertheless, like an essay, it must also be read.

Reading said programming assignments was made at the same time more entertaining and more frustrating by a discovery I made towards the end. Imagine, if you will, twelve student essays (essays being an analogy here for programs) that are identical except for the substitution of synonyms, punctuation, fonts and paragraph breaks. Then imagine the students involved swearing blind that they merely discussed the topic and certainly never copied anything, and that of course they were the same because the question could only be answered one way. Then imagine – and this is what really annoys me and my good colleague who was roped into conducting the investigation – that a large proportion of these students are, for want of a better term, good students and will easily pass the unit. (At least, they will if my suspicions about the severity of their punishments turn out to be correct.)

Now that’s all behind me, I come to my last (and continuing) major task for the year: trying to find a job. Then, absurdly, I might have some free time.

Open source science

Slashdot notes an article from the Guardian: “If you’re going to do good science, release the computer code too”. The author, Darrel Ince, is a Professor of Computing at The Open University. You might recognise something of the mayhem that is the climate change debate in the title.

Both the public release of scientific software and the defect content thereof are worthwhile topics for discussion. Unfortunately, Ince seems to go for over-the-top rhetoric without having a great deal of evidence to support his position.

For instance, Ince cites an article by Professor Les Hatton (who I also cite on account of his recent study on software inspection checklists). Hatton’s article here was on defects in scientific software. The unwary reader might get the impression that Hatton was specifically targeting recent climate modelling software, since that’s the theme of Ince’s article. However, Hatton discusses studies conducted from 1990-1994, in different scientific disciplines. The results might still be applicable, but it’s odd that Ince would choose to cite such an old article as his only source. There are much newer and more relevant papers; for instance:

S. M. Easterbrook and T. C. Johns (2009), Engineering the Software for Understanding Climate Change, Computing in Science and Engineering.

I stumbled across this article within ten minutes of searching. While Hatton takes a broad sample of software from across disciplines, Easterbrook and Johns delve into the processes employed specifically in the development of climate modelling software. Hatton reports defect densities of around 8 or 12 per KLOC (thousand lines of code), while Easterbrook and Johns suggest 0.03 defects per KLOC for the current version of the climate modelling software under analysis. Quite a difference – two orders of magnitude, for those counting.
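For those who really are counting, the comparison can be checked with a few lines of Python. The three densities are the ones quoted above; the variable names are mine.

```python
import math

hatton_densities = (8.0, 12.0)  # defects per KLOC, as quoted above
climate_model = 0.03            # defects per KLOC, as quoted above

# How many times denser are the defects in Hatton's sample?
ratios = [density / climate_model for density in hatton_densities]

for density, ratio in zip(hatton_densities, ratios):
    print(f"{density:g} / {climate_model:g} = {ratio:.0f}x "
          f"(about {math.log10(ratio):.1f} orders of magnitude)")
```

The ratios come out at roughly 270 and 400, so “two orders of magnitude” is, if anything, slightly generous to Hatton’s sample.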

Based on Hatton’s findings of the defectiveness of scientific software, Ince says:

This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program.

This is a profoundly strange thing for a Professor of Computing to say. It’s certainly true that one single error can invalidate a computer program, but whether it usually does this is not so obvious. There is no theory to support this proclamation, nor any empirical study (at least, none cited). Non-scientific programs are littered with bugs, and yet they are not useless. Easterbrook and Johns report that many defects, before being fixed, had been “treated as acceptable model imperfections in previous releases”, clearly not the sort of defects that would invalidate the model. After all, models never correspond perfectly to empirical observations anyway, especially in such complex systems as climate.

Ince claims, as a running theme, that:

Many climate scientists have refused to publish their computer programs.

His only example of this is Mann, who by Ince’s own admission did eventually release his code. The climate modelling software examined by Easterbrook and Johns is available under licence to other researchers, and RealClimate lists several more publicly-available climate modelling programs. I am left wondering what Ince is actually complaining about.

Finally, Ince seems to have a rather brutal view of what constitutes acceptable scientific behaviour:

So, if you are publishing research articles that use computer programs, if you want to claim that you are engaging in science, the programs are in your possession and you will not release them then I would not regard you as a scientist; I would also regard any papers based on the software as null and void.

This is quite a militant position, and does not sound like a scientist speaking. If Ince himself is to be believed (in that published climate research is often based on un-released code), then the reviewers of those papers who recommended publication clearly didn’t think as Ince does – that the code must be released.

Ince may be convinced that scientific software must be publicly-auditable. However, scientific validity ultimately derives from methodological rigour and the reproducibility of results, not from the availability of source code. The latter may be a good idea, but it is not necessary in order to ensure confidence in the science. Other independent researchers should be able to confirm or contradict your results without requiring your source code, because you should have explained all the important details in published papers. (In the event that your results are not reproducible due to a software defect, releasing the source code may help to pinpoint the problem, but that’s after the problem has been noticed.)

There was a time before computing power was widely available, when model calculations were evaluated manually. How on Earth did science cope back then, when there was no software to release?

Please reboot the aircraft

I was hearing vague snippets of the disaster that was the Virgin Blue computer system, but my JetStar flight had its own problems. Everyone was seated (that is, except for the restless and very, very sensitive toddler standing on the opposite window seat, who burst into tears whenever mum dared suggest he sit down and put his seat belt on), but there seemed to be a delay.

It was getting quite stuffy, actually. A couple of people took to fanning themselves with the A320-232 safety instruction cards. It emerged that there were “maintenance issues”, which sounded a little dubious. Shortly thereafter, the captain (or someone) informed us that the problem was indeed related to the air-con. He could fix it in 2 seconds, but he would need to switch the plane off.

Had they, on the spur of the moment, installed a new air-conditioning software update? At least this was happening before takeoff, I thought to myself. (For instance, they didn’t say this: “Sorry, ladies and gentlemen – we will shortly begin a rapid descent towards the ocean while we install this critical software patch and restart the aircraft. Not sure how long we’ll be – let’s just hope it works this time.”)

So, for about a minute, the cabin lights were replaced by blue-tinted torch light, the engines died down and there was eerie quiet (that is, except for the gentle snorting of the person next to me and the squeals from across the aisle). It was also a reprieve from the terrible, cheesy music that had been playing over the speakers to pass the time; cheesy to an extent that can surely only be achieved with premeditated malice.

Then, with our air-con software apparently working as advertised, all hands reached for the vents above our seats and we were off.

Admit me to the conspiracy

Deltoid takes a look at a piece of code taken from the Climate Research Unit (CRU) that apparently has the denialists salivating. Buried therein is the following comment: “Apply a VERY ARTIFICAL [sic] correction for decline!!” Are you convinced yet of the global leftist socialist global warming alarmist conspiracy?! I certainly am.

I’d also like to apply for membership. You see, trawling through my own code for handling experimental data (from September 2008), I’ve re-discovered my own comment: “Artificially extends a data set by a given amount”. Indeed, I appear to have written two entire functions to concoct artificial data*, clearly in nefarious support of the communist agenda. I therefore submit myself as a candidate for the conspiracy. The PhD is only a ruse, after all. Being a member of the Conspiracy is the only qualification that really counts in academia.

* I’m not making this up – I really do have such functions. However, lest you become concerned about the quality of my research, this artificial data was merely used to test the behaviour of the rest of my code. It was certainly not used to generate actual results. I can sympathise with the researcher(s) who leave such untidy snippets of code lying around, and I’m a software engineer who should know better!

Software defect costs

In my pursuit of software engineering data, I’ve recently been poring over a 2002 report to the US Government on the annual costs of software defects. The report is entitled “The Economic Impacts of Inadequate Infrastructure for Software Testing”. Ultimately, it estimates that software defects cost the US economy $59.5 billion every year.

Modelling such economic impacts is an incredibly complex task, and I haven’t read most of the report’s 309 pages (because much of it isn’t immediately relevant to my work). However, since trying to use some of the report’s data for my own purposes, certain things have been bothering me.

For instance, the following (taken from the report):

[Image: nist_table]

This table summarises the consequences to users of software defects (where “users” are companies in the automotive and aerospace industries).

Strictly speaking, it shouldn’t even be a table. The right-most column serves no purpose, and what remains is a collection of disparate pieces of information. There is nothing inherently “tabular” about the data being presented. Admittedly, for someone skimming through the document, the data is much easier to spot in a table form than as plain text.

The last number piqued my curiosity, and my frustration (since I need to use it). What kind of person considers a $4 million loss to be the result of a “minor” error? This seems to be well in excess of the cost of a “major” error. If we multiply it by the average number of minor errors for each company (70.2) we arrive at the ludicrous figure of $282 million. For minor errors. Per company. Each year.
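The multiplication is easy to reproduce. The sketch below uses the rounded $4 million figure and the 70.2 average quoted above. The slightly smaller total it produces suggests that the $282 million came from the unrounded value in the table, which is an inference on my part, since only the rounded figure is given in the text.

```python
cost_per_minor_error = 4_000_000  # dollars, the rounded figure quoted above
minor_errors_per_company = 70.2   # average per company, as quoted above

total = cost_per_minor_error * minor_errors_per_company
print(f"${total / 1e6:.1f} million per company")

# Working backwards from the $282 million total gives the per-error figure
# that the multiplication presumably used before rounding.
implied = 282_000_000 / minor_errors_per_company
print(f"implied cost per minor error: ${implied / 1e6:.2f} million")
```

Either way, the result is in the hundreds of millions of dollars for errors that are supposedly minor.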

If the $4 million figure is really the total cost of minor errors – which would place it more within the bounds of plausibility – why does it say “Costs per bug”?

The report includes a similar table for the financial services sector. There, the cost per minor error is apparently a mere $3,292.90, less than a thousandth of that in the automotive and aerospace industries. However, there the cost of major errors is similarly much lower, and still fails to exceed the cost of minor errors. Apparently.

What’s more, the report seems to be very casual about its use of the words “bug” and “error”, and uses them interchangeably (as you can see in the above table). The term “bug” is roughly equivalent to “defect”. “Error” has a somewhat different meaning in software testing. Different definitions for these terms abound, but the report provides no definitions of its own (that I’ve found, anyway). This may be a moot point, because none of these terms accurately describe what the numbers are actually referring to – “failures”.

A failure is the event in which the software does something it isn’t supposed to do, or fails to do something it should. A defect, bug or fault is generally the underlying imperfection in the software that causes a failure. The distinction is important, because a single defect can result in an ongoing sequence of failures. The cost of a defect is the cost of all failures attributable to that defect, put together, as well as any costs associated with finding and removing it.
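The distinction can be made concrete with a small Python sketch, in which one defect accumulates several failures over time. The class names and all the figures are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass, field

@dataclass
class Failure:
    """One event in which the software misbehaved, with its cost."""
    description: str
    cost: float

@dataclass
class Defect:
    """An underlying imperfection that can cause any number of failures."""
    name: str
    find_and_fix_cost: float = 0.0
    failures: list = field(default_factory=list)

    def total_cost(self):
        # Cost of a defect = all failures attributable to it, plus removal.
        return sum(f.cost for f in self.failures) + self.find_and_fix_cost

# Hypothetical figures, purely for illustration.
rounding_bug = Defect("rounding bug", find_and_fix_cost=500.0)
rounding_bug.failures.append(Failure("wrong invoice, March", 120.0))
rounding_bug.failures.append(Failure("wrong invoice, April", 80.0))
print(rounding_bug.total_cost())
```

A survey that asks for a “cost per bug” without saying whether it means one `Failure` or a whole `Defect` could collect answers that differ by the number of failures involved.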

The casual use of the terms “bug” and “error” extends to the survey instrument – the questionnaire through which data was obtained – and this is where the real trouble lies. Here, potential respondents are asked about bugs, errors and failures with no suggestion of any difference in the meanings of those terms. It is not clear what interpretation a respondent would have taken. Failures are more visible than defects, but if you use a piece of buggy software for long enough, you will take note of the defects so that you can avoid them.

I’m not sure what effect this has on the final estimate given by the report, and I’m not suggesting that the $59.5 billion figure is substantially inaccurate. However, it worries me that such a comprehensive report on software testing is not more rigorous in its terminology and more careful in its data collection.

The colloquium

An “official communication” from early June demanded that all Engineering and Computing postgraduate students take part in the Curtin Engineering & Computing Research Colloquium. Those who didn’t might be placed on “conditional status”, the message warned.

A slightly rebellious instinct led me to think of ways to obey the letter but not the spirit of this new requirement. Particularly, the fact that previous colloquiums have been published online introduced some interesting possibilities:

  • a randomly-generated talk;
  • a discussion of some inventively embarrassing new kind of pseudo-science/quackery; or
  • the recitation of a poem.

In the end I yielded, and on the day (August 25) I gave a reasonably serious and possibly even somewhat comprehensible talk on a controlled experiment I’d conducted on defect detection in software inspections.

A while afterwards, I received in the mail a certificate of participation, certifying that I had indeed given the talk I had given. It felt a little awkward. Giving a 15 minute talk isn’t something I’d have thought deserving of a certificate. It might be useful for proving that I’ve done it, since it now appears to be a course requirement, but a simple note would have sufficed.

Interestingly, I later received another certificate, identical except that my thesis title had been substituted for the actual title of my talk. In essence, I now have a piece of paper, signed personally by the Dean of Engineering, certifying that I’ve given a talk that never happened.

Old computers

The Linux boot up message of the moment:

/ has gone 49710 days without being checked, check forced.

This would place the manufacturing date of the computer in question at around 1872 or earlier; a century before the UNIX epoch (the official Dawn of Time for UNIX-based computers) and at least 86 years prior to the invention of the microchip.
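The arithmetic behind that date can be sketched in Python. The year the message appeared is not stated anywhere above, so `assumed_year` is a guess, chosen only because it is consistent with the 1872 figure. The final line is an observation of mine rather than anything the boot message says.

```python
days = 49710  # from the boot message above

years = days / 365.25
print(f"{days} days is about {years:.1f} years")

# Assumption: the year of the boot message is not stated; 2008 is a guess
# that happens to be consistent with the 1872 figure.
assumed_year = 2008
print(f"{assumed_year} - {round(years)} = {assumed_year - round(years)}")

# 2**32 seconds is also 49710 whole days, which hints at a 32-bit counter
# gone wrong rather than any real elapsed time.
print(2**32 // 86400)
```

If the 32-bit reading is right, the computer in question is probably a good deal younger than it claims.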