Software defect costs

In my pursuit of software engineering data, I’ve recently been poring over a 2002 report to the US Government on the annual costs of software defects. The report is entitled “The Economic Impacts of Inadequate Infrastructure for Software Testing”. Ultimately, it estimates that software defects cost the US economy $59.5 billion every year.

Modelling such economic impacts is an incredibly complex task, and I haven’t read most of the report’s 309 pages (because much of it isn’t immediately relevant to my work). However, since I started trying to use some of the report’s data for my own purposes, certain things have been bothering me.

For instance, the following (taken from the report):

[Table reproduced from the report.]

This table summarises the consequences to users of software defects (where “users” are companies in the automotive and aerospace industries).

Strictly speaking, it shouldn’t even be a table. The right-most column serves no purpose, and what remains is a collection of disparate pieces of information. There is nothing inherently “tabular” about the data being presented. Admittedly, for someone skimming through the document, the data is much easier to spot in table form than as plain text.

The last number piqued my curiosity, and my frustration (since I need to use it). What kind of person considers a $4 million loss to be the result of a “minor” error? This seems to be well in excess of the cost of a “major” error. If we multiply it by the average number of minor errors for each company (70.2), we arrive at the ludicrous figure of $282 million. For minor errors. Per company. Each year.
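Here’s the arithmetic, with my own rounding – the small gap to the $282 million above presumably comes from the table’s unrounded per-bug figure:

\[ 70.2 \ \text{minor errors per company per year} \times \$4\ \text{million per error} \approx \$281\ \text{million}. \]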

If the $4 million figure is really the total cost of minor errors – which would place it more within the bounds of plausibility – why does it say “Costs per bug”?

The report includes a similar table for the financial services sector. There, the cost per minor error is apparently a mere $3,292.90, less than a thousandth of that in the automotive and aerospace industries. However, the cost of major errors there is similarly much lower, and still fails to exceed the cost of minor errors. Apparently.

What’s more, the report seems to be very casual about its use of the words “bug” and “error”, and uses them interchangeably (as you can see in the above table). The term “bug” is roughly equivalent to “defect”. “Error” has a somewhat different meaning in software testing. Different definitions for these terms abound, but the report provides no definitions of its own (that I’ve found, anyway). This may be a moot point, because none of these terms accurately describe what the numbers are actually referring to – “failures”.

A failure is the event in which the software does something it isn’t supposed to do, or fails to do something it should. A defect, bug or fault is generally the underlying imperfection in the software that causes a failure. The distinction is important, because a single defect can result in an ongoing sequence of failures. The cost of a defect is the cost of all failures attributable to that defect, put together, as well as any costs associated with finding and removing it.
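To put that in symbols (my notation, not the report’s): if \(F(d)\) is the set of failures attributable to a defect \(d\), then

\[ \mathrm{cost}(d) \;=\; \sum_{f \in F(d)} \mathrm{cost}(f) \;+\; \mathrm{cost}_{\mathrm{find}}(d) \;+\; \mathrm{cost}_{\mathrm{remove}}(d). \]

A “cost per bug” figure only makes sense to me if it refers to this whole quantity, not to the cost of any one failure.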

The casual use of the terms “bug” and “error” extends to the survey instrument – the questionnaire through which data was obtained – and this is where the real trouble lies. Here, potential respondents are asked about bugs, errors and failures with no suggestion of any difference in the meanings of those terms. It is not clear what interpretation a respondent would have taken. Failures are more visible than defects, but if you use a piece of buggy software for long enough, you will take note of the defects so that you can avoid them.

I’m not sure what effect this has on the final estimate given by the report, and I’m not suggesting that the $59.5 billion figure is substantially inaccurate. However, it worries me that such a comprehensive report on software testing is not more rigorous in its terminology and more careful in its data collection.

The colloquium

An “official communication” from early June demanded that all Engineering and Computing postgraduate students take part in the Curtin Engineering & Computing Research Colloquium. Those who didn’t might be placed on “conditional status”, the message warned.

A slightly rebellious instinct led me to think of ways to obey the letter but not the spirit of this new requirement. In particular, the fact that previous colloquiums have been published online suggested some interesting possibilities:

  • a randomly-generated talk;
  • a discussion of some inventively embarrassing new kind of pseudo-science/quackery; or
  • the recitation of a poem.

In the end I yielded, and on the day (August 25) I gave a reasonably serious and possibly even somewhat comprehensible talk on a controlled experiment I’d conducted on defect detection in software inspections.

A while afterwards, I received in the mail a certificate of participation, certifying that I had indeed given the talk I had given. It felt a little awkward. Giving a 15-minute talk isn’t something I’d have thought deserving of a certificate. It might be useful for proving that I’ve done it, since it now appears to be a course requirement, but a simple note would have sufficed.

Interestingly, I later received another certificate, identical except that my thesis title had been substituted for the actual title of my talk. In essence, I now have a piece of paper, signed personally by the Dean of Engineering, certifying that I’ve given a talk that never happened.

Meta-engineering

I’m beginning to think I should have approached this maths modelling stuff from an engineering point of view: with a requirements document, version control and unit testing. Constructing a reasonably complicated mathematical model seems to have enough in common with software development that such things could be quite useful.

I’m calling this “meta-engineering”, because I’d be engineering the development of a model which itself describes (part of) the software engineering process.

The only problem is that formal maths notation can’t just be compiled and executed like source code, and source code is far too verbose (and lacking in variety of symbols) to give you a decent view of the maths.

Fortunately, Bayesian networks provide a kind of high-level design notation; perhaps the UML of probability analysis. Mine look like some sort of demented public transport system. However, drawing them in LaTeX using TikZ/PGF gives me a warm fuzzy feeling.
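For the curious, here’s roughly what drawing one looks like – a minimal, self-contained sketch with made-up node names (K, U and D are placeholders, not pieces of my actual model; the real networks are considerably more demented):

% Minimal TikZ sketch of a three-node Bayesian network.
% Node names are hypothetical, not taken from the actual model.
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{positioning}
\begin{document}
\begin{tikzpicture}[every node/.style={draw, circle, minimum size=1.2cm}]
  \node (K) {$K$};
  \node (U) [below left=1.5cm of K] {$U$};
  \node (D) [below right=1.5cm of K] {$D$};
  \draw[->] (K) -- (U);   % K influences U
  \draw[->] (K) -- (D);   % K influences D
  \draw[->] (U) -- (D);   % U influences D
\end{tikzpicture}
\end{document}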

What am I doing?

Over the past few weeks I’ve been asked numerous questions of the form “how’s your work going?” I find I can only ever answer this with banalities like “good” or “meh”.

It’s not that I don’t know what I’m doing. At any given point in time, I have a list of minor challenges written up on the whiteboard (which accumulate with monotonous regularity). However, my first problem is that I never remember what these are when I’m not actually working on them. I write them down so that I don’t have to remember, of course.

My second problem is that, even if I did remember what I was supposed to be doing, there just isn’t any short explanation. Currently I have on the whiteboard such startling conversation pieces as “Express CI in terms of S and U”. This may or may not tickle your curiosity (depending on how much of a nerd you are), but explaining what it means – and granted, I’ll have to do that eventually anyway – demands as much mental energy as solving the problem itself.

My third problem is that I regularly shuffle around the meaning of the letters, to ensure I don’t run out of them and also to resolve inconsistencies. I’m currently using the entire English alphabet in my equations and a large proportion of the Greek one, so naming variables is a minor headache in itself. For instance, since I wrote the todo item “Express CI in terms of S and U”, I’ve decided to rename the variable “CI” to “CS”. Also, “S” used to be “T”, and “U” used to be two separate variables. This is mostly cosmetic, but I recoil at the prospect of explaining something so obviously in flux.

I choose to believe that I’ll be able to explain everything once I’ve written my thesis… and hopefully as I’m writing my thesis.

Artificial intelligence

A thought occurs, spurred on by my use of Bayesian networks. They’re used in AI (so I’m led to believe), though I’m using them to model the comprehension process in humans. However, I do also work in a building filled with other people applying AI techniques.

My question is this: how long until Sarah Connor arrives and blows up level 4? And if she doesn’t, does that mean that the machines have already won? Or does it simply mean that we’re all horrible failures and that nothing will ever come of AI?

A good friend (you know who you are) is working with and discovering things about ferrofluids. In my naivety, I now find myself wondering if you could incorporate some kind of neural structure into it, and get it to reform itself at will…

The Bayesian rabbit hole

You may recall previous rants about my theoretical framework. The recent evolution of my thought processes has gone something like this (much as it has every other time): hurrah, done… except… [ponder]… I should see if I can fix this little problem… [ponder]… How the hell is this supposed to work?… [ponder]… Damn, the library doesn’t have any books on that… [ponder]… Gah, I’ll never finish this.

This all concerns the enormous equation slowly materialising in Chapter 7 of my thesis – the one that calculates the “cost effectiveness” of a software inspection. It used to be finished. I distinctly recall finishing it several times, in fact.

The equation was always long, but it used to contain relatively simple concepts like no. defects detected × average defect cost. Then I decided in a state of mild insanity that it would be much better if I had matrix multiplication in there. Then I decided that this wasn’t good enough either, and that what I really needed were some good solid Bayesian networks (often discussed in the context of artificial intelligence). I only just talked myself down from using continuous-time Bayesian networks, because – though I like learning about these things – at some point I’d like to finish my thesis and have a life.

(Put simply, Bayesian networks are a great way of working out probabilities when there are complex causal relationships, and you have limited knowledge. They also allow you to insert pretty diagrams into an otherwise swampy expanse of hard maths.)
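(The one-formula version, for anyone who wants it: a Bayesian network over variables \(X_1, \dots, X_n\) lets you factor the joint probability into small, local pieces,

\[ P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{parents}(X_i)\bigr), \]

which is what makes “complex causal relationships with limited knowledge” tractable.)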

On the up side, I’ve learnt what 2^S means, where S is a set, and that there’s such a thing as product integration (as opposed to the normal area-under-the-curve “summation” integration). It’s all happening here.
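(For anyone playing along at home: \(2^S\) denotes the power set of \(S\) – the set of all subsets of \(S\), so named because a set with \(n\) elements has \(2^n\) subsets – and the geometric product integral is the multiplicative cousin of the ordinary one: \(\prod_a^b f(x)^{dx} = \exp\bigl(\int_a^b \ln f(x)\,dx\bigr)\).)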

The Zim desktop wiki

I’ve discovered that Zim is a great little brainstorming tool, for me at least. While I occasionally “think in images”, my brain usually works on words and symbols. A wiki – especially one that sports a LaTeX equation editor – seems to be a powerful way to assist a text-based brainstorming session. Being a desktop application (rather than a web application), Zim is also very simple to set up and use.

I spent today and yesterday using it to construct some arcane maths involving matrix multiplication. Said maths mostly turned out to be wrong, of course, but that’s all part of the process.

Theoretical frameworks, part 3

The first and second instalments of this saga discussed the thinking and writing processes. However, I also need to fess up to reality and do some measuring.

A theoretical framework is not a theory. The point of a theoretical framework is to frame theories – to provide all the concepts and variables that a theory might then make predictions about. (If I were a physicist these might be things like light and mass). You can test whether a theory is right or wrong by comparing its predictions to reality. You can’t do that for theoretical frameworks, because there are no predictions, only concepts and variables. The best you can do is determine whether those concepts and variables are useful. This really means you have to demonstrate some sort of use.

And so it falls to me to prove that there’s a point to all my cogitations, and to do so I need data. In fact, I need quite complex data, and in deference to approaching deadlines and my somewhat fatigued brain, I need someone else’s quite complex data.

The truth is, I’m probably not going to get it – at least, not all of it. Ideally, I need data on:

  • the length of time programmers take to assimilate specific pieces of knowledge about a piece of software;
  • the specific types of knowledge required to assimilate other specific types of knowledge;
  • the probability that programmers will succeed in understanding something, including the probability that they find a defect;
  • the probability that a given software defect will be judged sufficiently important to correct;
  • the precise consequences, in terms of subsequent defect removal efforts, of leaving a defect uncorrected;
  • the cost to the end user of a given software defect;
  • the propensity of programmers to find higher-cost defects; and
  • the total number of defects present in a piece of software in the first place.

I also need each of these broken down according to some classification scheme for knowledge and software defects, and I need not just ranges of values but entire probability distributions. Such is the pain of a theoretical framework that attempts to connect rudimentary cognitive psychology to economics via software engineering.

With luck, I may be able to stitch together enough different sources of data to create a usable data set. I hope to demonstrate usefulness by using this data to make recommendations about how best to find defects in software.

Theoretical frameworks, part 2

Carrying on from my last research-related rant, my other problem of late lies in the writing process.

The framework is supposed to assist the detection of defects in software, in a very round-about fashion. Why is this important? Well, hands up who hasn’t lost work as a result of software screwing up. Some people have even died because of it. Many people have experimented with more direct ways to assist defect detection, with some success, but as a result there are now many disjointed explanations of what works and why. Intuitively there should only be one explanation, taking into account everyone’s accumulated experience on the subject, but nobody (as far as I know) has really sat down to work out what it might be.

That’s not to say that what I’m doing is especially hard. If nobody else is doing it, it’s only because so far they’ve been busy getting us all to this point – our current level of understanding – not because it takes anything special to go beyond it.

Nevertheless, I’ve been revising and rethinking this damn theoretical framework for over two years now, on and off. The first vague thoughts coalesced in late 2006, but it was a long time before I worked out what it was actually for. That didn’t stop me writing about it, however (because that’s what you do). I’ve written a lot and drawn a lot, but there hasn’t been a single cohesive strain of thought. It has been more like an evolving organism. As with DNA, I have many different segments of writing dating from different periods, each adapted to different circumstances. I use various methods of flagging these as being “non-current”, but I dare not throw them out in case there are still important truths buried therein.

The first step in writing is often the “brain dump”, where you pour out all your thoughts into a monologue. However, I find that I can’t really do a good monologue when I already have chunks of writing – the result of previous brain dumps – that need to be knitted together somehow. It’s hard to revise your thinking on something that you haven’t thought out completely to begin with, and then do so again, and again, and again. If you’re as disorganised as I am, entropy catches up with you.

When asked, I tell everyone that I’m “getting there”, which translates to “things are happening but I’m making no commitments”.

Theoretical frameworks

One of the chapters of my much-delayed thesis describes (or rather will describe) a theoretical framework, which is academic-speak for “a way of understanding stuff” in a given field. In my case, stuff = software inspections, and my way of understanding them is a mixture of abstractions of abstractions of abstractions and some slightly crazy maths, just to give it that extra bit of abstractedness that seemed to be lacking.

It’s very easy when engaged in abstract theorising to forget what it is you’re actually modelling. All those boxes and lines look positively elegant on a whiteboard, but when you come to describe what the concepts represent and how someone would actually use it, things frequently go a bit pear-shaped. The problem, as far as I’ve been able to tell, is the limited short-term memory available for all this mental tinkering. What you need is to keep the concrete and the abstract in your head simultaneously, but this is easier said than done (especially if one’s head is full of concrete to begin with). When the abstract gets very abstract and there’s lots of it, the real-world stuff slips quietly out of your consciousness without telling you.

Sometimes it’s only a small thing that gets you. Sometimes you realise that it all mostly makes sense, if only this box was called something else. Then there are times when you finish your sketch with a dramatic flourish, try to find some way of describing the point of the whole thing, and shortly after sit back in an embarrassing silence.

My latest accomplishment, or perhaps crime against reason, is the introduction of integrals into my slightly crazy maths (already liberally strewn with capital sigmas). An integral, for the uninitiated, looks a bit like an S, but is pronounced “dear god, no”. You can think of it as the sum of an infinite number of infinitely small things, which of course is impossible. However, it does allow my theoretical framework to abstract… no, never mind.
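(For the record, the respectable way of saying “the sum of an infinite number of infinitely small things” is as a limit of perfectly finite sums,

\[ \int_a^b f(x)\, dx \;=\; \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\, \Delta x, \qquad \Delta x = \frac{b-a}{n}, \]

where each \(x_i\) is a point in the \(i\)-th slice of \([a, b]\). It turns out to be slightly less impossible than it sounds.)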