Meta-engineering

I’m beginning to think I should have approached this maths modelling stuff from an engineering point of view: with a requirements document, version control and unit testing. Constructing a reasonably complicated mathematical model seems to have enough in common with software development that such things could be quite useful.

I’m calling this “meta-engineering”, because I’d be engineering the development of a model which itself describes (part of) the software engineering process.

The only problem is that formal maths notation can’t just be compiled and executed like source code, and source code is far too verbose (and lacking in variety of symbols) to give you a decent view of the maths.

Fortunately, Bayesian networks provide a kind of high-level design notation; perhaps the UML of probability analysis. Mine look like some sort of demented public transport system. However, drawing them in LaTeX using TikZ/PGF gives me a warm fuzzy feeling.
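To make the "unit testing a model" idea a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical two-node network in which whether a programmer *understood* the code influences whether they *found a defect*. The node names and all the probabilities are illustrative placeholders, not values from my actual model. The assertions at the end play the role of unit tests: they check that the model's arithmetic behaves as expected.

```python
from fractions import Fraction

# Hypothetical two-node Bayesian network: Understood -> DefectFound.
# Probabilities are illustrative placeholders only, not measured data.
P_UNDERSTOOD = Fraction(7, 10)
P_FOUND_GIVEN = {True: Fraction(2, 5), False: Fraction(1, 20)}


def p_found() -> Fraction:
    """Marginal P(DefectFound), summing over both states of the parent node."""
    return sum(
        (P_UNDERSTOOD if u else 1 - P_UNDERSTOOD) * P_FOUND_GIVEN[u]
        for u in (True, False)
    )


def p_understood_given_found() -> Fraction:
    """Posterior P(Understood | DefectFound) via Bayes' rule."""
    return P_UNDERSTOOD * P_FOUND_GIVEN[True] / p_found()


# "Unit tests" in the meta-engineering spirit: sanity checks on the model.
assert 0 <= p_found() <= 1
assert 0 <= p_understood_given_found() <= 1
```

Exact fractions are used rather than floats so that tests can compare results exactly instead of within a tolerance.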

What am I doing?

Over the past few weeks I’ve had numerous questions of the form: “how’s your work going?” I find I can only ever answer this with banalities like “good” or “meh”.

It’s not that I don’t know what I’m doing. At any given point in time, I have a list of minor challenges written up on the whiteboard (which accumulate with monotonous regularity). However, my first problem is that I never remember what these are when I’m not actually working on them. I write them down so that I don’t have to remember, of course.

My second problem is that, even if I did remember what I was supposed to be doing, there just isn’t any short explanation. Currently I have on the whiteboard such startling conversation pieces as “Express CI in terms of S and U”. This may or may not tickle your curiosity (depending on how much of a nerd you are), but explaining what it means – and granted, I’ll have to do that eventually anyway – demands as much mental energy as solving the problem itself.

My third problem is that I regularly shuffle around the meaning of the letters, to ensure I don't run out of them and also to resolve inconsistencies. I'm currently using the entire English alphabet in my equations and a large proportion of the Greek one, so naming variables is a minor headache in itself. For instance, since I wrote the todo item "Express CI in terms of S and U", I've decided to rename the variable "CI" to "CS". Also, "S" used to be "T", and "U" used to be two separate variables. This is mostly cosmetic, but I recoil at the prospect of explaining something so obviously in flux.

I choose to believe that I’ll be able to explain everything once I’ve written my thesis… and hopefully as I’m writing my thesis.

Science fail

Apparently one of the world's foremost experts on global warming – as far as the denialist camp is concerned – is Viscount Monckton of Brenchley. The sum total of his qualifications appears to be his propensity to comment on the subject. A Google search turned up the Heartland Institute's take on Monckton.

Observe the ad on the left of the page: “Why Does Gore Refuse To Debate His Critics? CLIMATE CHANGE IS NOT A CRISIS”. It looks like something straight out of a political campaign, which ought to be enough to toss it aside without further contemplation. But let’s contemplate for a second. The ad shows Al Gore’s face above four people who – we presume – are “his critics” (one of whom is our esteemed Viscount Monckton). How much tomfoolery can you squeeze into something so small?

  1. The one-versus-four theme makes Al Gore look like he’s on his own, which couldn’t be further from the truth.
  2. The ad conjures up images of public debates of the sort that have nothing to do with science. One does not resolve anything, least of all matters of scientific enquiry and public policy, by having proponents of each viewpoint stand up on a stage and hurl sound bites at each other.
  3. If anyone did need to be involved in a debate, it would be the hundreds of scientists who contribute to the IPCC’s reports, not Al Gore, who is after all just the messenger.

Science. We’ve heard of it.

Artificial intelligence

A thought occurs, spurred on by my use of Bayesian networks. They’re used in AI (so I’m led to believe), though I’m using them to model the comprehension process in humans. However, I do also work in a building filled with other people applying AI techniques.

My question is this: how long until Sarah Connor arrives and blows up level 4? And if she doesn’t, does that mean that the machines have already won? Or does it simply mean that we’re all horrible failures and that nothing will ever come of AI?

A good friend (you know who you are) is working with and discovering things about ferrofluids. In my naivety, I now find myself wondering if you could incorporate some kind of neural structure into one, and get it to re-form itself at will…

Theoretical frameworks, part 3

The first and second instalments of this saga discussed the thinking and writing processes. However, I also need to fess up to reality and do some measuring.

A theoretical framework is not a theory. The point of a theoretical framework is to frame theories: to provide all the concepts and variables that a theory might then make predictions about. (If I were a physicist, these might be things like light and mass.) You can test whether a theory is right or wrong by comparing its predictions to reality. You can't do that for theoretical frameworks, because there are no predictions, only concepts and variables. The best you can do is determine whether those concepts and variables are useful, which really means you have to demonstrate some sort of use.

And so it falls to me to prove that there’s a point to all my cogitations, and to do so I need data. In fact, I need quite complex data, and in deference to approaching deadlines and my somewhat fatigued brain, I need someone else’s quite complex data.

The truth is – I'm probably not going to get it; at least, not all of it. Ideally, I need data on:

  • the length of time programmers take to assimilate specific pieces of knowledge about a piece of software;
  • the specific types of knowledge required to assimilate other specific types of knowledge;
  • the probability that programmers will succeed in understanding something, including the probability that they find a defect;
  • the probability that a given software defect will be judged sufficiently important to correct;
  • the precise consequences, in terms of subsequent defect removal efforts, of leaving a defect uncorrected;
  • the cost to the end user of a given software defect;
  • the propensity of programmers to find higher-cost defects; and
  • the total number of defects present in a piece of software in the first place.

I also need each of these broken down according to some classification scheme for knowledge and software defects, and I need not just ranges of values but entire probability distributions. Such is the pain of a theoretical framework that attempts to connect rudimentary cognitive psychology to economics via software engineering.

With luck, I may be able to stitch together enough different sources of data to create a usable data set. I hope to demonstrate usefulness by using this data to make recommendations about how best to find defects in software.

Theoretical frameworks

One of the chapters of my much-delayed thesis describes (or rather will describe) a theoretical framework, which is academic-speak for “a way of understanding stuff” in a given field. In my case, stuff = software inspections, and my way of understanding them is a mixture of abstractions of abstractions of abstractions and some slightly crazy maths, just to give it that extra bit of abstractedness that seemed to be lacking.

It's very easy when engaged in abstract theorising to forget what it is you're actually modelling. All those boxes and lines look positively elegant on a whiteboard, but when you come to describe what the concepts represent and how someone would actually use them, things frequently go a bit pear-shaped. The problem, as far as I've been able to tell, is the limited short-term memory available for all this mental tinkering. What you need is to keep the concrete and the abstract in your head simultaneously, but this is easier said than done (especially if one's head is full of concrete to begin with). When the abstract gets very abstract and there's lots of it, the real-world stuff slips quietly out of your consciousness without telling you.

Sometimes it’s only a small thing that gets you. Sometimes you realise that it all mostly makes sense, if only this box was called something else. Then there are times when you finish your sketch with a dramatic flourish, try to find some way of describing the point of the whole thing, and shortly after sit back in an embarrassing silence.

My latest accomplishment, or perhaps crime against reason, is the introduction of integrals into my slightly crazy maths (already liberally strewn with capital sigmas). An integral, for the uninitiated, looks a bit like an S, but is pronounced "dear god, no". You can think of it as the sum of an infinite number of infinitely small things, which of course is impossible. However, it does allow my theoretical framework to abstract… no, never mind.
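For the curious, the "sum of an infinite number of infinitely small things" can be approximated by a sum of a finite number of merely small things. Below is a minimal Python sketch of a midpoint Riemann sum, a standard textbook approximation and not anything specific to my framework. The function name `riemann_sum` is hypothetical. As the number of rectangles `n` grows, the capital-sigma sum approaches the integral.

```python
from typing import Callable


def riemann_sum(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    width = (b - a) / n
    # Sum the heights at each rectangle's midpoint, then multiply by the width.
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# The integral of x^2 from 0 to 1 is exactly 1/3; more rectangles get closer.
coarse = riemann_sum(lambda x: x * x, 0.0, 1.0, 10)
fine = riemann_sum(lambda x: x * x, 0.0, 1.0, 1000)
```

The integral itself is what you get in the limit as `n` goes to infinity. That limit is precisely the "impossible" part, and it is also why the notation earns its keep.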

Ponderings of sanity

There are many things to be said about debating in online forums. One, which you learn early on, is that it doesn't take much effort to find the fruitcakes. It really doesn't. The people who firmly believe that the World Trade Centre was brought down by explosives, as evidenced by the "indisputable fact" that it "fell faster than gravity", because just look at that YouTube video. The people who believe you're going to hell not just because you don't believe in God, but because you haven't performed the 54-day version of the "Rosary Novena" (a type of prayer), and that TV shows made since the 1960s are so unforgivably immoral that they must be the work of Satan Himself. The people who equate taxation with slavery and socialism with atheism. The people who believe that oil is not derived from ancient organic matter but instead is simply "produced" by the Earth's core. The people who proudly challenge you to disprove their three-paragraph thesis on why the entirety of science on evolution and cosmology is flat wrong and the literal Biblical account is the only possible alternative.

One person I encountered had a pet theory on the nature of photons (particles of light): that each in fact comprises an electron and a positron in orbit around each other. Facts, such as the one where photons have no mass, unlike electrons and positrons, do not pose a hindrance to such theories, I've discovered. Nor, more generally, is the idea that experts in the field have been looking into this sort of thing for quite some time, publishing multitudes of peer-reviewed journal articles along the way, of much concern.

Not that I’d wish to put you off online debating, but as you’re encountering these varied and interesting specimens, you’re bound to pick up a few insults, depending on what fascinating theory you’re being unreasonably sceptical of. As a change of pace from the usual names I get called – leftist, liberal, socialist, atheist (which at least is true), materialist or totalitarian – I’ve recently been called a “Bushbot”. This is an interesting and somewhat disturbing thought, considering some of the stuff that’s popped up in my George Bush “Out of Office Countdown” off-the-wall calendar.

Not even Bush though can match some of the wisdom of the Internet, which I’ve decided to share with you:

“In addition, the Earth is continually producing oil, because “Peak oil” was a carefully crafted myth. Oil does not come from dead dinosaurs as you skulls full of mush have been brainwashed to believe.”

“Scientists are usually the last to know about anything”

“A price chart is how I make my living….It represents truth.”

“A truth to point, all the Atheists I know have no children and it is always due to thier Atheistic mental state as compared to normal (spiritual) people. I know 7 Atheists; three couples. Sure many Atheists do produce children but certainly a large number possessing the Atheistic mind, refuse and will therefore generally NOT pass on either their genetic or social make up to the younger generations.”

“The constant social and technological progress resulting from the constant advancement of the metaphysical mind set means that we now have societies full of people, some of whom now can survive to adulthood with all alorts of personal shortcomings. This obviously includes Atheists.”

So now you know.

When statistics attack

I swear stats is trying to kill me. I’ve redesigned my experiment so that it’s a nice elegant “two-factor repeated measures” flavour. I won’t trouble you with exactly what that means, or exactly what the nine separate hypotheses I’m testing are. What I will trouble you with, for it’s certainly been troubling me, is this:

To analyse the data I will collect I need to use a stats test, which broadly speaking is a factory that converts numbers into truth (or lies if you’re not careful).

Jim, Mr Stats, has a stats handbook that tells you how to do this. It has a nifty little flowchart at the beginning that you can trace through to work out which of the several dozen different kinds of stats tests you need to use. Easy enough, I think to myself as my fingers follow the little arrows across the page. And where do I end up? At a little box that states helpfully: “It may be possible to devise an ad hoc statistical test for the design under consideration.”

That’s right – with my new, improved, elegant design, the Oracle of Statistics reckons it may be possible, with not so much as a hint as to how one might actually go about it.

Not to be defeated, however, I turn to the Oracle of Everything – Google – with which I stumble upon something called Factorial Logistic Regression. I certainly won’t trouble you with what this means, not because I don’t want to but because I currently have no idea myself. Neither of my two supervisors – one of whom is Jim Himself – does either.

My only hope appears to lie in a library book entitled Regression Modeling Strategies. So the campaign continues…

How does this experiment work?

Statistics. It all seems so easy until you have to do it.

No worries Dave, I confidently assured myself as I fitted the last details of my delicate experimental design into place, all set to be unleashed on as many undergraduates as I had chocolate to bribe. Now all I have to do is plug in the stats terminology and I’ll… oh shit.

As a research student, you know you’re onto a “winning” idea when you have to write a Python script just to work out what factors your experiment is testing. Somewhat like realising, after you’ve found the ultimate answer to life, the universe and everything, that you didn’t know what the question was.
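For a sense of what such a script might look like, here is a minimal Python sketch. It enumerates the conditions of a fully crossed two-factor design and rotates the order in which each participant sees them. That suits a repeated-measures (within-subjects) design, where every participant experiences every condition. The factor names and levels are hypothetical placeholders, not the factors in my actual experiment. The cyclic rotation is one simple counterbalancing scheme among several.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
FACTORS = {
    "defect_type": ["logic", "interface", "data"],
    "code_size": ["small", "large"],
}


def conditions(factors: dict) -> list:
    """Every combination of factor levels (a fully crossed factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def presentation_order(conds: list, participant: int) -> list:
    """Rotate the condition list so participants start at different points.

    Across len(conds) participants, each condition appears once in each
    serial position (a cyclic Latin square); carryover effects between
    particular neighbouring conditions are not balanced.
    """
    k = participant % len(conds)
    return conds[k:] + conds[:k]


all_conds = conditions(FACTORS)
for p in range(3):
    print(p, [(c["defect_type"], c["code_size"]) for c in presentation_order(all_conds, p)])
```

With three levels of one factor and two of the other, this yields six conditions per participant. It gives a hint of why the statistics get complicated rather quickly.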