Teaching SE: code as design

Software engineering lecturers have some misconceptions to grapple with — students’ certainly, but also our own.

One is this: we have tried to carve out an unambiguous[1] distinction between software design and software implementation. In a previous post, I discussed a 2nd-year unit that historically focused on the Unified Modelling Language (UML). This unit purported to teach software design, to the exclusion of the “implementation” of that design. The unit content consisted almost entirely of diagrammatic notations (parts of UML) that, we assumed when they were first introduced, would stand on their own as the language of software design. The unit mentioned almost nothing about code, because we considered code to be a low-level implementation detail already covered in other units.

“UML is how you will write software”, some of us thought (to varying extents). “What comes after is just a labourer’s job.”

We[2] wanted to believe this, probably because it seemed to imply that SE was maturing, that it was starting to develop efficiencies that would make current practice look like dark-age superstition. The notion of having separate “design” and “implementation” phases has been taken from older engineering disciplines (to some extent), in which a physical object is first designed, then constructed. That strict progression seems obvious. Design is the intellectual effort (the problem solving), while construction is the physical creation of what you’ve designed.

Our UML fixation also owes something to the fact that diagrams play such an important role in older engineering disciplines. How would you construct a plane or a building without any diagrams to work from? You could do it, by describing measurements using only words and equations, but it would be painfully inefficient and error-prone. Why would you not want a picture of the thing you’re about to create? For software engineering, UML seems to slot naturally into this role.

We also noticed that software development requires you to think at different levels of abstraction. Initially, the work is highly abstract, big-picture stuff: whiteboard sketches and high-level specifications, both of which have often been done with UML (or, historically, other diagrammatic notations). Work then progresses to more concrete and finely detailed stuff: code. There seems to be an obvious distinction here: a design phase, complete with diagrams, and a construction phase.

But this was always a conceptual mistake. Both activities are design, and there isn’t a construction phase at all (at least, not one that has any significant cost). The big-picture diagrams and the finely-detailed code are both purely intellectual; they’re both part of a single problem-solving exercise: one big design phase. The problem solving doesn’t stop until the coding is finished, and it often goes back on itself, with some whiteboarding required in the midst of coding. What would you think of the engineering of aircraft, buildings, etc. if you knew that the engineers routinely conducted critical redesign work half-way through physical construction? It’s not so much that software engineering is radically different[3], but that we’ve been misusing the terms.

Real-world SE uses the term “building” or “build process” to refer to a set of automatable steps (compiling, linking, packaging, running unit tests) that turn the code into a finished product[4]. SE education forgets this a bit, but it’s a much closer analogue of the construction of a physical object than is the concept of the software “implementation”.

Once you accept that coding is, in fact, fundamentally a design activity, you realise that you cannot really teach software design without coding. If you take the coding out of software design, you’re really just left with an empty shell.

Our UML unit suffered because students were unable to connect UML’s diagrammatic notations to the things software actually has to do. Buildings and planes have schematics that broadly look like the physical object; there is an intuitive spatial relationship. But software has no tangible existence. A software diagram has nothing to do with what the software looks like, because software doesn’t look like anything. Software diagrams are no more than an aid to problem-solving. And how could we expect students to undertake problem-solving — design — without access to the one notation — code — in which the solution must be written?

Ultimately, code is everything. UML diagrams are merely subsets of the information present in code. They are abstract representations that highlight a few key details at the expense of many others. This can be very useful, but only as a way of organising your understanding of the code. Almost perversely, UML shows the big picture but never the whole picture. Experienced software engineers will understand the kinds of information omitted from a UML diagram, and thus what work remains to be done. But without code, students will look upon the diagram itself as the thing they need to master, without seeing it as merely a view of something more complicated.
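To make the "subset of the information" point concrete, here is a minimal sketch in Python. The `Shape`, `Circle` and `Square` classes and the `diagram_view` helper are hypothetical names invented for illustration. The helper extracts roughly what a class diagram would record (class name, parent, public method names) and discards everything else.

```python
import inspect


class Shape:
    """Base class for drawable shapes."""

    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return 3.14159 * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def diagram_view(cls: type) -> dict:
    """Summarise a class roughly as a class diagram would: names only."""
    # Public methods defined directly on this class (not inherited ones).
    methods = sorted(
        name
        for name, member in vars(cls).items()
        if inspect.isfunction(member) and not name.startswith("_")
    )
    # Direct parent classes, ignoring the implicit 'object' base.
    parents = [base.__name__ for base in cls.__bases__ if base is not object]
    return {"class": cls.__name__, "extends": parents, "methods": methods}


print(diagram_view(Circle))
print(diagram_view(Square))
```

Apart from the class name, the two summaries are identical: each says only that the class extends `Shape` and has an `area` method. The formulas, which are where the two classes actually differ, exist only in the code.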

Last year, alongside introducing patterns, I gave students a non-trivial amount of coding to do. (In hindsight, I probably gave them too much, to the point that it became a significant drain on their time, but such things can be recalibrated.)

In the weekly practical sessions, students formed into groups of three (or so). Over the course of the semester[5], they worked on developing three pieces of software: an image viewer, a library catalogue and a blog editor. Each system would be developed in a different language: Java, C++ or Python[6], with the students choosing which language to assign to which system. Code, of course, is not just one notation, but many. The use of multiple languages was intended to show that software patterns (and other design concepts) apply to different languages and in slightly different ways.
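As a rough illustration of a pattern applying "in slightly different ways", here is a sketch of the Observer Pattern as it might look in Python. It is not taken from the actual practical worksheets; `ImageViewer`, `TitleBar` and the file name are hypothetical. In Java, an observer would typically implement a declared interface, whereas in Python any callable with a suitable signature can play the role.

```python
from typing import Callable, List

# An observer is simply any callable that accepts the new file name.
ImageObserver = Callable[[str], None]


class ImageViewer:
    """Hypothetical subject: notifies observers when the image changes."""

    def __init__(self) -> None:
        self._observers: List[ImageObserver] = []
        self.current = ""

    def add_observer(self, observer: ImageObserver) -> None:
        self._observers.append(observer)

    def open(self, filename: str) -> None:
        self.current = filename
        # Notify every registered observer of the change.
        for observer in self._observers:
            observer(filename)


class TitleBar:
    """An observer written as a class with an ordinary method."""

    def __init__(self) -> None:
        self.text = ""

    def on_image_changed(self, filename: str) -> None:
        self.text = f"Viewing {filename}"


viewer = ImageViewer()
title = TitleBar()
history: List[str] = []

viewer.add_observer(title.on_image_changed)  # bound method as observer
viewer.add_observer(history.append)          # built-in list method as observer
viewer.open("cat.png")
```

The design idea (a subject that knows nothing about its observers beyond how to call them) is the same in either language; only the mechanism for expressing "something callable" differs.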

Each week, I gave the students a few extra requirements for each system, drawing from the topics covered in that week’s lecture. This was all non-assessable — just preparation for the tests, assignment and exam.

A further twist was that students rotated between the different projects within their group, so that one group member would work on the image viewer one week, the library catalogue the next week, then the blog editor, then return to the image viewer, and so on. This would expose each student to three different languages, and also to the challenges of working with other people’s code. In practice, students became frustrated at the inability of their colleagues to actually finish their assigned tasks. That suggests I have some more thinking of my own to do, but I hope that these frustrations themselves served as a learning experience.

The unit still includes UML as a way of organising broad concepts. In fact, the introduction of code makes it easier to talk about UML. Students and I can use it to focus on the design concepts we need to convey, without getting too bogged down in superficial syntax or arcane rules.

By giving students the opportunity to write code, we give them all the information and the complexity that software design actually entails. Armed with that insight, they are in a better position to understand what design diagrams are for, and they can use them more effectively.

  1. Well, sort of. We occasionally pay lip service to the concept of an overlap.
  2. Almost certainly not everyone believed this. Those who didn’t probably have some entitlement to point and laugh.
  3. Software engineering is different (all engineering disciplines are different from each other, in terms of the methods and tools used), but we need not manufacture more differences than actually exist.
  4. Or, rather, a potentially working product. The software build process is exceptionally cheap, because it’s automatable and has no material costs, so it’s done early and often.
  5. In practice, mostly in the first half of semester, due to mounting stress levels and triaging of study effort.
  6. There are many possible choices, of course. I sought languages that were widely used and represented a diversity of approaches, but which basically adhered to the traditional OO paradigm. Java is our baseline teaching language (giving it the edge over C#). C++ stands out because of its pervasiveness and unique (if somewhat horrible) challenges. Python is a good representative of the dynamically-typed languages.

Teaching SE: from UML to Patterns

In teaching software engineering at Curtin Uni, we have long had a 2nd-year unit that dealt principally with the grandiosely-named Unified Modelling Language (UML), that diagrammatic language that promised to be software’s answer to the technical drawings of other engineering disciplines. I recall it as a student, when the various UML notations (each one allowing you to articulate a particular aspect of the software you’re designing) were described at a rate of approximately one per lecture (with one lecture per week). At the time, not knowing any different, this seemed a logical approach to useful design material.

As a mathematical system, UML draws you in. Pure abstract curiosity drives you to understand what its constructs are for and how they relate to one another. It has a kind of internal elegance, if you know where to look, where a few core concepts can be used to articulate systems of great complexity with great subtlety and flexibility. Its creators made mammoth efforts to anticipate all the situations they wanted to be able to model with a diagrammatic (or semi-diagrammatic) language, and provided the ultimate multi-purpose tool for the job.

UML positioned itself throughout the “software life cycle” (another curious term) but particularly in the design phase. It was supposed to be the vocabulary in which you expressed the bulk of your design work. You crafted UML diagrams from a requirements specification, then, once the UML “model” was complete, it became a blueprint with which you could hammer out the code in trivial time.

There are probably quite a few ways to explain why (in many cases or even most cases) this doesn’t really work. UML has a role in SE, but not one quite as celebrated as its evangelists might have thought. It is a vocabulary, one used to communicate complex, abstract ideas quickly. But not completely.

UML serves a useful role as an informal language, not a specification language, and the difference is completeness. UML can be complete — that was the whole idea according to some people — but practically speaking you don’t usually want it to be so. At the day-to-day level of human communication, you don’t need to be complete — you merely say as much as you need to say before the other person “gets it”. We trade off completeness for efficiency. When you want someone to understand what you’re thinking, you estimate the differences between your understanding and their understanding, and you communicate only those differences, not the sum total of your thoughts. In UML, you would draw only those entities and relationships that are relevant to the immediate issue at hand. (We take a very different approach when writing code and specifications, in which all details must be carefully articulated.)

In the second half of 2012, 11 years after I took the UML unit as a student, I was asked to teach it. The unit had evolved slightly in the way that units do, but the approach was broadly the same. Over the years, it had occasionally attracted the nickname “Diagrams 252”. People knew that the emphasis of the unit was misplaced. Now that I was on the other side of the educational fence, I came to see the problem in our teaching approach — not just in what we were telling students about UML, but in what we were failing to tell them about software design more generally.

UML is a vocabulary, formal or informal, but so what? Software design isn’t about syntax. It’s about ideas. What should you actually do with all those notations? Where was the actual design? Where was the engineering?

In many an SE textbook, there is no shortage of advice on what you shouldn’t do in software engineering. There are rules about how not to use inheritance, what language constructs you should avoid at all costs, what kinds of coupling are unacceptable, etc. There is usually not a great deal of advice about what you should or even can do.

Exposing students only to a set of rules and notations, as we did previously, is really just fiddling around the edges of SE education. The knowledge that creates a competent software engineer is not UML or design checklists. Rather, a competent software engineer first and foremost needs a mental toolbox of design options — a large collection of adaptable solutions to small problems, stored in one’s brain. These are not memorised in the way that students often try to memorise bullet point lists. They are not catalogued in a neat little taxonomy as might appear in a traditional textbook. They are haphazard fragments of possible design approaches — the result of seeing how design could be done in various ways. The technical term from cognitive science, as I understand it, is “plan”. Plans are building blocks for the creative and investigative processes.

One can acquire such a mental toolbox through sheer brutal personal experience — that’s how they did it in the old days, of course. But education is supposed to short-circuit that, or else what’s the point?

In teaching the unit for a second time in 2013, I made some significant changes, one of which was to teach software design patterns[1]. It’s easy to make the argument for teaching patterns, because they have that aura of academic respectability that accompanies things Published by Reputable Authors. Patterns, of course, are broadly defined as reusable solutions to common problems. They are implemented independently by different engineers in different contexts, and later discovered for what they are. Familiar?

The logic here is more convoluted than it first appears. A pattern is not a plan — not exactly. Each pattern is a named phenomenon (the “Strategy Pattern”, “Observer Pattern”, “Composite Pattern”, etc.), analysed, documented with UML and catalogued. The widely-known seminal list of software patterns was laid down in a book by the “Gang of Four”. So, for educational purposes, patterns sound a lot like a giant list of bullet points that students would simply try to memorise.

But patterns are important for educational purposes because (unlike UML itself or the usual what-not-to-do list of SE design rules) they show you what you can do. Patterns are solutions known to work well, assuming you understand what each one is for. Moreover, it has been my hope that teaching patterns also reveals some of the more fundamental ideas about software design. That is, even if you can’t remember a single actual pattern after taking the unit, you might nonetheless remember some interesting things you can do with polymorphism. Hopefully some knowledge underlying those patterns will have rubbed off, and given you the beginnings of that mental toolbox.
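As one example of the kind of thing a pattern shows you that you can do with polymorphism, here is a minimal sketch of the Strategy Pattern in Python. The library-loan scenario and all names in it (`LateFeePolicy`, `FlatFee`, `DailyFee`, `Loan`, and the fee amounts) are invented for illustration and are not taken from the unit material.

```python
from abc import ABC, abstractmethod


class LateFeePolicy(ABC):
    """Strategy interface: how to charge for an overdue loan."""

    @abstractmethod
    def fee(self, days_overdue: int) -> float:
        ...


class FlatFee(LateFeePolicy):
    """Charge a fixed amount once a loan is overdue at all."""

    def fee(self, days_overdue: int) -> float:
        return 2.0 if days_overdue > 0 else 0.0


class DailyFee(LateFeePolicy):
    """Charge a fixed rate for each overdue day."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def fee(self, days_overdue: int) -> float:
        return self.rate * max(days_overdue, 0)


class Loan:
    def __init__(self, days_overdue: int, policy: LateFeePolicy) -> None:
        self.days_overdue = days_overdue
        self.policy = policy

    def amount_due(self) -> float:
        # Polymorphic call: Loan does not know which policy it holds.
        return self.policy.fee(self.days_overdue)


print(Loan(3, FlatFee()).amount_due())
print(Loan(3, DailyFee(0.5)).amount_due())
```

Even if the name "Strategy" is forgotten, the underlying idea may stick: behaviour that varies can be pulled out into interchangeable objects, so that `Loan` never needs an `if` statement listing every possible fee scheme.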

  1. There had always been a single lecture on patterns anyway, but it didn’t cover much material.