Teaching SE: code as design

Software engineering lecturers have some misconceptions to grapple with — students’ certainly, but also our own.

One is this: we have tried to carve out an unambiguous1 distinction between software design and software implementation. In a previous post, I discussed a 2nd-year unit that historically focused on the Unified Modelling Language (UML). This unit purported to teach software design, to the exclusion of the “implementation” of that design. The unit content consisted almost entirely of diagrammatic notations (parts of UML) that, we assumed when they were first introduced, would stand on their own as the language of software design. The unit mentioned almost nothing about code, because we considered code to be a low-level implementation detail already covered in other units.

“UML is how you will write software”, some of us thought (to varying extents). “What comes after is just a labourer’s job.”

We2 wanted to believe this, probably because it seemed to imply that SE was maturing, that it was starting to develop efficiencies that would make current practice look like dark-age superstition. The notion of having separate “design” and “implementation” phases has been taken from older engineering disciplines (to some extent), in which a physical object is first designed, then constructed. That strict progression seems obvious. Design is the intellectual effort — the problem solving — while construction is physical creation of what you’ve designed.

Our UML fixation also owes something to the fact that diagrams play such an important role in older engineering disciplines. How would you construct a plane or a building without any diagrams to work from? You could do it, by describing measurements using only words and equations, but it would be painfully inefficient and error-prone. Why would you not want a picture of the thing you’re about to create? For software engineering, UML seems to slot naturally into this role.

We also noticed that software development requires you to think at different levels of abstraction. Initially, the work is highly abstract, big-picture stuff: whiteboard sketches and high level specifications, both of which have often been done with UML (or, historically, other diagrammatic notations). Work then progresses to more concrete and finely detailed stuff: code. There seems to be an obvious distinction here: a design phase, complete with diagrams, and a construction phase.

But this was always a conceptual mistake. Both activities are design, and there isn’t a construction phase at all (at least, not one that has any significant cost). The big-picture diagrams and the finely-detailed code are both purely intellectual; they’re both part of a single problem-solving exercise — one big design phase. The problem solving doesn’t stop until the coding is finished, and it often goes back on itself, with some whiteboarding required in the midst of coding. What would you think of the engineering of aircraft, buildings, etc. if you knew that the engineers routinely conducted critical redesign work half-way through physical construction? It’s not so much that software engineering is radically different3, but that we’ve been misusing the terms.

Real-world SE uses the term “building” or “build process” to refer to a set of automatable steps — compiling, linking, packaging, running unit tests — that turn the code into a finished product4. SE education forgets this a bit, but it’s a much closer analogue of the construction of a physical object than is the concept of the software “implementation”.

Once you accept that coding is, in fact, fundamentally a design activity, you realise that you cannot really teach software design without coding. If you take the coding out of software design, you’re really just left with an empty shell.

Our UML unit suffered because students were unable to connect UML’s diagrammatic notations to the things software actually has to do. Buildings and planes have schematics that broadly look like the physical object; there is an intuitive spatial relationship. But software has no tangible existence. A software diagram has nothing to do with what the software looks like, because software doesn’t look like anything. Software diagrams are no more than an aid to problem-solving. And how could we expect students to undertake problem-solving — design — without access to the one notation — code — in which the solution must be written?

Ultimately, code is everything. UML diagrams are merely subsets of the information present in code. They are abstract representations that highlight a few key details at the expense of many others. This can be very useful, but only as a way of organising your understanding of the code. Almost perversely, UML shows the big picture but never the whole picture. Experienced software engineers will understand the kinds of information omitted from a UML diagram, and thus what work remains to be done. But without code, students will look upon the diagram itself as the thing they need to master, without seeing it as merely a view of something more complicated.

Last year, alongside introducing patterns, I gave students a non-trivial amount of coding to do. (In hindsight, I probably gave them too much, to the point that it became a significant drain on their time, but such things can be recalibrated.)

In the weekly practical sessions, students formed into groups of three (or so). Over the course of the semester5, they worked on developing three pieces of software: an image viewer, library catalogue and a blog editor. Each system would be developed in a different language: Java, C++ or Python6, with the students choosing which language to assign to which system. Code, of course, is not just one notation, but many. The use of multiple languages was intended to show that software patterns (and other design concepts) apply to different languages and in slightly different ways.

Each week, I gave the students a few extra requirements for each system, drawing from the topics covered in that week’s lecture. This was all non-assessable — just preparation for the tests, assignment and exam.

A further twist was that students rotated between the different projects within their group, so that one group member would work on the image viewer one week, the library catalogue the next week, then the blog editor, then return to the image viewer, and so on. This would expose each student to three different languages, and also to the challenges of working with other people’s code. In practice, students became frustrated at the inability of their colleagues to actually finish their assigned tasks. That suggests I have some more thinking of my own to do, but I hope that these frustrations themselves served as a learning experience.

The unit still includes UML as a way of organising broad concepts. In fact, the introduction of code makes it easier to talk about UML. Students and I can use it to focus on the design concepts we need to convey, without getting too bogged down in superficial syntax or arcane rules.

By giving students the opportunity to write code, we give them all the information and the complexity that software design actually entails. Armed with that insight, they are in a better position to understand what design diagrams are for, and they can use them more effectively.

  1. Well, sort of. We occasionally pay lip service to the concept of an overlap. []
  2. Almost certainly not everyone believed this. Those who didn’t probably have some entitlement to point and laugh. []
  3. Software engineering is different — all engineering disciplines are different from each other, in terms of the methods and tools used — but we need not manufacture more differences than actually exist. []
  4. Or, rather, a potentially working product. The software build process is exceptionally cheap, because it’s automatable and has no material costs, so it’s done early and often. []
  5. In practice, mostly in the first half of semester, due to mounting stress levels and triaging of study effort. []
  6. There are many possible choices, of course, but I sought languages that were widely used and represented a diversity of approaches, but which basically adhered to the traditional OO paradigm. Java is our baseline teaching language (giving it the edge over C#). C++ stands out because of its pervasiveness and unique (if somewhat horrible) challenges. Python is a good representative of the dynamically-typed languages. []