This article is the first part of a two-part series on dealing with technical debt. The second part is here.
Our modern world runs on an enormous amount of technology. A lot of it is older than people realize. This technology is mostly invisible when it is working, but when it isn't it can cause real problems. When it fails, it is only natural for people to ask questions and try to understand why things went wrong. Even in a best-case scenario where nobody is being blamed or hurt, the technology generally needs to be fixed, and understanding how it failed is important.
Most computer technology is quite opaque to non-specialists. Because of this, it is easy to be distracted by the most visible aspects of the technology. As we will show, one of the most commonly blamed parts is the programming language used to give the computer instructions, especially if it is an old but common programming language like COBOL. In reality this is about as useful as blaming car problems on the color of the paint. The real problem is, and always has been, the fact that the failed systems hadn't been maintained properly.
Our goal in this article is two-fold. First, we wish to convince you that COBOL code failing in the modern world is a symptom of insufficient maintenance, not a cause of failure. In other words, if a system you are using or are maintaining fails, even if COBOL is used somewhere inside it, that fact is mostly irrelevant. Second, we will describe the concepts of refactoring and technical debt, which give a far more useful explanation for why complex technical systems fail. In a follow-up article, we will describe ways governments and businesses can manage refactoring and technical debt.
The New Jersey nightmare
Imagine that you have just been laid off. Not only that, but you were laid off because a brand new disease was spreading across the world, killing people and overwhelming hospitals. Nobody seems to have been prepared for this, and it feels kind of like the world is falling apart. Despite all this, there is some hope: government officials are saying that you can still apply for unemployment benefits. In fact, more people can apply for unemployment benefits than ever before. It'll at least be something, right? How hard can it be?
Really hard, even in the best of times, it turns out. The system is kind of complicated. There are all kinds of deadlines, and even in the best case, it'll take a while to get the money. You try to go through the process and…
...and the unemployment website is down. The unemployment phone lines are down. You can't hit those all-important deadlines. You can't even get instructions on how to do it. And the people in charge don't seem to know what is going on.
This is the nightmare that hit the people of New Jersey during late April and May of 2020. Many businesses across the world laid off record numbers of employees because of the COVID-19 pandemic. According to the Labor Department for New Jersey, this caused the number of unemployment claims they received to balloon by a massive 1600%. The state unemployment website began to suffer outages in April, continuing into May. Similar problems were reported in other states, although most were not as widely covered as those in New Jersey.
After the first outages, the Governor of New Jersey, Phil Murphy, promised to get to the root of the problem. The next day, the state was looking for volunteers who could program in a sixty-year-old programming language called COBOL (an acronym for COmmon Business-Oriented Language); some of the systems that had gone down were apparently written in COBOL, and they needed immediate help fixing those systems. This is the part of the story that made headlines across the country. New York Magazine, for instance, had an article snidely titled "NJ Governor Requests Expertise of 6 People Who Still Know COBOL." Slate magazine's commentary had the somewhat more factual headline "Why New Jersey’s Unemployment Insurance System Uses a 60-Year-Old Programming Language." Business Insider went with the rather more educational headline "The governor of New Jersey is asking for urgent help with COBOL, a 61-year-old programming language. Here's why it's causing problems with unemployment systems, and why it's so hard to replace."
As we mentioned in the introduction, most of these articles really should have had different headlines. The mistake was in referencing COBOL at all 1. The failures of these systems are as natural and predictable as having your car's engine fail after years of never changing the oil. Unlike with cars, though, the knowledge of how to maintain complex computer systems has never really spread to non-specialists, and so these kinds of failures continue to take businesses and governments by surprise. Continuing our analogy above, they either forgot to install an oil service light, or decided to ignore it, generally for years.
1. We're not blaming the reporters for this, for two reasons. First, if you look past the headlines, you'll get more of these details; they tried. Second, many inexperienced programmers (often loudly, on the internet) make the same mistake, so who can blame the reporters for believing them?
Why COBOL isn't the real problem
Once you know how to program, learning programming languages is easy. Understanding existing programs is hard. Maintaining existing programs (in the form of what we call refactoring) is the most difficult part, by far.
If you need to change old programs, hiring experienced programmers and teaching them COBOL is the cheap part. Getting even an experienced programmer up to speed on the existing program/system enough to actually fix it? That's much more expensive. Actually providing the programmers the time and money to make sure the fixes stick when things need to change again? That's the real cost, right there.
What is COBOL and why is it important?
COBOL, as its acronym suggests, is what is known as a programming language. A programming language is a special kind of language that both humans and computers understand. In many ways, programming languages are like so-called natural languages, i.e. languages that humans use to communicate with each other, such as English or Standard Chinese.
COBOL is one of the oldest programming languages around, with the original version dating to around 1960. To make a long story short, it was deliberately created and promoted by the United States of America's Department of Defense as a standard programming language that could be used across many different types of systems. With their support, it rapidly spread both to other government agencies and to many types of businesses, and is still used heavily by both.
A set of instructions, written in a programming language, telling a computer how to do specific things, is called a program. Programs written in COBOL are currently being used to control such important things as ATM transactions, inventory management, government data processing, and much more. They are a part of our modern life, invisible but always there. As the world changes, these systems' behaviors need to be updated to keep up with new demands and other changes. Because of this, we still need humans who understand COBOL well enough to update the computers' instructions.
Programming languages are easy to learn
A fact that we suspect will not surprise our technical readers, but will surprise many others, is that programming languages are really easy to learn. After some serious study, the authors have decided that COBOL is no exception to this rule 2 (it has some striking oddities compared to more standard programming languages, see below, but the strangeness doesn't increase the difficulty enough to matter). With modern resources such as the internet and proper reference books, using one of COBOL's many variants is entirely manageable for an experienced programmer; they will get up to speed on it quickly (although we do not recommend it for beginning programmers; it is too easy to learn bad programming habits from dealing with the language's limitations).
For evidence of how easy it is to learn programming languages, one of the authors of the current paper has, over the course of his life, learned 18 distinct programming languages. His father, a physics professor, claims to have used between 20 and 30 programming languages. A quick poll of engineers at Lucid Software showed that this was not unique, with all of the engineers having used at least two programming languages, and the majority having used somewhere between 5 and 20. What's more, the longer the participant had been working with computers, the more programming languages they had learned, well into the later stages of their career.
The participants in the survey also commented on their experience learning these programming languages. The consensus was that once you have learned to actually program, learning the languages themselves is the easiest part of the whole process. In fact, programming languages build on each other: the more you learn, the easier learning new languages is. Sometimes learning a new programming language also requires learning new programming techniques, and learning these techniques can be difficult. However, it is the programming techniques, not the language, that are the real barrier.
If you compare computer languages with natural languages, it's pretty easy to see why computer languages are much easier to learn than natural ones. Your typical natural language speaker knows something like 20,000 words 3 from their native language. Programming languages are really small: a typical programming language has somewhere between 20 and 50 "keywords" along with a few hundred extra words from what are usually called "standard libraries" 4. Even including special symbols (e.g. `, $, &, and so on) and mathematics, programming languages really are very small compared to natural languages.
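To make the size gap concrete, here is a small sketch in Python, a language we picked purely for illustration. It counts the language's reserved words using the standard `keyword` module and compares that count with the rough 20,000-word figure quoted above. The exact keyword count varies a little between Python versions, so treat the result as an order-of-magnitude comparison rather than a precise measurement.

```python
import keyword

# Rough native-speaker vocabulary size quoted in the text above.
NATURAL_LANGUAGE_WORDS = 20_000

# Number of reserved words in the Python version running this script.
python_keywords = len(keyword.kwlist)

# How many times larger the natural-language vocabulary is.
ratio = NATURAL_LANGUAGE_WORDS / python_keywords

print(f"Python keywords: {python_keywords}")
print(f"A speaker's vocabulary is roughly {ratio:.0f} times larger")
```

Even if you add a few hundred standard-library names on top of the keywords, the natural-language vocabulary stays far larger.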
Of course, words aren't everything. There's the matter of grammar, of how you can put words together to say something useful. In the case of programming languages, learning the grammar really is rather hard. Computers aren't like people; if things are wrong in a program, the computer will either reject what you say as nonsense, or—much worse—interpret it the wrong way. In order to program, you have to learn special kinds of skills, such as how to think very slowly and carefully, how to solve complex problems in ways that other people can read later, and how to be detail-oriented. Acquiring these skills is why programming is challenging to learn. But once you have squeezed your mind into that strange space, you can see how all the languages use different words and variations to communicate the same concept. If you learn a range of techniques, build some small apps or websites, and practice, you’ll find all the languages (and even different grammars) overlap and blur together.
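The two failure modes described above can be sketched in a few lines of Python. This is only an illustration with made-up numbers: the first case is a grammar mistake that the computer rejects outright, and the second is valid grammar that the computer happily interprets in a way the programmer did not intend.

```python
# Case 1: a grammar mistake. Python refuses to accept this at all.
broken_source = "average = (3 + 5 / 2"  # missing closing parenthesis
try:
    compile(broken_source, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True

# Case 2: valid grammar, wrong meaning. Division happens before addition,
# so this computes 3 + (5 / 2) rather than the intended average of 3 and 5.
wrong_average = 3 + 5 / 2
right_average = (3 + 5) / 2

print(rejected)       # True
print(wrong_average)  # 5.5
print(right_average)  # 4.0
```

The second case is the dangerous one: nothing warns you, and the program simply produces the wrong number.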
In our opinion, COBOL's most important differences from other programming languages are:
- COBOL was designed to look more like English than most other programming languages (in a failed attempt to make it more readable to non-programmers), so it has a somewhat larger vocabulary than other programming languages. It's still nothing compared to natural languages, though (it maxes out at about 1000 words for some COBOL variants), especially for fluent English speakers, since most of COBOL's words are English loanwords.
- COBOL's grammar and structure are rather different from the most common families of programming languages, but not that different; most programmers will have learned far wilder syntax elsewhere 5.
One of the authors was able to make sense of the sample programs in reference [1] within a week, simply by studying said textbook. As we will show in the next section, the sheer size of the programs written in the language dwarfs both of these concerns.
2. This came from an extensive study of ex. [1], "Beginning COBOL for Programmers."
3. This can be hard to define and measure; what is a word, exactly? (See ex. [2] for a discussion). Luckily for us, the size difference between programming and natural languages is so big, the details don't matter much.
4. Especially nowadays, popular languages usually have very large standard libraries and a lot of "non-standard" libraries. The point here is that you don't have to learn all the extra words in order to "know the language". Just like most people only look up unfamiliar words when they run into them, programmers refer to documentation for the more specialized parts of the language.
5. Regular expressions come to mind. So do HTML, CSS, and the many, many formats used for build systems. There are so many unusual domain-specific languages.
Programs are hard to learn
Contrast the size of programming languages with the size of the actual programs themselves.
Programs can be truly enormous. According to one source (ex. [1]), some single COBOL programs were upwards of 100,000 lines; in all likelihood, some of those are still around today. Even if they were split into smaller files at some point, most of those lines will still be somewhere. To get a feeling for just how long this is, if a 100,000-line program averaged 6 words per line, it would be longer than the novel "War and Peace." Although in its defense, the program's plot would be more engaging 6.
There are extra problems with reading large programs, beyond their simple length. The number of programmer-defined words in a program can be, and very often is, larger than all the words in the original programming language itself. When writing a large program, the programmers will have been forced to invent thousands of all-new, program-specific words (in programming terms, variable names 7) to describe their particular problem. While these names are more often useful than not in understanding what the program is doing, not-useful names are still far too common 8. Even if the names are individually easy to understand, the sheer number of them means that knowledge of the programming language itself helps very little.
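The arithmetic behind that comparison is short enough to check directly. In the sketch below, the line count and words-per-line figures come from the paragraph above; the length of "War and Peace" is our own assumption, based on a commonly cited approximate word count for English translations, and it differs between editions.

```python
LINES_IN_PROGRAM = 100_000  # program size quoted in the text
WORDS_PER_LINE = 6          # average assumed in the text

# Assumption: a commonly cited approximate length for English
# translations of "War and Peace"; editions differ.
WAR_AND_PEACE_WORDS = 587_000

program_words = LINES_IN_PROGRAM * WORDS_PER_LINE

print(program_words)                        # 600000
print(program_words > WAR_AND_PEACE_WORDS)  # True
```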
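As a small illustration of how much names matter, here are two Python functions that do exactly the same thing. Everything in this sketch is hypothetical, including the rule itself (a benefit equal to a fraction of wages, up to a cap); it is not taken from any real unemployment system.

```python
# Unhelpful names: correct, but the reader has to guess what it means.
def calc(a, b, c):
    return a * b if a * b < c else c


# Descriptive names: the same logic, readable without extra context.
def weekly_benefit(weekly_wage, replacement_rate, benefit_cap):
    uncapped_benefit = weekly_wage * replacement_rate
    return min(uncapped_benefit, benefit_cap)


print(calc(1000, 0.5, 700))            # 500.0
print(weekly_benefit(1000, 0.5, 700))  # 500.0
print(calc(2000, 0.5, 700))            # 700
print(weekly_benefit(2000, 0.5, 700))  # 700
```

Now imagine thousands of names like `a`, `b`, and `c` spread over 100,000 lines, and it becomes clear why knowing the language is the small part of the job.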
These kinds of programs don't live alone, either; they are part of bigger systems. To fully understand them, you not only have to read a novel's worth of code, you have to understand why the program was written and what the original programmers were dealing with when they wrote it. For another literary analogy, most large programs read more like "Finnegans Wake" than "War and Peace." As for written explanations of what the program does and why, we should be so lucky.
It gets worse when you realize that despite our little joke above about the plot of "War and Peace," programs don't even have the decency to start at the beginning and go straight through to the end, telling a single story. Instead, they can go backwards and forwards and jump around in the strangest possible ways 9. It's like trying to read five different pages in "War and Peace" at the same time. This is particularly bad with old programs (written before we discovered that this was a serious problem) and with programs that have been heavily changed without the effort to keep them readable.
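To give a feel for what "jumping around" looks like, here is a sketch of the same small task written two ways. Python has no jump statement of the kind older languages relied on, so the first version imitates one with a hypothetical `step` label; real legacy code jumps between places thousands of lines apart rather than four branches in one loop.

```python
def total_with_jumps(prices):
    # Imitates jump-style control flow: "step" names where to go next,
    # and the steps are deliberately listed out of execution order.
    step, i, total = "check", 0, 0
    while True:
        if step == "add":
            total += prices[i]
            step = "advance"
        elif step == "check":
            step = "add" if i < len(prices) else "done"
        elif step == "advance":
            i += 1
            step = "check"
        elif step == "done":
            return total


def total_structured(prices):
    # The same task, told as one story from start to finish.
    total = 0
    for price in prices:
        total += price
    return total


print(total_with_jumps([1, 2, 3]))  # 6
print(total_structured([1, 2, 3]))  # 6
```

Both functions return the same answer, but only the second can be read from top to bottom.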
The upshot of all this is that in the long run, it doesn't matter whether you hire a room full of experienced COBOL programmers or a room full of experienced Java programmers to fix a large COBOL program; practically all of their work will go into understanding the actual program, rather than the language it was written in. The Java programmers would of course be somewhat slower at first, but they'd catch up quickly enough, and then begin running into the same walls as the COBOL programmers would: it's just too much to read and understand, especially if you don't know exactly what the program is doing (or what it's supposed to be doing).
6. Just kidding. We've never read "War and Peace" in any form.
7. Also, function names, class names, file names, and on and on. There's a lot of naming in programming.
8. This is observed in every sizable codebase, old or new. Constantly fixing names is part of basic program maintenance.
9. A truly remarkable amount of work on how to program well has gone into studying how to make a program both tell a single coherent story and still do everything it needs to do. Most published books written by humans for humans don't face this problem unless their authors actively choose to.
Refactoring: Maintaining programs is hardest of all
At this point the careful reader might notice that the real world seems to contradict our position. After all, programs written today are doing much more complicated things than ever before. Forty years ago, most people interacted with computers through purely textual interfaces that were hard to learn and mostly acted like glorified printers (assuming they weren't actually printers). Nowadays, most people interact with computers through sophisticated graphical interfaces with smooth animation and deliberately broad appeal. The code needed to do something as common as opening a web browser is surprisingly complex when you look at it from a programmer's point of view. If our last two sections are true, how did we get this far?
The answer is that lots of smart people saw what happened when programs began to become too long. When they noticed it, they didn't just say "Oh no, I guess we've hit our limits." Instead they studied what people could do to avoid the problem. They found examples of large programs that became long but still were easy to change. They carefully studied how humans actually read and write programs. They brainstormed new ideas. They created and tested new programming languages and new programming techniques. The end result was that, on average, and over decades of hard work and experience, we have gotten better at writing programs.
In the big picture, these improvements are very good news. They have enabled a truly remarkable growth in what computers can do for us. The problem is that practically all of these techniques and methods require some very important things:
1. A programming language that makes it easy for the programmers to write clean code.
2. Strongly disciplined programmers. Good languages help, but bad programmers can make a mess in any language.
3. The ability to extend programs with "general-purpose" code written by other people. That way, the programmers can focus on new problems, rather than solving old problems for the umpteenth time.
4. What is known in programming circles as refactoring. Refactoring means that when a programmer goes to change a program, they don't make the smallest and easiest change that works. Instead, they work out the smallest change that is also readable and maintainable. It often results in bigger changes to the code but a better program. It's the difference between building a shed on the side of a house, and actually expanding the house's foundation to add a new room.
Of these, only item 3 has really become normal and easily available to modern businesses and governments 10. Items 1, 2, and 4 require employees with training, experience, and dedication to both quality and to the good of the employer. Perhaps most importantly, though, item 4 requires both commitment and understanding from the employer.
At this point, it makes sense to expand our analogy from item 4 above. You can imagine a small house that is well-built and safe and stable. That often describes how a program starts when it is first written by a single person. Then things change; you need a little bit more room. Do you expand the house (and the foundation) or do you build a cheap shed outside?
The temptation is to just build the cheap shed. Sometimes that's even the right choice. After all, it's easy and cheap and works fine. The problem is that you don't build a big house by starting with a small house and then bolting a bunch of sheds onto the side. The whole thing can end up as a pile of sheds. The original house may end up crushed under the weight of all the extra sheds. It's an unstable mess that sways worryingly in the lightest breeze.
In building a real house, at some point, you have to stop adding more sheds and actually expand the house. It's more expensive, but if you want it to not fall down under winds of more than 5 mph, you don't have much choice. If you do it right, you might even slowly turn your small house into a beautiful mansion, or office building, or skyscraper: far larger, more stable, and more useful than it started.
Refactoring is easily the most important part of code maintenance. In the beginning it is always more expensive than doing the "easy" fix, but it is necessary if you'd rather not have to demolish the whole thing and start over. For programmers, it is easily the hardest thing to sell to their employers, because while employers can't see the whole ugly mess in all its terrible, terrible glory, they can see all the problems that come from taking an extra three months to do it correctly. On the other hand, it's perfectly understandable that managers have a hard time seeing it; experienced programmers often need serious time and effort to understand just how poorly a program is built.
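In code, the difference between sheds and an expanded house might look something like the sketch below. It is a minimal, made-up example: the benefit rules, the numbers, and every name in it are hypothetical and do not describe any real system. The first function shows special cases bolted on one after another; the second shows the same behavior after refactoring, with the rules gathered in one table and the logic written once.

```python
from dataclasses import dataclass


# "Shed" version: each new requirement is bolted on as another special case.
def benefit_with_sheds(weekly_wage, year):
    benefit = weekly_wage * 0.5
    if year == 2020:                    # emergency supplement, bolted on
        benefit = benefit + 100
    if benefit > 700 and year != 2020:  # old cap, patched to skip 2020
        benefit = 700
    if benefit > 800 and year == 2020:  # separate cap added later for 2020
        benefit = 800
    return benefit


# Refactored version: rules live in one table, the logic is written once.
@dataclass(frozen=True)
class BenefitRules:
    replacement_rate: float
    supplement: float
    cap: float


DEFAULT_RULES = BenefitRules(replacement_rate=0.5, supplement=0, cap=700)
RULES_BY_YEAR = {
    2020: BenefitRules(replacement_rate=0.5, supplement=100, cap=800),
}


def benefit_refactored(weekly_wage, year):
    rules = RULES_BY_YEAR.get(year, DEFAULT_RULES)
    uncapped = weekly_wage * rules.replacement_rate + rules.supplement
    return min(uncapped, rules.cap)


print(benefit_with_sheds(2000, 2020))  # 800
print(benefit_refactored(2000, 2020))  # 800
```

The refactored version is longer, which is exactly the cost described above. The payoff comes with the next change: a new year's rules become one new table entry instead of another round of patched conditions.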
And this at last is the core problem with improperly maintained systems. If you have a system that keeps changing, but only do the bare minimum to patch it every time a change is needed, you end up with a pile of sheds rather than a mansion. New employees will spend most of their time figuring out where the kitchen is 11, rather than actually adding new things. Fixing things becomes more and more expensive, but demolishing and rebuilding also becomes more expensive. It's a downward spiral that, like an unpaid loan, eventually comes due.
This last point is very important, because it introduces one of the most important concepts of proper software maintenance: technical debt. Technical debt is a term for, well, the money and time that organizations have to spend to fix the "too many sheds" problem described above. Just like normal debt, it can actually be useful if managed well (having a single shed is often the right choice, by itself), but also just like normal debt, it has a nasty habit of growing out of control.
10. The biggest problem with COBOL as an actual language is that support for extending the language with other people's work has been harder than with most other languages. Explaining exactly why is beyond the scope of this paper, but even the language's supporters generally admit that this is true. See ex. [1], chapter 16 for a discussion and history of this problem.
11. It's split across sheds #34, #67 and #3245. That's because the regular oven was installed right at the start, but was moved a year later, the microwave was added five years later, and they finally bolted on the plumbing last year. Shed #3245 also doubles as the place where the kids keep the hamsters.
So what now?
In this article, we have discussed the fact that, despite the headlines, the programming language COBOL is not the main cause of system failures or problems with old and large codebases. Instead, it is the size and structure of the code itself that make old or poorly designed systems hard to maintain and liable to failure. Ultimately, it is a lack of basic maintenance that has driven up a kind of debt, known as technical debt, which makes programs and systems so hard to manage.
As for New Jersey's problems, it's hard to say what has happened. The local newspapers appear to have stopped reporting on people having trouble getting unemployment benefits, so hopefully that has cleared up. Whether the actual problem was even the COBOL code, rather than a newer but poorly designed part of the system, is unclear. It seems unlikely that New Jersey has paid down all of the involved technical debt, nor is it reasonable to expect them to have. That process could take years and will only happen if they are forced to do so by being held accountable for the system failures.
The problem of technical debt may seem overwhelming. While it is a large problem, it can be solved. In the second article in this series, we will discuss some important principles for measuring and paying down technical debt.
Sources:
[1] Coughlan, Michael. Beginning COBOL for Programmers. Apress, 2014.
[2] Brysbaert, Marc, Michaël Stevens, Paweł Mandera, and Emmanuel Keuleers. How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age. https://www.frontiersin.org/articles/10.3389/fpsyg.2016.01116/full
[3] See, for example, estimates and links at https://github.com/leighmcculloch/keywords and https://stackoverflow.com/questions/4980766/reserved-keywords-count-by-programming-language.