Recently, software glitches have increasingly affected the average consumer – at airports, or with online banking. Often, we hear that the software wasn’t properly tested. But what does this mean exactly?
Every now and then, really spectacular software breakdowns occur. The opening of Heathrow Terminal 5 became a public embarrassment because the baggage system failed to function. More than 17 million customer accounts at RBS and its subsidiaries NatWest and Ulster Bank could not be accessed for some or all of the day because the installation of customer management software corrupted the entire system. One of the biggest Austrian banks paid out €21 million to appease its customers with vouchers because the new online banking software didn’t work for days on end.
Errors like these are not only damaging to a company’s brand, but can also be very costly. The goal of software testing is to avoid such incidents and the consequences. On the following pages, we explore the topic of software testing and address these main questions:
- What’s the difference between today and yesterday?
- What must be tested?
- Who should do the testing?
We can assume that in the cases previously mentioned, the software in question was definitely tested: Banks and insurance companies know the risks of using software that has not been tested. So how can such malfunctions continue to occur? Some, but certainly not all, software glitches can be caused by storms and natural disasters. Still, this provides no explanation for the increase in software errors of late. Testing has always been done and it used to work well. And natural disasters are a known, if unpredictable, factor. So why should the tried-and-true formulas suddenly fail?
The reason is simple: Programs have become more complex. And to address this complexity, more testing is required. How much more? Take the years 2000 and 2010. In this time, the volume of data being moved around increased by a factor of 50,000. If a program was tested for two weeks in 2000, it would have to be tested for 100,000 weeks in 2010 – in other words, around two thousand years.
More interactions, not more data, increase complexity
Working and calculating this way is clearly not an option. After all, software is now more efficient, development tools allow many errors to be detected before the program is first created, and modern object-oriented software design enables developers to code neatly and in a less error-prone way. But even if testing is only increased by a factor of 50, it would still have to be tested for 100 weeks – or two years. That simply isn’t feasible.
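The arithmetic behind these estimates can be checked in a few lines. This is only a minimal sketch: the function name `scaled_test_weeks` and the 52-weeks-per-year conversion are assumptions made for illustration, and it assumes, as the text does, that testing effort grows linearly with the growth factor.

```python
WEEKS_PER_YEAR = 52  # assumption: a year is counted as 52 weeks


def scaled_test_weeks(baseline_weeks: float, growth_factor: float) -> float:
    """Scale a baseline test duration linearly by a growth factor."""
    return baseline_weeks * growth_factor


# Two weeks of testing in 2000, scaled by the two factors from the text.
for factor in (50_000, 50):
    weeks = scaled_test_weeks(2, factor)
    years = weeks / WEEKS_PER_YEAR
    print(f"factor {factor:,}: {weeks:,.0f} weeks, about {years:,.1f} years")
```

Under these assumptions, the factor of 50,000 yields 100,000 weeks (a little over 1,900 years), and the factor of 50 yields 100 weeks (just under two years).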
A difference in size and quantity alone doesn’t necessarily mean that the software has become more complex. In fact, one of the main arguments for using a computer is that it doesn’t matter whether it has to perform a calculation five times or 5,000 times. It should simply be reliable. It is not the increase in the quantity of data that causes complexity, but rather the increase in possible connections and systems.
Look at the development of mobile telephony: In Germany, Radio Telephone Network C came along first in cumbersome cases, followed by the much more manageable digital cellular network D-Netz. In comparison, today’s smartphones have the processing power of mainframe computers from 20 years ago.
Apart from the sheer advance in technical specifications, think about all the things that can now be done with a smartphone. Above all, think about the number of other systems that can be tapped into – at the same time, even. It is the number of possible connections that causes the corresponding increase in complexity.
The main difference between today and yesterday is not the advancement in programming languages – even though developers may no longer code in Assembler or COBOL, these languages can still be used to write good programs today – but rather the number of possible solutions there are for a certain problem.
Take this analogy of trying to cross a river that is 30 feet wide without using a boat and without getting wet. In the past, there was one solution: system analysts would look for places where big rocks could be used to jump across the river to the other side. Today, there are 10 different bridges crossing the river, that is, 10 different ways to solve the problem.
The software architect, then, has to choose a particular solution based on whether it meets various quality requirements. Let’s say there is a highway bridge crossing the river as well as a wooden walkway. To use the highway, you need to build feeder roads. Even if the simple wooden walkway is sufficient and building feeder roads requires more effort, the software architect may still choose to use the highway with the reasoning that other people want to cross the river, too.
It’s impossible to test every combination
Here is another example: Forty years ago, when passengers would buy a train ticket from a ticket machine, they would have to answer a series of questions, one after the other. From where do you wish to depart? To where do you wish to travel? How old are you? Are you entitled to a reduced fare? In which class do you wish to travel? And so on. If they discovered while answering the questions that they didn’t have enough money, they would have to cancel the transaction and start again from the beginning.
At today’s ticket machines, passengers will find the questions slightly more hidden in different fields. Instead of entering their age, they select standard fare, half price, or other offers. Rather than typing the destination in full, they type the first few letters, and only the possible destinations are then displayed. While the layout of the input fields suggests that the information can be entered in any order, that is still not possible. For example, if users have entered a discount ticket, they cannot subsequently upgrade to first class. However, instead of getting an error message that says, “First class must be entered before you select a discount,” users will see a message like, “You must purchase your first class ticket on the train.”
In this case, it is clear that developers made some small mistakes in the process of transferring an originally linear, simple input sequence to a graphical input system. Let’s say the machine needs to process five different inputs and they can be in any order. This means there are 120 different combinations of how entries can be made. So, it is understandable that not all input options were tested before the software was implemented.
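The figure of 120 is simply the number of orderings of five inputs (5! = 120). The sketch below enumerates them and runs each one against a toy model of the ticket machine. The step names and the single hidden rule in `machine_accepts` are assumptions made for illustration; they are not taken from any real ticket machine.

```python
from itertools import permutations

# Hypothetical input steps for a ticket machine; the names are illustrative.
STEPS = ("origin", "destination", "fare", "travel_class", "payment")


def machine_accepts(order: tuple[str, ...]) -> bool:
    """Toy model of the order dependency described above.

    Assumption: the only hidden rule is that the travel class must be
    chosen before the fare (discount); every other order works.
    """
    return order.index("travel_class") < order.index("fare")


orderings = list(permutations(STEPS))
rejected = [order for order in orderings if not machine_accepts(order)]

print(len(orderings))  # 120 orderings of five inputs (5!)
print(len(rejected))   # 60 of them trip the hidden rule in this toy model
```

Even with only five inputs and one hidden rule, half of all entry sequences fail in this model. A test suite that only follows the original top-to-bottom sequence would never notice.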
In the past, it was possible to test each individual function and then test the complete process. Now it is necessary to test the interactions between individual functions. The number of these interactions depends directly on the number of possible sequence combinations, which can easily be a seven-digit number. If you take a smartphone, for example, the number of possible combinations surpasses the example of the ticket machines by several orders of magnitude.
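How quickly the number of sequences grows can be illustrated as follows. This is a rough sketch that assumes each input is entered exactly once and that only the order varies; the helper name `orderings` is made up for this example.

```python
from math import factorial


def orderings(inputs: int) -> int:
    """Number of distinct sequences in which `inputs` entries can be made."""
    return factorial(inputs)


for inputs in (5, 8, 10, 12):
    print(f"{inputs:>2} inputs: {orderings(inputs):>11,} possible orderings")
```

Under this assumption, ten inputs already yield 3,628,800 orderings, a seven-digit count, and every additional input multiplies the total again.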
In 2000, it was stated at a conference in Germany that the number of possible states in a program the size of Excel 4 is approximately 10^80. That’s an unimaginable figure. It becomes even more unbelievable when you think that the number of distinguishable particles in the universe was estimated by Stephen Hawking to be 10^160 in 2000. Both figures seem doubtful.
But even if this figure is reduced to one percent of one percent, that is, divided by 10,000, it still ends up as 10^76. It is clearly impossible to test every possible case that could arise. In fact, that is exactly what the Austrian bank – mentioned at the start of the article – said to its customers.
So, if it is impossible to test everything, what parts of a program must be tested? This is one of the first tasks involved in software testing: to ascertain which test cases should be used. Many companies use developers to test other developers’ work. Alternatively, they ask the business department to do testing, since these employees are the only ones who really know how the program should work.
Who tests what?
Developers usually test whether the requirements have been met. If a certain requirement can be executed and the right result is delivered, the test case is in order. If the test cases are selected so that each requirement is assigned at least one test case, the program is considered to work as soon as all test cases provide good results. The business department, on the other hand, does not concentrate on the general requirements, but rather on the requirements that are important for its own activities.
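The developers’ criterion described here can be sketched as a simple coverage check. The requirement IDs, test case IDs, and data layout below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical requirement IDs and test cases, for illustration only.
requirements = {"REQ-1", "REQ-2", "REQ-3"}

test_cases = {
    "TC-01": {"covers": {"REQ-1"}, "passed": True},
    "TC-02": {"covers": {"REQ-1", "REQ-2"}, "passed": True},
}


def uncovered(requirements: set[str], test_cases: dict) -> set[str]:
    """Return the requirements that no test case covers."""
    covered = set()
    for case in test_cases.values():
        covered |= case["covers"]
    return requirements - covered


def considered_working(requirements: set[str], test_cases: dict) -> bool:
    """Apply the developer criterion: full coverage and all tests pass."""
    all_passed = all(case["passed"] for case in test_cases.values())
    return not uncovered(requirements, test_cases) and all_passed


print(uncovered(requirements, test_cases))           # {'REQ-3'}
print(considered_working(requirements, test_cases))  # False
```

Note that a check like this passes as soon as coverage is formally complete. It cannot tell whether the requirements themselves are correct or complete.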
And since these testers are familiar with certain customers, customer transactions, accounts, policies, and product combinations that frequently caused problems in the past, they can also refine the testing process. This is referred to as experience-based testing, and it is an improvement on the method that uses developers alone.
Still, there is a weak point in this kind of testing: Who checks the requirements? Often, no comparison is made between the final specifications and the design specifications. This problem originates with the people placing the request. Often, they cannot visualize the behavior of the software that will ultimately be programmed. In the end, the final product may contain an “error,” which was in fact called for in the requirements. They just thought it would look different.
In addition, it is often the case that requirements are simply missing. These missing requirements may be overlooked in the testing phase both by developers and the business department because they are seen as self-evident. Using the example of the ticket machine again, it is clear that testers assumed users would know to start by touching the button at the top and working their way down. In fact, the program doesn’t tell users that they have to press the top button first, and they are technically able to press the buttons in a different order. If they do so, they won’t be able to complete the process correctly, but they also won’t get an error message because the requirement was missing from the design specifications.
Professional testers accept neither the developers’ assumption nor the business department’s confirmation. They try to put themselves in the role of the user. This is where testers’ creative thought processes start. They try to anticipate the users’ wrong entries. Here’s an example: Passwords are generally case-sensitive. Today, if users enter a wrong password, they are usually reminded of this with the message: Ensure that your CAPS LOCK key is off. But in the past, when this reminder was not so ubiquitous, users often assumed that they really had typed the wrong password. Perhaps they tried typing it again and again until they received a new message: Your password has been blocked. Please contact your system administrator.
This happens in every software program today in a different form – users make entries that infringe on a business rule that they know nothing about. Good programs will give users a corresponding error and help message. However, if developers assumed that users would know this business rule, they won’t have provided any such messages. The program will refuse to cooperate, and the user won’t know why.
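What such an error and help message might look like in code is sketched below. The `Ticket` fields, the fare values, and the rule itself are hypothetical, loosely modeled on the ticket machine example; the point is only that each business rule carries a message the user can act on.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    fare: str          # e.g. "standard" or "half_price" (hypothetical values)
    travel_class: int  # 1 or 2


def validate(ticket: Ticket) -> list[str]:
    """Return a user-facing message for every violated business rule.

    The rule below is a hypothetical example, not a real fare regulation.
    """
    messages = []
    if ticket.fare == "half_price" and ticket.travel_class == 1:
        messages.append(
            "Half-price tickets are only valid in second class. "
            "Choose second class or the standard fare."
        )
    return messages


print(validate(Ticket(fare="half_price", travel_class=1)))
print(validate(Ticket(fare="standard", travel_class=1)))  # []
```

A tester can then check two things per rule: that the rule is enforced, and that the message names the rule instead of leaving the user guessing.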
65,000 errors in Windows NT
What can software testing accomplish today, in concrete terms? For the Windows NT operating system, which is small compared with today’s systems, Microsoft registered around 65,000 errors. The system is considered professional (C2 certification), and Microsoft made every effort to track down as many of the errors as possible. Due to economic considerations, however, it simply is not feasible to find all the errors. Despite professional testing, approximately three to four percent of all errors make it to production, that is, to the user. In this case, that comes to roughly 2,000 to 2,600 errors.
Operating systems are big, but even in a commercial application the number of errors can be between 1,000 and 4,000. Keeping in mind that it’s not possible to find all errors, it is important to look for those upon which users will inevitably stumble. For this, testers need to investigate the user’s typical use cases. In software projects that add more functions to existing software, the use cases are often not known or not explicitly described. In this case, professional testers would put together a list of use cases themselves and ensure that they record the related business rules in as much detail as possible.
For each use case, there is one test case per associated business rule. These test cases check whether a business rule is being infringed upon.
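Deriving test cases this way can be sketched as follows. The use cases, business rules, and the ID scheme are invented for illustration; a real project would take them from the testers’ own list of use cases.

```python
# Hypothetical use cases and business rules, for illustration only.
use_cases = {
    "buy ticket": [
        "travel class must be chosen before a discount",
        "destination must differ from origin",
    ],
    "cancel purchase": [
        "cancellation is possible until payment is confirmed",
    ],
}


def derive_test_cases(use_cases: dict[str, list[str]]) -> list[dict[str, str]]:
    """Create one test case per (use case, business rule) pair."""
    cases = []
    for use_case, rules in use_cases.items():
        for number, rule in enumerate(rules, start=1):
            cases.append({
                "id": f"{use_case.replace(' ', '_')}-{number}",
                "use_case": use_case,
                "checks_rule": rule,
            })
    return cases


for case in derive_test_cases(use_cases):
    print(case["id"], "->", case["checks_rule"])
```

The more completely the business rules are recorded per use case, the more of the users’ typical stumbling blocks such a list will cover.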
By Hans Hartmann, test director at Objentis since 2007.