Disturbing Thoughts: Agile Testing

"People who look for easy money invariably pay for the privilege of proving conclusively that it cannot be found on this earth."

Jesse Livermore, “Reminiscences of a Stock Operator”

Introduction

"Easy money" does not exist in the software business, just as it does not exist on the stock market. Those who think otherwise usually pay a hefty fee to be proven wrong. Any hope of solving software quality problems through a number-crunching campaign, without a serious study of the nature of the problem, is futile and will most likely make matters worse. One does not hope to win Wimbledon without spending years playing tennis. Why should it be different for software?

Quality and time-to-market are the most important success factors for any business, and for high-tech business in particular. When given a choice, customers will not tolerate poor service and shoddy products - they will switch to another vendor. Quality is perhaps the most profitable investment of time and money, but it does not come free. One needs to learn how to achieve it in a cost-effective and pragmatic way.

Let’s see how Agile folks are solving software quality problems, and why their approach can work once enough time and energy is invested in learning how to apply these techniques properly.

What is Agile Testing?

There is a widespread belief that testing means one person produces something and somebody else checks it for mistakes. This is a false impression.

W. Edwards Deming, the spiritual father of modern quality assurance, makes it crystal clear: “Cease dependence on mass inspection to achieve quality. Improve the process and build quality into the product in the first place”. Deming, often credited as a father of Japan's postwar industrial revival, began teaching these ideas in the early fifties of the last century and later summarized them in his famous 14 points for management. By the end of the century, his ideas had been widely adopted by proponents of the Agile approach as a way to build software quality in from the start.

Briefly, Agile Testing means specifying tests before development starts. These tests are run automatically by developers and by continuous integration servers as many times as required in order to ensure that what we think we developed is actually what we developed. Unless the software passes all the tests, the corresponding feature is not considered "ready."

Those who are capable of specifying tests in advance fill the role of domain and quality assurance experts. Usually these are the most experienced members of the team. Needless to say, the very concept of blue-collar testers does not exist in an Agile team.

Now let’s try to understand what’s wrong with traditional, post-development testing.

What’s Wrong With Traditional Testing?

Delaying tests until the end of software development means serializing the process. Serialization means very long delivery times. Long delivery times in turn mean that the risk of developing a technically perfect but completely useless system is too high. To ensure that we develop what is really required, we have to get feedback as early as possible. When development iterations are shortened to, for example, one month, post-development testing eats up most of the iteration’s time budget, not leaving enough time for developers to develop quality code.

Frankly, if you find many defects by the end of the iteration, what can you do about it? It’s too late to change the code.

Therefore:

All critical acceptance tests must be ready before development starts.
All team members claim personal responsibility for the code quality.

There is no place for an adversarial “we write code, they will find bugs in it” relationship between Development and QC. The whole team either succeeds or fails together.

The Importance of Test Automation

Even in traditional testing, tests are usually specified in advance, in the form of an Acceptance Test Plan (ATP) document. So what’s the difference with Agile Testing? The main difference is that an ATP is usually a document, and by definition it cannot be run automatically. If it’s possible at all, somebody at some point will convert this document into a script to run regression tests automatically. The problem with this approach is that those who specify the tests cannot always verify the script, and those who write the scripts do not always understand the domain well enough, and thus might introduce subtle mistakes while converting the ATP document into scripts. For non-trivial domains the probability of mistakes grows enormously (and where is the profit in simple domains?).

Fully automated regression tests are also often created only when the system is already developed. In fact, these tests are just a reverse engineering of what the system is doing, not what it is supposed to do.

In the case of User Interface intensive applications (e.g. the EPG), automatic regression tests created this way come very late, are too sensitive to even the slightest changes in the GUI, and in general, are not cost-effective.

If tests are not automated, but instead performed manually, the project will slow down to a crawl. The more features that have been developed, the more regression tests are required. Humans are very bad at performing repetitive mechanical tasks. Therefore, not having automated regression tests means severely limiting the project throughput.

Without thorough regression tests, we cannot guarantee backward compatibility, which in turn means we cannot deliver the next version from the main trunk of our source control to existing customers. As a result, multiple release and/or customer-specific branches will flourish in the version control system, and the overall maintenance cost will increase significantly.

Not having automatic regression tests also means we cannot refactor our existing code base in order to adapt it to new requirements and improve its general quality. As a result, the code will very soon reach the “don’t touch me” status, and its quality will continuously deteriorate with every bug fix or change request.

Which Kind of Tests?

The Agile philosophy distinguishes between two basic types of automatic tests: Integrated and Unit tests. Integrated tests can be subdivided into Acceptance Tests, Endurance Tests, and Stress Tests.

To produce quality software, one has not only to learn each individual testing technique, but also to acquire a clear understanding of how these techniques complement each other, and why it’s so important to apply all of them in the right proportion.

Integrated Tests: Acceptance Tests

Acceptance tests should be specified before development starts in a format that enables automation. The scope of Acceptance Tests might vary from an end-to-end system to a single component. Acceptance tests are specified in the form of HTML tables called fixtures, and are run using the Framework for Integrated Test (FIT) or one of its extensions: FitLibrary or FitNesse.

The FitLibrary extends the original FIT with additional types of fixture tables, while FitNesse wraps it with a Wiki site in order to facilitate collaboration on acceptance test specification. FitNesse also has some handy tools for managing large acceptance test suites.

There are three basic categories of fixture tables:

to set up initial pre-conditions
to exercise some system functionality
to verify post-conditions

The fixture table names are automatically mapped onto underlying programming language class names. The main versions of FIT, FitLibrary and FitNesse are developed in Java and then ported to other programming languages: C#, Python, Ruby and even C++. I’ve found the C++ version not to be user-friendly, so for Integrated Acceptance Tests of C/C++ modules we use a special integration between the bmock library and Java (the so-called bmock console mode).

Acceptance Tests specified using FIT or one of its flavors are extremely powerful for dealing with large permutations of input values: parental rating string formatting, error message prioritization, lengthy event sequences and complex state machines.
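To make the table idea concrete, here is a minimal sketch in plain Python. It does not use the FIT API; it only mimics the shape of a fixture table as a list of rows. The function `format_parental_rating`, its labels, and the table contents are hypothetical and chosen purely for illustration.

```python
# Hypothetical system under test: turns a minimum viewer age into a rating label.
def format_parental_rating(min_age: int) -> str:
    if min_age < 0:
        raise ValueError("age must be non-negative")
    if min_age == 0:
        return "All ages"
    return f"{min_age}+"


# Plain-Python stand-in for a fixture table: each row is (input, expected output).
RATING_TABLE = [
    (0, "All ages"),
    (7, "7+"),
    (12, "12+"),
    (18, "18+"),
]


def run_table(table):
    """Exercise the system for every row and collect the rows that mismatch."""
    failures = []
    for min_age, expected in table:
        actual = format_parental_rating(min_age)
        if actual != expected:
            failures.append((min_age, expected, actual))
    return failures
```

Adding another permutation is a matter of adding a row, which is what makes the tabular form attractive to domain experts who do not write code.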

A critical trait of Acceptance Tests is that they are supposed to be run using only the core system under test, without involving any elements of the real environment: GUI, databases, file system, heavy communication protocols (e.g. FTP). The reason for this is that we want our acceptance tests to be fully controlled and to run very quickly (we will have a lot of them). Dealing with the real environment (e.g. real STB) will typically slow down the process significantly, and make it more complex. Using Acceptance Tests leads to a better, more modular design.
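One common way to keep the real environment out of the picture is to let the core logic depend on a narrow interface and supply an in-memory implementation during tests. The sketch below assumes a hypothetical `EventSource` interface and a `current_event` function; neither comes from any real EPG code base.

```python
from typing import Optional, Protocol


class EventSource(Protocol):
    """Hypothetical boundary between the core logic and the real environment."""

    def events_for(self, channel: str) -> list[tuple[int, str]]:
        """Return (start_time, title) pairs for a channel."""
        ...


class InMemoryEventSource:
    """Test stand-in: no database, no file system, no network."""

    def __init__(self, events: dict[str, list[tuple[int, str]]]):
        self._events = events

    def events_for(self, channel: str) -> list[tuple[int, str]]:
        return list(self._events.get(channel, []))


def current_event(source: EventSource, channel: str, now: int) -> Optional[str]:
    """Core logic: title of the latest event that started at or before `now`."""
    started = [e for e in source.events_for(channel) if e[0] <= now]
    if not started:
        return None
    return max(started, key=lambda e: e[0])[1]
```

Because the test controls every input, a run takes microseconds and never depends on the state of a real set-top box.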

Acceptance Tests do not guarantee a high percentage of code coverage. The reason for this is that test complexity grows exponentially, and any attempt to cover all possible edge cases of all possible scenarios would lead to an unmanageably large test suite. Achieving 100% coverage of lines of code is the goal of unit testing.

Acceptance Tests do ensure proper functionality of the system, but do not guarantee its proper structure and long-term maintainability. One has to combine both Integrated and Unit tests in order to achieve the required level of code quality.

Other Types of Integrated Tests

Endurance Tests are intended to validate that the system under test will work for a certain number of hours without crashing. More specifically, these tests validate that there are no resource leaks (memory, file handles, sockets, etc.) in the system.
Stress Tests are intended to validate the system’s throughput and latency for a certain workload (number of concurrent users). Formally speaking, the system latency is a function of the system throughput and the length of the internal queue. For single-user systems like the EPG, stress test goals typically need to be defined more specifically.
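The relationship between latency, throughput and queue length mentioned above is usually expressed by Little's law: in a steady state, the average number of items in the system equals the throughput multiplied by the average time an item spends in the system. The sketch below assumes steady state and uses illustrative numbers; the function name is hypothetical.

```python
def average_latency(items_in_system: float, throughput_per_sec: float) -> float:
    """Little's law rearranged: W = L / lambda.

    items_in_system    -- average number of requests queued or in service (L)
    throughput_per_sec -- average completion rate in requests per second (lambda)
    Returns the average time in seconds a request spends in the system (W).
    """
    if throughput_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return items_in_system / throughput_per_sec
```

For example, with 50 requests in the system on average and a throughput of 200 requests per second, `average_latency(50, 200)` gives 0.25 seconds. A stress test can measure any two of the three quantities and cross-check the third.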

Both types of Integrated Tests are usually run in an environment that is as close to the real one as possible. In order to cope with test scenario complexity, we have to keep the variability of this kind of integrated test within a limited scope - only a small number of selected and fully specified test scenarios.

Unit Tests - 100% Line Coverage

Unit tests provide a test for each branch of each method (or function) of each class (or module). In order to avoid an exponential blow-up of test complexity, the system under test should be properly modularized.

Within the unit tests of a particular class or module, all the classes or modules it depends on are replaced by mocks. There are unit testing and mock object frameworks for each popular programming language: JUnit and EasyMock (or JMock) for Java, NUnit and NMock for C#, Boost.Test and bmock (developed by this author) for C/C++, etc.
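As an illustration of replacing a dependency with a mock, here is a small sketch using Python's standard `unittest.mock` module rather than any of the frameworks listed above. The `Recorder` class and its tuner dependency are hypothetical.

```python
from unittest.mock import Mock


class Recorder:
    """Hypothetical class under test; it depends on a tuner object."""

    def __init__(self, tuner):
        self._tuner = tuner

    def record(self, channel: int) -> str:
        # The tuner is expected to return True when tuning succeeds.
        if not self._tuner.tune(channel):
            return "failed"
        return "recording"


def test_record_succeeds_when_tuner_locks():
    tuner = Mock()
    tuner.tune.return_value = True  # scripted behaviour of the dependency
    assert Recorder(tuner).record(5) == "recording"
    tuner.tune.assert_called_once_with(5)  # verify the interaction


def test_record_fails_when_tuner_cannot_lock():
    tuner = Mock()
    tuner.tune.return_value = False
    assert Recorder(tuner).record(5) == "failed"
```

Neither test needs real tuner hardware, and each one exercises exactly one branch of `record`.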

The goal of unit testing is to achieve 100% line coverage. Only then can one be confident that edge cases will not produce unpleasant surprises in the real environment. This is usually possible only through proper modularization of the system and the use of mock objects: there are types of edge cases which are virtually irreproducible in the real environment.

For some rare low-level edge cases (e.g., no memory), specific requirements do not exist. As long as the system behaves reasonably well and does not crash, any edge case handling mechanism would be acceptable. Leaving such cases to unit tests could reduce the size of the Acceptance Test suite substantially.
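An out-of-memory condition is a typical example of an edge case that is hard to trigger on demand in a real environment but trivial to simulate with a mock. The sketch below uses Python's `unittest.mock`; the `load_thumbnail` function and the decoder object are hypothetical, and "return `None`" merely stands in for whatever reasonable behaviour the team agrees on.

```python
from unittest.mock import Mock


def load_thumbnail(decoder, data: bytes):
    """Hypothetical function under test: decode an image, degrade gracefully."""
    try:
        return decoder.decode(data)
    except MemoryError:
        # No specific requirement exists; just avoid crashing.
        return None


def test_out_of_memory_is_survived():
    decoder = Mock()
    decoder.decode.side_effect = MemoryError()  # simulate allocation failure
    assert load_thumbnail(decoder, b"raw-bytes") is None


def test_normal_decoding_passes_result_through():
    decoder = Mock()
    decoder.decode.return_value = "image"
    assert load_thumbnail(decoder, b"raw-bytes") == "image"
```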

Unit testing comes to its full advantage only when it’s combined with test coverage measurement. There is a big difference between 20% line coverage and 100%. Achieving the former is relatively easy once one adopts the unit testing approach in principle. It’s not even too hard to get to 85% line coverage. However, in order to get to 100%, one needs very high-quality modular code.

If it’s not 100%, you will never know whether the uncovered code is something completely unimportant or a missed edge case. The only way out is to develop a habit of maintaining 100% line coverage at all times.

We want our unit test suite to pass at least once through every line of source code. This is not the same as 100% branch coverage where all possible branches are executed under all possible conditions. If code is developed following the Test-Driven Development approach, the 100% branch coverage would be achieved automatically as a side effect. However, when talking about unit testing of legacy code, 100% branch coverage might be an impractical goal to strive towards.
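The difference between line coverage and branch coverage shows up even in a tiny function. In the hypothetical sketch below, the first test alone executes every line of `clamp_volume`, yet the path where the condition is false is never taken; only the second test covers that branch.

```python
def clamp_volume(level: int) -> int:
    """Hypothetical function: limit a volume level to at most 100."""
    if level > 100:
        level = 100
    return level


def test_clamps_high_values():
    # Executes all three statements: 100% line coverage on its own.
    assert clamp_volume(150) == 100


def test_leaves_normal_values_untouched():
    # Needed only for branch coverage: the `if` condition is false here.
    assert clamp_volume(40) == 40
```

A coverage tool run in line mode would report the first test as sufficient, which is why line coverage is the easier of the two targets to reach.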

Exploratory Manual Tests

The Agile Testing philosophy does not preclude manual tests. Actually, the opposite is true: exploratory manual tests are treated as an essential part of the Agile Testing portfolio. The emphasis here is on the word exploratory. Our automatic tests are only as good as our knowledge of the system. If we missed something in the test specification, it will help very little that the target software passes all the tests. The only way to address this risk is to play with the system, trying to break it in some unusual, hard-to-predict way (today I would say, by wearing the hat of a naïve user). Some Agile teams engage the whole team in an exploratory test session at the end of each iteration; some teams delegate this task to domain experts, while others combine both techniques. For non-UI products, a specially tailored exploratory testing environment would be required. For example, exploratory tests of non-UI modules developed in C/C++ could be performed using the bmock library console mode.

If a discrepancy between desired system behavior and Acceptance Tests specification is discovered, this should preferably be reflected in a change request rather than in a bug report. Stakeholders too often change their mind once they get an opportunity to play with a real, even only partially functional, system. Treating all these changes of mood as bugs could easily create a false impression of the product quality and would hurt the team's motivation.

Agile Testing Summary: The Key Points

Agile Testing is about bug prevention rather than bug detection.
All regression tests must be automated. Otherwise, development will eventually slow to a crawl, and maintaining multiple branches in the version control system will be inevitable.
Acceptance Tests specify requirements for each User Story in a form suitable for automatic execution. When the system passes its Acceptance Tests suite this means that its functionality satisfies all existing requirements. Acceptance Tests do not guarantee either proper handling of all possible edge cases, or proper maintainable code structure.
Unit Test suites should provide a test for each branch of every method of every class, which leads to 100% line coverage. 100% line coverage guarantees reasonable handling of all possible edge cases, and a maintainable and highly modular code structure. To avoid exponential growth of unit test complexity, mock objects are usually required.
Exploratory manual tests are performed at the end of each iteration by the whole team and/or domain experts in order to check whether something is missing in the formal specification of Acceptance Tests. Usually, people engaged in Exploratory Tests try to break the system in unusual and hard-to-predict ways. Avoid treating discrepancies between desired system behavior and the Acceptance Tests specification as bugs; convert them into change requests instead.
In order to achieve a proper level of software code quality one has to combine all types of tests: Acceptance, Unit, and Exploratory. Additional types of Integrated Tests (e.g. Endurance, Stress) are added to the automatic regression test suite where and when appropriate.

Monday, August 24, 2009
