Monday, August 24, 2009

Choose Your Process: Waterfall, RUP, Agile

“… 45% of features implemented are never used, and 19% are rarely used …”
From “Making Agile Mainstream” by Per Kroll

Introduction

Delivering a technically perfect software product that nobody needs is probably the major risk of the software business nowadays. Delivering a faulty system is no better. To complicate matters, the window of opportunity keeps getting smaller: if we are unable to deliver a commercially viable version in about nine months, it might be too late, and it would be better not to start at all.

I do not know where this magic number of nine months comes from. Maybe from stock market cycles, but that is my speculation. Still, more and more people firmly believe it: deliver in less than one year; otherwise the risk of being late to the party could be too high.

Therefore there are three major risks: to deliver a faulty system, to deliver the wrong system, or to deliver the right, high-quality system, but too late. Sounds scary, but that’s life in high tech.

In order to survive in the software business we need to face and properly manage these three major risks. A collective set of practices and methods that helps to manage these risks in a cost-effective way is usually called a Software Development Process.

In this paper I will briefly touch on four of the most popular: Waterfall, Rational Unified Process (RUP), SCRUM, and Extreme Programming (XP). Describing even one process in full detail would take a whole book. The main question I’m going to address is how these methods could help us to manage the risks better.

Waterfall

The so-called Waterfall process prescribes seven successive development phases, namely: System Requirements, Software Requirements, Analysis, Program Design, Coding, Testing, and Operation.

The Waterfall process is superior in preventing costly faults by careful planning in advance. It works tremendously well provided that there is a clear understanding of what the system is supposed to do, there is a solid scientific background for the core system logic and there is enough time to perform all preparatory work.

This approach worked pretty well in the 70’s for building the first computer-based automation systems. Thinking hard in advance does help, indeed, provided you know what to think about. The Waterfall approach is crucial for systems where preventing failure in advance is the number one priority, since any mistake would be too costly to discover and fix later. Some products, such as integrated circuit chips, do require all seven phases, accompanied by careful, even pedantic, reviews. There might simply be no other option. Development of hardware drivers, low layers of operating systems, and even compilers could, and actually should, follow the Waterfall prescriptions.

The Waterfall process, however, fails miserably when the software system requirements are not known in advance in enough detail. Waterfall also starts cracking under time pressure. As a result, most contemporary business application software cannot be developed following the Waterfall process, since we would most likely get a technically perfect but wrong system, or the right system, but too late.

Iterative-Incremental Software Development

The main insight comes from the observation that Waterfall could work pretty well for a relatively short period of time, say up to three months. After that the uncertainty is too high. This is a fundamental part of the whole game: many stakeholders might think they know precisely what they want, but when they get the system in their hands they might change their opinion, sometimes even dramatically. Contemporary software development is a so-called open system with feedback. In other words, what we deliver will have an impact on what we are supposed to do.

One possible way to address the risk of being wrong in the large is to accept the risk of being wrong in the small and to correct when necessary. In other words, we have to break the whole period of, say, nine months or more into smaller chunks, at the end of which a part of the system functionality can be exercised and feedback is provided.

We will call these chunks of time iterations. At the end of each iteration we are supposed to have a stable, partially functioning system that does something useful. Unless the whole iteration was devoted to bug fixing, each iteration will add a new piece of functionality, an increment; hence the name of this approach: Iterative-Incremental.

Having a partially functioning system would allow us not only to demonstrate but also to test it. This in turn minimizes the risk of getting unpleasant technical surprises too late. In addition it helps to address the risk of being too late. If the window of opportunity is going to close we might decide to deliver a partially functioning system following the “something is better than nothing” wisdom.

This is a general idea. How exactly iterations are used depends on the particular process.

Rational Unified Process (RUP)

RUP is defined as a framework for Use Case Driven, Architecture Centric and Iterative-Incremental software development processes.

The Iterative-Incremental part of the process helps to address the risk of being wrong and/or too late.

The Architecture Centric part of the process helps to mitigate the risk of a faulty system by addressing technical non-functional risks early: performance, scalability, extensibility, consistency, reliability, etc.

The Use Case Driven part of the process, again, helps to address the risk of being wrong by using relatively small chunks of functional requirements, called use cases, to steer technical decisions.

RUP Iterations

In RUP, iterations play different roles. The first couple of iterations are used for requirements gathering and for making the initial go/no-go decision. The next couple of iterations are used for making long-term technical decisions about the system architecture and building an initial architectural prototype. The next, usually the largest, group of iterations is devoted to building the actual product. During the last couple of iterations the built product is passed to prospective users/customers in the form of an alpha or beta site.

These groups of iterations in RUP are called phases: Inception, Elaboration, Construction and Transition respectively. RUP assumes that the intensity of various technical activities, such as requirements, design, and coding, in different phases will be different.

Unified Requirements Management

RUP applies use cases as a tool for organizing functional requirements. A use case is a set of possible sequences of interactions between the software system and one or more external actors. Every use case is initiated by some actor (human or, sometimes, non-human) that has a particular goal to achieve. The use case stops when this goal has been achieved or an error has been reported. There is a certain danger of misusing use cases for functional decomposition of the system, which leads to too fine a granularity too early. The easiest way to understand use cases is to treat them as top-level chapter titles of the system user manual.

The tricky part of software requirements management is to realize that it is in fact a craft of decision making under constant pressure, balancing between multiple stakeholders who hold different degrees of power. RUP, more specifically its Unified Requirements Management (URM) methodology, acknowledges this fact by putting software requirements into context.

Whenever some stakeholder presents a certain request towards the software system we are going to build, this is a requirement, but this is a requirement of a very special type: it is a stakeholder need.

When somebody from the project management group decides which capability the system will have or which service it will provide, that is also a requirement, but of an entirely different nature: it is a system feature.

When we make a decision about how external actors will communicate with the system we come up with use cases. Many times the basic set of use cases is prescribed: Subscriber Management, Broadcast Program Scheduling, TV Program Watching and Content Delivery use cases are pre-defined by the very nature of our system.

Mapping stakeholder needs onto features and use cases, back and forth, is a subtle and delicate process that requires good analytical and communication skills.

Usually features manifest themselves in multiple use cases, and here the most interesting part of requirements management starts: within every use case, every feature will potentially lead to one or more alternative event flows. The ability to analyze and specify these alternative flows properly and in a timely manner is the key project success factor.
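To make the feature/use case relationship a bit more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the use case names come from the list above, while the features ("Parental Control", "Pay-Per-View") and the flow names are hypothetical and not taken from any real requirements set.

```python
from collections import defaultdict

# Hypothetical traceability table: (use case, feature) -> alternative flows
# that the feature introduces into that use case.
ALTERNATIVE_FLOWS = {
    ("TV Program Watching", "Parental Control"): ["PIN requested", "PIN rejected"],
    ("TV Program Watching", "Pay-Per-View"): ["Purchase offered"],
    ("Subscriber Management", "Parental Control"): ["PIN reset"],
}


def flows_by_use_case(table):
    """Group alternative flows by use case, remembering the originating feature."""
    grouped = defaultdict(list)
    for (use_case, feature), flows in table.items():
        for flow in flows:
            grouped[use_case].append((feature, flow))
    return dict(grouped)


def use_cases_touched_by(table, feature):
    """List the use cases in which a given feature shows up."""
    return sorted({use_case for (use_case, f) in table if f == feature})


if __name__ == "__main__":
    for use_case, flows in flows_by_use_case(ALTERNATIVE_FLOWS).items():
        print(use_case, flows)
    print(use_cases_touched_by(ALTERNATIVE_FLOWS, "Parental Control"))
```

Even a toy table like this shows why the analysis is laborious: one feature cuts across several use cases, and each crossing adds flows that somebody has to specify and test.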

Features constitute a set of variability points in our software system. Reflecting these variability points properly in code would be a subject of software architecture and configuration management.

RUP Architecture

RUP recommends recording all fundamental technical decisions about the software system structure in a separate document called the Software Design Specification. This document is normally prepared during the Elaboration phase. The main claim is that without properly laid foundations at the early stages, the whole system would be too shaky and inconsistent, and would most likely crack under the pressure of one or more non-functional requirements: performance, reliability, etc. As Grady Booch, one of the UML creators, likes to say, building a skyscraper by dog house methods is a guaranteed way to disaster.

In RUP the Software Design Specification document has a special structure reflecting the fundamental fact that for any non-trivial software system, more than one view is required in order to describe its essential characteristics in enough detail. This approach in RUP is called the “4+1 View of Software Architecture”, which suggests describing a software system using Logical (structure, functionality), Implementation (build process and configuration management), Process (performance, scalability), Deployment (high availability, communication), and Use Case (external actors’ perspective) views.

In many systems the Logical view plays the most critical role in managing complexity. One of the reasons that we are so often too late, or develop the wrong systems, is that we are prone to creating monsters that we too quickly lose control over. The reason for this is that almost any contemporary software system has a large number of variability points: features. These variability points typically belong to different levels of abstraction: supporting a certain graphics chip set is at a different level of abstraction than supporting a particular pay-per-X business model. Mix all these features into one monolithic block and you will never be the master of your code. The code structure will dictate what is possible and what is not. This is the reason why so many software systems are over-featured (remember the 45% cited at the beginning?). Software developers just cannot take out a particular feature without destroying the whole system. Due to improper code structure this obsolete feature is interlocked in a single monolithic structure with all the others. Some of these other features are mission critical, so it’s too risky to make any change. It’s scary to think to what extent technical flaws in software could shape the whole business.

Matters are much worse when the logical structure is intentionally messed up for job security reasons. Have you ever heard the claim: “here everything is connected to everything, so only we could develop such and such feature”? It happens here and now, sometimes unconsciously, sometimes not. It’s not completely inconceivable to have an architecture where business rules are implemented at the disk driver level. This is not only poor engineering; first of all it is bad business.

Using the Logical view, one can properly reflect the major variability points by structuring the system in terms of layers (levels of abstraction) and subsystems (loosely coupled functional blocks).
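As a rough illustration of what “loosely coupled” buys us, here is a small Python sketch. Everything in it is hypothetical (the class name, the pricing functions, the prices in cents); it only assumes that each business-model feature lives behind a narrow interface in its own layer, so that an obsolete feature can be taken out without touching the others.

```python
from typing import Callable, Dict


# Business layer: each pricing model is its own loosely coupled function.
# Prices are in cents and are made up for illustration.
def pay_per_view(minutes_watched: int) -> int:
    return 250  # flat price per program


def pay_per_minute(minutes_watched: int) -> int:
    return 5 * minutes_watched


class BillingSubsystem:
    """Knows that pricing models exist, but not how any of them works."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[int], int]] = {}

    def register(self, name: str, model: Callable[[int], int]) -> None:
        self._models[name] = model

    def unregister(self, name: str) -> None:
        # Removing a feature touches one entry and nothing else.
        self._models.pop(name, None)

    def charge(self, name: str, minutes_watched: int) -> int:
        if name not in self._models:
            raise KeyError(f"pricing model not available: {name}")
        return self._models[name](minutes_watched)


if __name__ == "__main__":
    billing = BillingSubsystem()
    billing.register("pay_per_view", pay_per_view)
    billing.register("pay_per_minute", pay_per_minute)
    print(billing.charge("pay_per_minute", 100))
    billing.unregister("pay_per_view")  # the remaining feature keeps working
    print(billing.charge("pay_per_minute", 10))
```

The point is not the registry itself but the direction of dependencies: the subsystem depends on a small contract, not on the internals of each feature.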

Unified Configuration Management (UCM)

This is another thing to be taken from RUP. It is not about company-wide adoption of ClearCase (another controversial topic deserving a separate paper). It’s about a particular style of work.

UCM is first and foremost about protecting your investment. Iterative/Incremental (especially use case-driven) development by definition assumes changes in code. Even though you lay out your architecture in advance, you will change at least some of your code in order to implement new features. This cannot be avoided if you want to deliver “the right system”.

The code changing process must be fearless. Fear hurts our judgment and prevents us from analyzing trade-offs properly. The only way not to be scared of changes is to know you can always roll back your recent changes and start over. This guarantees you will never do any serious damage to what has already been done.

The UCM development and integration streams are exactly for this. For every small activity (task, in agile terminology) you check in to your development stream. You can always discard any recent changes and start over again from the latest version of your development stream. Once the task is done, check in, but do not deliver yet.

Typically several tasks are grouped together in order to implement what in agile practices is called a user story. If you check in all changes directly to the main trunk of your version control system, it might create too much disturbance for other developers. For that reason, keep your development stream for as long as you are working on your part of the user story. If things go wrong you will be able to discard all changes, rebase the development stream and start over. When the story is ready, deliver your development stream to the integration stream.

The UCM practices not only help us develop the right system by properly supporting code changes. They also help with delivering it on time by establishing an effective insurance policy against wrong changes.
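The check-in / discard / deliver cycle described above can be pictured with a toy Python model. This is deliberately not the ClearCase API or any real tool's interface; the class and method names are my own invention, and the streams are just lists of change descriptions.

```python
class DevelopmentStream:
    """Toy model of a private development stream on top of a shared integration stream."""

    def __init__(self, integration_stream):
        self.integration_stream = integration_stream  # shared list of delivered changes
        self.checked_in = []   # finished tasks, visible only to this developer
        self.in_progress = []  # edits for the task currently being worked on

    def edit(self, change):
        self.in_progress.append(change)

    def check_in(self):
        """Task done: record it privately, but do not deliver yet."""
        self.checked_in.extend(self.in_progress)
        self.in_progress = []

    def discard_task(self):
        """Throw away the current task and fall back to the last check-in."""
        self.in_progress = []

    def discard_story(self):
        """Throw away everything that has not been delivered yet."""
        self.in_progress = []
        self.checked_in = []

    def deliver(self):
        """Story ready: publish all checked-in changes to the integration stream."""
        self.integration_stream.extend(self.checked_in)
        self.checked_in = []


if __name__ == "__main__":
    integration = []
    stream = DevelopmentStream(integration)
    stream.edit("change GUI")
    stream.check_in()
    stream.edit("risky experiment")
    stream.discard_task()       # no harm done, other developers never saw it
    stream.edit("change business logic")
    stream.check_in()
    stream.deliver()
    print(integration)
```

The insurance policy is visible in the model: nothing reaches the shared stream until `deliver()` is called, so a wrong change costs at most one task or one story.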

AS: this is my relatively old article. Today I’m not sure about separating integration and development streams since in practice they do more harm than good. Today I would strongly recommend working on trunk only (perhaps a topic for another blog).

SCRUM

RUP is a great software development process framework. This framework, however, is currently specified in about 3000 pages. Software developers do not have time and are not patient enough to read even 10 pages. How could one expect them to read 3000? And that’s not all, after you have read these 3000 pages you will need to customize the process to your organization, or, even worse, to each project independently.

In addition RUP, like Waterfall, follows too mechanical a view of the world, which is supposed to adhere to the Aristotelian “excluded third” principle. Things are too black-and-white, too hierarchical here. In reality things are usually gray: successful software organizations seldom follow any kind of prescribed hierarchical structure. Apparently a mechanical, clockwork-like treatment of the software organization is limited. It typically stops working when requirements are too blurry and/or time frames are too short. Applying an organic, biological analogy would be more productive.

The first thing is to appreciate interaction: not only between the software organization and its customers (we addressed this by endorsing Iterative/Incremental development), but also between the members of the software development team.

Developing contemporary software is an extremely intellectually demanding activity that requires the best brains in town to make it successful. This poses a tough dilemma for managers: if you really can manage these guys, you have to fire them, since they are not smart enough; if they really are smart enough, how can you hope to manage them?

SCRUM, one of the most popular agile software development methods, rises to the challenge: it describes clear roles and responsibilities of team members while explicitly admitting the unpredictable nature of the software development process in general. Like many other agile methods, SCRUM faces the three major risks simultaneously: we want to deliver the right system, at the right time, with the right quality.

The SCRUM way is to establish a team of people with the appropriate skills and let them find the best way to accomplish this goal. The whole process is built on top of very close, informal communication between team members and customers. This allows maximum flexibility and keeps all the critical parameters (functionality, quality and time to market) in proper balance.

This close communication is possible only if the team is of the right size, usually 5-10 people. For large systems a certain amount of coordination between SCRUM teams would be necessary, and here the RUP recommendations would be indispensable.

All team members in the SCRUM team are equal: there might not even be a clear separation between developers and QC. Some team members have special responsibilities. The first one is the Product Owner (who sometimes may not be a part of the team - AS). This is the person who, in URM terminology, converts stakeholder needs into features, maps them onto use cases and presents them to developers in the form of backlog records: things that need to be done.

The Product Owner takes responsibility for ensuring that the team delivers as many useful features as possible in the right priority order: the most business value first. If the customer changes her mind, the Product Owner will defend her and will ensure the product plan is adapted accordingly.

Having only a Product Owner on the team would create an imbalance: too many, too sharp changes could easily distract and discourage the whole team. Somebody needs to play a balancing role and ensure that what has been agreed upon will be done, and that what is done is indeed done and not just pretending to be. It’s not the role of a manager, but of a shepherd who protects the flock against wolves. The team members are engaged in very intellectually demanding tasks and, however smart they are, they can be vulnerable to external distracting factors, including their own Product Owner. The team needs protection and steering. In SCRUM this role is fulfilled by the SCRUM Master, whose responsibility is to remove any obstacles that impede the team from making progress. The SCRUM Master makes sure the process is working as agreed upon. If that sometimes requires saying “no” even to the team’s own Product Owner, (s)he will do it.

In SCRUM the team, the Product Owner, and the SCRUM Master together play a game in which everybody has a special role. As in any game, SCRUM needs certain rules and rituals, which come in the form of the Sprint (the SCRUM version of an iteration), daily SCRUM meetings, the planning meeting, the demo and the retrospective.

In SCRUM all iterations (Sprints) are equal; there are no phases. Every Sprint is supposed to be exactly 30 working days. This timeframe is considered to be necessary and sufficient for the team to self-organize in order to deliver a meaningful increment. Once the Sprint plan (Sprint backlog) is approved it cannot be changed.

From the risk management perspective it means that in the worst case we will need to throw away 30 days’ worth of the team’s work if we were completely wrong. In practice this seldom happens. It also means that the Product Owner will need to wait 45 days on average to get the newest, hottest feature (s)he has decided to ask for now: on average half of the current Sprint, whose backlog is frozen, plus the whole of the next one. This is quite a long time to wait and not everybody is comfortable with it. Some people argue that this simply requires the Product Owner to do her homework better. Others argue that rapidly changing priorities are the nature of the software business and that the iteration length should be shortened to two weeks or, as Extreme Programmers argue, ideally to one week.

We will leave this debate to the experts. As a matter of fact, stable, back-to-back 30-day iterations for mainstream products are yet to be seen in our organization.

The whole SCRUM process is based on self-organizing team dynamics rather than on any particular programming technique. The team members are supposed to be smart and professional enough to find the best way to get their job done. Test automation is probably the only, though a very important, exception to this rule. SCRUM makes it very clear: whatever your completion criteria are, you must automate the verification process. If tests are not automated, you won’t be able to run them as frequently as needed, and will not be able to ensure that “what is done is done” by the end of the Sprint. SCRUM does not, however, say how to automate the tests. That, again, is fully delegated to the team.
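The 45-day figure is simple arithmetic, and it may help to see how it scales with Sprint length. The sketch below is a back-of-the-envelope model under my own assumptions: requests arrive at a uniformly random moment within the current Sprint, the current Sprint backlog is frozen, and the requested feature gets top priority in the very next Sprint. The function names are hypothetical, and I take two weeks and one week to mean 10 and 5 working days.

```python
def average_wait_days(sprint_length_days: float) -> float:
    """Expected days from asking for a feature until it is delivered."""
    # On average the request lands in the middle of the current Sprint.
    remaining_in_current_sprint = sprint_length_days / 2
    # The feature is then built during the whole next Sprint.
    return remaining_in_current_sprint + sprint_length_days


def worst_case_wait_days(sprint_length_days: float) -> float:
    """Request arrives right after the current Sprint backlog is frozen."""
    return 2 * sprint_length_days


if __name__ == "__main__":
    for length in (30, 10, 5):  # 30 working days, two weeks, one week
        print(length, average_wait_days(length), worst_case_wait_days(length))
```

Under these assumptions the average wait is always one and a half Sprints, which is why shortening the iteration is the only lever the process itself offers.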

For more information about the SCRUM process look here or here.

Extreme Programming (XP)

From the development process organization point of view XP is quite similar to SCRUM, except that it strongly advocates weekly iterations as a better tool for managing the “deliver the right system” risk. For those for whom it works, it might be wonderful advice. While SCRUM concentrates on what is possible, XP looks for what is optimal. Each approach has its own merits, and this is not the place to decide which is better. In its constant pursuit of the best, XP came up with a bunch of engineering practices, which one should consider as a way of implementing any Iterative/Incremental development process, including RUP and SCRUM. A good overview of XP practices could be found here and here. Here I will briefly touch only on those that are directly connected to one possible way of implementing SCRUM: units of planning (user stories), units of work (tasks), test automation, code refactoring and continuous integration.

User Stories and Tasks

SCRUM does not prescribe specifically what the nature of backlog records is. In practice many have adopted the XP concept of user story as a primary unit of planning.

XP firmly believes that anything that does not demonstrate a direct value for the customer should not be done. In this regard XP is indeed extreme. Want to decide on your database engine or to establish some performance benchmark? Find a simple scenario that makes sense from the business perspective (the Product Owner decides) and that requires these technical capabilities. It sounds a bit restrictive, but the whole point of this philosophy is to eliminate waste. In the XP philosophy any extra line of code is a liability, not an asset, unless proven otherwise from the perspective of customer value.

A user story is a sequence of interactions between an external actor and the system that provides the actor with some value. The most important thing about user stories is that it should be possible to implement each story in a reasonably short period of time, say 2-3 days. The definition of user stories sounds similar to that of use cases, but user stories are NOT use cases. The latter are groups of scenarios. The former are just manageable chunks of work. In some cases a user story could span several use cases, but usually each user story would be just one possible scenario within some use case, exhibiting a certain feature. In general user stories are much less formal than use cases. They are mostly intended to facilitate communication among stakeholders.

For each approved user story the team needs to break it into elementary units of work: tasks. Many times tasks are architecture driven: change GUI, change business logic, change database, change infrastructure, change development environment (if required), and change user documentation (if required).

The team estimates user stories based on complexity, using so-called story points. All that is required at that moment is to assess each story against the others: this story is more complex, that story is simpler, etc. Over time the team learns its velocity: the average number of story points the team is able to deliver per iteration. Knowing the team velocity makes the estimation process more accurate.

Tasks are usually estimated in ideal hours or days: how much time it will take to implement the task provided there are no interruptions.
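Velocity as described here is just an average, and a forecast is a division rounded up. A minimal Python sketch follows; the function names and the sample numbers are hypothetical, and real teams often prefer a rolling average over the last few iterations rather than the full history.

```python
import math


def velocity(points_per_iteration):
    """Average number of story points delivered per completed iteration."""
    if not points_per_iteration:
        raise ValueError("need at least one completed iteration")
    return sum(points_per_iteration) / len(points_per_iteration)


def iterations_needed(backlog_points, team_velocity):
    """Rough forecast: how many iterations the remaining backlog will take."""
    return math.ceil(backlog_points / team_velocity)


if __name__ == "__main__":
    history = [18, 22, 20]  # story points delivered in the last three iterations
    v = velocity(history)
    print(v, iterations_needed(90, v))
```

Note that story points only become meaningful through this feedback loop: the relative sizes come first, and the calibration comes from observed delivery.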

Test Automation

Initially user stories are collected using simple sentences like “As a <role> I may do XYZ …” These are merely reminders to get into a deep discussion about what it really means to be able to do this XYZ. However, when the time comes to develop the user story we need something more specific: we need user story acceptance criteria. The acceptance criteria are specified in the form of an acceptance test suite, which can be run automatically. If the system passes all these tests, it unequivocally means that the system implements the user story. In this case the user story is considered done.

Acceptance tests are fully automated. This ensures that the implementation of any new user story does not break any already implemented user story. Like UCM, this is yet another important insurance policy mechanism: it prevents falling back while moving forward (this happens too often in software to be ignored).

Acceptance tests are not the only type of tests that are automated. In reality software quality is established at the micro level: individual functions. It is virtually impossible to cover all possible permutations at the acceptance test level. In order to ensure this atomic level of quality we need another type of tests: unit tests. I described the agile testing approach in more detail elsewhere in a separate article.
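To show what “acceptance criteria as an automated test suite” might look like, here is a small sketch using Python's standard unittest module. The user story (“As a subscriber I may purchase a program and then watch it”) and the `Storefront` class are hypothetical stand-ins that I made up for illustration; in a real project the tests would drive the actual system through its external interface.

```python
import unittest


class Storefront:
    """Hypothetical stand-in for the system under test."""

    def __init__(self):
        self._purchases = {}

    def purchase(self, subscriber, program):
        self._purchases.setdefault(subscriber, set()).add(program)

    def can_watch(self, subscriber, program):
        return program in self._purchases.get(subscriber, set())


class PurchaseProgramAcceptanceTest(unittest.TestCase):
    """Acceptance criteria for the story, written as executable checks."""

    def setUp(self):
        self.store = Storefront()

    def test_purchased_program_becomes_watchable(self):
        self.store.purchase("alice", "movie-42")
        self.assertTrue(self.store.can_watch("alice", "movie-42"))

    def test_unpurchased_program_is_not_watchable(self):
        self.store.purchase("alice", "movie-42")
        self.assertFalse(self.store.can_watch("bob", "movie-42"))


def run_acceptance_suite():
    """Run the suite and report whether the story can be considered done."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PurchaseProgramAcceptanceTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("story done:", run_acceptance_suite())
```

Once such a suite exists, it is rerun after every later story, which is exactly the regression safety net the text calls an insurance policy.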

Code Refactoring

Code changes are inevitable. In fact code changes happen all the time regardless of whether one is using Waterfall or XP. The only question is how the code is changed. In XP constant code change is the norm of life. If code changes are inevitable, it is more practical to spread many small code changes across a relatively long period of time rather than to try to perform one big change all at once. The latter is too risky from the quality and time-to-market perspectives.

Refactoring is a technique for improving the code structure without changing functionality. Refactoring is performed in order to clean up a mess accumulated over time or in order to accommodate new functionality more easily. Refactoring is about risk management. If we do not refactor our code, then in the long run our ability to properly manage the trade-offs between getting the right system at the right time with the right quality will deteriorate: we just won’t be able to keep all three characteristics close to the optimum.
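A tiny before/after pair may make “improving structure without changing functionality” concrete. The example is entirely hypothetical (the order types, prices in cents and the discount rule are invented); what matters is that both versions return the same result for the same input, which is what the automated tests are there to confirm.

```python
# Before: the pricing and discount rules are duplicated in every branch.
def total_before(orders):
    total = 0
    for order in orders:
        if order["type"] == "movie":
            price = order["qty"] * 300
            if order["qty"] >= 3:
                price = price - price // 10
            total += price
        elif order["type"] == "series":
            price = order["qty"] * 500
            if order["qty"] >= 3:
                price = price - price // 10
            total += price
    return total


# After: same behavior, but each rule lives in exactly one place.
UNIT_PRICE_CENTS = {"movie": 300, "series": 500}


def line_price(order):
    price = order["qty"] * UNIT_PRICE_CENTS[order["type"]]
    if order["qty"] >= 3:
        price -= price // 10  # 10% volume discount
    return price


def total_after(orders):
    # Unknown order types are ignored, exactly as in the original version.
    return sum(line_price(o) for o in orders if o["type"] in UNIT_PRICE_CENTS)


if __name__ == "__main__":
    sample = [
        {"type": "movie", "qty": 1},
        {"type": "series", "qty": 3},
        {"type": "other", "qty": 2},
    ]
    print(total_before(sample), total_after(sample))
```

Adding a new order type now means one new dictionary entry instead of another copied branch, which is the “accommodate new functionality more easily” part of the definition.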

Continuous Integration

Putting pieces together in software takes time. In other words, integration is the number one risk from the time-to-market perspective. We never know exactly how much time the integration will take. Why is that so? Because we are trying to put together too many changes, and too many things are out of sync simultaneously. Following the XP philosophy we would like to spread the burden and minimize the risk, which means integrating portions of change that are as small as possible, which in turn means integrating as frequently as possible. Needless to say, the process must be automated.

Concluding Remarks

It would be silly to hope to cover the four software development process methods mentioned above in a single article. Even in a modest scope it would require a number of volumes, not to mention the more technical details, which are exactly where the whole difference between success and failure lies. The main message I would like you to take away from this article is that any software development process is about risk management. The three major risks are: to deliver the wrong system, to deliver it too late, and/or to deliver it with poor quality.

Do not confuse means and ends: any process that allows managing these risks in a cost-effective way is good. Creating a software development process from scratch, however, is a risky endeavor: it takes a lot of time and trial and error to come up with something reasonably good. No organization should do it without a good reason. In any case there is always a stage of learning the best practices and trying to understand what does and what does not work for us, and why.

Combining essential elements of RUP, SCRUM, XP and even Waterfall would give us a good starting point for ensuring our competitive advantage. The “learn first” principle is the best advice one could expect to be given in this regard.
