Thursday, August 27, 2009
Knowledge Acquisition Plan
Wednesday, August 26, 2009
Why Twitter, again?
Tuesday, August 25, 2009
Manager's Appraisal
Today the normal procedure is that managers perform an appraisal for their subordinates at least once a year. Promotions, demotions, and salary updates are normally based on the results of this appraisal. My question is: who performs the appraisal for managers? An obvious answer is their own managers, up to the CEO, who is accountable to the board of directors. The problem with this answer is that their managers will appraise them as subordinates. And my question is: who will assess their work as managers? In other words, who decides whether they are good managers or not? The so-called 180-degree appraisal, in my experience, seldom works: who will take the risk of saying something really serious to their managers?
Before we can reason about who decides whether a manager is good or bad, there is a simpler question to be addressed: what does it mean to be a good manager, and for whom? Good for whom? Presumably for the business organization (the Company). In other words, good managers are those who contribute to fulfilling the Company's mission in proportion to their compensation. What is the Company's mission? Too many people still believe it is to maximize shareholder value, in other words, to protect the private and public funds invested in the company. In reality it's not that simple. As P. Drucker claims, being profitable is, by and large, merely the price of staying in business over the long haul. If so, what is the long-term mission of the Company? Most leading experts follow P. Drucker's premise that the Company's mission is to ensure long-term wealth generation and personal growth for its local communities. How this relates to the current globalization trend is a separate story (perhaps another blog post).
Here is a short list of what the Manager has to do in order to contribute to the Company's long-term mission:
1. To ensure that the employees the Manager is responsible for have ALL the necessary conditions for productive work (salary, desk, chair and computer are only part of it; see A. Maslow's hierarchy of needs)
2. To ensure steady expansion of the Company's market share and development of new products and services
3. To set high quality and productivity standards, which will keep the Company ahead of the competition
4. To get rid of employees who, in one way or another, violate #1, #2, or #3
The common belief is that the Manager has to hold subordinates accountable. That's true, but without ensuring the right conditions this turns into permanent intimidation, micro-management, and, worst of all, the loss of the best people, who will simply decide to try their luck elsewhere.
Today, prospects for personal growth constitute the lion's share of the employee work-conditions package. The times when manual workers performed a dirty job they hated for some money to put food on their tables are over, at least in the high-tech industry. Now, in order to survive, the Company needs the best people it can attract. But the best people normally want to become better; they want to grow. They simply spend too much time at the office to afford otherwise.
To be able to grow, we have to work with the right people. Practical experience constitutes about 90% of personal growth, and we normally learn from each other. That is why #4 is so important: we have to get rid of the "wrong" people not just because they do not justify their salary, but because they deprive the "right" people of normal working conditions.
Who are the right people? Who can decide? Managers, for sure, who should themselves be the right people. Sounds like a chicken-and-egg problem. In a sense it is, but here are three basic traits I would suggest considering:
1. Mental health. Korzybski and Maslow give some very good insights. Too much needs to be said here (another blog post, sigh). In short, a very knowledgeable engineer who has serious communication and cooperation problems might not be the best choice ("We need geeks who are socially responsible" - Kent Beck).
2. The ability and willingness to quickly learn both technical and non-technical material. Every company has enough technology and corporate-politics specifics that everyone simply needs to know. The faster, the better.
3. A good, preferably broad, engineering and general (art, music, science) education. People coming from outside can often enrich the Company's technology and process portfolio simply by bringing another perspective.
Who should perform this appraisal I still do not know, but at least there is an initial checklist to start with. Good managers might be able to apply it to themselves.
Monday, August 24, 2009
Choose Your Process: Waterfall, RUP, Agile
Agile Testing
"People who look for easy money invariably pay for the privilege of proving conclusively that it cannot be found on this earth."
Jesse Livermore, “Reminiscences of a Stock Operator”
Introduction
"Easy money" does not exist in the software business, just as it does not exist on the stock market. Those, who think otherwise usually, pay a hefty fee to be proven wrong. Any hope of solving software quality problems through a numbers crunching campaign, without a serious study of the nature of the problem, is futile, and will most likely make matters even worse than they were. One does not hope to get a Wimbledon medal without spending years playing tennis. Why should it be different for software?
Quality and time-to-market are among the most important success factors of any business in general, and of the high-tech business in particular. When given a choice, customers will not tolerate poor service and shoddy products; they will switch to another vendor. Quality is perhaps the most profitable investment of time and money, but it does not come free. One needs to learn how to achieve it in a cost-effective and pragmatic way.
Let's see how Agile folks solve software quality problems, and why it might work, given enough time and energy invested in learning how to apply these techniques properly.
What is Agile Testing?
There is a widespread understanding that testing means that one person produces something and somebody else checks that there are no mistakes. This is a false impression.
W. Edwards Deming, the spiritual father of modern quality assurance and the father of the Japanese industrial revolution, makes it crystal clear: "Cease dependence on mass inspection to achieve quality. Improve the process and build quality into the product in the first place". Deming formulated his famous 14 management points in the early fifties of the last century. By the end of the century, his ideas had been widely adopted by the Agile Approach proponents in order to ensure software quality in the first place.
Briefly, Agile Testing means specifying tests before development starts. These tests are run automatically by developers and by continuous integration servers as many times as required in order to ensure that what we think we developed is actually what we developed. Unless the software passes all the tests, the corresponding feature is not considered "ready."
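As a minimal sketch of what "tests first" means in practice (the feature, the ParentalRating class, and its API are all invented for illustration): the test below would be written and committed before any production code exists, and the minimal implementation that eventually makes it pass follows.

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Written first: it does not even compile until a developer supplies
// ParentalRating, and the feature is "ready" only when the whole suite
// passes on the continuous integration server.
public class ParentalRatingTest {
    @Test
    public void blocksProgramsAboveTheConfiguredThreshold() {
        ParentalRating rating = new ParentalRating(12);
        assertTrue(rating.blocks(16));
        assertFalse(rating.blocks(7));
    }
}

// The minimal implementation that makes the test pass.
class ParentalRating {
    private final int threshold;
    ParentalRating(int threshold) { this.threshold = threshold; }
    boolean blocks(int programRating) { return programRating > threshold; }
}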
Those who are capable of specifying tests in advance fill the role of domain and quality assurance experts. Usually these are the most experienced members of the team. Needless to say, the very concept of blue-collar testers just does not exist in an agile team.
Now let’s try to understand what’s wrong with traditional, post-development testing.
What’s Wrong With Traditional Testing?
Delaying tests until the end of software development means serialization of the process. Serialization means very long delivery times. Long delivery times, in turn, mean that the risk of developing a technically perfect but completely useless system is too high. In order to ensure that we develop what is really required, we have to get feedback as early as possible. When development iterations are shortened to, for example, one month, post-development testing eats up most of the iteration's time budget, not leaving developers enough time to develop quality code.
Frankly, if you find many defects by the end of the iteration, what can you do about it? It’s too late to change the code.
Therefore:
- All critical acceptance tests must be ready before development starts.
- All team members take personal responsibility for code quality.
There is no place for the kind of adversarial Development-QC relationship implied by "we write code, they will find bugs in it". The whole team either succeeds or fails together.
The Importance of Test Automation
Even in traditional testing, tests are usually specified in advance, in the form of an Acceptance Test Plan (ATP) document. So what's the difference with Agile Testing? The main difference is that the ATP is usually a document, and by definition it cannot be run automatically. If it's possible at all, somebody at some point in time will convert this document into scripts to run regression tests automatically. The problem with this approach is that those who specify the tests cannot always verify the scripts, and those who write the scripts cannot always understand the domain well enough, and thus might introduce subtle mistakes while converting the ATP document into scripts. For non-trivial domains the probability of mistakes grows enormously (and where is the profit in simple domains?).
Fully automated regression tests are also often created only when the system is already developed. In effect, such tests are just reverse engineering of what the system does, not of what it is supposed to do.
In the case of User Interface-intensive applications (e.g. the EPG), automatic regression tests created this way come very late, are too sensitive to even the slightest changes in the GUI, and, in general, are not cost-effective.
If tests are not automated, but instead performed manually, the project will slow down to a crawl. The more features have been developed, the more regression tests are required. Humans are very bad at performing repetitive mechanical tasks. Therefore, not having automated regression tests means severely limiting the project's throughput.
Without thorough regression tests, we cannot guarantee backward compatibility, which in turn means we cannot deliver the next version from the main trunk of our source control to existing customers. As a result, multiple release and/or customer-specific branches will flourish in the version control system, and the overall maintenance cost will increase significantly.
Not having automatic regression tests also means we cannot refactor our existing code base in order to adapt it to new requirements and improve its general quality. As a result, the code will very soon reach the "don't touch me" status, and its quality will continuously deteriorate with every bug fix or change request.
Which Kind of Tests?
The Agile philosophy distinguishes between two basic types of automatic tests: Integrated and Unit tests. Integrated tests can be subdivided into Acceptance Tests, Endurance Tests, and Stress Tests.
To produce quality software, one has not only to learn each individual testing technique, but also to acquire a clear understanding of how these techniques complement each other, and why it is so important to apply all of them in the right proportion.
Integrated Tests: Acceptance Tests
Acceptance tests should be specified before development starts, in a format that enables automation. The scope of Acceptance Tests might vary from an end-to-end system down to a single component. Acceptance tests are specified in the form of HTML tables, called fixture tables, and are run using the Framework for Integrated Tests (FIT) or one of its extensions: FitLibrary or Fitnesse.
FitLibrary extends the original FIT with additional types of fixture tables, while Fitnesse wraps it in a Wiki site in order to facilitate collaboration on acceptance test specification. Fitnesse also provides some handy tools for managing large acceptance test suites.
There are three basic categories of fixture tables:
- to set up initial pre-conditions
- to exercise some system functionality
- to verify post-conditions
The fixture table names are automatically mapped onto the underlying programming language's class names. The main versions of FIT, FitLibrary and Fitnesse are developed in Java and then ported to other programming languages: C#, Python, Ruby and even C++. I have found the C++ version not to be user-friendly, and for Integrated Acceptance Tests of C/C++ modules we use a special integration between the bmock library and Java (the so-called bmock console mode).
Acceptance Tests specified using FIT or one of its flavors are extremely powerful for dealing with large permutations of input values: parental rating string formatting, error message prioritization, lengthy event sequences, and complex state machines.
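To give a feel for the format, here is a minimal sketch of a FIT column fixture for the parental rating example; the table contents, the ParentalRatingFormat fixture, and the RatingFormatter production class are all invented for illustration. FIT maps the table name onto a fixture class; header cells name public input fields, while headers ending in "()" name methods whose return values FIT compares against the expected cells:

<table>
  <tr><td colspan="3">ParentalRatingFormat</td></tr>
  <tr><td>rating</td><td>locale</td><td>display()</td></tr>
  <tr><td>12</td><td>en_GB</td><td>PG-12</td></tr>
  <tr><td>18</td><td>en_GB</td><td>18+</td></tr>
</table>

import fit.ColumnFixture;

public class ParentalRatingFormat extends ColumnFixture {
    public int rating;       // bound to the "rating" input column
    public String locale;    // bound to the "locale" input column

    public String display() {
        // Delegates to hypothetical production code under test.
        return RatingFormatter.display(rating, locale);
    }
}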
A critical trait of Acceptance Tests is that they are supposed to be run using only the core system under test, without involving any elements of the real environment: GUI, databases, file system, or heavy communication protocols (e.g. FTP). The reason is that we want our acceptance tests to be fully controlled and to run very quickly (we will have a lot of them). Dealing with the real environment (e.g. a real STB) typically slows the process down significantly and makes it more complex. Using Acceptance Tests this way also leads to a better, more modular design.
Acceptance Tests do not guarantee a high percentage of code coverage. The reason is that test complexity grows exponentially, and any attempt to cover all possible edge cases of all possible scenarios would lead to an unmanageably large test suite. Achieving 100% line coverage is the goal of unit testing.
Acceptance Tests do ensure proper functionality of the system, but they guarantee neither its proper structure nor its long-term maintainability. One has to combine Integrated and Unit tests in order to achieve the required code quality level.
Other Types of Integrated Tests
- Endurance Tests are intended to validate that the system under test will work for a certain number of hours without crashing. More specifically, these tests validate that there are no resource leaks (memory, file handles, sockets, etc.) in the system (see the sketch below).
- Stress Tests are intended to validate the system's throughput and latency under a certain workload (number of concurrent users). Formally speaking, the system's latency is a function of its throughput and the length of its internal queues (essentially Little's Law: the average number of requests in the system equals throughput times average latency). For single-user systems like the EPG, stress test goals typically need to be defined more specifically.
Both types of Integrated Tests are usually run in an environment which is as close to the real one as possible. In order to cope with test scenario complexity, we have to keep the variability of this kind of integrated tests within a limited scope: only a small number of selected, fully specified test scenarios.
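For illustration only, here is a crude endurance-test sketch in Java. EpgCore is a stand-in for the real system under test; a real endurance run would also track file handles and sockets, and would run for hours rather than a fixed loop count.

public class EnduranceTest {
    // Stand-in for the real system under test.
    static class EpgCore {
        void processNextEvent() { /* exercise one fully specified scenario */ }
    }

    public static void main(String[] args) {
        EpgCore core = new EpgCore();
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        for (long i = 0; i < 10000000L; i++) {
            core.processNextEvent();
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        // Crude leak indicator: steady-state heap usage should not keep
        // growing with the number of processed events.
        System.out.println("Heap growth after run: " + (after - before) + " bytes");
    }
}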
Unit Tests - 100% Line Coverage
Unit tests provide a test for each branch of each method (or function) of each class (or module). In order to avoid an exponential blow-up of the tests' complexity, the system under test should be properly modularized.
Within the scope of a particular class's or module's unit tests, all the classes or modules it depends on are replaced by mocks. There are unit testing and mock object frameworks for every popular programming language: JUnit and EasyMock (or JMock) for Java, NUnit and NMock for C#, Boost.Test and bmock (developed by this author) for C/C++, etc.
The goal of unit testing is to achieve 100% line coverage. Only then can one be confident that all possible edge cases will not produce unpleasant surprises in the real environment. This is usually possible only through proper modularization of the system and the use of mock objects: there are types of edge cases which are virtually irreproducible in the real environment.
For some low-level, rare edge cases (e.g., out of memory), specific requirements do not exist. As long as the system behaves reasonably well and does not crash, any edge case handling mechanism will do. Applying this technique can reduce the size of the Acceptance Test suite substantially.
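As a minimal sketch of both points, using JUnit and EasyMock as mentioned above (ChannelDirectory and BannerFormatter are invented names): the first test checks the normal path; the second forces a lookup failure, an edge case that is hard to reproduce with a real channel directory but trivial with a mock.

import static org.easymock.EasyMock.*;
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Collaborator to be mocked.
interface ChannelDirectory {
    String nameOf(int channelNumber);
}

// Class under test: formats an EPG banner line.
class BannerFormatter {
    private final ChannelDirectory directory;
    BannerFormatter(ChannelDirectory directory) { this.directory = directory; }

    String banner(int channelNumber) {
        try {
            return channelNumber + " - " + directory.nameOf(channelNumber);
        } catch (RuntimeException e) {
            return String.valueOf(channelNumber); // degrade gracefully, do not crash
        }
    }
}

public class BannerFormatterTest {
    @Test
    public void formatsChannelNumberAndName() {
        ChannelDirectory directory = createMock(ChannelDirectory.class);
        expect(directory.nameOf(7)).andReturn("BBC One");
        replay(directory);
        assertEquals("7 - BBC One", new BannerFormatter(directory).banner(7));
        verify(directory);
    }

    @Test
    public void fallsBackToNumberWhenLookupFails() {
        ChannelDirectory directory = createMock(ChannelDirectory.class);
        expect(directory.nameOf(7)).andThrow(new RuntimeException("lookup failed"));
        replay(directory);
        assertEquals("7", new BannerFormatter(directory).banner(7));
        verify(directory);
    }
}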
Unit testing comes into its full advantage only when it is combined with test coverage measurement. There is a big difference between having 20% line coverage and having 100%. Achieving the former is relatively easy once one adopts the unit testing approach in principle. It is not even too hard to get to 85% line coverage. However, in order to get to 100%, one needs very high-quality, modular code.
If coverage is not 100%, you will never know whether the uncovered remainder is something completely unimportant or a missed edge case. The only practical option is to develop an automatic habit of maintaining 100% line coverage at all times.
We want our unit test suite to pass at least once through every line of source code. This is not the same as 100% branch coverage, where all possible branches are executed under all possible conditions. If code is developed following the Test-Driven Development approach, 100% branch coverage would be achieved automatically as a side effect. However, when talking about unit testing of legacy code, 100% branch coverage might be an impractical goal to strive for.
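A tiny invented example of the difference between the two metrics:

public class CoverageDemo {
    static int abs(int x) {
        if (x < 0) x = -x; // the condition and the assignment share one line
        return x;
    }

    public static void main(String[] args) {
        // This single call executes every line of abs(): 100% line coverage.
        // Yet the branch where (x < 0) is false is never taken, so branch
        // coverage is only 50%.
        System.out.println(abs(-3));
    }
}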
Exploratory Manual Tests
The Agile Testing philosophy does not preclude manual tests. Actually, the opposite is true: exploratory manual tests are treated as an integral part of the Agile Testing portfolio. The emphasis here is on the word exploratory. Our automatic tests are only as good as our knowledge of the system. If we missed something in the test specification, it will help very little that the target software passes all the tests. The only way to address this risk is to play with the system, trying to break it in some unusual, hard-to-predict way (today I would say: just wearing the hat of a naïve user). Some Agile teams adopt the practice of engaging the whole team in an exploratory test session at the end of the iteration; some teams delegate this task to domain experts, while others combine both techniques. For non-UI products, a specially tailored exploratory testing environment is required. For example, exploratory tests of non-UI modules developed in C/C++ could be performed using the bmock library console mode.
If a discrepancy between the desired system behavior and the Acceptance Tests specification is discovered, it should preferably be reflected in a change request rather than in a bug report. Stakeholders too often change their minds once they get an opportunity to play with a real, even only partially functional, system. Treating all these changes of mind as bugs could easily create a false impression of the product quality and would hurt the team's motivation.
Agile Testing Summary: The Key Points
- Agile Testing is about bug prevention rather than bug detection.
- All regression tests must be automated. Otherwise the development speed will eventually slow to a crawl, and maintaining multiple branches in the version control system will be inevitable.
- Acceptance Tests specify the requirements for each User Story in a form suitable for automatic execution. When the system passes its Acceptance Test suite, its functionality satisfies all existing requirements. Acceptance Tests guarantee neither proper handling of all possible edge cases nor a proper, maintainable code structure.
- Unit Test suites should provide a test for each branch of every method of every class, which leads to 100% line coverage. 100% line coverage guarantees reasonable handling of all possible edge cases and a maintainable, highly modular code structure. To avoid exponential growth of the unit tests' complexity, mock objects are usually required.
- Exploratory manual tests are performed at the end of each iteration by the whole team and/or domain experts in order to check whether something is missing from the formal Acceptance Tests specification. People engaged in Exploratory Tests usually try to break the system in unusual and hard-to-predict ways. Avoid interpreting discrepancies between the desired system behavior and the Acceptance Tests specification as bugs; rather, convert them into change requests.
- In order to achieve a proper level of software code quality one has to combine all types of tests: Acceptance, Unit, and Exploratory. Additional types of Integrated Tests (e.g. Endurance, Stress) are added to the automatic regression test suite where and when appropriate.
Thursday, August 20, 2009
Testing Untestable
Robert Martin (Uncle Bob) recently blogged about a Fitnesse bug he could not write a failing test for. My reading of the implicit logic of his post is roughly this:
- If I, Uncle Bob, who has been teaching the whole world how to do TDD, cannot test it, this bug is untestable.
- My class's unit test is the minimal unit test possible (because, see above).
- If I cannot test it using my favorite IDE, it's untestable.
- If I cannot test it using my unit test library (JUnit in this case), it's untestable.
- If I cannot test it in batch mode using my favorite build system (say, Ant), it's untestable.
To which I would reply:
- We all have blind spots. Gurus and experts are especially prone to them, since the rest of the world has convinced them that they know best. If such a thing as an untestable bug does exist, it should be verified and analyzed by a larger community. What if some of us, mere mortals, find a way to test it?
- Agile test automation needs a very accurate definition of terms and conditions (see below). What one developer considers a minimal unit test could still be an integrated test from a certain angle of view (see below).
- Tools are very important and useful, but they are by no means identical to the unit testing practice. If something cannot be tested using JUnit, it does not mean a unit test is not possible. It might require some more imagination and effort, but still be possible.
- Unit and Acceptance test automation suites specify unequivocally that if the system passes all its tests, it behaves according to the requirements, under the assumptions reflected in the tests. There is no claim that the system does not contain bugs in some other sense. Even more: this test automation suite IS the system requirements; anything else is just wishes or speculations. If, for example, it is essential for our system that a Java HashSet has a fixed order of elements when converted to a List, even when there are duplicates (R. Martin's case), we have to specify an automated test which validates this assumption (in practice it's a bit more complicated, see below).
- For every branch of every method of every class of whatever we decide to be our system core, it is possible to write a unit test which validates that this particular branch is developed according to the specification. All assumptions about the class's surroundings are reflected in the unit test using mocks.
- At the system boundary it is possible to introduce simple adapters which make unit testing of the core more convenient (see the sketch after this list). Unit testing of these adapters might be impractical; therefore they should not contain any essential functionality, but rather just raise the level of the interfaces.
- For every assumption about the underlying system's behavior, it is possible to write a simple unit test that disproves this assumption if it is false. The opposite, that is, writing an automated test which proves that all assumptions about the underlying system are correct in the general case, is not possible, or at least not practical.
- Passing all unit tests does not allow us to conclude that our system behaves correctly as a whole. For that purpose an acceptance test suite is required. As stated above, the unit and acceptance test suites collectively specify which functionality the system has to provide, and under which assumptions.
- It is not possible to prove that the system will never fail, will never do things not reflected in the automated test suite, or will not have some unpredictable defects emerging from putting multiple features together. The latter can be spotted only with manual exploratory tests.
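Returning to the boundary adapter point above, here is a minimal sketch (all names invented): the core depends on a narrow Clock interface; the trivial SystemClock adapter contains no logic worth unit testing, while the core is tested against a deterministic fake.

interface Clock {
    long now();
}

// Boundary adapter: too trivial to break, so it is left without unit tests.
final class SystemClock implements Clock {
    public long now() { return System.currentTimeMillis(); }
}

// Deterministic stand-in injected into the core's unit tests instead of the real clock.
final class FixedClock implements Clock {
    private final long time;
    FixedClock(long time) { this.time = time; }
    public long now() { return time; }
}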
To keep things simple, in this post I do not address additional types of tests such as stress, endurance, etc.; see my separate post on the subject. Now to the specific point mentioned in Robert Martin's post. My interpretation is as follows. There was an implicit assumption made somewhere in the new Fitnesse design that a Java HashSet will preserve the order of elements in conversion to a List, even when there are duplicates. There was a suspicion that this assumption is incorrect, and Robert came to the conclusion that this kind of bug is not unit testable. A quotation from his blog:
"Unfortunately, the order of the rows in the list that was copied from the set is indeterminate.
Now, maybe you missed the importance of that last paragraph. It described the bug. So let me re-emphasize. In the case of duplicate rows I was depending on the order of elements in a list; but I built the list from a HashSet. HashSets don’t order their elements. So the order of the list was indeterminate.
The fix was simple. I changed the HashSet into an ArrayList. That fixed it. I think…
The problem is that I had no way to reliably create the symptom. I could not write a test that failed because I was using a HashSet instead of an ArrayList. I could not mock out the HashSet because the bug was in the use of the HashSet. I could not run the application 1000 times to statistically see the fault, because the behavior did not change from run to run. The only thing that seemed able to sometimes expose the fault was a recompile! And I wasn’t about to put my system into a recompile-til-fail loop."
As I (Asher Sterkin) mentioned, I was unable to get a failing test due to the lack of some details, but I still claim that it is always possible to create a simple unit test which disproves ANY specific assumption about the underlying system. Here is a Java class I wrote specifically for this purpose:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class TestSetDuplicates {
    public static void main(String[] args) {
        Set<String> rawSet = new HashSet<String>();
        String[] values = new String[] {
            "SuiteChildOne.SuiteSetUp",
            "SuiteChildOne.TestOneOne",
            "SuiteChildOne.TestOneTwo",
            "SuiteChildOne.SuiteTearDown",
            "SuiteChildOne.SuiteSetUp",     // duplicate
            "SuiteChildOne.TestOneThree",
            "SuiteChildOne.SuiteTearDown"   // duplicate
        };
        for (String i : values) rawSet.add(i);  // duplicates collapse in the Set
        List<String> list = new ArrayList<String>(rawSet);
        for (String i : list) System.out.format("%s \n", i);  // print the resulting order
        System.out.println("");
    }
}
This is obviously not a JUnit test, and strictly speaking it is not a test at all. It is a part of what is going to become a unit test for the HashSet-to-List conversion functionality. It also does not do much, and this is a very important point: whatever test code we write must do as little as possible, in order to avoid any side effects we may not be able to predict. To push this class through Rob's "recompile-til-fail" loop, I wrote a small Ruby script:
require 'ftools'

# Recompile the Java probe from scratch and return its output.
def compile_run()
  File.delete('TestSetDuplicates.class') if File.exist?('TestSetDuplicates.class')
  `javac TestSetDuplicates.java`
  return `java TestSetDuplicates`
end

first = compile_run()
puts first

# Repeat the compile-and-run cycle 1000 times; fail loudly if the
# HashSet-to-List order ever differs from the first run.
1000.times do |i|
  print "\r#{i}"
  current = compile_run()
  raise "Inconsistent HashSet Behavior(#{i}): #{current}" unless first == current
end
As mentioned above, I was unable to get a failing run. There are at least three possible explanations:
- I misunderstood Rob's problem (the most probable cause). Perhaps I just need to free up some time to grab the Fitnesse code from GitHub and investigate it first-hand.
- The problem is real, but it reproduces only on Rob's computer, his operating system, and/or his version of the JVM and JDK.
- The HashSet-to-List conversion is determinate, and the problem is elsewhere.