Over time the concept of a post go-live stabilization period has emerged. For many companies, this is a period of time after deploying a new major release to production where previously unknown issues are identified and rectified. Additionally, end users spend a period of time becoming proficient in the new release and eventually become comfortable as both the technology matures and their knowledge and comfort levels stabilize.
These periods have become the norm since "it just seems to work out that way." As we project managers have deployed a few net new implementations or major releases, it seems this period is inevitable. The project teams have done their due diligence in design, building, and testing, but inevitably there are issues, either technical or functional, that are found after the deployment. We have done our best and "it is what it is." We find a previously unseen path in a business process or an additional value in a status dropdown menu. Perhaps we also identify an end user who needs additional training in the execution of a particular business process. It could even be an overnight batch job that runs longer on the last Sunday of the month because other software jobs run at the same time, causing operational issues.
This leads us to a fundamental question. Are stabilization periods merely an inevitable part of a software deployment or are they indicators that we project managers have areas of our planning and risk assessments on which we could improve? As touched on above, the stabilization can occur in either technology or functional areas or both. Let's examine some ideas on how to reduce or eliminate the need for a stabilization period.
As project managers, one of our fundamental tasks is to assess risk, manage risk, and build project plans that minimize the risk profile. Below are a series of thoughts and concepts to minimize these risks and improve our project plans with the intent of minimizing or even eliminating the need for a stabilization period.
The first set of ideas takes the concept of a BETA testing period and moves it to the left. With BETA testing, project managers will typically deploy the solution to a small subset of users and enable them to operate the solution under true production conditions. These conditions involve production data, real-life business processes, and real-life scenarios including all edge cases, and they also exercise the user interface design and ease of use.
Free Form User Acceptance Testing
In most projects, we will define a set of test cases for testers to execute. We measure the completion percentages, define exit criteria for completion of the test phase, and base our completion decisions on quantitative data. These are all good things as they provide us a firm factual basis for making our deployment decisions. These techniques are a fundamental part of our projects and should remain so. What I am advocating is an extension beyond traditional UAT. The issues arise when we assume our use cases or test cases represent 100% of the scenarios the solution is designed to address. Although this is the goal, most project teams find this a very difficult task to achieve. Even with dedicated test teams, the task is challenging.
In a BETA testing period or stabilization period, the testers are free to execute any business process, address any scenario, and use any set of keystrokes imaginable. This is truly representative of the production environment since the users will not have defined keystrokes to address every production scenario. What would happen if we defined a second phase of the UAT to be a free form period? We could encourage the testers to execute their daily work since it is in this work that the edge cases will be identified. Because the scenarios are no longer scripted, the testers also get beyond the scripts: they are required to truly understand the software and become proficient with it, but in an environment and at a time that does not impact production work.
Effectively, what this does is move the "stabilization period" back into the project where we can plan and monitor its progress. The number and severity of errors only become known over time, so the duration of this second phase can be difficult to plan in advance. The advantage of this technique is that we uncover and repair errors in a test environment and not in a production environment where the business impact is much greater.
Run batch jobs in production once confirmed functional
In projects that are net new implementations, we can extend this same concept to any interfaces we build. The main issues with new interfaces are typically threefold: 1) interaction with other batch processes, 2) data moving from one system to another, and 3) the target system functioning properly with data from the source system.
During the course of our projects, we will build interfaces that maintain either lookup data, like products in an eCommerce solution, or transactional data, like lead sources in a CRM solution. In each of these instances, unforeseen edge cases can arise, and we would like to catch and repair the issues associated with them. So let's move this identification process to the left and back within the project schedule where we can better manage the uncertainty.
As the interfaces are built, turn them on and let them run. If the eCommerce solution has a daily update of products from a PLM solution, turn it on and let it run each night. Over the rest of the development, the running interface lets the technicians who monitor it become accustomed to it, surfaces errors resulting from edge cases so the project team can fix them, and reveals any unusual run times so the needed adjustments can be made. All of this is done before the solution is in production, giving the project manager the time to appropriately plan and schedule the repair activities.
For interfaces such as lead generation, and especially in cases where manual steps like pointing to a lead file are required, the administrators and even the end users have more opportunities to practice the mechanics (uploading the file, for example), so they are proficient long before the solution goes to production.
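To make the idea of catching unusual run times concrete, here is a minimal sketch of how a team might flag nightly runs that take much longer than usual. The job history, the dates, and the 1.5x threshold are all illustrative assumptions, not values from any particular project or scheduler.

```python
from statistics import median

def flag_unusual_runs(durations_by_date, factor=1.5):
    """Return (date, minutes) pairs whose run time exceeds factor x the median."""
    typical = median(durations_by_date.values())
    return [
        (day, minutes)
        for day, minutes in sorted(durations_by_date.items())
        if minutes > factor * typical
    ]

# Hypothetical nightly run times (in minutes) for a product feed interface.
nightly_runs = {
    "2024-03-25": 42,
    "2024-03-26": 40,
    "2024-03-27": 44,
    "2024-03-28": 41,
    "2024-03-31": 95,  # last Sunday of the month, competing with other jobs
}

unusual = flag_unusual_runs(nightly_runs)
print(unusual)
```

A simple median-based check like this is only one option. The point is that the history exists at all because the interface has been running nightly during development.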
The above concepts are driven by moving the stabilization activities to the left back into the project. The next three concepts address how we can improve the identification of those cases normally found by free form testing and ideally identify them in the early stages. Even with the techniques above, the project manager must address unforeseen development activities that can impact the project timeline. Identifying these cases during the planning cycles enables a more complete project plan.
Testing that simulates real world
Since we started the examination of concepts beginning with production and working back to the project beginning, let's continue this approach and address the formal test phases. What can we do here to improve our testing?
The fundamental premise of testing is to create or recreate the production scenarios so our project teams can identify and repair the issues, eventually leading to the satisfaction of all business requirements or use cases. This remains a good objective, but we can augment it by examining how we structure the test cases and how we execute them in a context that more closely resembles the actual production environment.
Let's look at an example of a solution for financials. In this example, the solution includes a wide variety of business modules including Accounts Payable, Accounts Receivable, General Ledger, Billing, and Purchasing. In examining these modules as a set, they have a monthly rhythm as companies close the books monthly and define financial reports on a monthly basis that roll up into quarterly reports. If a public company, the quarterly reporting includes SEC reporting. Quarterly reports roll up into annual reports.
Could we structure the testing of a system like this to simulate a year's worth of activity? The test period itself would be much shorter than a year, but we could define a month as a week of testing. So week 1 of testing would translate into the first production month. Close the books on Friday and start the second month at the beginning of the next week. This would enable the test team to execute all the test cases associated with purchasing materials, receiving the invoice, performing all required matching, cutting the check from AP, then posting the journals to GL and closing the books at the end of the month. The same concept applies to the invoicing and AR side. Some purchases or invoices would remain in AP or AR until the next month, as would naturally occur in production.
With this same technique, the quarterly reports and SEC filings could be executed, and eventually the annual reports. Having this context not only exercises the system through an entire year but also assists the end users in thinking through all aspects of the process, as it is framed not in terms of software requirements but in terms of their everyday jobs, with which they are far more familiar.
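The week-as-a-month mapping can be laid out as a simple schedule. The sketch below is one way to generate it, assuming a twelve-week test phase and a calendar fiscal year with quarters ending in months 3, 6, 9, and 12. The function and field names are made up for illustration.

```python
def simulated_calendar(weeks=12):
    """Map each test week to the simulated month and the closes due that Friday."""
    plan = []
    for week in range(1, weeks + 1):
        month = week  # one test week stands in for one production month
        closes = ["month-end"]
        if month % 3 == 0:
            closes.append("quarter-end")  # quarterly reports, SEC filings if public
        if month == 12:
            closes.append("year-end")  # annual reports
        plan.append(
            {"test_week": week, "simulated_month": month, "friday_closes": closes}
        )
    return plan

for entry in simulated_calendar():
    print(entry)
```

A company with a different fiscal calendar would adjust the quarter and year-end rules, but the schedule still gives the test team a shared view of which close activities fall in which week.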
Naturally as the issues are identified, the project team would prioritize and repair them. By structuring the testing within a more functional context, the project manager can achieve a production simulation that more closely represents what would happen during the "stabilization" period.
Can we also test the beginning of the business process first? For example, lead generation and processing begins with the generation or loading of leads before proceeding to lead assignment, then working the leads and creating opportunities, etc. If we flesh out the lead generation functionality first, then as each subsequent sub process like lead assignment is developed we will naturally test lead generation and then lead assignment. The software dictates that we add leads before we assign them, so the initial sub processes are continually regression tested as each sub process in the entire business process is added. Combine this with using production data and we are now continually regression testing previous builds, verifying edge cases by using production data, and minimizing the defect detection risks inherent in formal testing.
This leads us to the last two concepts, which stem from the question of how we can improve the initial set of test cases.
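A small sketch shows why building from the front of the process gives the early sub processes the most regression coverage. The stage names follow the lead example above. The assumption that every test pass runs from the first stage through the newest one is a simplification for illustration.

```python
# Sub processes in the order the business process executes them.
PROCESS_ORDER = [
    "lead generation",
    "lead assignment",
    "working leads",
    "opportunity creation",
]

def stages_exercised(newest_stage):
    """A test pass runs every stage up to and including the newest one built."""
    idx = PROCESS_ORDER.index(newest_stage)
    return PROCESS_ORDER[: idx + 1]

def regression_counts():
    """Count the test passes each stage goes through by the time all are built."""
    counts = {stage: 0 for stage in PROCESS_ORDER}
    for newest in PROCESS_ORDER:
        for stage in stages_exercised(newest):
            counts[stage] += 1
    return counts

print(regression_counts())
```

Under these assumptions, lead generation is exercised in every pass while opportunity creation is exercised only in the last one, which is the continual regression effect described above.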
Business process flows that lead to use cases
The key to better test cases in my experience is getting two sets of people involved at the outset of the project. The first is actual users of the system. Typically, our project teams will ask the business users for their requirements or use cases and anticipate the resulting list of items will be complete and accurate. In my experience, this is a difficult question for a business user to answer because it is not framed in a context in which the business user typically works and thinks.
If we ask the question in the context of the business user, we can anticipate a better and more complete response. I like the use of business process flows as a predecessor to use and test cases. We ask the business users what activities they perform on a typical day. "What did you do yesterday?" can be a great start, as can "What did you do last week?" The conversation starts with the day to day activities and moves into the more unusual cases that we would call edge cases. Document the activities within a business process flow, as this provides a good mechanism both for extracting use cases and for defining alternate flows and negative cases.
When we reach a decision point, there must be a yes and a no path. Remembering to come back to the less traveled path may be difficult, but a missing path stands out in a process flow. It also spurs us to ask the question "what if that does not occur." The business process flows enable our project teams to be more systematic in our investigation and documentation of the requirements. By being systematic, we can collect a higher percentage of the actual cases that will be observed in production.
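The yes/no discipline can even be checked mechanically once a flow is written down. The following is a minimal sketch, assuming the flow is captured as a simple dictionary of steps with no loops. The invoice matching flow and all step names are hypothetical.

```python
def missing_branches(flow):
    """List decision points that lack a 'yes' or a 'no' path."""
    gaps = []
    for step, node in flow.items():
        if node["type"] == "decision":
            for answer in ("yes", "no"):
                if answer not in node["next"]:
                    gaps.append((step, answer))
    return gaps

def all_paths(flow, step, path=()):
    """Enumerate every start-to-end path; each one is a candidate use case."""
    path = path + (step,)
    successors = list(flow[step]["next"].values())
    if not successors:
        return [path]
    paths = []
    for nxt in successors:
        paths.extend(all_paths(flow, nxt, path))
    return paths

# Hypothetical flow in which the "no" path was forgotten at the decision point.
flow = {
    "receive invoice": {"type": "task", "next": {"then": "matches PO?"}},
    "matches PO?": {"type": "decision", "next": {"yes": "approve payment"}},
    "approve payment": {"type": "task", "next": {}},
}
gaps = missing_branches(flow)
print(gaps)

# Add the less traveled path, then list the resulting candidate use cases.
flow["matches PO?"]["next"]["no"] = "return to vendor"
flow["return to vendor"] = {"type": "task", "next": {}}
paths = all_paths(flow, "receive invoice")
print(paths)
```

Each enumerated path is a starting point for a use case or test case, and the negative path now appears alongside the happy path instead of being discovered in production.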
Testers involved in defining edge and negative cases
As a supplement to this process, and if a dedicated test team is available, these individuals should be involved from the beginning of the process. Testers by nature tend to be detailed people who are adept at finding both the edge and negative cases. Dedicated test teams also tend to have good relationships with the business users, so they will feed off each other as they collectively think through how the business operates. Both of these dynamics are positive for the project manager, as the end result is a more complete and detailed set of use cases or requirements from which the plan is built.
If we follow stricter and more structured quality assurance and testing techniques, the length of stabilization periods becomes far shorter than previously experienced and may even become negligible. Thinking through the reasons for stabilization periods and structuring our project plans to minimize these risks enables project managers to have smoother running projects and to minimize or eliminate the need for stabilization periods.