Datameer Blog post
Five Best Practices for Software Maintenance
by Joel Stewart on Jul 18, 2018
In this blog post, we cover five best practices for system administrators to keep users satisfied when it comes to maintenance updates: schedule, think holistically, review urgency, test changes incrementally and repeat changes in production.
Let’s be candid: software maintenance is not a thrilling topic, but it is critical to keeping your users’ experience optimal. Of course, new features are appealing and users are eager to work with the latest technology. Many of us have grown accustomed to having the most up-to-date software available through our mobile devices and even the online services we use regularly for email, search or shopping. Today, we’re not talking about new features rolling out; we’re talking about maintenance releases, or “bug-fix only” software updates. System administrators have a lot to consider when keeping their users up to date. We’ll discuss five best practices for system administrators to keep users satisfied when it comes to maintenance updates: schedule, think holistically, review urgency, test changes incrementally and repeat changes in production.
1. Schedule
Ensure that you schedule maintenance windows regularly with the user base. Setting expectations for maintenance windows ahead of time is crucial so that users can plan around any critical work that must be delivered. This is similar to the construction notices you may see on the streets of your daily commute. When you know that maintenance is upcoming, you can plan ahead to take an alternative route or travel at a different time. Knowing when maintenance is scheduled to happen can significantly improve the user experience. I recommend that these windows be regular and predictable for the foreseeable future and, most importantly, published to all users. For example, every Sunday between 12:00 AM and 2:00 AM for a production environment and every day between 12:00 PM and 2:00 PM for a non-production environment. Note that maintenance will not necessarily be performed in every eligible window, but the time is regularly available to administrators if required.
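To make the published schedule easy to check from scripts, the example windows above could be encoded roughly as follows. This is only a minimal sketch: the `WINDOWS` table and the `in_maintenance_window` function are names I made up for illustration, and the times are simply the example windows from this post, interpreted in the server's local time.

```python
from datetime import datetime, time

# Example windows from this post. "weekday" follows datetime.weekday(),
# where Monday == 0 and Sunday == 6; None means "every day".
WINDOWS = {
    "production": {"weekday": 6, "start": time(0, 0), "end": time(2, 0)},
    "non-production": {"weekday": None, "start": time(12, 0), "end": time(14, 0)},
}


def in_maintenance_window(environment: str, moment: datetime) -> bool:
    """Return True if `moment` falls inside the published window."""
    window = WINDOWS[environment]
    if window["weekday"] is not None and moment.weekday() != window["weekday"]:
        return False
    # The end of the window is exclusive: 2:00 AM sharp is already outside.
    return window["start"] <= moment.time() < window["end"]
```

An update script could call `in_maintenance_window("production", datetime.now())` before doing anything and refuse to proceed outside the window.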
2. Think Holistically
The next best practice for system administrators is to think holistically about the software stack. Specifically, subscribe to receive notifications for newly available maintenance releases for the entire software stack, not just a single application. Let’s consider a typical Datameer installation as an example. Most commonly, Datameer installations include the following dependencies: operating system, Hadoop, Java and MySQL. Each of these software elements is maintained on its own schedule and is distributed as a separate software binary. Correcting or preventing an issue in any one of these elements will have a positive impact on your users and deserves equal attention from the system administrators. Lastly, when monitoring for new maintenance releases, it’s also important to pay attention to any end-of-maintenance dates for versions installed in your software stack. Versions that are no longer receiving maintenance updates at all should trigger an upgrade project to keep your users current.
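One way to keep an eye on end-of-maintenance dates is a small inventory that is checked on a schedule. The sketch below assumes you maintain such an inventory by hand; the dates in `STACK` are placeholders I invented, not real vendor dates, and the `warn_days` look-ahead is my own addition so that an upgrade project can be started before maintenance actually ends.

```python
from datetime import date

# Hypothetical inventory. Component names come from the example stack above;
# the end-of-maintenance dates are PLACEHOLDERS, not real vendor dates.
STACK = {
    "operating system": date(2030, 1, 1),
    "Hadoop": date(2030, 1, 1),
    "Java": date(2030, 1, 1),
    "MySQL": date(2030, 1, 1),
}


def needs_upgrade_project(stack, today, warn_days=180):
    """List components whose maintenance has ended or ends within `warn_days`.

    Returns (name, days_left) pairs, most overdue first. A negative
    days_left means maintenance has already ended.
    """
    flagged = []
    for name, end_date in stack.items():
        days_left = (end_date - today).days
        if days_left <= warn_days:
            flagged.append((name, days_left))
    return sorted(flagged, key=lambda item: item[1])
```

Calling `needs_upgrade_project(STACK, date.today())` from a weekly job would give administrators an early list of versions to plan upgrades for.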
3. Review Urgency
As you receive notifications about the availability of new maintenance releases, make it a priority to review the list of fixes thoroughly. Reviewing this list helps you understand how urgently each patch should be applied, especially if multiple system elements have maintenance patches released around the same time. When reviewing this list, administrators should do their best to review from a user’s perspective. Specifically, ask yourself the following two questions: What is the benefit of applying this fix? What is the risk of not applying this fix? Fixes for issues that your users have actually reported are commonly the most urgent to resolve. At times, there will be several maintenance fixes that are required for the full software stack. When multiple changes are desired, it is important to rank them from a user’s perspective so that they can be applied in that order. More on this in the next section.
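The two review questions can be turned into a simple ranking. The following is a rough sketch under my own assumptions: the `Fix` record, the 1-to-5 scores and the tie-breaking order (user-reported first, then risk of not applying, then benefit) are illustrative choices, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    component: str         # e.g. "Java" or "MySQL"
    summary: str           # short description from the release notes
    user_reported: bool    # have your users actually reported this issue?
    benefit: int           # 1 (low) to 5 (high): benefit of applying the fix
    risk_if_skipped: int   # 1 (low) to 5 (high): risk of NOT applying it


def rank_fixes(fixes):
    """Order fixes most urgent first.

    User-reported issues come first, then higher risk of not applying,
    then higher benefit of applying.
    """
    return sorted(
        fixes,
        key=lambda f: (f.user_reported, f.risk_if_skipped, f.benefit),
        reverse=True,
    )
```

The resulting order is the order in which the updates would then be tested and applied, one at a time.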
4. Test Updates
The next best practice may seem obvious, but so many try to skip it: test the maintenance updates! Specifically, test the maintenance updates in a non-production replica environment first. When applying the maintenance updates, I recommend a scientific approach. By scientific, I specifically mean that we should first perform a “control” test without any maintenance updates applied. After that, apply a single maintenance update and re-run the same tests. The tests are intended to verify two things: (1) that the behaviors that changed match expectations; and (2) that no unintended changes were observed.
To summarize, the testing process should have three steps: (1) perform a “control” test, (2) apply one maintenance update and (3) perform a validation test. These three steps should all be performed during the non-production maintenance window. This allows administrators to measure how long the update takes to apply and to confirm that it can complete successfully within the expected time. For the most conservative system administrators, it’s also a good practice to test reverting a maintenance update in case production produces different results than non-production.
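The three steps can be sketched as a small harness. Everything here is hypothetical scaffolding: `run_tests` stands in for whatever test suite you have (assumed to return a mapping of check name to observed result), `apply_update` stands in for your actual update procedure, and `expected_changes` lists the results you expect to differ after the update.

```python
import time
from typing import Callable, Dict


def run_maintenance_trial(
    run_tests: Callable[[], Dict[str, str]],
    apply_update: Callable[[], None],
    expected_changes: Dict[str, str],
    window_seconds: float,
) -> dict:
    """Control test, one update, validation test; then compare and time it."""
    started = time.monotonic()
    control = run_tests()      # (1) "control" test, nothing applied yet
    apply_update()             # (2) apply exactly one maintenance update
    validation = run_tests()   # (3) validation test
    elapsed = time.monotonic() - started

    # Every check whose result differs between the two runs,
    # including checks that appeared or disappeared.
    changed = {
        name
        for name in set(control) | set(validation)
        if control.get(name) != validation.get(name)
    }
    # Expected changes that did not show up with the expected value.
    missing = {
        name: value
        for name, value in expected_changes.items()
        if validation.get(name) != value
    }
    return {
        "expected_changes_seen": not missing,
        "unintended_changes": sorted(changed - set(expected_changes)),
        "fits_window": elapsed <= window_seconds,
        "elapsed_seconds": elapsed,
    }
```

If `unintended_changes` is non-empty or `fits_window` is false, that is a signal to stop and investigate before the update goes anywhere near production.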
5. Replicate in Production
The final best practice that I want to share is to repeat the steps in production. The emphasis here is on repeating the same steps. This step again takes a scientific approach. Specifically, we expect that if we perform the same steps in production that we performed in non-production, we should get the same results. No matter how similar the non-production and production environments are to each other, there will always be some differences between them. Because of these differences, it is still recommended to perform the following three steps: (1) perform a “control” test, (2) apply one maintenance update and (3) perform a validation test before concluding the maintenance window.
By using these best practices, system administrators can keep users satisfied and productive with regular maintenance updates.