Named one of the best IT experts in Canada, the iRangers team is often involved in assessments, discoveries and troubleshooting engagements. Analyzing these cases from the past several years, we found some interesting commonalities. No matter which technology is being deployed, assessed or troubleshot, whether it is Microsoft, Citrix, VMware or another vendor's products, the same themes and trends keep appearing. In this post we would like to share our observations, describe these common issues and, more importantly, explain how to avoid them by taking proactive steps.
It may seem obvious, but many of our clients underestimate the importance of running current software. In fact, we found that the most common cause of problems for our customers is out-of-date software. This covers a wide variety of software, from server hardware firmware and driver patches to workstation OS updates and Microsoft Office patches. You may think that being a couple of patches behind is not a critical issue, but we found that in the vast majority of cases our client was experiencing a known issue that had already been resolved in a current release. Another important aspect is that older software versions may no longer be supported by the vendor. As an example, Microsoft support for Exchange ends 12 months after the next service pack is released or at the end of the product’s support lifecycle, whichever comes first, which means Exchange Server 2010 Service Pack 2 is no longer supported. In addition, keeping software current is especially important for protection against known security issues.
It is also important to note that patching must not be limited to the main product only. It should include all interdependent components, such as backup software and Java components, to ensure compatibility.
The general recommendation is to maintain an N to N-1 update policy for software and hardware. N in this case is the latest service pack, patch, major update, maintenance release, driver, firmware version, etc., and N-1 is the release immediately before it.
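The N to N-1 policy above can be checked mechanically against an inventory. The following is a minimal sketch, not a definitive implementation: the `Component` class, the component names and the release lists are hypothetical and only illustrate the idea of counting how many releases an installed version trails the latest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical inventory record; all names and versions are illustrative."""
    name: str
    installed: str
    releases: tuple  # known releases, ordered oldest to newest


def versions_behind(component):
    """Return how many releases the installed version trails the latest (N)."""
    latest_index = len(component.releases) - 1
    return latest_index - component.releases.index(component.installed)


def is_compliant(component, max_lag=1):
    """N to N-1 policy: the installed version may trail N by at most one release."""
    return versions_behind(component) <= max_lag


# Assumed sample data, not taken from any real environment.
inventory = [
    Component("mail-server", "SP3", ("SP1", "SP2", "SP3")),
    Component("backup-agent", "7.1", ("7.0", "7.1", "7.2")),
    Component("nic-firmware", "1.0", ("1.0", "1.1", "1.2")),
]

for c in inventory:
    status = "OK" if is_compliant(c) else "OUT OF POLICY"
    print(f"{c.name}: {versions_behind(c)} release(s) behind -> {status}")
```

In this sample, the first two components fall within N to N-1 and the third is two releases behind. In practice the release lists would come from vendor release notes or an asset management tool rather than being hard-coded.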
Keeping software up to date is even more critical if you have a cloud or hybrid environment. For example, a hybrid configuration with Microsoft’s Office 365 requires that all participating servers run the latest, or at least the prior, version of Exchange to be supported and compatible with Office 365.
Change Management and Test Lab
Change management and control is a formal process used to ensure the environment remains healthy. Change control enables you to build a process by which you can identify, approve, and reject proposed changes. It also provides a means of building a historical record of the changes that occur. We often see that even clients who have change control in place use it only for larger changes and forgo the process for changes that seem less significant.
In addition to building a change management process, it is also important to ensure that the planned change is validated in a test lab. We understand that maintaining a test lab can seem impractical, or even impossible, especially considering that not only the main components but also third-party servers and applications must be included in testing. However, since most datacenters are fully virtualized today, there are tools available to quickly and efficiently provision an exact replica of your production environment.
Another common issue we see is bundling multiple changes together in a single change request. Making multiple changes at once when troubleshooting an issue should be avoided whenever possible. First, if the issue gets resolved, you do not know which particular change resolved it. Second, it is entirely possible that the changes will aggravate the current issue.
Modern datacenters are complex environments that consist of hundreds or even thousands of components. Sometimes we see our clients adding even more complexity to achieve additional redundancy, optimize performance, etc.
The more complex the hardware or software architecture, the more unpredictable failure events can be. Managing failure at scale requires making recovery predictable, which drives the necessity for predictable failure modes.
We recommend that our clients keep the environment as simple as possible. The key elements of achieving a simple, redundant environment that is optimized for performance are standardization and following best practices.
Another common issue we observed is that customers ignore recommendations from hardware and product vendors. We often hear various explanations of why a vendor’s advice about configuring or managing its own product was ignored. However, it is very rare to see a case where a customer knows more about how a product works than the vendor does.
It is important to note that a vendor’s recommendations and best practices most often come from analysis of data from many clients, product feedback and field observations. That is why, if the vendor tells you to configure X or update to version Y, chances are they are telling you for a reason, and you would be wise to follow that advice.
We have noticed that many organizations tend to skip the planning and design phase of a deployment or upgrade project and proceed straight to environment build and test (and in some cases build and rollout). As a result, the majority of such projects end in overtime, budget overruns, unexpected issues and side effects, improper implementations that require review and redesign, and sometimes complete failure.
Another important step that is often forgotten is continuing to collect and analyze data after the deployment is completed, and adjusting the environment if conditions change. Frequently we see customers implement an architecture and then question why the system is overloaded. One should remember that the environment is constantly changing. A good example is bring your own device (BYOD), which used to be unavailable in many customer environments and is now becoming the norm. As a result, customers' messaging environments changed: users required larger mailbox quotas, devices proliferated, and the capabilities of those devices grew. These changes affected messaging environment design and consumed more resources. To account for cases like this, you must baseline, monitor, and evaluate how the system is performing and make changes as required.
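The baseline-and-monitor step can be illustrated with a small sketch. This is only one possible approach under stated assumptions: the metric (average CPU utilization in percent), the sample values and the three-standard-deviation tolerance are all hypothetical, and a real environment would pull these numbers from its monitoring system.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical measurements of one metric as mean and spread."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def deviates(baseline, current, tolerance=3.0):
    """Flag a reading that is more than `tolerance` standard deviations from the baseline mean."""
    return abs(current - baseline["mean"]) > tolerance * baseline["stdev"]


# Assumed sample data: CPU utilization (%) collected right after deployment.
baseline_samples = [40.0, 42.0, 38.0, 41.0, 39.0]
baseline = build_baseline(baseline_samples)

# Later readings are compared against the baseline rather than judged in isolation.
for reading in (43.0, 65.0):
    flag = "review capacity" if deviates(baseline, reading) else "within baseline"
    print(f"{reading}% -> {flag}")
```

A sustained run of flagged readings, rather than a single spike, is usually the signal that the design assumptions no longer match how the environment is being used.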
As you can see, many of these issues could have been avoided by taking proactive steps.
We hope this post will uncover potential issues that may affect your environment and give you some food for thought.
One thought on “N-1: Where IT Departments S….?”
I found this article to be very useful in my research of the value of maintaining infrastructure consistently at N-1.
Thank you very much.