Paying Down Technical Debt in Your IT Infrastructure
Tyson Supasatit
February 15, 2014
The Phoenix Project co-author Gene Kim spoke at the ExtraHop sales conference in January, explaining technical debt and the 'IT death spiral.'
The idea of "technical debt" is one of the most important things I learned from The Phoenix Project. Described as "a novel about IT, devops, and helping your business win," the book has deeply influenced the IT operations community. In the simplest terms, technical debt is the result of not doing things right in the first place. Here's Erik, a lean-methodology guru in the book, describing technical debt:
"… like financial debt, the compounding interest costs grow over time. If an organization doesn't pay down its technical debt, every calorie in the organization can be spent just paying interest, in the form of unplanned work."
As illustrated in The Phoenix Project, the accumulation of technical debt results in constant firefighting and an inability to implement new projects quickly. A less recognized yet equally damaging result is the increased waste and noise in the IT infrastructure, causing:
- Unnecessary infrastructure purchases
- Greater load on critical resources
- Low signal-to-noise ratio
- Security vulnerabilities
- More places for malware to hide
Supporting Continuous Improvement with ExtraHop
Many organizations use ExtraHop to support a continuous improvement environment, applying methodologies adapted from lean manufacturing. ExtraHop's Atlas Services remote analysis reports are a perfect fit for these "lean IT" efforts. IT organizations receive regular analysis across all tiers of their environment, identifying both acute and chronic issues, and then use these reports to create work items for their kanban-type scheduling systems.
By dedicating resources to paying down their technical debt—fixing misconfigurations, adjusting settings, optimizing scripts, decommissioning legacy systems, etc.—these IT organizations are freeing up capacity, increasing goodput, addressing issues proactively, and improving signal-to-noise ratios so that it is easier to spot anomalous behavior.
Real-World Examples of Paying Down Technical Debt
DNS
less than 1 percent across their entire environment!
The red bars at the bottom show DNS errors. After the problems are fixed in the middle of October, the errors drop significantly.
In August, DNS servers responded with 409,404 errors for 4.1 million DNS requests—an 11.6 percent error rate.
After the problems were fixed in October, the DNS servers responded with 15,987 errors for 3.09 million DNS requests, an error rate of less than 1 percent.
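The before-and-after comparison is simple arithmetic, sketched below in Python. The `error_rate` helper is a hypothetical name introduced for illustration, and the inputs are the counts quoted above, with request totals as rounded in the text. Using those rounded totals, August works out closer to 10 percent than the 11.6 percent stated, so the published figure was presumably computed from counts not shown here.

```python
def error_rate(errors: int, requests: int) -> float:
    """Return errors as a percentage of total requests."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return 100.0 * errors / requests


# Counts quoted in the post; the request totals are rounded there.
august = error_rate(409_404, 4_100_000)
october = error_rate(15_987, 3_090_000)

print(f"August error rate:  {august:.2f}%")
print(f"October error rate: {october:.2f}%")
print(f"Absolute errors fell by a factor of {409_404 / 15_987:.1f}")
```

Tracking the rate rather than the raw error count matters here because request volume also changed between the two months.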
TCP
recreates the TCP state machines
In August, out-of-order segments and tinygrams were contributing to network congestion.
After the problems were fixed in October, out-of-order segments and tinygrams were reduced by 90 percent.
HTTP
This is a large environment with upwards of 3,000 web transactions per second at peak periods, and analyzing large amounts of data at the level of detail that ExtraHop does is no trivial task.
HTTP errors drop by a factor of 9.5 after the problem is identified and fixed. In large environments, it can be difficult to analyze all transactions in sufficient detail to pinpoint problems.
Database
The problem causing the '(ORA-28000) the account is locked' errors is fixed on March 12, resulting in an almost complete elimination of database errors.
After the fix is implemented on March 12, database server response time is much more predictable (and fast, with responses in less than a millisecond).
LDAP
A general configuration change cuts the load on the LDAP server by a factor of five and dramatically reduces LDAP errors.
Make It Easier to Take the Doctor's Orders
Check out the sample Atlas remote analysis report below and then visit the web page to learn more.