Previously the purview of dedicated centers of excellence, or even exclusively the procurement and finance teams, cloud cost management is rapidly becoming a required skill for anyone who consumes cloud resources on a day-to-day basis, software developers included.
The emerging approach for cloud-first organizations is to have a central team that can manage broad consumption issues, like using the cheapest possible infrastructure for the job and negotiating committed-use discounts with vendors, while responsibility for the cost of individual services is pushed out to engineering teams that are incentivized to run as cost-effectively as possible, without sacrificing business value.
“You need that central expertise but also engineers to understand what they are spending in the cloud. … You want them to feel empowered to do something about their spending and how it stacks up to the value they are driving,” said Eugene Khvostov, vice president of product and engineering at cost optimization specialist Apptio. “Every organization is different and has different maturity levels and styles, but some of the more successful cases we have seen push that information to the edge and get engineers involved in that challenge, rather than issuing a mandate from on high.”
This can be a difficult shift to make, however, especially for organizations accustomed to lengthy procurement cycles and those that look to insulate their software developers from worrying about the total cost of their own services in a push for greater digital momentum. But now, as cloud costs continue to rise in the wake of the COVID-19 pandemic, the tide might just be turning.
Optimizing costs, not just code: Introducing finops
In their 2020 O’Reilly book, Cloud FinOps, J.R. Storment and Mike Fuller explain that in the old world of procuring enterprise hardware, engineers and operations teams would have to think about the cost of infrastructure well in advance. “Now, in the cloud, they can throw company dollars at the problem whenever extra capacity is required,” they wrote.
Although this has allowed for faster, more-effective development cycles, it also introduced a new set of considerations around the cost and business impact of those infrastructure choices. “At first, this feels foreign and at odds with the primary focus of shipping features. Then they quickly realize that cost is just another efficiency metric they can tune to positively impact the business,” they wrote.
Janisa Anandamohan, a senior product manager for cost engineering at streaming giant Spotify, wrote in a recent blog post, “We know engineers are natural optimizers when it comes to reliability, security, performance, etc. And now we’re telling them, ‘Hey, add costs into the mix.’”
While that optimization piece is one part of the puzzle, the more significant change is how to bring together previously disconnected groups in engineering, finance, and beyond. This organization-wide approach to proactively managing cloud costs is commonly known as finops. As defined by Storment and Fuller, “finops brings financial accountability to the variable spend model of cloud. But that description merely hints at the outcome. The cultural change of running in cloud moves ownership of technology and financial decision-making out to the edges of the organization.”
A cultural shift of this magnitude naturally equates to enterprise-scale challenges. Finding a way to get engineers to act was the most commonly cited finops challenge by respondents in the 2021 State of Finops report from the Linux Foundation-led FinOps Foundation, with 39% admitting to struggling to gain broad buy-in from their engineers. “One known finops challenge is to not only start the practice up, but to encourage and incentivize cloud users (like devs and engineers) to participate in cloud cost management,” the report said.
Here’s how five companies have gone about realigning their teams and incentivizing engineers to take better care of their cloud costs.
Airbnb reins in spiraling cloud hosting costs
A few years ago, popular travel accommodation booking website Airbnb realized it had a big problem: Its monthly Amazon Web Services (AWS) cloud bills were growing faster than company revenue.
“We had a problem, but we lacked an in-depth understanding of how teams use AWS resources, and how planned architectural and infrastructure changes would impact our future AWS costs,” Airbnb engineers Jen Rice and Anna Matlin wrote in a company blog post.
However, given Airbnb’s “you build it, you run it” engineering philosophy, Rice and Matlin quickly realized that “adding significant friction for our engineers would be met with heavy resistance.” So the Airbnb engineers set out to build the cost-attribution data needed to show the company’s data-driven developer community just how big the problem was, and in doing so gain some buy-in for finops.
At Airbnb, the approach to consumption attribution “was to give teams the necessary information to make appropriate tradeoffs between cost and other business drivers to maintain their spend within a certain growth threshold. With visibility into cost drivers, we incentivize engineers to identify architectural design changes to reduce costs, and also identify potential cost headwinds,” Rice and Matlin wrote.
This shift brought with it a centralized cost-efficiency team, armed with “a birds-eye view of the entire Airbnb ecosystem,” they wrote, and tasked with finding significant cost-savings opportunities. For example, Airbnb now leans heavily on AWS Savings Plan options, complete with “a set of prepared responses that move certain workloads on and off Savings Plan to keep utilization healthy,” they wrote. This team is now supported by a set of AWS cost champions, who sit in every product development organization and provide support at the local level.
The result of all of this effort has been a major organization-wide shift. As Rice and Matlin wrote:
In addition to the various technical and organizational efforts to manage AWS costs, we saw a profound cultural change toward cost awareness and management. This shift was both top-down and grassroots. Leaders mentioned the company-wide cost goal during all-hands meetings. The finance team created a company-wide award for financial discipline, presented by the CFO, which recognized employees who had driven important cost-savings initiatives. In scrappy Airbnb style, the infrastructure organization held a cost-savings hackathon that spawned a number of impactful efficiency projects. Engineers learn best practices from one another and discuss new savings opportunities in a Slack channel. Upon launch, the AWS Attribution Dashboard became the most viewed dashboard at Airbnb and has since remained in the top list. Seeing this cultural change, we are optimistic that the recent cost reductions Airbnb achieved are not a one-off, but rather a new muscle that we will only strengthen with time.
As a result, Airbnb saw a $63.5 million year-over-year decrease in hosting costs, which contributed to a 26% decline in Airbnb’s cost of revenue in the nine months that ended in September 2020.
Sainsbury’s realigns engineering around cost accountability
Like many enterprises today, cloud investment at British retailer Sainsbury’s has been focused on building new features and digital capabilities for customers, which led to a rapid escalation in cloud service consumption. “Somewhere down the line, the operations team was trying to keep a lid on spend,” group CIO Phil Jordan told InfoWorld.
Now, following an intensive four-month change and training program throughout the COVID-19 pandemic, developers, operations, and product people are all part of what the retailer calls “engineering families,” which have full life-cycle accountability to the business. This new operating model pushes end-to-end accountability for a product or service out to the engineering teams, including cost management, vulnerability management, risk management, and partner management, all without being overseen by the now-disbanded Service Operations team.
Those teams are now directly incentivized in line with a new set of devops research and assessment (DORA) metrics (deployment frequency, mean lead time for changes, mean time to recover, and change failure rate) plus service performance, total cost of ownership, and development cadence. Cost-management tools from vendor Apptio have been brought in to give engineering a more transparent view of their specific cost base, tooling Jordan said the company is placing “a lot of faith in to give those new teams full transparency of cost.”
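As a rough illustration of how the four DORA metrics named above might be derived from a deployment log, here is a minimal Python sketch. It is not Sainsbury’s or Apptio’s tooling: the record fields (`committed`, `deployed`, `failed`, `restored`), the sample data, and the `dora_metrics` helper are all hypothetical, and real programs typically define these metrics with more nuance.

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment log; field names and values are invented for illustration.
DEPLOYMENTS = [
    {"committed": datetime(2021, 3, 1, 9), "deployed": datetime(2021, 3, 1, 15),
     "failed": False, "restored": None},
    {"committed": datetime(2021, 3, 2, 10), "deployed": datetime(2021, 3, 3, 10),
     "failed": True, "restored": datetime(2021, 3, 3, 12)},
    {"committed": datetime(2021, 3, 4, 8), "deployed": datetime(2021, 3, 4, 14),
     "failed": False, "restored": None},
    {"committed": datetime(2021, 3, 5, 9), "deployed": datetime(2021, 3, 5, 21),
     "failed": False, "restored": None},
]


def hours_between(start, end):
    """Elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600


def dora_metrics(deployments, period_days):
    """Compute simplified versions of the four DORA metrics for one period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deployments if d["failed"]]
    # Lead time: commit to production, averaged over all deployments.
    lead_times = [hours_between(d["committed"], d["deployed"]) for d in deployments]
    # Recovery time: failed deployment to restored service, where known.
    recoveries = [hours_between(d["deployed"], d["restored"])
                  for d in failures if d["restored"] is not None]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(lead_times),
        "mean_time_to_recover_hours": mean(recoveries) if recoveries else None,
        "change_failure_rate": len(failures) / len(deployments),
    }


if __name__ == "__main__":
    for name, value in dora_metrics(DEPLOYMENTS, period_days=7).items():
        print(f"{name}: {value}")
```

Tracking figures like these next to a team’s total cost of ownership is one way to see whether a cost reduction came at the expense of delivery speed or stability.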
Sainsbury’s piloted this new mode of working with the data engineering team throughout 2020, and “it was unequivocal that we demonstrated it drove efficiency, speed of delivery and colleague sentiment improved,” Jordan said.
Naturally, not everyone was on board with the change. “Some heads of engineering didn’t make the journey with us; they [just] wanted to develop,” Jordan admitted. However, bringing together dev, ops, and product “has helped us pull together expertise to make engineering think more holistically,” he said.
Pushing responsibility out to engineering teams was a significant shift for Sainsbury’s, but Jordan said that it could account for up to 20% in IT cost savings in the long term.
Spotify taps cost insights to align infrastructure costs to customer growth
Similar to Airbnb, the music streaming company Spotify has worked hard over recent years to build cost optimization into the engineering process across the company after its infrastructure costs started to outpace user acquisition.
As an engineering-led company, Spotify decided to build its own cost-management tool called Cost Insights, which is built into its internal developer platform called Backstage and has since been open-sourced. Because Spotify mostly runs on Google Cloud, Cost Insights is currently geared to Google Cloud resources.
As RedMonk analyst James Governor detailed in a blog post, the idea behind the tool is “that engineers and engineering teams are incentivized to take more responsibility for the costs associated with the products they’re building. Modeling cost becomes part of the engineering process, rather than being a separate process for finance teams to manage.”
A culture of sharing cost savings was encouraged through the Cost Insights portal itself and through an internal wiki called Our Cookbook. This encouraged competition among teams to drive down their costs and share major wins with the rest of the organization.
Cost optimization isn’t completely decentralized at Spotify, however. A cost-management organization is tasked with intervening if it sees a team or service quickly ramping up costs, engaging with that group to find out why and what can be done to bring things back under control.
“Spotify found that the best way to get involvement was to encourage engineers to use Cost Insights in a cadence alongside existing quarterly planning. If there were issues that the cost team felt needed attention, they’d alert a team before those meetings. That said, if costs are rapidly escalating out of control for a particular service, that’s something that should generate an alert, so anomaly detection is on the Cost Insights product roadmap,” Governor wrote about Spotify’s efforts.
At Spotify, these costs are benchmarked against engineering resources, so if a team wants to optimize a service it must account for the value of that work in terms of full-time employees that could be hired using the savings. “Early experiences with Cost Insights allowed Spotify to fund the equivalent of 25 teams across the company,” Governor wrote.
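The arithmetic behind that kind of benchmark is simple enough to sketch. The Python below is an assumption-laden illustration, not Spotify’s Cost Insights logic: the function names, the fully loaded cost per engineer, and the break-even comparison in `worth_optimizing` are hypothetical, as are all of the figures.

```python
def fte_equivalent(annual_savings, cost_per_engineer):
    """Express annual cloud savings as the number of engineers they could fund."""
    if cost_per_engineer <= 0:
        raise ValueError("cost_per_engineer must be positive")
    return annual_savings / cost_per_engineer


def worth_optimizing(annual_savings, engineer_weeks, cost_per_engineer, weeks_per_year=52):
    """Return True when first-year savings exceed the cost of the engineering effort.

    This break-even test is an assumption made for the example, not a rule
    taken from the article.
    """
    effort_cost = engineer_weeks / weeks_per_year * cost_per_engineer
    return annual_savings > effort_cost


if __name__ == "__main__":
    # Hypothetical figures: $300,000 saved per year, $200,000 per engineer,
    # 13 engineer-weeks of optimization work.
    print(fte_equivalent(300_000, 200_000))        # 1.5
    print(worth_optimizing(300_000, 13, 200_000))  # True
```

Framing savings as headcount gives a team a concrete way to weigh optimization work against building features.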
Nationwide banks on finops as part of its cloud transformation
Financial services company Nationwide presciently decided to implement cost considerations in tandem with its broader cloud transformation program, which is currently in its third of four years—meaning that finops principles were baked in from day one.
However, that early start didn’t mean there was no engineering pushback. “The main value driver of cloud is speed of development, so you go from a traditional centralized procurement model to a world where every app developer is in procurement as well, so you turn into the Wild West without someone, or a team, looking at the financial implications of that,” Joseph Daly, director for cloud optimization services at Nationwide, told InfoWorld. “Everywhere I have gone there is initial resistance to this, as it is seen as additional bureaucracy which slows them down.”
Given that Daly holds a degree in accounting from Miami University, it’s not that surprising that he boiled down the company’s approach to cloud cost optimization into a formula. “Your cloud bill equals usage multiplied by the rate,” he said. “We centralized rate management for things like savings plans and reserved instances at a high level for the enterprise. Then for usage we decentralized for application teams to be responsible themselves. Being informed needs a tagging strategy and structure, so when developers provision they tag something in a meaningful way.”
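Daly’s formula and the tagging requirement can be illustrated with a short sketch. The Python below is hypothetical and is not Nationwide’s tooling: the record fields (`tags`, `usage_hours`, `hourly_rate`), the `team` tag key, and every figure are invented for the example. It multiplies usage by rate for each resource and rolls the result up by tag, so untagged resources surface in their own bucket.

```python
from collections import defaultdict

# Hypothetical usage records; field names and numbers are illustrative only.
RESOURCES = [
    {"id": "vm-1", "tags": {"team": "payments"}, "usage_hours": 720, "hourly_rate": 0.10},
    {"id": "vm-2", "tags": {"team": "payments"}, "usage_hours": 100, "hourly_rate": 0.40},
    {"id": "db-1", "tags": {"team": "claims"}, "usage_hours": 720, "hourly_rate": 0.25},
    {"id": "vm-3", "tags": {}, "usage_hours": 200, "hourly_rate": 0.10},  # untagged
]


def cost_by_tag(resources, tag_key="team"):
    """Apply bill = usage x rate per resource and total the result per tag value."""
    totals = defaultdict(float)
    for resource in resources:
        # Resources missing the tag are grouped so they can be chased up.
        owner = resource["tags"].get(tag_key, "untagged")
        totals[owner] += resource["usage_hours"] * resource["hourly_rate"]
    return dict(totals)


if __name__ == "__main__":
    for owner, cost in sorted(cost_by_tag(RESOURCES).items()):
        print(f"{owner}: ${cost:,.2f}")
```

In this split, a central team would work on the `hourly_rate` side through discounts and commitments, while each application team works on its own `usage_hours`. The size of the “untagged” bucket is a quick measure of how well the tagging strategy is being followed.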