Scaling Agile Development Methods
In July I was proud to be invited to speak at Agile Conference Europe. Here is a video of the talk I gave on the key differences between Scrum and the Scaled Agile Framework (SAFe) and how the practices SAFe advocates address many of the challenges inherent in scaling a change effort.
I’ve included a written version of my talk below.
Comparing SAFe and Scrum
Scrum and SAFe are both methods developed by Agile practitioners to help teams increase Agility. In this article, I will compare and contrast Scrum and SAFe based on my impressions of and experience with each. So that those impressions are meaningful to you, I’ll start with a little about who I am and which professional experiences have shaped my perspectives on Scrum and SAFe. After that brief introduction, I’ll compare Scrum and SAFe through the lens of what it takes to scale any initiative within a large enterprise. My hope is that, whether you choose to adopt Scrum, SAFe, or neither, you’ll come away with some useful information about how to scale an initiative in a large organization. Finally, I’ll share my thoughts on the utility of applying Lean Thinking to creative work and my view of where we need to head next as an Agile community.
The Origins of My Perspectives on Agile
My foundation is in industrial and systems engineering, which I studied at the University of Southern California. Industrial engineering (IE) is the source of many productivity improvement methodologies, whether Lean from the Toyota Production System, Six Sigma from Motorola, or now Agile. IE focuses on statistics and on a range of operations research and process improvement techniques. What’s nice about an education in industrial engineering is that it provides an understanding of the statistics and mathematics behind the methods we use as Lean and Agile practitioners. That understanding helps a practitioner approach any codified Lean or Agile framework with a critical eye, and mix and match methods to stay closer to the intent of the Agile Manifesto rather than follow one methodology without question.
Transitioning out of college, I went into systems engineering for a software development organization and I was helping to run operations for a team of 100 people. We were inside an organization of 800 people, that was inside of an organization of thousands of people, that was inside of an organization of tens of thousands of people. So doing that I got to see some of the design challenges that come up, some communication and scaling challenges, and some techniques that help to resolve those.
I liked helping to resolve organizational and scaling challenges so much that I switched into an internal consultant role and started working with Theory of Constraints and Lean, and I helped to scale those methods out across a large engineering organization. During that process, I was lucky enough to get to meet Mike Cohn and Dean Leffingwell and was able to talk with those gentlemen about some issues with each framework. Those conversations have shaped my thinking about Scrum and SAFe and I’ll share some of those thoughts with you in this article.
Scrum Alone Doesn’t Provide Agility
Scrum alone is not a sufficient framework to be successful with development.
Please don’t misinterpret the statement above: I like Scrum a lot. I mentioned that I worked with Theory of Constraints (ToC), and a lot of the literature there is very narrative-driven. The Goal and other ToC books are presented as novels, and it’s really hard to turn those stories into practice. When I found Mike Cohn’s books on Scrum, they really helped make Theory of Constraints actionable through the Scrum model of managing work. At the same time, when we switch to Scrum and move away from big-design-upfront requirements documents, the quality tools and automated tests that document our code as artifacts of the development process become key enablers of that transition. If those enablers aren’t in place for a team, especially within an enterprise, there are real risks, such as losing the only documentation of what the product is supposed to do. So, my position is that Scrum alone is not enough.
How do Scrum and SAFe compare?
SAFe is a huge topic. Scrum is also a huge topic. So, I want to focus my comparison of these frameworks on a few meaningful differences that relate to the challenges of scaling something to a point of broad adoption: architecture, quality, managing development work, financial communication, and politics. Let’s look at these challenges one by one and see how well Scrum and SAFe address each.
When big enterprises with big architecture or systems engineering departments consider adopting Agile, one of the first questions that often gets asked is “what will happen to all of our architects and systems engineers?” This is an important question. If people are going to adopt a new model of doing something, they need to understand how they fit into that model. It is a lot easier to accept a transition plan that includes your job than to adopt a model of work that excludes it.
Additionally, architecture really is important as a technical enabler. I had a chance to talk with Mike Cohn about architecture. I asked: “How does Scrum handle architecture? How does that work?” And a lot of our conversation centered around the concept that, in a smaller team, there might not be much need for detailed architecture. Yet, when you’re trying to coordinate the work of hundreds or thousands of employees on a given topic, architecture can become a little bit more important. And what Mike brought up is that from his view, the Scrum approach was not necessarily designed for those types of massive teams. He recommended isolating architecture work into one or two iterations that focus on architecture and design and then, once architecture and design are stable, to move into a series of development iterations.
SAFe, in contrast, has been organized with the intention of being used inside massive organizations where architecture and design can matter a lot. SAFe introduces the concepts of architectural epics and architectural runway to handle these issues. The analogy is that if you’re going to land an airplane, you have to have the runway in place first. The architectural runway metaphor seeks to communicate that, in order for a development team to leverage an architecture, the architecture has to be sufficiently complete that the team knows how to build with it. Architectural epics encapsulate the effort required to put architectural runway in place so that architectural changes can be deployed inside an engineering or design organization.
At the team level, Scrum uses Scrum to manage development, so that’s pretty straightforward. SAFe also uses Scrum to manage development, but a modified Scrum that introduces normalized story points. At the program level, Scrum uses the Scrum of Scrums, while SAFe uses the Agile Release Train and focuses on features prioritized using the Weighted-Shortest-Job-First (WSJF) scheduling policy. At the enterprise level, Scrum doesn’t really have that language, and SAFe introduces the epic Kanban concept. SAFe’s material on portfolio-level management is still somewhat abstract when it comes to step-by-step implementation guidance, but it at least begins to raise the issue.
When we look at code quality, textbook Scrum stops at the acceptance criteria written against user stories. The real challenge is this: if we get rid of our upfront design specs, use user stories for Agile development, and then throw each user story away (which is what we do if we operate by the book in Scrum), we lose that documentation of our product. That loss can only be avoided with some of the Extreme Programming (XP) quality tools, such as automated testing and auto-generated documentation. I like that SAFe requires XP tools.
Politics and Policy
Politics and policy are also very important considerations within large organizations. Policy is especially important for organizations that sell to city, state, and national governments or into highly regulated markets. Scrum doesn’t really address politics or policy, so this is an area where SAFe has a clear advantage. The current SAFe Framework (2.5 at the time of my presentation) does a good job of addressing internal organizational politics. It does this by including in the SAFe “big picture” of how Agile organizations operate the roles that are traditionally seen as in opposition to Agile (architecture, executive leaders, etc.). The SAFe community is still working to gain acceptance in the government acquisition process, which will require formal policy and guidance to be put in place. If SAFe accomplishes that milestone, companies will be able to use the framework with confidence that their customer will buy what they are developing. This is especially significant if an organization is working in an RFP (request for proposal) context, which is common if you’re developing for the government.
Having a financial language for explaining project management decisions is important because it enables the design and engineering department to communicate in a meaningful way with the CFO and the finance organization. “The Phoenix Project,” one of the top-rated Agile books of 2013, makes the case that an organization needs to establish effective communications between finance, design, and operations and must align engineering activities with the needs of the company in order to be successful. SAFe uses the “cost of delay” concept, borrowed from the work of Don Reinertsen, as a financial model to enable engineering-finance communication.
Cost of delay is what really got me excited about what SAFe is trying to bring to Agile, so I want to take a moment to explain “cost of delay thinking.” The cost of delay is the cost of coming late to market with a new product or service. We see cost of delay in three primary forms: theft, obsolescence, and lost sales. Let’s look at each of these.
If you are late to market, your idea can be stolen. We see the risk of theft whenever Apple is preparing a product and details leak out: we find out what cases accessory vendors are designing and what shape the next iPhone or iPad will be before it is released. That forces Apple to deliver quickly, before copycat devices can ship.
Obsolescence is what happens if you’re a Blu-ray manufacturer who wants to launch a great Blu-ray player, but you took so long in development that your customers have moved on and are now watching their movies on Netflix.
Finally, lost sales is the difference between the sales you could have made had you launched on time and the sales (or market share) you end up with when you release late.
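To make the arithmetic concrete, here is a minimal sketch of cost of delay as lost profit. It assumes a constant weekly profit and a fixed market window that closes at a set week, which stands in for obsolescence. The numbers and the `Product` fields are hypothetical. It models lost sales and obsolescence only, not theft.

```python
from dataclasses import dataclass


@dataclass
class Product:
    weekly_profit: float   # hypothetical profit per week while on the market
    window_end_week: int   # hypothetical week when the market window closes (obsolescence)


def lifetime_profit(p: Product, launch_week: int) -> float:
    """Profit earned from launch until the market window closes."""
    weeks_on_market = max(0, p.window_end_week - launch_week)
    return weeks_on_market * p.weekly_profit


def cost_of_delay(p: Product, planned_week: int, actual_week: int) -> float:
    """Profit lost by launching at actual_week instead of planned_week."""
    return lifetime_profit(p, planned_week) - lifetime_profit(p, actual_week)


player = Product(weekly_profit=50_000.0, window_end_week=104)
print(cost_of_delay(player, planned_week=26, actual_week=39))   # 13 weeks late -> 650000.0
print(cost_of_delay(player, planned_week=26, actual_week=110))  # window missed -> 3900000.0
```

The second call shows obsolescence in this toy model: launching after the window closes forfeits all of the planned profit, not just a few weeks of it.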
As a quick aside, Don Reinertsen, in Managing the Design Factory (ISBN-13: 978-0684839912), introduces the Cumulative Flow Diagram (CFD) as a tool for visualizing cost of delay. Mike Griffiths has written a fantastic article called “Creating and Interpreting Cumulative Flow Diagrams” that explains how to construct these diagrams. CFDs are a very simple, low-data tool that will help you see where delays are likely to build up in your development process, so that you better understand where to focus process improvement resources.
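As a rough illustration of how little data a CFD needs, the sketch below computes the cumulative counts behind one. It assumes each work item records the day it entered each workflow state; the state names and items are hypothetical. Plotting each state’s count per day as a stacked line gives the diagram. The vertical gap between the “in_progress” and “done” lines is work in progress, and a widening gap is where delay accumulates.

```python
# Hypothetical workflow states, in order.
STATES = ["backlog", "in_progress", "testing", "done"]

# Hypothetical work items: the day on which each item entered each state.
# A missing state means the item has not reached it yet.
items = [
    {"backlog": 0, "in_progress": 1, "testing": 3, "done": 4},
    {"backlog": 0, "in_progress": 2, "testing": 5},
    {"backlog": 1, "in_progress": 4},
    {"backlog": 2},
]


def cumulative_flow(items, states, days):
    """For each day, count how many items have entered each state by that day."""
    rows = []
    for day in range(days):
        row = {
            s: sum(1 for it in items if it.get(s, float("inf")) <= day)
            for s in states
        }
        rows.append((day, row))
    return rows


for day, row in cumulative_flow(items, STATES, 6):
    wip = row["in_progress"] - row["done"]  # started but not yet finished
    print(day, row, "WIP:", wip)
```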
What does it mean to scale?
Above, I introduced how SAFe added architecture concepts, code quality concepts, and policy concepts that extend Agile beyond what Scrum deals with directly. To further understand how those extensions help scale Agile inside of large organizations, let’s look at what it means to scale something in general. Again, remember that the context of the scaling is an organization with hundreds, thousands, or even tens of thousands of employees.
Scale can challenge large organizations in a number of ways. Geographic scale can introduce the need to collaborate across multiple cities, continents, and time zones. Public policy integration can become a challenge when the government is a customer. It can also become a challenge in situations like the one Tesla faces in the United States, where car dealerships are lobbying against Tesla’s ability to sell directly to consumers. Task variety and product variety can drive complexity as a company grows and takes on more initiatives. Similarly, it may be a challenge to include suppliers in a change initiative or product launch.
Each of these issues, and many others, come into play as we’re scaling an Agile transformation initiative and some of them might be more significant in hardware, or mixed hardware-software projects, or in highly regulated industries. These issues lead to barriers that can hinder an Agile transformation. SAFe attempts to address some of these barriers in ways that Scrum does not.
Sponsorship. Having a sponsor for an initiative is the number one rule of almost every change methodology I’ve ever encountered, and yet so many times we go into change without a sponsor and we fail. Then we wonder why we failed. SAFe attempts to highlight the importance of sponsorship by including sponsors as part of the “big picture.” Inclusion in the “big picture” helps the organization see that sponsors are necessary and also helps sponsors feel included as stakeholders in the change effort.
Confidence. People need to be confident in what you’re trying to do. As Agile coaches we may see the work queues for selling burritos, developing software, or developing lawnmowers as statistically the same, but for the people involved in those processes there’s a strong sense of uniqueness to consider. People are going to challenge you: “Is there anyone else out there in my industry, in my unique situation, who’s been successful with what you recommend that I try?” Like Scrum, SAFe is beginning to address this with case studies and success stories. Yet SAFe does more to address organizational uniqueness than simply provide case studies. Dean Leffingwell, in his seminars for coaches, explains that there are hooks in SAFe that an organization can use to safely tailor the framework to its needs. That adaptability and extensibility helps people adopt the change.
Language. A common language is really important when we try to scale an idea or way of working across dozens or hundreds of teams. People need to understand one another for something to grow, and having something like the SAFe “big picture” to communicate with can help.
Education. Having experienced multiple certification programs as a coach, I think the SAFe certification framework is favorable to coaches, and that might be why SAFe is picking up some steam. In some cases, SAFe may require fewer years of experience and less administrative work than competing certifications, and it’s possible that this introduces some risk. Yet large organizations need to train dozens or hundreds of trainers rapidly to complete a timely transition to an Agile development system. So it’s important that a robust certification program exists, the learning path is clear, and processes are integrated with tools. These features help make a framework repeatable and teachable.
Policy. When I trained in SAFe, a representative of the Software Engineering Institute (SEI), the organization that popularized CMMI, attended the same training. The SEI representative was there to answer the question: “Can SAFe be recommended to the Government Accountability Office, the US DoD, or the Food and Drug Administration as an acquisition policy approach?” SAFe’s leadership and thought leaders are putting a lot of effort into making those discussions happen and bringing that acquisition approach into being. That approach would enable large organizations to invest in rolling out a scaled Agile change effort.
Tools. Tools are important because they are the embodiment of knowledge. By creating tools we reduce the learning curve necessary to adopt a given process and we systematize work, making it repeatable and teachable. SAFe has begun to influence Agile tool makers Rally and VersionOne to incorporate its language and concepts into their software. Scrum, of course, has a variety of mature and well established tools.
SAFe: Some Areas of Caution
There are a few aspects of SAFe that I am cautious about, and it is important to highlight them. The normalized story points bother me. I think there is a lot of value in recognizing a team of unique people working together as a unique entity with its own independent velocity. And I think that if you take the time to understand velocity and how it works, the issues that story-point normalization seeks to address tend to drop away and aren’t as big a barrier as they might initially seem.
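One way to see the point about velocity: a team’s own velocity forecasts its own work, whatever scale its points are on. The sketch below assumes two hypothetical teams that estimate the same amount of work on different point scales. It is an illustration of forecasting, not a general argument against normalization. Summing raw points across teams would still need some conversion, and that is the issue normalization targets.

```python
import math
from statistics import mean


def forecast_sprints(remaining_points, recent_velocities):
    """Sprints needed to finish the remaining work, using this team's own average velocity."""
    return math.ceil(remaining_points / mean(recent_velocities))


# Hypothetical histories: team B's points are roughly "half the size" of team A's.
team_a = [21, 25, 23]   # average 23 points per sprint
team_b = [44, 48, 46]   # average 46 points per sprint

# The same body of work, estimated on each team's own scale.
print(forecast_sprints(115, team_a))  # 5
print(forecast_sprints(230, team_b))  # 5
```

Because each estimate and each velocity share a unit, the ratio is unit-free, and both teams forecast five sprints without any normalization.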
SAFe prioritizes features and user stories using a Weighted-Shortest-Job-First heuristic. While I really like WSJF as a concept, I’m not sure we know yet how to use it in most organizations because I’m not sure organizations understand how to measure the value of an epic, feature, or user story. Often (and especially for organizations coming from earned value management), value is assumed to equate to what a unit of work costs to perform and that’s not going to get you the right weighting under weighted-shortest-job-first. So, there is some work that needs to be done in terms of “How do you do WSJF?”
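A small sketch shows why the value measure matters. It uses Reinertsen’s form of WSJF, cost of delay divided by duration. SAFe computes cost of delay as a sum of relative scores (business value, time criticality, risk reduction/opportunity enablement) and divides by job size, but the ranking logic is the same. The jobs and numbers are hypothetical. The second half shows the trap described above: if “value” is taken to be what the work costs, and cost grows with duration, every job gets the same score and the ordering carries no information.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cost_of_delay: float  # value lost per unit of time the job waits
    duration: float       # time (or relative size) needed to complete the job


def wsjf(job: Job) -> float:
    return job.cost_of_delay / job.duration


def schedule(jobs):
    """Weighted-Shortest-Job-First: highest cost of delay per unit of duration first."""
    return sorted(jobs, key=wsjf, reverse=True)


jobs = [Job("A", cost_of_delay=10, duration=5),
        Job("B", cost_of_delay=8, duration=2),
        Job("C", cost_of_delay=3, duration=1)]
print([j.name for j in schedule(jobs)])  # ['B', 'C', 'A']

# Pitfall: substituting cost for value, where cost = rate * duration (hypothetical rate).
RATE = 1000.0
cost_as_value = [Job(j.name, cost_of_delay=RATE * j.duration, duration=j.duration)
                 for j in jobs]
print({j.name: wsjf(j) for j in cost_as_value})  # every job scores 1000.0
```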
We see SAFe’s terms being introduced into Agile project management tools. The challenge here is that these tools are licensed per user and are so expensive that teams often cannot afford licenses for all team members. What I’ve observed is that if even one team member lacks access to the tool, then typically somebody else reports status for that team member and enters his or her data into the tool. When that happens, the team really isn’t using the methodology. So, I see this as a weakness right now.
I’m concerned about premature adoption of SAFe as an acquisition framework. SAFe may or may not reach that stage. But if it did, as it stands today, I don’t think it would give us the jump we’re looking for over the legacy acquisition framework, the Waterfall method. I don’t think the current implementation of SAFe solves the issues that make projects fail in the Waterfall context. I do think SAFe is evolving to address those issues, and I expect to see solutions in future iterations of the framework.
Lastly, and maybe less important than the others, it bothers me that there is always this emphasis on face-to-face communication. We live in 2014. We have amazing collaboration tools available to us. If you follow 37Signals (now Basecamp) and read their book “Rework,” they talk about how they built a team with people all over the world. That allowed them to get the best people, and the skills advantage of the people they brought into the organization outweighed the communication difficulties they faced. They’re also able to organize the work intentionally, in a way that leverages where people live and the times of day that do and don’t overlap. So, I’m not a big fan of the insistence on face-to-face.
My conclusion about SAFe is that it’s very useful today as a framework for enterprises to get started with the transition to Agile but further iterations are needed to fully address the issues that lead to schedule slips and new product introduction failures.
What’s Next for Agile?
I want to briefly explain what future iterations of Agile might look like, or at least one possibility. So, where do we look for the end state? Where might SAFe be going? Where might other frameworks be going?
A lot of Lean product development comes from Toyota. Lean definitely shapes SAFe, and it seems to shape Agile thinking broadly as well. So, if we accept that we want to achieve the Lean outcomes of flow, value, reduced waste, and respect for people, then it’s interesting to ask: “Have we assumed that Toyota does design the same way they handle production?” It turns out that we have. What MIT has found is that Toyota doesn’t use point-based design. There are only a couple of papers on this, and the material is hard to find. It’s as if Toyota open-sourced their production process because that helps them interface with suppliers, but didn’t open-source their design process, to retain a competitive advantage.
Let’s consider how an electric engine might be developed under the systems of point-based design and set-based design to get a sense of the impact that switching to set-based design can have on development process efficiency. I don’t know if Toyota developed their engine technology this way, but I’m using engine technology as an example because it is familiar to most people with access to cars.
In point-based design, an organization focuses all of its resources on one type of car engine, which is specified up front in a requirements document. Other engine technologies are not investigated. This is a high-risk approach because, should the new electric engine fail to meet its development schedule, the whole new car development project will be delayed. Whether the approach is Agile, Waterfall, or anything else, point-based design leaves no backup plan if the new electric engine doesn’t work. The schedule slips because the design effort has to be reworked: “Hey, we need to go back to the internal combustion engine. Let’s start from scratch.”
The point-based design approach works in production, where the processes and technologies necessary to assemble a proven car design are well understood, and in a production context Lean excels as a process improvement and control methodology. However, what MIT has found is that Toyota handles design differently from production. Toyota applies what MIT calls set-based design: they keep multiple technology alternatives in play throughout the development of a new vehicle. So, if Toyota wants to launch an electric engine, they’ll also have a hybrid engine as a moderate-risk option. They also have the internal combustion engine, which they know how to build and can fall back on, and they mature all three in parallel. That gives them a set of potential technologies. Those technologies are modular, so they can be swapped for one another. That means when Toyota reaches a decision point and says, “we need to pick an engine today and start the factory, or we’re going to miss our target shipment date and lose sales,” they are able to move forward. Toyota can select, from a set of alternatives, the technology that is both closest to what they want AND sufficiently capable at that point in time. Maybe they don’t get to the electric engine, but they’ve got the hybrid working, so they can leave the gasoline engine behind and introduce the hybrid vehicle.
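The decision rule described above is: of the alternatives that are mature enough at the decision point, pick the one closest to what you want. It can be sketched in a few lines. This is a toy model of that rule only, not Toyota’s or MIT’s method. The preference ranks and “ready by” weeks are hypothetical. Real set-based design also narrows the set progressively as knowledge grows, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    preference: int      # lower = closer to what we ultimately want
    ready_by_week: int   # hypothetical week when the option is mature enough for production


def select_at_decision_point(alternatives, decision_week):
    """Return the most preferred alternative that is sufficiently mature, or None."""
    ready = [a for a in alternatives if a.ready_by_week <= decision_week]
    if not ready:
        return None  # nothing is ready: the schedule slips
    return min(ready, key=lambda a: a.preference)


engines = [
    Alternative("electric", preference=1, ready_by_week=60),
    Alternative("hybrid", preference=2, ready_by_week=40),
    Alternative("internal combustion", preference=3, ready_by_week=0),
]

# Set-based: the electric engine isn't ready at week 52, but the hybrid is.
print(select_at_decision_point(engines, decision_week=52).name)  # hybrid
# Point-based: only the electric engine was pursued, so there is nothing to ship.
print(select_at_decision_point(engines[:1], decision_week=52))   # None
```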
I’d like to conclude by issuing a challenge to the Agile community: as a group, how do we challenge the assumption that design is the same as production? And, if we acknowledge some of the uniqueness around design, how do we then incorporate set-based design (guided by MIT’s observations at Toyota) into Agile frameworks? If we can answer these questions as an Agile community, then we can feed higher-quality output from our design processes into our existing Agile production, DevOps, and maintenance workflows. Whether the answers come from SAFe or from any other framework, finding them is the catalyst that will move us ahead, to the next level of effective new product development.