For the companion UXM essay spun from this conversation, see The Map Is Not the System.

Transcript

Speaker labels and timestamps follow the source transcript; light edits may apply for readability.

Josh 00:00

Well, yeah, it's great seeing you again, Marina. And we're super excited to have you here on the show. Your new book, Crisis Engineering, is really fascinating because it's kind of about being ready for a crisis and using crisis to make meaningful change. You've worked in and around government for years. So I wonder if you could talk a little bit about

Josh 00:19

how your relationship to crisis has changed in that time and what led you to some of the realizations in this fascinating new book.

Marina Nitze 00:28

Yeah. So I entered the federal government in 2012 and a year later became the Chief Technology Officer of the Department of Veterans Affairs, which is the largest civilian agency by a large margin. And that was also the year that healthcare.gov happened. So I now have business partners who all met me during the healthcare.gov tech surge. And a thing that kept happening to us in parallel, as you know, I was at the VA, they rescued healthcare.gov, we stood up the US Digital Service together, was this constant confusion that, you know, it's like Chicken Little, right? I'm walking around being like, what? There are hundreds of thousands of veterans that cannot enroll in healthcare. They're dying, on the front page of the New York Times. And the website where they would enroll in healthcare literally won't open. It's that broken. Why is everybody, you know, sipping their coffee at a normal pace? Why isn't this a crisis? And then when we all left the government, we founded our crisis engineering firm called Layer Aleph, although we didn't have the name for what we did at the time. And we went into huge healthcare systems. We went into banks, so well outside of government as well, and kept seeing the same thing. However, we would also observe things, and healthcare.gov was one of them, right? Where there was a very useful crisis, where suddenly a crisis caused rapid decision-making timeframes where suddenly people were making huge change in minutes to hours. And we really wanted to understand more about why that was possible. And so we dug into a ton of the academic research. We kept doing crisis engineering on our own. And that's what really led to this book. So we wanted to frame what makes for a window where rapid transformational change is possible, which we say is a crisis, how you can harness that. So whatever you've had in your back pocket or the change that you've been wanting to be made. Milton Friedman has that famous quote:
In a crisis, the ideas implemented are whatever's laying around. How is what you have among whatever's laying around? And then we also have a really practical toolkit for when you're having an incident, you're having an outage. How are you standing up your crisis engineering center? How are you maybe prepping for known spicy days like a launch or a deadline so you're ahead of the game so that when there is a potential crisis, you can really jump on it and make as much rapid transformational change as possible in that very small window.

Robb 02:44

I feel like since our last conversation, that's all I do lately is look through this lens of crisis. I can't tell if it's an accurate lens to look through for everything or if it's just that this is all I can think about. But changing my relationship with the word and the concept to like an opportunity versus crisis as something you avoid and apologize for and get fired for and all of those things that tend to come to mind. And then of course the dark mirror side of crisis, which is, you know, people who do prepare for it, but in like ways to get bad things that they couldn't have happen, happen. And then understanding like, I'm going to say bad actors tend to probably predict crisis better than good actors, because of just optimism versus pessimism in general. And saying, wow, if bad actors are going to be better at being prepared for crisis, someone needs to write a book to get the good people out there to be prepared. Because those who will be prepared at the moment, those who appear decisive, will be

Robb 04:01

the ones who ultimately get heard, right? I mean, what's your thoughts?

Marina Nitze 04:05

Yeah, absolutely. This is a toolkit that we want widely available. You can hire us to come and help you transform your crisis into a moment of opportunity, but you can also read the book to try to do it on your own. There's only four of us at Layer Aleph, so we need way, way, way more crisis engineers. We need way more people who understand what a useful crisis looks like, what those indicators are, and then how you can jump immediately, because that crisis window can often be hours. It is generally not longer than days. For better or worse, depending on your version of the story, I think we're seeing more and more crises either go without anybody taking advantage of the moment for transformation, or yes, actors that you wish were not the ones taking advantage of the moment for transformation.

Robb 04:50

Yeah, and what's your thoughts? Like, why do people tend to be so averse to planning for crisis? Is this like, when things are going good, we always assume they'll keep going good. And when things are going bad, we always assume things will keep going bad, like the stock market.

Marina Nitze 05:05

Yeah, I think this is core. So we learned about this concept years in that really we felt like explained a lot of this. And it's Karl Weick, who was an organizational theorist, like studied organizations. And he called it sense making. And sense making, if you're familiar with Daniel Kahneman, like this is in his work as well. Our brains are constantly constructing stories of our reality and of how the world works. It's happening automatically. You cannot stop your brain from doing this. You may not even notice that it's doing this all the

Marina Nitze 05:35

time. And those stories that we're constructing are not necessarily based on facts and they are often not based on all the facts. They are often based on a convenient subset of facts. This has been studied in places like jury deliberation rooms. I think like a core tenet of being an American, right? Is we believe you have a jury of your peers, they're going to go in this room, they're going to deliberate, they're going to think very hard and carefully about the evidence. That is not how it works. Jurors enter the deliberation room with a story of what they believe happened and then they assemble the facts to fit that story. We might not like that, but if we acknowledge it, it can change kind of how we interact with that. So I think when you apply sense-making to an organization or a crisis, the story we're telling ourselves is, I'm getting status reports every day. I am on top of my people. I have a good team. I have a good product. Things are going great. That is a story that we're telling ourselves, which is why we are often taken aback by crisis, or we think a crisis can't happen to us, or that incident, or that outage, or that failure can't happen to us. A flip of that can be also sense-making can very much rationalize constant failure. And I think this is a thing that happened in government a lot. You know, the backlog is growing, but it's growing very slowly over time. And that becomes like the reality. The reality is it takes us four years to do this. The reality is there are 38 steps in this process and they are immovable. But a crisis can actually show you, like the pandemic, that, whoa, suddenly we can shift this and like that process is actually not as immovable as we thought. Or you could have, you know, new people in charge and suddenly you can discover that the immutable laws of bureaucracy are not all that immutable after all. They can actually change. And that throws people's sense making for a loop.
And the key there is you want to grab them in that window where their brain is assembling a new story.

Marina Nitze 07:27

How can you get that new story to align with reality and ideally your version of reality?

Robb 07:33

Yeah, it's like breaking mental models essentially in a sort of UX world. But one of the things I found was super important about the book was you really have to agree on the definition of a crisis. And it seemed like, in my mind, there's plenty of people who are willing to raise the crisis flag when it's not a crisis yet, because they're trying to kind of take advantage of this moment before it happens. And then there's people who don't acknowledge there's a crisis when they're right in the middle of it. And so the real starting point just seemed to be like, can we agree on what constitutes a crisis? And what you guys cover, I love, particularly, the part that says like when your core assumptions are challenged and your current processes just won't work anymore.

Marina Nitze 08:30

Yeah. So we have five indicators of crisis and some people are helped if I reframe this to say indicators of a useful crisis. If you are in a crisis or not in a crisis, I'm not trying to have an emotional argument with you about that. I'm trying to talk about what tools will work to get you out of your situation and get you stronger on the other side of it. If you're not in a crisis or again, maybe you want to call it a useful crisis by our definition, then crisis engineering tools will not work.

Marina Nitze 08:59

Other tools will work. I previously wrote a book, Hack Your Bureaucracy. You can try those bureaucracy hacking tools, but the crisis engineering toolkit won't work. And the flip of that is when you're in a true crisis, that's not the time for incremental bureaucracy hacks. That's the time for huge transformational change. So the five indicators are, the first one is a fundamental surprise, right? You didn't see it coming. So this is why when people talk about a budget crisis or a housing crisis, that's going on for years and years and years and years. That doesn't hit that criteria because nobody is like, whoa, wait, where did this come from? As you just said, a fundamental disruption in your core function. So you cannot process claims. You can't get orders out the door. Your website or your app is down or it's so slow that it's effectively down, right? Now somebody has to do something. You cannot remain in this state. That's pretty much always an indicator that you're in a crisis because somebody has to do something. A rigid timeline. This can often be deadlines, but it's funny because people often say like, well, tax day, that's a rigid deadline. We went back and the IRS moves tax day like most years if you go back. There is an outage or a holiday or the pandemic or something is always happening. But often there is something like a rigid deadline. You know, Taylor Swift going on stage is happening at a moment. Right, you can't fix that seven days later. So you may have a rigid deadline there. High visibility. Are you on the front page of the newspaper or today, you know, are you on the top? Are you the trending topic on social media? Is your CEO talking about it constantly? Is it inundating your internal channels and your Slack and your Teams channels? And then it's a failure of sense making, which is what is happening is not making sense. And your understanding of how things worked and how your organization works suddenly has broken down.

Marina Nitze 10:50

And this is the window for magic because people hate cognitive dissonance. They hate it so much and they will do everything they can to resolve it as rapidly as humanly possible. So if you can get in that window with your ideas and the changes that you wanted to see, that's where you can make, you know, years, decades, centuries possibly of change in a very, very, very short time.

Josh 10:52

Hmm. And it's funny how much of it is tied to storytelling in a way, like the stories we tell ourselves and these stories we need in order to keep organizations running almost are really, really important. And that window's fascinating too. It's just this short window you have, like what happens, like you mentioned the housing crisis, right? If the right story isn't told, does the crisis then sort of metastasize into something immovable, in a way that you have to wait for a new crisis to sort of kick it into a different state of crisis so that you might have that opportunity again?

Marina Nitze 11:47

Well then I think that becomes sort of a chronic problem as opposed to a useful crisis in our definition. And it's interesting, we've had some workshops and some webinars with local and state officials and we talk about this and their first things are always, I'm in a budget crisis, I have a housing crisis. And when we push them on this, we're like, okay, but did that create this moment of rapid transformational change for you and cognitive dissonance? No. But what they constantly point to is the day the bridge collapsed, the day the highway broke.

Marina Nitze 12:14

when it was like, whoa, now we cannot actually keep functioning as we are. We can clearly keep functioning with our current housing problem because we are. We are continuing to get up every day and like people are continuing to go to work and write progress reports about it. But you cannot continue getting up and going to work when the bridge is collapsed. And that is when suddenly like your zoning rules and regulations that you want changed, right? Your procurement regulations that you want changed, your funding, whatever that may be. That's the window where you can say, whoop, we're going to rebuild this bridge a different way. I think Governor Shapiro in Pennsylvania has a literal example of rebuilding a bridge in something like three days by circumventing a lot of rules. Once they're circumvented once, now you have changed the story of what is possible. It is possible to fix the bridge in three days, which, I don't even know anything about bridges, I think changes my mental model of how bridges and infrastructure work. You know, I live in Seattle where we're expecting the next light rail line, like, when my great-grandchildren are going to college.

Josh 13:13

Hahaha

Marina Nitze 13:15

If you told me suddenly the light rail is going to take six months, you would change my story of how the world works.

Robb 13:21

Yeah, there's a few things there to unpack. Like one is that crisis breaks consensus, that's a huge one, isn't it? I mean, isn't that, you know, there's like risk mitigation. So like, how do we mitigate risk? How do we manage risk and not take any unnecessary chances? That then shifts to somebody do something now, and how that breaks consensus, because like it's so easy to say do nothing. It's so safe. Like there's do something. There's a thousand right things I guess you can recommend. But the other side of the coin is like, or just do nothing, which is only one choice.

Marina Nitze 14:09

Or continue doing what you have been doing, continue meeting about it. Right? Because we keep rewarding that behavior. I'm going to study this. We're going to have a commission. I'm going to have a special meeting about this. And we keep doing that and thinking that it's going to do something. And I have seen that happen in the White House, in the Situation Room, on very, very high profile things. And that is not actually what fixes something. It's studying it, discussing it, and to a large extent, debating it around the table. I'm not saying there's not room for debate and for different perspectives, but that's ultimately

Marina Nitze 14:38

What you have to do once you have like quickly debated is something that we push for very hard. You have to take a novel action. You have to try something. This is probably easiest to think about in the sense of like a tech outage, right? You could sit and study your network diagrams for six days if you wanted, but what you need to do is have a theory. Okay, I think it's DNS. And then you try something to fix that. And it works and then you are correct and your story continues or it does not work. And that is often when you discover that your entire

Marina Nitze 15:08

map is not correct of how your system works. And again, it's maybe easier to think about this in terms of like a utility or a tech company, but it really works culturally as well. You know, how your organization works, your map of it right now is, I would bet money on it, broken in various ways, but that's not uncovered until you try an action against it and then discover that things happened or things did not happen against what your story would have told you happened.

Robb 15:36

And it does underlie this idea that the same person that might have blocked something in a crisis will just probably keep their mouth shut or possibly even nod and raise their hand. And this kind of comes down to this idea, I think, of being perceived as indecisive or where that's a bad thing and then indecisive meaning like let's just keep doing things the same way, being interpreted as a good thing. We hire people to what? Mitigate risk or to handle crisis. Which one? I guess the larger the organization, the more it's mitigate risk. And then a crisis hits and they're like, wait a second, I'm not wired for this. Or are they?

Marina Nitze 16:24

Yeah, I mean, I think it's possibly impossible to hire most of an organization, especially the larger you get, that all have huge risk tolerance. I think that's absolutely a core of where the federal government is breaking down in outcomes right now, because the risk framework does not measure for outcomes that are experienced by end users. It measures for compliance. Did you fill out the 75 rows of the ATO spreadsheet? Did you submit the paperwork online? I mean,

Marina Nitze 16:49

There's public dashboards that show 90% of projects are on time and on budget. That's insane. That is absolutely not correct. But that's what people are being measured against. So think about how a crisis can transform how those measurements work. That would be an incredible transformation coming out of a crisis to say like, whoa, whoa, whoa. I believed, you know, I was getting these status reports and I had a dashboard that I could look at and I believed it to be true. The crisis revealed to me that something, I was fundamentally surprised by this outage or this inability to do our mission. And going forward, I'm not going to do that old thing anymore. Now we're going to actually measure in this new, whatever the new way may be. We're pretty obsessed with Anthony Downs, who wrote Inside Bureaucracy, which unfortunately is wildly out of print. You can get like, there's like four copies on Amazon for $300 each. But he has this concept of rear view metrics, which I think frankly, most organizations I've encountered run on right now, right? You have someone who has maybe source data and over some period of time, it's usually like weeks to months.

Marina Nitze 17:48

It's transformed into PowerPoint decks or transformed into a, even a dashboard is often, you know, imported here and tweaked there. And by the time it gets to the executive that could do something about it, the feedback loop is broken. The data is too slow or it's been too colored. And so Anthony calls these rear view metrics, cause you're seeing them behind you. And what you actually want to see is more like what the healthcare.gov rescue stood up, which was all 55 of you were reporting green until the site went live. And it is not.

Marina Nitze 18:17

So how can we have a real time dashboard that's actually pulling from all of your operational systems and saying like it is, you know, the web page is loading for this many people in this period of time.
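
A minimal sketch of the kind of real-time measure Marina describes here, written in Python. It assumes a hypothetical feed of page-load events coming straight from an operational system; the `PageLoad` type, the `live_status` function, and the five-minute window and three-second threshold are illustrative assumptions, not details from the healthcare.gov rescue or the book.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PageLoad:
    """One page-load event as recorded by an operational system (hypothetical shape)."""
    timestamp: float   # seconds since some epoch
    duration_s: float  # how long the page took to load
    ok: bool           # whether the load succeeded at all


def live_status(
    events: Iterable[PageLoad],
    now: float,
    window_s: float = 300.0,     # assumed: look at the last five minutes
    slow_after_s: float = 3.0,   # assumed: slower than this counts as effectively down
) -> dict:
    """Summarize what users experienced in the recent window, from raw events."""
    recent = [e for e in events if now - window_s <= e.timestamp <= now]
    if not recent:
        # No traffic observed; report that rather than guessing a status.
        return {"requests": 0, "usable_share": None}
    usable = sum(1 for e in recent if e.ok and e.duration_s <= slow_after_s)
    return {"requests": len(recent), "usable_share": usable / len(recent)}
```

The point of the design is that the number comes from what users actually experienced in the last few minutes, not from a status report that was assembled and edited over weeks.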

Robb 18:31

What's sort of rattling around my brain around all of this is, you know, if you sort of boil down AI into one core skill, it's prediction. And prediction, you know, has a key place in this, like predicting the crisis, predicting how to avoid a crisis, getting fired because you didn't predict properly, getting fired because you weren't prepared because you didn't predict it was going to happen. Predicting in a crisis what will mitigate and solve things. This is something we all pride ourselves on, which is probably part of why we get stuck in these moments of not admitting there's a crisis. Because if you didn't predict it, then maybe your job's on the line. So you're like, well, incented to say it's not a crisis because that was your job to predict it. How do you guys sort of address that aspect of it in terms of just breaking through that idea upfront to say, look, crisis is going to happen.

Marina Nitze 19:39

I think we have a pretty controversial take on this, which is if you were to completely avoid crises and completely like see everything coming and stop it ahead of time, then you're actually missing your window for transformational growth. That said, probably the number one question we get asked is how do I prevent the next crisis? And so we have a kind of a twofold answer to that. One is I'm absolutely not saying that you should be gobsmacked on Monday when your entire production system goes down because you have no monitoring. There are lots of things that you can proactively see and get ahead of. It does not mean that you need to be like falling on your face from things that were clearly visible ahead of time. So get that infrastructure in place. And we talk about that in the book, some of how to do that. Then there's this middle ground of, you know, you have, again, we call them known spicy days. You have a launch, you have a deadline, you have something where you're expecting a huge traffic surge. Get that crisis engineering center stood up.

Marina Nitze 20:34

If you're in a position of leadership, have that thing in your back pocket that you want laying around so that when things go awry or the map is revealed to work actually differently than people thought, you're poised to jump on that and make that transformational change and get it to stick. Because like, Marina's idea from when we had that problem and it resolved it in seven minutes, that's going to stick better than pitching it up through six years of committee. But then the third piece is, yes, the absolute thing that people try over and over is to brush the

Marina Nitze 21:03

real crisis under the rug to make it smaller, to minimize it, to get through it. And what is possible often is you can get through it. Like the time will run out, you sold the tickets or you didn't to the event, but the event happened. But you never fundamentally changed the piece of your organization that you could have in that window. And that's what we really wanted to focus on with the book, was how you can use that window. Don't hide from it. Don't shove it under the rug. But really use that window to make as much positive transformational change as you can.

Robb 21:32

Yeah, every time I was kind of thinking through it and looking at those ideas, I kept visualizing like a glucose meter for some reason. The idea is like, you know, keeping your glucose between like 70 and 170, right? Like not down, not up and saying like, if you could just keep your crisis

Josh 21:32

Hmm. You

Robb 21:53

within the healthy, like you don't want it to be a flat line. Like you said, predict every crisis. There's no, it's just a flat line. Like anybody whose glucose was like flat line, there's actually something wrong with you. That's not like a good thing. And so it's almost like managing that: crises will happen, and these are okay. Just don't let them fall below 70. Don't let the damage be irreversible. And then, it's okay to spike up and make a lot of change. Just be careful how much change you make. Don't go above 170. And kind of live in almost an organization that lives in that band but is volatile within that band.

Marina Nitze 22:42

Yeah. Oh, fascinating. So I'm a type one diabetic and I have a glucose monitor. So I can relate to this one. And I would say like, there you go. A crisis is going to be that you have not been on top of your health and you drop below 30 and you pass out or you're in a coma or you have a car crash. And that may be the crisis that you need to wake up and be like, whoa, I actually do need to like follow a low carb diet and have like tight glucose control because I had this huge crisis. Humming along at like 180 for years and years and years

Marina Nitze 23:10

is damaging your body, but you aren't having that one moment where you're going to see it until maybe you get that catastrophic diagnosis.

Robb 23:16

Yeah. It's like crisis management is the way to make sure that you don't have to deny, because the crisis is like predicted that there will be one, it's managed, it's not ideal. You're not in an ideal state, so you have to do something. But you can still take credit for having that baseline. It's the fire drills and the preparation for a crisis, but not knowing which one it will be.

Marina Nitze 23:47

I think that's a great way to put it. You don't want to be taken, again, blindsided by something that was extremely obvious to see. That is not our advice at all. Get that squared away. But you want to be prepared to take advantage of and not sweep under the rug and not hide from the next crisis that you cannot see because that's actually a huge learning opportunity for your organization.

Robb 24:07

Yeah, maybe it's just talking about it. Like just beginning to talk about the crisis and like, heaven forbid, I would knock on wood, like, no, just let's, let's talk about it. Let's not worry about talking about it and thinking that somehow, like that superstition of you talk about it, it will cause it.

Marina Nitze 24:28

And I think you can also put the infrastructure in place. I expect some people are possibly reading our book in the middle of a crisis. And if that's you, we put some cheat sheets at the end of each chapter. You can read the bullets and then come back and read later. Our hope is that more people will get it in their hands and get to read it through before their next crisis. But your middle ground can be, can you get that infrastructure in place so that when the next crisis happens, and again, you probably shouldn't be able to predict that or else you've got some other stuff going on, you can stand up your crisis engineering center. You're going to have your

Marina Nitze 24:56

crisis engineering lead identified, you're going to have your communication channels set up. People will know how to call the center, for example, or send in a report. These are some fairly simple steps that most places we come in in a crisis don't have in place. Do you have a status page that is separate from your normal infrastructure? Because sometimes when you're especially in a tech outage, you may be inundated with your own workforce, refreshing the page compulsively. And that's actually taking you down in a new way, or they're all distracted now in various Signal chats discussing like, is it an outage? Is it my computer, et cetera? Whereas if you had a clear communication channel that people could rely on, those are some pieces of the puzzle that you can have in place ahead of time so that you're not dealing with that level of change. You may be in your first crisis, that's fine, but then you have a toolkit together so that next time you can deal with that, or you can take advantage of that next level.

Robb 25:51

Yeah, I've seen that before: the system goes down for some reason, reason A, and then your monitors get into an infinite loop because they're not reaching anything. And then that creates problem B, your system can't get up because you're essentially doing a denial-of-service attack against yourself.
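
One common guard against the self-inflicted overload Robb describes is to have monitors retry with capped exponential backoff and random jitter instead of retrying in a tight loop. The Python sketch below is illustrative only; `backoff_delays` and its parameters are hypothetical names, not part of any tool mentioned in the conversation.

```python
import random
from typing import List, Optional


def backoff_delays(
    base_s: float = 1.0,     # assumed starting ceiling for the first retry
    cap_s: float = 60.0,     # assumed upper bound on any single wait
    attempts: int = 6,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the waits (in seconds) a monitor would sleep between retries.

    Each retry's ceiling doubles up to cap_s, and the actual wait is drawn
    uniformly between 0 and that ceiling, so many monitors that failed at
    the same moment do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays
```

A monitor would sleep for each returned delay before its next check, which spreads the retries out while the system is trying to come back up.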

Marina Nitze 26:11

Yeah, absolutely. And think about no matter how perfect your organization may be, we are all increasingly interdependent. So you may have a perfect organization with perfect everything. You know, your code is fully tested. You might be telling yourself now like, it's fine, you know. But when Amazon Web Services goes out, suddenly people's mattresses are breaking. And that's a thing that you have to really tabletop out and be ready to take advantage of that crisis and be stronger as a result rather than be a victim.

Josh 26:40

Yeah. Last time we talked, we realized that your book, Hack Your Bureaucracy, had some sort of foundational similarities to our book, Age of Invisible Machines, because they're both kind of about systemic change and trying to drive it incrementally and acknowledging that it's slow and it's hard, but that it's worth it if you really do it the right way. And, you know, not long after our book came out, ChatGPT was unleashed on the world and a lot of the things that Robb had been thinking about for years all of a sudden were happening and everyone was very excited about them. But there was this other thing where, you know, we assumed that organizations would be finding every way possible to use this technology and reading our book, of course. And then what we've seen is that adoption has lagged in a major way while consumer adoption is just like floored. So one of the, actually it's a crisis we've talked about on this podcast that we're kind of waiting for, but like outbound AI in the hands of consumers, right? Like it's really not that expensive or difficult for someone who's a little bit irritated at a company to put together an agentic attack that'll just flood their call center. And so it feels like we're maybe entering into a time when crisis and maybe a lot of these crises will be ones that can push organizations forward in a positive way, but where they're going to be coming at a rapid clip and like there could be some really major ones on the way that will start kind of affecting everyone, government and business. Yeah.

Robb 28:09

Yeah, panic buying.

Marina Nitze 28:11

Yeah, that is a great example of how AI can generate some crises that people I think are not predicting and are not expecting. If it takes down your call center, if it takes down your ability to fulfill claims or orders, right? And then that creates on the part of that organization, that fundamental core disruption of process that will force them, they have to make a change if they are to survive. So either, if it's a private sector company, it will collapse, go out of business, go bankrupt. If it's a public sector, it has to change. It has to, or it cannot continue delivering those benefits. So at Layer Aleph, all our projects are under NDA, so we can't talk about them, which made it a slightly frustrating place to put a ton of stories in the book. But the one that we could talk about was when we helped California end their unemployment claims backlog in the middle of the pandemic because the governor ended up publishing our report publicly. So we could pull from that. And one example of a breakdown in sense making was when we talked to people over and over and over, they would say, don't worry, we have the call center. When people are struggling, they're at the end of their rope. And people were literally dying by suicide in this because they were losing their homes and they had no money, they had no food. Don't worry, they can call the call center, the call center, and every up, down, sideways on the ladder, people would say, don't worry, we've got the call center. And so we went to visit the call center and we walked into a large room of empty cubicles with one gentleman in the corner. And he was super kind, smart, happy to talk to us. He was very confused by why we were there to ask him about the call center, because he did not run a call center. He had a team of unemployment specialists who had phones, but whose primary job was to process claims. And what that revealed was that there was never a call center.
The call center was The phone number would route randomly to employees' desks. And as the pandemic went on and people went home, they would ring to desks with nobody at them. Nobody had told this gentleman that he was renting the call center because everybody had developed this story that there was this call center full of staffed experts at unemployment claims and what showing up. So our novel action was like taking our feet over there, which in the middle of the pandemic before vaccines was actually kind of numb. and finding that the call center was actually simply routing around to people's desks and that actually all of those people had gone home. So therefore there was no call center and it could not answer the call. But the map, if you had looked at it and boy did we, right, had a fully staffed call center on it. So that was an interesting example of these kind of discoveries and forced story retelling may be outcomes of

Marina Nitze 31:02

what you're pointing out, like this increased AI traffic.

Robb 31:05

Yeah, I don't know, I get this image in my head, everyone sitting around the table and saying, well, at least we've got the call center, and someone's like,

Robb 31:17

but we don't have a call center. And then everyone else going, well, let's not worry about that right now.

Marina Nitze 31:22

Nobody knew that there wasn't one. Why would there not be? And in fairness, nobody here was dumb or incompetent. Why would there not be a call center? Your experience for years has been people will call a number that you give out and somebody answers the phone and they generally know what they're doing. So it made complete sense that nobody had any reason to question that story.

Robb 31:36

Right. Yeah, so in this particular case, we're looking at employees. I'm just imagining like the risk vector, right, has increased dramatically because your customers can attack from any channel. You can't tell who is a human and who's not. Your employees, also. You know, your prospects: if you put out a job req and you get thousands and thousands of applicants, you don't even know which ones are actually real people. Like, you could almost say, where do you begin questioning the breakdown of your current processes when so many potentially are about to break?

Marina Nitze 32:24

I had a little bit of a snarky recommendation in the book, which was like, do you have a map printed of how your system works? Whether it's a system diagram or people flow to the call center or order fulfillment, whatever it may be. Great, take that and flip it over. Now you have a piece of paper.

Marina Nitze 32:41

and grab a marker, and you have to, with your feet to the extent humanly possible, walk through what the system may be. So to your point, if it's like going to a call center or we maybe get, you know, what's your hiring process like? What data, if any, do you have at different steps? And then where are people infiltrating? I mean, that's an example I'm hearing from literally everybody I talk to right now, that they're being inundated with 100x, 1000x applications. Now that process is broken down. They can't even get through them. That's just a volume problem, right? Then it's also most of these people are not real. Then it's some of these people are North Koreans that are trying to infiltrate my company and posing as them. They are on video chats as other people. So there's all these steps that you've got to walk through. And depending on your company, that might be frustrating or that might be absolutely mission critical, mission ending, right? If you're, I don't know, a greeting card company, it will be annoying to get a thousand applications and you probably have to change your process for hiring if that happens. But if you are, you know, a mission critical system and foreign actors are successfully infiltrating your HR system, you need to fundamentally make some changes. And in many cases, you might be missing the high visibility for your own sake. I hope you are on that. But maybe the other indicators of crisis are there so that you can rapidly change your hiring process, because that is not something that you can take three years and a committee to think about. This is happening today. And if you try to make a small bureaucracy hack tweak, that foreign actor is making much better tweaks against you.

Josh 34:18

Yeah, so it's almost like someone who wanted to use crisis engineering in an organization or in the government right now would possibly just be cataloging all these weaknesses, right? It would be frustrating, I guess, because maybe you want to effect positive change, but you have to resign yourself to, like, well, it's not going to happen incrementally. But here are all these points that are poised to break within this organization. And here's a plan for each of them, or one plan that I can, you know, pivot the story on and push in when something breaks. Well, I guess when we were talking about, like, a housing crisis, right? There's the actual crisis and then there's the lingering problem. And there's sort of this almost taxonomy problem, right? And it's felt similar with artificial intelligence. Like, everyone's supposed to adopt AI. It's supposed to be this and this and this in your company. So it's sort of like this weird bubbling pool

Josh 35:13

of crisis that has all sorts of spots where it could really flame up and no one's really sure where or when that'll happen.

Marina Nitze 35:19

Yeah. And I want to be clear, like we think a crisis is a moment for transformational change that often cannot happen outside of it. But I fundamentally do not believe that you can't change an organization at all. Again, I wrote another book, you wrote a book about how to do this. And actually, a core recommendation across all these books is that you have to interact with the system as it is, not the picture of it, but interact with it as it is with your feet to the extent possible or your fingers on a keyboard into terminals.

Josh 35:27

Sure.

Marina Nitze 35:48

to understand how it is really working. And then to the extent that you can outside of a crisis, how are you trying to make change or to prove, collect data, measure data differently? I mean, if someone had been motivated and knew that that was a change they wanted to make, anybody at any point could have walked into the call center that wasn't a call center. But the crisis was what revealed it, because people weren't in the room, right? You could have actually probably walked into that building and believed it was a call center because you would have seen people sitting at their desks.

Marina Nitze 36:16

It wasn't until they got up and left that it became clear that there was no call center. You can map a lot of things. You can make a lot of change. I was successful at this at the VA. I've done similar successful work in foster care. It's a lot slower and a lot more incremental. But then ideally, with that toolkit that you've been working on for hacking your bureaucracy and invisible machines, when that crisis hits (again, you probably aren't going to know when), you are the one that has the truest,

Marina Nitze 36:46

the closer-to-the-truth story. You know what reality is and you know where you need people to get to. So when they're trying to resolve their cognitive dissonance, they are very, very likely to latch onto your version.

Josh 36:58

That's such a great point. Yeah. And like what you said too, about using your feet and your fingertips, because it's interesting that, almost more especially where this advanced technology is involved, it's human-to-human interaction that unearths the problems that exist. Like, I think it was Jennifer Pahlka who told us that story about you getting to the bottom of this carbon copy problem, right,

Josh 37:22

where one agency is insisting on sending carbon copies because they think the other agency needs them, and then the other agency is just accepting them because they think that they have to be coming this way, right? That's one of those things that AI alone might never have gotten to the bottom of. It was a person talking to another person and then talking to another person.

Marina Nitze 37:41

Yeah, I mean, there really is so much. You can hire me to do this, but you can also do it yourself: whether it's AI or a crisis or outside of a crisis, it's understanding how a process works from end to end. It's almost never somebody's job. It's almost never somebody's purview. And I'll tell that story quickly on the carbon paper. I was helping a state try to reduce their foster parent licensing process. This was not a crisis at the time, but the story people believed is that it simply took six months to license grandma.

Marina Nitze 38:11

And the huge problem with that was that grandma had that child in their home for those six months. Licensed meant paid, got even daycare stipends. So it meant supported grandma. So this six months, everybody just believed this is how it works. Like nobody was really believing that it could be dramatically reduced. I followed the process from start to finish, which was nobody's job ever. Everybody had an individual role. And at one of the steps, the woman pulls out a carbon copy form to request the grandma's driving record from the DMV. Now a side note on this, ultimately we removed this step altogether because the question was, why do you need her driving record from the DMV? And people were like, well, maybe she had a lot of tickets. And it was like, okay, so the kid has been with grandma for six months and maybe she has a lot of parking tickets. What are you going to do about that? And the answer was nothing. So at a meta level, we actually removed this step entirely, which is a different lesson. But in this moment, I'm sitting at this woman's desk, she's pulling out the carbon copy form and she's complaining to me the whole time. The DMV, they live in the 19th century, they make me fill out this carbon copy form, I have to get a stamp, I have to get an envelope, this is the bane of my existence, I hate this part of my job, you know, I really hate the DMV, I'm very frustrated with them. Because I had no boundaries, I went to the DMV and I said, hello, show me how you fulfill these requests. And the woman there said, no problem. I open my email box and I click in this folder and they're there and I usually send them back within about an hour. And I said, well, wait a minute, where's the carbon copy paper? And she was like, my God, were you at Child Welfare? Those people live in the 19th century. These carbon copy forms. I don't know why, it must be their policy to, like, use carbon copy, but I really hate getting them. It's the bane of my existence. 
I introduced these two totally dedicated public servants who were doing this really awful task.

Robb 39:45

Hahaha

Marina Nitze 40:00

We shaved 30 days off that process and again ultimately eliminated it entirely because it was nobody's job ever to look across those silos or to look at that process from end to end. And everybody's story was that everybody was doing their part, right? Like those individual actors would never tell you that their part was broken. It wasn't broken in their version of the world. It wasn't until you zoomed out that you were like, whoa, whoa, whoa. There's a huge disconnect here.

Robb 40:25

I love this lens. I've been thinking a lot about this and we've talked a lot about it in the book, but this idea that there's a lot of folks who talk about the new jobs that will be created by AI, right? And they're, like, ridiculous. Like prompt engineer: no, that's not going to happen. Agent engineers: nope, that's not gonna happen. But crisis engineering, that might just be the biggest, let's say, area. I don't know what they'll call these roles. They'll call them whatever they'll call them. But that will be the job they're doing no matter what. So let's just call it what it is.

Marina Nitze 41:01

Yeah, we're going three for three because my colleagues helped create SRE at Google and then we created the digital service teams that are now in states. So we're going three for three. We want crisis engineers out.

Robb 41:13

Yeah, so I think this could be, you know, the biggest area of opportunity for new job creation around AI. And it makes a lot of sense. Looking at this even deeper, not as a role to have a bunch of worriers in a room, but crisis engineers as an opportunity, which is how I like that you've positioned the book, it kind of gives good license for them to go through the organization, like you said, with their fingers, and look at opportunities through the lens of mitigating crisis versus the lens of getting rid of jobs, right? You know, instead of like the automation folks that are going to come in and try to go through and find ways to reduce costs,

Robb 42:02

they're now going through with a different lens, which is mitigating crisis. It just makes so much sense. And I'm not saying it as I'm like, undercover. I mean, no, it actually makes sense to take that lens. It's probably the most actionable lens to take.

Marina Nitze 42:21

Yeah, I mean, I'm perhaps biased, but if I had an organization, I would very early want a crisis engineering function and person and center and standard operating procedures. Among many things that a crisis does, it also absolutely knocks down those doors, right? So you could on a regular Tuesday be trying to meet your peers in other parts of the organization, or being a new person and trying to understand all the parts and pieces. And people will be like, in six months, or sometimes, I don't even want to talk to you, right? Or you might not even know, depending on the size of the organization, all the nooks and corners of it. But in a crisis, suddenly, I mean, people are going to come to the table or there are going to be consequences, right? And so that can open up communication channels. We talk some in the book about after-crisis care, because some of it is like people on the crisis effort will be burnt out. Like they just worked incredibly intensely for potentially weeks, and you need to have mandated vacation time,

Marina Nitze 43:20

be really on the alert that these people now have a huge new skill set and many, many new relationships across the organization. So you might want to harness that, right, with new roles or swaps or potentially even, depending on the situation, you might have someone on the vendor team that comes inside to you or you might have someone on your team that goes out to the vendor team. There could be some interesting and ultimately positive swapping, but also you want to keep bringing that team together and keep any new communication channels that you stood up, right? If you suddenly have the seven parts of your organization that were able to communicate rapidly and in real time, solve problems, answer questions, don't lose that when the crisis is over. Like you can keep bringing those folks together. The communication level and cadence will not be at the level of the crisis, but there's no reason it needs to go back to not existing at all.

Josh 44:08

Yeah, we talk a lot about canonical knowledge or source of truth knowledge being like foundationally important, right? If you want to build things with AI agents, like they need to know what to say and they need to like pull from the correct information. But you know, a lot of that is explicit information that's locked away in PDFs and stuff. And obviously, LLMs are pretty good at summarizing that and reading that. But I think the richest knowledge is what you're talking about. It's this implicit knowledge and it's the knowledge that's gathered by getting up and walking and talking to people and mapping the whole thing. And I think, you know, that's weirdly the easiest area to overlook, I think, as people are moving towards this technology.

Marina Nitze 44:49

Yeah, people don't do it. And if you're looking at deploying AI or any change at any organization, AI or not, I mean, that's the part that almost nobody does. I will say it almost surprises me a little bit sometimes still when I come in in a crisis and I'm like, I don't know, we've been doing this for nine years. Like, you didn't try it before you called me? Okay. Again, we are available for hire. But no, and you don't have to be.

Robb 45:05

That's awesome.

Marina Nitze 45:13

It's powerful to do as a CEO, it's powerful to do as a CTO, but it's also something you can largely do from any seat. You can at least go up and down from where you are. And gosh, if I was trying to drive AI adoption in my space, wherever I was, however far in the field or however high up in leadership I was, I'd be wanting to find those back and forth steps where I can start making the case, where I can start hitting a pain point.

Marina Nitze 45:42

I don't know that AI, I'm not yet persuaded that it's going to drive mass unemployment, but it could really drive much greater worker satisfaction, accuracy, better outcomes. I mentioned I do a lot of work in foster care, and you've got social workers who are way understaffed, way overburdened with caseloads, that are spending six hours asking an 18,000-page PDF case file the last date the kid went to the dentist. Let's throw LLMs at that, and then what's the risk?

Marina Nitze 46:12

That you put the wrong dentist date down? Pretty low. And you still have a human. I'm not suggesting that AI read the case file and write the report by itself without a human being double checking that if it says,

Marina Nitze 46:22

they went to the dentist in 1917 or something, that you kind of check that. Huge applications when I was at the VA, right, for benefits. Why isn't it going through your whole electronic health record and pulling out your nexus to your disability and surfacing that to the claims processor? Nobody wants to do that work. And it's not automatically denying a claim, but it's lifting up the information that somebody needs. That's a huge win.

Robb 46:45

Yeah. Yeah, what I like about looking at it through the crisis lens is a lot of those wins, right? A lot of those applications of AI we see there is like, as you said, you've got to walk through from the beginning to the end. And a lot of these are like going right in the middle and then automating something, right? Without looking at the front and without looking at the end. And you're sort of like, am I automating something that shouldn't exist? Right? Just because I've, like, found some sort of pain point. So I'm going to go after this pain point, not asking, should this even exist in the first place? And should we go up channel? And to your point, who's looking at that? Like no one, no one's going to look at that if you don't assemble some structure for viewing it holistically.

Marina Nitze 47:40

Yeah. And if I'm trying to make a case, you know, for more AI adoption, or like the examples that I just made, you know, you could try normal channels, try the bureaucracy hacks. It may work. It may make incremental change for sure. But then also, if you now encounter a crisis by our indicators and now, for example, people are literally unable to do their jobs because they're so inundated with claims or cases or whatever, that's a great moment where you want to be like, I piloted this with my team of six people. Here's a way that we can apply LLMs. It will work through this backlog 1000x faster. Here are the risks that we've already thought about. That's the kind of plan that, if you've got it in your back pocket, potentially will be executed and then become standard operating procedure the next morning.

Robb 48:22

Yeah, just to test out this, like, what is a crisis and what isn't, just to do a quick run-through. So, as consumers and employees, right, become proficient in this technology, which they are, they're starting to learn what it can and can't do for themselves. Their expectations of companies go up.

Robb 48:43

Right? Because now they know what's possible. Now they're like, why am I on hold? Like, I know this is unnecessary right now. Does that eventually constitute a crisis, or is that one of those just chronic, you know, we could be better, but

Marina Nitze 48:47

I think it could easily generate a crisis in some of the examples you gave earlier where like suddenly they're inundating a support line in a way that the support line cannot possibly continue to function and so it must change. And that might mean that the support line changes or that might mean the product line changes, right? Because it can't be supported anymore. It could fundamentally change like order fulfillment. And to your point about changing expectations, I don't think it's a crisis that in general expectations are changing. I think that's a chronic

Marina Nitze 49:31

ongoing description, but I think that can very easily generate a crisis that people did not see coming in a wide variety of organizations.

Robb 49:40

So it's just, it's to your point of, like, it's high glucose above the margin for a long period of time until finally it's a heart attack or whatever.

Marina Nitze 49:48

Yep, yep, yes, yep. It's gonna be that until yes, until then the doctor tells them that they need their foot amputated and then they're like, okay, well now I'm gonna get Dr. Bernstein's book and follow a low carb diet.

Robb 49:59

Got it, got it. Well, this was.

Josh 50:00

If there is someone who has followed that process, they know why that person's on hold, even though that person thinks that they shouldn't be on hold and is probably right. If there's someone in the organization that at least understands why they're on hold and everything that's leading up to it, then when that crisis hits, when the call center gets flooded, they're gonna be the ones well supplied to help rewrite the story, right?

Marina Nitze 50:26

Absolutely, yes. And again, they may be able to make incremental change along the way. Certainly, like if you're saying like, hey, 20% of our support is a reset password because there's no reset password button, right? That might be something that, you know, over a number of months, you can pitch to the committee or do a pilot or something like that. But to your point, it's when the crisis hits. And that may not be predictable because it's going to be the 999th call that is the straw that breaks the camel's back. And now things have collapsed. Now they have to change to continue to survive, and they don't know what that change is going to be yet. They will find something, because the cognitive dissonance is too strong, but if you're there with a new plan, that's the magic.

Robb 51:05

Yeah, it's what you said, like the predicting of these things is very difficult because the implications of these crises expose things no one saw coming. And that example is a great one. You know, oh, we automate password reset and 90% of our calls are password reset, until most people are trying to get a human on the phone, like literally avoiding all the automated things you have. Like, that is the objective of these AIs: if you get to a person, you have a better chance of actually getting a 10% coupon or a free month on your broadband connection.

Marina Nitze 51:43

A really interesting thing, and this was pre-AI when we were on the California project, was that the phone number for deaf and hard of hearing got out and people realized in Reddit forums that that line was answering and helping people. So it went from a volume of like 40 a day to being inundated and then taken down because it literally could not handle the load. AI will find that a million times faster and will be able to do TTY with it, right?

Robb 52:10

Right. Yes.

Marina Nitze 52:13

You can't, if you leave that, obviously that's a good function to have. I'm not saying not to have that function, but if you're not otherwise supporting, then these things that you may not expect will collapse as well.

Robb 52:18

Of course. Well, awesome, this was great. I loved this conversation. I am obsessed with crisis now. I really am, truly. I think I'll be on this for a long time. Crisis engineering, I think, is definitely the next big field everybody should be looking at. I think it's very human.

Josh 52:26

Yeah, fascinating.

Marina Nitze 52:42

Would you like to do our PR for us? That would be great.

Marina Nitze 52:46

Thank you so much. This was a really rich conversation and it's fun talking through this now instead of just writing it.

Robb 52:53

Awesome, awesome.

Josh 52:54

Yeah, this was fascinating.