...

Microservice Migration Roadmap

Originally aired:

About the Session

So you’re convinced that adding microservices to your IT organization’s offerings is a good idea. But how do you get there? What does it take to move from monolithic releases to microservice releases and how can you do this in a way that doesn’t disrupt your current business? In this talk, Mike Amundsen — noted author, speaker, and software architect — describes the STAR (Stabilize, Transform, Add, and Repeat) method for safely and effectively migrating your existing IT infrastructure to a microservice platform. All without interrupting your current IT services.

Transcript

Okay. Let's get started. Hi, my name is Mike Amundsen. This is the Microservice Migration Talk, a roadmap talk. The title is slightly different. It's been updated a little bit, but I assure you you're in the right place. This is me. This is how you find me on LinkedIn and GitHub, Twitter and all sorts of other places.

I'd love to connect with you. I'd love to learn what you're working on, what projects you're doing. If you connect with me on LinkedIn, please remind me that it's from the architecture event, and I'll be sure to follow along. I would love to learn from you. Everything that I teach is really stuff that I've learned from other people.

Now in this particular slide deck, this is going to be material taken from two different books that I've worked on with a really great team, a team called the API Academy. The first is the Microservice Architecture book; there are some foundational elements from it in the beginning, where we'll talk about that.

Mostly, though, it comes from the second, Continuous API Management: this idea of continuously modifying and changing and working over time, evolving the architecture, as Venkat was just saying, if you were in the Venkat stream just a little bit ago. So that's where this material is coming from. Now, there's a handful of things that I want to talk about, and I broke it down into some key parts.

Why do we want to do migration? What's going on here? Migrations are almost always painful. What's the motivation for doing this? And a lot of the times that has to do with unlocking business value. You've got systems that are healthy, that are happy. Lot of intellectual property stored somewhere, and you need to get it out into the world. How do you do that? Migration can help. For this session, I'm going to skip the services in a nutshell, I'm going to assume we sort of understand what microservices are because we've got a short amount of time. I won't go into some of the basics about services.

Let's just assume for a moment, whether it's services or APIs, you've got a bunch of parts that you want to connect, whether they're in a monolith somewhere, or they're connecting from some other system, it doesn't matter. It's okay. We'll get through that part.

Then I want to talk about some basic principles, like migration. Why are we doing it? How do we do it? What are the tools that we can use? And it turns out those tools are very, very similar to the same tools that we use for code migration. But we're now talking about service or network migration, so we'll talk about those. And then finally, I want to introduce what I call the S*T*A*R method, the Stabilize, Transform, Add and Repeat method. That's the key to safely and successfully transforming your system, migrating from one place to another, from one time to another all the time. And that's what you'll be doing continuously. So that's what we'll talk about.

So, first of all, I just want to talk about what's in a name. You see lots of names: SOA, microservices, self-contained systems, right-sized services, all these other things. Really, it all boils down to the same kinds of things. I really liked what Werner Vogels said when he talked about this. He said they were working on this way to scale Amazon, and they said, we're using service-oriented architecture. It's services. We're based in services. Those services happen to be small and connected over the network, but it's service-oriented architecture, and we're doing this because it lets us do things rapidly and independently.

Rapid and independent is the key. It's incredibly difficult to do, but that's what we're aiming for, and that's why making services smaller makes sense, because once the service is smaller, I can release a smaller part. I can test a smaller part. I can fix and code a smaller part. That's what smaller services are really good for. So, no matter what you call it, SOA, microservices, whatever. It's fine.

So, let's talk about why we would ever do something as crazy as a migration, as dangerous as a migration for a running business, a business that's up and working already.

Well, it turns out all organizations, no matter what size they are, but especially as they mature, as they get larger, as they grow over time, have all sorts of technology all over the place. Technology that's five years old, technology that's 50 years old. I've got some organizations that have technology that's 70 years old. Sometimes they have technology that's just months old, or they're still experimenting. It's all over the map. And eventually what we want to do is harmonize pieces. We want to get pieces to start working together. So we want to do some migrations.

Now, there's been lots of talk about why people migrate. And there's a great ResearchGate publication that talks about why people want to migrate. And what I find really interesting is that some of the most common reasons for migration have to do with scalability and agility.

We want to make things more agile. We want to make things more flexible, but we also want to make them more scalable. There's lots of reasons for this. Now, extensibility and maintainability and reliability are up there too. We want all of those, but for the most part, most people talk about this notion of scaling and being agile.

And when they approach this notion of migration or transformation, they really approach it from a couple of points of view. When my colleagues and I were working on the Microservice Architecture book, we interviewed a handful of companies and asked them, why are you engaged in microservices or transformation or changing from one service platform to another, and very commonly an answer would be "we want to build products."

We want to build products that other people can use, either internally inside our organization, or externally, that we can sell to partners or something like that. So they have a very big product focus. Now, a lot of other people that I talked to when we were doing the interviews said, we're trying to build a toolkit, we're trying to build a platform. We're trying to build a set of tools that we can use over and over again in the future. We're trying to do lots of reusability and so on and so forth.

Now, this gets to be a pretty dangerous idea. When you build a platform what you're really building is a whole system and family of products. So, you're supposed to be solving other people's problems, not yours.

So when you build a platform, you'd better be populating it with products people love, products people want to use. If you go out and build a platform and you fill it full of products or services or capabilities that nobody wants, or that are too expensive or too difficult to use, you're going to be in big trouble, alright? So you want to be sure you're aware of that.

Finally, another big reason that people gave us when we were asking them why they do this migration is they say they want to get rid of technical debt. They want to re-architect. They want to change the way their system works. They want to get rid of the old and bring in the new, they want to refresh what they have.

This is the most dangerous reason for migration, and it is usually a failure. Doing all this work, migrating all these things, just to have the same operational functionality, to have the same things, but in a brand new platform that you spent months or years, or several years to build is not a good idea.

You want to think in terms of products people want or platforms that host products people want. Thinking just in terms of getting rid of legacy or migrating to something new is a bad idea, and it doesn't usually go well. Another thing to keep in mind is that as you migrate your system to use more microservices, use smaller bits, incorporate other people's APIs, partners' APIs, APIs from third parties, you're actually adding a great deal of complexity to the system.

When we were working on the microservices book, we talked about three levels of complexity: modularized, cohesive, and systematized or systematic.

Modularized systems have independent deployability and they offer testability. That's an amazingly useful set of tools. That's the first rung on the ladder: independently deployable and testable. Most organizations would be so much better off if they did just that. But there's a lot more to it, and the next level is having cohesive systems that are aligned with your organization. You're building things that are well aligned with the business. You can do replaceability: you can move one service out and put another service in because you have shared interfaces and alignment and understanding. You have composability.

This is incredibly difficult. I have very few clients that are actually in this space, because this is now much more complex. You're actually building tools for other people to build tools, and that's an amazing system.

Finally, you have this idea of a systematized or system level or designing the system approach where you achieve resiliency and availability and agility and runtime scalability, where you can actually move, not just hardware, but software from place to place. You can purchase new things. You can bring your things. You can plug things in.

This is amazingly difficult. This is like planetary-sized difficult. Google, Netflix, some of these other, you know, sort of native cloud companies, that's what they do, but I don't work with a lot of Googles; there's one. Mostly, I'm working with organizations that don't have that planetary need, that don't have that level of abstraction.

So, focusing on being modularized, independent deployability and testability is more than enough to start with. And if you get great at that, and you still think you have opportunity and extra money on hand, you can try moving up a little bit higher.

I love this feedback from Rob Brigham who talked about Amazon's story. You know, he says, if you go back to 2001, we were a huge monolith. We were an architectural monolith. It took forever for code to change from check-in to production.

That's the measure: how long it takes from when you make a change in your code until it shows up in production. That's the measure you want to look for, that's the key to this element, and that's why Amazon took on this notion of microservices, smaller services, service-oriented architecture. They didn't do it because it's cheaper. It's more expensive. They didn't do it because it's simpler. It's more complicated and more complex. They didn't do it because it's easier. It's actually much less deterministic and harder to predict and harder to manage.

They did it because they wanted to change their speed from check-in to production. If you can improve your speed from check-in to production, without adopting microservices or doing a major migration, that's exactly what you should be doing.

Somehow it all comes down to money, right? We're trying to save money in what we build. We try to earn more money in what we sell. We're trying to conserve the money that we have, saving that is really important.

Another thing that we heard, as we mentioned earlier, is this notion of speed.

Not only do we want to save money, but we want to do things faster, but at the same time, you have to do things in a way that's safe. We need security. You can't give up security in order for speed. You can't make the car lighter by getting rid of safety panels and expect to survive crashes. We need both of those things.

The real challenge is doing speed at scale, is doing things faster and staying safe and doing it at scale, so one of the things we've talked about in the book is balancing speed and safety at scale, and when you're doing migrations, this is your job. You need to balance speed and safety at scale constantly. Anything you do has to be done in order to take care of the system.

Let's talk about this idea of unlocking business value. We know about some of the dangers. I kind of freak people out a little bit. Now let's talk about some of the power.

What motivates this notion of migration? Where is everything? Often, our catalogs, our API catalogs, our service catalogs look like this desktop, right?

It's in here somewhere. The part is in here, I know we have it somewhere. I just can't find it. So, you end up building things over and over again, you can't figure out what's going on. That can be really frustrating. Data and services are often stuck inside or buried in that mess somewhere. You want to get this data out or you want to get this capability out, but it's buried somewhere in some legacy code or it's behind a firewall or it's difficult to use, or it uses ICM files or XML or something that we're upset with. We need to unlock those.

Here’s the other thing that's really important, and that’s the notion of cost. We spend all this money. I just gave you millions and millions of dollars for a migration, and you're telling me you need more money now, when I asked for a new service? I thought this was all going to be free! Why does it cost so much money to get this data that I already have, that I already built something four or five years ago? That's often a motivator for migration.

It's really important to remember what Hung LeHong at Gartner says. You want to renovate the system. You don't want to get rid of the core or replace it.

You're not trying to build a new building somewhere outside at the other end of town. You're trying to renovate the building that you have, and when you renovate, you have to be able to operate inside it while you're doing it. And that's really important.

Down to cost and risk management. This is a slide from a thing called The Standish Group. They have a great report. This is called the CHAOS Report. You might also call it in German, the Schadenfreude Report. The I get to Watch Other People Fail Report. They follow really complex projects from large multinational organizations or governments, and they say “How did they do? How did this go?”

And what happens in the red zone? Those are failed projects. In the green zone, those are successful projects, and they have this complexity and size scale. I can't really explain it here, but if you look up The Standish Group and the CHAOS Report, they'll show you. And what they tell you is: the smaller the size and the less complex the project, the more likely it is to succeed. In fact, if you look at the subtleties in this chart, you'll find that you can squeeze out more complexity as long as you keep things small. That's really important.

We'll talk about this idea of how we reduce risk. John Allspaw, Etsy. John helped foster the kind of thinking that gives us DevOps and so many other amazing things. He says you “lower the risk of change through tools and culture”, through the tools you give people and the way you have them act, and that is the idea of reducing risk and managing it at scale.

Here's the thing: there are going to be lots of changes. There are always going to be changes. What we don't want is breakage. Change, we want; breakage, we don't. And there was a great line from a former CEO of CA Technologies. He says “Built for change”: build it for change, build it in a way that makes it possible to change. And that's what this is really about.

That's the whirlwind through the process. What are we really trying to do? Why are we really trying to do it? What are the [inaudible]? Let's get to the real meat of this whole thing. So, when I talk about this idea of the S*T*A*R treatment, giving your system the S*T*A*R treatment: Stabilize, Transform, Add and Repeat, this is this idea that you're in charge of what happens next. If you want to transform your system, you take a hold of it. And this is really where we're going to do it.

But first I just want to talk about a couple of basic things along the way. What are the tools we're going to use on this adventure? As we transform our system, what are the things that we're going to do differently? What are the things we're going to carry with us on this journey? And what I want to talk about then are some basic principles of transformation, of migration, of turning this into something.

One of the first principles. You may know this story. How do you eat an elephant? You eat it one bite at a time. There are other ones. I happen to be a vegetarian, so I don't eat elephants. I don't think anybody does or should, but there are other analogies. How do you build a castle? One brick at a time. How do you build the pyramids? How do you build a monument? Think about all these things. We do them one step at a time, and that's really important. You don't go all in on a big bet. If you're going to migrate your system, you don't do all of it by Friday. You don't have a big switch that you turn that says “Yesterday, it was version one. Today, it's version two for everyone on the planet”. That's not how it works. You take it step by step, piece by piece.

And Netflix understood this really well. Adrian Cockcroft, who was the Cloud CTO at Netflix, I think he's at Salesforce right now. Might be at Amazon. I can't remember where he is now. Somebody might be able to remind me and put it in the chat line. That's what they call the migration, the transition. They said you're transitioning from one thing to another. “Whenever you do a transition, do the smallest thing that teaches you the most and do that over and over again.”

I love that idea. Do the smallest thing that teaches you the most and then keep finding the next smallest thing, and the next smallest thing, and the next smallest thing. Change the arguments to a flag on a command line, change a query option, change the sort algorithm; do little things. When you work on the little things you can make big changes, and that's how you migrate a system.

Facades

Another really important thing that we know when we talk about migrating code is this notion of employing facades, stranglers and refactoring. We know about facades in code. We know this idea that if we have really complex bits and pieces, we can put up a facade in front of them to simplify them in some way. We use facades for aggregations. We can use facades for all sorts of things. But the idea is to give people something simple to work with. Facades for services are the same thing. Services that talk to other services are facades; they hide whether there are five services or eight services behind them.

Remember in a very complex system, it's not some simple three layers: UI, middleware, database. That's not how life works on the web. There's stuff all over the place. What you really have is you talk to a service and that service solves your problem, however that is. Maybe that service does it itself. Maybe that service talks to two services. Maybe that service talks to five services. Maybe it talks to five services today, it talks to 10 services next week, it talks to two services two years from now. All you know, is you just talked to one. Facades are incredibly powerful ways to deal with migration, focus on the facade that people will use. We'll talk about how to do that.

Richard Carr from BlackWasp, who's a great developer and thinker, says the facade pattern is a way to simplify interfaces of more complex subsystems, and we'll have lots and lots of subsystems, especially when we have lots of small services.
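
As a rough illustration of the service facade described above (not code from the talk): a minimal Python sketch in which the caller talks to one entry point and never learns how many services sit behind it. The names `ProductFacade`, `inventory_service` and `pricing_service` are hypothetical, and the backends are plain functions standing in for network calls.

```python
from typing import Callable, Dict, List


# Hypothetical backends; in a real system these would be network calls.
def inventory_service(sku: str) -> dict:
    return {"sku": sku, "in_stock": 12}


def pricing_service(sku: str) -> dict:
    return {"sku": sku, "price": 19.99}


class ProductFacade:
    """Single entry point that hides how many services sit behind it."""

    def __init__(self, backends: List[Callable[[str], dict]]):
        self._backends = backends

    def get_product(self, sku: str) -> dict:
        # Merge each backend's answer into one response for the caller.
        result: Dict[str, object] = {}
        for backend in self._backends:
            result.update(backend(sku))
        return result


# Two backends today; the list can grow or shrink without the caller changing.
facade = ProductFacade([inventory_service, pricing_service])
product = facade.get_product("abc-1")
```

The list of backends can change from two to five to ten while `get_product` stays the same, which is the property the talk relies on during migration.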

Stranglers

I also like this idea of a strangler or a deprecation pattern. You're migrating from one to another very slowly over time. So in the first step, maybe you've got clients that go through a load balancer and that load balancer talks to all the old code, your monolith or your old service, whatever it is, and then maybe you take a small piece of that.

You take the search feature out of your monolith and you put it in a separate service. The client still talks to the load balancer. The load balancer knows that search goes over here. Everybody else goes over here. Fine. Then you move search, then you move all the editing services and stuff like that.

Now you have sort of a balance. The load balancer always knows where to route things. The client never knows. The client talks to the load balancer. That's the facade. Eventually you're almost finished. You've got all but a few reporting tools or something. These are all production releases, by the way. This is not all in test. These are production releases.

Finally, you get everything into the new system at some point along the way. That might take weeks. It might take months. It might take years. That's migration. That's the roadmap. That's what you need to be doing.

Paul Hammant from ThoughtWorks is the one who talked about the strangulation idea. It's a safe way to phase one thing out and replace it with something else. And remember you're doing it piece by piece by piece.
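
The load balancer steps above can be sketched roughly as a routing table that starts out sending everything to the old code and moves one path at a time. This is a minimal sketch under stated assumptions: `StranglerRouter`, the path prefixes and the target names are all hypothetical, and the prefix matching is deliberately naive.

```python
from typing import Dict


class StranglerRouter:
    """Routes by path prefix; anything not yet migrated goes to the legacy target."""

    def __init__(self, legacy: str):
        self._legacy = legacy
        self._migrated: Dict[str, str] = {}

    def migrate(self, prefix: str, target: str) -> None:
        # Move one small piece (for example, search) to a new service.
        self._migrated[prefix] = target

    def rollback(self, prefix: str) -> None:
        # Send that piece back to the legacy target if the move goes badly.
        self._migrated.pop(prefix, None)

    def route(self, path: str) -> str:
        # Naive string-prefix match; the longest matching prefix wins.
        matches = [p for p in self._migrated if path.startswith(p)]
        if matches:
            return self._migrated[max(matches, key=len)]
        return self._legacy


router = StranglerRouter(legacy="monolith")
router.migrate("/search", "search-service")  # step one: only search moves
```

Clients keep calling `route` through the same entry point at every stage; only the table behind it changes, one production release at a time.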

Refactoring

And then finally there's just simple refactoring. I might have, for example, a service bus. It has lots and lots of parts. I need to refactor that service into some service aggregation, so I'm going to rewrite those pieces over time. I'm going to simplify what's behind the interface. So, refactoring just like refactoring in code is really important. We keep the same interface. We move things around. It's the same thing in services.

Martin Fowler's got a great way of describing this: “When you refactor, you're improving the design of the code after it's been written”, and in the case of services, you're improving the design after it's been released. That's migration. That's what you want to be doing.

APIs are Forever, Code is Not

Here's another really important one. Remember: APIs are forever, the interface is a promise that you make forever, but code is not. Often, we think of it in reverse. We think of code as our asset. The thing that we're holding onto, the thing that's really valuable to us, but often it's not the code that's important. It's the interface. It's the interface that everyone else uses. That's what they know.

They don't know the code. They don't know if it's written in Go or JavaScript or PHP or C# or Java. Who cares! It can be written in something else next week. It's the API that's really important. Again, going back to Amazon, Amazon has a great story on this. They said that they knew designing their API was super important because they'd only have one chance to get it right. One chance to get it right is pretty dangerous. That's a lot, but I think the hyperbole makes sense. Think about this carefully. You're about to make a promise. You could look up Hyrum Wright. He worked at Google and he had this axiom; it's called Hyrum's Law. He basically said that when you publish an interface, no matter what you say about it, no matter what you promise, if it's used often enough, eventually there'll be a client somewhere that has a fatal dependency on every aspect of that interface. You can't take it away once it's published. So, Amazon understood this very well.

Instant Reversibility

The last thing I want to talk about in terms of basic principles is this notion of instant reversibility. If you're doing continuous change, if you're making small changes all the time, you also need to be able to reverse that change. If it doesn't go well, you need to put it back. You need to move it back. That should happen instantaneously. The most common way to talk about this is blue-green deployments. Blue-green deployments, where I have all of the version one on the green servers and then all version twos on the blue servers, and then I change the router to talk to the blue servers and I'm all set.

You can do this blue-green, this flip, this switch back and forth for every single service that you have. You don't have to do it as a platform. You can do it on every single service, and you do that at the router. So, this is a really important way to do this. And that means if you flip it forward and it doesn't work, you need to flip it back. That means you have to be reentrant.

Think about the case where you're going to start collecting new data in the new system, but it doesn't work, and you have to go back. What are you going to do with that data? If you want to change your data models in some way, or change your storage in some way, you have to be able to flip it back. It's really important to think about that and plan for that ahead of time. Data is key; data is very important in all of this reversibility.

So again, quoting Martin Fowler: “Blue-green deployment gives you a rapid way to rollback - if anything goes wrong”. I would add, “and it will, and it will, of course it will.”

They know things always go wrong. That's just the nature of things. It's the nature of how things work.
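
To make the per-service flip concrete, here is a minimal sketch, assuming a router that keeps a green and a blue target for each service and a marker for which one is live. `BlueGreenRouter` and the service and target names are hypothetical, and the sketch deliberately ignores the data question raised above (what to do with data written by the new version after flipping back).

```python
from typing import Dict


class BlueGreenRouter:
    """Keeps a green and a blue target per service and tracks which is live."""

    def __init__(self) -> None:
        self._services: Dict[str, Dict[str, str]] = {}

    def register(self, service: str, green: str, blue: str) -> None:
        # Green (the current version) starts out live.
        self._services[service] = {"green": green, "blue": blue, "live": "green"}

    def resolve(self, service: str) -> str:
        entry = self._services[service]
        return entry[entry["live"]]

    def flip(self, service: str) -> str:
        # Flipping twice returns to the original target: the instant reversal.
        entry = self._services[service]
        entry["live"] = "blue" if entry["live"] == "green" else "green"
        return self.resolve(service)


router = BlueGreenRouter()
router.register("search", green="search-v1", blue="search-v2")
router.register("orders", green="orders-v1", blue="orders-v2")
router.flip("search")  # only search moves to blue; orders is untouched
```

Because the switch is kept per service, one service can be flipped forward and back without touching any other.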

Alright. So, take one bite at a time. Employ facades and stranglers and refactoring. APIs are forever, but code is not. Don't worry about that. Focus on the interface. And then finally, continuous change means you also need to support instant reversibility. These are the principles you're going to go armed with. I would add to this basic set of principles that you have to have what I call the Hippocratic oath of computing systems: “First do no harm”. Any change you make needs to be transparent to the network. If I'm using your service and you make some change to the backend or some change to the middleware, I should never know.

Maybe I know there's a new feature. Maybe I know there's a little bit more performance, but it doesn't break anything I have. There's no fatal dependency. There's no direct coupling between your changes and my changes. Remember, that's the independent deployability that we talked about at the lowest level of that migration pattern.

Finally, what is this roadmap? What is this really about? What does this really mean? Let's take it in the few steps along the way. Remember: Stabilize, Transform, Add and Repeat.

Stabilize the Interface

The first one is stabilizing that interface. What does stabilize the interface mean? What are we really talking about here? What we need to do is use the facade we talked about before. Once we have a facade, we can make sure all consumers talk to a proxy. Whatever interface or service you have, often what you've got are all sorts of services, somewhere, talking to all sorts of clients, somewhere else.

Those clients talk directly to services or some services talk to other services in the backend, and this is sort of a hodgepodge of mess. What you need to do is you need to insert a proxy between them.

Now, this is a very popular pattern right now that's being spoken of as a service mesh. You use Envoy proxy or some other tools like that. Or you can think of Kubernetes as a way of doing this proxying. So, there's always a central location where everything goes through. Now, a central location doesn't mean there's just one box. You can have a whole cluster, you can have several clusters, you can do it in several places, you can do this in groups. But the idea is the individual consumers talk to a proxy that handles the details of routing in the backend. And that's really important.

This is a challenging moment. This is the moment when you have to actually get buy-in from the consumers. Now, depending on the complexity of your system, you may already have lots of routing and gateways in place. This is just going to be installing an additional layer, an additional routing tool, and the clients are already used to this idea. But if you have a lot of independent clients that are not inside the gateway, that are not inside this space, you're going to have to route them in. Hopefully, that just means that they need to change their starting URLs or their base URL configuration. Hopefully that'll be the case, but sometimes applications hard code things and it's going to be really hard for them, so you have to work out the details. Now, like we said before, learning from Netflix, remember: when you're doing this, start with the smallest thing that you can learn the most from.

Turns out if you can already work with some clients that are working through a proxy, start with them. Don't start with the hardest case that can't change. Start with the simple cases first. This proxy is the anchor, the start of this whole process.

The proxy is a stabilizer proxy, and it's a pass-through proxy. If I've got a service that offers A, B, C and D capabilities, I can't have this proxy filter out all of the D capabilities and not allow clients to see D. That's wrong; I have to make sure it's a pass-through. I also can't fancy up the output. If the service currently outputs XML for A, B, C and D, I can't have the proxy suddenly change it to JSON. That's going to break somebody along the way. That's a bad idea. What I need is a completely pass-through proxy. If the service offers A, B, C, and D, it goes into the proxy as A, B, C, and D, and A, B, C, and D comes out the other end.
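
The pass-through rule can be sketched as a proxy that forwards every request and returns the upstream response untouched: no filtering of capabilities, no format conversion. This is an illustrative sketch only; `legacy_service`, the paths `/a` through `/d` and the response tuple shape are assumptions, with a plain function standing in for the real service.

```python
from typing import Callable, Tuple

# (status code, content type, body): an assumed shape for this sketch.
Response = Tuple[int, str, bytes]


def legacy_service(path: str) -> Response:
    """Hypothetical existing service offering capabilities A, B, C and D as XML."""
    capabilities = {"/a": b"<a/>", "/b": b"<b/>", "/c": b"<c/>", "/d": b"<d/>"}
    if path in capabilities:
        return (200, "application/xml", capabilities[path])
    return (404, "application/xml", b"<error/>")


def make_pass_through_proxy(upstream: Callable[[str], Response]) -> Callable[[str], Response]:
    def proxy(path: str) -> Response:
        # Forward everything and return the upstream answer unchanged:
        # same status, same content type, same bytes.
        return upstream(path)

    return proxy


proxy = make_pass_through_proxy(legacy_service)
```

A simple check for this property is that, for any request, the proxy's response equals what the service itself would have returned.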

Why are we doing this? What's the advantage? Well, first of all, we're creating a central control point. Secondly, we're creating a management point. Now I know who's using the service. I know how often they use it. I know what its performance statistics are. I know what its profile is.

When I make changes, when I make migrations, I don't want to introduce something that's less reliable, runs slower or screws things up. I get this information now: a profile of just how often the service is used, what its performance criteria are, what its SLA is. Then I can make sure that any change I make does that or better.
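
One way to picture the management point is a proxy that records who called and how long each call took, then exposes a profile to compare later changes against. This is a minimal sketch under stated assumptions: `MeasuringProxy`, `meets_baseline`, and the choice of mean latency as the only performance figure are illustrative, not something the talk prescribes.

```python
import time
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List


class MeasuringProxy:
    """Forwards calls unchanged while recording who called and how long it took."""

    def __init__(self, upstream: Callable[[str], str],
                 clock: Callable[[], float] = time.perf_counter):
        self._upstream = upstream
        self._clock = clock
        self.calls: Dict[str, int] = defaultdict(int)  # consumer -> call count
        self.latencies: List[float] = []               # seconds per call

    def handle(self, consumer: str, path: str) -> str:
        start = self._clock()
        try:
            return self._upstream(path)
        finally:
            # Record the call even if the upstream raised.
            self.calls[consumer] += 1
            self.latencies.append(self._clock() - start)

    def profile(self) -> dict:
        return {
            "total_calls": sum(self.calls.values()),
            "calls_by_consumer": dict(self.calls),
            "mean_latency": mean(self.latencies) if self.latencies else 0.0,
        }


def meets_baseline(candidate: dict, baseline: dict, tolerance: float = 0.0) -> bool:
    """True if the candidate's mean latency is no worse than the baseline's."""
    return candidate["mean_latency"] <= baseline["mean_latency"] * (1 + tolerance)
```

The profile captured before a migration becomes the baseline; a replacement service is acceptable only if its own profile "does that or better".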

It's also important to keep in mind that ESBs go behind the stabilizer proxy as well. You may have one ESB that talks to lots and lots of other services, got lots of other rules there. That's fine. You need to still put that behind the proxy. You may also even have external APIs. You may have somebody calling address validation or some other third party, external APIs. That goes behind the proxy as well. You have to keep that in the same loop, so it's really important that you think about it in all these steps.

As I mentioned before, you don't have to proxy your entire company before you move on to the next step. Start small, work small. You can proxy one small group. Maybe there's a group that's ready, they're ready to do this migration. You can proxy and focus on them and work through all the steps. Then maybe later on, you can work through another group and add a proxy there. Maybe later on, you can add a proxy to other parts to eventually cover the whole system, but proxy marching, this idea of coming in, installing a proxy, normalizing everything, make sure everything works great, make sure your reporting is working great, that's the proxy team's job. I work in organizations where they literally have a proxy team. They show up. “Hi, we're from Enterprise Architecture and we're here to help. We're going to set up your proxy”, so on and so forth. That's the first step on this process.

So, all the consumers use the proxy. The proxy has to be pass-through. ESBs and third parties go behind the proxy as well. And we can employ this idea of proxy marching. So, we're through the first step. Pretty good.

Transform the Implementation

Step two: now we're ready to start transformations. We got proxies in place.

We know we stabilized things. Now we're ready to do some transformations. So, one of the transformations is pretty simple. We're going to simply refactor existing components, just like we refactor code. We have a stable interface. We can refactor the components.

We might have a component that's old, it's written in a language nobody really likes anymore, or the person retired or whatever the case may be, and it offers these A, B, C, and D services. What I can do is rewrite it in a fancy new language that we really like, Rust or Ruby on Rails, or just anything, whatever we want to do. And then what I can do with my proxy is, as consumers are making a call, I can simply transfer the routing to the new service.

And if everything works great, we’re all set. If it doesn't work, I can transfer back. Here's that blue-green deployment that we talked about, forward and back. Eventually, if it really works, I can get rid of the old version, but first I make sure to validate that nobody's touching it anymore, and my proxy helps me validate that.
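The forward-and-back switch can be sketched in a few lines of Python. This is only an illustration under assumptions: `BlueGreenRoute`, `legacy_abcd`, and `rewritten_abcd` are made-up names, and the backends are plain functions standing in for real services behind the proxy.

```python
from collections import Counter


class BlueGreenRoute:
    """Proxy-side route that can point at either the old or the new backend."""

    def __init__(self, old_backend, new_backend):
        self._backends = {"old": old_backend, "new": new_backend}
        self.active = "old"     # consumers start on the existing implementation
        self.hits = Counter()   # how many calls each backend has served

    def switch_to_new(self):
        self.active = "new"

    def roll_back(self):
        self.active = "old"

    def handle(self, request):
        """Forward the request to whichever backend is currently active."""
        self.hits[self.active] += 1
        return self._backends[self.active](request)


def legacy_abcd(request):
    return f"legacy:{request}"


def rewritten_abcd(request):
    return f"rewritten:{request}"


route = BlueGreenRoute(legacy_abcd, rewritten_abcd)
print(route.handle("A"))   # served by the old component
route.switch_to_new()
print(route.handle("A"))   # same call, now served by the rewrite
route.roll_back()          # something went wrong: transfer back
print(route.handle("A"))
```

Consumers keep calling the same route the whole time; only the proxy knows which implementation answered, and the `hits` counter is the sort of evidence you would check before retiring the old version.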

That's refactoring existing components. Remember the strangulation pattern or the deprecation pattern. Maybe I have a monolith that has lots and lots of pieces to it. And what I can do is start to take small parts out and replace individual parts of that monolith until maybe it's all gone.

But I work in a lot of organizations where this is as far as they get. You know what? We don't need to take reporting out of the monolith. People hardly use it, it’s going to be costly to replace. Just leave it there. The rest of this stuff is fine. We'll just leave reporting in the system. We'll leave auditing in the system and just keep that up, because I don't have a license problem with that. But maybe I do have a licensing problem and I need to get rid of all of it. So that's what I can do. So, I carefully do all of that, and throughout all of this experience, the API consumers or the service consumers never know the difference. All they know is that it's still working the way they need it.
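Taking parts out of the monolith one at a time amounts to a routing table at the proxy that grows as pieces are extracted. The following Python sketch assumes hypothetical path prefixes and internal hostnames (`/search`, `/users`, `monolith.internal`, and so on); none of them come from the talk.

```python
MONOLITH = "http://monolith.internal"

# Path prefixes that have been carved out so far (hypothetical).
# Anything not listed here, such as reporting, stays in the monolith.
EXTRACTED = {
    "/search": "http://search.internal",
    "/users": "http://users.internal",
}


def resolve(path, extracted=EXTRACTED):
    """Return the backend base URL that should serve this path."""
    for prefix, backend in extracted.items():
        # Match the prefix itself or anything below it, but not look-alikes
        # such as "/usersettings".
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return MONOLITH


print(resolve("/search/items"))       # extracted service
print(resolve("/reporting/monthly"))  # still the monolith
```

Extracting another part later means adding one entry to the table; stopping early, with reporting left where it is, just means the table never gets that entry.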

Maybe I need to untangle ESBs. I got ESBs that talk to lots of things, lots of rules. They're really cumbersome. We don't want to manage any ESB anymore. We've decided that.

The first thing we can do is we can start doing those routings directly to services.

We can replace that with a facade of some kind. We'll talk about a service facade. So refactoring existing components, strangling the monolith, replacing tangled ESBs.

Those are key elements in being able to take this next step. That's transforming existing functionality. You may notice we haven't added any new feature yet. We stabilize the system. We begin that process of transformation. We know how to do stable transformations.

Add Functionality

Now we're ready to add functionality. Now you may get through transformations very quickly, or there's only a couple of things that we need to transform, and we can move on to adding. You may spend most of your time simply transforming what you have, but now we can start adding functionality, and the most common way to add functionality is, I think, through side-by-side components.

Remember that ABCD component? It’s been rewritten. It's very cool. It's very nice. Everybody likes it, but they want new features. One of the things I can do is I can write a new version of that component, that adds the new features, E and F, new capabilities, and people who want the new capabilities can simply address a new endpoint in the system. Both of these still exist. I can do side-by-side releases. I don't need to always do replace. I can have things run side-by-side and you happen to have this happening all the time anyway, in real life. You've got several devices. You probably have different versions of web browsers on there. They can run side-by-side.

Another possibility is just to simply add a new component to the system.

I've got this idea of ABC and D; people want E and F. I'll just add a new component, and I might even write the routing rules to make it look like it's part of the old component. There's new slashes and URLs or whatever the case may be. I just add the functionality separately. So now consumers who just want the new functionality have it, consumers who think they want to add this functionality to their existing pieces, have it as well.
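Routing rules that make a separate component look like part of the old one could look roughly like this in Python. The public paths (`/abcd/e`, `/abcd/f`) and component names are invented for illustration; the talk does not specify any URL layout.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    public_prefix: str   # what consumers see
    backend: str         # which component actually serves it
    backend_prefix: str  # path prefix used when calling that component


# Most specific rules first: E and F live in a new component, but are
# published under the old component's URL space.
RULES = [
    Rule("/abcd/e", "ef-component", "/e"),
    Rule("/abcd/f", "ef-component", "/f"),
    Rule("/abcd", "abcd-component", ""),
]


def route(path):
    """Map a public path to (backend name, path to call on that backend)."""
    for rule in RULES:
        if path == rule.public_prefix or path.startswith(rule.public_prefix + "/"):
            rest = path[len(rule.public_prefix):]
            return rule.backend, rule.backend_prefix + rest
    raise LookupError(f"no route for {path}")


print(route("/abcd/e/42"))  # handled by the new component
print(route("/abcd/a"))     # handled by the original component
```

Consumers who only want E and F could also be given the new component's own endpoint directly; the rewrite rules are just for those who want it to appear inside the existing interface.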

The same thing works for third parties. Remember, we talked about this idea of third-party facades, third-party consumption. What I don't want to do is actually have people talking directly to this third party. Instead, what I want to do is flip the script, build my own services, build a facade that talks to third-party APIs. In a lot of organizations, there's a set of proxies that just deal with the outside world. They deal with all of the third-party APIs that might be used inside the organization. In some other organizations, they have a proxy for each one. There's the Salesforce proxy service, and that team works on the Salesforce proxy, or there's another service for Amazon or Google or something like that. But using your own facade means you can manage that inside. And that's a protection. Remember: any third-party service is a threat to you. Any third-party service could be canceled at any minute, could be abused by somebody else that you don't control. So, you need to control what comes in through your ingress, and that facade is really handy for doing that.
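A third-party facade can be sketched as a small in-house class that is the only code allowed to talk to the vendor. Everything named here is an assumption for illustration: `AddressValidationFacade`, the vendor reply shape (`{"status": "OK"}`), and the stub vendor functions are hypothetical and do not describe any real provider's API.

```python
class AddressValidationFacade:
    """In-house interface in front of a third-party address validator."""

    def __init__(self, vendor_call, fallback=None):
        self._vendor_call = vendor_call  # the only place the vendor is touched
        self._fallback = fallback        # optional local check if the vendor fails

    def is_valid(self, address):
        """Answer in our own terms, whatever the vendor's reply looks like."""
        try:
            reply = self._vendor_call(address)
            return reply["status"] == "OK"
        except Exception:
            # Vendor is down, cancelled, or replied in an unexpected shape.
            if self._fallback is None:
                raise
            return self._fallback(address)


def stub_vendor(address):
    """Stand-in for a real third-party call (hypothetical reply format)."""
    return {"status": "OK" if address.strip() else "INVALID"}


def cancelled_vendor(address):
    """Simulates a third-party service that has gone away."""
    raise ConnectionError("vendor service is gone")


def basic_local_check(address):
    return bool(address.strip())


facade = AddressValidationFacade(stub_vendor)
print(facade.is_valid("1 Main St"))

resilient = AddressValidationFacade(cancelled_vendor, fallback=basic_local_check)
print(resilient.is_valid("1 Main St"))
```

Because internal consumers only ever see `is_valid`, swapping vendors or surviving a cancelled service is a change inside the facade, not a change for every team that validates addresses.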

Adding functionality by side-by-side updates, or simply adding new components, or creating external service facades that make it look like this stuff is coming from you, but it's really from a third party. Now we're adding functionality. We're adding new features to the system. Okay.

Rinse and Repeat

Remember, there's one more step along this way. That step is rinse and repeat. So, we've done proxy marching, we've stabilized, we've done transformations of existing services. We started to add functionality. Now we need to do that a lot, more and more.

So how do we do that? So, remember changes are incremental. We do them a single step at a time. So, this is another element that comes from John Allspaw that I talked about earlier. Think about it this way. If you only make four updates to your system in a year, you've only got four chances to get it right in the calendar year.

If you get it wrong in one chance, you've lost 25% of your opportunity. And it turns out if you're releasing large packages, every time you do an update, your chances of success are diminished. The more changes you put in the package, the less likely it is to succeed. So, it turns out if you did 150 changes inside a release that took three months to build, there are more than 11,000 possibilities of some interaction, just inside that release, that could possibly go wrong. But if you do a package with just 10 elements in it, there's less than a hundred possible things that could go wrong. There's a lot to think about this idea. When you do small amounts of changes, and I say, do it every two weeks, for example, I have 25, 26 chances, let's say 25, it's Christmas time. We take some time off.
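The talk does not say how those figures were derived, but they match what you get if you count pairwise interactions, that is, every way two changes in the same release could affect each other. Under that assumption, a quick Python check with the standard library's `math.comb` gives:

```python
from math import comb


def pairwise_interactions(changes):
    """Number of distinct pairs among the changes bundled into one release."""
    return comb(changes, 2)  # n * (n - 1) / 2


print(pairwise_interactions(150))  # 11175
print(pairwise_interactions(10))   # 45
```

So a release with 15 times as many changes has roughly 250 times as many pairs to worry about, which is the arithmetic behind preferring small, frequent releases.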

I have 25 chances to get it right, and if I don't get it right, I've only got two weeks before I get to try again. Incremental changes are really important, so that's what we're going to do for all of these. We're just going to increment these step by step. Sandeep Kishore of HCL Technologies says it simply: the next big thing is incremental change. It's really important to adopt this notion of incrementalism, step by step by step, no big bets for the company.

Now here's another one that's really important, and that is when you're designing your architecture and implementing your architecture, you want to focus on this notion of loose interop instead of tight integration. When I'm selling products as a kit, I want tight integration because I want my customer to have to depend on all the parts, but when I'm trying to build my own ecosystem, my own platform, my own product space, I need loose interop. I need this ability to share the functionality without having to share all the models.

I should have my own individual businesses, my own data, my own applications, but I should be able to communicate between them, and that's done through this idea of a semantic layer. You've heard organizations talk about ontologies or vocabularies, talk about description languages, or when you talk about domain-driven design, there's the idea of ubiquitous languages and anticorruption layers.

These are all notions that go after this idea of loose interop. Loose interop is what allows you to deploy independently. If I have tight integration, then I have to deploy multiple things at the same time. We don't want a distributed monolith where everybody has to march along. What we want is this independent deployability.

Michael Platt, who works at Microsoft, he now works in their blockchain division, talks about it this way, he says “Interoperation is peer to peer”, we're peers, right? We're equals in this case. “Integration is where another system is subsumed”, I'm in control. I swallow this up, right? We don't want that. We want peer to peer. We want to act as equals, team after team, after team in the organization, inter-operating together. And they do that through interfaces. Now, this other thing is just this general notion of continuous improvement. What does continuous improvement really mean? The idea of continuous improvement goes back a long way to a thing called the Shewhart Cycle in the forties and fifties.

It’s this plan; do; check; act cycle of activity. So, I make a plan. I do that. I check the results. I decide what my next step is going to be. Plan; Do; Check; Act, over and over. And that's how we raise the bar. That's how we get better at whatever we do. Support continuous improvement.

This is all of what happens in the Toyota Way. This is what a lot of continuous delivery is about, this notion of continuous improvement. One of the key proponents of continuous improvement is Edwards Deming and he had this great line: “Management's job is to improve the system.” Not to improve the product, not to improve the employees, not to improve the customer; improve the system, so that everyone else can do their job better. At Netflix, they talk about this a lot. At Netflix, they say “If you're a leader, it's your job to find out whatever your team needs, give it to them and let them do their job. Your job isn't to tell them how to do something, your job is to ask them how you can make it easier for them.” That's improving the system.

This idea of rinse and repeat says “Take all those things we've talked about before, now make all your changes incremental. Aim for interoperability when you're trying to decide what to do and constantly improve. Learn. If you set out a team to do a bunch of proxy marching, they're going to learn how to do it, have them teach other people how to do their proxies. If you have a bunch of people who figure out a smart way to do refactoring, have them teach others”, over and over, learn again and again and again. And that's the key. All right. So, what have we really covered here?

We've had this whole journey, this idea of migration.

Focus on unlocking value, do this because you want to unlock capabilities that you know people want to use. Don't do this as an abstract exercise. Don't do this because you read it in a magazine, do this because you know what this is going to improve.

Change just one thing at a time, don't get excited. No big bets. I know this gets really hard, especially with budgeting. You have to make promises if you're going to get a big piece of budget, but you have to promise to make constant change over and over, improve it all the time. Don't promise that by next year or next month or next week, it's all going to be better. Promise it's going to get a little better every single day, and that's really important.

That means stabilizing the interface, using proxies as your first step to stabilize what's going on and get incoming information so that you know who's using what.

Transform your existing implementations to make them better. Can we make this faster? Can we make this safer? Can we make this more reliable? Can we make this use less space? Can you make this run more efficiently in some way?

Then, start looking at adding functionality. Add features in this side-by-side manner, in this secondary component way, through consuming other people's APIs and creating your own facade, that's what you want to do.

And then figure out how you can constantly rinse and repeat, be good at doing it over and over and over and over again, because that's really what you're going to do. That's really what it's about, rinsing and repeating.

I have a great story we learned from Netflix. They moved all their servers from on-premise to the cloud and they wrote it up in their technical blog, and I think they finished this like three years ago. This is a project they'd been working on. They completed it three years ago. They said “We just moved our streaming servers from on-premise to the cloud, and it only took us eight years”, eight years to migrate the streaming servers from on-premise in all the parts of the world to the cloud. That's why we do this continuously. This is not a race to the end. We want to add speed to the system, not speed through putting it together.

That's really important, because that process of stabilizing, transforming, adding, and repeating over and over again, that's your plan. That's your strategy for improving things every day. You can ask yourself, what did we improve today? What did we make better today? It might be that we made code better today. Tomorrow we'll make production better, but that's what you want to do over and over. And that's how you get ahead. And that's how you handle the idea of service migration at speed.

Hopefully that helps. We've got a couple of minutes for questions. I hope maybe we have a few questions. I encourage you to type them into the chat, and I'll answer them, and then we can kind of take it from there.

I'll just leave this slide up. By the way, I will put the slides online and I'll post it in the chat, at the main room and then I'll also post it on Twitter. So, @mamund is my Twitter handle and you'll be able to find them there. All right.

Any other questions at all? I'd be happy to answer. I know we talked about a lot of things. I'll just wait a second here. Wish we had some music or something in the background, but we don't. Ah, there we go. One quick question.

How do we measure the simplicity and complexity? Not a simple answer. Here's the way I think about the dangers of getting too complex, too fast. There's been a study. I think you may know Conway's Law. This idea of the organization that you have is the system you get. They're tied together.

There's another person who's worked on a very similar set of things. It's this idea that the number of people in your team affects the quality of the communication inside the team. I can't think of the person's name. I'll think of it in a minute. So the more people you add, the harder it is to get things done. Adding more people makes things worse. Same thing for services, the more services you add, the harder it is to get things done. Adding more services makes it harder.

So make sure you're always sort of within scope, make sure that you keep things simple in the sense that you know what your goal is, you know how you're going to solve it. And that's where this incrementalism works really well. Solve simple problems because when you solve a lot of simple problems, you come up with a very successful organization, a very successful system. Hopefully that helps. Start small. If it looks complex, it probably is. Hopefully that helps. Okay.

How do you approach things when you're building an entirely new platform, but not trying to move from a monolith to microservices? Actually, that's a really good question too. Here's the thing. So, think about it from a startup standpoint, right? Maybe you're a startup organization. Your startup has four or five people, right? Maybe you've got one person who handles the frontend, one handles the data, one handles the middleware, one is the architect, the designer, the DevOps. There's a small team.

On that first day, when you do your first release, how many microservices are you going to release? One. When you're working in the small, you only need one service. Because that's what your group is in charge of, one service. If you're starting something brand new, build one thing. Don't build a platform, don't plan the entire thing. Don't say we're going to build the Great Wall of China, we're going to build all the pyramids in one day, or we're going to carve Mount Rushmore in an afternoon. Just work on one thing, start with one thing, and when your team gets so big that you can't really discuss things well, that's when you need to break into smaller teams.

Now you have two services. Usually somewhere around 13 or 14, you'd like to break into smaller teams, teams of five or six or seven again, and then you break that up and now you have two services. So, the idea of microservices is actually this way of trying to keep things small, so you have small teams managing small bits on their own independently. And when the team gets too big and you have to have too big a meeting, break it up into smaller parts. So, start really small. Hopefully that helps.

Where does this start for large enterprises? Oh, that's a really good one too. Remember I said in the beginning that what we learned from Adrian Cockcroft at Netflix is “start with the smallest thing.”

Very often in enterprises, what you really want to do is you want to find what I call “co-conspirators” or somebody that is willing to take a chance.

You know what? Pick our team. We'd like to learn something new. Find somebody, somewhere in the organization that's willing to take a chance with you, and then start working with them. Instead of coming to someone and saying “You! We've decided you are the one that's going to have to change the way you do things; you're going to do dev ops. You're going to do continuous…” Don't do that. Find somebody who wants to experiment, and then use them and keep that small. The same thing we were talking about earlier, as if you were starting something new, keep it super small. So, find someone in the enterprise that wants to be your mate, your partner in it. And that may include product managers, too. Product managers may say “Look, I really need some help here.”

What's the measure of a microservice? How small and how big? I'll answer this last question and then we can take some offline. I'm sorry, I’m not answering all of them. A great line from my buddy, Ronnie Mitra, who's a coauthor of both of the books that I talked about. He said “Microservices should be small enough and no smaller”. He stole that from Einstein, I think. But the idea is, you want to keep them simple, but don't get excited about the exact size. When I look at organizations, if they're working in a monolith and they want to break something out, I say “Just take out the search or the user management. Don't break it into 15 services. Just take one out and see if that really works”. It goes back to incremental change. Don't worry about “Is this the right size?” in terms of lines of code or something like that. Worry about solving your problems step-by-step and I think you'll do okay. Taking it incrementally is always a good win.

Listen, I've run out of time. We've got lots of other things to do today. I want to thank you very much for everyone that stayed with me today. Hopefully this was interesting. I'll post the slides later. Have a great day and thank you very much. Thanks. Bye, bye.
