BT

InfoQ Homepage News To Microservices and Back Again - Why Segment Went Back to a Monolith

To Microservices and Back Again - Why Segment Went Back to a Monolith

Leia em Português

This item in japanese

Lire ce contenu en fran?ais

Bookmarks

While almost every engineering team has considered moving to microservices at some point, the advantages they bring come with serious trade-offs. At QCon London, Alexandra Noonan told how Segment broke up their monolith into microservices, then, a few years later, went back to a monolithic architecture. In Noonan's words, "If microservices are implemented incorrectly or used as a band-aid without addressing some of the root flaws in your system, you'll be unable to do new product development because you're drowning in the complexity."

Microservices were first introduced to address the limited fault isolation of Segment's monolith. However, as the company became more successful, and integrated with more external services, the operational overhead of supporting microservices became too much to bear. The decision to move back to a monolith came with a new architecture that considered the pain points around scaling related to company growth. While making sacrifices in modularity, environmental isolation, and visibility, the monolith addressed the major issue of operational overhead, and allowed the engineering team to get back to new feature development.

Noonan explained several key points in the evolution of Segment's architecture. The problems faced, and the decisions made at the time, sounded familiar to any experienced software engineer. Only with the advantage of hindsight is it clear which decisions could have been better. Noonan explained each major decision point on a timeline, and noted the pros and cons of each state of the system architecture.

In 2013, Segment started with a monolithic architecture. This provided low operational overhead, but lacked environmental isolation. Segment's functionality is based around integrating data from many different providers. In the monolith, problems connecting to one provider destination could have an adverse effect on all destinations and the entire system.

The lack of isolation within the monolith was addressed by moving to microservices, with one worker service per destination. Microservices also improved modularity and visibility throughout the system, allowing the team to easily see queue lengths and identify problem workers. Noonan pointed out that visibility can be built in to a monolith, but they got it for free with microservices. However, microservices came with increased operational overhead and problems around code reuse.

A period of hypergrowth at Segment, around 2016-2017, added over 50 new destinations, about three per month. Having a code repository for each service was manageable for a handful of destination workers, but became a problem as the scale increased. Shared libraries were created to provide behavior that was similar for all workers. However, this created a new bottleneck, where changes to the shared code could require a week of developer effort, mostly due to testing constraints. Creating versions of the shared libraries made code changes quicker to implement, but reversed the benefit the shared code was intended to provide.

Noonan pointed out the limitations of a one-size-fits-all approach to their microservices. Because there was so much effort required just to add new services, the implementations were not customized. One auto-scaling rule was applied to all services, despite each having vastly different load and CPU resource needs. Also, a proper solution for true fault isolation would have been one microservice per queue per customer, but that would have required over 10,000 microservices.

The decision in 2017 to move back to a monolith considered all the trade-offs, including being comfortable with losing the benefits of microservices. The resulting architecture, named Centrifuge, is able to handle billions of messages per day sent to dozens of public APIs. There is now a single code repository, and all destination workers use the same version of the shared library. The larger worker is better able to handle spikes in load. Adding new destinations no longer adds operational overhead, and deployments only take minutes. Most important for the business, they were able to start building new products again. The team felt all these benefits were worth the reduced modularity, environmental isolation, and visibility that came for free with microservices.

QCon attendees discussing the presentation sounded like typical engineers joining a project with a long history. Quick remarks such as, "Well, obviously you shouldn't do what they did," were countered with voices of experience pointing out that most decisions are made based on the best information available at the time. One of the key takeaways was that spending a few days or weeks to do more analysis could avoid a situation that takes years to correct.

 

Editor's note (Updated 2020-04-30) At QCon, Noonan also participated in a panel discussion, "Microservices - Are They Still Worth It?" which is now available to watch on InfoQ.

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Monorepo

    by Nick Bauman /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Sounds like they missed a critical piece in microservices which is monorepo...

  • Microservices borundries

    by Ahmed Hammad /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    1) Microservices is not the simplest solution for fault isolation. In such situations, we should try to find simpler solutions.
    2) Microservices should be around system components or functionalities, not customers. Whatever the system, functionality components will be limited, but the customer's count could be a very large number

    Thank you so much for sharing your experience

  • Perhaps the wrong domain model

    by Ranjith K /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Building microservices based on new destinations didnt sound like the right domain model to use. Could that have been the issue that caused the operational overhead?

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT

Is your profile up-to-date? Please take a moment to review and update.

Note: If updating/changing your email, a validation request will be sent

Company name:
Company role:
Company size:
Country/Zone:
State/Province/Region:
You will be sent an email to validate the new email address. This pop-up will close itself in a few moments.
午夜电影_免费高清视频|高清电视|高清电影_青苹果影院