Global Event Rulesets: Streamlining Alert Routing Across Services

8 min read

In organizations that run numerous microservices and projects, managing alert sources can be a daunting task. Many of our customers come with infrastructures that include a large number of microservices, so we set out to make it easier for them to streamline alert source management.

Enter Global Event Rulesets (GER). This feature is designed to redefine the way you manage alerts. With GER, we're on a mission to simplify and streamline the setup process, allowing you to create rules that effortlessly direct alerts to the right service without the hassle of maintaining separate webhooks for each one.

Say goodbye to manual configurations and the complexities associated with alert management; GER's centralized approach is set to make your life easier and your alerts more efficient.

This feature is available to everyone who takes Squadcast for a spin via our 14-day free trial!

Understanding the current behavior

At present, within the Squadcast ecosystem, alert sources are linked to services through service webhooks. As a result, all events from the different alert sources are aggregated under a single service. Once an alert source is integrated into a service, the automation pipeline is initiated. Within this pipeline, a set of actions is set into motion, including tagging, routing, deduplication, and suppression, all of which are executed at the service level.

Challenges associated with this approach:

  • Alert Routing Complexity: The absence of automated alert routing makes it challenging to direct alerts to their designated services efficiently.

  • Escalating Manual Workload: Configuring each alert source to a service demands a substantial amount of time and effort, especially for larger enterprises.

  • Absence of a Unified Endpoint: The absence of a centralized endpoint adds complexity to the management of alert sources, making it harder to integrate and oversee them effectively.

  • Scalability Concerns: As organizations expand, the absence of predefined rules for incident management becomes a significant hurdle, making it increasingly difficult to scale operations smoothly.

How Global Event Rulesets (GER) Help

The Global Event Rulesets (GER) feature redefines alert management by introducing a unified global ruleset. GER employs a unique endpoint featuring a shared routing key for each alert source. This innovation empowers users to establish a rule encompassing multiple alert sources, along with their corresponding configurations, ensuring that alerts are seamlessly routed to the designated service when specific criteria are fulfilled.

Here’s a sample endpoint:

https://api.squadcast.com/global/v1/<alert_source_shortname>/<global_routing_key>


Example for Datadog:

  1. api.squadcast.com/global/v1/datadog/<GLOBAL_ROUTING_KEY> will be the endpoint used for Datadog

  2. GLOBAL_ROUTING_KEY is a common key shared by all the alert sources
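As a minimal sketch of how the endpoint pattern above composes, the snippet below builds the URL for an alert source from its shortname and the shared routing key. The helper name `build_ger_endpoint`, the placeholder key, and the second shortname are assumptions made for illustration; only the URL pattern and the `datadog` shortname come from the example above.

```python
from urllib.parse import quote

# Base path taken from the sample endpoint shown above.
GER_BASE_URL = "https://api.squadcast.com/global/v1"


def build_ger_endpoint(alert_source_shortname: str, global_routing_key: str) -> str:
    """Return the GER webhook URL for one alert source (hypothetical helper)."""
    if not alert_source_shortname or not global_routing_key:
        raise ValueError("both the alert source shortname and the routing key are required")
    # Percent-encode each path segment so unusual characters cannot alter the path.
    source = quote(alert_source_shortname, safe="")
    key = quote(global_routing_key, safe="")
    return f"{GER_BASE_URL}/{source}/{key}"


# Placeholder value, not a real routing key.
routing_key = "EXAMPLE_ROUTING_KEY"

# "datadog" is from the example above; "other-source" is a made-up shortname.
for shortname in ("datadog", "other-source"):
    print(build_ger_endpoint(shortname, routing_key))
```

Because the routing key is the same for every alert source in the ruleset, only the shortname segment changes from one integration to the next.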

After creating a ruleset, users gain the capability to incorporate alert sources and link individual rules to each specific alert source. When an event is received, the routing pipeline determines the most suitable service for the alert. Subsequently, automation rules are triggered upon the alert's arrival at the designated service, provided they have been configured.

Here's an illustrative example of how to establish a ruleset featuring multiple alert sources and their respective rules. Once the ruleset is associated with an alert source, users enjoy the flexibility of adding multiple rules for that source, each tied to a corresponding service for routing. Whenever a rule's expression evaluates as true, alerts are routed to the designated service.

How to Setup and Use Global Event Rulesets (GER)

Prerequisite

  • To effectively create and manage Global Event Rulesets, the user assigned to the Team must possess the appropriate permissions corresponding to their User Role.

Add Ruleset

To add new rulesets,

Step1: Navigate to Global Event Rulesets and Add New Ruleset.
Step2: Next, add the Ruleset Name, optional Description, and select the Ruleset Owner.
Step3: Click Save, and you're done.

Please Note:

  1. You can create and manage up to 30 rulesets for each Team.

  2. A Ruleset Owner is a user or a Squad that someone can reach out to, for anything pertaining to that ruleset. There are no permissions associated with the ownership here.

Details for added Ruleset

This creates a new ruleset, and the next step is to add alert sources and start creating rules for your ruleset. If you would like to create multiple such rulesets, each with individual endpoints, repeat the above steps as needed.

Please Note:

  1. You can edit or delete a ruleset from its detail page.

  2. Deleting a ruleset will remove all the mapped alert sources and their rules.

Add Alert Sources

To add alert sources to a ruleset,

Step1: Navigate to Global Event Rulesets and select the relevant ruleset from the list.

Step2: Click Add Alert Source and in the side panel, search and select the alert source you wish to create a rule for and click Add.

Please Note:

  1. You can only add one alert source at a time.

  2. Deleting an added alert source from the ruleset will result in all its rules getting deleted.

Add Alert Source

Added Alert Sources

Add Rules

Event rules allow you to set actions that should be taken on events that meet your designated rule criteria. In the current version, the only action that the system takes is routing of incoming alerts.

To add rules for an alert source,

Step1: Navigate to Global Event Rulesets. Select the relevant ruleset from the list.

Step2: For your added alert source, click Add Rule.

Step3: In the side panel, provide a Rule Description and create the Rule Expression, referring to the payload data available on the right.

Step4: Lastly, designate the Service for routing when the rule expression is met. Click Save.

Please Note: You can create and manage up to 1000 rules for each alert source.

Add Rules for an Alert Source

View and arrange the priority of added Rule

To manage the order of rule execution, simply use the arrows to rearrange the priority of these rules.
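To see why the order matters, here is a small conceptual sketch. It assumes that rules are checked from top to bottom and that the first rule whose expression is true decides the service; that first-match behavior, the `Rule` class, and the service names are illustrative assumptions, not a description of Squadcast's internals.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    description: str
    matches: Callable[[dict], bool]  # stands in for the rule expression
    service: str                     # service to route to when the rule matches


def first_matching_service(rules: List[Rule], payload: dict) -> Optional[str]:
    """Walk the rules in priority order and return the first matching service."""
    for rule in rules:
        if rule.matches(payload):
            return rule.service
    return None  # no rule matched


payload = {"monitorName": "Example", "stateChange": "down"}

# Hypothetical rules and service names.
rules = [
    Rule("Any outage", lambda p: p.get("stateChange") == "down", "general-oncall"),
    Rule(
        "Outage on the Example monitor",
        lambda p: p.get("stateChange") == "down" and p.get("monitorName") == "Example",
        "example-service",
    ),
]

print(first_matching_service(rules, payload))        # general-oncall
# Moving the more specific rule to the top changes the outcome.
print(first_matching_service(rules[::-1], payload))  # example-service
```

Under this assumption, broad rules placed above narrow ones shadow them, so more specific rules generally belong higher in the list.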

Please Note:

  1. The payload you see on the right may be a sample payload provided by Squadcast for the selected alert source, if you have not set up alert source webhooks and started receiving alerts yet. If the webhooks have been set up and you are receiving alerts, then you will see the payload of the latest alert for that alert source.

  2. Also note that, if alert sources support multiple types of payloads for different events, please ensure you refer to the documentation of your alert source for the different payload structures.

  3. You will see only the Services for the selected Team.

Important: If you intend to delete a Service in Squadcast that is associated with a Global Event Ruleset, please ensure that you delete the rule first. Otherwise, you will receive a warning message.

Example: Alert Source: Admin Labs

{
  "webhookId": "5e3378c2-275d-11e8-89db",
  "monitorId": "1afb2342-2754-11e8-89db",
  "monitorName": "Example",
  "monitorAddress": "http://example.adminlabs.com/example.html",
  "stateChange": "down",
  "outageId": "4fd5c5df-275d-11e8",
  "outageStartedAt": "2018-03-14 08:57:09",
  "outageEndedAt": null,
  "maintenanceId": null
}


Example Rule Expression:

payload.stateChange="down"
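To make the intent of this expression concrete, the sketch below performs the equivalent check in Python on a trimmed copy of the Admin Labs payload. It illustrates the logic only and is not Squadcast's expression engine; the function name and the `"up"` value in the second call are assumptions.

```python
import json

# Trimmed copy of the sample Admin Labs payload shown above.
raw_event = """
{
  "webhookId": "5e3378c2-275d-11e8-89db",
  "monitorName": "Example",
  "stateChange": "down",
  "outageEndedAt": null
}
"""


def state_change_is_down(payload: dict) -> bool:
    """Python equivalent of the example rule expression (hypothetical helper)."""
    # .get() returns None instead of raising if the field is missing.
    return payload.get("stateChange") == "down"


payload = json.loads(raw_event)
print(state_change_is_down(payload))                # True
print(state_change_is_down({"stateChange": "up"}))  # False ("up" is an assumed value)
```

When the check is true, the alert would be routed to the Service selected for the rule.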


Catch All Rule

Any alerts that are sent through event rules but do not match any rule are routed to the Service configured in the Catch All Rule. If the Catch All Rule is empty, the unmatched alert is simply dropped from the system. Configuring this rule makes sure no alerts are missed, that is, every incoming alert ends up reaching a Service.

Please Note: This is not mandatory, but we highly recommend having this configured.
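The fallback behavior can be sketched as follows. This is a simplified model under stated assumptions: the `route` function, the tuple-based rule representation, and the service names are hypothetical, and a return value of `None` stands for an alert being dropped.

```python
from typing import Callable, List, Optional, Tuple

# (stand-in for a rule expression, service name)
RuleEntry = Tuple[Callable[[dict], bool], str]


def route(payload: dict, rules: List[RuleEntry], catch_all: Optional[str] = None) -> Optional[str]:
    """Return the service for an alert, falling back to the catch-all service."""
    for matches, service in rules:
        if matches(payload):
            return service
    # No rule matched: use the catch-all service, or None (dropped) if it is not set.
    return catch_all


rules = [(lambda p: p.get("stateChange") == "down", "web-oncall")]
unmatched = {"stateChange": "recovered"}  # assumed value that matches no rule

print(route(unmatched, rules))                      # None
print(route(unmatched, rules, catch_all="triage"))  # triage
```

With the catch-all configured, the unmatched alert still reaches a Service instead of disappearing.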

Adding a catch-all rule

Step1: Navigate to Global Event Rulesets and select the relevant ruleset from the list.

Step2: For your added alert source, click Add Catch All Rule and select a Service.

Step3: Click Save.

Add Catch All Rule for an Alert Source

View Added Catch All Rule

In conclusion, Global Event Rulesets play a vital role in enhancing alert management. They offer a more efficient, time-saving, and user-friendly way to handle alerts, catering to the needs of organizations dealing with a complex web of microservices. Embracing Global Event Rulesets means embracing simplicity and efficiency in alert routing.

We trust that our Global Event Rulesets feature brings us one step closer to our objective of providing you with a seamless, user-friendly experience. We encourage you to take Squadcast for a spin via our 14-day free trial and give this new feature a try! Do share your thoughts or feedback in the comments or with our support team. Cheers!

Squadcast is a Reliability Workflow platform that integrates On-Call alerting and Incident Management along with SRE workflows in one offering. Designed for a zero-friction setup, ease of use and clean UI, it helps developers, SREs and On-Call teams proactively respond to outages and create a culture of learning and continuous improvement.