Here are the steps to resolve incidents triggered by AWS CloudWatch alarms.

AWS CloudWatch Incidents

  • You can create CloudWatch alarms on a specific metric, such as failed login attempts, CPU usage, or server downtime.
  • When an alarm triggers (that is, when the metric crosses its configured threshold), a Remedy ticket is automatically raised for the support team.
  • The support (operations) team should analyze the root cause of the issue.
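The alarm behaviour described above can be modelled roughly as follows. This is a simplified sketch, not CloudWatch's actual evaluation logic: it assumes missing datapoints count as "not breaching" and ignores the INSUFFICIENT_DATA state. The operator names mirror CloudWatch's `ComparisonOperator` values, while `evaluate_alarm` itself is a hypothetical helper.

```python
from typing import List, Optional

# Comparison operators, named as in CloudWatch alarm configuration.
COMPARISONS = {
    "GreaterThanThreshold": lambda value, threshold: value > threshold,
    "GreaterThanOrEqualToThreshold": lambda value, threshold: value >= threshold,
    "LessThanThreshold": lambda value, threshold: value < threshold,
    "LessThanOrEqualToThreshold": lambda value, threshold: value <= threshold,
}


def evaluate_alarm(
    datapoints: List[Optional[float]],
    threshold: float,
    comparison: str,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Return "ALARM" if at least `datapoints_to_alarm` of the last
    `evaluation_periods` datapoints breach the threshold, else "OK".

    Simplification: missing datapoints (None) are treated as not breaching.
    """
    breaches = COMPARISONS[comparison]
    window = datapoints[-evaluation_periods:]
    breaching = sum(
        1 for value in window if value is not None and breaches(value, threshold)
    )
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"
```

For example, a CPU alarm with a threshold of 80, three evaluation periods and "2 out of 3" datapoints to alarm would move to ALARM for the readings `[40.0, 85.0, 90.0, 95.0]`, because the last three readings all breach.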

How to resolve AWS incidents: 7 steps

AWS Remedy incidents

Step#1

Check the ticket description to identify the AWS account the alarm came from (it may be the Production, Development, or Integration account). Then log in to that account using your IAM role credentials.
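If you prefer to script this step, the sketch below pulls an account ID out of the ticket text and assumes an IAM role in that account through AWS STS. It assumes the ticket description contains the 12-digit account ID; the `ACCOUNTS` mapping, the role name, and the session name are hypothetical placeholders. The STS client is passed in (normally `boto3.client("sts")`) so the logic can be tried without AWS access.

```python
import re
from typing import Optional

# An AWS account ID is a 12-digit number.
ACCOUNT_ID_PATTERN = re.compile(r"\b(\d{12})\b")

# Hypothetical mapping from account ID to environment; replace with your own accounts.
ACCOUNTS = {
    "111111111111": "Production",
    "222222222222": "Development",
    "333333333333": "Integration",
}


def find_account_id(ticket_description: str) -> Optional[str]:
    """Return the first 12-digit AWS account ID found in the ticket text, if any."""
    match = ACCOUNT_ID_PATTERN.search(ticket_description)
    return match.group(1) if match else None


def assume_support_role(sts_client, account_id: str, role_name: str) -> dict:
    """Assume `role_name` in the target account and return temporary credentials.

    `sts_client` is expected to be a boto3 STS client; "incident-triage" is an
    arbitrary (hypothetical) session name.
    """
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName="incident-triage")
    return response["Credentials"]
```

Usage might look like `assume_support_role(boto3.client("sts"), find_account_id(ticket_text), "SupportRole")`, where `SupportRole` stands in for whatever role your team actually uses.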

Step#2

After you log in, search for CloudWatch in the AWS console.
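The scripted equivalent of opening CloudWatch is creating CloudWatch and CloudWatch Logs clients from the temporary credentials. This is a minimal sketch: `open_cloudwatch_clients` is a hypothetical helper, and `session_factory` is expected to be `boto3.Session`, injected so the function can be exercised without AWS access. The region is an assumption you must supply.

```python
def open_cloudwatch_clients(session_factory, credentials: dict, region: str):
    """Build CloudWatch and CloudWatch Logs clients from STS temporary credentials.

    `credentials` is the "Credentials" dict returned by sts.assume_role.
    """
    session = session_factory(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
        region_name=region,
    )
    # "cloudwatch" covers alarms and metrics; "logs" covers log groups and streams.
    return session.client("cloudwatch"), session.client("logs")
```

For example: `cloudwatch, logs = open_cloudwatch_clients(boto3.Session, creds, "us-east-1")`.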

Step#3

Next, open the Alarms list and go to the alarm in question. An alarm shown in red is in the ALARM state.
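The alarms shown in red can also be listed through the CloudWatch `DescribeAlarms` API, filtered to the ALARM state. A sketch, assuming `cloudwatch_client` is a boto3 CloudWatch client; the helper name is hypothetical, and results are paged with `NextToken`.

```python
from typing import Iterator


def alarms_in_alarm_state(cloudwatch_client) -> Iterator[dict]:
    """Yield every metric alarm whose current state is ALARM."""
    kwargs = {"StateValue": "ALARM"}
    while True:
        page = cloudwatch_client.describe_alarms(**kwargs)
        yield from page.get("MetricAlarms", [])
        token = page.get("NextToken")
        if not token:
            break
        # Request the next page of results.
        kwargs["NextToken"] = token
```

Each yielded dict carries fields such as `AlarmName`, `MetricName`, `Namespace`, and `StateUpdatedTimestamp`, which the next steps can use.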

Step#4

Now, check the alarm's graph and note the timestamp at which the alarm was triggered.
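The trigger timestamp is what narrows the log search later on. CloudWatch Logs APIs take times as milliseconds since the Unix epoch, so a small helper can turn the timestamp into a search window. The 15-minute lookback and 5-minute lookahead are arbitrary defaults chosen for illustration; widen them to cover the alarm's evaluation periods. Treating naive timestamps as UTC is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def log_search_window(
    triggered_at: datetime,
    lookback: timedelta = timedelta(minutes=15),
    lookahead: timedelta = timedelta(minutes=5),
) -> Tuple[int, int]:
    """Return (start, end) as epoch milliseconds around the alarm trigger time."""
    if triggered_at.tzinfo is None:
        # Assumption: a timestamp without a time zone is in UTC.
        triggered_at = triggered_at.replace(tzinfo=timezone.utc)
    one_ms = timedelta(milliseconds=1)
    # timedelta // timedelta gives an exact integer count of milliseconds.
    start = (triggered_at - lookback - EPOCH) // one_ms
    end = (triggered_at + lookahead - EPOCH) // one_ms
    return start, end
```

If you fetched the alarm through the API, its `StateUpdatedTimestamp` (the time of the last state change) can be passed directly as `triggered_at`.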

Step#5

Go to the Log groups list and select the log group in which the alarm's metric is defined. The figure below shows how the log group relates to the alarm.

Creating an alarm from a log group in AWS.
Image is adapted from AWS blogs
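When an alarm is built on a log group, the link between them is a metric filter that publishes the alarm's metric. A sketch that looks up those log groups, assuming a single-metric alarm (metric-math alarms have no top-level `MetricName`) and a boto3 CloudWatch Logs client; the helper name is hypothetical.

```python
from typing import List


def log_groups_for_alarm(logs_client, alarm: dict) -> List[str]:
    """Return the log groups whose metric filters publish the alarm's metric.

    `alarm` is one entry from describe_alarms (it carries MetricName and Namespace).
    """
    groups: List[str] = []
    kwargs = {"metricName": alarm["MetricName"], "metricNamespace": alarm["Namespace"]}
    while True:
        page = logs_client.describe_metric_filters(**kwargs)
        for metric_filter in page.get("metricFilters", []):
            name = metric_filter["logGroupName"]
            if name not in groups:  # keep order, drop duplicates
                groups.append(name)
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    return groups
```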

Step#6

Open the relevant log group and go to the latest log stream. Alternatively, filter by time to find the log stream that covers the alarm's timestamp.
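Both approaches have API equivalents: `DescribeLogStreams` ordered by last event time returns the latest stream, and `FilterLogEvents` searches a whole log group within a time window. A sketch, assuming a boto3 CloudWatch Logs client; the helper names are hypothetical, and the optional `filter_pattern` (for example `"ERROR"`) is an illustrative choice.

```python
from typing import Iterator, Optional


def latest_log_stream(logs_client, log_group: str) -> Optional[str]:
    """Return the name of the stream with the most recent event, if any."""
    response = logs_client.describe_log_streams(
        logGroupName=log_group, orderBy="LastEventTime", descending=True, limit=1
    )
    streams = response.get("logStreams", [])
    return streams[0]["logStreamName"] if streams else None


def events_in_window(
    logs_client, log_group: str, start_ms: int, end_ms: int,
    filter_pattern: Optional[str] = None,
) -> Iterator[dict]:
    """Yield log events from every stream in the group within [start_ms, end_ms]."""
    kwargs = {"logGroupName": log_group, "startTime": start_ms, "endTime": end_ms}
    if filter_pattern:
        kwargs["filterPattern"] = filter_pattern
    while True:
        page = logs_client.filter_log_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
```

Each event includes `logStreamName`, `timestamp`, and `message`, so the time-filtered search also tells you which stream to open in the console.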

Step#7

Once you have the event details, check the error and identify the root cause of the issue.
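When many events match, grouping similar error messages can point to the dominant failure faster than reading events one by one. This is a hypothetical heuristic, not part of any AWS tooling: it treats lines mentioning `ERROR`, `FATAL`, or `Exception` as errors and masks digits so messages that differ only in IDs or timings are counted together.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical heuristic for spotting error lines.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b|Exception")
# Replace runs of digits so near-identical messages group together.
DIGITS = re.compile(r"\d+")


def top_errors(messages: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Return the most frequent normalized error messages with their counts."""
    counts = Counter(
        DIGITS.sub("<n>", message.strip())
        for message in messages
        if ERROR_PATTERN.search(message)
    )
    return counts.most_common(limit)
```

For example, `top_errors(event["message"] for event in events)` on the events fetched in the previous step lists the most frequent error shapes first.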

Discover more from Srinimf

Subscribe now to keep reading and get access to the full archive.

Continue reading