How to Fix AWS CloudWatch Alarm Incidents

Here are the steps to resolve incidents triggered by AWS CloudWatch alarms.

AWS CloudWatch Incidents

  • You can create alarms in AWS CloudWatch on a specific metric, such as failed login attempts, CPU usage, or server availability.
  • When an alarm triggers (that is, when the metric crosses its configured threshold), a Remedy ticket is raised automatically to notify the support team.
  • The support (operations) team then analyzes the root cause of the issue.
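
As an illustration of an alarm on one of these metrics, here is a minimal sketch that uses boto3 to define a CPU usage alarm on an EC2 instance. The alarm name, the 80% threshold, the five-minute period, and the helper names `build_cpu_alarm_params` and `create_cpu_alarm` are assumptions for the example, not values from any real setup. How the alarm raises a Remedy ticket depends on your integration and is not shown.

```python
def build_cpu_alarm_params(instance_id, threshold_percent=80.0):
    """Build keyword arguments for CloudWatch put_metric_alarm (illustrative values)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # evaluate the metric in 5-minute periods
        "EvaluationPeriods": 2,   # two breaching periods in a row trigger the alarm
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_cpu_alarm(instance_id, threshold_percent=80.0):
    """Create the alarm in the account and region of the current credentials."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_cpu_alarm_params(instance_id, threshold_percent))
```

Keeping the parameters in a separate function makes it possible to review the threshold settings without calling AWS.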

How to resolve AWS incidents: 7 steps

1. Check the ticket description to identify the AWS account (the alarm may come from the Production, Development, or Integration account). Then log in to that account using your IAM role credentials.

2. After you log in, search for CloudWatch in the AWS console.

3. Next, open the alarms list and go to the alarm in question. It may be shown in red, which means it is in the alarm state.

4. Now check the alarm's graph and note the timestamp at which the alarm triggered.

5. Go to the log groups list and select the log group in which the alarm metric is defined. The alarm's metric comes from a metric filter on that log group, which is how the log group and the alarm are related.

6. Open the relevant log group and go to the latest log stream. Alternatively, find the log stream by filtering on the time the alarm triggered.

7. Once you have the event details, check the error and find the root cause of the issue.
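
The lookup steps above (finding the alarm, its trigger time, the related log group, and the log events around that time) can also be sketched with boto3. This is a rough outline, not a definitive script: it assumes the alarm watches a single metric published by a log group metric filter, and the function names and the window of 10 minutes before and 5 minutes after the state change are choices made for the example.

```python
from datetime import datetime, timedelta, timezone


def log_window_ms(triggered_at, before=timedelta(minutes=10), after=timedelta(minutes=5)):
    """Return (start, end) in epoch milliseconds around the alarm's trigger time."""
    if triggered_at.tzinfo is None:
        # Treat naive datetimes as UTC so the result does not depend on local time.
        triggered_at = triggered_at.replace(tzinfo=timezone.utc)
    start = triggered_at - before
    end = triggered_at + after
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def investigate_alarm(alarm_name):
    """Print log events recorded around the time the named alarm changed state."""
    import boto3  # imported lazily so log_window_ms works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    logs = boto3.client("logs")

    # Look up the alarm, its current state, and when it last changed state.
    alarms = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"]
    if not alarms:
        print(f"No metric alarm named {alarm_name!r}")
        return
    alarm = alarms[0]
    print(alarm["StateValue"], alarm["StateUpdatedTimestamp"])

    # Find the log groups whose metric filters publish the alarm's metric.
    # Assumes a single-metric alarm; metric math alarms have no top-level MetricName.
    filters = logs.describe_metric_filters(
        metricName=alarm["MetricName"], metricNamespace=alarm["Namespace"]
    )["metricFilters"]

    start_ms, end_ms = log_window_ms(alarm["StateUpdatedTimestamp"])
    for metric_filter in filters:
        # Only the first page of results is read here; follow nextToken for more.
        events = logs.filter_log_events(
            logGroupName=metric_filter["logGroupName"],
            startTime=start_ms,
            endTime=end_ms,
            limit=50,
        )["events"]
        for event in events:
            print(metric_filter["logGroupName"], event["timestamp"], event["message"])
```

Calling `investigate_alarm` with the alarm name from the ticket, while using credentials for the affected account, would print the candidate log lines to review for the root cause.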

Author: Srini

Experienced software developer with skills in development, coding, testing, and debugging. Good data analytics skills (data warehousing and BI), plus mainframe skills.