Monitoring Amazon Connect With CloudWatch Dashboards and Alarms

The Amazon Connect Administration Guide describes how Connect sends data to CloudWatch so that you can monitor key operational metrics about your contact center. This post walks through an example implementation of such monitoring using a CloudWatch dashboard and CloudWatch Alarms.

You can deploy the solution for your own Connect instance in about five minutes using the provided CloudFormation template. Optionally, you can enter an email address to receive notifications when certain key metrics breach preconfigured thresholds.

Architecture Overview

The components for monitoring and alerting are regular CloudWatch services with an SNS Topic and SNS Subscription for sending alarm notifications by email.

Figure: Architecture overview (Amazon Connect, CloudWatch, CloudWatch Alarms, SNS Topic, SNS Subscription)

CloudWatch Dashboard

The CloudFormation template also creates a CloudWatch dashboard showing a selection of key metrics.

Figure: CloudWatch Dashboard for Amazon Connect metrics

The Dashboard shows the following Connect metrics:

  • Concurrent Calls (Max and Average per minute)
  • Total Calls (Sum per hour)
  • To Instance Packet Loss (Max and Average % per minute)
  • Missed Calls (Sum per hour)

In addition, we’ve included the following Lambda metrics:

  • Lambda Execution Duration
  • Lambda Errors

CloudWatch Alarms

The CloudFormation stack will create CloudWatch Alarms for a subset of the metrics. We do not create alarms for all metrics because some are purely informational and do not need to raise an alarm when some arbitrary threshold is crossed. Total Calls is one such metric.

Concurrent Calls, on the other hand, is a key operational metric with a default limit of 100, so we need to know when we’re getting close to that limit. (The Concurrent Calls limit can be increased by opening a support ticket with Amazon.)

Lambda Execution Duration is included because integration with third-party data APIs via Lambda is a critical component of many of our Connect contact center solutions. The maximum execution time of a Lambda function invoked by Connect is currently 8 seconds. For an optimal customer experience, all Lambda functions must complete within this time frame, and preferably much faster. We would like to receive an alert if any Lambda function in our account takes longer than 5 seconds to execute.

In a full monitoring solution we would also include metrics from other key AWS services used with Connect, such as Lex, DynamoDB, and S3, but for the purpose of this example we’ll stick to Connect and Lambda.

Connect CloudWatch Metrics Explained

The Admin Guide says the metrics from Connect are found under the AWS/Connect namespace, and “In CloudWatch, a dimension is a name/value pair that uniquely identifies a metric. In the dashboard, metrics are grouped by dimension.” Let’s look at some specific examples. If you are familiar with SQL, it may be useful to think of Namespace as the db schema, MetricGroup as the table name, MetricName as the column you want to select, and Dimensions as similar to ‘where’ clauses.

Example 1

Get Concurrent Calls

  • Namespace: AWS/Connect
  • MetricGroup: VoiceCalls
  • MetricName: ConcurrentCalls
  • Dimension – InstanceId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
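
As a rough sketch of how Example 1 maps onto a CloudWatch query, the Python below builds the request parameters as a plain dictionary. It assumes, based on the AWS documentation rather than on this post, that the CloudWatch API expresses MetricGroup as a dimension alongside InstanceId. The helper name `concurrent_calls_query` is hypothetical, and nothing here calls AWS.

```python
from datetime import datetime, timedelta, timezone


def concurrent_calls_query(instance_id, minutes=60):
    """Build keyword arguments for a CloudWatch GetMetricStatistics request."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Connect",
        "MetricName": "ConcurrentCalls",
        # Assumption: MetricGroup is passed as a dimension next to InstanceId.
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "MetricGroup", "Value": "VoiceCalls"},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Maximum", "Average"],
    }


query = concurrent_calls_query("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")
# With boto3 installed and credentials configured (an assumption), this could be
# sent as: boto3.client("cloudwatch").get_metric_statistics(**query)
```

In SQL terms, the Dimensions list plays the role of the "where" clause, and Statistics selects the aggregates to return.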

Example 2

Get To Instance Packet Loss

  • Namespace: AWS/Connect
  • MetricGroup: ToInstancePacketLossRate
    (There is no MetricName property needed here because there is only one metric available in the group)
  • Dimension – InstanceId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  • Dimension – Participant: Agent
  • Dimension – Type of Connection: WebRTC
  • Dimension – Stream Type: Voice

Example 3

Get Queue Size for a queue by name

  • Namespace: AWS/Connect
  • MetricGroup: Queue
  • MetricName: QueueSize
  • Dimension – InstanceId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  • Dimension – QueueName: BasicQueue

Queue Size is not included in the provided CloudFormation template. It is shown here to illustrate that you must provide the queue name as a Dimension; otherwise, all queues will be included in the Dashboard or Alarms. Note also that plain CloudFormation cannot fetch and iterate over all the queue names.
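
Because the queue names have to be supplied explicitly, one option outside plain CloudFormation is to generate the per-queue dimensions from a hand-maintained list. The sketch below is only an illustration: `queue_size_dimensions` is a hypothetical helper, `SalesQueue` is a made-up queue name, and treating MetricGroup as a dimension is an assumption taken from the AWS documentation.

```python
def queue_size_dimensions(instance_id, queue_names):
    """Return one CloudWatch dimension list per queue, keyed by queue name."""
    return {
        name: [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "MetricGroup", "Value": "Queue"},
            {"Name": "QueueName", "Value": name},
        ]
        for name in queue_names
    }


# Queue names are listed by hand here; "SalesQueue" is hypothetical.
per_queue = queue_size_dimensions(
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    ["BasicQueue", "SalesQueue"],
)
for queue_name, dimensions in per_queue.items():
    print(queue_name, [d["Value"] for d in dimensions])
```

Each dimension list could then feed one QueueSize alarm or dashboard widget per queue.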

Deploy the CloudFormation Stack

Time to deploy – about 5 minutes

We will use the provided AWS CloudFormation template to automate creating all the resources described in this post. CloudFormation is an AWS tool to provision AWS infrastructure deployments predictably and repeatedly. A template file can be used to create, update or delete a collection of resources together as a single unit called a stack. See the AWS CloudFormation Documentation for full details.

Step 1. Log into the AWS Console before continuing.

Step 2. Choose Launch Stack in the same region as your Connect instance.

  • North Virginia (us-east-1)
  • Oregon (us-west-2)
  • Frankfurt (eu-central-1)
  • Tokyo (ap-northeast-1)
  • Sydney (ap-southeast-2)

Step 3. Set the stack parameters.

Under the Environment section of the CloudFormation template Parameters form, set the following values:

  • InstanceId: the ID of the Connect instance you wish to monitor, for example a9650f4c-3a38-41ce-9d24-16032d3aea0d
  • PrimaryEmail: optional, an email address to receive notifications when an alarm is triggered. (You must confirm the subscription to the SNS Topic.)
  • DashboardName: a name for your dashboard. The default is Amazon-Connect

Use the default values for the remaining parameters. You can always change these later to suit your alerting preferences.

Figure: CloudFormation Stack Parameters

Configuration of each metric

Each metric has 3 configuration properties.

  • Threshold: the value that triggers the alarm when it is exceeded for a number of evaluation periods in a row (that is, continuously for Evaluation Period x Evaluation Periods).
  • Evaluation Period: the time window in which the Threshold value is checked.
  • Evaluation Periods: the number of evaluation periods in a row the metric needs to exceed the Threshold for the alarm to be triggered.
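
To make the interaction between the three properties concrete, here is a simplified model of the evaluation in Python. It is a sketch only: real CloudWatch alarms also handle missing data and other comparison operators, and `alarm_triggered` is a hypothetical function, not part of any AWS SDK. Each datapoint stands for one Evaluation Period.

```python
def alarm_triggered(datapoints, threshold, evaluation_periods):
    """Return True if the latest `evaluation_periods` datapoints all exceed the threshold.

    `datapoints` holds one value per Evaluation Period, oldest first.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return False  # not enough data to evaluate yet
    recent = datapoints[-evaluation_periods:]
    return all(value > threshold for value in recent)


# Threshold 90 with 3 evaluation periods in a row:
print(alarm_triggered([85, 92, 95, 91], threshold=90, evaluation_periods=3))
print(alarm_triggered([92, 95, 88], threshold=90, evaluation_periods=3))
```

The first call reports a breach because the last three values are all above 90. The second does not, because the most recent value dropped back below the threshold.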

1. Concurrent Calls

Description: The number of concurrent active voice calls in the Connect instance during the evaluation period. All active voice calls are included, not only active calls that are connected to agents.
Resolution: Open a support ticket with Amazon and request a concurrent call limit increase for the instance reported.

Default parameter values:

  • Threshold: 90 (the default limit for concurrent calls is 100, so we want to be alerted if we are getting close to this number)
  • Evaluation Period: 60 seconds
  • Evaluation Periods: 3 (raise the alarm if the threshold is crossed for 3 evaluation periods in a row)

See the Connect limits documentation: https://docs.aws.amazon.com/connect/latest/adminguide/amazon-connect-service-limits.html
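
For readers who prefer to see the alarm as data, the following Python sketch builds the parameters for an equivalent alarm using the defaults above. It is not taken from the provided template: the alarm name, the choice of the Maximum statistic, the GreaterThanThreshold comparison, and the example topic ARN are all assumptions, and `concurrent_calls_alarm` is a hypothetical helper.

```python
def concurrent_calls_alarm(instance_id, topic_arn=None, threshold=90,
                           period=60, evaluation_periods=3):
    """Build keyword arguments for a CloudWatch PutMetricAlarm request."""
    alarm = {
        "AlarmName": f"connect-concurrent-calls-{instance_id}",  # hypothetical name
        "Namespace": "AWS/Connect",
        "MetricName": "ConcurrentCalls",
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "MetricGroup", "Value": "VoiceCalls"},
        ],
        "Statistic": "Maximum",  # assumption: alarm on the per-minute peak
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",  # assumption
    }
    if topic_arn:
        # Notify the SNS topic only when an ARN was supplied.
        alarm["AlarmActions"] = [topic_arn]
    return alarm


alarm = concurrent_calls_alarm(
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",  # made-up ARN
)
# With boto3 available (an assumption), this could be created with:
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```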

2. Throttled Calls

Description: The number of voice calls that were throttled by the Amazon Connect service because the rate of calls per second exceeded the configured limit for the instance during the evaluation period.
Resolution: Open a support ticket with Amazon and request a call rate limit increase for the instance reported.

Default parameter values:

  • Threshold: 1 (any throttled calls in a one-minute window will raise the alarm)
  • Evaluation Period: 60 seconds
  • Evaluation Periods: 3

3. Missed Calls

Description: The number of voice calls that were missed by agents during the evaluation period. A missed call is one that is not answered by an agent within 20 seconds.
Resolution: This is behavioral and does not have a systems-operations resolution. The number of missed calls might be lowered by adding agents or by training.

Default parameter values:

  • Threshold: 20 (20 missed calls per minute, for 1 minute, will raise the alarm)
  • Evaluation Period: 60 seconds
  • Evaluation Periods: 1

4. To Instance Packet Loss Rate

Description: The ratio of packet loss for calls in the instance, reported every 10 seconds.
Resolution: If you see a sudden spike in packet loss, start by reviewing your local network for any recent changes.

Default parameter values:

  • Threshold: 1%, entered as a value between 0.01 (1%) and 1 (100%)
  • Evaluation Period: 60 seconds
  • Evaluation Periods: 3

5. Lambda Execution Duration

Description: The maximum execution time for any Lambda function in the account.
Resolution: If the Lambda functions invoke third-party APIs and you see a sudden spike in execution time, check whether permissions or network settings have changed.

Default parameter values:

  • Threshold: 5000 milliseconds (one execution longer than this value in a one-minute window will trigger the alarm)
  • Evaluation Period: 60 seconds
  • Evaluation Periods: 1

6. Lambda Execution Errors

Description: A Lambda function in the account failed to return usable data.
Resolution: Check the following: Lambda timeout, out of memory, bad input parameters, failure to get a response from a downstream service, or failure to return a one-level-deep (flat) data map to Connect.

Default parameter values:

  • Threshold: 1 (one error in a one-minute window will trigger the alarm)
  • Evaluation Period: 60 seconds
  • Evaluation Periods: 1
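
One of the error causes listed above is returning a nested structure where Connect expects a flat map. As a minimal sketch, assuming a Python Lambda function, the helpers below check for and remove nesting before the response is returned. Both `is_flat_map` and `flatten` are hypothetical helpers, the sample data is made up, and lists are left untouched, so this is not a complete validator.

```python
def is_flat_map(response):
    """Return True if response is a dict with no nested containers as values."""
    if not isinstance(response, dict):
        return False
    return all(not isinstance(v, (dict, list, tuple, set)) for v in response.values())


def flatten(data, prefix=""):
    """Collapse nested dicts into a single level, joining keys with underscores."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Made-up payload, as might come back from a third-party API.
nested = {"customer": {"name": "Ada", "tier": "gold"}, "found": True}
flat = flatten(nested)
print(is_flat_map(nested), is_flat_map(flat))
print(flat)
```

A handler could call `flatten` on its result just before returning, so that a change in a downstream API's response shape does not surface as a Lambda error in the contact flow.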

Step 4. Open the CloudWatch Dashboard

After the CloudFormation stack deployment completes successfully, navigate to the Outputs tab of the stack in the AWS Console. Follow the link next to the key named Dashboard to open the CloudWatch Dashboard created by the template. This will open in the CloudWatch area of the AWS Console. You can also confirm that the alarms were created by clicking the Alarms heading in the left-hand menu.

Dashboard Usage

The CloudWatch Dashboard is a read-only wall-board view of the Connect instance health.

You can use the built-in AWS Console functionality to make the dashboard full screen.

Figure: Fullscreen Dashboard (CloudWatch dashboard actions)

You can also set a dark theme and set the dashboard to continuously update every minute.

Figure: Fullscreen Dark with Refresh

Alarms Usage

If you configured an email address, you will receive an email every time one of the CloudWatch metric alarm threshold values is crossed. The remediation steps to take will depend on which alarm was activated. The body of the email will describe what steps to take. The information in the alert bodies is kept deliberately short for security compliance.

If you need to raise any of the metric alarm thresholds, find the stack in the CloudFormation area of the AWS Console, choose Update, keep the same template, change the parameters, and update the stack.

Caveats

Creating a Dashboard in CloudFormation takes some code gymnastics because the dashboard has to be embedded as a JSON string. Fortunately, the CloudFormation Fn::Sub intrinsic function can be applied to the JSON to substitute the Connect Instance ID. Unfortunately, it’s not so easy to replace the annotation values in the Dashboard widgets that illustrate the alarm thresholds. If you change an alarm threshold, you will need to manually change it in the dashboard JSON, or in the deployed dashboard. (The annotation is a visual reference only and is not linked to the actual threshold configured in the alarm.)
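
One way to keep the annotation and the alarm threshold in step, outside plain CloudFormation, is to generate the dashboard JSON from a script so both values come from the same variable. The Python below is a sketch of that idea and not how the provided template works: the widget layout, title, and function names are assumptions.

```python
import json


def concurrent_calls_widget(instance_id, region, threshold):
    """Build one metric widget with a horizontal annotation at the alarm threshold."""
    return {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": "Concurrent Calls",
            "region": region,
            "period": 60,
            "stat": "Maximum",
            "metrics": [
                ["AWS/Connect", "ConcurrentCalls",
                 "InstanceId", instance_id,
                 "MetricGroup", "VoiceCalls"],
            ],
            # The annotation is drawn from the same value used for the alarm.
            "annotations": {
                "horizontal": [{"label": "Alarm threshold", "value": threshold}],
            },
        },
    }


def dashboard_body(instance_id, region, threshold):
    """Serialize the dashboard to the JSON string CloudWatch expects."""
    widgets = [concurrent_calls_widget(instance_id, region, threshold)]
    return json.dumps({"widgets": widgets})


body = dashboard_body("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "us-east-1", threshold=90)
print(body)
```

The resulting string could be supplied as the dashboard body when the dashboard is created or updated, so that changing the threshold in one place updates both the alarm and its visual reference.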

The Lambda execution duration alarm includes all Lambda functions in the AWS Account. In a production setting or an account being used for multiple parts of your business, you would want to filter by Dimension – FunctionName to set up alarms for specific functions.
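
As a sketch of the per-function approach, the Python below builds one duration alarm definition per Lambda function by adding the FunctionName dimension. The function names, alarm names, and the choice of the Maximum statistic are assumptions made for illustration. Omitting the dimension gives the account-wide behavior described above.

```python
def lambda_duration_alarm(function_name=None, threshold_ms=5000):
    """Build PutMetricAlarm arguments for Lambda Duration, optionally per function."""
    alarm = {
        "AlarmName": f"lambda-duration-{function_name or 'all-functions'}",  # hypothetical
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Statistic": "Maximum",  # assumption: alert on the slowest execution
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",  # assumption
    }
    if function_name:
        # Restrict the alarm to a single function.
        alarm["Dimensions"] = [{"Name": "FunctionName", "Value": function_name}]
    return alarm


# Hypothetical function names used by the contact flows.
connect_functions = ["customer-lookup", "order-status"]
alarms = [lambda_duration_alarm(name) for name in connect_functions]
for alarm in alarms:
    print(alarm["AlarmName"], alarm["Dimensions"])
```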

Conclusion

In this post, we showed how to use Amazon CloudWatch Dashboard and CloudWatch Alarms to monitor key operational metrics of your Amazon Connect instance. We configured and deployed a CloudFormation template to create a Dashboard, CloudWatch Alarms, SNS Topic and optional SNS Subscription to send email notifications when alarms are raised.

Amazon Connect delivers many other metrics to CloudWatch. Other SNS subscribers can also be added to integrate CloudWatch Alarms with messaging services such as PagerDuty, Amazon Chime, or Slack chat rooms. In a future post I will describe how to configure AWS Chatbot to monitor your Connect instance from Slack and Amazon Chime team chat rooms.

Feel free to contact us if you’d like to discuss your specific Connect operations monitoring requirements or would like to find out more about Voice Foundry’s Connect managed services.
