Setting up Contact Trace Records for Analysis with Amazon Athena
Amazon Connect has excellent reporting features, both for real-time and historical analysis. Contact Trace Records (CTRs) are the primary source of data that Connect collects for every call that occurs within your contact center and are used for reporting and analysis. However, Connect only stores these records for 24 months after creation.
CTRs contain a lot of details about the call, and one of the most important parts is Contact Attributes. While these are not a required feature of CTRs, using them within your contact flows can yield a lot of useful data about the customer experience and record important events and details that occur during a call. Amazon Connect does not support searching and reporting based on these Contact Attributes, however. They can only be viewed when looking at the details of an individual CTR.
So how can we permanently store CTRs and make them available for analysis with other AWS services? Keep reading to understand what services are needed and how to set them up to use with Connect.
AWS Services Overview
- Amazon Connect will be our source of data by generating CTRs for all calls going in and out of your contact center.
- Check out https://voicefoundry.com/amazon-connect-data-sources-part-1/ for more information regarding CTRs and other Connect Data.
- Amazon Kinesis Data Firehose is a managed data streaming service that will transport our CTRs from Connect to Amazon S3.
- With unlimited capacity and scalability, Amazon S3 is the best choice for storing our CTRs and it easily integrates with all the other services we need.
- AWS Glue is another managed service which stores the metadata and database definitions as a Data Catalog (database table schema) that we will use with Amazon Athena, based on our CTR data structure.
- Amazon Athena gives us the power to run SQL queries on our CTRs in S3, using the Data Catalog from AWS Glue.
Note: Some configuration settings are not fully supported through CloudFormation so we’ll be setting up everything manually through the AWS Console.
If you already have an S3 bucket set up that you wish to use to store CTRs, just take note of the bucket name. You can also use the S3 bucket that is used by your Connect instance to store call recordings and exported reports. Otherwise, you will need to create a new S3 bucket.
- Navigate to the S3 Service Console in AWS.
- Click on the Create Bucket button.
- Enter a name for your bucket. Remember that bucket names must be globally unique.
- Click Create. The default settings are enough, and the bucket will already be private.
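If you would rather script this step than click through the console, the following is a minimal Python sketch. It assumes the third-party boto3 SDK is installed and AWS credentials are configured; the helper names, bucket name, and region are placeholders for illustration. One detail worth knowing is that the us-east-1 region must not be passed as a location constraint.

```python
def bucket_request(bucket_name, region):
    """Build the keyword arguments for S3's create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and is rejected as an explicit constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_ctr_bucket(bucket_name, region):
    """Create the bucket. Requires boto3 and AWS credentials (assumed)."""
    import boto3  # third-party SDK, imported lazily so the helper above needs only the stdlib

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**bucket_request(bucket_name, region))
```

Calling create_ctr_bucket("my-ctr-bucket", "us-west-2") would create the bucket with default settings, much like the console steps above.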
The next service we are going to set up is AWS Glue.
- Navigate to the AWS Glue Service Console in AWS.
- Start by selecting Databases in the Data catalog section and Add database.
- Enter the desired name for your database, and optionally, the location and description.
- Click on your newly created database. From here you can update the optional information if needed.
- When you click on Tables from the left-hand side, you will see all tables in that region. By clicking on your Database and then View tables, the console will automatically filter the tables to show only those for the selected database.
There are a couple of ways to create definitions for your table: manually, or with a crawler. With the manual option, you specify the table schema yourself. With a crawler, you schedule or run an on-demand job that goes through your data and attempts to determine the schema for you. In this case, we will go through both options: we will set up the initial table manually, then add additional definitions using a crawler.
- Click on Add tables, then Add table manually.
- Enter your table name and select the database you want it to belong to.
- On the next page, select the S3 bucket to use as your Data Store. This will be the bucket that you wish to use to store your CTRs.
- For the Data format, choose Parquet. You then need to Define a schema by clicking Add column. Add the following columns and data types:
Now you have a table that resembles the data from our Contact Trace Records. Review and finalize your table configuration. We’ll come back to Glue in a little bit.
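The same database and table can be defined in code. Below is a rough Python sketch, assuming boto3 is installed and credentials are configured. The function names are hypothetical, and the two columns shown are only an illustrative subset (both typed as string, which is an assumption); you would pass the full column list you defined above. The three class names are the standard Hive Parquet input format, output format, and SerDe.

```python
PARQUET_INPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUTPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"

# Illustrative subset only; replace with the full CTR column list and types.
EXAMPLE_COLUMNS = [("contactid", "string"), ("attributes", "string")]


def table_input(table_name, bucket_name, columns=EXAMPLE_COLUMNS):
    """Build the TableInput structure for a Parquet table stored in S3."""
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [{"Name": name, "Type": kind} for name, kind in columns],
            "Location": f"s3://{bucket_name}/",
            "InputFormat": PARQUET_INPUT,
            "OutputFormat": PARQUET_OUTPUT,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
    }


def create_database_and_table(database, table_name, bucket_name):
    """Create both Glue objects. Requires boto3 and AWS credentials (assumed)."""
    import boto3  # third-party SDK, imported lazily

    glue = boto3.client("glue")
    glue.create_database(DatabaseInput={"Name": database})
    glue.create_table(
        DatabaseName=database,
        TableInput=table_input(table_name, bucket_name),
    )
```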
Amazon Kinesis Data Firehose
The next service we will create is an Amazon Kinesis Data Firehose. This will deliver the CTRs from Connect to S3.
- Navigate to the Amazon Kinesis Dashboard in AWS.
- Select Create delivery stream under Kinesis Firehose delivery streams.
- Enter your desired stream name.
- Choose the source of data. In this case, select Direct PUT or other sources. The other option is to get data from Kinesis Data Streams, which is useful if you have other needs with the CTRs, but in our case it’s not necessary. Click Next.
- Under Process records, keep Record transformation disabled. This option lets you invoke a Lambda function to transform your data records prior to delivering to S3, which we don’t need here.
- We do want to enable Record format conversion. Select Apache Parquet.
- Next, we will integrate our new Glue database. Select your AWS Glue region, the Glue database and table we created earlier. Select Latest for table version.
- Next, choose Destination. Due to enabling format conversion, S3 is the default and only option. So, we’ll need to select the bucket we created earlier.
- For the prefix, you can leave this blank. Firehose automatically distributes records using the following folder structure “YYYY/MM/DD/HH.” You can alter this if needed but that structure format will work for us.
- It is good to set an error prefix, such as “error/”. With this option, any records with delivery errors will get sent to this folder, isolated from other records.
- You can also specify an S3 backup destination which will store untransformed records. We’ll keep this disabled for now.
- Keep the default buffer size of 128 MB and set the buffer interval to 60 seconds.
- You can set S3 compression, encryption, error logging, and tags if desired.
- The final step is to create or choose an IAM role for Firehose to access S3 and any other needed services. Clicking the link will open a new page to create or choose a role. The default role is good enough; just change the Role name as needed. Click Apply.
- Review your Firehose configuration and finish creating. It will take a minute or two to create.
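For reference, the settings chosen in the steps above map onto a single create_delivery_stream request. This is a hedged Python sketch rather than a definitive setup: it assumes boto3, an existing IAM role that can reach both S3 and Glue (the role ARN is a placeholder), and the OpenX JSON deserializer for reading the incoming records, which is my assumption since the steps don't name one. The small default_prefix helper shows the YYYY/MM/DD/HH folder, based on UTC time, that Firehose uses when the prefix is left blank.

```python
from datetime import datetime, timezone


def firehose_request(stream_name, bucket_name, role_arn, region, database, table):
    """Build create_delivery_stream arguments mirroring the console choices."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": f"arn:aws:s3:::{bucket_name}",
            "ErrorOutputPrefix": "error/",
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 60},
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                "SchemaConfiguration": {
                    "RoleARN": role_arn,  # assumption: one role covers S3 and Glue
                    "Region": region,
                    "DatabaseName": database,
                    "TableName": table,
                    "VersionId": "LATEST",
                },
                # Assumption: incoming records are JSON read with the OpenX SerDe.
                "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
                "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            },
        },
    }


def default_prefix(moment):
    """Folder used when no custom prefix is set (UTC, YYYY/MM/DD/HH/)."""
    return moment.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


def create_stream(**settings):
    """Create the stream. Requires boto3 and AWS credentials (assumed)."""
    import boto3  # third-party SDK, imported lazily

    firehose = boto3.client("firehose")
    return firehose.create_delivery_stream(**firehose_request(**settings))
```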
Next, we need to configure Connect to send CTRs to your new Kinesis Data Firehose.
- Navigate to your Connect Instance in the AWS Console.
- Select Data streaming from the left-hand side.
- Click the checkbox “Enable data streaming” if not already enabled. This will display options for Contact Trace Records and Agent Events. Under Contact Trace Records, Select Kinesis Firehose and select the Firehose we just created.
- Click Save and Amazon Connect will update the instance and its own IAM role to access the selected Firehose.
We’ve now created all the services we need; the only thing left is data. If your Connect instance is already live, just wait a bit for new CTRs to be generated. Otherwise, start making some test calls. After you’ve got some new CTRs, check out the Amazon Kinesis Firehose in the Console. Select your Firehose and click on the Monitoring tab. You should start seeing some metrics pop up.
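The data streaming setting can also be applied through the Connect API. Here is a small Python sketch, assuming boto3 and placeholder values for the instance ID and Firehose ARN; the function names are hypothetical. Note that the console also updates the instance's IAM role for you, and this sketch does not verify that the API call alone grants the same permissions.

```python
def ctr_stream_request(instance_id, firehose_arn):
    """Build arguments that point CTR streaming at a Firehose delivery stream."""
    return {
        "InstanceId": instance_id,
        "ResourceType": "CONTACT_TRACE_RECORDS",
        "StorageConfig": {
            "StorageType": "KINESIS_FIREHOSE",
            "KinesisFirehoseConfig": {"FirehoseArn": firehose_arn},
        },
    }


def enable_ctr_streaming(instance_id, firehose_arn):
    """Apply the setting. Requires boto3 and AWS credentials (assumed)."""
    import boto3  # third-party SDK, imported lazily

    connect = boto3.client("connect")
    return connect.associate_instance_storage_config(
        **ctr_stream_request(instance_id, firehose_arn)
    )
```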
If it’s been more than five or so minutes with no metrics, you might want to check permissions as this is the most likely issue. If you see data in DeliveryToS3Records, you are good to go.
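If you prefer to check delivery from a script instead of the Monitoring tab, the same figure is available from CloudWatch, where the metric is named DeliveryToS3.Records in the AWS/Firehose namespace. The sketch below is one way to read it, assuming boto3 and configured credentials; the function names and the 30 minute lookback window are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def delivery_metric_request(stream_name, now, minutes=30):
    """Build get_metric_statistics arguments for records delivered to S3."""
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": "DeliveryToS3.Records",
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 300,  # five-minute buckets
        "Statistics": ["Sum"],
    }


def records_delivered(stream_name):
    """Total records delivered in the window. Requires boto3 (assumed)."""
    import boto3  # third-party SDK, imported lazily

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **delivery_metric_request(stream_name, datetime.now(timezone.utc))
    )
    return sum(point["Sum"] for point in response["Datapoints"])
```

A result of zero after several test calls would be a hint to revisit the permissions mentioned above.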
Navigate to the Amazon Athena Console. You don’t need to set anything up in Athena. Athena automatically looks at your Glue Data Catalog and shows your available Databases you can query. You should see your database in the drop down and tables underneath. You can simply click the three vertical dots to open a small menu and select Preview table to run a simple query. Or enter in the query tab:
SELECT * FROM "<your-database>"."<your-table>" limit 10;
Click Run query. You should see your CTRs pop up in the results window below. In the Results window, you can export your results as CSV. Your CTR data can now be queried using SQL. You’ll notice that currently, some columns have JSON strings in them. Athena allows you to query keys within the JSON using json_extract() like this:
SELECT json_extract(attributes, '$.<attribute-key>') AS "<attribute-key>" FROM "<your-database>"."<your-table>";
Visit https://docs.aws.amazon.com/athena/latest/ug/querying-athena-tables.html for more information on Querying Data with Athena.
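The json_extract() query above can also be run outside the console. The following Python sketch builds that query for a given attribute key and submits it through the Athena API. It assumes boto3, configured credentials, and an S3 location for query results (the output_location argument, which you would need to supply); the function names and the customerType key in the usage note are made up for illustration.

```python
import re
import time


def attribute_query(database, table, attribute_key, limit=10):
    """Build the json_extract query for one Contact Attribute key."""
    # Restrict names to simple characters since they are embedded in the SQL text.
    for name in (database, table):
        if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
            raise ValueError(f"unsupported name: {name!r}")
    if not re.fullmatch(r"[A-Za-z0-9_]+", attribute_key):
        raise ValueError(f"unsupported attribute key: {attribute_key!r}")
    return (
        f"SELECT json_extract(attributes, '$.{attribute_key}') AS \"{attribute_key}\" "
        f'FROM "{database}"."{table}" LIMIT {int(limit)}'
    )


def run_query(sql, database, output_location):
    """Run a query and return its rows. Requires boto3 (assumed)."""
    import boto3  # third-party SDK, imported lazily

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    # Poll until Athena reports a final state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

For example, run_query(attribute_query("connect", "ctr", "customerType"), "connect", "s3://my-athena-results/") would return the first ten values of a customerType attribute. Keep in mind that json_extract returns JSON values, so strings come back wrapped in quotes; json_extract_scalar returns plain text instead.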
Going back to AWS Glue for a moment, we can now create a Crawler since we have data in our S3 bucket. There are a few more columns we can easily add to our table which will help speed up our queries as our data set gets larger and larger.
- Navigate back to the AWS Glue Dashboard.
- Select Crawlers from the left-hand side.
- Click on Add crawler.
- Enter the desired name. Tags and description are optional.
- Next, select Data stores as the Crawler source type. Choose your S3 bucket as the Data store. You can add additional data stores on the next page, but we don’t need one.
- You’ll then need to create an IAM role. You can create a new IAM role directly on this page which is convenient.
- Next, you will need to create a schedule for the crawler. Since the structure of the CTR is unlikely to change, you can just choose Run on demand. This will prevent the crawler from running unnecessarily and increasing costs.
- Now choose the database the crawler will write to. There are several options to choose from but for now, just select “Add new columns only” and “Mark the table as deprecated in the data catalog.”
- Review and finish.
With the default configuration of our table and crawler, there will be a small issue if you run the crawler right away. In our initial setup, the ‘compressionType’ will be ‘null’ on the database table and ‘none’ for our Crawler. This mismatch causes an error, and the crawler will be unable to merge any new columns with the table. To fix this issue, you will need to edit the table and add the table property ‘compressionType’ with the value none.
- Select your table and then select Edit table details.
- Scroll down to the Table properties and add the following key-value pair: ‘compressionType’, none.
- If we run the crawler now, we should have no issues and we should see that it updated our table. Click Run crawler.
- Look at the table now and you should see four new columns at the end, partition_0 through partition_3. This is not CTR data but a reference to the folder structure in S3.
- Rename these columns to year, month, day, and hour. Be aware of the type that each column is set to. The default will be ‘string’, but in this case you could also select ‘int’, which is reflected in the following query.
SELECT "contactid", "year" FROM "<your-database>"."<your-table>" WHERE "year" = 2019;
Although not noticeable with small data sets, once we are querying our data lake with millions of records, the ability to filter by these partitions will greatly speed up our queries and reduce costs since Athena charges by data scanned. In our case, this partitioning is built-in with Firehose and meets our needs, but you may have different requirements.
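To make the link between the folder structure and the partition columns concrete, here is a small self-contained Python sketch. It is only an illustration of the idea, not how Glue or Athena work internally, and the object keys and function names are made up. Each object key written by Firehose begins with year, month, day, and hour folders, so a filter on those values can rule out whole folders without opening any files.

```python
PARTITION_NAMES = ("year", "month", "day", "hour")


def partitions_from_key(object_key):
    """Read the date folders at the start of an S3 object key."""
    folders = object_key.split("/")[:-1]  # drop the file name
    if len(folders) < len(PARTITION_NAMES):
        raise ValueError(f"key has no date folders: {object_key!r}")
    return {name: int(folder) for name, folder in zip(PARTITION_NAMES, folders)}


def keys_to_scan(object_keys, **wanted):
    """Keep only the keys whose partition values match the filter."""
    selected = []
    for key in object_keys:
        parts = partitions_from_key(key)
        if all(parts[name] == value for name, value in wanted.items()):
            selected.append(key)
    return selected


# Hypothetical object keys, for illustration only.
keys = [
    "2018/12/31/23/ctr-a.parquet",
    "2019/07/15/13/ctr-b.parquet",
    "2019/07/16/08/ctr-c.parquet",
]
print(keys_to_scan(keys, year=2019, month=7, day=15))
```

Only the object in the 2019/07/15 folder is selected, which mirrors how a WHERE clause on the partition columns lets Athena skip the rest of the bucket.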
Check out https://docs.aws.amazon.com/athena/latest/ug/work-with-data.html for more information on working with source data and partitioning.
Amazon Connect has some great features to help you run a cloud-based contact center. It is designed to enable you to get started as quickly as possible and manage your contact center with minimal effort. Being a part of the AWS ecosystem, however, lets you leverage many other services so that you can get even more functionality from your contact center.
In a later post, we will dive further into data transformations and more complex table schemas. The next phase would be to transform this data prior to S3 delivery where you can change these JSON objects to more structured and easily queried data.