AWS Athena with DynamoDB as Data sources in 5 minutes

Sarin Suriyakoon

22/11/23

3 mins read

Tech trends

DynamoDB is a NoSQL database offered by AWS. It’s designed for low latency and high availability, making it an optimal choice for applications that demand seamless scaling. With DynamoDB, AWS provides a feature-rich solution that equips you to handle growing workloads effortlessly.

Athena is a robust data analytics tool within the AWS ecosystem. It empowers users with the ability to query a variety of data sources using familiar SQL language. Whether you’re diving into business intelligence tasks or other analytical endeavours, Athena simplifies the process and enables you to derive meaningful insights from your data.

Connecting Amazon Athena to Amazon DynamoDB offers a powerful way to analyze data stored in DynamoDB using SQL queries. This integration can be valuable in various real-world scenarios:

Analyzing Large Datasets: If your DynamoDB tables store large volumes of data, running complex queries directly on the DynamoDB tables might be slow and resource-intensive. With Amazon Athena, you can run queries using SQL, which can be faster and more efficient for analyzing large datasets.
Business Intelligence and Reporting: Athena can help you generate reports and perform business intelligence tasks on the data in DynamoDB. You can use SQL queries to aggregate, filter, and transform data to gain insights into customer behaviour, sales trends, or operational metrics.

There are many more you can do with both tools, you get the idea. In order to connect DynamoDB as Athena’s data source you need to create a lambda function to be a connector between both tools but don’t worry AWS Console is pretty much telling you what to do.

Let’s get into the hands-on

Prerequisite

AWS Account(For experimental purposes, use an Admin account of your own)
DynamoDB Table with the dataset that you want to play with(You can use this : https://github.com/pacozaa/DynamoDB-Coffee-Data-Generator)
Cost should be FREE but you can set the AWS budget and check AWS Cost Explorer for your peace of mind.
Basic Lambda experience is a plus
Basic Athena experience is a plus
Basic S3 experience is a plus

Athena Part 1

Go to the Athena console then open the left-hand side menu and click Data sources

step2-data-sources-view

2. Click on Create data source

3. Choose Amazon DynamoDB as the data source

step3-select-dynamodb

4. Fill in the form — Add the Data source name that you’d like.

step4-data-sources-detail

5. Scroll down and you will see Connection details This is where you choose or create the Lambda function you’d like to use as a connector to your DynamoDB.

In case you don’t have it click the Create Lambda function This will open the AWS Lambda page with the Application configuration that is almost ready for you to deploy

step5-data-sources-detail-lambda

Lambda Function

AWS will bring you to the Lambda> Application creation page with the default setting for AthenaDynamoDBConnector

step6-go-to-lambda

Scroll down and you will see the form to fill in the Application name and SpillBucket

Application name -> Put in the custom name that you’d like
SpillBucket -> Put in the bucket name you’d like to use as SpillBucket

step7-lambda-config step11-lambda-config-5

3. Next ConnectorConfig

AthenaCatalogName->Put in a custom name that you’d like as well
SpillPrefix->Put in the prefix that will be used inside the bucket

step8-lambda-config-2 step10-lambda-config-4

4. Click Deploy

step9-lambda-config-3

5. After deployment you can check the result at the Lambda Dashboard, Applications, Functions

step13-check-lambda-dashboard

Athena Part 2

Now we can switch back or go back to AWS Athena that we paused from Part 1
Click Next to finish creating the data source. Your data source should have Associated databases with the database you want as the data source

step14-check-data-sources-associate-table

step15-data-sources-page

Athena Query Editor

Go to Athena Query Editor
Select the DynamoDB data source you just created in the Data sourcedropdown and the database you want which in this case should be default
Try to run interactive SQL editor with this query

SELECT count(*) FROM “table_name”;

This will count how many rows your database has(In this case my table is named coffeemagic)

Run the query

step16-query-1

4. You will see this error If you haven’t created S3 bucket with the same name yet. Go to S3 and create it.

step17-handle-surprised-error-redacted_dot_app-redacted_dot_app

5. You can check bucket name in Lambda>Functions>[Your function name for DynamoDB connector] and go to tab Configuration>Environment variables>spill_bucket

step18-go-to-function

step19-check-lambda-config-bucket-name

6. Run the Query again, this time you should see the desired result

step21-run-the-query-again

Note: If you have a permission issue please check your IAM user policy or Lambda>Configuration>Permissions

step20-lambda-config

Conclusion
There you have it. Let’s have fun with DynamoDB with SQL in Athena Query Editor. Elevate your data analytics game, streamline operations, and make informed, real-time decisions.

Contact us now to revolutionize your data infrastructure and propel your business towards unparalleled efficiency and success!