How to model relational data in DynamoDB

Understanding DynamoDB single table design, one-to-many and many-to-many relationships.

UpdatedMarch 21, 2022

How to model relational data in DynamoDB

An architect by trade, practicing architecture in the Cloud. 7x AWS Certified, AWS Community Builder, Serverless advocate, Vue enthusiast.

Part of series

Comments (4)

Join the discussion

Drew Davis1y ago

I wrote a npm package, dyna-record, which abstracts these design concepts. I’m interested to know what you think of it. I’m apparently not allowed to share links here but I wrote a blog title Unlock Relational Data Modeling in DynamoDB with dyna-record on medium

Mohit Yadav4y ago

Great to learn something! I'm new to DynamoDB and still confused why you prepended project# and monitor# to some GSIs, this would increase the redundancy and also unintentionally increase storage space.

Also, while querying, why do you query both, the primary key and the GSI, even though the schema says they contain the same data.

I might be totally getting the concept of GSIs wrong. Pardon me for that.

Thanks for the awesome and in-depth article 🎉

Stefan Olaru4y ago

Hey Mohit Yadav, good question.

It's called index overloading. The table can hold many different kinds of data items at one time, hence we use generic names for primary key (PK), sort key (SK), and global secondary index (GSI).

Prefixing allows to identify the entity to be queried, e.g. user-related items will always have PK starting with user#, so on.

We can get the project details using GetItem, but to Query all projects of an org we'll have to use the overloaded GSI like GSI=org#1234 AND begins_with(SK, project#).

Mohit Yadav4y ago

Thanks for the in-depth answer, Stefan!

Honestly speaking, I understand most of this, but some concepts are still a bit rusty... Explaining them may be out of the scope of this article, so I'll definitely learn them when I get into NoSQL DBs. 😃

Pablo Curell Mompo4y ago

Hey :)

Good article explaining the how, but I am left curious about the why :/

Why would you use a dynamoDB for relational data in lieu of just a relational database?

Stefan Olaru4y ago

Hey Pablo,

I picked DynamoDB because of the flexible schema and scalability, and it's also very Serverless-friendly. As a plus, the on-demand pricing worked better for my use case.

When a relational database makes more sense, use that by all means.

Pablo Curell Mompo4y ago

Stefan Olaru

Ok, thanks for answering :)

Mahavir Vataliya4y ago

Best Article to start with dynamoDB Database design.

Building Monithor

Part 1 of 3

Up next

Running cronjobs at scale in AWS

With Serverless, DynamoDB, SQS, and Lambda functions.

More from this blog

Streaming AI Responses on AWS Lambda

A clean serverless pattern for streaming real-time AI workflow responses with Server-Sent Events and AWS Lambda

Sep 19, 20255 min read

Debugging AWS SES templates

Emails not reaching the inbox due to template Rendering Failure

Feb 23, 20243 min read

Exporting RDS Snapshots to Athena using Step Functions

A secure, efficient, and hassle-free way to run data analytics queries

Nov 14, 20237 min read

Exporting RDS Snapshots to Athena using Step Functions

Run Lambda functions with fixed IP

How to run your AWS Lambda functions with a fixed IP up to 10x cheaper.

Mar 28, 20239 min read

Scalable webhook listeners in AWS

Using the API Gateway - SQS integration to create scalable webhook listeners and validate the requests at the API Gateway level

Mar 8, 20235 min read

Stefan Olaru

19 posts

Cloud Architect & Developer. I specialize in AWS Serverless solutions and Event-Driven architectures. Sharing my learnings from helping startups build, launch, scale, and navigate tech challenges.

I'm building monithor.co - an open-source web monitoring service built with Serverless on top of AWS. It's using DynamoDB as the database, a highly performant NoSQL database that's been increasingly popular for serverless applications over the past years.

After more than a decade of dealing mostly with relational databases, using DynamoDB requires looking at things from a different angle, unlike traditional SQL databases, it does not use table joins and other relational database constructs. However, we can model many common relational designs in a single DynamoDB table but the process is different.

DynamoDB Table Basics

Before getting deeper into things let's summarize the DynamoDB table basic concepts:

Partition (hash) key (PK) - defines in which partition the data is stored into
Sort (range) key (SK) - defines the sorting of items in the partition, and it's optional. You can query it by condition expression (=, >, <, between, begins_with, contains, in) combined with PK
Local secondary index (LSI) - allows defining a different sort key for PK. It can only be created when the table is created
Global secondary index (GSI) - allows defining a new partition and/or sort key. GSI is the central part of most design patterns in a single table design. Both LSI and GSI are implemented as copying all the data to a new table-like structure
Attributes - can be scalar (single value) or complex (list, maps, sets)

There are three ways of reading data from DynamoDB:

GetItem - retrieves a single item from a table. This is the most efficient way to read a single item because it provides direct access to the physical location of the item.
Query - retrieves all of the items that have a specific partition key. Within those items, you can apply a condition to the sort key and retrieve only a subset of the data. Query provides quick, efficient access to the partitions where the data is stored.
Scan - retrieves all of the items in the specified table. (This operation should not be used with large tables, because it can consume large amounts of system resources.

Key concepts when using DynamoDB

Know your access patterns in advance

You shouldn't start designing the DynamoDB schema until you know the questions it will need to answer. Understanding the business problems and the application use cases upfront is essential. Write down all the access patterns, here's what the access pattern list looks like for Monithor.

Screenshot 2022-03-14 at 07.19.04.png

Our table can grow quite big so we want to avoid Scan at all costs so we'll create our data model so we can extract what we need using Query and GetItem.

Single table design

You should maintain as few tables as possible in a DynamoDB application, we'll go with a single table design.

Yes, having all the data in a single table might look confusing and hard to read, but it offers a rich set of flexible and efficient design patterns, better utilization of resources, and easier configuration, capacity planning, and monitoring.

When using a single table, use generic attributes for the primary key like PK (partition key), SK (sort key), GS1PK (first global secondary index partition key), GS1SK (first global secondary index sort key).

Also, identify the type of the item by prefixing keys with type. e.g. project#1234 (project = type, 1234 = id)

Screenshot 2022-03-14 at 07.32.00.png

The DynamoDB single-table design requires a different mindset than relational database table design, I recommend using NoSQL workbench for DynamoDB for data modeling and visualization. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/workbench.html

Monithor's Table Design

a composite primary key (PK and SK)
a local secondary index with the sort key created_at - allows querying PK created within an interval of time
a local secondary index with the sort key nextrun_at - allows querying PK due to be executed within a time interval
a global secondary index (GSI) with the sort key set as table's sort key (SK)

Here's the CloudFormation template for our DynamoDB single table.

Resources:
    # DynamoDB
    ddbTable:
        Type: AWS::DynamoDB::Table
        # enable table retention on production use
        # DeletionPolicy: Retain
        Properties:
            TableName: ${self:custom.resourceName}
            AttributeDefinitions:
                - AttributeName: PK
                  AttributeType: S
                - AttributeName: SK
                  AttributeType: S
                - AttributeName: created_at
                  AttributeType: N
                - AttributeName: nextrun_at
                  AttributeType: N
            KeySchema:
                - AttributeName: PK
                  KeyType: HASH
                - AttributeName: SK
                  KeyType: RANGE
            LocalSecondaryIndexes:
                - IndexName: pk_created
                  KeySchema:
                      - AttributeName: PK
                        KeyType: HASH
                      - AttributeName: created_at
                        KeyType: RANGE
                  Projection:
                      ProjectionType: ALL
                - IndexName: pk_run
                  KeySchema:
                      - AttributeName: PK
                        KeyType: HASH
                      - AttributeName: nextrun_at
                        KeyType: RANGE
                  Projection:
                      ProjectionType: ALL
            GlobalSecondaryIndexes:
                - IndexName: GSI
                  KeySchema:
                      - AttributeName: GSI
                        KeyType: HASH
                      - AttributeName: SK
                        KeyType: RANGE
                  Projection:
                      ProjectionType: ALL

            # use the On Demand pricing
            BillingMode: PAY_PER_REQUEST
            # autodelete logs after specified time
            TimeToLiveSpecification:
                AttributeName: expires_at
                Enabled: TRUE

Relational data

Modeling relational data in DynamoDB requires reusing or overloading keys and secondary indexes for different purposes. Overly-simplified, the data model for Monithor looks as in the diagram below.

relations.drawio.png

Organization - the account that owns the projects & cloud subscription, an organization can have multiple projects (one-to-many)
Project - the monitors are organized into projects for easier monitoring and access control, a project can have multiple monitors (one-to-many), but a project also has and belongs to many users (many-to-many)
Monitor - the monitor will create a log entry with the result of each execution, so a monitor can have multiple logs (one-to-many)

One-to-Many relationship

Let's take aside the relationship between projects and monitors, a project can have multiple monitors. In the UI we'll have to display all the monitors of a project, so we'll have to create an API endpoint that will query all monitors by project ID.

We rely on the GSI global secondary index, for each monitor entry we'll populate the GSI with the project ID prefixed with the entity type project#.

Screenshot 2022-03-14 at 10.49.01.png

Then, when we query on the GSI index the key conditions will be GSI=project#1234 AND SK begins_with(monitor#) which will return the information we're looking for.

Screenshot 2022-03-14 at 10.52.35.png

The query code would look like this:

const monitors = await ddb
        .query({
            TableName: process.env.DDB_TABLE,
            IndexName: "GSI",
            KeyConditionExpression: "#GSI = :gsi AND begins_with(#SK, :prefix)",
            ExpressionAttributeNames: {
                "#GSI": "GSI",
                "#SK": "SK"
            },
            ExpressionAttributeValues: AWS.DynamoDB.Converter.marshall({
                ":gsi": "project#1234",
                ":prefix": "monitor#"
            }),
        });

And here's yet another example of a one-to-many relationship. We need to show a list of all projects within the organization dashboard UI. In this case, we'll fill the organization ID in the GSI of the project item.

Screenshot 2022-03-14 at 11.04.21.png

Querying the GSI with the key conditions GSI=org#1234 AND SK begins_with(project#) will return all the projects of that organization.

Screenshot 2022-03-14 at 11.03.43.png

Many-to-Many relationship

The many-to-many relationships may seem a bit difficult at first, in a relational database we sort it out with a relationship table and two JOIN statements, but in DynamoDB this is not possible.

On Monithor, the core many-to-many relationship is between projects & users. We want to have granular access control for every project, so a user can have different access levels on multiple projects. A project can have multiple users.

We rely on the table sort key to store the association between entities, we store a concatenated value of user ID & project ID plus making use of GSI.

Screenshot 2022-03-14 at 11.16.52.png

In the app UI, once the user logs in we'll want to display a list of the projects they have access to. We know the user ID from the login session so we'll have the query the table index with the following key conditions PK=user#1234 AND SK begins_with user#1234|project#

This will return all the project IDs the user#1234 has access to.

Screenshot 2022-03-14 at 11.35.53.png

The code will look as follows.

const monitors = await ddb
        .query({
            TableName: process.env.DDB_TABLE,
            KeyConditionExpression: "#PK = :pk AND begins_with(#SK, :prefix)",
            ExpressionAttributeNames: {
                "#PK": "PK",
                "#SK": "SK"
            },
            ExpressionAttributeValues: AWS.DynamoDB.Converter.marshall({
                ":pk": "user#1234",
                ":prefix": "user#1234|project#"
            }),
        });

There's another UI screen where we need to display a list of users that have access to a project. This time we'll query the GSI* with the following key conditions GSI=project#1234 AND SK begins_with user#

This will return all the user IDs that have access to project#1234.

Screenshot 2022-03-14 at 11.40.00.png

Resources

If you want to get deeper into DynamoDB I can't recommend enough Alex DeBrie's The DynamoDB Book.

Also, have a look at the Best Practices for Designing and Architecting with DynamoDB.

About

Monithor is a web service monitoring tool that helps you check websites or APIs' uptime and functionality and be alerted when they don't work as expected. It's built on AWS using the Serverless framework and it's open-source.

Command Palette