On-the-fly image resizing with AWS Lambda, S3 & CloudFront
How to lazily process 300k images and avoid the time-consuming and costly batch processing of every single asset.
I've been working (on the side) lately on a redesign for the first SaaS I've built, a handmade marketplace for Romania's market. At the time of writing, the marketplace has close to 40k active products for sale, not including the drafts, products pending approval, or the sold ones.
The new design features larger images, displays thumbnails with different proportions, and uses the "srcset" attribute to serve responsive images.
The old logic works like this: the user uploads product photos > the app server creates predefined image sizes using GD Library > uploads into an S3 bucket. It does the job, but wastes server resources.
There are about 300k images to resize. Running a batch process to resize all of them to the new dimensions would be time-consuming and quite costly.
Why process and store assets that may never be accessed by users? I'm going to process them on the fly (lazily): a resized image will only be created when a user requests that specific size.
Architecture overview
1) The user requests a resized image via CloudFront (if the asset exists, it is returned and everything stops here)
2) CloudFront requests the asset from the S3 bucket (using a website endpoint)
3) Because the asset doesn't exist, the browser is temporarily redirected (307) to the API Gateway endpoint
4) API Gateway triggers the Lambda function
5) The Lambda function downloads the original image from the S3 bucket, resizes it, and uploads the resized image back into the bucket with the requested key
6) The API Gateway returns a permanent redirect (301) to the CloudFront URL of the newly created asset
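To make step 6 concrete, here is a minimal sketch of the response the Lambda function could hand back to API Gateway. The helper name `redirectToCdn` is hypothetical and not taken from the original function; it only assumes that the CloudFront base URL and the requested key are known at that point.

```javascript
// Hypothetical helper: builds the Lambda proxy response for step 6.
// `cdnUrl` stands for the CloudFront base URL, `key` for the S3 key
// of the image that was just created.
function redirectToCdn(cdnUrl, key) {
  // Drop trailing slashes on the base and leading slashes on the key
  // so that exactly one "/" joins them.
  const base = cdnUrl.replace(/\/+$/, "");
  const path = key.replace(/^\/+/, "");
  return {
    statusCode: 301, // permanent redirect back to the CDN
    headers: { Location: `${base}/${path}` },
    body: "",
  };
}

const response = redirectToCdn(
  "https://d1p41knyaaw8le.cloudfront.net",
  "produs_227613/360x270/img.AZ70KuRzw9.jpg"
);
console.log(response.headers.Location);
```

The browser follows the Location header, and this time the asset exists in S3, so CloudFront can serve it.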
The Setup
IAM User
We'll start by creating an IAM user that will allow our Lambda function to put the resized images on S3.
Log in to the AWS account and go to Identity & Access Management (IAM)
Go to Users and click "Add users" > pick a username that's easy to remember > check "Programmatic access"
Click "Next" to go through to the Permissions page, click "Attach existing policies directly", and select AmazonS3FullAccess*
View and copy the API Key & Secret to a temporary place; we'll need them for the Lambda config.
* Only use the AmazonS3FullAccess policy for a quick demonstration setup. In a production environment, I recommend following the principle of least privilege and setting more restrictive access. The Lambda function only requires the s3:GetObject & s3:PutObject permissions on our specific S3 bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::crafty-products/*"
    }
  ]
}
Lambda & API Gateway
Using Serverless Framework
I'm using the Serverless framework to build and deploy my services, but it can also be done directly from the AWS Console.
If you're using Serverless this is the Lambda function config from the serverless.yml file:
# ...
functions:
  # on the fly image resizer function
  resizer:
    handler: src/resizer.handler
    memorySize: 1024
    timeout: 15
    environment:
      AWS_KEY: ${self:custom.environment.AWS_KEY}
      AWS_SECRET: ${self:custom.environment.AWS_SECRET}
      BUCKET: ${self:custom.environment.PRODUCTS_BUCKET}
      CDN_URL: ${self:custom.environment.CDN_URL}
    events:
      - http:
          path: /resizer
          method: GET
# ...
When the service gets deployed with Serverless (sls deploy), it will create the Lambda function and the API Gateway that will listen for GET requests and trigger the Lambda to resize the image.
I go into more detail about Serverless configs & deployment in another post: how to automatically deploy Serverless applications on AWS using GitHub Actions.
Using AWS Console
If you're not using Serverless, everything can be done from the AWS Console, though it's quite cumbersome. Here are the steps:
Creating the Lambda function
1) Go to the Lambda Functions and click "Create function"
2) Select "Author from Scratch", enter the function name, select the Node.js runtime (14 or newer), and create the function.
3) Download the zip from GitHub; it contains the Lambda function
4) In the function's code tab, click "Upload from zip" in the top right and upload the function zip. Note that the Sharp module is quite large, so you won't be able to edit the function in the AWS Console. You can instead add Sharp as a Lambda Layer to keep the ability to update your function from the AWS Console editor.
5) In the function's configuration tab we'll add the environment variables
- AWS_KEY (the IAM user access key)
- AWS_SECRET (the IAM user access secret)
- BUCKET (the S3 bucket containing the images)
- CDN_URL (the CloudFront URL; we don't have it at this step, so we'll set it later)
6) In the function's configuration, in the general configuration tab, set the Memory to 1024MB and the Timeout to at least 15 seconds to give Lambda enough resources & time to process larger images. The default settings (128MB/3s timeout) will likely fail.
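As an illustration of how the environment variables from step 5 might be consumed, here is a small sketch that reads them and fails fast when one is missing. The `loadConfig` helper and the shape of the object it returns are assumptions made for this example, not the code from the GitHub repository.

```javascript
// Names of the environment variables listed in step 5.
const REQUIRED_VARS = ["AWS_KEY", "AWS_SECRET", "BUCKET", "CDN_URL"];

// Hypothetical helper: reads the settings from an env object
// (process.env by default) and throws if any of them is missing.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    accessKeyId: env.AWS_KEY,
    secretAccessKey: env.AWS_SECRET,
    bucket: env.BUCKET,
    cdnUrl: env.CDN_URL,
  };
}

// Example with placeholder values instead of the real process.env.
const config = loadConfig({
  AWS_KEY: "example-key",
  AWS_SECRET: "example-secret",
  BUCKET: "crafty-products",
  CDN_URL: "https://d1p41knyaaw8le.cloudfront.net",
});
console.log(config.bucket);
```

Failing at startup with a clear message is usually easier to debug than a resize that fails later because CDN_URL was never filled in.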
The Lambda function code
You can find the Lambda function code on GitHub
It reads the source image from the S3 bucket as a stream, pipes it to Sharp, then writes the resized version as a stream back to the S3 bucket. It uses Node.js streams to avoid loading large amounts of data into the Lambda function's memory, and Sharp, a high-performance Node.js module for image processing (sharp.pixelplumbing.com).
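The streaming flow described above could look roughly like the sketch below. It is not the code from the repository: it assumes an AWS SDK v2 style client (`getObject(...).createReadStream()` and `upload(...).promise()`) and the `sharp` module, and both are passed in as arguments so the sketch has no hard dependency on either package. The function and parameter names are made up for illustration.

```javascript
const { PassThrough } = require("stream");

// Sketch of a streaming resize. `s3` is assumed to be an AWS SDK v2
// S3 client and `sharp` the sharp module; both are injected.
function resizeStream({ s3, sharp, bucket, sourceKey, targetKey, resize }) {
  // Read the original image as a stream instead of buffering it.
  const source = s3
    .getObject({ Bucket: bucket, Key: sourceKey })
    .createReadStream();

  // sharp() without an input is a duplex stream:
  // original bytes go in, resized bytes come out.
  const transformer = sharp().resize(resize);

  // The upload consumes this stream while sharp is still writing to it.
  const output = new PassThrough();
  const upload = s3
    .upload({ Bucket: bucket, Key: targetKey, Body: output })
    .promise();

  // Forward errors so the upload does not wait forever on a failed read.
  source.on("error", (err) => output.destroy(err));
  transformer.on("error", (err) => output.destroy(err));

  source.pipe(transformer).pipe(output);
  return upload; // resolves once the resized image is stored
}
```

A call might look like `resizeStream({ s3, sharp, bucket, sourceKey, targetKey, resize: { width: 640 } })`, where `sourceKey` and `targetKey` are derived from the requested key.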
The function has a set of predefined image sizes; adjust them to your needs or remove the check entirely. The goal is to prevent malicious users from increasing the AWS bill by generating images that won't ever be used.
//
const s3 = new AWS.S3(),
  sizes = [
    "360x270", // larger thumb restricted
    "480h", // height restricted
    "640w",
    "1280w", // largest web image
  ];
//
The Lambda function will be triggered with a GET request like this:
... ?key=produs_227613/360x270/img.AZ70KuRzw9.jpg
The key query string parameter contains the S3 path, the desired output size, and the original asset filename.
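A minimal sketch of splitting that key into its parts follows. The `parseKey` helper is hypothetical, and it rests on one assumption that the post does not state explicitly: the size is always the second-to-last path segment, and the original image lives at the same path without that segment.

```javascript
// Hypothetical parser for the `key` query string parameter.
// Assumption: <path>/<size>/<filename>, with the original image
// stored at <path>/<filename>.
function parseKey(key) {
  const parts = key.split("/");
  if (parts.length < 3) {
    throw new Error(`Unexpected key format: ${key}`);
  }
  const filename = parts.pop(); // last segment
  const size = parts.pop(); // second-to-last segment
  const path = parts.join("/"); // everything before the size
  return { path, size, filename, originalKey: `${path}/${filename}` };
}

const parsed = parseKey("produs_227613/360x270/img.AZ70KuRzw9.jpg");
console.log(parsed.size, parsed.originalKey);
```

In a Lambda proxy integration the key itself would typically come from `event.queryStringParameters.key`.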
The sizes are of 3 types:
- width x height - e.g. 360x270 will generate a thumbnail constrained to 360px wide and 270px high
- width only - e.g. 640w will generate an image 640px wide and the height will be automatically adjusted to maintain proportions
- height only - e.g. 640h will generate an image 640px high and the width will be automatically adjusted to maintain proportions
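The three size types can be mapped to resize options with a couple of regular expressions. The sketch below is one possible way to do it, using a hypothetical `toResizeOptions` helper; the returned object uses the `{ width, height }` shape that Sharp's `resize()` accepts, and the allowlist repeats the predefined sizes from the snippet above.

```javascript
// Predefined sizes, repeated from the list shown earlier in the post.
const ALLOWED_SIZES = ["360x270", "480h", "640w", "1280w"];

// Hypothetical helper: turns a size token into { width, height } options.
// Returns null for sizes that are malformed or not on the allowlist.
function toResizeOptions(size, allowed = ALLOWED_SIZES) {
  if (!allowed.includes(size)) return null;

  // "360x270": both dimensions are constrained.
  const both = /^(\d+)x(\d+)$/.exec(size);
  if (both) return { width: Number(both[1]), height: Number(both[2]) };

  // "640w" or "480h": one dimension, the other is derived by the resizer.
  const single = /^(\d+)([wh])$/.exec(size);
  if (!single) return null;
  return single[2] === "w"
    ? { width: Number(single[1]) }
    : { height: Number(single[1]) };
}

console.log(toResizeOptions("360x270"), toResizeOptions("640w"));
```

Returning null for anything outside the allowlist lets the handler reject the request before touching S3, which is what keeps arbitrary sizes from being generated.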
Creating the API Gateway
1) Go to the API Gateway and click "Create API"
2) We need an HTTP API, so click "Build" on that
3) For the integrations select Lambda and the name of the function we just created, then give the new API a name and click "Next"
4) For the routes, we'll use the GET method and route the "/resizer" path to our Lambda integration, then click "Next"
5) I'm using "prod" as the stage name in the samples below; you can name the stage any way you want
6) Create the API and copy the Invoke URL somewhere; we'll need it for S3
The Lambda function will now be triggered by the Invoke URL + route, e.g.
https://bm8w4yiv1b.execute-api.us-east-1.amazonaws.com/prod/resizer
S3
1) Enable "Static website hosting"
Open the AWS console, go to S3 > select the bucket hosting the images > Properties > Static website hosting.
Copy the bucket website endpoint somewhere; we'll need it for the CloudFront settings.
2) Redirection rules
This rule temporarily redirects any request for an S3 object that doesn't exist to the API Gateway URL, passing the S3 key as a query string. Make sure you replace the hostname with yours.
[
  {
    "Condition": {
      "HttpErrorCodeReturnedEquals": "404",
      "KeyPrefixEquals": ""
    },
    "Redirect": {
      "HostName": "k28ihy1li5.execute-api.us-east-1.amazonaws.com",
      "HttpRedirectCode": "307",
      "Protocol": "https",
      "ReplaceKeyPrefixWith": "prod/resizer?key="
    }
  }
]
In other words, when a visitor tries to load a product thumbnail with this URL:
https://d1p41knyaaw8le.cloudfront.net/produs_227613/360x270/img.AZ70KuRzw9.jpg
If the thumbnail doesn't exist in S3, the browser will get redirected to the API gateway URL:
https://k28ihy1li5.execute-api.us-east-1.amazonaws.com/prod/resizer?key=produs_227613/360x270/img.AZ70KuRzw9.jpg
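To see why the key ends up in the query string, here is a simplified model of how the redirect target is assembled from the routing rule. This is only an illustration of the prefix replacement, not S3's actual implementation, and the function name is made up.

```javascript
// Simplified model of the routing rule: the matched key prefix
// (empty in our rule) is swapped for ReplaceKeyPrefixWith.
function redirectLocation(rule, key) {
  const { KeyPrefixEquals = "" } = rule.Condition;
  const { Protocol, HostName, ReplaceKeyPrefixWith } = rule.Redirect;
  if (!key.startsWith(KeyPrefixEquals)) return null; // rule does not apply
  const rest = key.slice(KeyPrefixEquals.length);
  return `${Protocol}://${HostName}/${ReplaceKeyPrefixWith}${rest}`;
}

const rule = {
  Condition: { HttpErrorCodeReturnedEquals: "404", KeyPrefixEquals: "" },
  Redirect: {
    HostName: "k28ihy1li5.execute-api.us-east-1.amazonaws.com",
    HttpRedirectCode: "307",
    Protocol: "https",
    ReplaceKeyPrefixWith: "prod/resizer?key=",
  },
};

const location = redirectLocation(rule, "produs_227613/360x270/img.AZ70KuRzw9.jpg");
console.log(location);
```

Because the prefix condition is an empty string, every missing key matches, and the whole key is appended after "prod/resizer?key=".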
3) Bucket Policy
Open the bucket "Permissions" tab and allow the GetObject action on any asset in your bucket (update the resource ARN accordingly)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AddPerm",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::crafty-products/*"
    }
  ]
}
CloudFront
Using a CDN to serve static assets is a no-brainer, so we'll create a new CloudFront distribution (if we don't have one already). There are 2 main things to pay attention to when setting up CloudFront for lazy image processing.
Origin Domain
Enter the S3 bucket website endpoint URL; don't select one of the suggested S3 options from the dropdown list. It should look something like this:
crafty-products.s3-website-eu-west-1.amazonaws.com
Behaviors
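Having CloudFront in front of S3 brings one extra challenge when requesting an image that is not found in S3: the response for the missing image is cached, so when the Lambda redirects the browser back to CloudFront it receives the same cached response again, and the user ends up in a redirect loop.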
We need to define a behavior for all the image types that will be resized, in this case, jpg, jpeg & png.
Lambda config
Once the CloudFront distribution is deployed:
- if you created the Lambda using the AWS Console, go back into the Lambda config and update the CDN_URL environment variable
- if you used Serverless, update the CDN_URL variable and deploy the function again:
sls deploy -f resizer
That's all. It seems more complicated than it really is.
I wrote this post to illustrate the pattern and explain the mechanics of on-the-fly image processing. AWS has a Serverless Image Handler solution available which can be deployed via an AWS CloudFormation template.