September 22, 2022
This article is sponsored by AWS and is part of my AWS Series.
In a previous post, we learned how to process messages coming to Amazon SQS using Lambda Functions written in .NET.
We have messages coming into the queue, the Lambda function gets triggered, processes the message, and successfully removes the message from the queue. Great!
But what happens when an exception occurs while processing the message in the Lambda Function?
Maybe it’s an unhandled code path, a null reference exception, or an external third-party integration service that is down.
In this post, you’ll learn:
- What happens on Exception in Processing Messages?
- Handle Retries and Configure Dead-letter Queue (DLQ)
- ReDrive Messages from Dead-letter Queue
- Enabling Batch Error Processing
We first need to understand what happens when there is an exception in processing a message from SQS.
To see the exception handling in action, let’s update the Lambda Function handler to simulate an exception.
private async Task ProcessMessageAsync(SQSEvent.SQSMessage message, ILambdaContext context)
{
    context.Logger.LogInformation($"Processed message {message.Body}");

    // Simulate a processing failure for any message whose body contains "Exception"
    if (message.Body.Contains("Exception"))
        throw new Exception("Message cannot be processed");

    await Task.CompletedTask;
}
The above code throws an exception anytime the message body contains the word ‘Exception’.
This makes it easy to simulate the exceptions we need when sending messages and see what happens. To see the exception handling in action, send a message to the queue with the word ‘Exception’.
The Lambda Function will throw an error as soon as it picks up the message for processing.
When an exception happens during processing, the Lambda Function does not acknowledge the successful processing of the message.
The message is not deleted from the Queue by the Lambda Function. After the default VisibilityTimeout period of the Queue (30 seconds), the message is made available in the Queue for processing again.
In this case, where we have simulated an exception based on the message body, this message will always throw an exception (with the current Function code). The message gets retried every 30 seconds.
In real-world scenarios, if the exceptions in processing are caused by transient errors (ones that go away after some time), the message will eventually be processed. For example, if the Function talks to a third-party service that is currently down, the service might be back up when we try again after 30 seconds. In this case, the message will be successfully processed and removed from the queue.
However, if the exception is due to a null reference exception or unhandled code paths, it will continue to throw an error and be retried.
If a lot of messages in the queue keep erroring out, they create a backlog. This can delay the processing of new messages coming into the Queue.
So ideally, you want the messages to be retried a few times and then set aside for manual intervention.
A Dead-letter Queue (DLQ) lets you isolate and handle problematic messages manually.
Under the Queue configuration, navigate to the Dead-letter queue configuration to set it up.
You must first create a queue to designate as the DLQ. I have created one from the SQS Console, named user-data-dlq. Click the Edit button to update the configuration.
Specify the Queue ARN of the DLQ and the Maximum receives value.
The Maximum receives value determines the number of times a message will be delivered for processing. If the ReceiveCount for a message exceeds this value, SQS moves the message to the associated DLQ.
With the maximum set to 2, the message moves to the DLQ once its ReceiveCount is more than 2. So the message will get delivered for processing twice: the first time when it comes into the Queue, and then once more after 30 seconds (the default VisibilityTimeout) following an exception in processing.
Any existing messages in the Queue with a ReceiveCount more than the maximum are automatically moved to the DLQ.
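For reference, the console settings described here correspond to the queue's RedrivePolicy attribute, a small JSON document with a deadLetterTargetArn and a maxReceiveCount. The sketch below only builds that JSON using System.Text.Json. The RedrivePolicyJson helper is a hypothetical name introduced for illustration, writing maxReceiveCount as a string is an assumption, and applying the value to the queue (for example through the SetQueueAttributes API) is left out.

```csharp
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper (not part of the AWS SDK): builds the JSON value
// for a queue's "RedrivePolicy" attribute.
public static class RedrivePolicyJson
{
    public static string Build(string deadLetterQueueArn, int maxReceiveCount)
    {
        if (string.IsNullOrWhiteSpace(deadLetterQueueArn))
            throw new ArgumentException("A DLQ ARN is required.", nameof(deadLetterQueueArn));
        if (maxReceiveCount < 1)
            throw new ArgumentOutOfRangeException(nameof(maxReceiveCount));

        // Assumption: maxReceiveCount is written as a string value here.
        return JsonSerializer.Serialize(new
        {
            deadLetterTargetArn = deadLetterQueueArn,
            maxReceiveCount = maxReceiveCount.ToString(CultureInfo.InvariantCulture)
        });
    }
}
```

Calling RedrivePolicyJson.Build with the ARN of user-data-dlq and a value of 2 would give the policy matching the configuration in this post.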
You can inspect this message from the DLQ and decide what action to take next. Based on the message content, the application, and the business processes associated with the messages, you can choose to ignore them (delete from DLQ), reprocess the message later, or fix the Lambda Function and then retry the message again.
Moving messages out of a Dead-letter Queue so they can be processed again is referred to as Redrive.
SQS allows easily moving unprocessed/error messages back to the source or other custom destination queues. This helps you to stay on top of the messages that are not getting processed and retry them.
Select the Message destination to redrive the messages. You can redrive the messages to the source queue or choose a custom destination.
You can also specify the Velocity control settings to redrive messages. It controls the rate at which messages are moved from the Dead-letter queue to the destination queue.
Optionally, you can inspect messages in the queue and pick and choose the messages to redrive. This is useful if you need to filter and process messages.
Selecting the ‘DLQ redrive’ button sends the messages back into the destination queue, where they are available for processing again. Each message follows the same execution flow as when it came in the first time.
On successful processing, it gets removed from the queue. If it continues to throw an exception, it will eventually be moved back to the DLQ (based on the maximum receive count setting).
Any time the Batch Size property is set to more than 1, the SQSEvent can have multiple records. This means that when looping through and processing the messages, they can fail independently.
For example, if you receive ten messages, seven might get processed successfully, but three might fail.
Now whether you fail the full batch on the first exception or loop through all of the messages and continue processing even if one fails is something you can control in your function code.
To fail the entire batch on the first exception, you only need to bubble up the exception as it occurs, and the code will stop processing more messages. The current Function code stops processing on the first exception.
After the VisibilityTimeout, all the messages in the batch are made available again in the queue for reprocessing.
If you decide to continue processing the messages even after one message throws an error, you can handle the exception at the individual processing level.
However, in these cases, we must tell SQS which messages were processed successfully and which failed to process.
The SQSBatchResponse type signals the failed messages in a batch back to SQS.
public async Task<SQSBatchResponse> FunctionHandler(SQSEvent evnt, ILambdaContext context)
{
    var response = new SQSBatchResponse
    {
        BatchItemFailures = new List<SQSBatchResponse.BatchItemFailure>()
    };
    context.Logger.LogInformation($"Received Message of count {evnt.Records.Count}");

    foreach (var message in evnt.Records)
    {
        try
        {
            await ProcessMessageAsync(message, context);
        }
        catch (Exception e)
        {
            context.Logger.LogError(e.Message);
            response.BatchItemFailures.Add(
                new SQSBatchResponse.BatchItemFailure { ItemIdentifier = message.MessageId });
        }
    }

    return response;
}
The above code handles exceptions from the ProcessMessageAsync function responsible for processing each record message. On an exception in processing, it records the MessageId of the message that failed processing as a BatchItemFailure in the response.
Once all messages in the batch are processed, the Lambda Function returns the batch response, which contains only the IDs of the messages that failed to process.
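As a rough illustration of what the handler hands back to Lambda, the batch response serializes to a JSON object with a batchItemFailures array, where each entry carries an itemIdentifier. The sketch below reproduces that shape with System.Text.Json. BatchItemFailureSketch, BatchResponseSketch, and BatchResponseJson are hypothetical stand-ins defined only so the example runs without the Lambda packages; they are not the real SQSBatchResponse types.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-ins for SQSBatchResponse.BatchItemFailure and SQSBatchResponse,
// used here only to show the JSON shape of a partial batch response.
public record BatchItemFailureSketch(
    [property: JsonPropertyName("itemIdentifier")] string ItemIdentifier);

public record BatchResponseSketch(
    [property: JsonPropertyName("batchItemFailures")]
    IReadOnlyList<BatchItemFailureSketch> BatchItemFailures);

public static class BatchResponseJson
{
    // Turns the IDs of the failed messages into the partial batch response JSON.
    public static string FromFailedIds(IEnumerable<string> failedMessageIds) =>
        JsonSerializer.Serialize(new BatchResponseSketch(
            failedMessageIds.Select(id => new BatchItemFailureSketch(id)).ToList()));
}
```

An empty batchItemFailures list tells Lambda that the whole batch succeeded, so the handler should only add the messages it could not process.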
Reporting batch item failures must be enabled on the Lambda Trigger Configuration for this code to work. You can enable it by editing the SQS trigger of the Lambda Function and turning on the ‘Report batch item failures’ option.
Once enabled and the new function code deployed, we can successfully process messages in batches. Only messages that fail to process from the batch will be moved to the DLQ.
If you want to simulate and test the batch item failures, you can use the below LINQPad script to send bulk messages.
async Task Main()
{
    var sqsClient = new AmazonSQSClient();

    // Every 5th message body contains "Exception" so that it fails in the Lambda Function
    var batchRequest = Enumerable.Range(1, 100)
        .Select(e => new SendMessageBatchRequestEntry(
            Guid.NewGuid().ToString(),
            $"Test {(e % 5 == 0 ? "Exception" : "")} {e}"))
        .ToList();

    // Batch is an extension method from the MoreLINQ package
    await Task.WhenAll(
        batchRequest
            .Batch(10)
            .Select(async batch => await sqsClient
                .SendMessageBatchAsync("QUEUE-URL", batch.ToList())));
}
The above code sends 100 messages in batches of 10. Every 5th message in the list will trigger an exception.
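If you want to inspect the test data before sending anything, the sketch below rebuilds the same message bodies locally and groups them into batches of 10. It assumes .NET 6 or later, where the built-in Enumerable.Chunk can stand in for the Batch call in the script. TestMessages is a hypothetical helper name, and nothing here talks to SQS.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: reproduces the message bodies from the script
// without calling SQS, so the test data can be checked locally.
public static class TestMessages
{
    // Every 5th body contains the word "Exception".
    public static List<string> Bodies(int count) =>
        Enumerable.Range(1, count)
            .Select(e => $"Test {(e % 5 == 0 ? "Exception" : "")} {e}")
            .ToList();

    // Enumerable.Chunk (built in since .NET 6) splits the sequence
    // into arrays of at most `size` items.
    public static List<string[]> InBatches(IEnumerable<string> bodies, int size) =>
        bodies.Chunk(size).ToList();
}
```

With 100 bodies and a size of 10, this gives 10 batches, and 20 of the 100 bodies contain ‘Exception’, which are the messages you should expect to end up in the DLQ.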
In the CloudWatch logs, you will notice the messages containing ‘Exception’ being retried, and they will eventually get moved to the Dead-letter Queue.
I hope this helps you understand how to handle exceptions when consuming messages from SQS in a .NET application.