AWS Lambda in Python

Tutorial: Coding an AWS Lambda function in Python that stores data in S3, sends email using AWS SES, and reads settings from AWS SSM Parameter Store

In this tutorial, we will code an AWS Lambda function in Python that checks Hacker News and, based on preferences in AWS Systems Manager Parameter Store, stores data in S3 and sends a notification email using SES. For this article we assume that you already:

  • have the AWS CLI installed and correctly configured;
  • have created an S3 bucket;
  • have Python 3, Boto3, and pip3 installed on your system;
  • have done the first steps in creating a Lambda in AWS (default settings);
  • have some basic knowledge of AWS and coding in Python.

In this article we will create a Python-based AWS Lambda function that uses several dependencies, which will all be packaged together with the Lambda code itself into a deployment package that can be installed in AWS Lambda. In this tutorial we will do this manually, and we will also manually create the policies and an execution role for the Lambda.

In the real world we would of course use something like CloudFormation, SAM (Serverless Application Model), the CDK (AWS Cloud Development Kit), or Terraform to create these things. But in this tutorial we will do things manually.

To start, we will create a deployment package containing both the code and the dependencies for the AWS Lambda function you created earlier. Open a console in your working directory and enter the following commands:

pip3 install --target ./package beautifulsoup4
pip3 install --target ./package requests
pip3 install --target ./package --upgrade boto3

This creates a subdirectory called package with the packages named in the commands installed inside it. Notice the --upgrade flag for boto3. While creating this tutorial I noticed that some sub-packages are shipped by multiple packages; to make sure boto3 is correctly installed I use this flag so boto3 gets the correct versions of these sub-packages. Go inside the directory and create a file with the exact name:

lambda_function.py
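
At this point the package directory should look roughly like this; the exact contents depend on the package versions pip resolves, and only lambda_function.py is created by hand:

package/
    lambda_function.py
    bs4/
    requests/
    boto3/
    botocore/
    ... (other installed dependencies)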

Now we will do some coding in this file. First we need some imports for logging, working with dates, creating CSV files, communicating with AWS, and making HTTP requests. Next we import BeautifulSoup from the bs4 package, which we will use for parsing the HTML page of Hacker News. Finally we import StringIO from the io module, which gives us an in-memory buffer to build the CSV file in.

import logging
from datetime import date
import csv
import boto3
import requests
from bs4 import BeautifulSoup
from io import StringIO

Since we want to do some logging, and also want to make sure the log messages are flushed in time, we configure this in our Python script:

# In the Lambda runtime the root logger already has a handler,
# so we only adjust the level; when running locally we configure
# one ourselves. Keeping a reference to the logger lets us do
# some manual flushing later on.
if len(logging.getLogger().handlers) > 0:
    logging.getLogger().setLevel(logging.INFO)
else:
    logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()

We now have a logger that we can use to log messages. Next we will create a function that:

  • connects to the Hacker News webpage (https://news.ycombinator.com/) and obtains its contents;
  • parses the items using BeautifulSoup and, in case the title of an item contains the interesting text piece, stores only the title and link of that item in a list of items to be returned.

The listing of this can be seen below:

def get_data_from_rss(interesting):
    URL = "https://news.ycombinator.com/"

    # Some headers were obtained while doing a request to the
    # site and using the inspector of the browser (Firefox in this case)
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Max-Age': '7200',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0'
    }

    # Pass the headers as a keyword argument; the second positional
    # argument of requests.get() is params, not headers
    response = requests.get(URL, headers=headers)

    xmlSoup = BeautifulSoup(response.content, 'html.parser')

    # Every story title on the front page sits in a .titleline element
    links = xmlSoup.select(".titleline > a:nth-of-type(1)")

    data_items = []

    for item in links:
        title = item.get_text()
        if interesting in title:
            data_item = {}
            data_item['title'] = title
            data_item['link'] = item["href"]
            data_items.append(data_item)
    return data_items
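
If you want to try this function locally before deploying, you can call it directly from the same file; the keyword "Arduino" is just an example value here:

if __name__ == '__main__':
    # Local smoke test only; in AWS the runtime calls lambda_handler instead
    for item in get_data_from_rss("Arduino"):
        print(item['title'], '->', item['link'])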

Once we have these items, we write them to a CSV file in the S3 bucket and send a notification email:

def process_rss_items(rss_items):
    BUCKET_NAME = 'test-bucket-anchormen1'

    # Prefix the filename with today's date in YYYYMMDD format
    PREFIX = date.today().strftime("%Y%m%d")

    s3 = boto3.resource('s3')

    filename = PREFIX + '_interesting_links.txt'
    # Create an S3 object for the CSV file we are about to write
    s3Object = s3.Object(BUCKET_NAME, filename)

    # create an in-memory file-buffer to write the CSV data to
    file_buffer = StringIO()

    w = csv.DictWriter(file_buffer, delimiter=";", fieldnames=['title', 'link'])
    w.writeheader()

    for rss_item in rss_items:
        w.writerow(rss_item)

    s3Object.put(Body=file_buffer.getvalue())

    # discard the buffer
    file_buffer.close()

    # send notification-mail
    send_notification_email(filename)

Now we make a short return to the command line. We will execute the command below to create a setting in the AWS Systems Manager Parameter Store containing a keyword that we want the Lambda to check titles for. In this case we create a parameter named /testpythonlambda/interesting containing the value "Arduino":

aws ssm put-parameter --name "/testpythonlambda/interesting" --value "Arduino" \
    --allowed-pattern ".{1,300}" --type String --overwrite
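
If you want to double-check that the parameter was created, you can read it back with:

aws ssm get-parameter --name "/testpythonlambda/interesting"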

Next we create a function that reads this newly created setting and returns its value. We do this with boto3, of course:

def get_interesting():
    session = boto3.Session(region_name='eu-west-1')
    ssm = session.client('ssm')
    interest = ssm.get_parameter(Name='/testpythonlambda/interesting', WithDecryption=False)
    return interest['Parameter']['Value']

We now create a function that, given a filename, sends an email using boto3 to an email address, notifying the recipient that new interesting topics were found on Hacker News.

def send_notification_email(filename):
    email_client = boto3.client("ses", region_name="eu-west-1")

    email_client.send_email(
        Destination={
            "ToAddresses": [
                "YOUR_EMAIL_ADDRESS",
            ],
        },
        Message={
            "Body": {
                "Text": {
                    "Charset": "UTF-8",
                    "Data": "Data was stored in S3 bucket in " + filename,
                }
            },
            "Subject": {
                "Charset": "UTF-8",
                "Data": "New interesting topics found in Hacker news",
            },
        },
        Source="YOUR_EMAIL_ADDRESS",
    )

To make this work, make sure that in Amazon SES you create an identity for the email address (or domain) you are mailing to. Once you create the identity you will receive an email with a verification link that you need to click (make sure it really is the verification email from AWS). The subject of the mail will look something like "Amazon Web Services – Email Address Verification Request in region".
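
Instead of using the console, you can also create the email identity from the command line; substitute your own address (this triggers the same verification mail):

aws ses verify-email-identity --email-address YOUR_EMAIL_ADDRESS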

To wrap it all up, we create the Lambda handler function:

def lambda_handler(event, context):
    logger.info('Start Hacker news notifier-lambda\n\n')
    logger.handlers[0].flush()
    interesting = get_interesting()
    rss_items = get_data_from_rss(interesting)
    process_rss_items(rss_items)
    logger.info('\n\nStopping Hacker news notifier-lambda')
    logger.handlers[0].flush()

Now we can create a ZIP file from the contents of the package directory. Make sure the Python file ends up at the root of the ZIP file, not inside a subdirectory. This ZIP file is our deployment package.
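
On Linux or macOS this can be done with the zip utility; the archive name deployment-package.zip is just the one used in this tutorial:

cd package
zip -r ../deployment-package.zip .
cd ..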

You can now go to the AWS Lambda function you created earlier and upload the deployment package. Make sure you change the default timeout of the Lambda. When creating this tutorial I set the timeout to 5 minutes, just in case.

[Screenshot: setting the Lambda timeout in the AWS console]
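
If you prefer the command line over the console, the upload and the timeout change can also be done with the AWS CLI; the function name testpythonlambda is an assumption here, so substitute the name of your own Lambda:

aws lambda update-function-code --function-name testpythonlambda \
    --zip-file fileb://deployment-package.zip
aws lambda update-function-configuration --function-name testpythonlambda \
    --timeout 300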

Finally we can go to AWS IAM and create a separate execution role and the required policies for this Lambda. Just to test things out, the policies I created and associated with that execution role looked like this:

[Screenshot: the execution role and its policies in AWS IAM]

Basically I created three inline policies for:

  • allowing the GetParameter action, to access the interest setting in Systems Manager Parameter Store (a sketch of such a policy is shown after this list);
  • writing to CloudWatch Logs, with the same rights you get when an initial execution role is created for a Lambda;
  • sending email using SES.
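
As an illustration, the inline policy for reading the parameter could look roughly like the following; the role name, policy name, and account id are placeholders, and you should adjust the region and parameter path to your own setup:

aws iam put-role-policy --role-name YOUR_LAMBDA_ROLE \
    --policy-name AllowGetInterestingParameter \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": "ssm:GetParameter",
        "Resource": "arn:aws:ssm:eu-west-1:YOUR_ACCOUNT_ID:parameter/testpythonlambda/interesting"
      }]
    }'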

For a bit of diversity in this tutorial I created a customer managed policy that allows the Lambda to write to the specified bucket. For the same reason I also attached the AWS managed policy named AWSLambdaBasicExecutionRole to allow the Lambda function to be executed.

Keep AWS best practices in mind when creating policies: narrow the permissions as much as possible, allowing the Lambda to do only what it needs to do.

Now everything should work. If you return to AWS Lambda and trigger the Lambda (for example with the CLI command shown after this list), it should:

  • get the contents of Hacker News;
  • get from the SSM Parameter Store what is interesting by accessing the setting;
  • parse the contents, check for interesting items, and return them;
  • store those items in an S3 bucket;
  • send an email to notify that something interesting was found, mentioning the name of the file it is stored in.
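
A quick way to trigger the Lambda from the command line (again assuming the function is named testpythonlambda) is:

aws lambda invoke --function-name testpythonlambda out.json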

The Lambda function can easily be scheduled periodically (for example daily or weekly) using Amazon EventBridge, but this is left as an exercise for the reader (I have tried to keep this tutorial short). You can download the full listing of the script here.
