Amazon Web Services – Best Approach to Install GDAL for Rasterio in AWS Lambda Python Dockerfile

Tags: amazon-lambda, amazon-web-services, dependencies, python

The Python dependencies for an AWS Lambda application have exceeded the 250 MB size limit for zip-based Lambda deployments. One of these dependencies is rasterio, which depends on GDAL. I'm attempting to build a Docker image to get around the 250 MB limit and deploy our code to AWS Lambda (using serverless.com).

Approach 1: pip install rasterio

Currently I have a Dockerfile with:

FROM public.ecr.aws/lambda/python:3.10
RUN pip install rasterio

The build fails with:

WARNING:root:Failed to get options via gdal-config: [Errno 2] No such file or directory: 'gdal-config'
ERROR: A GDAL API version must be specified. Provide a path to gdal-config using a GDAL_CONFIG environment variable or use a GDAL_VERSION environment variable.
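
The message indicates that pip is building rasterio from source and cannot find gdal-config. As a minimal sketch for checking this inside the image, assuming the build consults the GDAL_CONFIG environment variable first and then PATH (as the message suggests), the helper below reports which gdal-config would be found. The function name find_gdal_config is hypothetical.

```python
import os
import shutil


def find_gdal_config(env=None):
    """Return the gdal-config path a source build could use, or None.

    Assumption: GDAL_CONFIG takes precedence, then PATH is searched,
    mirroring the wording of the error message.
    """
    env = os.environ if env is None else env
    # The error message names GDAL_CONFIG as an explicit override.
    explicit = env.get("GDAL_CONFIG")
    if explicit:
        return explicit
    # Otherwise look for an executable called gdal-config on PATH.
    # If env has no PATH entry, shutil.which falls back to the process PATH.
    return shutil.which("gdal-config", path=env.get("PATH"))
```

Running python -c "print(find_gdal_config())" in a RUN step (with the function pasted into a module) would print None in the failing image, which matches the warning above.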

Approach 2: yum install gdal-devel

tl;dr: "No package gdal-devel available."

Approach 3: build gdal

tl;dr: a lot of dependencies. I'm nervous that those dependencies will have dependencies of their own that also need to be built.

Approach 4: yum install epel-release then gdal-devel

  • Needs Fortran: yum -y install libgfortran worked, but it installed libgfortran.so.4
  • yum -y install gdal-devel still fails, e.g. "Error: Package: openblas-openmp-0.3.3-2.el7.aarch64 (epel) Requires: libgfortran.so.3(GFORTRAN_1.0)(64bit)"
  • I'm not certain the problem is having version 4 instead of version 3 of libgfortran, but I couldn't easily install libgfortran.so.3.

Approach 5: use aws/sam/build-python container

service: aws-python-docker-demo
frameworkVersion: "3"

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    usePipenv: true
    layer: true

provider:
  name: aws
  runtime: python3.10
  deploymentBucket:
    blockPublicAccess: true

functions:
  hello:
    handler: src/main.lambda_handler
    layers:
      - !Ref PythonRequirementsLambdaLayer

  • The serverless-python-requirements plugin seems to use the Docker container public.ecr.aws/sam/build-python3.10 to install the Python dependencies and zip them up for the Lambda
    • (which then fails because the Lambda's dependencies and code exceed the 250 MB size limit)
  • Plan:

    1. understand how serverless-python-requirements:
      1. installs python dependencies inside public.ecr.aws/sam/build-python3.10 container
      2. zips python dependencies (which will be > 250 MB)
    2. copy that zip into the docker image for the AWS lambda.
    3. … ?
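
Before going further with the plan above, it may help to measure how large the installed dependencies actually are. The following is a rough sketch only: the function names are hypothetical, and the limit constant interprets the 250 MB figure quoted in this question as 250 * 1024 * 1024 bytes.

```python
import os

# Assumption: the 250 MB limit from the question, read as 250 * 1024 * 1024 bytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def exceeds_limit(root, limit_bytes=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the directory is larger than the given limit."""
    return directory_size_bytes(root) > limit_bytes
```

Pointing directory_size_bytes at the directory where the dependencies were installed (for example, a site-packages folder) shows which side of the limit they fall on before any zipping or copying is attempted.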

I'm not sure this is a good approach; I'm sure there are better solutions. Any advice is welcome.

Update: regarding a new approach (no. 6) and in response to @Rob's kind answer.

Approach 6: Try to use an old gdal/lambda docker image

Work in progress is here, using https://hub.docker.com/r/remotepixel/amazonlinux-gdal/ . Next step: get this working, then iterate from there to:

  • update gdal
  • use latest lambda container
  • use python 3.10 (as required for our application)

Currently planning to re-answer / update the answer to this StackOverflow question: https://stackoverflow.com/questions/36772111/how-can-i-install-a-recent-version-of-gdal-on-amazon-linux#comment135429542_44907360

Currently erroring with:

{
    "errorType": "Runtime.InvalidEntrypoint",
    "errorMessage": "RequestId: 2cda4291-3b02-4079-8d59-f1ab111f8dab Error: exec: \"main.lambda_handler\": executable file not found in $PATH"
}
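
The wording of this error reads as if the container tried to execute the string "main.lambda_handler" as a program. That would be expected if the image's ENTRYPOINT is not a Lambda runtime that interprets the command as module.function. The AWS base images set this up; other base images generally need the Lambda runtime interface client (the awslambdaric package) as their entrypoint. Whether that is the cause here is an assumption. For reference, here is a minimal sketch of the module that "main.lambda_handler" would refer to; the response shape is illustrative only.

```python
import json


def lambda_handler(event, context):
    """Minimal handler, addressed by Lambda as <module>.<function>."""
    # Import inside the handler so a missing or broken rasterio install is
    # reported in the response rather than failing at module load time.
    # This is a debugging aid, not a recommendation for production code.
    try:
        import rasterio

        detail = {"rasterio": rasterio.__version__}
    except ImportError as exc:
        detail = {"import_error": str(exc)}
    return {"statusCode": 200, "body": json.dumps(detail)}
```

Saved as main.py in the function's working directory, this gives a quick way to confirm both that the entrypoint resolves the handler and that rasterio imports inside the image.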

Response to Rob's potential answer

When I run that it errors with the following:

$ cat Dockerfile2
FROM public.ecr.aws/lambda/python:3.10
RUN pip install rasterio
$ docker --version
Docker version 24.0.6, build ed223bc

macOS 12.7.2

$ docker build -t testing-run-api-dependencies-2 -f ./Dockerfile2 . --progress=plain --no-cache
#0 building with "desktop-linux" instance using docker driver

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile2
#2 transferring dockerfile: 101B done
#2 DONE 0.0s

#3 [internal] load metadata for public.ecr.aws/lambda/python:3.10
#3 DONE 1.1s

#4 [1/2] FROM public.ecr.aws/lambda/python:3.10@sha256:f95780930513037d252b6b6165720381a1014096c3be9f2eac620776c8f0d167
#4 CACHED

#5 [2/2] RUN pip install rasterio
#5 1.173 Collecting rasterio
#5 1.229   Downloading rasterio-1.3.9.tar.gz (411 kB)
#5 1.309      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 411.7/411.7 kB 5.5 MB/s eta 0:00:00
#5 1.406   Installing build dependencies: started
#5 8.663   Installing build dependencies: finished with status 'done'
#5 8.666   Getting requirements to build wheel: started
#5 8.934   Getting requirements to build wheel: finished with status 'error'
#5 8.939   error: subprocess-exited-with-error
#5 8.939   
#5 8.939   × Getting requirements to build wheel did not run successfully.
#5 8.939   │ exit code: 1
#5 8.939   ╰─> [3 lines of output]
#5 8.939       <string>:22: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
#5 8.939       WARNING:root:Failed to get options via gdal-config: [Errno 2] No such file or directory: 'gdal-config'
#5 8.939       ERROR: A GDAL API version must be specified. Provide a path to gdal-config using a GDAL_CONFIG environment variable or use a GDAL_VERSION environment variable.
#5 8.939       [end of output]
#5 8.939   
#5 8.939   note: This error originates from a subprocess, and is likely not a problem with pip.
#5 8.942 error: subprocess-exited-with-error
#5 8.942 
#5 8.942 × Getting requirements to build wheel did not run successfully.
#5 8.942 │ exit code: 1
#5 8.942 ╰─> See above for output.
#5 8.942 
#5 8.942 note: This error originates from a subprocess, and is likely not a problem with pip.
#5 8.947 
#5 8.947 [notice] A new release of pip is available: 23.0.1 -> 24.0
#5 8.947 [notice] To update, run: pip install --upgrade pip
#5 ERROR: process "/bin/sh -c pip install rasterio" did not complete successfully: exit code: 1
------
 > [2/2] RUN pip install rasterio:
8.942 error: subprocess-exited-with-error
8.942 
8.942 × Getting requirements to build wheel did not run successfully.
8.942 │ exit code: 1
8.942 ╰─> See above for output.
8.942 
8.942 note: This error originates from a subprocess, and is likely not a problem with pip.
8.947 
8.947 [notice] A new release of pip is available: 23.0.1 -> 24.0
8.947 [notice] To update, run: pip install --upgrade pip
------
Dockerfile2:2
--------------------
   1 |     FROM public.ecr.aws/lambda/python:3.10
   2 | >>> RUN pip install rasterio
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install rasterio" did not complete successfully: exit code: 1

Best Answer

Perhaps I am misunderstanding, but maybe this is something specific to your build machine/Docker version. When I try your Approach 1 above verbatim to build the container locally it succeeds:

$ cat Dockerfile
FROM public.ecr.aws/lambda/python:3.10
RUN pip install rasterio
$ docker build . --progress=plain --no-cache
#1 [internal] load build definition from Dockerfile
#1 sha256:d1ebfcb0fe353fccdb071e1d06a6f600aac8465c1fd5a4883664ca2701cb4bdc
#1 transferring dockerfile: 106B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:f43dde8419761808ee740a10052d2fccd8c242389cc1a3d9d1e8a894dec623b0
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for public.ecr.aws/lambda/python:3.10
#3 sha256:4d27f73d29144c07cb21fedb31129fd6d4bf13e6d609a2728602ed5805b8d5cf
#3 DONE 0.7s

#4 [1/2] FROM public.ecr.aws/lambda/python:3.10@sha256:f95780930513037d252b6b6165720381a1014096c3be9f2eac620776c8f0d167
#4 sha256:39fff7d5ce9d7fffbcb71cd9476e781934b22456974c40953be4ee60fbb44a02
#4 CACHED

#5 [2/2] RUN pip install rasterio
#5 sha256:e902f6dbbb4dd6e5c6fa56ded9099fd59b7ccccdc52a40348207aa252b773b32
#5 0.749 Collecting rasterio
#5 0.878   Downloading rasterio-1.3.9-cp310-cp310-manylinux2014_x86_64.whl (20.6 MB)
#5 3.194      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.6/20.6 MB 8.8 MB/s eta 0:00:00
#5 3.284 Requirement already satisfied: setuptools in /var/lang/lib/python3.10/site-packages (from rasterio) (65.5.1)
#5 3.333 Collecting click>=4.0
#5 3.357   Downloading click-8.1.7-py3-none-any.whl (97 kB)
#5 3.372      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 7.2 MB/s eta 0:00:00
#5 3.430 Collecting attrs
#5 3.454   Downloading attrs-23.2.0-py3-none-any.whl (60 kB)
#5 3.486      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 kB 1.8 MB/s eta 0:00:00
#5 3.534 Collecting cligj>=0.5
#5 3.559   Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
#5 3.592 Collecting click-plugins
#5 3.616   Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
#5 3.653 Collecting affine
#5 3.678   Downloading affine-2.4.0-py3-none-any.whl (15 kB)
#5 3.725 Collecting certifi
#5 3.748   Downloading certifi-2024.2.2-py3-none-any.whl (163 kB)
#5 3.770      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 163.8/163.8 kB 8.2 MB/s eta 0:00:00
#5 3.804 Collecting snuggs>=1.4.1
#5 3.828   Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
#5 4.264 Collecting numpy
#5 4.291   Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
#5 6.311      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 8.9 MB/s eta 0:00:00
#5 6.439 Collecting pyparsing>=2.1.6
#5 6.462   Downloading pyparsing-3.1.2-py3-none-any.whl (103 kB)
#5 6.478      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.2/103.2 kB 7.3 MB/s eta 0:00:00
#5 6.659 Installing collected packages: pyparsing, numpy, click, certifi, attrs, affine, snuggs, cligj, click-plugins, rasterio
#5 9.264 Successfully installed affine-2.4.0 attrs-23.2.0 certifi-2024.2.2 click-8.1.7 click-plugins-1.1.1 cligj-0.7.2 numpy-1.26.4 pyparsing-3.1.2 rasterio-1.3.9 snuggs-1.4.7
#5 9.264 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#5 9.398 
#5 9.398 [notice] A new release of pip is available: 23.0.1 -> 24.0
#5 9.398 [notice] To update, run: pip install --upgrade pip
#5 DONE 9.7s

#6 exporting to image
#6 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#6 exporting layers
#6 exporting layers 0.9s done
#6 writing image sha256:f66b9b3349811656692b09d941ecd3234af229da4e7ee14974e08c7d8b8bb3f3 done
#6 DONE 0.9s

In theory, Docker builds shouldn't depend on the local machine: a build that works in one place should work in another. But maybe you have a stale cached dependency? Try running docker build on a different machine, if you have access to another environment, or run docker system prune -a (expensive: it clears all your unused cached images) and rebuild.
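
One difference between the two logs may also be worth checking. The failing build downloads rasterio-1.3.9.tar.gz (a source distribution), while the build above downloads rasterio-1.3.9-cp310-cp310-manylinux2014_x86_64.whl (a prebuilt wheel), and the Approach 4 error mentions an aarch64 package. That pattern would be consistent with the failing build running on an arm64 machine, where pip found no matching wheel and fell back to a source build that needs gdal-config. This is an inference from the logs, not something confirmed. The sketch below, with hypothetical helper names, compares the architecture in a wheel filename with the machine running the build.

```python
import platform


def wheel_arch(filename):
    """Return the CPU architecture named in a wheel filename's platform tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Wheel names end with "-{python tag}-{abi tag}-{platform tag}.whl".
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    # A platform tag may list several dot-separated platforms; use the last.
    last = platform_tag.split(".")[-1]
    for arch in ("x86_64", "aarch64", "arm64", "i686"):
        if last.endswith(arch):
            return arch
    return "any" if last == "any" else "unknown"


def wheel_matches_machine(filename, machine):
    """Return True if the wheel is pure Python or targets the given machine."""
    aliases = {"arm64": "aarch64"}  # treat the two arm64 spellings as equal
    arch = wheel_arch(filename)
    return arch == "any" or aliases.get(arch, arch) == aliases.get(machine, machine)


if __name__ == "__main__":
    wheel = "rasterio-1.3.9-cp310-cp310-manylinux2014_x86_64.whl"
    machine = platform.machine()
    print(machine, wheel_matches_machine(wheel, machine))
```

If the check prints False inside the failing container, building with docker build --platform linux/amd64 (and deploying the function with the matching x86_64 architecture) would be one thing to try, so that pip can pick the same wheel as in the log above.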
