
Botocore 1.32.1 breaks some tests with moto  #7031

@potiuk

Description


Problem

The latest botocore 1.32.1, released a few hours ago, breaks some tests in moto (4.2.8, i.e. the latest version). Some of the internal models stored in botocore are gzipped (and which ones are gzipped changes between botocore versions), and moto does not recognize the gzipped ones when it tries to load the models. Botocore internally uses JSONFileLoader https://github.com/boto/botocore/blob/develop/botocore/loaders.py#L149, which automatically handles .json.gz data and decompresses it, while moto uses "load_resource", which does not try to load .gz versions of files https://github.com/getmoto/moto/blob/master/moto/utilities/utils.py#L16

A number of internal models in botocore were already gzipped, but what changed in botocore 1.32.1 is that `botocore/data/emr/2009-03-31/service-2.json`, which in 1.32.0 was stored as plain .json, is now stored as `botocore/data/emr/2009-03-31/service-2.json.gz`. You can see this by downloading both packages from PyPI and comparing them.
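Instead of downloading both wheels, one quick way to check which variant an installed package ships is a small helper like the one below. This is a hypothetical sketch (`model_variant` is not part of botocore or moto) and assumes a regular on-disk installation where the package directory can be located with `importlib.util.find_spec`.

```python
import importlib.util
import os
from typing import Optional


def model_variant(package: str, resource: str) -> Optional[str]:
    """Return which variant of ``resource`` exists inside ``package``.

    Hypothetical diagnostic helper: checks the plain file first, then the
    ``.gz`` variant, and returns the relative name that exists on disk,
    or None if neither exists (or the package cannot be located).
    """
    spec = importlib.util.find_spec(package)
    # Only real packages (not plain modules) have a directory to search.
    if spec is None or not spec.submodule_search_locations:
        return None
    base_dir = list(spec.submodule_search_locations)[0]
    for candidate in (resource, resource + ".gz"):
        # Resource names use '/' as separator, like pkgutil.get_data expects.
        if os.path.exists(os.path.join(base_dir, *candidate.split("/"))):
            return candidate
    return None
```

For example, `model_variant("botocore", "data/emr/2009-03-31/service-2.json")` should, per the comparison described above, report the plain `.json` name on 1.32.0 and the `.json.gz` name on 1.32.1.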

Example stacktrace

This leads to errors similar to:

/usr/local/lib/python3.8/site-packages/moto/core/models.py:126: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.8/site-packages/moto/core/models.py:96: in start
    self.enable_patching(reset)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/site-packages/moto/core/models.py:336: in enable_patching
    for key, value in backend.urls.items():
/usr/local/lib/python3.8/site-packages/moto/core/base_backend.py:52: in urls
    url_bases = self.url_bases
/usr/local/lib/python3.8/site-packages/moto/core/base_backend.py:100: in url_bases
    return self._url_module.url_bases
/usr/local/lib/python3.8/site-packages/moto/core/base_backend.py:41: in _url_module
    backend_urls_module = __import__(
/usr/local/lib/python3.8/site-packages/moto/emr/urls.py:1: in <module>
    from .responses import ElasticMapReduceResponse
/usr/local/lib/python3.8/site-packages/moto/emr/responses.py:52: in <module>
    class ElasticMapReduceResponse(BaseResponse):
/usr/local/lib/python3.8/site-packages/moto/emr/responses.py:61: in ElasticMapReduceResponse
    aws_service_spec = AWSServiceSpec("data/emr/2009-03-31/service-2.json")
/usr/local/lib/python3.8/site-packages/moto/core/responses.py:964: in __init__
    spec = load_resource("botocore", path)
/usr/local/lib/python3.8/site-packages/moto/utilities/utils.py:22: in load_resource
    return json.loads(pkgutil.get_data(package, resource))  # type: ignore
package = 'botocore', resource = 'data/emr/2009-03-31/service-2.json'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def get_data(package, resource):
        """Get a resource from a package.
    
        This is a wrapper round the PEP 302 loader get_data API. The package
        argument should be the name of a package, in standard module format
        (foo.bar). The resource argument should be in the form of a relative
        filename, using '/' as the path separator. The parent directory name '..'
        is not allowed, and nor is a rooted name (starting with a '/').
    
        The function returns a binary string, which is the contents of the
        specified resource.
    
        For packages located in the filesystem, which have already been imported,
        this is the rough equivalent of
    
            d = os.path.dirname(sys.modules[package].__file__)
            data = open(os.path.join(d, resource), 'rb').read()
    
        If the package cannot be located or loaded, or it uses a PEP 302 loader
        which does not support get_data(), then None is returned.
        """
    
        spec = importlib.util.find_spec(package)
        if spec is None:
            return None
        loader = spec.loader
        if loader is None or not hasattr(loader, 'get_data'):
            return None
        # XXX needs test
        mod = (sys.modules.get(package) or
               importlib._bootstrap._load(spec))
        if mod is None or not hasattr(mod, '__file__'):
            return None
    
        # Modify the resource name to be compatible with the loader.get_data
        # signature - an os.path format "filename" starting with the dirname of
        # the package's __file__
        parts = resource.split('/')
        parts.insert(0, os.path.dirname(mod.__file__))
        resource_name = os.path.join(*parts)
>       return loader.get_data(resource_name)
E       FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/site-packages/botocore/data/emr/2009-03-31/service-2.json'

You can see the failure in an Airflow canary run (this is what triggered our investigation):

https://github.com/apache/airflow/actions/runs/6882709099/job/18722609903#step:5:14716

Potential solution

For now we will limit botocore to < 1.32.1 in Airflow as a workaround, but I think the best solution would be to extend the "load_resource" function in moto to also handle the .gz version automatically. That should fix this problem (and also address any similar future problem if the botocore team compresses more models).
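A minimal sketch of such an extension, assuming the `load_resource(package, resource)` signature seen in the traceback above, could look like this. It is not moto's actual implementation: it simply tries the plain file via `pkgutil.get_data` and, when that raises FileNotFoundError (as in the traceback) or returns None, retries with a `.gz` suffix and decompresses the bytes with `gzip.decompress`.

```python
import gzip
import json
import pkgutil
from typing import Any


def load_resource(package: str, resource: str) -> Any:
    """Load a JSON resource from ``package``, falling back to a gzipped copy.

    Sketch only: tries ``resource`` as-is first; if that file is missing,
    tries ``resource + ".gz"`` and decompresses it before parsing.
    """
    try:
        data = pkgutil.get_data(package, resource)
    except FileNotFoundError:
        # Filesystem loaders raise when the plain file does not exist.
        data = None
    if data is None:
        # Raises FileNotFoundError itself if the .gz file is also missing.
        compressed = pkgutil.get_data(package, resource + ".gz")
        if compressed is None:
            # The loader does not support get_data at all.
            raise FileNotFoundError(f"{package}/{resource}(.gz) not found")
        data = gzip.decompress(compressed)
    return json.loads(data)
```

With this approach, existing callers such as `AWSServiceSpec("data/emr/2009-03-31/service-2.json")` would not need to change, because the `.gz` fallback is handled inside the loader, similar in spirit to botocore's own JSONFileLoader.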
