
Add resilience to elasticsearch outage #39

@bpetersen

Description


In my docker-compose file, I have a link from ElastAlert to Elasticsearch, but it looks like Elasticsearch takes a little while to come up. ElastAlert starts, tries to connect to Elasticsearch, and fails. From that point on, the process stays alive but never tries to reconnect to Elasticsearch. It should retry periodically so that it is resilient to this. Alternatively, it could let the app crash so that another utility can bring it back up.
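For the "retry periodically" option, a minimal sketch of retrying a call with capped exponential backoff, assuming Python 3. `call_with_retry` and its parameters are hypothetical names for illustration, not part of ElastAlert or elasticsearch-py. After the last attempt it re-raises, so the "let it crash" behaviour is still available once retries run out.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 10,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with capped exponential backoff on `retry_on` errors.

    Any other exception propagates immediately. After `max_attempts` failed
    calls the last exception is re-raised, so the caller can still crash
    loudly instead of hanging.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to the cap
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Applied to the call in the traceback, this might look like `call_with_retry(self.writeback_es.info, (elasticsearch.exceptions.ConnectionError,))`; where exactly ElastAlert should wrap its startup calls is left open here. The `sleep` parameter is injectable only so the backoff can be tested without real waiting.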

     ProcessController:      self._es_version = self.get_version()
       File "/opt/elastalert/elastalert/elastalert.py", line 169, in get_version
         info = self.writeback_es.info()
       File "/usr/lib/python2.7/site-packages/elasticsearch-6.3.0-py2.7.egg/elasticsearch/client/utils.py", line 76, in _wrapped
         return func(*args, params=params, **kwargs)
       File "/usr/lib/python2.7/site-packages/elasticsearch-6.3.0-py2.7.egg/elasticsearch/client/__init__.py", line 241, in info
         return self.transport.perform_request('GET', '/', params=params)
       File "/usr/lib/python2.7/site-packages/elasticsearch-6.3.0-py2.7.egg/elasticsearch/transport.py", line 318, in perform_request
         status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
       File "/usr/lib/python2.7/site-packages/elasticsearch-6.3.0-py2.7.egg/elasticsearch/connection/http_requests.py", line 85, in perform_request
         raise ConnectionError('N/A', str(e), e)
     elasticsearch.exceptions.ConnectionError: ConnectionError(HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcb84c154d0>: Failed to establish a new connection: [Errno -2] Name does not resolve',))) caused by: ConnectionError(HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcb84c154d0>: Failed to establish a new connection: [Errno -2] Name does not resolve',)))

16:56:05.680Z ERROR elastalert-server: ProcessController:  ElastAlert exited with code 1
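The last log line suggests that the Python ElastAlert process did exit (code 1) while the elastalert-server wrapper, and therefore the container, kept running. For the "let it crash and have another utility bring it back up" option, a minimal sketch of such a restart loop using only the standard library, assuming Python 3. `supervise`, `max_restarts` and `delay` are hypothetical names, not part of ElastAlert or elastalert-server.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def supervise(cmd: Sequence[str], max_restarts: int = 5, delay: float = 5.0) -> List[int]:
    """Run `cmd`, restarting it after `delay` seconds whenever it exits non-zero.

    Stops on a clean exit (code 0) or once `max_restarts` restarts are used up.
    Returns the exit code of every run, in order.
    """
    exit_codes: List[int] = []
    for run in range(max_restarts + 1):
        code = subprocess.call(list(cmd))  # blocks until the child exits
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to restart
        if run < max_restarts:
            print(f"child exited with code {code}; restarting in {delay}s", file=sys.stderr)
            time.sleep(delay)
    return exit_codes
```

If the final code is still non-zero, the wrapper could call `sys.exit(codes[-1])` so that the container itself stops instead of staying 'up', which lets a Docker restart policy (if one is configured) take over.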

Simple repro:

  1. Start the ElastAlert container.
  2. See "Max retries exceeded" in its logs.
  3. Run docker-compose ps and see that the container is still 'Up'.
  4. Start the Elasticsearch container.
  5. Notice that the ElastAlert container logs show no retry, even though the container is still 'Up'.
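Until ElastAlert retries on its own, one workaround for the repro above is to hold back the ElastAlert start until Elasticsearch answers HTTP, similar in spirit to a wait-for-it script. A minimal sketch using only the Python 3 standard library; `wait_for_http` and its parameters are hypothetical, and the URL `http://elasticsearch:9200/` mentioned below is an assumption based on the host and port in the traceback.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # 2xx/3xx response: the server is up
        except urllib.error.HTTPError:
            return True  # 4xx/5xx (e.g. 401 with security enabled): still up
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            pass  # DNS failure, connection refused, dropped connection: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

An entrypoint could call `wait_for_http("http://elasticsearch:9200/")` and launch ElastAlert only when it returns True. Any HTTP status counts as "up" because a cluster with security enabled may answer an unauthenticated `GET /` with 401.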
