From the net-http2 documentation:
It is RECOMMENDED to set the :error callback: if none is defined, the underlying socket thread may raise an error in the main thread at unexpected execution times.
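To illustrate what that warning means, here is a minimal, stdlib-only Ruby model. It assumes the client forwards an unhandled socket-thread error to another thread with Thread#raise. BackgroundSocketClient, start_failing_socket and callback_or_raise are hypothetical names; this is not net-http2's code, and the real library may propagate the error differently.

```ruby
require "socket" # defines SocketError

# Hypothetical model (not net-http2's code): socket work runs on a background
# thread. Errors there go to registered :error handlers, or, if there are
# none, are raised in a target thread at whatever point it is executing.
class BackgroundSocketClient
  def initialize(target_thread: Thread.main)
    @target_thread = target_thread
    @error_handlers = []
  end

  # Register a callback; only :error is modelled in this sketch.
  def on(event, &handler)
    raise ArgumentError, "only :error is supported here" unless event == :error

    @error_handlers << handler
    self
  end

  # Start a background "socket thread" that fails with +error+.
  def start_failing_socket(error)
    Thread.new do
      begin
        raise error
      rescue StandardError => e
        callback_or_raise(e)
      end
    end
  end

  private

  def callback_or_raise(error)
    if @error_handlers.empty?
      # No handler: interrupt the target thread wherever it currently is.
      @target_thread.raise(error)
    else
      @error_handlers.each { |handler| handler.call(error) }
    end
  end
end

# With a handler, the error is only reported; no other thread is touched.
reported = []
client = BackgroundSocketClient.new
client.on(:error) { |e| reported << e.class }
client.start_failing_socket(SocketError.new("getaddrinfo failed")).join

# Without a handler, a thread busy "processing a job" is interrupted mid-work.
job_thread = Thread.new do
  begin
    sleep 5 # stands in for a job in progress
    :finished
  rescue SocketError => e
    "interrupted by #{e.class}"
  end
end
sleep 0.01 until job_thread.status == "sleep"
BackgroundSocketClient.new(target_thread: job_thread).start_failing_socket(SocketError.new("getaddrinfo failed")).join
outcome = job_thread.value
```

With a handler registered, the failure stays on the socket thread and is merely reported. Without one, the interrupted thread receives the SocketError in the middle of unrelated work; when that thread is the process's main thread, nothing is there to rescue it.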
But the problem is that Apnotic::ConnectionPool does not allow calling on(:error) on the underlying Apnotic::Connection objects to pass in an error handler.
Here's an example scenario that leads to lost jobs. We've been seeing this in production for quite some time but couldn't put a finger on it; thanks to #68 it has become easily reproducible:
- There's a Sidekiq worker that initializes the pool as suggested in the documentation, without an error handler
- A fresh Sidekiq process boots and starts processing jobs, some of which are for that worker
- The APNS connection fails with SocketError or similar
- The Sidekiq process crashes completely, because that SocketError is raised on the main thread
- Unprocessed jobs that had already been picked up by that process are lost completely
Here is an example report of this exact behavior: sidekiq/sidekiq#3886
I think apnotic should definitely do a better job here, both to improve reliability and to stop suggesting unsafe usage in its documentation. Here's what I would suggest:
- Allow passing a connection error handler through to Apnotic::ConnectionPool
- Make this handler required, or at least change the documentation so that it is always provided explicitly
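To make the suggestion concrete, here is a stdlib-only sketch of a pool that refuses to be built without an error handler and attaches that handler to every connection it creates. ErrorHandlingPool, its error_handler: keyword and FakeConnection are hypothetical names for illustration; they are not part of apnotic or the connection_pool gem.

```ruby
# Hypothetical pool (not Apnotic::ConnectionPool's API) that requires an error
# handler and registers it on every connection it creates.
class ErrorHandlingPool
  def initialize(size:, error_handler:, &factory)
    raise ArgumentError, "error_handler must respond to #call" unless error_handler.respond_to?(:call)
    raise ArgumentError, "a connection factory block is required" unless factory

    @available = Queue.new
    size.times do
      connection = factory.call
      connection.on(:error, &error_handler) # every connection gets the handler
      @available << connection
    end
  end

  # Check out a connection for the duration of the block, then return it.
  def with
    connection = @available.pop # blocks until a connection is free
    yield connection
  ensure
    @available << connection if connection
  end
end

# Stand-in for Apnotic::Connection, exposing only an on(:error) hook.
class FakeConnection
  def initialize
    @handlers = Hash.new { |hash, event| hash[event] = [] }
  end

  def on(event, &handler)
    @handlers[event] << handler
  end

  def simulate_error(error)
    @handlers[:error].each { |handler| handler.call(error) }
  end
end

errors = []
pool = ErrorHandlingPool.new(size: 2, error_handler: ->(e) { errors << e.message }) do
  FakeConnection.new
end
pool.with { |conn| conn.simulate_error(RuntimeError.new("connection reset")) }

# Building the pool without a handler fails fast instead of failing at runtime.
missing_handler_rejected =
  begin
    ErrorHandlingPool.new(size: 2, error_handler: nil) { FakeConnection.new }
    false
  rescue ArgumentError
    true
  end
```

Creating connections eagerly keeps the sketch short; a production pool would more likely create them lazily and support checkout timeouts. The key design choice is that the handler is a required constructor argument, so an unsafe pool cannot be built by accident.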
Currently we work around this by creating the pool manually:
```ruby
class Worker
  POOL = ConnectionPool.new(size: 5) do
    connection = Apnotic::Connection.new(...)
    # Report socket-thread errors instead of letting them be raised on the main thread
    connection.on(:error) do |err|
      Bugsnag.notify(ConnectionError.new(err.inspect))
    end
    connection
  end
end
```
Please let me know if there are any thoughts. Thanks!