Bug Report
Describe the Bug
Hi @dgreif,
This may not really be a bug, maybe just a design decision, and I could definitely deal with it on the application side, but, as I was researching this behavior, it really seemed like a bug to me, so I thought I would document it here just in case.
What I've noticed is that, if Ring has any temporary issue where requests fail with authentication errors (for example, on Oct 5th, there was an ~1hr service outage where requests were failing with authentication errors) that the API never recovers from this, it just keeps attempting to poll (for example, due to the background camera polling) and returning this error:
Refresh token is not valid. Unable to authenticate with Ring servers. See https://github.com/dgreif/ring/wiki/Refresh-Tokens
This message is incorrect as the refresh token was still valid. For example, if I just restart, using the most recently saved refresh token, the authentication will work, so clearly the token is still valid.
In researching this I believe it has to do with how the getAuth() function treats a failure to acquire an access token. If I understand the flow (and it's possible I don't), whenever any request fails with a "401 Unauthorized" error, the most recent authentication response is immediately cleared (_authPromise set to undefined) and a new authentication attempt is made via await on authPromise, which itself calls getAuth(). Normally getAuth() uses the current refresh token to POST to https://oauth.ring.com/oauth/token and get an updated authentication response with a new access token. However, during any type of temporary service outage, the POST to https://oauth.ring.com/oauth/token may fail, and, if I'm following the logic correctly, the catch then immediately clears the existing, active refresh token.
At this point, all future requests are doomed to fail since there is no longer a valid refresh token in the runtime environment. Calls to REST API continue, each one awaiting for authPromise, which calls getAuth(), which calls getGrantData(), however, since this.refreshToken is now null, getGrantData() will always fall through to the throw and the message noted above will be logged. Once this.refreshToken is null I don't see any way the API could ever recover, it might as well exit at that point as far as I can see.
I'm not 100% sure what should be done here, but it feels to me that a previously working refresh token should not be invalidated immediately on a failure, however, perhaps I'm supposed to be doing something to catch this behavior?
To Reproduce
It's really difficult to reproduce since it requires Ring to reject the current access token and refresh token, but I could probably write up a code example that simulates the behavior by directly manipulating the access and refresh tokens if really needed.
Expected behavior
IMO the API should recover from a temporary service outage without requiring this to be handled on the application side, no different than a network outage.
Screenshots/Logs
Just for the sake of example, here's an example of the logs showing it failing every 20 seconds (due to polling cameras), note that this is after the service was restored:
2022-10-10T12:33:04.938Z ring-mqtt Refresh token is not valid. Unable to authenticate with Ring servers. See https://github.com/dgreif/ring/wiki/Refresh-Tokens
2022-10-10T12:33:24.939Z ring-mqtt Refresh token is not valid. Unable to authenticate with Ring servers. See https://github.com/dgreif/ring/wiki/Refresh-Tokens
2022-10-10T12:33:44.952Z ring-mqtt Refresh token is not valid. Unable to authenticate with Ring servers. See https://github.com/dgreif/ring/wiki/Refresh-Tokens
Additional context
Obviously, this is only an issue when there are unexpected authentication service outages. It's not clear to me how to tell a service failure vs an actual expired refresh token, but, realistically, if the code has made an initial connection using the refresh token provided, and from that point is constantly maintaining the token, then it seems reasonable to expect that this token becoming suddenly invalid is a service failure vs a real authentication failure, although I of course do know there are potential cases where a working token may be invalidated, for example, if a device is removed from allowed devices.
I've noticed that the Ring app seems to recover from this just fine, so it seems like ring-client-api should be able to deal with this as well.
Environment
- OS: Ubuntu 20.04
- Node.js: 16.17.1
- NPM: 8.15.0
- ring-client-api: 11.3.0
Bug Report
Describe the Bug
Hi @dgreif,
This may not really be a bug, maybe just a design decision, and I could definitely deal with it on the application side, but, as I was researching this behavior, it really seemed like a bug to me, so I thought I would document it here just in case.
What I've noticed is that, if Ring has any temporary issue where requests fail with authentication errors (for example, on Oct 5th, there was an ~1hr service outage where requests were failing with authentication errors) that the API never recovers from this, it just keeps attempting to poll (for example, due to the background camera polling) and returning this error:
This message is incorrect as the refresh token was still valid. For example, if I just restart, using the most recently saved refresh token, the authentication will work, so clearly the token is still valid.
In researching this I believe it has to do with how the getAuth() function treats a failure to acquire an access token. If I understand the flow (and it's possible I don't), whenever any request fails with a "401 Unauthorized" error, the most recent authentication response is immediately cleared (_authPromise set to undefined) and a new authentication attempt is made via await on authPromise, which itself calls getAuth(). Normally getAuth() uses the current refresh token to POST to https://oauth.ring.com/oauth/token and get an updated authentication response with a new access token. However, during any type of temporary service outage, the POST to https://oauth.ring.com/oauth/token may fail, and, if I'm following the logic correctly, the catch then immediately clears the existing, active refresh token.
At this point, all future requests are doomed to fail since there is no longer a valid refresh token in the runtime environment. Calls to REST API continue, each one awaiting for authPromise, which calls getAuth(), which calls getGrantData(), however, since this.refreshToken is now null, getGrantData() will always fall through to the throw and the message noted above will be logged. Once this.refreshToken is null I don't see any way the API could ever recover, it might as well exit at that point as far as I can see.
I'm not 100% sure what should be done here, but it feels to me that a previously working refresh token should not be invalidated immediately on a failure, however, perhaps I'm supposed to be doing something to catch this behavior?
To Reproduce
It's really difficult to reproduce since it requires Ring to reject the current access token and refresh token, but I could probably write up a code example that simulates the behavior by directly manipulating the access and refresh tokens if really needed.
Expected behavior
IMO the API should recover from a temporary service outage without requiring this to be handled on the application side, no different than a network outage.
Screenshots/Logs
Just for the sake of example, here's an example of the logs showing it failing every 20 seconds (due to polling cameras), note that this is after the service was restored:
Additional context
Obviously, this is only an issue when there are unexpected authentication service outages. It's not clear to me how to tell a service failure vs an actual expired refresh token, but, realistically, if the code has made an initial connection using the refresh token provided, and from that point is constantly maintaining the token, then it seems reasonable to expect that this token becoming suddenly invalid is a service failure vs a real authentication failure, although I of course do know there are potential cases where a working token may be invalidated, for example, if a device is removed from allowed devices.
I've noticed that the Ring app seems to recover from this just fine, so it seems like ring-client-api should be able to deal with this as well.
Environment