CLOUDSTACK-8855 Improve Error Message for Host Alert State and reconnect host API. #837
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,9 @@ | |
| // under the License. | ||
| package org.apache.cloudstack.api.command.admin.host; | ||
|
|
||
| import com.cloud.exception.AgentUnavailableException; | ||
| import com.cloud.exception.InvalidParameterValueException; | ||
| import com.cloud.utils.exception.CloudRuntimeException; | ||
| import org.apache.log4j.Logger; | ||
|
|
||
| import org.apache.cloudstack.api.APICommand; | ||
|
|
@@ -100,17 +103,18 @@ public Long getInstanceId() { | |
| @Override | ||
| public void execute() { | ||
| try { | ||
| Host result = _resourceService.reconnectHost(this); | ||
| if (result != null) { | ||
| HostResponse response = _responseGenerator.createHostResponse(result); | ||
| response.setResponseName(getCommandName()); | ||
| this.setResponseObject(response); | ||
| } else { | ||
| throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Failed to reconnect host"); | ||
| } | ||
| } catch (Exception ex) { | ||
| s_logger.warn("Exception: ", ex); | ||
| throw new ServerApiException(ApiErrorCode.RESOURCE_UNAVAILABLE_ERROR, ex.getMessage()); | ||
| Host result =_resourceService.reconnectHost(this); | ||
|
Member
Before the
Contributor
Author
@rafaelweingartner
Member
Great, thanks for the explanation.
Member
nitpick: missing space -> |
||
| HostResponse response = _responseGenerator.createHostResponse(result); | ||
| response.setResponseName(getCommandName()); | ||
| this.setResponseObject(response); | ||
| }catch (InvalidParameterValueException e) { | ||
| throw new ServerApiException(ApiErrorCode.PARAM_ERROR, e.getMessage()); | ||
| } | ||
| catch (CloudRuntimeException e) { | ||
| s_logger.warn("Exception: ", e); | ||
| throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, e.getMessage()); | ||
| }catch (AgentUnavailableException e) { | ||
|
Member
nitpick: please indent those catch blocks the same way |
||
| throw new ServerApiException(ApiErrorCode.RESOURCE_UNAVAILABLE_ERROR, e.getMessage()); | ||
| } | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,7 @@ | |
| // under the License. | ||
| package com.cloud.agent; | ||
|
|
||
| import com.cloud.utils.exception.CloudRuntimeException; | ||
| import org.apache.cloudstack.framework.config.ConfigKey; | ||
|
|
||
| import com.cloud.agent.api.Answer; | ||
|
|
@@ -141,7 +142,7 @@ public enum TapAgentsAction { | |
|
|
||
| public void pullAgentOutMaintenance(long hostId); | ||
|
|
||
| boolean reconnect(long hostId); | ||
| void reconnect(long hostId) throws CloudRuntimeException, AgentUnavailableException; | ||
|
Member
the
Contributor
Author
@rafaelweingartner In the current code I wanted to force people to bubble up the runtime exceptions as well, and then make the decision in the topmost calling method. This way, if something fails, we do not see multiple exception messages, one for each level at which the failure occurred, each with a different error message, which leads to confusion.
Member
@bvbharatk I understand your intention here. However, don't you think we can work out the places where people are logging runtime exceptions instead of bubbling them up? I believe that instead of creating constraints in the code, we should spot when people are introducing these things and educate them. For the code that is already doing the things you mentioned, we can work to fix it bit by bit. It feels a little unusual (at least to me) to declare runtime exceptions. At the end of the day, you may make the exception more visible, but declaring a runtime exception does not force anyone to catch it. BTW: I never had a good expression for talking about the re-throwing of exceptions. Thanks for "bubble the exception up".
Contributor
Author
@rafaelweingartner
Member
Are you talking in a philosophical/ideological way? Because
Member
This is another question: I do not see the benefit of declaring a runtime exception
Contributor
Author
@rafaelweingartner
Member
Hmm, I understand the re-throwing of a checked exception as a runtime one.
Contributor
Author
@rafaelweingartner
Member
The explanation you just gave seems to me a great piece of information to put in the method documentation. Therefore, instead of declaring a runtime exception, we can document why callers need to catch and deal with it; something in the JavaDoc explaining that would be great, e.g. "hey dude, if you want to use this method, please catch and deal with CloudRuntimeExceptions" (of course, in more polished language ;) ). I am still not convinced about declaring runtime exceptions, though. Philosophically speaking, they do not need to (should not?) be declared. It does no harm, but I also do not see the benefit of it. Anyway, if people are comfortable with it, I cannot do anything else. |
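To make the point in this thread concrete, here is a minimal sketch of how Java treats an unchecked exception listed in a `throws` clause. The classes `DemoRuntimeException` and `DemoUnavailableException` are hypothetical stand-ins for `CloudRuntimeException` and `AgentUnavailableException`, and `reconnect` here is an illustrative method, not the CloudStack implementation. The sketch shows that the compiler enforces only the checked exception, and that a Javadoc `@throws` tag is one way to carry the "please catch this at the top" expectation.

```java
// Hypothetical stand-in for CloudStack's CloudRuntimeException (unchecked).
class DemoRuntimeException extends RuntimeException {
    DemoRuntimeException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for AgentUnavailableException (checked).
class DemoUnavailableException extends Exception {
    DemoUnavailableException(String message) {
        super(message);
    }
}

class ThrowsClauseDemo {
    /**
     * Illustrative reconnect method.
     *
     * @throws DemoRuntimeException if the host id is unknown; callers are
     *         expected to let this bubble up and report it once at the top
     * @throws DemoUnavailableException if the agent cannot be reached
     */
    static void reconnect(long hostId) throws DemoRuntimeException, DemoUnavailableException {
        if (hostId < 0) {
            throw new DemoRuntimeException("Unable to find host " + hostId);
        }
        if (hostId == 0) {
            throw new DemoUnavailableException("Agent for host " + hostId + " is unavailable");
        }
    }

    // This compiles although only the checked exception is handled:
    // listing the unchecked one in the throws clause is not enforced.
    static String callWithoutCatchingRuntime(long hostId) {
        try {
            reconnect(hostId);
            return "ok";
        } catch (DemoUnavailableException e) {
            return "unavailable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithoutCatchingRuntime(1));
        System.out.println(callWithoutCatchingRuntime(0));
        try {
            callWithoutCatchingRuntime(-1);
        } catch (DemoRuntimeException e) {
            // The unchecked exception bubbled up to the outermost caller.
            System.out.println("bubbled up: " + e.getMessage());
        }
    }
}
```

Under these assumptions, declaring the runtime exception works as documentation only; whether that belongs in the signature or in the Javadoc is the style question debated above.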
||
|
|
||
| void rescan(); | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -986,33 +986,28 @@ public Answer[] send(final Long hostId, final Commands cmds) throws AgentUnavail | |
| } | ||
|
|
||
| @Override | ||
| public boolean reconnect(final long hostId) { | ||
| public void reconnect(final long hostId) throws CloudRuntimeException, AgentUnavailableException{ | ||
|
Member
You added
Contributor
Author
@rafaelweingartner |
||
| HostVO host; | ||
|
|
||
| host = _hostDao.findById(hostId); | ||
| if (host == null || host.getRemoved() != null) { | ||
| s_logger.warn("Unable to find host " + hostId); | ||
| return false; | ||
| throw new CloudRuntimeException("Unable to find host " + hostId); | ||
| } | ||
|
|
||
| if (host.getStatus() == Status.Disconnected) { | ||
| s_logger.info("Host is already disconnected, no work to be done"); | ||
| return true; | ||
| throw new CloudRuntimeException("Host is already disconnected, no work to be done"); | ||
|
Member
It looks pretty wrong to throw an exception when no action needs to be taken. What is the logic behind throwing an exception here? |
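One way to read the concern above is that "already disconnected" is a no-op rather than a failure. The following is a minimal sketch of that alternative, assuming a hypothetical `DemoStatus` enum and `requestDisconnect` method (neither is CloudStack code): the no-op case returns early, and only a state from which no disconnect is possible raises an exception.

```java
// Hypothetical, simplified host states for illustration only.
enum DemoStatus { UP, ALERT, REBALANCING, DISCONNECTED, REMOVED }

class ReconnectOutcomeDemo {
    /**
     * Returns true if a disconnect was triggered, false if the host was
     * already disconnected and there was nothing to do.
     *
     * @throws IllegalStateException if the host is in a state from which
     *         a disconnect cannot be requested
     */
    static boolean requestDisconnect(DemoStatus status) {
        if (status == DemoStatus.DISCONNECTED) {
            // Nothing to do: treated as success, not as an error.
            return false;
        }
        if (status != DemoStatus.UP && status != DemoStatus.ALERT
                && status != DemoStatus.REBALANCING) {
            throw new IllegalStateException(
                    "Unable to disconnect host because it is not in the correct state: " + status);
        }
        // A real implementation would trigger the disconnect here.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(requestDisconnect(DemoStatus.UP));
        System.out.println(requestDisconnect(DemoStatus.DISCONNECTED));
    }
}
```

Whether the API should instead report "already disconnected" to the user is a product decision this sketch does not settle.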
||
| } | ||
|
|
||
| if (host.getStatus() != Status.Up && host.getStatus() != Status.Alert && host.getStatus() != Status.Rebalancing) { | ||
| s_logger.info("Unable to disconnect host because it is not in the correct state: host=" + hostId + "; Status=" + host.getStatus()); | ||
| return false; | ||
| throw new CloudRuntimeException("Unable to disconnect host because it is not in the correct state: host=" + hostId + "; Status=" + host.getStatus()); | ||
| } | ||
|
|
||
| final AgentAttache attache = findAttache(hostId); | ||
| if (attache == null) { | ||
| s_logger.info("Unable to disconnect host because it is not connected to this server: " + hostId); | ||
| return false; | ||
| throw new CloudRuntimeException("Unable to disconnect host because it is not connected to this server: " + hostId); | ||
| } | ||
|
|
||
| disconnectWithoutInvestigation(attache, Event.ShutdownRequested); | ||
| return true; | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -1049,7 +1044,13 @@ public boolean executeUserRequest(final long hostId, final Event event) throws A | |
| } | ||
| return true; | ||
| } else if (event == Event.ShutdownRequested) { | ||
| return reconnect(hostId); | ||
| //should throw a exception here as well.instead of eating this up. | ||
| try { | ||
| reconnect(hostId); | ||
| } catch (CloudRuntimeException e) { | ||
|
Contributor
@bvbharatk Is it possible to take the failure reason forward and show the appropriate alert?
Contributor
Author
@sureshanaparti
Contributor
@bvbharatk Thanks for the clarification. |
||
| return false; | ||
| } | ||
| return true; | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -357,19 +357,12 @@ public boolean executeUserRequest(final long hostId, final Event event) throws A | |
| } | ||
|
|
||
| @Override | ||
| public boolean reconnect(final long hostId) { | ||
| Boolean result; | ||
| try { | ||
| result = propagateAgentEvent(hostId, Event.ShutdownRequested); | ||
| if (result != null) { | ||
| return result; | ||
| } | ||
| } catch (final AgentUnavailableException e) { | ||
| s_logger.debug("cannot propagate agent reconnect because agent is not available", e); | ||
| return false; | ||
| public void reconnect(final long hostId) throws CloudRuntimeException, AgentUnavailableException { | ||
| Boolean result = propagateAgentEvent(hostId, Event.ShutdownRequested); | ||
| if (result!=null && !result) { | ||
| throw new CloudRuntimeException("Failed to propagating agent change request event:" + Event.ShutdownRequested + " to host:" + hostId); | ||
|
Member
nitpick, grammar typo: |
||
| } | ||
|
|
||
| return super.reconnect(hostId); | ||
| super.reconnect(hostId); | ||
| } | ||
|
|
||
| public void notifyNodesInCluster(final AgentAttache attache) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -512,7 +512,11 @@ public void doInTransactionWithoutResult(TransactionStatus status) { | |
| }); | ||
| HostVO host = _hostDao.findById(lbDeviceVo.getHostId()); | ||
|
|
||
| _agentMgr.reconnect(host.getId()); | ||
| try { | ||
| _agentMgr.reconnect(host.getId()); | ||
| } catch (Exception e ) { | ||
|
Member
Cannot you use a more specific
Contributor
Author
Using a more specific catch makes sense, I will take this up in a separate PR. This was supposed to be a small improvement to the existing code.
Member
Got it. |
||
| s_logger.debug("failed to reconnect host "+host); | ||
|
Member
I would log the message at least at the info level; warn would be better IMO |
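The two suggestions on this hunk (a narrower catch and a higher log level) can be sketched together as follows. This is an illustration under stated assumptions: `DemoAgentException` and `tryReconnect` are hypothetical names, and the sketch uses `java.util.logging` only to stay self-contained, whereas the code under review logs through log4j's `s_logger`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical checked exception standing in for AgentUnavailableException.
class DemoAgentException extends Exception {
    DemoAgentException(String message) {
        super(message);
    }
}

class SpecificCatchDemo {
    private static final Logger LOG = Logger.getLogger(SpecificCatchDemo.class.getName());

    // Illustrative reconnect call that fails for non-positive ids.
    static void reconnect(long hostId) throws DemoAgentException {
        if (hostId <= 0) {
            throw new DemoAgentException("agent unavailable for host " + hostId);
        }
    }

    // Catches only the exception types the call is expected to raise and
    // logs at WARNING with the exception attached, so the cause is kept.
    static boolean tryReconnect(long hostId) {
        try {
            reconnect(hostId);
            return true;
        } catch (DemoAgentException | IllegalStateException e) {
            LOG.log(Level.WARNING, "Failed to reconnect host " + hostId, e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryReconnect(7));
        System.out.println(tryReconnect(0));
    }
}
```

Passing the exception object to the logger, rather than only a message string, keeps the stack trace available when the failure is investigated later.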
||
| } | ||
| return lbDeviceVo; | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -767,7 +767,9 @@ public void sendAlert(AlertType alertType, long dataCenterId, Long podId, Long c | |
| // set up a new alert | ||
| AlertVO newAlert = new AlertVO(); | ||
| newAlert.setType(alertType.getType()); | ||
| newAlert.setSubject(subject); | ||
| //do not have a seperate column for content. | ||
| //appending the message to the subject for now. | ||
| newAlert.setSubject(subject+content); | ||
|
Member
Are you sure this is a good idea?
Contributor
Author
@rafaelweingartner
Member
I understand that 999 characters may seem a lot, but the What concerns me the most are methods such as The use of
Contributor
Author
@rafaelweingartner
Member
I agree with you that we should not try to save the world in a single day. However, comments in the code mean nothing to me (comments and documentation are two different things). ACS has a lot of comments saying to improve this or that, and they just stay there for days, months and years. So, adding a comment like that does not bring any value to the code. My point is that this specific case is not a complicated one; it is only a matter of adding a column to a table, then a new property in a POJO, and setting the correct value in the newly created property. Just as I understand that we should not try to save the planet at once, I also believe that we should not leave for tomorrow what can be done today.
Contributor
Author
@rafaelweingartner
Member
I agree with you regarding contributors' time. I also find it great that you documented this and opened a Jira ticket. However, for this specific case, I am really not comfortable with the change as it is. As I said before, the code at line 772 is opening the gates for unexpected runtime exceptions (a.k.a. bugs). If others are willing to take the risk of merging and then dealing with the consequences later, I cannot do anything against it. I am only pointing at the problem and making it quite clear what I think. I really do not see any trouble in doing things the right way here. It is only a matter of creating an alter table SQL that adds a field to a table. Then, you have to create this new field in
Contributor
Author
@rafaelweingartner
Member
agree with @rafaelweingartner |
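The alternative proposed in this thread (a new column, a matching property on the POJO, and setting that property instead of concatenating) can be sketched on the Java side as below. `DemoAlert`, its `content` property, and `AlertFieldsDemo.newAlert` are hypothetical; the sketch does not reflect the actual `AlertVO` class or its schema, and the SQL migration that would add the column is left out.

```java
// Hypothetical POJO standing in for AlertVO, with a separate content field.
class DemoAlert {
    private String subject;
    // Assumed new property, backed by a new column added via a schema change.
    private String content;

    String getSubject() {
        return subject;
    }

    void setSubject(String subject) {
        this.subject = subject;
    }

    String getContent() {
        return content;
    }

    void setContent(String content) {
        this.content = content;
    }
}

class AlertFieldsDemo {
    // Keeps subject and content apart instead of storing subject + content
    // in the subject field, so the subject's length stays predictable.
    static DemoAlert newAlert(String subject, String content) {
        DemoAlert alert = new DemoAlert();
        alert.setSubject(subject);
        alert.setContent(content);
        return alert;
    }

    public static void main(String[] args) {
        DemoAlert alert = newAlert("Host in ALERT state", "Details of the failed reconnect");
        System.out.println(alert.getSubject());
        System.out.println(alert.getContent());
    }
}
```

With the two values stored separately, a long message cannot overflow whatever limit applies to the subject.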
||
| newAlert.setClusterId(clusterId); | ||
| newAlert.setPodId(podId); | ||
| newAlert.setDataCenterId(dataCenterId); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1157,15 +1157,16 @@ public Host cancelMaintenance(final CancelMaintenanceCmd cmd) { | |
| } | ||
|
|
||
| @Override | ||
| public Host reconnectHost(final ReconnectHostCmd cmd) { | ||
| final Long hostId = cmd.getId(); | ||
| public Host reconnectHost(ReconnectHostCmd cmd) throws CloudRuntimeException, AgentUnavailableException{ | ||
|
Contributor
Does this method need to return a host? It seems that it just reconnects to a host and, if it cannot, throws an exception.
Contributor
Author
The scope of this PR is to improve the exception handling and error messaging. Changing the method signature is not an immediate concern. I will make the suggested changes in another PR if I find time. |
||
| Long hostId = cmd.getId(); | ||
|
|
||
| final HostVO host = _hostDao.findById(hostId); | ||
| if (host == null) { | ||
| throw new InvalidParameterValueException("Host with id " + hostId.toString() + " doesn't exist"); | ||
| } | ||
|
|
||
| return _agentMgr.reconnect(hostId) ? host : null; | ||
| _agentMgr.reconnect(hostId); | ||
| return host; | ||
| } | ||
|
|
||
| @Override | ||
|
|
||
The CloudRuntimeException is a RuntimeException; you do not need to declare it in the method signature.
@bvbharatk this is one of the questions I still have here
@rafaelweingartner
I answered it below