Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion api/src/com/cloud/resource/ResourceService.java
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@

import java.util.List;

import com.cloud.exception.AgentUnavailableException;
import com.cloud.utils.exception.CloudRuntimeException;
import com.cloud.dc.DataCenter;
import org.apache.cloudstack.api.command.admin.cluster.AddClusterCmd;
import org.apache.cloudstack.api.command.admin.cluster.DeleteClusterCmd;
Expand Down Expand Up @@ -50,7 +52,7 @@ public interface ResourceService {

Host cancelMaintenance(CancelMaintenanceCmd cmd);

Host reconnectHost(ReconnectHostCmd cmd);
Host reconnectHost(ReconnectHostCmd cmd) throws CloudRuntimeException, AgentUnavailableException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CloudRuntimeException is a RuntimeException you do not need to declare it in the method signature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bvbharatk this is one of the questions I still have here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
I answered it below


/**
* We will automatically create a cloud.com cluster to attach to the external cluster and return a hyper host to perform
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,9 @@
// under the License.
package org.apache.cloudstack.api.command.admin.host;

import com.cloud.exception.AgentUnavailableException;
import com.cloud.exception.InvalidParameterValueException;
import com.cloud.utils.exception.CloudRuntimeException;
import org.apache.log4j.Logger;

import org.apache.cloudstack.api.APICommand;
Expand Down Expand Up @@ -100,17 +103,18 @@ public Long getInstanceId() {
@Override
public void execute() {
try {
Host result = _resourceService.reconnectHost(this);
if (result != null) {
HostResponse response = _responseGenerator.createHostResponse(result);
response.setResponseName(getCommandName());
this.setResponseObject(response);
} else {
throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Failed to reconnect host");
}
} catch (Exception ex) {
s_logger.warn("Exception: ", ex);
throw new ServerApiException(ApiErrorCode.RESOURCE_UNAVAILABLE_ERROR, ex.getMessage());
Host result =_resourceService.reconnectHost(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the result was checked for null; can the method _resourceService.reconnectHost(this) return null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
reconnectHost now throws an exception instead of returning a null. So i have removed the null check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the explanation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: missing space -> result = _resourceService

HostResponse response = _responseGenerator.createHostResponse(result);
response.setResponseName(getCommandName());
this.setResponseObject(response);
}catch (InvalidParameterValueException e) {
throw new ServerApiException(ApiErrorCode.PARAM_ERROR, e.getMessage());
}
catch (CloudRuntimeException e) {
s_logger.warn("Exception: ", e);
throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, e.getMessage());
}catch (AgentUnavailableException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: please indent those catch blocks the same way

throw new ServerApiException(ApiErrorCode.RESOURCE_UNAVAILABLE_ERROR, e.getMessage());
}
}
}
3 changes: 2 additions & 1 deletion engine/components-api/src/com/cloud/agent/AgentManager.java
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
// under the License.
package com.cloud.agent;

import com.cloud.utils.exception.CloudRuntimeException;
import org.apache.cloudstack.framework.config.ConfigKey;

import com.cloud.agent.api.Answer;
Expand Down Expand Up @@ -141,7 +142,7 @@ public enum TapAgentsAction {

public void pullAgentOutMaintenance(long hostId);

boolean reconnect(long hostId);
void reconnect(long hostId) throws CloudRuntimeException, AgentUnavailableException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the CloudRuntimeException is a RuntimeException you do not need to declare it in the method signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
What you said is correct in general. But in case of our code we can see places where we catch runtime exceptions and fail without giving out the actual reason. If we think the reason is of no use to the user, we should simply bubble them up and handle them accordingly in the top most calling method. If we think the reason is not of importance to the user, we can simply log it and then send appropriate error messages.

In the current code i wanted to force people to bubble up the runtime exceptions as well and then take a call at the top most calling method. This way if something fails we do not see multiple exception messages one for each level at which the failure occurred and with different error messages leading to confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bvbharatk I got your intentions here. However, don’t you think we can work out the places where people are login runtime exceptions, instead of bubbling them up? I believe that instead of creating constraints in the code, we should spot when people are introducing these things, and educate them. For the code that is already doing the things you mentioned, well, we can work to fix them bit by bit.

It feels a little unusual (at least for me) to declare runtime exceptions. At the end of the day, you may make it more visible, but just because you declare a runtime exception, it does not mean people will be forced to catch it (they will not be forced to catch it).

BTW: I never had a good expression to talk about the re-throw of exceptions. Thanks for the “bubble exception up”.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
When i said they will have to catch i meant, When a method declaration has one or more exceptions defined using throws clause then the method-call must handle all the defined exceptions. If he wants to handle it he will have to catch it, if he dose not want to handle he can bubble it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking in a philosophical/ideological way? Because CloudRuntimeException is a runtime exception, so the compiler will not complain if the code calls the method and does not handle the declared exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also another question, I do not see the benefit of declaring a runtime exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
Well the Idea was to tell that CloudRuntimeExceptions should also be sent to the caller(in case of current Cloudstack code), and then the caller will add more context and throw a single exception, which will be encapsulating all the lower level exceptions. Currently if i make RuntimeException a non Runtime one it will break the code at many places. The idea was to do a cleanup bit by bit and once we are at a position to change the RuntimeException to a non runtime we do it. Well at least this is what i thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I understand the re-throwing of a checked exception as runtime one.
So, from what I understood, you want to tell other people that may use the "reconnect" method that it throws a cloud runtime exception, is that it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
It is just a heads up to tell them hey this method can also throw a runtimeException and i think it needs to be handled by the caller. In way it acts like a lable to do further refactoring. If you see the implementation of reconnect in clusteredAgentMangerImpl, it throws runtime exceptions which tell the reason why it failed to reconnect, but since it is a runtimeException no one care to check it and hence the reason for failure is not propagated to the caller, and the API will just say failed to reconnect without giving any reason for the failure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation that you just said, for me, seems a great piece of information to be in a method documentation. Therefore, instead of declaring a runtime, we can document why we need to catch and deal with it, something in the JavaDoc explaining that would be great; e.g."hey dude, if you want to use this method, please catch and deal with CloudRuntimeExceptions" (of couse, with a more polished language ;) ).

I am still not convinced on declaring runtime exceptions, though. Philosophically speaking, they do not need to (should not?) be declared. It does not harm, but I also do not see the benefit of it.

Anyways, if people are comfortable with it, I cannot do anything else.


void rescan();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -986,33 +986,28 @@ public Answer[] send(final Long hostId, final Commands cmds) throws AgentUnavail
}

@Override
public boolean reconnect(final long hostId) {
public void reconnect(final long hostId) throws CloudRuntimeException, AgentUnavailableException{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You added AgentUnavailableException to the method signature, but I did not see you throwing this exception anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
The reconnect method is a method in the interface AgentManager.java, This interface is also implemented by ClusteredAgentManagerImpl.java in which propagateAgentEvent method throws a AgentUnavailableException. So i had to add this to the interface and also to the implementing class method.

HostVO host;

host = _hostDao.findById(hostId);
if (host == null || host.getRemoved() != null) {
s_logger.warn("Unable to find host " + hostId);
return false;
throw new CloudRuntimeException("Unable to find host " + hostId);
}

if (host.getStatus() == Status.Disconnected) {
s_logger.info("Host is already disconnected, no work to be done");
return true;
throw new CloudRuntimeException("Host is already disconnected, no work to be done");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks pretty wrong to throw an exception when no action should be taken. What are the logic to throw an exception here?

}

if (host.getStatus() != Status.Up && host.getStatus() != Status.Alert && host.getStatus() != Status.Rebalancing) {
s_logger.info("Unable to disconnect host because it is not in the correct state: host=" + hostId + "; Status=" + host.getStatus());
return false;
throw new CloudRuntimeException("Unable to disconnect host because it is not in the correct state: host=" + hostId + "; Status=" + host.getStatus());
}

final AgentAttache attache = findAttache(hostId);
if (attache == null) {
s_logger.info("Unable to disconnect host because it is not connected to this server: " + hostId);
return false;
throw new CloudRuntimeException("Unable to disconnect host because it is not connected to this server: " + hostId);
}

disconnectWithoutInvestigation(attache, Event.ShutdownRequested);
return true;
}

@Override
Expand Down Expand Up @@ -1049,7 +1044,13 @@ public boolean executeUserRequest(final long hostId, final Event event) throws A
}
return true;
} else if (event == Event.ShutdownRequested) {
return reconnect(hostId);
//should throw a exception here as well.instead of eating this up.
try {
reconnect(hostId);
} catch (CloudRuntimeException e) {
Copy link
Contributor

@sureshanaparti sureshanaparti Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bvbharatk Is it possible to take the failure reason forward and show the appropriate alert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sureshanaparti
I agree with you, I have also added a comment in the code indicating the same. We can work on it in a different pr. This is just a small improvement to the existing code. Fixing this will need lot of rework and cause code churn. So the intention was to do this later in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bvbharatk Thanks for the clarification.

return false;
}
return true;
}
return false;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -357,19 +357,12 @@ public boolean executeUserRequest(final long hostId, final Event event) throws A
}

@Override
public boolean reconnect(final long hostId) {
Boolean result;
try {
result = propagateAgentEvent(hostId, Event.ShutdownRequested);
if (result != null) {
return result;
}
} catch (final AgentUnavailableException e) {
s_logger.debug("cannot propagate agent reconnect because agent is not available", e);
return false;
public void reconnect(final long hostId) throws CloudRuntimeException, AgentUnavailableException {
Boolean result = propagateAgentEvent(hostId, Event.ShutdownRequested);
if (result!=null && !result) {
throw new CloudRuntimeException("Failed to propagating agent change request event:" + Event.ShutdownRequested + " to host:" + hostId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, grammar typo: Failed to propagate agent change....

}

return super.reconnect(hostId);
super.reconnect(hostId);
}

public void notifyNodesInCluster(final AgentAttache attache) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -512,7 +512,11 @@ public void doInTransactionWithoutResult(TransactionStatus status) {
});
HostVO host = _hostDao.findById(lbDeviceVo.getHostId());

_agentMgr.reconnect(host.getId());
try {
_agentMgr.reconnect(host.getId());
} catch (Exception e ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot you use a more specific catch here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a more specific catch makes sense, I will take this up in a separate PR. This was supposed to be a small improvement to the existing code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.
I miss-looked the call for propagateAgentEvent method

s_logger.debug("failed to reconnect host "+host);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would log the message at least at the info level, warn would be better IMO

}
return lbDeviceVo;
}

Expand Down
4 changes: 3 additions & 1 deletion server/src/com/cloud/alert/AlertManagerImpl.java
Original file line number Diff line number Diff line change
Expand Up @@ -767,7 +767,9 @@ public void sendAlert(AlertType alertType, long dataCenterId, Long podId, Long c
// set up a new alert
AlertVO newAlert = new AlertVO();
newAlert.setType(alertType.getType());
newAlert.setSubject(subject);
//do not have a seperate column for content.
//appending the message to the subject for now.
newAlert.setSubject(subject+content);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is a good idea?
If the column for content does exist, what about creating it with this PR? Thus, we can avoid this type of half measure solution. Especially that the column subject has a limitation on its size @Column(name = "subject", length = 999); this can potentially create problems in certain conditions in runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
I think we can live with this for now and take it up in a separate PR.

Copy link
Member

@rafaelweingartner rafaelweingartner Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that 999 characters may seem a lot, but the content may contain some long strings. for instance, in com.cloud.network.router.VirtualNetworkApplianceManagerImpl.RvRStatusUpdateTask.checkDuplicateMaster(List<DomainRouterVO>), it is generated a subject and a content that together (depending on variable combinations used to create those strings) can go over 500 chars.

What concerns me the most are methods such as org.apache.cloudstack.storage.image.BaseImageStoreDriverImpl.createVolumeAsyncCallback(AsyncCallbackDispatcher<? extends BaseImageStoreDriverImpl, DownloadAnswer>, CreateContext<CreateCmdResult>), where the subject and content would be the same, and they are created using an error message that may come from the most different places.

The use of com.cloud.alert.AlertManagerImpl.sendAlert(AlertType, long, Long, String, String) is quite broad and it would be a lot of work to check every single place it is used. For me, the code that you are adding at line 772 seems like a half-measure that is bug prone. I would like to hear the feedback from others on this topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
I agree with you that it is a half measure, And so i commented "appending the message to the subject for now." The idea was just to improve it little by little. We need not do every thing in this PR. We can work on it in a separate one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that we should not try to save the world in a single day. However, comments in the code for me mean nothing (comments and documentation are two different things). ACS has a lot of comments saying to improve this or that, and they just stay there for days, months and years. So, adding a comment like that does not bring any value to the code. My point is that this specific case is not a complicated one; it is only a matter of adding a column to a table, then a new property in a POJO, and setting the correct value in the newly created property.

The same way I understand that we should not try to save the planet at once, I also have the philosophy that we should not let to do tomorrow what can be done today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
I agree with you but "what can be done today" depends on the individuals priorities and bandwidth. Just so that the required improvement dose not get lost in comments i have created a ticket in Jira to track this, CLOUDSTACK-9846.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you regarding the time of contributor. I also find it great that you documented this and opened a Jira ticket. However, for this specific case, I am really not comfortable with the change as it is. As I said before, the code at line 772 is opening the gates for unexpected runtime exceptions (A.K.A. bugs). If others are willing to take the risk of merging and then later dealing with the consequences, I cannot do anything against it. I am only pointing at the problem and making it quite clear what I think.

I really do not see any trouble to do things the right way here. It is only a matter of creating an alter table SQL that adds a field to a table. Then, you have to create this new field in AlertVO, and use it; as simple as that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner
I understand the changes need are simple. I think that we are over exaggerating the consequence here. The worst that can happen is a db exception if the length of the sting becomes greater than 999, which has not happened until now. It dose not effect any of the critical functionality. So i think we can fix this later in another PR. Besides this PR has been in the review board since more than a year now and i do not want to spend any more time on this. If you think this is really a critical bug and we need to fix it asap, I will be more than happy if you can lend me a hand and take this forward.
Thanks
Bharat.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @rafaelweingartner

newAlert.setClusterId(clusterId);
newAlert.setPodId(podId);
newAlert.setDataCenterId(dataCenterId);
Expand Down
7 changes: 4 additions & 3 deletions server/src/com/cloud/resource/ResourceManagerImpl.java
Original file line number Diff line number Diff line change
Expand Up @@ -1157,15 +1157,16 @@ public Host cancelMaintenance(final CancelMaintenanceCmd cmd) {
}

@Override
public Host reconnectHost(final ReconnectHostCmd cmd) {
final Long hostId = cmd.getId();
public Host reconnectHost(ReconnectHostCmd cmd) throws CloudRuntimeException, AgentUnavailableException{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method need to return a host? It seems that it just reconnects to a host, if not, throws an exception.
If so, it could be void as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodrigo93

The scope of this pr is to improve the exception handling and error messaging. Changing the method signature is not a immediate concern. I will make the suggested changes in another pr if i find time.

Long hostId = cmd.getId();

final HostVO host = _hostDao.findById(hostId);
if (host == null) {
throw new InvalidParameterValueException("Host with id " + hostId.toString() + " doesn't exist");
}

return _agentMgr.reconnect(hostId) ? host : null;
_agentMgr.reconnect(hostId);
return host;
}

@Override
Expand Down