Skip to content

Restcomm1147 rms resources leak on conf #2564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Oct 10, 2017

Conversation

gvagenas
Copy link
Contributor

@gvagenas gvagenas commented Oct 9, 2017

PR also includes MGCP resources metrics and updated tests to assert MGCP resources are properly cleaned up

This refer to RESTCOMM-1147
This refer to RESTCOMM-1155
This refer to RESTCOMM-1155
This refer to RESTCOMM-1155
This close RESTCOMM-1158
This refer to RESTCOMM-1155
…rt proper MGCP resources at the end of each test

This refer to RESTCOMM-1155
This refer to RESTCOMM-1147
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8" ?>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess we need to find a way to configure how this is used in JenkinsCI env...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree @jaimecasero . I did this to quickly enable logs in the testsuite since they are essential when working locally with the testsuite

public static Map<MediaSession, ActorRef> getLinks() {
return links;
connectionToEndpointMap = new ConcurrentHashMap<ActorRef, ActorRef>();
connEndpointMap = new ConcurrentHashMap<String, String>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering this is Actor state, and concurrency should be already solved by Actor model, is there any particular intention on using ConcurrentHashMap??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimecasero no particular reason, just a habit


public static Map<MediaSession, ActorRef> getLinks() {
return links;
connectionToEndpointMap = new ConcurrentHashMap<ActorRef, ActorRef>();
Copy link
Contributor

@jaimecasero jaimecasero Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont seem to find where connectionToEndpointMap is used? could it be this slid into final code after different design/refactorings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimecasero I forgot to clean up this one. I will clean this now

@@ -400,6 +439,26 @@ private void modifyConnection(final Object message, final ActorRef sender) {
private void deleteConnection(final Object message, final ActorRef sender) {
final ActorRef self = self();
final DeleteConnection dlcx = (DeleteConnection) message;
if (dlcx.getConnectionIdentifier() == null) {
connEndpointMap.values().removeAll(Collections.singleton(dlcx.getEndpointIdentifier().getLocalEndpointName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we log the result of this operation, i think it could be worth to print it, whether we deleted something or not...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimecasero you mean the operation to remove from the Map? Yes I can add this at the log statement later

logger.info(msg);
}
} else {
connEndpointMap.remove(dlcx.getConnectionIdentifier().toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, can we save the result of this operation for logging later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure as above

} else {
connEndpointMap.remove(dlcx.getConnectionIdentifier().toString());
monitoringService.tell(new MgcpConnectionDeleted(dlcx.getConnectionIdentifier().toString(), null), self());
if (!connEndpointMap.values().contains(dlcx.getEndpointIdentifier().getLocalEndpointName())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add some comment to this logic? im not sure im following the intention. It seems we already removed the map entry using the connId, which is the key to the map. Then we check if the endpointId is still in the map as value, if not we tell the service that is was removed. Is there an scenario where connId was removed, but endpointId is still present, what would be the use case there? Are the endpointId shared among different connIds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimecasero RC only deletes connections, on the RMS side when there is no connection for a given endpoint, the endpoint is released.
This is what the method does here.
First remove endpoint-connection, then check if there are more connections for the give endpoint, IF NOT, consider the endpoint released and tell the MonitoringService.

I added comments for that.

mgcpEndpointsConference = new AtomicInteger(0);
mgcpEndpointsPacketRelay = new AtomicInteger(0);

mgcpEndpointMap = new ConcurrentHashMap<String, String>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, do we need concurrent collections as Actor state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a just a habit

mgcpEndpointsConference = new AtomicInteger(0);
mgcpEndpointsPacketRelay = new AtomicInteger(0);

mgcpEndpointMap = new ConcurrentHashMap<String, String>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we only need the size of these maps to return as part of onStatistics logic... Do we really need to save all this info into maps? it seems a potential place for unnecesary leaks, when the size of these maps can be already calculated by using other counters already present... I would try to reduce state to simple counters, that maybe be wrong but do not risk to introduce more leaks...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimecasero I am thinking that in the future we can provide MGCP resource details by call, similar to the live call details, so in that case the Map is needed.
Agree with your point but I don't see a risk of leaks here.

fsm.transition(message, completed);
}
//Don't wait for ever for the CallController response
context().setReceiveTimeout(Duration.create(2000, TimeUnit.SECONDS));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this kind of timeouts should come from configuration, so we can tweak then (or customers) depending on envirnoment... where is 2 secs coming from (for me is regular call timeout, just wondering if this is iline with some other system in the cloud or something)=

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimecasero this is a safety net for Conference in a cluster configuration where notification from media server might never arrive. It doesn't need to be on configuration since this is an internal, very specific timer to be able to cleanup call no matter what error condition might occur. No particular reason for for the 2 sec duration, I can reduce it if you think its too much

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds seems ok

@@ -288,6 +289,13 @@ public void testDialCancelBeforeDialingClientAliceAfterTrying() throws ParseExce
assertTrue(maxConcurrentCalls==0);
assertTrue(maxConcurrentIncomingCalls==0);
assertTrue(maxConcurrentOutgoingCalls==0);

Map<String, Integer> mgcpResources = MonitoringServiceTool.getInstance().getMgcpResources(metrics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be a fairly common construct, could we isolate this in a Assert util class, where access to monitoring services, and literal to specific counters are protected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimecasero I thought about that but let this task for later as it will take significant time. I did half of the work (MonitoringServiceTool.getInstance().getMgcpResources(metrics)) but the rest of the enhancement can be done in another sprint with proper estimation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

This refer to RESTCOMM-1147
@gvagenas gvagenas merged commit 0e7dc36 into master Oct 10, 2017
@jaimecasero jaimecasero deleted the restcomm1147_rms_resources_leak_on_conf branch April 11, 2018 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants