-
Notifications
You must be signed in to change notification settings - Fork 215
Restcomm1147 rms resources leak on conf #2564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This refer to RESTCOMM-1147
This refer to RESTCOMM-1155
…resources_leak_on_conf
This refer to RESTCOMM-1155
This refer to RESTCOMM-1155
This close RESTCOMM-1155
This refer to RESTCOMM-1158
…restcomm1147_rms_resources_leak_on_conf
This close RESTCOMM-1158 This refer to RESTCOMM-1155
…test This refer to RESTCOMM-1155
This close RESTCOMM-1163
…rt proper MGCP resources at the end of each test This refer to RESTCOMM-1155 This refer to RESTCOMM-1147
@@ -0,0 +1,26 @@ | |||
<?xml version="1.0" encoding="UTF-8" ?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess we need to find a way to configure how this is used in JenkinsCI env...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree @jaimecasero . I did this to quickly enable logs in the testsuite since they are essential when working locally with the testsuite
public static Map<MediaSession, ActorRef> getLinks() { | ||
return links; | ||
connectionToEndpointMap = new ConcurrentHashMap<ActorRef, ActorRef>(); | ||
connEndpointMap = new ConcurrentHashMap<String, String>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering this is Actor state, and concurrency should be already solved by Actor model, is there any particular intention on using ConcurrentHashMap??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaimecasero no particular reason, just a habit
|
||
public static Map<MediaSession, ActorRef> getLinks() { | ||
return links; | ||
connectionToEndpointMap = new ConcurrentHashMap<ActorRef, ActorRef>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont seem to find where connectionToEndpointMap is used? could it be this slid into final code after different design/refactorings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaimecasero I forgot to clean up this one. I will clean this now
@@ -400,6 +439,26 @@ private void modifyConnection(final Object message, final ActorRef sender) { | |||
private void deleteConnection(final Object message, final ActorRef sender) { | |||
final ActorRef self = self(); | |||
final DeleteConnection dlcx = (DeleteConnection) message; | |||
if (dlcx.getConnectionIdentifier() == null) { | |||
connEndpointMap.values().removeAll(Collections.singleton(dlcx.getEndpointIdentifier().getLocalEndpointName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we log the result of this operation, i think it could be worth to print it, whether we deleted something or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaimecasero you mean the operation to remove from the Map? Yes I can add this at the log statement later
logger.info(msg); | ||
} | ||
} else { | ||
connEndpointMap.remove(dlcx.getConnectionIdentifier().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, can we save the result of this operation for logging later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure as above
} else { | ||
connEndpointMap.remove(dlcx.getConnectionIdentifier().toString()); | ||
monitoringService.tell(new MgcpConnectionDeleted(dlcx.getConnectionIdentifier().toString(), null), self()); | ||
if (!connEndpointMap.values().contains(dlcx.getEndpointIdentifier().getLocalEndpointName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add some comment to this logic? im not sure im following the intention. It seems we already removed the map entry using the connId, which is the key to the map. Then we check if the endpointId is still in the map as value, if not we tell the service that is was removed. Is there an scenario where connId was removed, but endpointId is still present, what would be the use case there? Are the endpointId shared among different connIds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaimecasero RC only deletes connections, on the RMS side when there is no connection for a given endpoint, the endpoint is released.
This is what the method does here.
First remove endpoint-connection, then check if there are more connections for the give endpoint, IF NOT, consider the endpoint released and tell the MonitoringService.
I added comments for that.
mgcpEndpointsConference = new AtomicInteger(0); | ||
mgcpEndpointsPacketRelay = new AtomicInteger(0); | ||
|
||
mgcpEndpointMap = new ConcurrentHashMap<String, String>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, do we need concurrent collections as Actor state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a just a habit
mgcpEndpointsConference = new AtomicInteger(0); | ||
mgcpEndpointsPacketRelay = new AtomicInteger(0); | ||
|
||
mgcpEndpointMap = new ConcurrentHashMap<String, String>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems we only need the size of these maps to return as part of onStatistics logic... Do we really need to save all this info into maps? it seems a potential place for unnecesary leaks, when the size of these maps can be already calculated by using other counters already present... I would try to reduce state to simple counters, that maybe be wrong but do not risk to introduce more leaks...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaimecasero I am thinking that in the future we can provide MGCP resource details by call, similar to the live call details, so in that case the Map is needed.
Agree with your point but I don't see a risk of leaks here.
fsm.transition(message, completed); | ||
} | ||
//Don't wait for ever for the CallController response | ||
context().setReceiveTimeout(Duration.create(2000, TimeUnit.SECONDS)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this kind of timeouts should come from configuration, so we can tweak then (or customers) depending on envirnoment... where is 2 secs coming from (for me is regular call timeout, just wondering if this is iline with some other system in the cloud or something)=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaimecasero this is a safety net for Conference in a cluster configuration where notification from media server might never arrive. It doesn't need to be on configuration since this is an internal, very specific timer to be able to cleanup call no matter what error condition might occur. No particular reason for for the 2 sec duration, I can reduce it if you think its too much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 seconds seems ok
@@ -288,6 +289,13 @@ public void testDialCancelBeforeDialingClientAliceAfterTrying() throws ParseExce | |||
assertTrue(maxConcurrentCalls==0); | |||
assertTrue(maxConcurrentIncomingCalls==0); | |||
assertTrue(maxConcurrentOutgoingCalls==0); | |||
|
|||
Map<String, Integer> mgcpResources = MonitoringServiceTool.getInstance().getMgcpResources(metrics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be a fairly common construct, could we isolate this in a Assert util class, where access to monitoring services, and literal to specific counters are protected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaimecasero I thought about that but let this task for later as it will take significant time. I did half of the work (MonitoringServiceTool.getInstance().getMgcpResources(metrics)
) but the rest of the enhancement can be done in another sprint with proper estimation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
This refer to RESTCOMM-1147
PR also includes MGCP resources metrics and updated tests to assert MGCP resources are properly cleaned up