Dp 110 fd #199
Conversation
Please rebase on top of master.
I'd like to see a proper performance analysis of whether the thread pool improves or impedes performance in any significant way before this goes in.
This issue is not about performance; it is about non-blocking operations, and the thread pool here guarantees ordering of operations. So I'm not sure what we can measure here.
We could measure a node with 2 processes, each sending many messages to a process in one of two other nodes. Isolate one of the target nodes and observe how fast the messages arrive at the other. This patch should make it orders of magnitude faster.
Related to #204
@mboes I've implemented a benchmark in the setting suggested by @facundominguez: a main program that pings 2 nodes (one is hardcoded), and a reply service (code below). It takes about 30s without the patch. I'm not sure if I need to include this benchmark, as it requires many manual steps to run. UPD: with -threaded I don't observe this difference, so better benchmarks may be needed.

The pinging main program:

```haskell
import System.Environment
import Control.Monad
import Control.Concurrent
import Control.Distributed.Process
import Control.Distributed.Process.Internal.Types
import Control.Distributed.Process.Node
import Network.Transport (EndPointAddress(..))
import Network.Transport.TCP (createTransport, defaultTCPParameters)
import qualified Data.ByteString.Char8 as BS

num :: Int
num = 10000

main :: IO ()
main = do
  [h1, _h2, h3] <- getArgs
  [Right t1, Right t3] <- mapM (\h -> createTransport h "0" defaultTCPParameters) [h1, h3]
  [n1, n3] <- mapM (\t -> newLocalNode t initRemoteTable) [t1, t3]
  -- hardcoded pid of the reply service on a separate node
  -- (started by the second program below)
  let p1 = ProcessId (NodeId $ EndPointAddress $ BS.pack "172.17.42.1:9999:0")
                     (LocalProcessId 0 10)
  print p1
  addr3 <- newEmptyMVar
  _ <- forkIO $ runProcess n3 $
    getSelfPid >>= (liftIO . putMVar addr3) >> forever (expect >>= flip send ())
  _ <- forkIO $ runProcess n1 $ do
    x <- getSelfPid
    forever $ usend p1 x
  runProcess n1 $ do
    p <- liftIO $ takeMVar addr3
    x <- getSelfPid
    replicateM_ num $ do
      send p x
      expect :: Process ()
```

The reply service:

```haskell
import System.Environment
import Control.Monad
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport.TCP (createTransport, defaultTCPParameters)

main :: IO ()
main = do
  [h] <- getArgs
  Right t <- createTransport h "9999" defaultTCPParameters
  n <- newLocalNode t initRemoteTable
  runProcess n $ do
    say . show =<< getSelfPid
    forever $ do
      p <- expect
      say $ show p
      send p ()
```
Another update: without the patch, the send between local processes never happens (I couldn't wait that long).

```haskell
import Control.Monad
import Control.Concurrent
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport (EndPointAddress(..))
import Network.Transport.TCP (createTransport, defaultTCPParameters)
import qualified Data.ByteString.Char8 as BS

num :: Int
num = 10000

main :: IO ()
main = do
  Right t <- createTransport "127.0.0.1" "0" defaultTCPParameters
  n <- newLocalNode t initRemoteTable
  -- a process that keeps monitoring an unreachable node, tying up the
  -- connection machinery
  _ <- forkIO $ runProcess n $
    forever $ do
      mref <- monitorNode $ NodeId $ EndPointAddress (BS.pack "198.164.0.4:8923:0")
      unmonitor mref
  -- meanwhile, two local processes ping-pong num times
  runProcess n $ do
    self <- getSelfPid
    localPid <- spawnLocal $ forever $ send self () >> (expect :: Process ())
    replicateM_ num $ do
      () <- expect
      say "."
      send localPid ()
```
A pool of threads is used for network operations. One thread is used per NodeId.
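In outline (a minimal sketch of the idea only, not the code in this patch; the names `Pool` and `submitTo` are hypothetical): each destination NodeId gets a dedicated worker thread draining its own queue, so a send to a slow or unreachable node blocks only that node's worker, while ordering of operations per destination is preserved.

```haskell
-- Hypothetical sketch of a per-NodeId worker pool; not the code in this PR.
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar)
import Control.Monad (forever, join)
import Control.Distributed.Process (NodeId)
import qualified Data.Map.Strict as Map

-- One queue of pending network actions per destination node.
newtype Pool = Pool (MVar (Map.Map NodeId (Chan (IO ()))))

newPool :: IO Pool
newPool = fmap Pool (newMVar Map.empty)

-- Enqueue an action for a destination. Actions for the same NodeId run in
-- order on that node's dedicated thread; a blocked destination never delays
-- sends to other nodes.
submitTo :: Pool -> NodeId -> IO () -> IO ()
submitTo (Pool ref) nid act = do
  ch <- modifyMVar ref $ \m -> case Map.lookup nid m of
    Just existing -> return (m, existing)
    Nothing -> do
      newCh <- newChan
      _ <- forkIO $ forever (join (readChan newCh))  -- worker for this NodeId
      return (Map.insert nid newCh m, newCh)
  writeChan ch act
```

Note that `Chan` in this sketch is unbounded, which is exactly the concern raised further down the thread.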
PTAL, I have rebased the patch on top of the current master, so it can be merged automatically.
Except that the introduction of yet another layer of synchronized communication between threads is something you have to pay for on each send, not just the problematic ones when there are network problems. The "benchmarks" you wrote earlier after @facundominguez proposed them are not, in my book, benchmarks (case in point: there are no meaningful relative performance comparisons) - they are tests exhibiting the problem that this patch proposes one way to fix. A benchmark here would demonstrate that communication latencies are not adversely affected on a high-speed network, even using, say, n-t-inmemory, under a variety of stressful scenarios (many processes each sending 1 message simultaneously to the same target, processes sending to multiple target nodes simultaneously, one process sending many messages, etc.).

In any case, we've discussed this patch in the past and I'm not convinced that this is the right approach. Before investing significant resources into this and going full speed ahead, can we get @dcoutts and @edsko's feedback? Guys, the problem is this: current implementations of the
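For concreteness, one of those scenarios (many processes each sending one message to the same target) could be timed roughly as follows. This is only a sketch of the shape such a benchmark could take; it assumes `createTransport :: IO Transport` from network-transport-inmemory and uses plain wall-clock timing rather than a proper benchmarking harness, and the interesting number is the relative timing with and without the patch.

```haskell
import Control.Monad (forM_, replicateM_)
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport.InMemory (createTransport)
import Data.Time.Clock (getCurrentTime, diffUTCTime)

senders :: Int
senders = 1000

main :: IO ()
main = do
  t <- createTransport
  node <- newLocalNode t initRemoteTable
  runProcess node $ do
    self <- getSelfPid
    start <- liftIO getCurrentTime
    -- many processes, each sending a single message to the same target
    forM_ [1 :: Int .. senders] $ \i -> spawnLocal (send self i)
    replicateM_ senders (expect :: Process Int)
    end <- liftIO getCurrentTime
    liftIO . putStrLn $
      show senders ++ " messages received in " ++ show (diffUTCTime end start)
```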
@mboes we have generic microbenchmarks in
I haven't looked at the details yet, but I'm not very keen on introducing an unbounded buffer into the pipeline. We tried hard to avoid unbounded buffers.
@edsko @facundominguez ok then, based on @edsko's latest feedback, do we agree that this issue is best solved at the network-transport level, and that this PR should be closed?
We agree that we should solve this at the network-transport level.
On second thought, I think merging in the meantime might be more useful to users of CH than keeping the code as is.
This is a fair amount of imprecisely quantified technical debt we'd be committing to master. I say imprecisely quantified because the benchmarks discussion above is still unresolved. Could documenting this infelicity in the design of d-p/n-t for the user instead be a good enough stopgap? There are ways for the user to mitigate (not remove, yes, but mitigate) this problem in the meantime, in particular by setting aggressive timeouts in n-t-tcp.
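For reference, the mitigation mentioned here might look roughly like the sketch below. The two timeout fields shown, `transportConnectTimeout` and `tcpUserTimeout`, are assumptions taken from later network-transport-tcp releases, and the values are purely illustrative; check the fields and units available in the version actually in use.

```haskell
import Network.Transport.TCP (createTransport, defaultTCPParameters,
                              TCPParameters(..))

main :: IO ()
main = do
  -- Assumed fields (present in later network-transport-tcp releases):
  --   transportConnectTimeout: bound on how long setting up a new connection may block
  --   tcpUserTimeout: sets TCP_USER_TIMEOUT, bounding how long unacked data may linger
  -- Values and units are illustrative; consult the haddocks of your version.
  let params = defaultTCPParameters
        { transportConnectTimeout = Just 2000000
        , tcpUserTimeout          = Just 2000
        }
  result <- createTransport "127.0.0.1" "10501" params
  case result of
    Left err -> print err
    Right _  -> putStrLn "transport created with aggressive timeouts"
```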
Yes, without having gone into the details too much, it does sound like something for the NT-level API. We do not want to adversely affect the fast path of sending because of occasional expensive, slow operations like establishing new heavyweight connections.