Many thanks to all those who participated in the discussion in the previous post. All comments, criticisms and compliments are welcome. This is what motivates me to keep writing this blog.
The response to the previous post was much bigger than I expected. The Embarcadero team corrected some of the DataSnap problems, so I decided to redo the tests and write this new post.
This post is a continuation of DataSnap analysis based on Speed & Stability tests; if you haven't read the previous post, please do so before going ahead.
Embarcadero
To be honest, I was surprised by Embarcadero's reaction. I really thought they would inundate me with criticism and do nothing. I expected one of those unfounded answers I sometimes receive. I thought nothing would happen, no fix or solution.
Actually, I did receive some criticism during the Embarcadero webinar presented by David I. and Marco Cantù. Someone asked them about my tests, and both said that I wanted a better wizard, with more options, that would let me configure everything without writing a line of code. That was a totally distorted view of my post. I only used the wizard to avoid claims that the problem was in my implementation and not in the framework. I don't like wizards, and Embarcadero doesn't need to create better or fancier ones. All I want is for the framework to work and to have decent documentation.
In the end they understood that it wasn't a wizard problem or a lazy programmer, but a real problem in the framework. After some discussion they recognized the problem and then worked to correct it.
Surely there was pressure from the community on the Embarcadero team, demanding the quality they promise in their products. The participation of many people from the Delphi community influenced the speed at which Embarcadero worked on this problem. I don't think it should be that way. If I have a problem and I present it to Embarcadero, by e-mail or QC, they should look at it the same way. Why is it necessary for the Delphi community to press them for a fix of a bug?
The impression I have, judging from the comments of David I. and Marco Cantù, is that the response we got from Embarcadero is mainly due to the commitment of Marco Cantù. Less than 2 hours after I sent the email asking him to look at the blog I got feedback, and since then he has spoken with me daily to try to solve these problems. Thank you Marco, and congratulations on the work you are doing. I really hope you can influence other Embarcadero employees; maybe that way we'll get better feedback without needing to post an article on a blog.
For those who didn't follow the comments on the previous post, here is a brief summary of what happened. A few days after I posted my article, Marco Cantù wrote an article on his blog about optimizations that can be applied to DataSnap servers to improve performance and stability. He also conducted a webinar talking a bit about it. What happened is that I applied all the suggested optimizations and the problem persisted. Marco realized this as well and allocated a team to work on it and try to find the problems. After some time Embarcadero released XE3 Update 1 with a series of fixes for DataSnap. In this article I present the tests I did with this new version and an analysis of the results.
Optimizations
I received some criticism concerning the DataSnap server I used. In general, people criticized the fact that I didn't apply a given optimization or didn't use ISAPI, for example. Thank you for the constructive criticism, it helped me a lot.
I agree that the way the server was developed wasn't ideal for testing the performance of the framework. Maybe I didn't notice it because my focus was on showing the stability problems of the framework, not so much the performance problem. Anyway, I applied these optimizations and redid the tests, which I present below.
My main argument with respect to these critics is that the DataSnap server should at least work, even without any optimizations. It wouldn't be as fast as it could be, but at least it shouldn't crash. I still think that way. The truth is that the framework was not built to be scalable. No amount of optimization makes it work in environments with a lot of concurrency.
I will make a brief description of the optimizations proposed by the community and Embarcadero.
Keep-Alive
Michael Justin noticed a mistake I made on the DataSnap server. Thank you Michael. All servers should use Keep-Alive. What I didn't know was that the DataSnap server had a setting for this and that it was disabled by default. Thus, the DataSnap test was the only one that didn't use Keep-Alive.
In theory it would be an interesting optimization, particularly for the test case presented, where many calls are made to the server. But I found a problem related to it. By enabling Keep-Alive on the server, performance drops absurdly (5 requests/second). This occurs only with the test server and client on different machines; in a local test it works perfectly. I reported this problem to Marco Cantù and he will evaluate it further.
Because of that, I was forced to disable Keep-Alive, so all tests with DataSnap run without the Keep-Alive option. It would be unfair to turn this feature off on the other servers, because they have the capability and it works perfectly there. I also don't have enough time to redo the tests for all servers.
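For reference, this is roughly where the setting lives on the Indy HTTP server wrapped by the wizard-generated standalone project (a minimal sketch; the FServer field and its TIdHTTPWebBrokerBridge type come from the wizard-generated form unit, the method name and port are illustrative):

uses IdHTTPWebBrokerBridge;

procedure TForm1.StartServer;
begin
  // KeepAlive is False by default, which closes the TCP connection after every request.
  FServer.KeepAlive := True;
  FServer.DefaultPort := 8080;
  FServer.Active := True;
end;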
Memory consumption
One of the issues I observed was related to memory consumption. Why does the DataSnap server consume so much memory if the method called does absolutely nothing?
Maybe I can't explain it exactly, but I will try. Basically, DataSnap creates a session for each HTTP connection it receives. This session is destroyed after 20 minutes; in other words, during the first 20 minutes of the test the memory consumption only goes up, after which it tends to stabilize. I really have no idea why DataSnap does this. In a REST application I don't see much sense in these sessions as a default configuration. Of course, sessions can be helpful, but I can't understand why they are the default. In fact, DataSnap doesn't have a configuration for that: it appears you simply have to use this session control, without being able to choose otherwise (there is no documentation). The mORMot framework has session control too, but it's configurable and doesn't consume so much memory.
Anyway, there is a way around this problem. Daniele Teti wrote an article about it on his blog, take a look. The solution I show here is the one he posted. Thanks Daniele.
uses System.StrUtils, DataSnap.DSSession, Data.DBXPlatform;

function TServerMethods1.HelloWorld: String;
begin
  Result := 'Hello World';
  // Close the DataSnap session created for this invocation instead of keeping it
  // alive for the default 20-minute timeout.
  GetInvocationMetaData.CloseSession := True;
end;
After this method runs, the session is closed and memory consumption is lower. Of course there is still overhead for creating and destroying the session.
Thread Pool
Ondrej Kelle alerted me to use a thread pool. Thank you Ondrej. I really didn't know about that.
All servers were using a thread pool except DataSnap. Marco Cantù posted more information on his blog about how to implement this thread pool in DataSnap. Implementing a thread pool should avoid the overhead of creating and destroying Indy threads that I talked about in the first post.
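As an illustration of the idea, here is a sketch based on Marco's article (the FServer field and its TIdHTTPWebBrokerBridge type come from the wizard-generated project; the method name and pool sizes are arbitrary):

uses IdHTTPWebBrokerBridge, IdSchedulerOfThreadPool;

procedure TForm1.ConfigureThreadPool;
var
  LPool: TIdSchedulerOfThreadPool;
begin
  LPool := TIdSchedulerOfThreadPool.Create(FServer);
  LPool.PoolSize := 50;     // worker threads created up front and reused between requests
  LPool.MaxThreads := 200;  // optional cap on the number of worker threads
  // Assign the scheduler before activating the server.
  FServer.Scheduler := LPool;
end;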
ISAPI
I received severe criticism for running the test without ISAPI. I was worried when I read these criticisms, especially coming from renowned and experienced people.
I don't want to justify myself; instead I want to explain why I didn't use ISAPI. As you know, I don't have much experience with servers, I'm starting to work with them now. Practically every opinion I have comes from information I extracted from the internet, books, etc. I've never worked effectively with ISAPI. I've configured IIS a few times and did some tests, but I never had anything in production using ISAPI. In almost every lecture I attended on DataSnap (this includes the Delphi Conference 2011 and 2012 and webinars) it was presented as an advantage of DataSnap that you don't need IIS to run the server. The "experts" reported problems, difficulties, etc. For me it was clear (perhaps wrongly) that not using IIS was a great advantage. Because of this I had not considered the possibility of using ISAPI. Despite that, when the server started to fail I created an ISAPI server to test, and the problem also occurred.
To add a bit more information to the study, I included an ISAPI server in the tests presented in this post.
The new tests
The tests were conducted in the same environment as the previous tests, using the same methodology. Three servers were added:
- A server using the same sources as the first version, but compiled with XE3 Update 1. No optimizations.
- A VCL server rather than a console one (a suggestion from Marco Cantù), with all improvements suggested by the community and Embarcadero.
- An ISAPI server.
I sent the VCL server code to Marco before running the tests. Unfortunately he didn't have time to evaluate it carefully, but he told me it seemed like a good implementation.
The servers were added to the folder on GitHub.
NOTE: I didn't find a way to measure the ISAPI server's memory consumption; I believe it can't be done the same way since it operates quite differently. Therefore it does not appear in the memory consumption graphs. If someone with more experience has any ideas, feel free to comment.
Pierre Yager suggested that I run the tests with the Delphi on Rails framework. I did some quick tests and the framework looks promising; it certainly performs better than DataSnap. Pierre, unfortunately I didn't have time to run all the tests with this framework. Anyway, I think it is a valid alternative and anyone looking for options should certainly evaluate it as well. Thank you.
Tests without concurrency
The methodology hasn't changed. In these tests only one thread makes calls to the server, without any concurrency.
Regarding performance, this test showed no significant change. The update provided by Embarcadero changed nothing in this respect (and that was not the purpose of the update). The optimizations also had no effect.
The one optimization that could have helped in this test is Keep-Alive, which malfunctioned, as explained above.
Let’s take a look at the memory consumption.
There were changes in memory consumption. Although the graph shows slightly lower consumption between XE3 and XE3 Update 1, I do not believe Embarcadero made any changes in this regard, especially because we don't see the same behavior in the test with 1 million requests. The memory consumption of the DataSnap servers is simply unpredictable and unstable.
The optimizations had a big effect here. The code proposed by Daniele Teti was effective and memory consumption dropped significantly.
Tests with concurrency
In these tests we do not have data for the DataSnap server (Console) compiled with XE3 (pre-Update 1), because it could not complete them, as explained in the first post.
The performance of all DataSnap servers in the tests with concurrency was mediocre. Thanks to XE3 Update 1, at least the server did not crash. You may notice that the server without optimizations appears to perform better than the others, but we have to evaluate this carefully, because a new variable comes into play, something I had not checked in any of the other tests: the number of requests rejected by the server, or simply left unanswered.
The chart above shows the error rates in the test with one hundred threads. (Unfortunately I do not have the rates for the test with 50 threads because I did not record them during those tests; the error rates with 50 threads should be smaller.)
All DataSnap servers showed very high error rates, ranging from 61% to 97%. Among the servers that had errors (only DataSnap), the one with the lowest error rate was ISAPI, with 61.48%.
The memory consumption of the DataSnap server without optimizations is quite high. With the optimizations it is much smaller, but it still showed a significant and disturbing consumption in the second test. I do not know where this consumption comes from, since the sessions are being destroyed. I believe it is a memory leak.
Strange behavior
I could not monitor all the tests in real time simply because they take too long. Some tests took more than 10 hours and ran overnight. It took me a week to complete all these tests because of this slowness.
At times I monitored the tests and noticed some very strange behavior, in this case on the VCL server compiled with XE3 Update 1 with optimizations, running the 100-thread test.
In these pictures you can see that memory consumption varies a lot and that there is an extremely high peak in I/O. The I/O was stable at 22.3 KB and suddenly jumped to 464.5 KB; after a while it stabilized again. At the same time the server was running 1153 threads, an absurd amount. I caught more than 1450 threads running during this test. Where do they come from?
I have no theory to explain this behavior. In theory the overhead of Indy threads should no longer exist, since I am using a thread pool. I leave it in the hands of the experts.
I noticed another very strange behavior in the test with only one thread. I spent a long time researching it but have not found the reason. Some software interferes with the DataSnap server's speed. I identified two programs: Google Chrome and EyeBeam (a VoIP client). The strange thing is that they cause the server to become much faster; the difference is more than four times in some cases. In other words, the DataSnap server is four times faster with EyeBeam running.
With Google Chrome the difference varies. As you use the browser DataSnap suffers a change in performance, but if you leave the browser idle it has no effect.
Obviously all the tests were done without this interference. I think it would be interesting for Embarcadero to investigate this. They could probably find a way to optimize the DataSnap server without major changes. After all, if third-party software can do it, it should not be so difficult to implement internally.
I have not noticed this interference in other servers.
Conclusion
With the fixes made by Embarcadero in XE3 Update 1 I could finish the tests and introduce some new elements into the study. Now we can see more clearly how far we can go with DataSnap. For me it became even clearer that DataSnap is an option only for small projects with modest requirements. Even using ISAPI I did not get satisfactory results in this environment.
Marco Cantù made it clear to me that Update 1 includes some fixes for the problems and is not a major restructuring. He also made it clear that after this update the server is more stable and able to work with a slightly larger number of connections, but we cannot expect it to work in environments with high concurrency, which is my case and the test case presented here. Marco said that fixing these problems properly would require a complete redesign of the framework, and they are thinking about this possibility. I hope Embarcadero actually invests a little more in DataSnap. Maybe then it will become the product they are selling.
My final criticism is of Embarcadero's update policy, which is the target of much criticism around the world. All users of DataSnap versions prior to XE3 just have to live with these problems, and it has been like this for years. Embarcadero simply forces companies to buy a new product to get access to a bug fix. As an analogy: you buy a car and it comes with an engine problem; instead of the manufacturer servicing it, replacing the engine or something, they correct the problem in a new model and force you to buy that one. I am not a lawyer and do not know much about it, but I think this behavior violates the consumer protection code in my country. At the company where I work, all purchases of Embarcadero products are suspended (due to the huge number of issues with the Brazilian support), so I will not have access to this update. We bought XE2 to use DataSnap, it had problems, and now we have to buy XE3 to make it work. That simply will not happen. I think it's totally unacceptable. What do you think of Embarcadero's update policy?
I still can't get my blog added to the DelphiFeeds list, so if you can help spread the word, I thank you. I added a suggestion on the DelphiFeeds feedback page, but it seems I need to get a lot of votes. If you wish, you can help me.
Liked the post? Leave a comment!
No wonder people buy the Professional version. With the money you save by not buying Enterprise or Architect you can buy much more professional third-party components. The issue here is that we would all expect a decent product from Embarcadero, and this is not the case. I also will not upgrade to XE3 because I don't see a big improvement compared with XE2.
It is obvious that it is impossible to expect quality from Embarcadero anymore. All these Enterprise versions, FireMonkey, etc. are a joke. I cannot understand who would ever want to buy them when there are much better and even free alternatives. I think Embarcadero should make some serious decisions and raise the quality of its products before it's too late…
I would love to see DataAbstract here…
Thank you Leus.
I've already explained why I did not use DataAbstract in the comments on the first post: it does not support REST.
It's hard to believe that any solution would perform worse than DataSnap, but I do not believe DataAbstract is as fast as mORMot.
“With Google Chrome the difference varies. As you use the browser DataSnap suffers a change in performance, but if you leave the browser idle it has no effect.”
This sounds suspiciously as if DataSnap TCP sockets don't have the TCP_NODELAY option set [http://msdn.microsoft.com/en-us/library/windows/desktop/ms740476(v=vs.85).aspx]. Nagle's algorithm [http://en.wikipedia.org/wiki/Nagle's_algorithm] would indeed slow communication down a lot.
Somebody with better knowledge about the DataSnap TCP mechanism would have to confirm or deny that.
An easy way to test it would be to call setsockopt() on a DataSnap socket but I don’t know how to access this socket from the code. (I really don’t know anything about DataSnap. I just know how TCP/IP works.)
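For reference, the Winsock call itself would look roughly like this (a minimal sketch; getting hold of the actual DataSnap socket handle is the open question above):

uses System.SysUtils, Winapi.Winsock2;

procedure DisableNagle(ASocket: TSocket);
var
  Flag: Integer;
begin
  Flag := 1;
  // TCP_NODELAY = 1 switches Nagle's algorithm off for this socket.
  if setsockopt(ASocket, IPPROTO_TCP, TCP_NODELAY, PAnsiChar(@Flag), SizeOf(Flag)) = SOCKET_ERROR then
    RaiseLastOSError;
end;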
Thank you for the information. Very interesting.
I also don't have enough knowledge to verify that, but probably someone will come up with the answer.
Maybe Marco’s team can verify/deny that?
This has been going like this since Delphi 5: all the core value was in the compiler, IDE and base libraries. For all advanced uses, you had to go for third party, and the features in versions above Pro were never up to scratch.
The issue you mention of ‘no maintenance’ is also just as old, the only maintenance and support you’ll get with Delphi will be from third parties.
Database support has also historically been quite problematic, we jumped ship to third party drivers with D7, following major issues with the bundled Oracle and InterBase drivers.
That said, Delphi Pro + Third Parties is a very capable solution, and can duke it out with the best of them (as the mORMot results in your tests show), it’s just very sad that neither Borland nor CodeGear nor Embarcadero ever really came to leverage, or maybe even understand, their own product and realize its full potential.
I totally agree about EMBT's update policy and business model for Delphi/RAD. Each year a new version with some "bombastic" stuff (impressive-sounding keywords for new features) and mass tours around the world (with evangelists, sorry folks) within a few months. Behind it all, the basic functionality works and most of the promised features do NOT. After that, one or two updates and… a new version is coming! The same circus again. Not to mention higher prices outside the USA, weak QC, etc., etc…
Thank you for the tests; they are very valuable for deciding which technology to build the next projects on.
I think for once Emb should bite the bullet and release an updated DataSnap for some earlier releases as well, simply to reassure customers and keep them tied to Delphi. It may be pretty stupid to have some of them buy XE3 while meanwhile losing many others that could have bought XE4 and so on. Realistically, you can no longer expect everybody to buy each version now, especially if Emb keeps releasing once a year and with issues like this one. Their horizon should be farther than the next release, and keeping paying customers even if they don't buy every release, but still buy say every two or three, will pay off in the long term. Getting some more cash in the short term while losing more and more customers doesn't.
Thank you for the test.
Mr. Roberto, this is an outstanding post.
"My disagreement comes from the fact that having a rather sophisticated architecture for processing complex data types, managing sessions, mapping methods, and a lot more will never compete on speed with simpler architectures… but you have to compare solutions offering similar features." (Marco Cantù)
I partially agree with Marco Cantù regarding your testing scenario.
My conclusion is: it's just unfair to say "poor DataSnap" after all;
mORMot was not built on top of DB layers the way DataSnap is built on dbExpress;
mORMot uses the kernel-mode http.sys server, which uses IOCP, and it's optimized for speed from the ground up;
Another issue is that DataSnap's lack of performance is probably due to non-existent garbage collection; besides, DataSnap has big security flaws;
mORMot uses a simpler architecture and is used by some only for its speed; due to its modular design, is it more effective than DS?
After your post, I have the feeling that Embarcadero will show more respect to mORMot.
– DataSnap can certainly learn from mORMot's effective libraries;
– good to know that the guys at Embarcadero are pushing DataSnap for REST communication on Linux.
– DataSnap is fine.
Greetings
warleyalex from Brazil
Thanks for the comment warleyalex.
It's a long discussion. It's worth noting that from the beginning I said that my goal is not to compare the frameworks, exactly because it would force me to compare features and many other things. That would take months.
My goal with these tests was to verify whether DataSnap would support the load I need it to support, nothing more.
Yes, DataSnap may have features that mORMot doesn't have, but that is not the point. mORMot is an amazing framework with an absurd amount of functionality. Embarcadero will probably never be able to develop something similar to mORMot. I have been studying mORMot for a few weeks and am amazed at what it is capable of.
Anyway, I never said that one framework is better than another. I never said "poor DataSnap after all". "Better" is very relative; what's best for me might not be best for you, and it's an endless discussion. What I said is that DataSnap does not work in environments with high concurrency, and in that case it doesn't matter how much functionality it has. It simply does not work.
Thank you.
In our company we adopted DataSnap a year ago, and when we put the product into the hands of customers we had many problems. We later decided to study mORMot (part 1), but nobody in the Brazilian social-networking community seems to know what it is or to be interested. I'm thinking of creating another layer of abstraction in the software so that maybe in the future we can switch from DataSnap to mORMot. As these are still studies, a post on a basic implementation of a VCL server with REST requests and JSON responses, transporting typed objects and/or images, would help me; wherever I can help, I am available.
Thank you for your efforts. I hope these tests lead to significant improvements.
Reblogged this on Habari! Blog.
thank you Michael!
Your test results match mine. I checked the results this afternoon, regarding TCP/IP no delay. It can be set via TIdSocketHandle.SetSockOpt in the server's OnBeforeBind event, for example AHandle.SetSockOpt(Id_IPPROTO_TCP, Id_TCP_NODELAY, 1).
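Spelled out as a handler, a per-connection variant could look like this (a sketch only; the constants come from Indy's IdStackConsts unit, and the comment above sets the option on the listening socket in a before-bind event instead):

uses IdContext, IdStackConsts;

procedure TMyServerForm.ServerConnect(AContext: TIdContext);
begin
  // AOptVal = 1 disables Nagle's algorithm (TCP_NODELAY on); 0 leaves it enabled.
  AContext.Connection.Socket.Binding.SetSockOpt(Id_IPPROTO_TCP, Id_TCP_NODELAY, 1);
end;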
1 – leads to a constant flow and a constant throughput of about 230 requests/second.
0 – leads to a behavior where the number of requests handled goes up to about 1800/second, but shortly before reaching 1750 to 1800 (maybe a little earlier) the number of requests starts to drop.
The errors simply come from sending too 'many' requests from JMeter via too many threads through one network card. You can battle this behaviour by using more than 4 threads; I tried 8 to 16, but a lot more does not make sense, because the number of requests starts dropping before that.
I have set the ChannelQueueSize to 0. It did not have a huge impact.
Some profiling showed that the server did not do a lot at all. I used Eric's sampling profiler, Monte Carlo style, ignoring system calls, application idle times… There is a little memory allocation and freeing, but most of the time the server does almost nothing and waits, I think. The most widely used practice at the moment seems to be having about 2 or 3 threads per processor (very likely one per core or virtual core) and using IOCP. This seems to be the better solution on Windows. The Linux kernel is different, of course.
I think your results are correct. I simply tried the VCL server. In practice the other solutions are indeed faster. Very much the same result on a slow AMD quad-core silent office PC and a quad-core Intel, both on a Win7 workstation, no virtualization.
Setting TCP_NODELAY to 0 leads to somewhat better results, but it also disables the Nagle algorithm. So in the real world (which is the purpose of this test), it is something to be avoided.
Such results are disappointing.
Thanks for trying the TCP_NODELAY idea.
Not true. Nagle’s algorithm was designed purely for interactive TCP usage – when a remote terminal is connected to the server and user is typing on a command line and each command line change triggers a TCP packet. When you are doing query/response synchronously (when you don’t do any processing until the response arrives), Nagle’s algorithm will only hurt you.
0 – leads to a constant flow and a constant throughput of about 230 requests/second.
1 – leads to a behavior where the number of requests handled goes up to about 1800/second, but shortly before reaching 1750 to 1800 (maybe a little earlier) the number of requests starts to drop.
Indeed you are right. Sorry, it was late at night; it's the other way around. It is possible that the 230 would still increase, and I did not test the impact of the queue size assuming a stable response time and req/sec. I only gave things a try for 10 to 20 minutes, in order to separate the socket errors (too few threads in the pool) from the boost-then-drop phenomenon (up to 1700 and then a steady decrease), which is IMO the more evil thing when we talk about higher-volume REST applications.
When I compiled the example to an ISAPI DLL and ran it on my dual-core laptop on IIS 7.5 (Windows workstation), the requests/sec went up to 7000 and more, consistently, over a period of a few minutes to half an hour. This is OK in general.
Honestly, mORMot is what I'm still investigating; it works well so far…
ISAPI is ISAPI… cumbersome. I love Error 500 and restart. This is good for nothing. I don't want to be bound to an OS and, far more evil, to a specific web server on one OS. If something does not work with Abyss, which is a very pure, internet-oriented, easy-to-configure lightweight server, my enthusiasm is limited. HTTP is not the only option…
Excellent! The same change should be made somewhere in the test client.
Response –
http://delphihaters.blogspot.com/2013/01/your-money-or-your-job-part-3.html
How much would it cost to upgrade Delphi XE2 to Delphi XE3 to get this problem fixed?
Maybe this problem will get fixed in Delphi XE4 or maybe Delphi XE5, which may be unacceptable.
After some time, you need to interoperate with websites and REST/SOAP services.
Would Delphi be suitable for a website?
Better to do the whole thing in ASP.NET or Java. At least you can build both the website and the SOAP/REST services with BSON, JSON and XML interop, and share code.
Meh.
Are you sure?
That was its first intent (e.g. for gaming), but as a side effect it is also sometimes a good candidate for making the IP layer more stable, and it sometimes does not work well with routers.
Nagle's algorithm is a first candidate for letting your server work as expected when routers implement DoS-prevention algorithms, e.g. rejecting such small packets, which may be sent in order to create server congestion. I've seen it in corporate networks.
Disabling Nagle's algorithm is not a good idea, and even with it disabled, DataSnap is still unstable… and slower than the others!
Disabling Nagle's algorithm may increase "Hello World" speed on local networks, but will probably slow down the process on slow networks (like the Internet), and is not recommended for blobs or bigger content transfers.
Just the kind of tweaking which may help for this exact test, but won’t help in the wild.
Disabling Nagle is okay if you send your data in blocks anyway, this guarantees the blocks leave immediately rather than wait for a timer.
What is problematic is when the socket is treated as a stream and written to without buffering, thus sending out potentially only a few bytes at a time.
I haven’t checked if DataSnap does buffering or not, and the devil could be in the details, f.i. if they write a small header then a single block for the data, instead of writing header+data as a single block.
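To illustrate the point, a hypothetical Indy-style helper (not DataSnap's actual code) that assembles header and body into one buffer, so the socket sees a single write whatever the Nagle setting:

uses IdContext;

// Hand the complete response to the socket in one call instead of writing a
// small header first and the body afterwards.
procedure SendResponse(AContext: TIdContext; const AHeader, ABody: string);
begin
  AContext.Connection.IOHandler.Write(AHeader + ABody);
end;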
Exactly.
My words, also.
This is why the RealThinClient SDK disables the Nagle algorithm and uses internal buffers to send out as much data as possible with a single API call.
The event-based model of the RTC SDK even allows developers to make numerous "Write" calls in a single event without a performance penalty, because all of these calls end up in the internal buffers. When the user event completes its part of the job, RTC takes over, injecting a valid HTTP header at the front of the buffer and then sending the data to the sockets API in predefined chunk sizes to maximize throughput.
All the other code optimizations in the RTC SDK would be pointless if Nagle algorithm was used, or if the packet sending algorithm wasn’t optimized. While this might not show over localhost or in very fast Networks with low latency, it plays an important role over the Internet, WiFi and mobile networks.
I’d love to see a comparison with the RealThinClient SDK there. I’d be happy to grant the tester free access to the full RealThinClient SDK release and help with any questions about component usage. My direct contact details are available at http://www.realthinclient.com/contact.htm
I’d also like to see RTC SDK included in these test results!
I'd love to do these tests, but I no longer have the environment set up and also don't have time for this. All tests were done at the company where I work, aimed at the objectives of a project, and the time I had for them has ended.
Unfortunately I am unable to do that right now. Thanks for your interest.
Sorry
Roberto, man, as soon as you have a real-world usage test of mORMot, please publish it on the blog or in the Facebook or G+ community. Thanks, I'm not going to wait for Emb.
roberiopraci, I am thinking about publishing some posts about mORMot. I have been studying it for 2 weeks. It is fantastic, but I still know very little; I will need more time.
Take a look at the forum, there is a lot of information there.
Please do this test with RealThinClient. We'd be so happy. Clients always want more comparisons.
It's really true that Embarcadero forces us to update our products without giving support for the previous versions. I have worked with Delphi for 10 years and never had good support. I really hope they improve their services.
Hello,
Did you test
GetInvocationMetaData.CloseSession := True;
on its own? Without knowing the internals, my gut feeling is that it could slow the server down, because it needs to start a new session on every call (but it also affects memory).
Is there a memory/speed trade-off with that optimization???
Just wondering…
Thanks for the question.
It is a good question. I did some testing, but I cannot do the stress test using the sessions properly with JMeter (I'm not saying it's not possible, it's just a little tricky), so I don't know exactly. I believe there may be some performance gain because the overhead is lower, but I think the difference is small.
I did some tests too, on a quad-core 8 GB Win7 machine, but I got somewhat different results than you:
(I took fewer samples, otherwise I would have had to wait too long 🙂 )
50 threads, 50000 samples
Datasnap (poolsize 150) 16.6mb 991/s
Datasnap (poolsize 150) SMM2 167mb 2742/s
Mormot 6mb 3600/s
Mormot SMM2 36mb 4248/s
Node (32bit) 34mb 12010/s
IndyHttp 7mb 2786/s
IndyHttp SSM2 30mb 5276/s
IndyHttp (100threadpool) SSM2 55mb 11994/s
1 Thread, 5000 samples
Node 680/s
Mormot 617/s
Datasnap VCL 617/s
IndyHttp 624/s
Some remarks about test:
– recompiled mormot with latest sources with XE3 trial, exe from github does not accept connections?
– had to run mormot as administrator?
– SMM2 = ScaleMM2 memory manager, to eliminate threading issues (a memory manager per thread, so no global locks and Sleep() calls like FastMM4 uses!)
http://code.google.com/p/scalemm/downloads/detail?name=ScaleMM_v215.zip
– IndyHttp is compiled in D2010, with TIdHTTPServer and "AResponseInfo.ContentText := 'Hello World!';" in IdHTTPServer1CommandGet, so almost the same as Node.js
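For reference, the whole "plain Indy" handler is essentially this (a sketch with illustrative component names; the actual test project is linked further down in the thread):

uses IdContext, IdCustomHTTPServer;

procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
begin
  // Minimal "hello world" response, comparable to the Node.js test server.
  AResponseInfo.ContentText := 'Hello World!';
end;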
Conclusion:
– when using Indy + thread pool + ScaleMM2, the same high throughput as Node can be achieved! (12000/s)
Some remarks about results:
– the single-thread results are about the same, around 620 to 680/s?
– mORMot is much lower than Node here, whereas you got the same high results?
(I will check my mORMot compile this evening, maybe a debug build?)
Thanks for the comment.
The testing environment seems to be very different.
Are you doing these tests with two different machines?
What software are you using as a client? JMeter?
Is that 50,000 requests per thread, 2,500,000 total requests?
What was the error rate of these tests on the DataSnap servers?
mORMot had some breaking changes in recent weeks. Some units have been renamed, so maybe that's why it doesn't compile. See http://synopse.info/forum/viewtopic.php?id=950
The executable is ok on github. try http://localhost:777/service/helloworld
In tests with a single thread I noticed a bottleneck on the client machine, as I explained in the first post, so the results were the same.
I have not done any testing using SMM2.
Can you send me the executable (or sources)? I no longer have the environment set up, but I would like to evaluate some things.
Not two different machines, just a single one using localhost.
I used JMeter 2.8 (latest) and Java 7 (client, latest).
1000 requests per thread, so the total (in the results) becomes 50 * 1000 = 50000.
The error rate in DataSnap was 0% (both with SMM2 and FastMM/normal).
I will retry mORMot:
hmm, I had checked "display errors only"; after unchecking that it works OK and I get 6mb, 13372/s, so more than Node!
If you can compile a new version + ScaleMM2 it will get even higher!
I have my plain indy http test uploaded here:
https://asmprofiler.googlecode.com/svn/trunk/-Other-/JmeterPlainIndyHttpTest/JmeterPlainIndyHttpTest.zip
My datasnap build (xe3, with and without scalemm2):
https://asmprofiler.googlecode.com/svn/trunk/-Other-/JmeterPlainIndyHttpTest/datasnap-vcl.zip
Note: the combination DataSnap + ScaleMM2 does not work very well: first it tops out very high but then gets slower and slower (I need to work on that if I have time…)
I noticed that DataSnap becomes increasingly slow: the greater the number of requests, the slower it gets. Probably this is what is causing so much difference between our tests. In my tests I made a lot more requests to the server.
I do not think it's a problem with SMM2; it also occurs with FastMM.
NOTE: when performing the test with JMeter, you cannot keep watching the summary report, because updating that screen affects the tests.
You used KeepAlive in DataSnap?
KeepAlive works in a local test, but it does not work in the environment I used.
No need to test with ScaleMM2; the executable is not removed from the process list in XE3. Just an observation.
I tested with 4 computers today. Thanks to Andre I had a good reference. The results fit so far. The pure TIdHttp server behaves similarly and delivers good results. The DataSnap server degrades anyway and falls back to what you observed…
Concerning DataSnap: I am sure that if you have a server that is a lot stronger than the client machines the situation can change, but there is something in DataSnap that slows down the whole environment. On the other hand, why is DataSnap a lot faster when hosted in IIS?
The only thing you have to be careful with is that the Java 'products' perform very well in this initial special case; when the work behind the request increases, most of the alternatives become 'somehow' equal in traditional scenarios with DB access. This is why RO and RealThinClient are interesting alternatives. In general, if you are not building a corporate application, rely on open stuff (maybe mORMot, maybe Java or WCF); they simply come on board or almost on board.
I did some tests with RemObjects too, using an HTTP server and a JSON message, adding an unnamed parameter with the value {"version":"1.1","method":"NewService.HelloWorld","params":{}} in JMeter (see the .jmx in the zip below).
Results:
1 thread
RO, json, indy http 4mb 619/s
50 threads, 50000samples
RO,json,indyhttp,ropool=50 9mb 1815/s
RO,json,indyhttp,ropool=50,indythreadpool=50 1810/s
RO,json,indyhttp,ropool=50,indythreadpool=50,smm2 55mb 7376/s
RO,json,RODX,ropool=50,threadcachesize=50,smm2 59mb 10166/s
RO,json,synapse,ropool=50, ROThreadPool=50,smm2 50mb 10248/s
Conclusion:
– ScaleMM2 lets it perform much better than the default FastMM4
– RODX and Synapse perform much better than Indy (getting close to Node)
Overall conclusion for Embarcadero for DataSnap 🙂
– use a multithreaded memory manager (ScaleMM2, TopMM, Google Chrome's TCMalloc, etc.); see http://scalemm.googlecode.com/svn/trunk/Challenge/Results/MMBench_all.htm and the sketch after this list
– also support RODX or Synapse, not only Indy
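A sketch of how a replacement memory manager is wired in (project and unit names are illustrative; the key point is that the memory manager unit must be the very first unit in the .dpr uses clause):

program DataSnapServer;

uses
  ScaleMM2,  // replacement memory manager; must be listed before any other unit
  Vcl.Forms,
  ServerMainUnit in 'ServerMainUnit.pas' {Form1};

{$R *.res}

begin
  Application.Initialize;
  Application.CreateForm(TForm1, Form1);
  Application.Run;
end.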
Download RO test:
https://asmprofiler.googlecode.com/svn/trunk/-Other-/JmeterROHttpJsonTest/JmeterROHttpJsonTest.zip
By the way: another conclusion: RemObjects SDK is much faster than Datasnap when you look at both results (especially with a scaling memory manager):
Datasnap (poolsize 150) => 991/s
Datasnap (poolsize 150) SMM2 => 2742/s
—–
RO,json,indyhttp,ropool=50,indypool=50 => 1810/s
RO,json,indyhttp,ropool=50,indypool=50,smm2 => 7376/s
When you use RO+Synapse+SMM2 it gets even faster:
RO,json,synapse,ropool=50,ROThreadPool=50,smm2 => 10248/s
Definitive proof that Indy is slow for comms, and that Emb’s decision to use it and a terrible memory manager for their products is folly indeed! Do you think that they’ll even entertain any of these wonderful suggestions, Amussche?
Hi Roberto. The DataSnap REST/WebBroker Application Wizard also has an option for a Server Module: a separate server module is created for hosting the DSServer and DSServerClass components. This is especially important for web server extensions, I think. Otherwise the DSServer component is placed on the web module, and the web module is created by the web server for each new web request. And that is not what you want, I suppose.
Interesting.
As far as I know, the only setting in this direction is the Server Methods Lifecycle. Since it is a REST service, it works with the Invocation lifecycle, which causes it to create the objects for each request. I suppose you are talking about other objects. I really did not know that.
http://edn.embarcadero.com/article/41289
Do you have any link to documentation that explains this?
You can choose this option in the 3rd step of the wizard ('Server Properties' I think, or something similar). But it did not help the results in the case of 'self hosting'. It slows the server down a little in this case, about 10%, maybe 20% if not using keep-alive. Maybe with IIS it's different.
I’ve created a simple “Hello, World” Server using the RTC SDK v6.05, which does the same as the mORMot Server used in the above test. Precompiled Server (using Delphi 2010 with default compiler settings and the default Memory Manager) and a simple Stress-Test Client (using RTC SDK) can be downloaded from http://www.realthinclient.com/free/SimpleRTCTest.zip – Source code is also available, but only on request (send me an E-Mail), because you need the latest commercial RTC SDk release to compile it (free Starter edition won’t work for this case, because it is limited to max 10 connections at a time).
Thanks Danijel for the stress test. Please share the source code.
The Server is simple to implement with the RTC SDK. You could write one yourself just by following the Server Quick Start Lesson from http://www.realthinclient.com/flash/SrvLesson1.htm – You would only have to change the String being tested for with the Request.FileName property in the OnCheckRequest event and write the appropriate Result using the Write or WriteEx method in the OnDataReceived event. But you will need the full commercial RealThinClient SDK release to compile a Server capable of handling the load, because the free Starter edition is limited to max 10 connections. Anyway… if anyone wants the source code of the complete Test Project as posted above, just send me an E-Mail.
Well … I don’t want someone to think that I’m trying to hide something here, so I’ve uploaded the complete Server Project source code and two new precompiled Server executables (one with the default MM and one with ScaleMM2), here: http://www.realthinclient.com/free/SimpleRTCTestServer2.zip – As said, you will need the latest full commercial RealThinClient SDK release to compile the Project. If you compile the Project with the Starter edition, it will not work because of a 10 connection limit. And if you try to compile it with an older RTC SDK version, it won’t compile because JSON support and byte array methods like WriteEx have been added in RTC SDK v6.x
Some quick test results with RTC:
FastMM:
RTC, nonblocking, notmultithreaded = 10829/s
RTC, nonblocking, multithreaded = 11353/s
RTC, blocking, notmultithreaded = 9744/s
RTC, blocking, multithreaded = 12251/s
SMM2:
RTC, nonblocking, notmultithreaded = 10636/s
RTC, nonblocking, multithreaded = 12297/s
RTC, blocking, notmultithreaded = 10418/s
RTC, blocking, multithreaded = 12379/s
So RTC performs very well! It's between mORMot (13347/s) and RO Synapse (10553/s) and WCF (9934/s).
It is also not MM-intensive (which is a good sign), because the ScaleMM2 version is only slightly faster (but SMM2 will be good for user code implementations, like strings, objects, etc.).
Note: I hope to blog about my latest tests with some charts, etc.
I would like to emphasize here that "localhost" tests are NOT a good measure of the performance of network communication components. The relatively high performance in single-threaded tests with the RTC SDK compared to multi-threaded results also suggests that the Client used to test the Server is using a lot of the CPU. If this is a CPU with 8 cores, but multi-threaded test results are only 20% higher than single-threaded test results, then either the kernel is the main "showstopper" here, or the Server process is never using more than 20% of the total CPU power available. Other than this, optimizations aiming at minimizing packet counts and latency are not visible in a "localhost" test. A small percentage lost over localhost by using two API calls instead of one could result in up to a 50% loss over a crowded network with high latency.
It is even possible for an implementation which seems to perform better than the rest over "localhost" to give the worst results over the network, simply because other things become more important than raw CPU performance. "Localhost" tests might be a good indication of raw code optimization, but they don't tell us much about network communication optimizations (like optimal packet splitting).
And then, there is the question of memory usage. A good framework will keep its Memory requirements as low as possible, without compromising performance and stability.
So … while “localhost” tests might be interesting and relatively easy to perform, I would caution to look at any “localhost” results of Network Applications with a “grain” of salt, simply because there are factors in Network communication which require an actual Network to be tested.
As an example, I had the first RealThinClient "Core" implemented within a month and it performed extremely well over "localhost", but then I started testing inside an actual network and found enough problems to keep me busy for the next six months before I had all the issues ironed out.
As Danijel writes, localhost tests are not meaningful.
Localhost benchmarks can at least test the internal data processing, but they reflect neither the stability nor the speed of an HTTP server running as a real remote server.
So Roberto's tests, involving two computers, or even a small computer family (farm? as shown on the RTC web pages), make better sense than your "localhost" test.
But for a real benchmark, you would rather trust a real project, from real customers, serving its content over the Internet.
With mORMot, I suspect you can get even better results on localhost with our socket-based server instead of http.sys (setting DontUseHttpApiServer=true), since it uses IOCP and is very light. It is around 1.5-2.5 times faster on localhost! But we have found that http.sys is much more stable and fast when serving data over the Internet or a local network.
My test (SimpleRTCTestServer2.zip):
– c2d 3,5GHz localhost 13k req /sec
– c2d 3,5GHz client i7 3.5GHz server 1Gbps 26k req/sec
– c2d 3.5GHz client c2d 2.1GHz server 1Gbps 10k req /sec
Michal
Correction: that was a test of SimpleRTCTest.zip.
Michal
Could you run the same tests with mORMot, so there is at least one comparison between RTC and mORMot running over a Network?
I have extended the RTC Server Test to respond to the same request and produce the same response as the mORMot Server. Both Servers, compiled with Delphi 2010 using the default memory manager can be downloaded here:
http://www.realthinclient.com/free/ServerTesting123.zip
I mean that in the DataSnap REST Application Wizard, at the third step on the Server Features page, there is a checkbox for selecting a Server Module. This option was introduced in Delphi XE2 and XE3, I believe. The documentation says: 'The Server Module check box enables you to create a separate module for DataSnap server components.'
http://docwiki.embarcadero.com/RADStudio/XE3/en/DataSnap_REST_Application_Wizard#Server_Features_page
Marco Cantù has written white-paper: ‘Development and Deployment of Delphi Multi-tier Applications’. The Server Module option is described at bottom of page 16, and at top of page 18.
http://www.embarcadero.com/rad-in-action/development-and-deployment-of-delphi-multi-tier-applications
Also on page 4 he says: ‘Notice, however, that in case of web applications you need to create a specific data module hosting a DSServer component (something you can achieve by picking one of the wizard’s options) or else you have one server for each web module, which in turn is created when there is a new HTTP request.’
For scalability it is very important to have a separate module for the DataSnap server components. However, I noticed that in the test source code the DSServer component is located on the web module. I am very interested in a test where the DataSnap server components are on a separate server module. And if Marco recommends it, we should do it, I guess!
I no longer have the test environment here, but I did some basic (local) tests. Apparently it really makes a difference, but I cannot say exactly how much.
From what I noticed the difference is not too big, I think around 20% at most. But it's hard to say without having the numbers.
Thank you for the contribution.
I suppose you had debug logging enabled when running your mORMot tests. A lot of text is written to the log, so I guess this slows down the process.
You need to register the URL in the internal http.sys list if you want to run the mORMot server without administrator rights. It is clearly stated in the doc, and common to all http.sys servers (including WCF) for security reasons. Once registered (see e.g. the ServiceTestSQL3.dpr sample project), you can run it with normal user rights.
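For example, a URL reservation for the sample's port 777 can be added once from an elevated command prompt (standard Windows http.sys administration, not mORMot-specific; the exact URL string must match what the server registers, and the user name is a placeholder):

netsh http add urlacl url=http://+:777/ user=YourDomain\YourUser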
And, as stated here, localhost tests are meaningless.
I just retried: mORMot built with XE3 gives a 100% error rate, while the exact same source built with D2010 just flies… Some kind of regression bug in XE3 in combination with http.sys?
Because my "plain Indy" test has the same performance in XE3 and D2010.
For those who don't have more than one PC to test with, but have an internet router, you can still run real network tests by configuring your internet router to forward incoming connections on the port where the Test Server is running on your PC, then using your Internet IP address in the Client. This results in each packet making a round trip to your Internet router and back. While this isn't the same as a real network test with multiple PCs, it will give you a lot more realistic results than a test over "localhost".
Forgot to mention … the precompiled mORMot Server in the URL above uses Port 81 and not Port 777, as in the original mORMot Project.
I've made a few tests of my own with the mORMot Server.
Even though it does perform well in this specific test scenario, the mORMot Server doesn't scale well, because the default mORMot behavior is to create a new thread for every new connection and destroy the thread after a disconnect.
For example, the mORMot Server (as used in the Test case) will stop accepting new connections once the maximum number of Threads on a Windows System is reached. In my tests, this was around 1.500 connections. This might not be a problem inside a small to mid-sized LAN, but I wouldn’t use it on a Web Server.
And second, because it takes time to start up and tear down a thread, the performance of the mORMot Server will dramatically drop when clients are opening and closing connections.
Bottom line is that the mORMot Server, as used in this test, performs well only because it was optimized for this specific test scenario. Commercial Servers, on the other hand, need to be optimized to handle high load and seamlessly scale up and down with the load, without a big performance penalty.
@David
I do not know what you are talking about.
I doubt you made any test, and I suspect you did not even look at mORMot source code.
What you wrote here is pure misinformation.
Your comment is unfair, offensive, and misplaced, especially in your conclusion that “Commercial Servers” only may scale.
mORMot has two HTTP servers available, defined in SynCrtSock.pas unit, and linked to the mORMot RESTful core in mORMotHttpServer.pas unit:
1. THttpServer class which is a WinSock-based IOCP controlled server, using a thread pool;
2. THttpApiServer class which uses fast http.sys kernel-mode server, part of Windows OS since XP.
THttpServer is fast and standard, but is not fully tested in production.
THttpApiServer is http.sys based, is used in production by several customers, and gives amazing results, due to how http.sys is implemented by Microsoft (it is the IIS core, in fact).
BOTH server classes use a thread pool and IOCP (just like RTC), so it is completely WRONG to state that “default mORMot behavior is to create a new Thread for every new connection”.
If you are unable or do not have the time to read the source code, you may read the pdf documentation – see “9.3. Network and Internet access via HTTP” paragraph in 1.18 (latest) version.
The truth is that RTC covers just about 5% of the whole mORMot architecture, that is, only the client-server access. mORMot is now a complete SOA framework, ready to serve Domain-Driven Design solutions.
See http://blog.synopse.info/post/2012/04/25/The-mORMot-attitude
RTC may be worth the price if you need it, e.g. cross-platform abilities.
But we are not in 2003 any more, since http.sys is available.
Your comment is perfectly wrong, both from the technical and “commercial” point of view.
I agree.
1. Who is David?
2. I did run a number of tests with mORMot, but I don't need to read its source code or documentation to see that it is creating and destroying threads as connections get opened and closed. All I need is to monitor the Windows Task Manager while a test is running with Clients which do exactly that.
3. I was kind of expecting this reaction, but I would like to emphasize that my tests were done using the mORMot Server uploaded to GitHub by Roberto Schneider. If there is a better implementation which does the same, please upload it and I will test that.
Btw… if the mORMot Server has a thread pool, then it is very poorly implemented, because (A) it was struggling to handle more than 1.500 connections, with thousands of dropped connections, when testing with 5.000 clients, and (B) even with a smaller number of connections, the CPU usage would go above 90% and thread counts would start dropping when Clients started disconnecting.
In a Test-case where a fixed number of Clients are used to send a high number of requests over a stable connection in a single batch before closing the connections, the time required to open and close a connection is not included in the results.
This is similar to the effect you can see when comparing DataSnap results with and without using Keep-Alive.
If you have shorter bursts of requests, where each Client has to open a connection before sending its requests and close it afterwards, the time it takes a Server to accept and close a connection becomes more important than the raw request/response processing time.
And this has not been tested at all in the test scenario set up by Roberto. The test scenario simply assumed that all the Clients will connect to the Server and then use a stable connection to communicate with it.
While this might be the case in an ideal world or inside a LAN, it is rarely the case over the Internet, where you can have hundreds of clients connecting and disconnecting at the same time.
@Danijel – not @David – my mistake.
As I wrote, I suspect you simply did not test the THttpApiServer implementation, but the WinSock-based class. The thread pool is not used if IOCP limits are reached, which is the case with the default number of threads (20) and thousands of connections.
You have to register the URI and port (once with administrator rights) to http.sys – see the documentation (even basic WCF-related doc is enough). Or run the server with administrative rights.
The http.sys based server works very well… unless you claim that the IIS HTTP layer is not scalable? The mORMot layer is used in production on the Internet, with great success.
My apologies. I have not started the Server as Administrator. When the Server is running with Administrator rights, it has no problems with high connection counts nor connections opening and closing.
Btw … I wasn’t aware that someone with no actual experience using the RTC SDK was writing an article about it. And now I see that your article is almost a year old? To be frank, I didn’t even know what mORMot is, before I read this post from Roberto.
Anyway… I'm glad that I'm not the only developer trying to enrich the Delphi third-party community with quality products, and I have no problem being compared to other similar products, but I would prefer the comparison to be made by someone with actual experience using the RTC SDK and no bias towards other products. If you need testimonials from developers who are actually using the RTC SDK, head over to: http://www.realthinclient.com/powered.htm
You have a great sense of humor, irony and self-derision, as we can see! 🙂
In this blog post, I was speaking about RTC features, not about the quality of your work, of course. It is not about technical level, it is about scope. I wrote a blog article because I did not have enough space in the EMB forums to write what I had in mind.
The RTC feature list is quite complete on your web site, samples are provided, but RTC and mORMot are very different projects.
mORMot is much more than a RESTful connection layer. RTC is great, I've never doubted that nor written the contrary. But the communication layer is just one little brick of a Domain Driven Design solution. This is the point of the article.
@A.Bouchez:
RTC and mORMot follow a different design principle and I this is good, because we need diversity. But since all of your “knowledge” about RTC came from a few minutes spent browsing through the RTC product website, I really think you lack the qualifications to write about RealThinClient components.
To make things worse, you have a strong bias towards your own framework, which is clouding your judgement when it comes to evaluating 3rd-party solutions that serve the same or similar purpose.
Bottom line is … you should only write about things you have personal experience with and RTC does not seem to fit that description.
I saw the same thing in my (admittedly not very good 🙂) localhost test:
http://andremussche.blogspot.com.br/2013/01/datasnap-ro-rtc-mormot-wcf-node-speed.html
(again, it is not a real network test, but it gives an indication of the relative performance of the different solutions)
It’s great to see someone continuing my work, especially because I did not have enough time to test all the options (RTC and RO) as you did.
The tests are very different, but you arrived at the same conclusions: mORMot and RTC are extremely fast. Of course, mORMot is much more than a REST API, but that is not the point.
I enjoyed your post, excellent work.
I ran my own tests for several hours yesterday with the RTC Test Server (using “MultiThreaded” and “Generate JSON on-the-fly”) and the mORMot Console Server (running as Administrator).
In my tests, I started with 1 Client opening a connection to the Server, sending 20.000 requests in a batch, closing the connection and writing the results to a LOG file, then increased the Client count by 1 and repeated the process. Every Client opens a connection, immediately starts sending requests until it has sent 20.000 requests and received 20.000 responses, and closes its connection after receiving its 20.000-th response.
Because a connection is opened at the beginning of each batch and closed at the end, this test shows how the Server would behave at peak times, with a specific number of Clients arriving, sending a batch of requests for processing and then disconnecting.
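For illustration, here is a minimal sketch of what a single Client does in each batch. This is plain Delphi with Indy and a placeholder URL and request count, not the actual RTCWebStressTool code:

program BatchClientSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Diagnostics, IdHTTP;

procedure RunBatch(const AUrl: string; RequestCount: Integer);
var
  Http: TIdHTTP;
  Watch: TStopwatch;
  I: Integer;
begin
  Http := TIdHTTP.Create(nil);   // HTTP client created for this batch
  try
    Watch := TStopwatch.StartNew;
    for I := 1 to RequestCount do
      Http.Get(AUrl);            // one request/response round-trip
    Writeln(Format('%d requests in %d ms',
      [RequestCount, Watch.ElapsedMilliseconds]));
  finally
    Http.Free;                   // client released when the batch is done
  end;
end;

begin
  RunBatch('http://localhost:8080/hello', 20000);
end.

The actual tool repeats this with an increasing number of Clients and writes the timings to LOG files, as described above.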
Here are my Test Projects (the Test Client and both Servers) in Source Code and as precompiled Win32 executables (compiled with D2010), along with my Test Results for 1 to 25 Clients with a 1-step increment, 26 to 68 Clients with a 2-step increment, 70 to 97 Clients with a 3-step increment and 100 to 150 Clients with a 5-step increment:
http://www.realthinclient.com/free/ServerTestV2.zip
The Test Client (RTCWebStressTool) was measuring the time and generating LOG files, which I renamed after each test batch. At the end of all test runs (4 test runs per Server), I imported all the raw log data into Excel and created Graphs to show and compare Server behavior. The Excel file is included in the above ZIP file, along with all the Graph data as PNG images.
Please note that all of the included tests were done over Localhost, which means that the Client was running on the same PC as the Server being tested, with no actual Network communication. Even though such tests can be useful when comparing raw performance, they do not show how the Server would behave inside a Network.
Thanks for sharing the results of your tests.
Danijel, thank you for the great test.
Btw … I was running my tests on an ASUS Notebook with an Intel® Core™ i7 CPU 720QM 2.66 GHz on Windows 7 Professional (64-bit):
http://www.asus.de/Notebooks/Gaming_Powerhouse/G73Jh/#specifications
Correction: Windows 7 Home Premium 64-bit @ 1.73 GHz and 8 GB RAM, according to the Windows “System” information.
Roberto,
Which one is faster: programs (mORMot, RTC, DataSnap, WCF, etc.) or web servers (Apache, IIS)?
This may depend on the situation and the framework you use with IIS/Apache. But in general, when dealing with Delphi, RTC and mORMot are faster.
Apache and IIS are… programs themselves!
If you use Apache or IIS as a host, i.e. if your program is called from them (e.g. as an ISAPI module), it will certainly be slower than serving plain file content from Apache/IIS itself, since you are adding a layer. Some users report that IIS can be very difficult to stabilize and work with when you are outside the “official” .aspx boundaries.
If your application serves HTTP content directly, it will be as fast as its HTTP layer. RTC is fast. mORMot and WCF both use the http.sys API, which is fast. Indy can be fast, but its design is dated (no IOCP).
Note that if you use the http.sys-based API – see http://msdn.microsoft.com/en-us/library/aa364510 – in your application/project (as mORMot or WCF do), you are in fact using the lowest layer of IIS directly. You can even share a port and URI between the “official” IIS and your application (very handy, e.g., to handle AJAX security without breaking cross-site boundaries). When serving, for instance, plain files or in-memory (calculated) content, you use the exact same code as IIS, with a kernel-mode queue, certified, tested, secured and optimized by Microsoft. For free (or, to be more precise, included in the price of the OS since XP).
But for real web applications or services, the server layer is not the real bottleneck (unless its implementation is broken). The bottleneck is the process itself: how security and sessions are handled, how data is accessed/cached/prepared/optimized, how multi-thread friendly the code is. This is why, for instance, in André’s tests, changing the memory manager to ScaleMM2 greatly improves the Indy, DataSnap and RO processing, which were not designed to scale to multi-core CPUs and a huge number of clients. To write “scaling” applications in Delphi, due to some design choices from the Borland era (never enhanced since), you have to follow some patterns like the ones we used for mORMot – http://blog.synopse.info/post/2011/05/20/How-to-write-fast-multi-thread-Delphi-applications
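As a side note, swapping the memory manager in a Delphi project is a one-line change. A minimal sketch, assuming the replacement unit is named ScaleMM2 and is on the project's search path (the unit name is an assumption here):

program MemoryManagerSketch;
{$APPTYPE CONSOLE}
uses
  ScaleMM2,          // the replacement memory manager must be the FIRST unit in the .dpr
  System.SysUtils;
begin
  Writeln('Running with the replacement memory manager.');
end.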
thank you
I am currently running stress-tests inside my LAN and have set up remote access to all PCs involved in the Test (9 Client PCs and 1 Server). If anyone would like to take a look while the Test is still running, send me an E-Mail. I plan to leave the test running for a few days, unless something unexpected happens (like a power outage or hardware failure). My contact details can be found at: http://www.realthinclient.com/contact.htm
I have posted a Screenshot and a 15-minute Video to the RealThinClient Facebook page to show how RTC Portal (using the RealThinClient SDK for communication and also available from RTC with full source code) is being used to monitor a stress-test using 9 PCs running RTC Stress-Test Clients (at that time, with 1.500 active connections and growing) to flood a RTC Test Server.
http://www.facebook.com/RealThinClient
At the time of posting, the test has been running for 11 hours, starting with 5 connections per Client PC and increasing the connection count by 5 after sending 20.000 requests from all active connections.
RealThinClient Server performance results after the first hour of running a Stress-test and 10 million processed requests:
RealThinClient Server performance results after 20 hours of running a stress-test and over a billion requests processed:
The test is running inside a 100 MBit LAN with 9 Client PCs (1-8 + X) and 1 Server.
Clients 1 to 8 have a 662 MHz CPU,
Client X has a dual-core 2.2 GHz CPU,
Server has a dual-core 2.8 GHz CPU.
All Clients have started by opening 5 connections and sending 20.000 requests to the Server from each connection in a single “batch”, then closing all connections and calculating the time it took for the Client to open all connections, send out all requests, receive all responses and close all connections again. The test is repeated with 5 more connections per PC in each run, so the number of active connections grows by 9 x 5 = 45 with every batch run.
This test shows how the RTC Server would behave at peak times, when a high number of clients would come to the Server at once, process a high number of requests and disconnect while the Server is already under heavy load.
Thank you for the test.
RealThinClient Server performance results after 40 hours of running a stress-test and over two billion requests processed:

More stress-test results are now available from
http://www.realthinclient.com/tests.htm
Hi Robert,
DHR is asking – what did you eventually choose and why did you choose it?
Are you going to code the client and mobile in Delphi? Would other choices be reasonable?
What about mobile clients? Will you be using FMX in the future?
Did you see this?
http://delphihaters.blogspot.com.br/2013/02/xamarin-vs-embarcadero-over-number-of.html
Can you blog about your experiences with Delphi and why you decided to leave it?
Can I also ask about Delphi in Brazil? How are things there?
There are lots of unemployed Delphi developers in Brazil. They have difficulty finding jobs and more and more companies are switching to C#, PHP and Java.
The developers who are left find it harder and harder to pay for updates and upgrades every year.
Also, how is IntraWeb usage in Brazil? I ask because there are lots of e-commerce sites in Brazil, but very, very few are made in Delphi.
I do not think Robert “decided to leave” Delphi.
Perhaps DataSnap, but not Delphi – this was not the point of his blog articles.
Delphi is not the problem. It is a very good IDE with a good compiler, which does a very good job when used with properly written components and code.
Hi DHR.
Thanks for the questions.
They are interesting questions. I will probably write a post to answer them.
Strange, I did some hacking and profiling to see how fast I could get DataSnap, and I see that my “Plain Indy” test is also much slower in XE3 than in D2010! In D2010 I get about 11.500 requests per second; the same code in XE3 reaches only 7.700…
After some hacking in DataSnap (mainly disabling the sessions) I got 4700 req/s (it was 3200 req/s) and also stable performance (no steep decline). I could not get more without much rework…
Looking at the DS source code, my conclusion is: it is not optimized for “high performance” (stupid advertisement by EMBT!). I mean: all kinds of helper objects are created and destroyed on the fly, there are many UTF8 decoding conversions (implicit, due to RTTI?), the RTTI context is not cached, there is no connection pool (new connections are created and closed for each request), etc.
That disappoints me even more.
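To illustrate one of those points with a generic sketch (this is not DataSnap source code, just the kind of caching that is missing): a single TRttiContext can be acquired once and reused for every request instead of being recreated on the fly:

unit RttiCacheSketch;

interface

uses
  System.Rtti;

// Returns a context that is acquired once and reused for every request.
function SharedRttiContext: TRttiContext;

implementation

var
  GContext: TRttiContext;

function SharedRttiContext: TRttiContext;
begin
  Result := GContext;
end;

initialization
  GContext := TRttiContext.Create;  // acquired once at startup
finalization
  GContext.Free;                    // released at shutdown
end.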
See my post on G+ Delphi discussion AND the comments: https://plus.google.com/u/0/110131086673878874356/posts/8jzDsMRCW3V
It seems JMeter can give some strange results (lower req/s), but with the RTC stress test tool I got better results with DS (when disabling sessions -> 7000 req/s).
Marco Cantu said they are working on many (?) improvements and maybe also making sessions optional. At least they noticed these discussions about it 🙂
Thank you for the great job! Will you test XE4, since there are some DataSnap fixes?
Reblogged this on VoidByte and commented:
Interesting material about DataSnap, regarding the speed and stability of the framework.
Do you think you will do a new test for XE5?
Not really. I have no more interest in DataSnap. For Delphi projects I’m using mORMot and it is working like a charm!
Furthermore, AFAIK there has been no improvement in DataSnap in versions XE4 and XE5.
I’m working with Java at the moment. And investing my free time in Ruby / Ruby on Rails! 😀
I am also disappointed with Embarcadero’s update policy…
Roberto, XE6 seems to have made changes to DataSnap. Any chance you could re-compile your test code in XE6 and run the same tests (and preferably on the same hardware, under the same conditions) now to produce a comparable review? – this would be really helpful to so many developers around here…
Hi Alex,
I am no longer able to reproduce the tests in the same environment with the same hardware. I no longer work at Sysmo Sistemas.
I do not think there have been major improvements. If that were the case, Embarcadero would publish their own benchmarks and use them as a marketing tool; they are very good at that. Or maybe it’s too embarrassing to publish the results obtained by the older versions (XE2, XE3 and XE4).
Remember that Marco Cantù said major changes to the entire structure would be needed to solve these problems. I do not think they have done that.
However, a few days ago, a Brazilian named Roberio Praciano asked me for the sources to redo the tests with XE5 and XE6, and I think he will do it. He will certainly redo all the tests (including XE2) in his environment to maintain a consistent basis for comparison.
Hi Roberto!
First of all, thanks a lot for this; it has been really important in helping us decide whether to give EMB a chance or to implement the server using Node.js.
As you said in your last comment, a Brazilian named Roberio Praciano came to you to redo the tests with XE5 and XE6. Does anyone know anything about these tests? Or has someone tried DataSnap with the new versions? Is there any good news?
Thanks guys!
Rafael.
Hi Rafael. Thanks for the feedback.
The new tests aren’t published yet, but they will be soon. I talked to Roberio just now; he did tests with XE7 too and expects to publish the article in a week.
I can anticipate that there are some improvements, but not enough to come close to TMS Sparkle or mORMot.
Looking forward to seeing that, Roberto! Thank you very much for your help. I think all of us are waiting to see whether performance has improved with all those new versions from EMB…
So I’ll keep an eye on your blog. Do you know if Roberio will post it here as well?
Thanks once again!
Rafael.
Hi Rafael.
Yes, he will post it on his blog, but I will post a link here, so stay tuned to this blog or DelphiFeeds.
If I’m not mistaken, all DataSnap functionality is based upon the WinInet library,
and it is limited to 2 concurrent connections per server.
So basically you cannot really test concurrent performance with the default settings.
Try this (run it early, e.g. from an initialization section as below, so it takes effect before the first request):

uses Winapi.WinInet;

const
  // WinInet option IDs (see the INTERNET_OPTION_* documentation)
  INTERNET_OPTION_MAX_CONNS_PER_SERVER = 73;
  INTERNET_OPTION_MAX_CONNS_PER_1_0_SERVER = 74;
  // Replace XX with the desired maximum number of concurrent connections
  maxConnections : integer = XX;

initialization
  // Raise WinInet's per-server connection limit for the calling process
  InternetSetOption(Nil, INTERNET_OPTION_MAX_CONNS_PER_SERVER, @maxConnections, SizeOf(maxConnections));
  InternetSetOption(Nil, INTERNET_OPTION_MAX_CONNS_PER_1_0_SERVER, @maxConnections, SizeOf(maxConnections));
We used 3 products: RO, RTC and mORMot. I won’t speak about RO (slow and heavy). We tried RTC, but it was also very slow and CPU-hungry when dynamically fetching lots of OPC tags – 1000 to 5000 of them (let’s say a list of small objects) – at least once per second from a single client. mORMot, on the other hand, is FAST, and we’re glad it is. We use mORMot in actual production 24/7 on several sites: the servers don’t even blink at client requests and run smoothly and reliably.
Regards,
Vojko Cendak