javispedro's Avatar
Posts: 2,355 | Thanked: 5,249 times | Joined on Jan 2009 @ Barcelona
#191
Originally Posted by attila77 View Post
That stuff should be left to the OS, period.
Hehe, just wait until the processor guys come in and tell us this should all be left to the uncore.

We've already been seeing the beginnings of this behaviour since Nehalem...
 
Posts: 1,341 | Thanked: 708 times | Joined on Feb 2010
#192
Originally Posted by attila77 View Post
? Again back to language vs platform ? But that's not even platform, I was making a point that even with X86 (even if I forget OSes), you have dozens of very different configurations, which in practice make all these targeted cache and thread level optimizations pointless at best or counterproductive at worst.
I think not. Once the VM for a particular hardware platform is optimized, the VM can, at init time, quickly profile and optimize the bytecode for the individual hardware configuration it is running on. Of course a C++ program could do the same, but no one will want to implement it, because most of this information is not useful to fully compiled code. Even if you know the L2-cache miss rate is going nuts, you cannot do anything about it except (tell the user to) restart the program.

The OS cannot help with the heap-fragmentation problem or the locality of live objects, even if it sees them happening in some process, unless the process has a VM and re-JITtable bytecode (like Java and Python).
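To illustrate the point (a toy sketch only, not any real VM's implementation, and all names here are invented): if every object access goes through a handle table owned by the runtime, the runtime may slide live objects together to remove fragmentation, something a compiled program holding raw pointers cannot safely do.

```java
import java.util.ArrayList;
import java.util.List;

// Toy handle-based arena: callers hold handle indices, never raw
// addresses, so the runtime is free to move objects around.
class HandleArena {
    private final byte[] heap = new byte[1024];
    private int top = 0;                                    // bump-pointer allocation
    private final List<int[]> handles = new ArrayList<>();  // {offset, size}, null once freed

    int alloc(int size) {
        handles.add(new int[]{top, size});
        top += size;
        return handles.size() - 1;    // a handle index, not a pointer
    }

    void free(int handle) { handles.set(handle, null); }

    // Slide live objects toward offset 0. Only the handle table is
    // rewritten, so every live handle stays valid after compaction.
    void compact() {
        int dst = 0;
        for (int[] h : handles) {
            if (h == null) continue;
            System.arraycopy(heap, h[0], heap, dst, h[1]);
            h[0] = dst;
            dst += h[1];
        }
        top = dst;
    }

    int used() { return top; }
}
```

Allocate three 100-byte objects, free the middle one, and `compact()` shrinks the used region from 300 back to 200 bytes; with raw pointers the middle hole would be stuck there.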
 
Posts: 1,341 | Thanked: 708 times | Joined on Feb 2010
#193
Originally Posted by javispedro View Post
Hehe, just wait until the processor guys come in and tell us this should all be left to the uncore.

We've already been seeing the beginnings of this behaviour since Nehalem...
I do not think anything can be done at the processor level about the process heap fragmentation problem, short of just increasing the L2 data cache. Not unless there is a VM and bytecode. Heap memory will always eventually live in the physical "slow" RAM chips, and caching lots of data scattered around physical (linear) memory will remain relatively costly for the foreseeable future.

Edit:
Well, actually, this could be done at the CPU+MMU level. The CPU could have a BIG fine-grained TLB, and the MMU could reorder physical memory based on runtime profile information.

Last edited by zimon; 2011-03-07 at 17:43.
 
Posts: 3,319 | Thanked: 5,610 times | Joined on Aug 2008 @ Finland
#194
Originally Posted by zimon View Post
I do not think anything can be done at the processor level about the process heap fragmentation problem, short of just increasing the L2 data cache.
On a side note, I think you're overestimating the impact of heap fragmentation (and the current level of VM optimizations) in real life (apart from the anecdotal Firefox case, which is not really relevant, as it does its own memory management and has a fairly unique usage pattern). Take a look at the server domain, synonymous with long-running processes - by this logic, stable servers should avoid C(++) daemons, right? Except... the whole LAMP stack is predominantly C/C++ based - Apache, MySQL, PHP, Squid, you name it - all running on high-load servers with years of uptime. As said, we can go all academic on the possibilities of VMs and the shortcomings of native code, but reality (still) begs to differ (hell, even Sun's Java Web Server was written in C/C++, and while very powerful, J2EE/JSP stuff is definitely not known for its nimbleness).
__________________
Blogging about mobile linux - The Penguin Moves!
Maintainer of PyQt (see introduction and docs), AppWatch, QuickBrownFox, etc
 
Kangal's Avatar
Posts: 1,789 | Thanked: 1,699 times | Joined on Mar 2010
#195
I would like to resuscitate this thread.

It's very educational; can we keep debating?
Otherwise, would someone knowledgeable want to read everything and write a conclusion / summary / pros and cons?


Last edited by Kangal; 2011-04-04 at 09:07.
 
Capt'n Corrupt's Avatar
Posts: 3,524 | Thanked: 2,958 times | Joined on Oct 2007 @ Delta Quadrant
#196
Google has recently done some tests with optimized C++, Java, Go and Scala, and C++ came out the clear winner:

WHITE PAPER: https://days2011.scala-lang.org/site...s3-1-Hundt.pdf

Here are some notable parts as they relate to this thread:
Jeremy Manson brought the performance of Java on par with the original C++ version. This version is kept in the java_pro directory. Note that Jeremy deliberately refused to optimize the code further, many of the C++ optimizations would apply to the Java version as well
and the conclusion:

We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer. The Java version was probably the simplest to implement, but the hardest to analyze for performance. Specifically the effects around garbage collection were complicated and very hard to tune. Since Scala runs on the JVM, it has the same issues.
Oh, I wish they had put forth a larger effort to optimize the Java version rather than simply leaving it. 3.7x is a large difference in execution efficiency.

But C++ is clearly the fastest in this test.

Still, there are far too many unknowns to draw a scientific conclusion. I think Google did this purposely to keep the debate going.

A great read.
 

Posts: 3,319 | Thanked: 5,610 times | Joined on Aug 2008 @ Finland
#197
Originally Posted by Capt'n Corrupt View Post
Still, there are far too many unknowns to draw a scientific conclusion. I think Google did this purposely to keep the debate going.
I don't think there will ever be a proper conclusion - people want to hear "A is faster than B", i.e. see somebody cross the finish line first. However, the subject matter is complex enough that many factors can fundamentally change the conclusion. For example, if someone asks you what's quicker - a car, a bus or a bicycle - your gut instinct might be the car, but in reality it depends: is it open road or an urban area, is it day or night, is it rush hour or not, are you a good pedaler, etc.
__________________
Blogging about mobile linux - The Penguin Moves!
Maintainer of PyQt (see introduction and docs), AppWatch, QuickBrownFox, etc
 

erendorn's Avatar
Posts: 738 | Thanked: 983 times | Joined on Apr 2010 @ London
#198
Originally Posted by attila77 View Post
I don't think there will ever be a proper conclusion - people want to hear "A is faster than B", i.e. see somebody cross the finish line first. However, the subject matter is complex enough that many factors can fundamentally change the conclusion. For example, if someone asks you what's quicker - a car, a bus or a bicycle - your gut instinct might be the car, but in reality it depends: is it open road or an urban area, is it day or night, is it rush hour or not, are you a good pedaler, etc.
True, and in the end it depends a lot on how well you drive / push the pedals / don't get lost in transit.
 

Capt'n Corrupt's Avatar
Posts: 3,524 | Thanked: 2,958 times | Joined on Oct 2007 @ Delta Quadrant
#199
Originally Posted by attila77 View Post
I don't think there will ever be a proper conclusion - people want to hear "A is faster than B", i.e. see somebody cross the finish line first. However, the subject matter is complex enough that many factors can fundamentally change the conclusion. For example, if someone asks you what's quicker - a car, a bus or a bicycle - your gut instinct might be the car, but in reality it depends: is it open road or an urban area, is it day or night, is it rush hour or not, are you a good pedaler, etc.
I think this is a very good assertion.

My gut tells me that, in the end, the computer is translating some gobbledygook into native code to execute on the OS and the hardware beneath that. Regardless of the form of the code, the possibility exists to arrive at the same native code and thus the same performance. The comparison should therefore be between the compilers and host environments, with individual aspects of each correctly isolated, to give a clearer picture of individual strengths and weaknesses rather than an all-or-nothing win/lose.

As an aside:

I have been developing in Java for about a month and find it a very comfortable working environment. This choice wasn't entered into lightly, as I seriously compared a few other languages before deciding where to base my future efforts.

I am someone who relies on algorithmic and structural optimization for the majority of my speed, though I am very lax on performance tuning, leaving those sorts of things to the compiler (I can already hear the groans). Should I find myself wanting to eke out hidden performance, I will take on the non-trivial task of tuning my code. That will come later, but it is not a concern currently.

The choice of Java was based on portability, though I see that with efforts like LLVM on the horizon, portability will not remain a feature exclusive to Java. In the short term this is a strength of the Java VM, and the maturity of its JIT should give it an advantage until LLVM matures sufficiently to make the choice of language moot where portability is desired.

To date, I have implemented a home-grown data structure, which marks the beginning of an effort to develop a zero-config database system in Java. I will also be implementing some fast 3D graphics routines later on, for my own learning.

I am also thinking of producing a very simple structure to bring C/C++-like allocation to Java by way of paged references. I am genuinely curious whether this would alleviate the GC slowdown typical of performance-sensitive code, which seems to be the central theme of the Java/C performance debate. Some searches have yielded no such classes, and I'm genuinely surprised this is not more popular.
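For what such a "paged references" structure might look like - a minimal sketch under my own assumptions, with invented names like `FloatPool` - pre-allocate one large primitive array as the page and hand out slot indices instead of objects, so performance-sensitive code creates no per-element garbage for the GC to trace:

```java
// Toy object pool over one primitive array: "allocating" hands out an
// index into a pre-allocated page, so the GC never sees per-slot objects.
class FloatPool {
    private final float[] page;   // one big allocation, made once up front
    private final int slotSize;   // floats per logical object
    private int next = 0;
    private final java.util.ArrayDeque<Integer> freeList = new java.util.ArrayDeque<>();

    FloatPool(int slots, int slotSize) {
        this.page = new float[slots * slotSize];
        this.slotSize = slotSize;
    }

    // Returns a slot handle; recycles freed slots before extending.
    int alloc() {
        Integer recycled = freeList.poll();
        if (recycled != null) return recycled;
        return next++;
    }

    void free(int slot) { freeList.push(slot); }

    float get(int slot, int field)          { return page[slot * slotSize + field]; }
    void  set(int slot, int field, float v) { page[slot * slotSize + field] = v; }
}
```

A 3D point would be a slot with `slotSize` 3; freeing and re-allocating only touches the free list, never the garbage collector. The obvious trade-off is that "references" are plain ints with no type safety.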
 
Capt'n Corrupt's Avatar
Posts: 3,524 | Thanked: 2,958 times | Joined on Oct 2007 @ Delta Quadrant
#200
It seems my original supposition about GC was false: if the following reference is to be believed, the very existence of accessible objects affects GC performance.

It seems that the GC works by traversing an object tree to find out which objects are no longer accessible. The larger the tree, the longer GC takes. My fallacious supposition was that an increment/decrement value was stored with each object, allowing a quick GC check-and-sweep to determine accessibility.

http://blog.dynatrace.com/2011/03/24...a-performance/

Finally allocate as much as you like, but forget as soon as possible, before the next GC run if possible. Still don't overdo it either, there is a reason why using StringBuilder is more efficient than simple String concatenation. And finally, keep your overall memory footprint and especially your old generation as small as possible. The more objects you keep the less the GC will perform.
Very interesting.
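The traversal described above can be sketched in a few lines (a toy model only, not the real HotSpot collector, and the graph representation here is made up for illustration): everything reachable from the roots gets marked, so the cost grows with the number of live objects - which is exactly why the quote advises keeping the live set small.

```java
import java.util.*;

// Toy mark phase of a tracing GC. The object graph is a map from an
// object id to the ids it references; anything not reached from the
// roots is garbage. Cost is proportional to the number of live objects.
class MarkDemo {
    static Set<Integer> mark(Map<Integer, List<Integer>> refs, List<Integer> roots) {
        Set<Integer> live = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            int obj = todo.pop();
            if (live.add(obj)) {                       // first visit: mark it
                todo.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return live;
    }
}
```

With root 1 referencing 2 referencing 3, and an unrooted pair 4 and 5, only {1, 2, 3} come back marked; 4 and 5 are collectable without ever being visited.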

This seems somewhat backward to me. It seems it would be far easier to decrement a 'reference count' on de-allocation (i.e. a new assignment, e.g. foo = bar) of non-primitive types, and then check for a zero 'reference count' on the GC sweep, rather than traversing the entire object hierarchy to determine accessibility. In this scheme you lose a tiny bit of time on de-allocation of non-primitives, but save tons of time on the GC sweep. With a correctly chosen data structure, algorithm and de-allocation procedure, the GC sweep can be O(1) rather than O(N), especially for persistent data structures - which is pretty much the story of every program in existence. In either case it seems far faster than tree traversal over the long run, and far simpler as well.

The GC would be reduced to a check rather than a traversal.

This scheme would suffer under mass de-allocation, but I would *guess* that is a fairly limited case. It would also grow every non-primitive object by the storage needed for the 'reference count', slightly increasing memory use - though trivially so. Also, the issue of memory fragmentation has not been addressed in this scheme.
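For what it's worth, the scheme described above is classic reference counting; a toy sketch (invented names, not any real JVM mechanism) looks like this. One known caveat to add to the list: two objects that reference each other never reach a count of zero, which is one reason tracing collectors are used instead.

```java
// Toy reference counting: every assignment adjusts counters, so
// reclaimability is a simple zero check with no graph traversal.
class Counted {
    int refCount = 0;
    String payload;
    Counted(String payload) { this.payload = payload; }
}

class RefCountDemo {
    // Simulates "slot = newRef": retain the new referent, release the
    // old one. An object whose count hits 0 is reclaimable immediately.
    static Counted assign(Counted oldRef, Counted newRef) {
        if (newRef != null) newRef.refCount++;
        if (oldRef != null) oldRef.refCount--;
        return newRef;
    }
}
```

Each assignment is O(1), matching the trade-off described: a little work on every pointer write in exchange for no sweep-time traversal.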

Last edited by Capt'n Corrupt; 2011-06-15 at 16:59.
 