Tomcat – Java – VMware Best Performance

Posted on 13 December 2010



Getting the most out of Tomcat and Java on VMware vSphere.

This article reports the tests we ran to choose the optimal configuration for installing Tomcat and Java on a virtual Linux server. Naturally we do not claim absolute truths, and a lot depends on how the applications are written. We focus on the relationship, at installation time, between the virtual CPUs assigned to the virtual machines and the tuning of the Garbage Collector.

What is the Garbage Collector?

“Java does not require explicit intervention by the programmer to “clean up” memory. That task is performed automatically by a special system routine, the Garbage Collector, which searches the heap for objects that are no longer referenced and reclaims the memory they occupied.”


Resources

We recommend reading the following documents:

“Performance of Enterprise Java Applications on VMware vSphere 4.1 and SpringSource tc Server”

http://www.vmware.com/files/pdf/techpaper/vsp_41_olio_tcServer.pdf

Configuration

Below are excerpts from the official documents mentioned above that highlight important parameters for choosing the number of CPUs and for configuring the Garbage Collector.

Virtual CPUs

“Determine the optimum number of virtual CPUs for a virtual machine that hosts a Java application by testing the virtual machine configured with different numbers of virtual CPUs with the same test load.

If you are using multiple garbage collector (GC) threads in your JVM (such as those occasions when you use a parallel garbage collector), then the number of garbage collector threads should be equal to or less than the number of virtual CPUs that are configured in the virtual machine”
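One way to check the rule in the excerpt above at runtime is to compare the CPU count the JVM sees with its configured GC thread count. The sketch below is only an illustration: the class name GcThreadCheck and its helper methods are made up for this example, and reading ParallelGCThreads through HotSpotDiagnosticMXBean assumes a Sun/Oracle HotSpot JVM (on other JVMs the helper simply returns -1).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Tomcat or of any VMware tool.
final class GcThreadCheck {

    /** True when the GC thread count is equal to or less than the vCPU count. */
    static boolean fitsVcpus(int gcThreads, int vcpus) {
        return gcThreads <= vcpus;
    }

    /** Reads ParallelGCThreads from the running JVM, or returns -1 if it is not available. */
    static int parallelGcThreads() {
        try {
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (hotspot == null) {
                return -1;
            }
            return Integer.parseInt(hotspot.getVMOption("ParallelGCThreads").getValue());
        } catch (RuntimeException e) {
            // Assumption: any failure here means the JVM does not expose this HotSpot option.
            return -1;
        }
    }

    public static void main(String[] args) {
        int vcpus = Runtime.getRuntime().availableProcessors();
        int gcThreads = parallelGcThreads();
        System.out.println("CPUs visible to the JVM: " + vcpus);
        if (gcThreads < 0) {
            System.out.println("ParallelGCThreads could not be read on this JVM");
        } else {
            System.out.println("ParallelGCThreads: " + gcThreads);
            System.out.println("GC threads <= vCPUs: " + fitsVcpus(gcThreads, vcpus));
        }
    }
}
```

Running it inside the virtual machine, with the same JVM options Tomcat uses, shows whether the two numbers agree after a change to the vCPU count.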

Page 10

In the virtual machine implementation, if there are several threads that are ready to run at the same time, then that application system may or may not benefit from having multiple virtual CPUs present in the virtual machine. The nature of the work being done in the threads really determines the performance gain from parallel thread execution. If each thread takes up only a small portion of its allotted time slice, then a single virtual CPU may be as good as or better than multiple virtual CPUs. This really needs to be thoroughly tested with your particular application under suitable loads to establish the best virtual CPU configuration.
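The testing recommended above could start from something as small as the following probe, run unchanged on the same virtual machine configured with 1, 2, 4... vCPUs. It is a rough sketch under clear assumptions: the class VcpuLoadProbe and its allocate workload are hypothetical stand-ins for a real application load, and the numbers it prints are only meaningful when compared across runs of the same test.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: the workload below is a placeholder, not a Tomcat benchmark.
final class VcpuLoadProbe {

    /** Milliseconds spent in GC so far, summed over all collectors that report it. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report time
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Allocation-heavy placeholder workload; returns the number of rounds as a checksum. */
    static long allocate(int rounds) {
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] chunk = new byte[64 * 1024];
            chunk[i % chunk.length] = 1;
            checksum += chunk[i % chunk.length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        int rounds = args.length > 0 ? Integer.parseInt(args[0]) : 200000;
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        long checksum = allocate(rounds);
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        System.out.println("CPUs visible to the JVM: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Rounds completed: " + checksum);
        System.out.println("Elapsed ms: " + elapsedMillis);
        System.out.println("GC ms during the run: " + (totalGcMillis() - gcBefore));
    }
}
```

Keeping the load fixed and changing only the vCPU count (and, if needed, the GC options) makes the elapsed time and the GC time comparable between runs.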

Virtual CPUs and Threads for Garbage Collection

JVMs have options that allow the user to determine the number of garbage collection threads that may be active at any one time. This feature is determined by the following JVM runtime options:

-Xgcthreads<n> (for the IBM JVM)

-XXgcthreads<n> (for the JRockit JVM)

-XX:ParallelGCThreads=<n> (for the Sun JVM)

where <n> represents the number of GC threads to be used by the JVM. If the number of virtual CPUs configured in the virtual machine containing a Java program is not equal to or greater than the number of Java GC threads, then the performance gains that are expected from using multiple GC threads will be affected. Since in that case there are not enough virtual CPUs to schedule all of the GC threads at once, then some of the GC threads will be held up and the time to complete GC events will likely be longer.

(White Paper: Java in Virtual Machines on VMware ESX: Best Practices)
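As a small illustration of the rule in this excerpt, the sketch below builds the Sun JVM flag while capping the requested GC thread count at the number of vCPUs. The class GcThreadOption and its method are hypothetical names introduced only for this example; the flag syntax is the -XX:ParallelGCThreads=<n> form quoted above.

```java
// Hypothetical helper for composing the Sun/HotSpot GC thread flag.
final class GcThreadOption {

    /** Returns the flag with the thread count capped at the number of vCPUs. */
    static String parallelGcThreadsFlag(int requestedThreads, int vcpus) {
        if (requestedThreads < 1 || vcpus < 1) {
            throw new IllegalArgumentException("thread and vCPU counts must be at least 1");
        }
        int threads = Math.min(requestedThreads, vcpus);
        return "-XX:ParallelGCThreads=" + threads;
    }

    public static void main(String[] args) {
        int vcpus = Runtime.getRuntime().availableProcessors();
        int requested = args.length > 0 ? Integer.parseInt(args[0]) : vcpus;
        // Print the flag so it can be copied into the JVM options used by Tomcat.
        System.out.println(parallelGcThreadsFlag(requested, vcpus));
    }
}
```

With Tomcat, the resulting flag would typically go into CATALINA_OPTS (for example in bin/setenv.sh), assuming a standard Tomcat startup script layout.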

The VMware document makes it clear that a high number of CPUs does not by itself deliver better performance; rather, the opposite is true, especially when using UseParallelGC (the option enabled in our production environment).

Looking for more information, and digging into the reason for such heavy CPU usage, I found a user on the VMware forum who ran into the same problem we had:

http://communities.vmware.com/thread/211871?start=0&tstart=0

“While the Java process was taking 100% of the CPU, I attached GDB to the process and I ran a “thread apply all bt” and I noticed that all of the threads were in a pthread_cond_wait kernel vsyscall with the exception of two threads which had the following stack traces:

#0 0xb78c097e in PSPromotionManager::copy_to_survivor_space () from /usr/lib/jvm/java-1.5.0-sun-1.5.0.18/jre/lib/i386/server/libjvm.so
#1 0xb768e23c in instanceKlass::oop_copy_contents () from /usr/lib/jvm/java-1.5.0-sun-1.5.0.18/jre/lib/i386/server/libjvm.so
#2 0xb78c0721 in PSPromotionManager::drain_stacks () from /usr/lib/jvm/java-1.5.0-sun-1.5.0.18/jre/lib/i386/server/libjvm.so
#3 0xb78c2a4e in StealTask::do_it () from /usr/lib/jvm/java-1.5.0-sun-1.5.0.18/jre/lib/i386/server/libjvm.so
#4 0xb76521bf in GCTaskThread::run () from /usr/lib/jvm/java-1.5.0-sun-1.5.0.18/jre/lib/i386/server/libjvm.so

So, I did a little bit of reading on the GC options and discovered that on a single CPU system the parallel GC is not used which is probably why my single CPU system didn’t have any problems. In my experimentation I also found that setting the -XX:ParallelGCThreads=1 had no apparent effect until I also set -XX:+UseParallelOldGC which tells the JVM to use parallel GC for both young and old generations.

When using either -XX:+UseSerialGC or the combination of -XX:ParallelGCThreads=1 and -XX:+UseParallelOldGC, I no longer hit the 100% cpu usage problem in my initial testing. I am going to run with the -XX:+UseSerialGC and multiple vCPUs to see if everything appears stable. I also am going to leave the memory reservation set since it sounds like that’s a good idea for Java.”
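When switching between -XX:+UseSerialGC and the parallel collector as described in the forum post, it helps to confirm which collector the JVM actually selected. The following sketch lists the collectors through the standard java.lang.management API. The class ActiveCollectors and its classify method are hypothetical, and the name matching is an assumption based on the collector names HotSpot usually reports ("Copy" and "MarkSweepCompact" for serial, names starting with "PS " for the parallel collector); other JVMs and versions may report different names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: lists the collectors of the running JVM and guesses their family.
final class ActiveCollectors {

    /** Best-effort guess of the collector family from its reported name. */
    static String classify(String collectorName) {
        if (collectorName.startsWith("PS ")) {
            return "parallel";
        }
        if (collectorName.equals("Copy") || collectorName.equals("MarkSweepCompact")) {
            return "serial";
        }
        return "other";
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " [" + classify(gc.getName()) + "]"
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Launching it with the same GC flags as the Tomcat process gives a quick check that the intended collector is the one in use.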

4.2 The Throughput collector

  • By default the throughput collector uses the number of CPUs as its value for number of GC threads.
  • On a computer with one CPU it will not perform as well as the default collector
    • Overhead from parallel execution (synchronization costs)
  • With 2 CPUs the throughput collector performs as well as the default garbage collector.
  • With more than 2 CPUs you can expect to see a reduction in minor GC pause times
  • You can control the number of threads with -XX:ParallelGCThreads=n
  • Fragmentation can occur
    • Reduce GC threads
    • Increase Tenured Generation size

Conclusions

A high number of CPUs does not automatically mean better performance

When choosing the parallel garbage collector, it is important to set a number of threads no greater than the number of CPUs


