Issue:
1- The JVM was running garbage collection every 10 to 15 seconds, driving the CPU up until the system ground to a halt.
2- The thread dumps revealed that reflection class unloading was happening during a Full GC; see below:
sun.reflect.GeneratedSerializationConstructorAccessor
3- A message file size exception occurred because the maximum file size limit was set to 30MB and the messages were larger than that.
4- Full garbage collections were taking longer, and JVM pauses were visible in the thread dumps.
Environment: HP-UX 11i v3 on Itanium 64-bit
Solution:
This is how it was resolved: tuning the Java memory settings, mapping the memory pages separately for each call, increasing the stack size, and applying some missing libraries and patches.
The issue is that on HP-UX Itanium 64-bit, Java runs in 32-bit mode by default unless you ask it to run in 64-bit mode. You can verify this with "java -version", "java -d64 -version" and "java -d32 -version". If the libraries exist and the kernel patches are installed, WebLogic detects that a 64-bit JVM is installed and adds the flags "-client -d64", but in some cases it does not, and we need to add them ourselves.
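The check and the fix described above can be scripted. This is a minimal sketch, assuming that a 64-bit HotSpot banner contains the string "64-Bit" (as in "Java HotSpot(TM) 64-Bit Server VM"); the helper names jvm_data_model and ensure_d64 are hypothetical, not part of WebLogic or the JDK.

```shell
#!/bin/sh
# Hypothetical helpers: report the data model named in a "java -version" banner,
# and make sure -d64 is present in a JVM argument string.
# Assumption: 64-bit HotSpot banners contain "64-Bit".

# $1 = text printed by "java [-d32|-d64] -version" (the JVM writes it to stderr)
jvm_data_model() {
  case "$1" in
    *64-Bit*) echo 64 ;;
    # Anything else, including an error because that mode is not installed
    *)        echo 32 ;;
  esac
}

# Prepend -d64 to an argument string unless it is already there.
ensure_d64() {
  case " $1 " in
    *" -d64 "*) printf '%s\n' "$1" ;;
    *)          printf '%s\n' "-d64${1:+ $1}" ;;
  esac
}

# On the HP-UX host (2>&1 because -version output goes to stderr):
#   jvm_data_model "$(java -version 2>&1)"        # default mode
#   jvm_data_model "$(java -d64 -version 2>&1)"   # explicitly 64-bit
#   USER_MEM_ARGS=$(ensure_d64 "${USER_MEM_ARGS}")
```

ensure_d64 leaves the argument string unchanged when -d64 is already present, so it is safe to call from a script that gets sourced more than once.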
Also, the JDK installed on HP-UX is not the Oracle JDK itself but the Oracle JDK ported by HP to HP-UX, so the version numbers differ: version 6.0.10 corresponds to Oracle JDK 1.6 update 24, and the latest Oracle JDK 1.6 update 29 corresponds to 6.0.13. Oracle JDK 1.7 was just released for HP-UX in December 2011 as version 7.0.00, which corresponds to Oracle JDK 7u1.
Besides all of the above, Java tuning is also needed to get the huge piles of messages processed; after careful consideration, the solution is below.
Here are the steps:
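1- Go to the web site http://hpux.connect.org.uk/ and download and install the following libraries:
libiconv-1.14
libxml2-2.7.8
libxslt-1.1.26
zlib-1.2.5
2- Install the following patches:
PHSS_37501
PHCO_38050
PHSS_38139
PHKL_40208
PHKL_35552
3- Set the following kernel tunables:
max_thread_proc 1024
maxfiles 256
nkthread 3635
nproc 2068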
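On HP-UX 11i v3, kernel tunables are set with kctune using the name=value form. The following is a minimal sketch that only prints the commands for the tunables listed above, so they can be reviewed before being run as root; print_kctune_cmds is a hypothetical helper name. Some tunables may only take effect after a reboot.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one kctune command per tunable.
print_kctune_cmds() {
  while read -r name value; do
    # Skip blank lines; emit "kctune name=value" for everything else
    [ -n "$name" ] && printf 'kctune %s=%s\n' "$name" "$value"
  done <<'EOF'
max_thread_proc 1024
maxfiles 256
nkthread 3635
nproc 2068
EOF
}

print_kctune_cmds
```

Printing instead of executing keeps the change reviewable; pipe the output to sh as root once it looks right.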
4- Set and export JAVA_VM:
JAVA_VM=-server
export JAVA_VM
5- Make the changes in setDomainEnv.sh and make sure these memory arguments take effect: if you also set them in startManagedServer.sh or somewhere else, they need to be modified there as well.
If your system does not have NUMA enabled:
if [ "${SERVER_NAME}" = "AdminServer" ] ; then
  USER_MEM_ARGS="-d64 -Xmpas:on -Xss1024k -Xms4096m -Xmx4096m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC"
else
  USER_MEM_ARGS="-Xmpas:on -Xss2048k -Xmx8g -Xmn6g -Xincgc -XX:+ForceMmapReserved -XX:PermSize=2g -XX:MaxPermSize=2g -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=3 -XX:NewRatio=4 -XX:CMSTriggerRatio=50 -XX:+UseCompressedOops"
fi
If your system has NUMA enabled:
if [ "${SERVER_NAME}" = "AdminServer" ] ; then
  USER_MEM_ARGS="-d64 -Xmpas:on -Xss1024k -Xms4096m -Xmx4096m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC"
else
  USER_MEM_ARGS="-Xmpas:on -Xss2048k -Xmx8g -Xmn6g -Xincgc -XX:+ForceMmapReserved -XX:PermSize=2g -XX:MaxPermSize=2g -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=6 -XX:+UseCompressedOops -XX:+UseNUMA -XX:-UseLargePages"
fi
[The difference is that when NUMA is enabled, memory access is not an issue, so I leave out CMSTriggerRatio and let the Full GC start at the default 92% occupancy, as it is not going to have an impact. Also note ParallelGCThreads: I set it to 6 because this NUMA machine has 8 CPUs, so 2 CPUs stay free, whereas the non-NUMA machine had only 4 CPUs.]
Here you can add other conditions to give each managed server its own memory settings as needed.
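One way to give each managed server its own settings is a case statement instead of the if/else. This is a minimal sketch for setDomainEnv.sh: the server names osb_server1 and osb_server2 are hypothetical placeholders, the fallback for other servers is an assumption (it reuses the AdminServer sizing), and -Xincgc is the standard spelling of the incremental GC flag.

```shell
#!/bin/sh
# Hypothetical helper: print the memory arguments for the server named in $1.
# osb_server1/osb_server2 are placeholder names; use your own managed server names.
mem_args_for() {
  case "$1" in
    AdminServer)
      printf '%s\n' "-d64 -Xmpas:on -Xss1024k -Xms4096m -Xmx4096m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC"
      ;;
    osb_server1|osb_server2)
      # Message-processing servers: large-heap settings (non-NUMA variant)
      printf '%s\n' "-Xmpas:on -Xss2048k -Xmx8g -Xmn6g -Xincgc -XX:+ForceMmapReserved -XX:PermSize=2g -XX:MaxPermSize=2g -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=3 -XX:NewRatio=4 -XX:CMSTriggerRatio=50 -XX:+UseCompressedOops"
      ;;
    *)
      # Any other server: assumed to need only the AdminServer-sized heap
      printf '%s\n' "-d64 -Xmpas:on -Xss1024k -Xms4096m -Xmx4096m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC"
      ;;
  esac
}

USER_MEM_ARGS=$(mem_args_for "${SERVER_NAME}")
export USER_MEM_ARGS
```

Adding a server then means adding one more pattern to the case, rather than nesting another if/else.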
Explanation: -Xmn (new generation size) needs to be set in HP-UX environments. If it is not set, it defaults to 1/3 of -Xmx. So I set it to 6 GB to make the new generation bigger, so that it does not run out of room and GC runs less often.
Not setting the -Xms option and using -XX:+ForceMmapReserved instead is more efficient than asking the JVM to allocate the pages. This way the OS mmap call reserves the pages.
-- More details from HP document:
-XX:+ForceMmapReserved
Tells the JVM to reserve the swap space for all large memory regions used by the JVM (Java™ heap). This effectively removes the MAP_NORESERVE flag from the mmap call used to map the Java™ heaps and ensures that swap is reserved for the full memory mapped region when it is first created. When using this option, the JVM no longer needs to touch the memory pages within the committed area to reserve the swap, and as a result, no physical memory is allocated until the page is actually used by the application.
Adding ParallelGCThreads changes the default behavior. By default it equals the number of processors, so I set it lower to keep CPU available for the application while GC is running; as I recall, we only have 4 CPUs.
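The thread counts used in this setup (3 GC threads on 4 CPUs, 6 on 8 CPUs) fit a simple rule of leaving about a quarter of the CPUs free. The sketch below encodes that rule; it is a hypothetical rule of thumb reverse-fitted to those two values, not an HP or Oracle recommendation, and the CPU count is passed in by hand.

```shell
#!/bin/sh
# Hypothetical rule of thumb: reserve about a quarter of the CPUs (at least one)
# for application threads, and give the rest to the parallel GC (at least one).
gc_threads_for() {
  ncpu=$1
  reserve=$((ncpu / 4))
  [ "$reserve" -lt 1 ] && reserve=1
  threads=$((ncpu - reserve))
  [ "$threads" -lt 1 ] && threads=1
  echo "$threads"
}

# Usage, with the CPU count of the target machine filled in by hand:
#   -XX:ParallelGCThreads=$(gc_threads_for 4)   # 3, as in the non-NUMA settings
#   -XX:ParallelGCThreads=$(gc_threads_for 8)   # 6, as in the NUMA settings
```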
Adding NewRatio and setting it to 4 also changes the default behavior. NewRatio sets the size of the old generation relative to the new generation; by default the new:old ratio is 1:8, which seems too small for this setup at CIGNA, so I lowered NewRatio to make the new generation bigger. With a 1:4 ratio, GC will not run as often.
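As a worked example of the sizing above, assuming standard HotSpot semantics (NewRatio is the old-to-new size ratio) and the 8 GB -Xmx of the managed-server settings:

```latex
\begin{aligned}
\text{Young} &= \frac{\text{Xmx}}{\text{NewRatio} + 1}, \qquad \text{Old} = \text{Xmx} - \text{Young} \\
\text{NewRatio} = 8: \quad \text{Young} &= \tfrac{8192}{9} \approx 910\ \text{MB} \\
\text{NewRatio} = 4: \quad \text{Young} &= \tfrac{8192}{5} \approx 1638\ \text{MB} \\
\text{NewRatio} = 2: \quad \text{Young} &= \tfrac{8192}{3} \approx 2731\ \text{MB}
\end{aligned}
```

NewRatio = 2 corresponds to the "1/3 of -Xmx" default mentioned for -Xmn. In HotSpot an explicit -Xmn takes precedence over NewRatio, so with -Xmn6g the young generation is 6144 MB and the old generation is 8192 - 6144 = 2048 MB regardless of NewRatio.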
CMSTriggerRatio is set to 50 percent so that there is always free heap available while the Full GC is in progress. CMSTriggerRatio is the percentage of MinHeapFreeRatio that must be allocated before a CMS collection starts, so a lower value starts the collection earlier. This way up to 4 GB of space will be available when the Full GC starts, and it will only run on 2 CPUs.
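When -XX:CMSInitiatingOccupancyFraction is not set, HotSpot (JDK 6) derives the old-generation occupancy at which a CMS cycle starts from MinHeapFreeRatio (default 40) and CMSTriggerRatio (default 80). This sketch reproduces that formula in integer arithmetic, assuming HP's port keeps the HotSpot behavior; with the defaults it gives the 92% mentioned in the NUMA note, and with CMSTriggerRatio=50 it gives 80%. cms_start_pct is a hypothetical helper name.

```shell
#!/bin/sh
# start% = (100 - MinHeapFreeRatio) + MinHeapFreeRatio * CMSTriggerRatio / 100
# (HotSpot computes this as a fraction; integer math is exact for these inputs.)
cms_start_pct() {
  min_free=$1   # -XX:MinHeapFreeRatio
  trigger=$2    # -XX:CMSTriggerRatio
  echo $(( (100 - min_free) + min_free * trigger / 100 ))
}

cms_start_pct 40 80   # defaults, as in the NUMA settings: prints 92
cms_start_pct 40 50   # -XX:CMSTriggerRatio=50, non-NUMA settings: prints 80
```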
Finally, -XX:+UseCompressedOops directs the 64-bit JVM to save memory by using 32-bit (compressed) object pointers whenever possible.
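-Xincgc: use incremental GC. With the concurrent mark sweep collector, the concurrent phases then run in small increments, periodically yielding the CPU to application threads instead of running as one long cycle.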
6- Add the following to JAVA_OPTIONS in the setDomainEnv.sh file:
-Dsun.reflect.noInflation=true
Sometimes the environment can be tricky, so an alternative to setDomainEnv.sh is to add it to startWebLogic.sh.
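Because setDomainEnv.sh is sourced by several start scripts, an option appended to JAVA_OPTIONS can end up duplicated. This is a minimal sketch of an idempotent append for setDomainEnv.sh or startWebLogic.sh; add_java_option is a hypothetical helper name, not part of WebLogic.

```shell
#!/bin/sh
# Hypothetical helper: append $1 to JAVA_OPTIONS unless it is already present.
add_java_option() {
  case " ${JAVA_OPTIONS} " in
    *" $1 "*) ;;   # already there, nothing to do
    *) JAVA_OPTIONS="${JAVA_OPTIONS:+${JAVA_OPTIONS} }$1" ;;
  esac
  export JAVA_OPTIONS
}

add_java_option "-Dsun.reflect.noInflation=true"
```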
7- Increase the maximum message size across WebLogic Server to 110 MB by setting the following:
-Dsun.reflect.noInflation=true
-Dweblogic.MaxMessageSize=115343360
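The byte value above is 110 × 1024 × 1024. This sketch derives it from a size in MB rather than typing the byte count by hand; MAX_MESSAGE_MB and MAX_MESSAGE_BYTES are hypothetical variable names.

```shell
#!/bin/sh
# Compute -Dweblogic.MaxMessageSize from megabytes: 110 MB = 115343360 bytes.
MAX_MESSAGE_MB=110
MAX_MESSAGE_BYTES=$((MAX_MESSAGE_MB * 1024 * 1024))
JAVA_OPTIONS="${JAVA_OPTIONS:+${JAVA_OPTIONS} }-Dweblogic.MaxMessageSize=${MAX_MESSAGE_BYTES}"
export JAVA_OPTIONS
```

Changing MAX_MESSAGE_MB is then the only edit needed if the messages grow past this limit again, as they did with the original 30MB setting.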
8- For another OSB installation on IA64, I also ended up adding the following options:
-XX:+UseNUMA -XX:-UseLargePages
Explanation: when NUMA is enabled, large pages are enabled by default, so these options keep the benefit of NUMA while disabling large pages.
More Details on NUMA from HP Document:
Starting in JDK 6.0.06, the Parallel Scavenger garbage collector has been extended to take advantage of the machines with NUMA (Non Uniform Memory Access) architecture. Most modern computers are based on NUMA architecture, in which it takes a different amount of time to access different parts of memory. Typically, every processor in the system has a local memory that provides low access latency and high bandwidth, and remote memory that is considerably slower to access.
In the Java HotSpot Virtual Machine, the NUMA-aware allocator has been implemented to take advantage of such systems and provide automatic memory placement optimizations for Java applications. The allocator controls the eden space of the young generation of the heap, where most of the new objects are created. The allocator divides the space into regions each of which is placed in the memory of a specific node. The allocator relies on a hypothesis that a thread that allocates the object will be the most likely to use the object. To ensure the fastest access to the new object, the allocator places it in the region local to the allocating thread. The regions can be dynamically resized to reflect the allocation rate of the application threads running on different nodes. That makes it possible to increase performance even of single-threaded applications. In addition, "from" and "to" survivor spaces of the young generation, the old generation, and the permanent generation have page interleaving turned on for them. This ensures that all threads have equal access latencies to these spaces on average.
The NUMA-aware allocator can be turned on with the -XX:+UseNUMA flag in conjunction with the selection of the Parallel Scavenger garbage collector. The Parallel Scavenger garbage collector is the default for a server-class machine. The Parallel Scavenger garbage collector can also be turned on explicitly by specifying the -XX:+UseParallelGC option.
Applications that create a large amount of thread-specific data are likely to benefit most from UseNUMA. For example, the SPECjbb2005 benchmark improves by about 25% on NUMA-aware IA-64 systems. Some applications might require a larger heap, and especially a larger young generation, to see benefit from UseNUMA, because of the division of eden space as described above. Use -Xmx, -Xms, and -Xmn to increase the overall heap and young generation sizes, respectively. There are some applications that ultimately do not benefit because of their heap-usage patterns.
Specifying UseNUMA also enables UseLargePages by default. UseLargePages can have the side effect of consuming more address space, because of the stronger alignment of memory regions. This means that in environments where memory is tight but a large Java heap is specified, UseLargePages might require the heap size to be reduced, or Java will fail to start up. If this occurs when UseNUMA is specified, you can disable UseLargePages on your command line and still use UseNUMA; for example:
-XX:+UseNUMA -XX:-UseLargePages.