Friday, December 30, 2011

HPUX IA64 JVM issue with Weblogic

I thought it would be interesting to add this as a post about Oracle Fusion Middleware applications. Recently I was asked to help a large healthcare company that was having issues processing the claims messages generated by OSB. They had a separate clustered WebLogic domain dedicated to message handling, and OSB ran in its own WebLogic domain.


Issue: 1- The JVM was running garbage collection every 10 to 15 seconds, spiking the CPU and grinding the system to a halt.

       2- The thread dumps revealed that reflection accessor classes were being unloaded during Full GC - see below:

       sun.reflect.GeneratedSerializationConstructorAccessor

       3- A message size exception occurred because the maximum size limit was set to 30MB and the messages were larger than that.

       4- Full garbage collections were taking a long time, and JVM pauses were visible in the thread dumps.
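
As a rough way to see how often these generated accessor classes show up, the matching lines in a captured thread dump or console log can be counted. This is only a sketch: the helper name count_accessors and the file name server.out are made up for illustration, and the exact line format depends on your JVM's logging options.

```shell
#!/bin/sh
# Count the lines on stdin that mention a generated reflection accessor class.
# count_accessors is a hypothetical helper name, not part of any tool.
count_accessors() {
    grep -c 'sun\.reflect\.Generated[A-Za-z]*Accessor'
}

# Example usage (server.out is a placeholder file name):
#   count_accessors < server.out
```

Comparing the count before and after a tuning change gives a quick hint whether the change made any difference.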


Environment: HPUX 11i v3 on Itanium, 64-bit


Solution:


This is how it got resolved: I tuned the Java memory settings, changed how the memory pages are mapped, increased the stack size, and applied some missing libraries and patches.


The issue is that on HPUX Itanium 64-bit, Java runs in 32-bit mode by default unless you ask it to run in 64-bit mode. This can be verified by running "java -version", then "java -d64 -version", and then "java -d32 -version". If the libraries exist and the kernel patches are installed, WebLogic detects that a 64-bit JVM is available and adds the flags "-client -d64" itself, but in some cases it does not, and we need to add them manually.
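
The three version checks can be wrapped in a small script. This is a minimal sketch: the JAVA_BIN variable and the check_data_model function are names I made up, and it only tests whether the JVM starts with the given flag.

```shell
#!/bin/sh
# JAVA_BIN is a hypothetical override so the check can point at a specific JDK.
JAVA_BIN="${JAVA_BIN:-java}"

# Print whether the JVM starts with the given data-model flag (-d32 or -d64).
check_data_model() {
    if "$JAVA_BIN" "$1" -version >/dev/null 2>&1; then
        printf '%s supported\n' "$1"
    else
        printf '%s not supported\n' "$1"
    fi
}

check_data_model -d32
check_data_model -d64
```

If -d64 is reported as not supported, the 64-bit libraries or patches are probably missing and the steps further down apply.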


Also, the JDK that gets installed on HPUX is not the Oracle JDK itself but the Oracle JDK ported to HPUX by HP. Hence version 6.0.10 corresponds to Oracle JDK 1.6 update 24, and the latest Oracle JDK 1.6 update 29 corresponds to 6.0.13. JDK 1.7 for HPUX was just released in December 2011, and version 7.0.00 corresponds to Oracle JDK 7.0u1.




Besides all of the above, Java tuning is also needed to get the huge pile of messages processed. After careful consideration, this is the solution I arrived at.


Here are the Steps:


1- Go to the web site

     http://hpux.connect.org.uk/

   Use the search button to find the following libraries:

     libiconv-1.14
     libxml2-2.7.8
     libxslt-1.1.26
     zlib-1.2.5

2- Make sure that you have all of the following patches installed:

             PHSS_37501
             PHCO_38050
             PHSS_38139
             PHKL_40208
             PHKL_35552
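
To check the patch list in one go, the output of the HPUX swlist command can be piped through a small filter. This is a sketch under assumptions: missing_patches is a made-up helper name, and it simply looks for each patch ID as a whole word in whatever text it receives, so a patch that was superseded by a newer one will still be reported as missing.

```shell
#!/bin/sh
# Patch IDs taken from the list above.
REQUIRED_PATCHES="PHSS_37501 PHCO_38050 PHSS_38139 PHKL_40208 PHKL_35552"

# Read an installed-software listing on stdin and print every required
# patch ID that does not appear in it (hypothetical helper).
missing_patches() {
    installed=$(cat)
    for p in $REQUIRED_PATCHES; do
        # -w matches the patch ID as a whole word
        if ! printf '%s\n' "$installed" | grep -qw "$p"; then
            echo "$p"
        fi
    done
}

# Example usage on HPUX (assumed invocation):
#   swlist -l patch | missing_patches
```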
  
3- Make sure that the following HPUX kernel parameters have at least these values:

     max_thread_proc   1024
     maxfiles          256
     nkthread          3635
     nproc             2068
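
The minimum values can be compared against the current ones with a short filter. This is a sketch, not a finished tool: check_tunables is a made-up name, and it expects lines of the form "name value" on stdin, for example copied from the kernel tunable listing (kctune on HPUX 11i v3). Lines for other tunables and header lines are skipped.

```shell
#!/bin/sh
# Read "name value [anything else]" lines and report tunables that are
# below the minimums listed above (hypothetical helper).
check_tunables() {
    while read -r name value rest; do
        case "$name" in
            max_thread_proc) min=1024 ;;
            maxfiles)        min=256 ;;
            nkthread)        min=3635 ;;
            nproc)           min=2068 ;;
            *) continue ;;
        esac
        if [ "$value" -lt "$min" ]; then
            echo "$name=$value is below the minimum $min"
        fi
    done
}

# Example usage with hand-typed values:
#   printf 'maxfiles 128\nnproc 4096\n' | check_tunables
```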

  
4- By default the JVM runs in "client" mode. That is okay for light loads, but when the volume is large it needs to be changed to server mode. To do this, I added the following lines to startWeblogic.sh (this applies if you are running in development mode).



JAVA_VM=-server
export JAVA_VM


5- Make the changes in setDomainEnv.sh and make sure that these MEM arguments actually take effect. I mean that if you are setting memory arguments in startManagedServer.sh or somewhere else, then they need to be modified there.




If NUMA is not enabled on your system:


if [ "${SERVER_NAME}" = "AdminServer" ] ; then
        USER_MEM_ARGS="-d64 -Xmpas:on -Xss1024k -Xms4096m -Xmx4096m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC"
else
        USER_MEM_ARGS="-Xmpas:on -Xss2048k -Xmx8g -Xmn6g -Xincgc -XX:+ForceMmapReserved -XX:PermSize=2g -XX:MaxPermSize=2g -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=3 -XX:NewRatio=4 -XX:CMSTriggerRatio=50 -XX:+UseCompressedOops"
fi

If NUMA is enabled on your system:


if [ "${SERVER_NAME}" = "AdminServer" ] ; then
        USER_MEM_ARGS="-d64 -Xmpas:on -Xss1024k -Xms4096m -Xmx4096m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC"
else
        USER_MEM_ARGS="-Xmpas:on -Xss2048k -Xmx8g -Xmn6g -Xincgc -XX:+ForceMmapReserved -XX:PermSize=2g -XX:MaxPermSize=2g -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=6 -XX:+UseCompressedOops -XX:+UseNUMA -XX:-UseLargePages"
fi


[The difference is that when NUMA is enabled, memory I/O is not an issue, so I let the Full GC start at 92% because it is not going to have an impact. Also note the GC thread count: I changed it to 6 because I have an 8 CPU machine with NUMA, so I keep 2 CPUs free, whereas in the non-NUMA case I only had 4 CPUs.]


Here you can add other conditions to give each managed server its own memory settings as needed.
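
One way to add more conditions is a case statement instead of a growing if/else chain. The sketch below is illustrative only: the function name select_mem_args and the server name pattern msg_server* are made up, and the heap values in the last two branches are placeholders rather than tuned recommendations.

```shell
#!/bin/sh
# Pick the memory arguments by server name (hypothetical helper).
select_mem_args() {
    case "$1" in
        AdminServer)
            echo "-d64 -Xmpas:on -Xss1024k -Xms4096m -Xmx4096m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC" ;;
        msg_server*)
            # placeholder values for the message-processing servers
            echo "-Xmpas:on -Xss2048k -Xmx8g -Xmn6g -XX:+ForceMmapReserved -XX:+UseConcMarkSweepGC" ;;
        *)
            # placeholder default for any other managed server
            echo "-Xmpas:on -Xss1024k -Xmx2g -XX:+UseConcMarkSweepGC" ;;
    esac
}

USER_MEM_ARGS=$(select_mem_args "${SERVER_NAME:-}")
export USER_MEM_ARGS
```

Keeping the selection in one function makes it easier to see which server gets which settings when the list grows.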


Explanation: -Xmn (new generation size) needs to be set explicitly in HPUX environments. If it is not set, it defaults to 1/3 of the -Xmx value. So I set it to 6G to make the new generation bigger, which avoids running out of room and makes GC less frequent.
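
The sizes involved can be worked out with plain shell arithmetic. This is just the arithmetic for the 8G heap used above, taking the 1/3 default described in this post at face value; the variable names are my own.

```shell
#!/bin/sh
# Heap sizes in megabytes for -Xmx8g and -Xmn6g.
XMX_MB=8192
XMN_MB=6144

# New generation size if -Xmn were left unset (1/3 of -Xmx, rounded down).
DEFAULT_NEW_MB=$((XMX_MB / 3))
# What remains for the old generation once 6G goes to the new generation.
OLD_MB=$((XMX_MB - XMN_MB))

echo "default new generation: ${DEFAULT_NEW_MB} MB"
echo "configured new generation: ${XMN_MB} MB"
echo "old generation: ${OLD_MB} MB"
```

So the explicit setting more than doubles the new generation compared with the default, at the cost of a smaller old generation.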

Leaving out the -Xms option and using -XX:+ForceMmapReserved instead is more efficient than asking the JVM to allocate the pages. This way the OS reserves the pages through mmap.


-- More details from HP document:



-XX:+ForceMmapReserved 


Tells the JVM to reserve the swap space for all large memory regions used by the JVM
(Java™ heap). This effectively removes the MAP_NORESERVE flag from the mmap call
used to map the Java™ heaps and ensures that swap is reserved for the full memory
mapped region when it is first created. When using this option the JVM no longer needs
to touch the memory pages within the committed area to reserve the swap and as a
result, no physical memory is allocated until the page is actually used by the application.

Setting ParallelGCThreads changes the default behavior. By default it equals the number of processors. I want to keep CPUs available to the application while GC is going on, and as I remember correctly we only have 4 CPUs.

Setting NewRatio to 4 also changes the default behavior. NewRatio is the ratio of the new generation to the old generation; by default it is 1:8, which seems too small for this setup at CIGNA, so I increased the new generation. With a 1:4 ratio the GC will not run as often.

CMSTriggerRatio is suggested at 50 percent, so that there is always free heap available while a Full GC is in progress. This is the ratio of free to non-free heap. So up to 4 GB of space will be available when the Full GC starts, and the GC will only run on 2 CPUs.

Finally, -XX:+UseCompressedOops directs the JVM to use 32-bit pointers whenever possible and hence use less memory.


-Xincgc = use incremental GC, which means the GC runs on unused memory in the concurrent mark sweep generation.



6- Add the following to JAVA_OPTIONS in the setDomainEnv.sh file:


   -Dsun.reflect.noInflation=true


Sometimes the environment can be tricky, so an alternative to setDomainEnv.sh is to add it to startWeblogic.sh.

7- Increase the maximum message size across the WebLogic Server by setting the following (110MB):
     -Dweblogic.MaxMessageSize=115343360
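
The byte value is simply 110 x 1024 x 1024. If you need a different limit, a tiny helper can do the conversion; mb_to_bytes is a name I made up for this sketch.

```shell
#!/bin/sh
# Convert a size in megabytes to bytes (hypothetical helper).
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Prints: -Dweblogic.MaxMessageSize=115343360
echo "-Dweblogic.MaxMessageSize=$(mb_to_bytes 110)"
```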


8- In another case, with OSB on IA64, I also ended up adding the following options:



     -XX:+UseNUMA  -XX:-UseLargePages

Explanation: When NUMA is enabled, large pages are enabled by default, so we want to use the power of NUMA while disabling the large pages.


More Details on NUMA from HP Document:



Starting in JDK 6.0.06, the Parallel Scavenger garbage collector has been extended to take advantage of the machines with NUMA (Non Uniform Memory Access) architecture. Most modern computers are based on NUMA architecture, in which it takes a different amount of time to access different parts of memory. Typically, every processor in the system has a local memory that provides low access latency and high bandwidth, and remote memory that is considerably slower to access.


In the Java HotSpot Virtual Machine, the NUMA-aware allocator has been implemented to take advantage of such systems and provide automatic memory placement optimizations for Java applications. The allocator controls the eden space of the young  generation of the heap, where most of the new objects are created. The allocator divides the space into regions each of which is placed in the memory of a specific node. The allocator relies on a hypothesis that a thread that allocates the object will be the most likely to use the object. To ensure the fastest access to the new object, the allocator places it in the region local to the allocating thread. The regions can be dynamically resized to reflect the allocation rate of the application threads running on different nodes. That makes it possible to increase performance even of single-threaded applications. In addition, "from" and "to" survivor spaces of the young generation, the old generation, and the permanent generation have page interleaving turned on for them. This ensures that all threads have equal access latencies to these spaces on average.


The NUMA-aware allocator can be turned on with the -XX:+UseNUMA flag in
conjunction with the selection of the Parallel Scavenger garbage collector. The Parallel Scavenger garbage collector is the default for a server-class machine. The Parallel Scavenger garbage collector can also be turned on explicitly by specifying the -XX:+UseParallelGC option.


Applications that create a large amount of thread-specific data are likely to benefit most  from UseNUMA. For example, the SPECjbb2005 benchmark improves by about 25% on NUMA-aware IA-64 systems. Some applications might require a larger heap, and especially a larger young generation, to see benefit from UseNUMA, because of the division of eden space as described above. Use -Xmx, -Xms, and -Xmn to increase the overall heap and young generation sizes, respectively. There are some applications that ultimately do not benefit because of their heap-usage patterns.


Specifying UseNUMA also enables UseLargePages by default. UseLargePages can
have the side effect of consuming more address space, because of the stronger alignment of memory regions. This means that in environments where memory is tight but a large Java heap is specified, UseLargePages might require the heap size to be reduced, or Java will fail to start up. If this occurs when UseNUMA is specified, you can disable UseLargePages on your command line and still use UseNUMA; for example:
-XX:+UseNUMA -XX:-UseLargePages.

Please note that after applying the OS patches and libraries, the "-d64" flag may no longer be required. Check the console log to see whether this flag is being added twice; if so, just remove it from the settings above.
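
To spot a duplicated flag, the Java command line printed in the console log can be pasted into a small counter. This is a sketch with a made-up function name (count_flag); it counts exact matches of one flag among the arguments it is given.

```shell
#!/bin/sh
# Count how many times a flag appears among the remaining arguments
# (hypothetical helper).
count_flag() {
    flag="$1"
    shift
    n=0
    for arg in "$@"; do
        if [ "$arg" = "$flag" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example usage with a pasted command line:
#   count_flag -d64 java -client -d64 -Xmpas:on -d64 weblogic.Server
```

A result of 2 or more means the flag is being added both by WebLogic and by your own settings.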

 Happy troubleshooting !!