Java – Persistent GC issues with Cassandra – long app pauses

cassandragarbage-collectionjava

We are using a 3 site, 3 nodes per site Cassandra 1.1.12 cluster that has 8GB ram allocated per node. We have been seeing long GC pauses on nodes periodically and that is messing with our applications realtime requirements. The systems we run on are 8 core systems with 24GB of ram.

We've been seeing 120+ second pauses where it does a stop the world GC.

We are running with these flags on JDK 1.7.0_04

-XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 
-Xms8G 
-Xmx8G 
-Xmn1600M 
-XX:+HeapDumpOnOutOfMemoryError 
-Xss180k 
-XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 
-XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+PrintHeapAtGC 
-XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime 
-XX:+PrintPromotionFailure 
-XX:PrintFLSStatistics=1 

Here is the detailed GC log leading up to the long pause:


2014-02-23T11:50:19.231-0500: 2119627.980: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 145905916
Max   Chunk Size: 3472057
Number of Blocks: 160577
Av.  Block  Size: 908
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119627.981: [ParNew
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   32269664 bytes,   32269664 total
: 1356995K->44040K(1474560K), 4.5270760 secs] 5829345K->4546031K(8224768K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 144857324
Max   Chunk Size: 2423465
Number of Blocks: 160577
Av.  Block  Size: 902
Tree      Height: 146
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 4.5295190 secs] [Times: user=24.78 sys=0.07, real=4.52 secs] 
Heap after GC invocations=82068 (full 2561):
 par new generation   total 1474560K, used 44040K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,  26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 4.5524690 seconds
{Heap before GC invocations=82068 (full 2561):
 par new generation   total 1474560K, used 1354760K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000)
  from space 163840K,  26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
2014-02-23T11:51:14.221-0500: 2119682.970: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 144857324
Max   Chunk Size: 2423465
Number of Blocks: 160577
Av.  Block  Size: 902
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119682.971: [ParNew
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   41744088 bytes,   41744088 total
: 1354760K->52443K(1474560K), 2.1589280 secs] 5856751K->4582809K(8224768K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 143937754
Max   Chunk Size: 1505947
Number of Blocks: 160575
Av.  Block  Size: 896
Tree      Height: 146
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 2.1613420 secs] [Times: user=12.82 sys=0.04, real=2.16 secs] 
Heap after GC invocations=82069 (full 2561):
 par new generation   total 1474560K, used 52443K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,  32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000)
  to   space 163840K,   0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000)
 concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 2.1719930 seconds
{Heap before GC invocations=82069 (full 2561):
 par new generation   total 1474560K, used 1363163K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000)
  from space 163840K,  32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000)
  to   space 163840K,   0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000)
 concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
2014-02-23T11:52:33.089-0500: 2119761.837: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 143937754
Max   Chunk Size: 1505947
Number of Blocks: 160575
Av.  Block  Size: 896
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119761.839: [ParNew
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   37906760 bytes,   37906760 total
: 1363163K->48710K(1474560K), 3.5105890 secs] 5893529K->4611208K(8224768K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 142756036
Max   Chunk Size: 326281
Number of Blocks: 160573
Av.  Block  Size: 889
Tree      Height: 146
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 3.5130550 secs] [Times: user=18.81 sys=0.08, real=3.52 secs] 
Heap after GC invocations=82070 (full 2561):
 par new generation   total 1474560K, used 48710K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,  29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 3.5236060 seconds
{Heap before GC invocations=82070 (full 2561):
 par new generation   total 1474560K, used 1359430K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000)
  from space 163840K,  29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
2014-02-23T11:55:59.448-0500: 2119968.196: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 142756036
Max   Chunk Size: 326281
Number of Blocks: 160573
Av.  Block  Size: 889
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119968.198: [ParNew (0: promotion failure size = 131074)  (1: promotion failure size = 131074)  (2: promotion failure size = 131074)  (4: promotion failure size = 131074)  (7: promotion failure size = 131074)  (promotion failed)
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   34318480 bytes,   34318480 total
: 1359430K->1353373K(1474560K), 1.5971880 secs]2119969.795: [CMSCMS: Large block 0x000000073aa59270
: 4586726K->3600612K(6750208K), 149.1735470 secs] 5921928K->3600612K(8224768K), [CMS Perm : 23966K->23925K(40012K)]After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 403131826
Max   Chunk Size: 403131826
Number of Blocks: 1
Av.  Block  Size: 403131826
Tree      Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 150.7724630 secs] [Times: user=28.89 sys=11.57, real=150.75 secs] 
Heap after GC invocations=82071 (full 2562):
 par new generation   total 1474560K, used 0K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
  to   space 163840K,   0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000)
 concurrent mark-sweep generation total 6750208K, used 3600612K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23925K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 150.7833360 seconds

I also have setup a nightly job that forces GC on the instance at 2am in hopes that it would alleviate these problems — and it has done some good, but we're still having problems every few days with nodes.

— Added detail
After changing NewGen to 2GB (From 1600M), we saw a 15 second pause within first 12 hours of running.

… Class histograms dump:

15772.786: [Class Histogram (before full gc): 
 num     #instances         #bytes  class name
----------------------------------------------
   1:       9743656      526609104  [B
   2:       9176097      440452656  java.nio.HeapByteBuffer
   3:       8152787      326111480  java.math.BigInteger
   4:       8126173      321393760  [I
   5:        207997      307212904  [J
   6:       8940730      214577520  java.lang.Long
   7:       8121743      194921832  org.apache.cassandra.db.DecoratedKey
   8:       8121399      129942384  org.apache.cassandra.dht.BigIntegerToken
   9:        174374       78049856  [Ljava.lang.Object;
  10:        914261       43884528  edu.stanford.ppl.concurrent.SnapTreeMap$Node
  11:       1112269       35592608  java.util.concurrent.ConcurrentHashMap$HashEntry
  12:       1101827       35258464  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
  13:        306955       29467680  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
  14:       1216257       29190168  org.apache.cassandra.cache.KeyCacheKey
  15:       1111387       26673288  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
  16:        427695       20529360  edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder
  17:        417278       16691120  org.apache.cassandra.db.ExpiringColumn
  18:        306955        9822560  edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch
  19:        401296        9631104  org.apache.cassandra.db.ISortedColumns$DeletionInfo
  20:           305        8691760  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  21:        528460        8455360  java.util.concurrent.atomic.AtomicReference
  22:        263965        8446880  edu.stanford.ppl.concurrent.SnapTreeMap
  23:        342775        8226600  org.apache.cassandra.db.ColumnFamily
  24:        263965        6335160  org.apache.cassandra.db.AtomicSortedColumns$Holder
  25:        193133        6180256  org.apache.cassandra.db.DeletedColumn
  26:        179519        5744608  java.util.ArrayList$Itr
  27:        221515        5316360  java.util.concurrent.ConcurrentSkipListMap$Node
  28:         35607        5025400  
  29:         35607        4852600  
  30:         61748        4445856  org.apache.cassandra.io.sstable.SSTableIdentityIterator
  31:        264409        4230544  edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr
  32:         52605        4216504  [Ljava.util.HashMap$Entry;
  33:          3389        3868680  
  34:        221414        3542624  org.apache.cassandra.db.AtomicSortedColumns
  35:        107140        3428480  org.apache.cassandra.db.Column
  36:         69734        3347232  java.util.TreeMap
  37:        131777        3162648  java.util.ArrayList
  38:        111752        2682048  java.util.concurrent.ConcurrentSkipListMap$Index
  39:         51879        2490192  java.util.HashMap
  40:          3389        2438184  
  41:          2978        2284352  
  42:         55660        1781120  org.apache.cassandra.io.util.DataOutputBuffer
  43:         69664        1671936  org.apache.cassandra.db.TreeMapBackedSortedColumns
  44:         51697        1654304  org.apache.cassandra.db.ArrayBackedSortedColumns
  45:         61842        1485456  [Ljava.nio.ByteBuffer;
  46:         61748        1481952  org.apache.cassandra.utils.BytesReadTracker
  47:          3533        1462144  
  48:         55672        1336128  org.apache.cassandra.io.util.FastByteArrayOutputStream
  49:         21743        1318848  [C
  50:         33109        1268616  [Lorg.apache.cassandra.db.IColumn;
  51:         28607        1144280  java.util.HashMap$KeyIterator
  52:         17623        1127616  [Ljava.util.Hashtable$Entry;
  53:         35003        1120096  java.util.Vector
  54:         17335        1109440  com.sun.jmx.remote.util.OrderClassLoaders
  55:         17601         844848  java.util.Hashtable
  56:         18528         741120  org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
  57:         22634         724288  java.lang.String
  58:         17514         700560  java.security.ProtectionDomain
  59:         19829         634528  java.util.HashMap$Entry
  60:         25327         607848  org.apache.cassandra.utils.IntervalTree.Interval
  61:         17513         560416  java.security.CodeSource
  62:         22026         528624  java.lang.Double
  63:         21740         521760  java.util.concurrent.LinkedBlockingDeque$Node
  64:         18782         486728  [[J
  65:         18660         447840  org.apache.cassandra.utils.BloomFilter
  66:         18660         447840  org.apache.cassandra.utils.obs.OpenBitSet
  67:         18529         444696  org.apache.cassandra.db.compaction.PrecompactedRow
  68:          3717         444688  java.lang.Class
  69:         18528         444672  org.apache.cassandra.db.ColumnIndexer$RowHeader
  70:         18528         444672  org.apache.cassandra.utils.BloomCalculations$BloomSpecification
  71:         24572         393152  java.lang.Object
  72:          5753         335568  [S
  73:          7326         293040  java.util.TreeMap$Entry
  74:         17817         285072  java.util.HashSet
  75:          5583         282488  [[I
  76:         17514         280224  java.security.ProtectionDomain$Key
  77:         17514         280224  [Ljava.security.Principal;
  78:          6612         264480  org.apache.cassandra.service.WriteResponseHandler
  79:          7862         251584  org.apache.cassandra.db.RowMutation
  80:         10083         241992  org.apache.cassandra.db.EchoedRow
  81:          6110         195520  org.apache.cassandra.utils.ExpiringMap$CacheableObject
  82:           311         179136  
  83:          2927         163912  org.apache.cassandra.thrift.TBinaryProtocol
  84:          6563         157512  org.apache.cassandra.net.Message
  85:          6289         150936  org.apache.cassandra.net.Header
  86:          6110         146640  org.apache.cassandra.net.CallbackInfo
  87:          5996         143904  java.util.concurrent.LinkedBlockingQueue$Node
  88:          8004         128064  java.util.HashMap$EntrySet
  89:          7885         126160  java.util.TreeMap$Values
  90:          7790         124640  java.util.concurrent.atomic.AtomicInteger
  91:          5132         123168  org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder
  92:          6640         106240  org.apache.cassandra.utils.SimpleCondition
  93:          6626         106016  java.util.HashMap$Values
  94:           830          92960  java.net.SocksSocketImpl
  95:          1076          86080  java.lang.reflect.Method
  96:          4176          66816  java.lang.Integer
  97:          2005          64160  java.lang.ThreadLocal$ThreadLocalMap$Entry
  98:          1592          63680  java.lang.ref.SoftReference
  99:          1586          63440  com.google.common.collect.SingletonImmutableMap
 100:          1743          55776  java.util.TreeMap$ValueIterator
 101:           470          48880  java.lang.Thread
 102:          1469          47008  java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
 103:          1041          41640  java.lang.ref.Finalizer
 104:          1002          40080  java.util.LinkedHashMap$Entry
 105:          1603          38472  com.google.common.collect.SingletonImmutableSet
 106:          1586          38064  com.google.common.collect.ImmutableEntry
 107:           435          34800  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
 108:           970          31040  java.net.Inet4Address
 109:           425          30600  java.lang.reflect.Constructor
 110:           396          28512  java.lang.reflect.Field
 111:           825          26400  java.net.Socket
 112:          1050          25200  java.util.concurrent.atomic.AtomicLong
 113:           687          21984  java.util.concurrent.SynchronousQueue$TransferStack$SNode
 114:           387          21672  sun.security.provider.MD5
 115:           451          21648  java.util.concurrent.ThreadPoolExecutor$Worker
 116:           442          21216  java.net.SocketInputStream
 117:           525          21000  org.apache.cassandra.thrift.TCustomSocket
 118:           514          20560  org.apache.cassandra.db.CounterColumn
 119:           639          20448  
 120:           353          19768  org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT
 121:           762          18288  java.io.FileDescriptor
 122:           570          18240  org.apache.thrift.transport.TFramedTransport
 123:           939          18200  [Ljava.lang.Class;
 124:           546          17472  java.util.concurrent.locks.ReentrantLock$NonfairSync
 125:           532          17464  [Ljava.lang.String;
 126:           527          16864  java.util.Hashtable$Entry
 127:           298          16688  org.apache.cassandra.service.ClientState$1
 128:           298          16688  org.apache.cassandra.service.ClientState$2
 129:           405          16200  org.apache.cassandra.thrift.Column
 130:           323          15504  java.net.SocketOutputStream
 131:           298          14304  org.apache.cassandra.service.ClientState
 132:           585          14040  org.apache.thrift.transport.TMemoryInputTransport
 133:           570          13680  org.apache.thrift.TByteArrayOutputStream
 134:           284          13632  sun.nio.cs.UTF_8$Encoder
 135:           405          12960  org.apache.cassandra.thrift.ColumnOrSuperColumn
 136:           324          12960  java.io.BufferedInputStream
 137:           530          12720  org.apache.cassandra.utils.EstimatedHistogram
 138:           525          12600  org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess
 139:           388          12416  java.security.MessageDigest$Delegate
 140:           129          12384  org.apache.cassandra.io.sstable.SSTableReader
 141:           305          12200  java.util.concurrent.ConcurrentHashMap$Segment
 142:           376          12032  org.apache.cassandra.db.columniterator.SSTableSliceIterator
 143:            47          10904  [Z
 144:           334          10688  org.apache.cassandra.net.OutboundTcpConnection$Entry
 145:           330          10560  java.lang.ref.WeakReference
 146:           435          10440  java.lang.ThreadLocal$ThreadLocalMap
 147:           426          10224  java.util.BitSet
 148:           314          10048  org.apache.cassandra.net.MessageDeliveryTask
... [removed anything taking less than 10K]
Total      72302201     2937460496
, 3.1156640 secs]
15775.902: [CMSCMS: Large block 0x0000000714aace48
: 2751863K->2568755K(6340608K), 12.0184460 secs]15787.921: [Class Histogram (after full gc): 
 num     #instances         #bytes  class name
----------------------------------------------
   1:       8644126      451299632  [B
   2:       8434549      404858352  java.nio.HeapByteBuffer
   3:       7759859      310394360  java.math.BigInteger
   4:         10686      293655512  [J
   5:       7763265      248560200  [I
   6:       8630461      207131064  java.lang.Long
   7:       7759615      186230760  org.apache.cassandra.db.DecoratedKey
   8:       7759483      124151728  org.apache.cassandra.dht.BigIntegerToken
   9:          1825       64173960  [Ljava.lang.Object;
  10:       1096613       35091616  java.util.concurrent.ConcurrentHashMap$HashEntry
  11:       1092266       34952512  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
  12:       1092266       26214384  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
  13:       1092266       26214384  org.apache.cassandra.cache.KeyCacheKey
  14:        532463       25558224  edu.stanford.ppl.concurrent.SnapTreeMap$Node
  15:        221414       21255744  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
  16:        411588       16463520  org.apache.cassandra.db.ExpiringColumn
  17:        341705       16401840  edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder
  18:           305        8691760  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  19:        442919        7086704  java.util.concurrent.atomic.AtomicReference
  20:        221414        7085248  edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch
  21:        221414        7085248  edu.stanford.ppl.concurrent.SnapTreeMap
  22:        221515        5316360  java.util.concurrent.ConcurrentSkipListMap$Node
  23:        221418        5314032  org.apache.cassandra.db.ColumnFamily
  24:        221418        5314032  org.apache.cassandra.db.ISortedColumns$DeletionInfo
  25:        221414        5313936  org.apache.cassandra.db.AtomicSortedColumns$Holder
  26:         35593        5023912  
  27:         35593        4850696  
  28:          3382        3862240  
  29:        116355        3723360  org.apache.cassandra.db.DeletedColumn
  30:        221414        3542624  org.apache.cassandra.db.AtomicSortedColumns
  31:        221414        3542624  edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr
  32:        111752        2682048  java.util.concurrent.ConcurrentSkipListMap$Index
  33:          3382        2433816  
  34:          2971        2279488  
  35:          3533        1462144  
  36:         11281         891344  [C
  37:          3710         443904  java.lang.Class
  38:         12509         400288  java.lang.String
  39:         23066         369056  java.lang.Object
  40:          5742         334480  [S
  41:          5417         275792  [[I
  42:          8620         206880  java.lang.Double
  43:          8596         206304  java.util.concurrent.LinkedBlockingDeque$Node
  44:           311         179136  
  45:          1631         139152  [Ljava.util.HashMap$Entry;
  46:          4006         128192  org.apache.cassandra.db.Column
  47:          3630         116160  java.util.HashMap$Entry
  48:          1066          85280  java.lang.reflect.Method
  49:          1310          62880  java.util.HashMap
  50:           509          57008  java.net.SocksSocketImpl
  51:          1178          47120  java.lang.ref.SoftReference
  52:           228          40752  [[J
  53:           411          29592  java.lang.reflect.Constructor
  54:           703          28120  java.lang.ref.Finalizer
  55:           367          26424  java.lang.reflect.Field
  56:           248          25792  java.lang.Thread
  57:           804          25728  java.lang.ThreadLocal$ThreadLocalMap$Entry
  58:           554          22160  java.util.LinkedHashMap$Entry
  59:           431          20688  java.net.SocketInputStream
  60:           514          20560  org.apache.cassandra.db.CounterColumn
  61:           835          20040  java.util.concurrent.atomic.AtomicLong
  62:           912          17768  [Ljava.lang.Class;
  63:           546          17472  java.util.concurrent.locks.ReentrantLock$NonfairSync
  64:           524          17272  [Ljava.lang.String;
  65:           535          17120  java.net.Inet4Address
  66:          1068          17088  java.lang.Integer
  67:           213          17040  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
  68:           708          16992  java.util.ArrayList
  69:          1043          16688  java.util.concurrent.atomic.AtomicInteger
  70:           684          16416  java.io.FileDescriptor
  71:           506          16192  java.util.Hashtable$Entry
  72:           504          16128  java.net.Socket
  73:           184          15296  [Ljava.util.Hashtable$Entry;
  74:           264          14784  org.apache.cassandra.thrift.TBinaryProtocol
  75:           305          12200  java.util.concurrent.ConcurrentHashMap$Segment
  76:           506          12144  org.apache.cassandra.utils.EstimatedHistogram
  77:           201          11256  org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT
  78:           117          11232  org.apache.cassandra.io.sstable.SSTableReader
  79:           229          10992  java.util.concurrent.ThreadPoolExecutor$Worker
  80:            47          10904  [Z
  81:           320          10240  java.lang.ref.WeakReference
  82:           319          10208  java.util.Vector
  83:           181          10136  sun.security.provider.MD5
... [removed anything taking up less than 10K]
Total      65302653     2582483384
, 2.5709140 secs]
 2951753K->2568755K(8183808K), [CMS Perm : 21044K->21000K(35192K)]After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 482780727
Max   Chunk Size: 482780727
Number of Blocks: 1
Av.  Block  Size: 482780727
Tree      Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 17.7063220 secs] [Times: user=16.57 sys=0.05, real=17.70 secs] 
Heap after GC invocations=1233 (full 21):
 par new generation   total 1843200K, used 0K [0x00000005fae00000, 0x0000000677e00000, 0x0000000677e00000)
  eden space 1638400K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000065ee00000)
  from space 204800K,   0% used [0x000000065ee00000, 0x000000065ee00000, 0x000000066b600000)
  to   space 204800K,   0% used [0x000000066b600000, 0x000000066b600000, 0x0000000677e00000)
 concurrent mark-sweep generation total 6340608K, used 2568755K [0x0000000677e00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 35192K, used 21000K [0x00000007fae00000, 0x00000007fd05e000, 0x0000000800000000)
}
Total time for which application threads were stopped: 17.7081220 seconds

Best Answer

First an extra tip, consider to also add this to your GC settings. It will dump a histogram on full GCs which you can use to figure out what is occupying the heap so badly:

JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"

Also look at your system.log for messages from MemoryMeter.java. Those usually spit out during these situations and can be handy to get more clues.

From the messages in your GC Log, it seems you have promotions failures:

2119968.198: [ParNew (0: promotion failure size = 131074)  (1: promotion failure size = 131074)  (2: promotion failure size = 131074)  (4: promotion failure size = 131074)  (7: promotion failure size = 131074)  (promotion failed)
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   34318480 bytes,   34318480 total

This means that you have objects leaving inside your HEAP for too long and your heap gets fragmented causing it to not have enough room to promote objects from new generation to survivor or old generation. This is not normal for the strong hardware that you have, but it did happen to me even on a 16 core machine with 70Gb, so here are a few things to look at serially so you can identify the problem. If you had that histogram I told you about we could jump into conclusions faster but here is the list to try:

1.If you have enabled multi-threaded compactions, disable it;

2.How is your CPU doing? If high utilization, reduce concurrent compactors which defaults to 8 threads in your system to 4;

3.Cassandra 1.1 has a bug with Key Cache size calculation. It uses a hardcoded size of 48 bytes to calculate the heap space used for key cache and if your application produces keys that are longer than that, your keycache usage on the heap will be significantly higher than what you actually set it to in casssandra.yaml and it will cause a lot of heap pressure specially stuff that sits there for a long time causing promotion failures:

References: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/service/CacheService.java;hb=02672936#l102

https://issues.apache.org/jira/browse/CASSANDRA-4315

You can examine your key cache usage from nodetool info command or appropriate JMX MBean. If it is usually full, that could be a clue; unfortunately the fix is not available in 1.1. You should upgrade to 1.2 if you have this issue or to work around it without upgrade, lower the value of your key cache size in cassandra.yaml significantly;

4.Are you using row cache? Try turning that off.

5.Wide Row or what I usually call the row of death is read. If your application is producing wide rows (1 key with lots of columns), and is trying to read a large slice of that row (>100 columns) chances are depending on the size of data inside those columns, you will make your node come into halt like this with long GCs. Try consulting your developers and make sure you are reading small slices at a time from wide rows;

You can look at nodetool cfstats command to see what is the Max Row size of your Column Families. If you have a CF with max row size that is few hundred MBs or more, it could be a clue;

6.Is your application mixed read/write heavy? If you have a lot of writes it could be the memtable pressure. Try reducing the memtable size in cassandra.yaml.

7.For read/write heavy loads chances are the default tenuring threshold flag in GC setting is very low causing premature promotions. This could be very likely a symptom of your problem. Try increasing the tenuringthreshold to a large value like 32 to prevent premature promotions so that ParNew collects most garbage in young generation.

You can observer this tenuring problem and promotion failures by running jstat command and see how the survivor spaces in heap are utilized. I bet you they are not utilized much and objects go straight from eden to old generation. ;)

8.If all failed, chances are your heap is not big enough to handle your workload, so you have to adjust the size. This is less likely and you should consider it as your last resort and do it with caution.

8A: Increase your HEAP_NEWGEN. Try increasing it to 2GB out of your 8GB. Also increase the tenuring threshold like above;

8B: If the above improved the situation but did not eliminate the problem 100%, consider increasing your MAX_HEAP_SIZE 2GB at a time. Increase NEWGEN with it. Make sure it does not go beyond 1/4 of the MAX HEAP.

Continue up to 16GB but no more than that since you only have 24GB total memory.

About myself: I had all the above problems and I eliminated them one at a time. Currently I run my nodes with 24GB heap and 6GB new gen and tenuring threshold of 32. The maximum GC pauses I have is <250ms 1-2 times a day during hefty pick hours.

Hope this helps you.

Here are a few articles about GC: http://fedora.fiz-karlsruhe.de/docs/Wiki.jsp?page=Java%20Heap%20%26%20GC%20Tuning http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html http://java.dzone.com/articles/how-tame-java-gc-pauses http://grokbase.com/t/cassandra/user/113qf50x4r/parnew-promotion-failed

Related Topic