JVM GC(2) | Modern Garbage Collectors - CMS, G1, ZGC, Shenandoah GC
WarmUp
Basic operations in GC
- Mark: mark the reachable objects
- Sweep: clear the unreachable objects
- Copy-Move: Copy the reachable objects to another space
- Compaction: Compact the reachable objects in-place
Some GC docs regard the Copy-Move operation as Compaction as well.
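To make Mark and Sweep concrete, here is a minimal sketch on a toy object graph. Everything in it (ToyHeap, Node, the roots list) is a hypothetical illustration, not how HotSpot stores objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy heap: all names are hypothetical and only illustrate mark and sweep.
final class ToyHeap {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // everything allocated
    final List<Node> roots = new ArrayList<>();   // e.g. stack slots, statics

    Node alloc(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    // Mark: collect every object reachable from the roots.
    Set<Node> mark() {
        Set<Node> marked = new HashSet<>();
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (marked.add(n)) {
                worklist.addAll(n.refs);
            }
        }
        return marked;
    }

    // Sweep: drop every object that was not marked; returns how many were freed.
    int sweep(Set<Node> marked) {
        int before = objects.size();
        objects.removeIf(n -> !marked.contains(n));
        return before - objects.size();
    }
}
```

Calling heap.sweep(heap.mark()) frees exactly the objects that no root can reach. Copy-Move and Compaction would additionally move the marked objects, which is where the races discussed later come from.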
Parallel vs Concurrent
- Parallel: the GC runs on multiple threads at once while the application is stopped (stop-the-world, STW).
- Concurrent: the GC runs at the same time as the application threads, without STW.
Why STW? - GC races
The GC races with the application when:
- the application changes an object reference while the GC is marking
- the application accesses an object while the GC is moving it and updating references to it (Compact, Copy-Move)
GC barriers
ZGC and Shenandoah GC inject GC-related code (a barrier) into the generated machine code, right where an object reference is loaded or stored:
- loading: f = obj.field
- storing: obj.f = o
Garbage Collector Landscape
Parallel New + Parallel Old
The ParallelOld GC is a two-generational parallel STW garbage collector, which means that whenever a garbage collection occurs in either generation, all application threads are stopped, and the GC work is performed using multiple threads.
- new generation: copy
- old generation: compaction
Parallel New + CMS(Concurrent Mark Sweep)
CMS was developed in response to a growing number of applications that demanded a collector with lower worst-case pause.
Major changes compared with Parallel Old:
- Use sweep instead of compaction in the old generation. Sweep can run concurrently because the objects being released are unreachable, so application threads will never access them.
  - Cons: more memory fragmentation (holes).
  - How does CMS handle fragmentation? A CMS full GC sometimes triggers compaction. (Parameters that control when compaction is triggered: -XX:+UseCMSCompactAtFullCollection, -XX:CMSFullGCsBeforeCompaction)
  - So the worst pause of CMS is determined by compaction rather than by sweep.
- Introduce multi-stage marking:
  - mark live objects concurrently
  - remark the (fewer) objects whose references changed during the concurrent mark
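The concurrent mark plus remark idea can be sketched as a single-threaded simulation. This is only an illustration under simplifying assumptions: real CMS tracks changes with card marking, while the sketch uses a hypothetical dirty set of modified objects, and all class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of "concurrent mark, then remark".
final class RemarkSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    final Set<Obj> marked = new HashSet<>();
    final Deque<Obj> worklist = new ArrayDeque<>();
    final Set<Obj> dirty = new HashSet<>(); // objects whose references changed during marking

    // One slice of concurrent marking: scan a single object from the worklist.
    boolean markStep() {
        Obj o = worklist.poll();
        if (o == null) return false;
        if (marked.add(o)) worklist.addAll(o.refs);
        return true;
    }

    // Write barrier: the application records the object it modified.
    void writeRef(Obj holder, Obj target) {
        holder.refs.add(target);
        dirty.add(holder);
    }

    // Remark: rescan only the dirty objects instead of the whole heap.
    void remark() {
        for (Obj o : dirty) worklist.addAll(o.refs);
        dirty.clear();
        while (markStep()) { }
    }

    // Returns true if c was missed by concurrent marking and then found by remark.
    static boolean demo() {
        RemarkSketch gc = new RemarkSketch();
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);
        b.refs.add(c);
        gc.worklist.add(a);        // a is the only root
        gc.markStep();             // marker scans a and sees only b
        gc.writeRef(a, c);         // application: a now points to c ...
        b.refs.remove(c);          // ... and b no longer does
        while (gc.markStep()) { }  // marker finishes without reaching c
        boolean missed = !gc.marked.contains(c);
        gc.remark();
        return missed && gc.marked.contains(c);
    }
}
```

The remark step only touches what the application changed, which is why its pause is much shorter than a full STW mark.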
G1 (Garbage-first)
The young and old generations are never a single contiguous chunk of memory. Each generation is a set of regions, and most GC operations can be applied to each region individually. Regions that belong to the same set, and therefore the same generation, do not need to be contiguous in memory.
Optimization of region
- Fine-grained (region-by-region) copying replaces compaction/copy of an entire generation.
- Regions that contain only live objects skip compaction and copying.
- Changing the color (generation) of a region is very fast, e.g. survivor -> old.
- GC pauses become predictable: the maximum pause time can be controlled because the GC can stop after processing only part of the regions, and the sizes of the generations can be adjusted.
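The "garbage first within a pause budget" idea can be sketched as a greedy selection. This is a simplified illustration, not G1's actual heuristics: the Region class, the cost model (pause time proportional to live bytes) and the MS_PER_LIVE_MB constant are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: choose which regions to evacuate in one pause.
final class CollectionSetSketch {
    static final class Region {
        final int id;
        final long liveBytes;    // must be copied, so it drives the pause time
        final long garbageBytes; // what is reclaimed by evacuating the region
        Region(int id, long liveBytes, long garbageBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
            this.garbageBytes = garbageBytes;
        }
    }

    // Assumed cost model: copying takes time proportional to the live bytes.
    static final double MS_PER_LIVE_MB = 1.0;

    static double estimatedPauseMs(Region r) {
        return r.liveBytes / (1024.0 * 1024.0) * MS_PER_LIVE_MB;
    }

    // Take the regions with the most garbage first, as long as the estimate fits the budget.
    static List<Region> select(List<Region> candidates, double maxPauseMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong((Region r) -> r.garbageBytes).reversed());
        List<Region> chosen = new ArrayList<>();
        double budget = maxPauseMs;
        for (Region r : sorted) {
            if (r.garbageBytes == 0) continue; // fully live regions are not worth copying
            double cost = estimatedPauseMs(r);
            if (cost <= budget) {
                chosen.add(r);
                budget -= cost;
            }
        }
        return chosen;
    }
}
```

Regions that do not fit into the budget are simply left for a later pause, which is what makes the pause time controllable.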
Shenandoah GC (SGC) 2.0
Works on all JDK LTS versions (8, 11, 17) and on both 32-bit and 64-bit systems.
Compared with G1
- Scalable low-latency GC
- All regions belong to a single generation
- Prefers to relocate the regions with the most garbage (fewest live objects) first
- Concurrent compaction and marking
- How it reduces GC races:
  - Postponed remark: SGC postpones the remark (remap) step to the next GC marking cycle. Remark remaps the references that application threads changed during or after the latest concurrent mark.
  - Barrier: uses load/store barriers so that an application thread does a tiny piece of relocation itself when it touches an object that is being moved.
// Load barrier (LRB - Load Reference Barrier)
Object f = obj.f; // f is the reference that was just loaded
if (in_evac_phase && in_collection_set(f) && !is_forwarded(f)) {
    slow_path(); // resolve the GC race in this thread (copy the object, fix the reference)
}
Load barrier example: what if the barrier's relocation conflicts with the normal GC relocation? The GC relocation skips the object, because the barrier has already relocated it.
- The object is under evacuation (GC copy).
- The application tries to load the object and triggers the LRB; the slow_path copies the object to a new location.
- The application installs the forwarding pointer; the ongoing GC copy skips setting the forwarding pointer when it finds one already set.
- The application updates the reference automatically whenever it meets a valid forwarding pointer.
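The race between the application and the GC copy in the steps above can be sketched with a compare-and-set on the forwarding pointer. This is a minimal illustration under assumptions: HeapObject, forwardee, evacuate and load are hypothetical names, the collection-set check is left out, and a real collector keeps the forwarding pointer in the object header rather than in an AtomicReference.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: whoever installs the forwarding pointer first wins.
final class ForwardingSketch {
    static final class HeapObject {
        final int payload;
        // null means "not yet evacuated"; otherwise it points at the new copy.
        final AtomicReference<HeapObject> forwardee = new AtomicReference<>();
        HeapObject(int payload) { this.payload = payload; }
    }

    // Called by the GC thread or by an application thread's barrier slow path.
    static HeapObject evacuate(HeapObject from) {
        HeapObject existing = from.forwardee.get();
        if (existing != null) return existing;           // someone already copied it
        HeapObject copy = new HeapObject(from.payload);  // speculative copy
        // Only one thread wins the CAS; the loser drops its copy and uses the winner's.
        return from.forwardee.compareAndSet(null, copy) ? copy : from.forwardee.get();
    }

    // Load barrier: during evacuation, always hand out the new copy.
    static HeapObject load(HeapObject ref, boolean inEvacuation) {
        return inEvacuation ? evacuate(ref) : ref;
    }
}
```

Because both sides go through the same compare-and-set, it does not matter whether the application or the GC reaches the object first: every later load sees the same copy.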
ZGC (Production X64 JDK17)
(Introduced as experimental in JDK 11; production-ready on 64-bit systems since JDK 15, so JDK 17 is the first LTS release with production ZGC.)
ZGC and SGC are like twins, but ZGC manages object state differently. Because 64-bit addresses have spare bits, ZGC can use more efficient techniques.
ZGC is not an upgraded version of SGC. ZGC was actually introduced before SGC, and the two are maintained by different teams (ZGC by Oracle, SGC by Red Hat). They share a similar design, and SGC can be regarded as the more adaptable option for 32-bit systems and legacy JDKs.
Compared with SGC
Fine-grained Region ~ ZPages
- Small (2 MiB - object size up to 256 KiB)
- Medium (32 MiB - object size up to 4 MiB)
- Large (4+ MiB - object size > 4 MiB)
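As a small sketch of the size classes above, the page type for an allocation could be chosen like this. The thresholds are the ones listed in the bullets; the class, enum and method names are hypothetical.

```java
// Hypothetical sketch: map an object size to a ZPage size class.
final class ZPageSketch {
    enum PageType { SMALL, MEDIUM, LARGE }

    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;

    static PageType pageTypeFor(long objectSizeBytes) {
        if (objectSizeBytes <= 256 * KIB) return PageType.SMALL; // 2 MiB pages
        if (objectSizeBytes <= 4 * MIB) return PageType.MEDIUM;  // 32 MiB pages
        return PageType.LARGE;                                   // 4+ MiB pages
    }
}
```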
Mark objects with colored pointers:
- Uses 4 reserved bits of the 64-bit address to represent the state of the object.
- Multi-mapping: the same physical memory is mapped at several virtual addresses, one per color.
  Note: some memory indicators (e.g. RSS) may report several times the real usage, because the heap is mapped at 3 virtual addresses. It is better to track memory usage with PSS.
Load barrier with color bits
Object f = obj.field;
if ((addr_of(f) & wrong_gc_color) != 0) { // wrong color => take action
    slow_path(); // fix the reference in this thread to resolve the GC race
}
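The color check itself is plain bit masking. Below is a minimal sketch that treats a pointer as a long. The bit names follow the colors ZGC describes (Marked0, Marked1, Remapped, Finalizable), but the exact bit positions, the 42-bit address width and all method names here are assumptions made for illustration.

```java
// Hypothetical sketch: color bits stored above the address bits of a 64-bit pointer.
final class ColoredPointerSketch {
    static final int ADDRESS_BITS = 42; // assumed address width
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    static final long MARKED0     = 1L << ADDRESS_BITS;
    static final long MARKED1     = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED    = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);
    static final long COLOR_MASK  = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // Attach a color to a raw address.
    static long color(long address, long goodColor) {
        return (address & ADDRESS_MASK) | goodColor;
    }

    // Strip the color bits to get the raw address back.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // Fast-path test of the load barrier: any color bit other than the good one is "wrong".
    static boolean needsSlowPath(long pointer, long goodColor) {
        return (pointer & COLOR_MASK & ~goodColor) != 0;
    }
}
```

When the GC moves to a new phase it only has to change which color counts as good; every stale pointer then fails the mask test on its next load and gets fixed in the slow path.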
How to Choose GC
Compatibility
- Parallel: default up to and including JDK 8
- CMS: introduced in JDK 5, deprecated in JDK 9, removed in JDK 14
- G1: default since JDK 9
- ZGC: production-ready on 64-bit systems since JDK 15 (JDK 17 is the first LTS release with it)
- SGC: supports all JDK LTS versions (8, 11, 17)
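To check which collector a given JVM actually runs with, the standard java.lang.management API can list the active GC beans. A small sketch follows; the class name ActiveCollectors is made up, and the bean names it prints differ between collectors and JDK versions, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collector beans registered in the running JVM.
final class ActiveCollectors {
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors in this JVM: " + names());
    }
}
```

Running it once with the default settings and once with an explicit collector flag is a quick way to confirm that the flag is supported by the JDK build in use.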
Useful GC Parameters
Common GC parameters
- -Xms: minimum (initial) heap size
- -Xmx: maximum heap size
- -XX:ConcGCThreads: the number of threads used in concurrent phases
- -XX:ParallelGCThreads: the number of threads used in STW phases
- -XX:SoftRefLRUPolicyMSPerMB: affects how early soft references are released
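A quick way to see what the heap settings resolved to is to ask the Runtime from inside the application. A minimal sketch; the class and method names are made up, and the mapping to -Xms/-Xmx is only approximate because the JVM may round or adjust the values.

```java
// Prints the heap limits and CPU count the running JVM sees.
final class HeapSettings {
    // Upper bound the JVM will try to use for the heap; roughly what -Xmx sets.
    static long maxHeapBytes() { return Runtime.getRuntime().maxMemory(); }

    // Heap currently committed; starts near -Xms and may grow towards -Xmx.
    static long committedHeapBytes() { return Runtime.getRuntime().totalMemory(); }

    // The JVM derives its default GC thread counts from the CPU count.
    static int cpus() { return Runtime.getRuntime().availableProcessors(); }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max heap (MiB):       " + maxHeapBytes() / mib);
        System.out.println("committed heap (MiB): " + committedHeapBytes() / mib);
        System.out.println("available CPUs:       " + cpus());
    }
}
```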
Parallel New
- -XX:MaxTenuringThreshold: how many minor GC cycles an object stays in the survivor spaces before it is tenured into the old generation
- -XX:SurvivorRatio (8 by default): the ratio of Eden size to Survivor size
- -XX:MaxNewSize, -XX:NewSize: the size of the new generation
G1
- -XX:MaxGCPauseMillis: the most important parameter for G1. G1 automatically adjusts the young generation size (lower bound: -XX:G1NewSizePercent) according to the observed pause times.
ZGC, SGC:
- In most cases it is probably enough to set the common GC parameters.
GC log print & analysis
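- -XX:+PrintGC
- -XX:+PrintGCTimeStamps
- -XX:+PrintGCDetails
- -Xloggc:<file>: the GC log file
These flags apply to JDK 8; JDK 9 and later use unified logging instead, e.g. -Xlog:gc*:file=<file>.
A third-party visual tool such as GCeasy is recommended for reviewing GC logs.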
No Universal GC
- Sensitive to worst-case or P99 pause (<10ms), e.g. web services, real-time big-data systems
  - ZGC for 64-bit systems
  - SGC for legacy JDKs (8, 11, 16) or 32-bit systems
- Has strict and clear pause-time goals and needs only modest overall throughput
  - G1 ?
- Requires high throughput but does not care about the worst pause, e.g. batch tasks
  - Parallel, CMS ?
Note: I mark G1, Parallel and CMS with ? because their overhead shows up as CPU load/usage, which does not necessarily mean reduced throughput.
Benchmark: Quick glance at Cassandra
RI = read-intensive (75% read, 25% write)
- Throughput
- Pause