WarmUp

Basic operations in GC

  • Mark: mark the reachable objects
  • Sweep: clear the unreachable objects
  • Copy-Move: Copy the reachable objects to another space
  • Compaction: Compact the reachable objects in-place

    Some GC docs regard the Copy-Move operation as Compaction as well.
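
The Mark and Sweep operations above can be sketched on a toy object graph. This is only an illustration: `ToyHeap`, `Obj`, `alloc`, `mark` and `sweep` are hypothetical names, and a Java list stands in for real memory, so addresses, Copy-Move and Compaction are not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy heap: each object only knows its outgoing references and a mark bit.
class ToyHeap {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> objects = new ArrayList<>(); // every allocated object
    final List<Obj> roots = new ArrayList<>();   // stand-in for stack/static references

    Obj alloc(String name) {
        Obj o = new Obj(name);
        objects.add(o);
        return o;
    }

    // Mark: flag everything reachable from the roots.
    void mark() {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;   // already visited
            o.marked = true;
            stack.addAll(o.refs);
        }
    }

    // Sweep: drop unmarked objects, reset mark bits, return how many were freed.
    int sweep() {
        int before = objects.size();
        objects.removeIf(o -> !o.marked);
        objects.forEach(o -> o.marked = false);
        return before - objects.size();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Obj a = heap.alloc("a");
        Obj b = heap.alloc("b");
        heap.alloc("c");          // never referenced, so unreachable
        a.refs.add(b);            // a -> b
        heap.roots.add(a);
        heap.mark();
        System.out.println("freed " + heap.sweep() + " object(s)");
    }
}
```

In this run only `c` is unreachable, so one object is freed.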

Parallel vs Concurrent

  • Parallel: GC work is split across multiple threads that run in parallel while the application is stopped (stop-the-world, STW).
  • Concurrent: GC work runs concurrently with the application threads, without STW.

Why STW? - GC races

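The GC races with the application when:

  • The application changes an object reference while the GC is marking
  • The application accesses an object while the GC is moving it and updating the references to it (Compaction, Copy-Move)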

GC barriers

ZGC and Shenandoah GC inject GC-related code (barriers) into the generated machine code, around the instructions that load or store object references.

0. loading: f = obj.field  /   storing: obj.f = o
// ------ barrier -------
1. get the state of address(object)
2. if the state indicates a possible GC race, pause this thread and resolve the race (slow path)
// ------ barrier -------

Garbage Collector Landscape

Parallel New + Parallel Old

The ParallelOld GC is a two-generational parallel STW garbage collector, which means that whenever a garbage collection occurs in either generation, all application threads are stopped, and the GC work is performed using multiple threads.

  • new generation: copy
  • old generation: compaction

Parallel New + CMS(Concurrent Mark Sweep)

CMS was developed in response to a growing number of applications that demanded a collector with lower worst-case pause.

Major changes compared with Parallel Old

  • Uses sweep instead of compaction so that reclamation can run concurrently

    • This is safe because application threads can never access the unreachable objects being released.

    • Cons: more memory fragmentation (holes)

      • How does CMS handle fragmentation? A CMS full GC sometimes triggers compaction. (Parameters that control when compaction is triggered: -XX:+UseCMSCompactAtFullCollection, -XX:CMSFullGCsBeforeCompaction)
      • So the worst-case pause of CMS is determined by compaction rather than by sweep.
  • Introduces multi-stage marking

    1. Mark live objects concurrently with the application.
    2. Remark (STW) only the smaller set of objects whose references changed during the concurrent mark.
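
The two marking stages can be sketched as follows. This is a simplified model, not how CMS is implemented: `ToyCmsMark`, `writeRef` and the `dirty` set are hypothetical, with the set loosely standing in for the card-table bookkeeping a real collector uses, and root rescanning is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyCmsMark {
    static final class Obj {
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
    }

    // Objects whose references changed while concurrent marking was running.
    final Set<Obj> dirty = new HashSet<>();
    boolean concurrentMarkRunning;

    // Hypothetical write barrier: remember the mutated holder during concurrent mark.
    void writeRef(Obj holder, Obj target) {
        holder.refs.add(target);
        if (concurrentMarkRunning) dirty.add(holder);
    }

    // Mark everything reachable from the given start objects.
    static void trace(Collection<Obj> start) {
        Deque<Obj> stack = new ArrayDeque<>(start);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            stack.addAll(o.refs);
        }
    }

    // Remark (done with the application stopped): rescan only the dirty objects.
    void remark() {
        for (Obj d : dirty) {
            if (d.marked) trace(d.refs); // live holder: its new referents are live too
        }
        dirty.clear();
    }

    public static void main(String[] args) {
        ToyCmsMark gc = new ToyCmsMark();
        Obj root = new Obj();
        Obj late = new Obj();
        gc.concurrentMarkRunning = true;
        trace(Collections.singletonList(root)); // stage 1: concurrent mark
        gc.writeRef(root, late);                // application stores a reference afterwards
        gc.concurrentMarkRunning = false;
        System.out.println("marked before remark: " + late.marked);
        gc.remark();                            // stage 2: remark
        System.out.println("marked after remark: " + late.marked);
    }
}
```

The remark stage touches one dirty object instead of re-walking the whole graph, which is why its pause is short.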

G1 (Garbage-first)

The young and old generations are no longer each a single contiguous chunk of memory. Each generation is a set of regions, and most GC operations can be applied to each region individually. Regions that belong to the same set, and therefore to the same generation, do not need to be contiguous in memory.

Optimization of region

  1. Use fine-grained (region-by-region) copying instead of compacting or copying an entire generation.
  2. Regions that contain only live objects skip compaction and copying.
    • Changing the color (generation) of a region is very fast, e.g. survivor->old
  3. Make GC pauses predictable: the maximum pause time can be bounded because a collection can stop after processing only part of the regions, and because the size of each generation can be adjusted.
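
A minimal sketch of the "garbage first, within a pause budget" idea behind points 1 to 3. The cost model (pause time proportional to live bytes) and every name here (`ToyG1`, `Region`, `chooseCollectionSet`, `msPerLiveByte`) are assumptions made for illustration; real G1 relies on measured statistics and also always collects the young regions.

```java
import java.util.ArrayList;
import java.util.List;

class ToyG1 {
    static final class Region {
        final int id;
        final int liveBytes;   // bytes that would have to be copied out
        final int totalBytes;  // region size
        Region(int id, int liveBytes, int totalBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
            this.totalBytes = totalBytes;
        }
        int garbage() { return totalBytes - liveBytes; }
    }

    // Pick the regions with the most garbage first, stopping when the
    // estimated copy cost would exceed the pause budget.
    static List<Region> chooseCollectionSet(List<Region> candidates,
                                            double pauseBudgetMs,
                                            double msPerLiveByte) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((x, y) -> Integer.compare(y.garbage(), x.garbage()));
        List<Region> collectionSet = new ArrayList<>();
        double spentMs = 0;
        for (Region r : sorted) {
            double costMs = r.liveBytes * msPerLiveByte; // assumed cost model
            if (spentMs + costMs > pauseBudgetMs) break; // budget exhausted
            collectionSet.add(r);
            spentMs += costMs;
        }
        return collectionSet;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(1, 100, 1000)); // mostly garbage, cheap to evacuate
        regions.add(new Region(2, 900, 1000)); // mostly live, expensive to evacuate
        regions.add(new Region(3, 300, 1000));
        for (Region r : chooseCollectionSet(regions, 5.0, 0.01)) {
            System.out.println("collect region " + r.id);
        }
    }
}
```

With a 5 ms budget the mostly-live region 2 is left alone, which mirrors why nearly full regions are skipped rather than copied.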

Shenandoah GC (SGC) 2.0

Works with all JDK LTS versions (8, 11, 17) and on both 32-bit and 64-bit systems.

Compared with G1

  • Scalable Low Latency GC
  • All regions belong to a single generation
    • It prefers to relocate the regions with the most garbage (fewest live objects) first.
  • Concurrent compaction and marking
  • How to reduce GC races
    • Postponed remark: SGC postpones the remark (remap) to the next GC mark cycle. Remark is the step that fixes up the references changed by application threads during or after the latest concurrent mark.
    • Barrier: uses load/store barriers so that the application thread that hits a race performs a tiny relocation itself before it continues.
      // Load barrier (LRB - Load Reference Barrier)
      Object f = obj.f;
      if (in_evac_phase && in_collection_set(f) && !is_forwarded(f))
          f = slow_path(f); // the loading thread pauses to copy the object and resolve the race

Load barrier example: what if the GC barrier conflicts with the normal GC relocation of the same object? The GC relocation skips the object, since the barrier has already relocated it.

  1. The object is being evacuated (GC copy).
  2. The application tries to load the object and triggers the LRB; the slow path copies the object to a new location.
  3. The application thread sets the forwarding pointer. The ongoing GC copy finds the forwarding pointer already set and skips setting it again.
  4. The application updates the reference automatically whenever it meets a valid forwarding pointer.
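
The four steps above amount to a race that is settled by whoever installs the forwarding pointer first. A minimal sketch, assuming the pointer is installed with a compare-and-set; `ToyForwarding`, `evacuate` and `loadReferenceBarrier` are illustrative names, and the phase and collection-set checks are reduced to boolean parameters.

```java
import java.util.concurrent.atomic.AtomicReference;

class ToyForwarding {
    static final class Obj {
        final int payload;
        // Forwarding pointer; null means "not evacuated yet".
        final AtomicReference<Obj> forwardee = new AtomicReference<>(null);
        Obj(int payload) { this.payload = payload; }
    }

    // Copy the object, then try to publish the copy. Both the GC thread and
    // the barrier slow path call this; the loser of the CAS drops its copy
    // and uses the winner's copy instead.
    static Obj evacuate(Obj from) {
        Obj copy = new Obj(from.payload);
        if (from.forwardee.compareAndSet(null, copy)) {
            return copy;
        }
        return from.forwardee.get();
    }

    // Simplified load reference barrier applied to a freshly loaded reference.
    static Obj loadReferenceBarrier(Obj ref, boolean inEvacPhase, boolean inCollectionSet) {
        if (ref == null || !inEvacPhase || !inCollectionSet) {
            return ref;                            // fast path: nothing to do
        }
        Obj fwd = ref.forwardee.get();
        return fwd != null ? fwd : evacuate(ref);  // slow path copies the object
    }

    public static void main(String[] args) {
        Obj old = new Obj(42);
        Obj seenByApp = loadReferenceBarrier(old, true, true); // application wins the race
        Obj seenByGc = evacuate(old);                          // GC copy finds the pointer set
        System.out.println("same copy: " + (seenByApp == seenByGc));
    }
}
```

Because both sides end up with the same copy, the GC can skip the object without any extra coordination.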

ZGC (Production X64 JDK17)

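(Introduced as an experimental feature in JDK 11; production-ready on 64-bit platforms since JDK 15, so JDK 17 is the first LTS release with production ZGC)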

ZGC and SGC are like twins, but ZGC manages object state differently. Since 64-bit addresses have unused bits, ZGC can use more efficient techniques.

ZGC is not an upgraded version of SGC. ZGC was actually introduced before SGC, and the two are maintained by different teams (ZGC by Oracle, SGC by Red Hat). ZGC and SGC share a similar design, and we could regard SGC as an adaptable version for 32-bit systems and legacy JDKs.

Compared with SGC

  • Fine-grained Region ~ ZPages

    • Small (2 MiB - object size up to 256 KiB)
    • Medium (32 MiB - object size up to 4 MiB)
    • Large (4+ MiB - object size > 4 MiB)
  • Mark objects with colored pointers:
    Uses 4 otherwise unused bits of the 64-bit address to represent the state of the object.

    • MultiMapping

      Note: because the same physical memory is mapped at 3 virtual addresses, some memory metrics (such as RSS) may report several times the real usage. PSS is a better metric for tracking memory usage.

  • Load barrier with color bits

    Object f = obj.field;
    if (addr_of(f) & wrong_gc_color) { // wrong color => take action
        slow_path(); // the loading thread pauses to resolve the GC race and fix the reference
    }
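
The color check can be sketched with plain bit masks on a `long`. The layout below (42 address bits followed by four metadata bits named Marked0, Marked1, Remapped and Finalizable) is an assumption for illustration; the text above only says that 4 bits are used. `ToyColoredPointer` and its methods are hypothetical.

```java
class ToyColoredPointer {
    // Assumed layout: low 42 bits hold the address, the next 4 bits hold the color.
    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0 = 1L << ADDRESS_BITS;
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // Combine an address with a color.
    static long color(long address, long goodColor) {
        return (address & ADDRESS_MASK) | goodColor;
    }

    // Strip the color bits to recover the address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // The barrier check: any color bit other than the currently good one is "wrong".
    static boolean needsSlowPath(long pointer, long goodColor) {
        long badMask = COLOR_MASK & ~goodColor;
        return (pointer & badMask) != 0;
    }

    // What a slow path would finish with: re-color the pointer so that the
    // next load of the same field takes the fast path.
    static long heal(long pointer, long goodColor) {
        return color(address(pointer), goodColor);
    }

    public static void main(String[] args) {
        long p = color(0x1000L, MARKED0);
        System.out.println("slow path while MARKED0 is good: " + needsSlowPath(p, MARKED0));
        System.out.println("slow path once REMAPPED is good: " + needsSlowPath(p, REMAPPED));
        long healed = heal(p, REMAPPED);
        System.out.println("slow path after healing: " + needsSlowPath(healed, REMAPPED));
    }
}
```

Changing which color counts as good flips every existing pointer to "wrong" at once, without touching the objects themselves.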

How to Choose GC

Compatibility

  • Parallel: default in JDK 8 and earlier
  • CMS: introduced in JDK 5, removed in JDK 14
  • G1: default since JDK 9
  • ZGC: production-ready since JDK 15 (64-bit only); JDK 17 is the first LTS release with production ZGC
  • SGC: supports all JDK LTS versions (8, 11, 17)

Useful GC Parameters

  • Common GC parameters

    • -Xms: initial (minimum) heap size
    • -Xmx: maximum heap size
    • -XX:ConcGCThreads: the number of GC threads used in concurrent phases
    • -XX:ParallelGCThreads: the number of GC threads used in STW phases
    • -XX:SoftRefLRUPolicyMSPerMB: how long a softly reachable object is kept, in milliseconds per MB of free heap; lower values release soft references earlier.
  • Parallel New

    • -XX:MaxTenuringThreshold: specifies for how many minor GC cycles an object will stay in the survivor spaces until it finally gets tenured into the old space.
    • -XX:SurvivorRatio (8 by default): the ratio of the Eden size to the size of one survivor space.
    • -XX:MaxNewSize, -XX:NewSize: the maximum and initial size of the new generation
  • G1

    • -XX:MaxGCPauseMillis: the most important parameter for G1. G1 adjusts the young generation size automatically (with -XX:G1NewSizePercent as its lower bound) according to the observed pause times.
  • ZGC, SGC:

    • Aha, maybe it’s enough to set common GC parameters in most cases.
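  • GC log print & analysis (JDK 8 flag names; JDK 9 and later replace them with unified logging, -Xlog:gc*)

    • -XX:+PrintGC
    • -XX:+PrintGCTimeStamps
    • -XX:+PrintGCDetails
    • -Xloggc:<file>: write the GC log to the given file

Consider using a third-party GC visualization tool, such as GCeasy, to review the GC log.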
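
Besides the log flags, a quick way to confirm which collector a JVM is actually running after changing these parameters is the standard `java.lang.management` API. The sketch below only uses `ManagementFactory.getGarbageCollectorMXBeans()` and the `GarbageCollectorMXBean` getters; the class name `GcStats` and the output format are made up for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStats {
    // One line per collector registered in the running JVM.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()   // collections so far (-1 if undefined)
                    + ", timeMs=" + gc.getCollectionTime()); // accumulated time in ms (-1 if undefined)
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

The bean names differ per collector and JDK version, so the output is a sanity check rather than a replacement for the GC log.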

No Universal GC

  • Sensitive to worst-case or P99 pause (<10 ms), e.g. web services, real-time big-data systems
    • ZGC for 64-bit systems
    • SGC for legacy JDKs (8, 11, 16) or 32-bit systems
  • Has strict, clearly defined pause-time goals and needs only modest overall throughput
    • G1?
  • Requires high throughput but does not care about the worst-case pause, e.g. batch tasks
    • Parallel, CMS?

Note: I mark G1, Parallel and CMS with ?, because GC overhead shows up as CPU load/usage, which does not necessarily mean reduced throughput.

Benchmark: Quick glance at Cassandra
RI = read-intensive (75% read, 25% write)

  • Throughput
  • Pause

Reference