Challenges for supporting copying GC include
Handling of roots:
Upstream changes needed for copying GC
- mmtk-core
- Ruby
- Make more PPP types movable by implementing updating functions and removing
rb_gc_mark.
Correctness goals
Performance goals
Un-update-able references
One challenge of supporting copying GC in Ruby is that some object references cannot be updated. Specifically,
- Due to conservative stack scanning, local variables (in C functions, not in Ruby functions) cannot be updated.
- Some fields in global data structures are marked with
rb_gc_mark during gc_mark_roots. Those fields cannot be updated.
- Some objects have fields that cannot be updated.
Because object references held in those places cannot be updated, the objects pointed by those reference must be pinned. In other words, if an object has type T_DATA, T_IMEMO, T_HASH or has the EXIVAR flag, the object itself can move, but it pins its children.
Recording "potential pinning parents"
The Ruby binding shall maintain a runtime list of "potential pinning parents" (PPP for short). That includes all types of object in (3) in the previous section. Specifically,
- When
T_DATA, and those listed T_MEMO are instantiated, we add them to the PPP list.
- When a
T_HASH becomes compare_by_identity or when an object gets the EXIVAR flag, we add it to the PPP list.
Note that some PPPs don't always pin their children. Some T_DATA can actually move their children because it is modern, and the developers used rb_gc_mark_movable and provided the dcompact function. For other PPPs, their pinning fields may just be nil at the moment of the GC. But we have to add them to the PPP list conservatively because we don't know if a T_DATA is modern enough or if any field is nil.
We visit all PPPs before GC and pin their children (via pinning fields only), so when GC starts, those children won't move. Note that
- those children are not kept alive, and
- the PPPs' children's children are not recursively pinned.
After GC, re-visit the PPP list, and remove all dead objects from it. Unpin live objects in the PPP list.
More language-neutral discussions on this topic are here: mmtk/mmtk-core#690
Ways to reduce the number of potential pinning parents
- Introduce declarative marking.
T_DATA objects that support declarative marking are not considered PPPs.
- Whitelist
T_DATA types in Ruby core/stdlib that are known not to pin children
- Fix
T_DATA types in Ruby core/stdlib, replace their rb_gc_mark with rb_gc_mark_movable, and introduce compaction functions for them.
Address-aware data structures
Global address-to-ID table
In Ruby, objects may optionally have an ID. Once the ID of an object is seen, it will never change as long as it is alive. In vanilla Ruby, it maintains two tables:
typedef struct rb_objspace {
// ...
st_table *id_to_obj_tbl;
st_table *obj_to_id_tbl;
// ...
} rb_objspace_t;
Those table are maintained when objects are moved (in gc_move) and entries are removed when objects die (in obj_free).
In MMTk, we should consider them weak maps, and use our existing weak reference processing framework to handle them. Effectively, those table entries are things that gets garbage-collected when their owners (the object) die. We can treat both id_to_obj_tbl and obj_to_id_tbl as one single bi-directional weak map.
Interestingly, this is exactly what WeakReferences are intended for. In Java documentation:
Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed. Weak references are most often used to implement canonicalizing mappings.
Global address-to-gen_ivtbl map
If an object is not T_OBJECT, its instance variables (@foo, @bar, ... in Ruby language) are stored in an external generic instance variable table (gen_ivtbl), and is associated to the object via a global hash map (generic_iv_tbl_), with the object address as key, and the gen_ivtbl as value.
This global map (generic_iv_tbl_ in variable.c) needs to be updated whenever an object is moved. In vanilla Ruby, this is done in gc_move, by calling rb_mv_generic_ivar
Like the address-to-ID table, this is also a "canonical map", and should be treated as a weak map with weak keys.
Challenges for supporting copying GC include
finalizer_trableHandling of roots:
mmtk::memory_manager::pin_object)Make some roots movable.Upstream changes needed for copying GC
Scan roots usingtrace_objectdirectly. Exposetrace_objectto the VM binding mmtk-core#710Needed to support movable rootsNo. We probably don't need it.Adding support for red/black roots mmtk-core#706This PR enables transitively pinning (TP) objects from particular roots for Immix/StickyImmix mmtk-core#897rb_gc_mark.Correctness goals
Performance goals
Un-update-able references
One challenge of supporting copying GC in Ruby is that some object references cannot be updated. Specifically,
rb_gc_markduringgc_mark_roots. Those fields cannot be updated.Because object references held in those places cannot be updated, the objects pointed by those reference must be pinned. In other words, if an object has type
T_DATA,T_IMEMO,T_HASHor has the, the object itself can move, but it pins its children.EXIVARflagRecording "potential pinning parents"
The Ruby binding shall maintain a runtime list of "potential pinning parents" (PPP for short). That includes all types of object in (3) in the previous section. Specifically,
T_DATA, and those listedT_MEMOare instantiated, we add them to the PPP list.T_HASHbecomescompare_by_identityor when an object gets theEXIVARflag, we add it to the PPP list.Note that some PPPs don't always pin their children. Some
T_DATAcan actually move their children because it is modern, and the developers usedrb_gc_mark_movableand provided thedcompactfunction. For other PPPs, their pinning fields may just benilat the moment of the GC. But we have to add them to the PPP list conservatively because we don't know if aT_DATAis modern enough or if any field isnil.We visit all PPPs before GC and pin their children (via pinning fields only), so when GC starts, those children won't move. Note that
After GC, re-visit the PPP list, and remove all dead objects from it. Unpin live objects in the PPP list.
More language-neutral discussions on this topic are here: mmtk/mmtk-core#690
Ways to reduce the number of potential pinning parents
T_DATAobjects that support declarative marking are not considered PPPs.T_DATAtypes in Ruby core/stdlib that are known not to pin childrenT_DATAtypes in Ruby core/stdlib, replace theirrb_gc_markwithrb_gc_mark_movable, and introduce compaction functions for them.Address-aware data structures
Global address-to-ID table
In Ruby, objects may optionally have an ID. Once the ID of an object is seen, it will never change as long as it is alive. In vanilla Ruby, it maintains two tables:
Those table are maintained when objects are moved (in
gc_move) and entries are removed when objects die (inobj_free).In MMTk, we should consider them weak maps, and use our existing weak reference processing framework to handle them. Effectively, those table entries are things that gets garbage-collected when their owners (the object) die. We can treat both
id_to_obj_tblandobj_to_id_tblas one single bi-directional weak map.Interestingly, this is exactly what WeakReferences are intended for. In Java documentation:
Global address-to-gen_ivtbl map
If an object is not
T_OBJECT, its instance variables (@foo,@bar, ... in Ruby language) are stored in an external generic instance variable table (gen_ivtbl), and is associated to the object via a global hash map (generic_iv_tbl_), with the object address as key, and the gen_ivtbl as value.This global map (
generic_iv_tbl_invariable.c) needs to be updated whenever an object is moved. In vanilla Ruby, this is done ingc_move, by callingrb_mv_generic_ivarLike the address-to-ID table, this is also a "canonical map", and should be treated as a weak map with weak keys.