Heap Debugging
This guide explains how to navigate Ruby's heap to find objects, diagnose memory issues, and understand object relationships.
Why Heap Debugging Matters
Ruby's garbage collector manages thousands to millions of objects across heap pages. When debugging memory leaks, performance issues, or trying to find specific objects, you need to navigate this heap efficiently. Standard GDB cannot understand Ruby's object layout or page structure, making manual heap inspection tedious and error-prone.
Use heap debugging when you need:
- Find leaked objects: Locate objects that should have been garbage collected
- Discover fiber instances: Find all fibers in a crashed or hung application
- Track object relationships: See what objects reference others
- Diagnose memory bloat: Understand what types of objects consume memory
Heap Structure Overview
Ruby's heap is organized as:
- Heap Pages: Fixed-size pages allocated from the system
- Slots: Each page contains fixed-size slots for objects
- Objects: Ruby objects (RBasic, RHash, RArray, etc.) stored in slots
- Object Space: The global heap manager (
objspace)
Scanning the Heap
Finding Objects by Type
Scan the heap for specific types of Ruby objects:
(gdb) rb-heap-scan --type RUBY_T_STRING --limit 10
Scanning heap for type 0x05, limit=10...
Found 10 object(s):
[0] $heap0 = <T_STRING@...> "..."
[1] $heap1 = <T_STRING@...> "..."
...
[9] $heap9 = <T_STRING@...> "..."
Objects saved in $heap0 through $heap9
Next scan address saved to $heap: 0x...
Run 'rb-heap-scan --from $heap ...' for next page
Finding Fibers
Find all fiber objects:
(gdb) rb-fiber-scan-heap
Scanning heap for Fiber objects...
Checked 45000 objects, found 12 fiber(s)...
Found 12 fiber(s):
Fiber #0: <T_DATA@...> → <struct rb_fiber_struct@...>
Status: SUSPENDED
Stack: <void *@...>
VM Stack: <VALUE *@...>
CFP: <rb_control_frame_t@...>
...
This uses a specialized scanner optimized for fibers.
How Heap Scanning Works
The scanner:
- Accesses
ruby_current_vm_ptr->gc->objspace - Iterates through
heap_pages->sorted->data[] - For each page, checks
total_slotsobjects - Identifies objects by their
flagsfield (bits 0-4 = type) - Filters for specific types if
--typeis specified
Common Types
Ruby type constants and their numeric values:
RUBY_T_STRING = 0x05 # Strings
RUBY_T_ARRAY = 0x07 # Arrays
RUBY_T_HASH = 0x08 # Hashes
RUBY_T_DATA = 0x0c # Data objects (like Fibers)
RUBY_T_OBJECT = 0x01 # Generic objects
Examples:
(gdb) rb-heap-scan --type RUBY_T_STRING --limit 20
(gdb) rb-heap-scan --type 0x08 --limit 10 # Same as RUBY_T_HASH
(gdb) rb-heap-scan --limit 100 # All types
Pagination
Continue scanning from where you left off:
(gdb) rb-heap-scan --type RUBY_T_STRING --limit 10
Scanning heap for type 0x05, limit=10...
Found 10 object(s):
[0] $heap0 = <T_STRING@...> "..."
...
Objects saved in $heap0 through $heap9
Next scan address saved to $heap: 0x...
(gdb) rb-heap-scan --type RUBY_T_STRING --limit 10 --from $heap
Scanning heap for type 0x05, limit=10...
Starting from saved position: 0x...
Found 10 object(s):
[0] $heap0 = <T_STRING@...> "..."
...
Understanding Object Layout
Object Headers (RBasic)
Every Ruby object starts with:
struct RBasic {
VALUE flags; # Type and flag information
VALUE klass; # Object's class
}
The flags field encodes:
- Object type (bits 0-4): T_STRING, T_HASH, T_ARRAY, etc.
- GC flags (bit 5-11): Mark bits, frozen status, etc.
- Type-specific flags (bits 12+): Embedded vs heap, size info, etc.
Type Detection
Check an object's type:
(gdb) set $obj = (VALUE)0x7f8a1c888888
(gdb) set $basic = (struct RBasic *)$obj
(gdb) p/x $basic->flags
$1 = 0x20040005
(gdb) p/x $basic->flags & 0x1f
$2 = 0x5 # T_STRING (0x05)
Or use Ruby commands:
obj_type = flags & 0x1f
type_names = {
0x05: "T_STRING",
0x07: "T_ARRAY",
0x08: "T_HASH",
0x0c: "T_DATA",
...
}
Practical Examples
Finding All Objects of a Type
Scan for all string objects (requires custom script):
python
for obj, flags in iterate_heap():
if (flags & 0x1f) == 0x05: # T_STRING
print(f"String at {obj}")
end
Locating Large Arrays
Find arrays consuming significant memory:
python
for obj, flags in iterate_heap():
if (flags & 0x1f) == 0x07: # T_ARRAY
rarray = obj.cast(gdb.lookup_type("struct RArray").pointer())
if (flags & (1 << 12)) == 0: # Heap array (not embedded)
length = int(rarray["as"]["heap"]["len"])
if length > 1000:
print(f"Large array at {obj}: {length} elements")
end
Finding Objects by Content
Search for strings containing specific text:
python
import gdb
def find_strings_containing(search_text):
"""Find all Ruby strings containing specific text"""
matches = []
for obj, flags in iterate_heap():
if (flags & 0x1f) == 0x05: # T_STRING
try:
rstring = obj.cast(gdb.lookup_type("struct RString").pointer())
length = int(rstring["len"])
# Read string content (simplified)
if length < 1000: # Reasonable size
# ... (read string bytes) ...
if search_text in string_content:
matches.append(obj)
except:
pass
return matches
end
Memory Analysis
Heap Page Statistics
Understand heap organization:
(gdb) p ruby_current_vm_ptr->gc->objspace->heap_pages->allocated_pages
$1 = 1250 # Total pages allocated
(gdb) p ruby_current_vm_ptr->gc->objspace->heap_pages->sorted->meta->length
$2 = 1250 # Accessible pages
# Average objects per page:
(gdb) p 67890 / 1250
$3 = 54 # ~54 objects per page
Object Slot Details
Examine a specific heap page:
(gdb) set $page = ruby_current_vm_ptr->gc->objspace->heap_pages->sorted->data[0]
(gdb) p $page->start # First object address
$4 = 0x7f8a1c000000
(gdb) p $page->total_slots # Objects in this page
$5 = 64
(gdb) p $page->slot_size # Bytes per slot
$6 = 40
Memory Consumption
Calculate memory usage by type:
python
type_counts = {}
type_memory = {}
for obj, flags in iterate_heap():
obj_type = flags & 0x1f
type_counts[obj_type] = type_counts.get(obj_type, 0) + 1
# Each object consumes at minimum one slot
type_memory[obj_type] = type_memory.get(obj_type, 0) + 40 # slot_size
for obj_type in sorted(type_counts.keys()):
count = type_counts[obj_type]
memory_kb = type_memory[obj_type] / 1024
print(f"Type 0x{obj_type:02x}: {count:6} objects, {memory_kb:8.1f} KB")
end
Advanced Techniques
Custom Heap Iterators
The fiber scanning code provides a reusable heap iterator:
python
# In your custom GDB script:
from fiber import RubyFiberDebug
debug = RubyFiberDebug()
if debug.initialize():
for obj, flags in debug.iterate_heap():
# Your custom logic here
pass
end
Finding Objects by Reference
Locate what holds a reference to an object:
python
target_address = 0x7f8a1c888888
for obj, flags in iterate_heap():
obj_type = flags & 0x1f
# Check if this is a hash
if obj_type == 0x08:
rhash = obj.cast(gdb.lookup_type("struct RHash").pointer())
# ... iterate hash entries ...
# ... check if any value == target_address ...
# Check if this is an array
elif obj_type == 0x07:
# ... iterate array elements ...
end
This helps track down unexpected object retention.
Heap Fragmentation Analysis
Check how objects are distributed across pages:
python
page_utilization = []
objspace = gdb.parse_and_eval("ruby_current_vm_ptr->gc->objspace")
allocated_pages = int(objspace["heap_pages"]["allocated_pages"])
for i in range(allocated_pages):
page = objspace["heap_pages"]["sorted"]["data"][i]
total_slots = int(page["total_slots"])
# Count non-free objects
used_slots = 0
# ... iterate and count ...
utilization = (used_slots / total_slots) * 100
page_utilization.append(utilization)
average = sum(page_utilization) / len(page_utilization)
print(f"Average page utilization: {average:.1f}%")
end
Low utilization indicates fragmentation.