Async::Container::SupervisorGuidesGetting Started

Getting Started

This guide explains how to get started with async-container-supervisor to supervise and monitor worker processes in your Ruby applications.

Installation

Add the gem to your project:

$ bundle add async-container-supervisor

Core Concepts

async-container-supervisor provides a robust process supervision system built on top of Async::Service::Generic. The key components are:

Process Architecture

The supervisor operates as a multi-process architecture with three layers:

graph TD Controller[Async::Container::Controller<br/>Root Process] Controller -->|spawns & manages| Supervisor[Supervisor Process<br/>async-container-supervisor] Controller -->|spawns & manages| Worker1[Worker Process 1] Controller -->|spawns & manages| Worker2[Worker Process 2] Controller -->|spawns & manages| WorkerN[Worker Process N...] Worker1 -.->|connects via IPC| Supervisor Worker2 -.->|connects via IPC| Supervisor WorkerN -.->|connects via IPC| Supervisor style Controller fill:#e1f5ff style Supervisor fill:#fff4e1 style Worker1 fill:#e8f5e9 style Worker2 fill:#e8f5e9 style WorkerN fill:#e8f5e9

Important: The supervisor process is itself just another process managed by the root controller. If the supervisor crashes, the controller will restart it, and all worker processes will automatically reconnect to the new supervisor. This design ensures high availability and fault tolerance.

Usage

To use the supervisor, you need to define two services: one for the supervisor itself and one for your workers that will be supervised.

Basic Example

Create a service configuration file (e.g., service.rb):

#!/usr/bin/env async-service
# frozen_string_literal: true

require "async/container/supervisor"

class MyWorkerService < Async::Service::Generic
	def setup(container)
		super
		
		container.run(name: self.class.name, count: 4, restart: true) do |instance|
			Async do
				# Connect to the supervisor if available:
				if @environment.implements?(Async::Container::Supervisor::Supervised)
					@evaluator.make_supervised_worker(instance).run
				end
				
				# Mark the worker as ready:
				instance.ready!
				
				# Your worker logic here:
				loop do
					# Do work...
					sleep 1
					
					# Periodically update readiness:
					instance.ready!
				end
			end
		end
	end
end

# Define the worker service:
service "worker" do
	service_class MyWorkerService
	
	# Enable supervision for this service:
	include Async::Container::Supervisor::Supervised
end

# Define the supervisor service:
service "supervisor" do
	include Async::Container::Supervisor::Environment
end

Running the Service

Make the service executable and run it:

$ chmod +x service.rb
$ ./service.rb

This will start:

  • A supervisor process listening on a Unix socket
  • Four worker processes that connect to the supervisor

Adding Health Monitors

You can add monitors to detect and respond to unhealthy conditions. For example, to add a memory monitor:

service "supervisor" do
	include Async::Container::Supervisor::Environment
	
	monitors do
		[
			# Restart workers that exceed 500MB of memory:
			Async::Container::Supervisor::MemoryMonitor.new(
				interval: 10,  # Check every 10 seconds
				limit: 1024 * 1024 * 500  # 500MB limit
			)
		]
	end
end

The class Async::Container::Supervisor::MemoryMonitor will periodically check worker memory usage and restart any workers that exceed the configured limit.

Collecting Diagnostics

The supervisor can collect various diagnostics from workers on demand:

  • Memory dumps: Full heap dumps for memory analysis via ObjectSpace.dump_all.
  • Memory samples: Lightweight sampling to identify memory leaks.
  • Thread dumps: Stack traces of all threads.
  • Scheduler dumps: Async fiber hierarchy
  • Garbage collection profiles: GC performance data

These can be triggered programmatically or via command-line tools (when available).

Memory Leak Diagnosis

To identify memory leaks, you can use the memory sampling feature which is much lighter weight than a full memory dump. It tracks allocations over a time period and focuses on retained objects.

Using the bake task:

# Sample for 30 seconds and print report to console
$ bake async:container:supervisor:memory_sample duration=30

Programmatically:

# Assuming you have a connection to a worker:
result = connection.call(do: :memory_sample, duration: 30)
puts result[:data]

This will sample memory allocations for the specified duration, then force a garbage collection and return a JSON report showing what objects were allocated during that period and retained after GC. Late-lifecycle allocations that are retained are likely memory leaks.

The JSON report includes:

  • total_allocated: Total allocated memory and count
  • total_retained: Total retained memory and count
  • by_gem: Breakdown by gem/library
  • by_file: Breakdown by source file
  • by_location: Breakdown by specific file:line locations
  • by_class: Breakdown by object class
  • strings: String allocation analysis

This is much more efficient than do: :memory_dump which uses ObjectSpace.dump_all and can be slow and blocking on large heaps. The JSON format also makes it easy to integrate with monitoring and analysis tools.

Advanced Usage

Custom Monitors

You can create custom monitors by implementing the monitor interface. A monitor should:

  1. Accept connections and periodically check worker health
  2. Take action (like restarting workers) when unhealthy conditions are detected

Fault Tolerance

The supervisor architecture is designed for fault tolerance:

  • Supervisor crashes: When the supervisor process crashes, the root controller automatically restarts it. Workers detect the disconnection and reconnect to the new supervisor.
  • Worker crashes: The container automatically restarts crashed workers based on the restart: true configuration.
  • Communication failures: Workers gracefully handle supervisor unavailability and will attempt to reconnect.

This design ensures your application remains operational even when individual processes fail.