Container Policies

This guide explains how to configure container policies for your services and understand the default failure handling behavior.

Default Failure Handling

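By default, all services use the DEFAULT policy (defined as DEFAULT = self.new.freeze, a single frozen policy instance with default settings), which monitors failure rates and stops the container when failures exceed a threshold.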

Default threshold: 6 failures in 60 seconds (0.1 failures per second).

This means:

Services can tolerate occasional failures and transient issues.
More than 6 failures in any 60-second window stops the container.
Prevents services from restart-looping indefinitely when fundamentally broken.
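The "more than 6 failures in any 60-second window" rule can be pictured as a sliding window over recent failure timestamps. The following is a minimal sketch of that idea, not the async-service implementation: the FailureWindow class and its record_failure method are hypothetical, and it assumes a failure counts toward the window if it happened less than 60 seconds before the current one.

```ruby
# Hypothetical sliding-window failure tracker (illustration only, not the
# async-service implementation). Timestamps are plain numbers of seconds.
class FailureWindow
  def initialize(maximum_failures: 6, window: 60)
    @maximum_failures = maximum_failures
    @window = window
    @failures = [] # timestamps of recent failures
  end

  # Record a failure at time `now`; return true if the container should stop.
  def record_failure(now)
    @failures << now
    # Forget failures that have slid out of the window (now - window, now].
    @failures.reject! { |time| time <= now - @window }
    @failures.size > @maximum_failures
  end
end

tracker = FailureWindow.new
early = [0, 10, 20, 30, 40, 50].map { |time| tracker.record_failure(time) }
puts early.any?                 # prints false: six failures is not more than six
puts tracker.record_failure(55) # prints true: seven failures within 60 seconds

slow = FailureWindow.new
spaced = (0..20).map { |i| slow.record_failure(i * 15) }
puts spaced.any?                # prints false: at most four failures per window
```

Because old timestamps are discarded, occasional failures spread over time never accumulate into a stop; only a cluster of failures does.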

This fail-fast behavior is appropriate for orchestrated environments (Kubernetes, systemd) where the orchestrator will restart the entire service.

Why This Default?

Without failure monitoring, a broken service with restart: true would restart indefinitely, wasting resources. The default policy:

Catches problems quickly: A service that fails repeatedly on startup is typically stopped within 10-20 seconds, depending on how quickly each failure occurs.
Prevents resource waste: Doesn't keep trying to start services that will never succeed.
Enables orchestrator recovery: systemd or Kubernetes can restart the whole process with a clean state.
Detects environmental issues: Bad hardware, corrupted pre-fork state, or system-level problems can't be fixed by restarting children; the entire service needs to be restarted (potentially on different hardware).
Signals clear failure: The exit code indicates that the service couldn't maintain healthy operation.
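The "10-20 seconds" figure follows from simple arithmetic. The sketch below assumes, hypothetically, that a crash-looping child fails at a steady interval (the first failure at t = interval) and that the container stops on the first failure that brings the count above maximum_failures within the window; seconds_until_stop is an illustrative helper, not part of async-service.

```ruby
# Back-of-the-envelope: how long does a crash-looping service run before a
# "more than maximum_failures within window seconds" policy stops it?
# Assumption (hypothetical): one failure every `interval` seconds.
def seconds_until_stop(interval:, maximum_failures: 6, window: 60)
  # The failure that pushes the count above the limit is number maximum_failures + 1.
  failures_needed = maximum_failures + 1
  # The first and last of those failures are (failures_needed - 1) intervals
  # apart; they only share a window if that gap is less than `window`.
  return nil if (failures_needed - 1) * interval >= window

  failures_needed * interval
end

[1.5, 2, 3, 10].each do |interval|
  stop = seconds_until_stop(interval: interval)
  message = stop ? "stops after #{stop}s" : "never stops"
  puts "one failure every #{interval}s: #{message}"
end
# one failure every 1.5s: stops after 10.5s
# one failure every 2s: stops after 14s
# one failure every 3s: stops after 21s
# one failure every 10s: never stops
```

One failure every 10 seconds is exactly the 0.1 failures per second threshold, so under the "more than 6" reading it never exceeds the limit; the actual timing for a real service depends on how fast its children start and fail.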

Configuring Policies

Use container_policy in your service configuration to customize failure handling:

# config/service.rb

# Custom threshold: allow at most 5 failures per minute (slightly stricter than the default of 6):
container_policy Async::Service::Policy.new(maximum_failures: 5, window: 60)

service "web" do
	# Your service configuration.
end

service "worker" do
	# Also uses the same policy.
end

The policy applies to all services in the configuration file.

Choosing a Threshold

Consider your service characteristics:

Strict (catch problems immediately):

container_policy Async::Service::Policy.new(maximum_failures: 1, window: 5)

Balanced (tolerate transient issues):

container_policy Async::Service::Policy.new(maximum_failures: 5, window: 60)

Lenient (allow many retries):

container_policy Async::Service::Policy.new(maximum_failures: 20, window: 60)
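The labels above describe typical behavior, but the window length matters as much as the count: a short window only catches rapid bursts. The sketch below replays two hypothetical failure patterns against each preset (plus the stated default of 6 failures in 60 seconds) using an illustrative trips? helper. It is not async-service code and assumes the same "more than maximum_failures within window seconds" rule described earlier.

```ruby
# Hypothetical helper: does a list of failure timestamps (seconds, ascending)
# ever put more than maximum_failures inside a single window?
def trips?(timestamps, maximum_failures:, window:)
  timestamps.each_with_index.any? do |now, index|
    # Count failures in (now - window, now], including this one.
    recent = timestamps[0..index].count { |time| time > now - window }
    recent > maximum_failures
  end
end

PRESETS = {
  strict:   { maximum_failures: 1,  window: 5 },
  default:  { maximum_failures: 6,  window: 60 },
  balanced: { maximum_failures: 5,  window: 60 },
  lenient:  { maximum_failures: 20, window: 60 },
}

PATTERNS = {
  # Two failures one second apart, then nothing.
  "short burst" => [0, 1],
  # A slow leak: one failure every 6 seconds for five minutes.
  "steady, every 6s" => (0..300).step(6).to_a,
}

PATTERNS.each do |name, timestamps|
  tripped = PRESETS.select { |_, settings| trips?(timestamps, **settings) }.keys
  puts "#{name}: stops under #{tripped.inspect}"
end
# short burst: stops under [:strict]
# steady, every 6s: stops under [:default, :balanced]
```

Under these assumptions the strict preset reacts to the burst but never to the steady leak, because no 5-second window ever holds more than one failure. If slow, persistent failures are a concern, a longer window is the more relevant knob.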

Factors to consider:

Traffic volume: High-traffic services may see more failures in absolute terms.
Error types: Some errors are transient (network timeouts, rate limits).
Dependencies: Upstream services may need time to recover.
Deployment environment: Kubernetes and systemd restart the whole service after it stops; local development usually doesn't.

Per-Container Policy Instances

The container_policy method accepts a block that's evaluated each time a container is created:

# config/service.rb
container_policy do
	# This block is called for EACH container created
	# Each container gets its own policy instance with fresh state
	Async::Service::Policy.new(maximum_failures: 5, window: 60)
end

If your policy tracks per-container state, the block form ensures each container gets a new policy instance with clean state.
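The difference between passing one instance and passing a block comes down to shared versus fresh object state. The plain-Ruby sketch below illustrates this with a hypothetical CountingPolicy class and record_failure method (not the Async::Service::Policy API); the lambda stands in for the block given to container_policy.

```ruby
# Hypothetical policy that keeps per-instance state (a failure counter).
# Not the Async::Service::Policy API; it only illustrates shared vs. fresh state.
class CountingPolicy
  attr_reader :failures

  def initialize
    @failures = 0
  end

  def record_failure
    @failures += 1
  end
end

# One instance for every container: failures from different containers
# accumulate in the same object.
shared = CountingPolicy.new
[shared, shared].each(&:record_failure)
puts shared.failures # prints 2

# A factory (like the block form): each container gets its own instance.
factory = -> { CountingPolicy.new }
fresh = Array.new(2) { factory.call }
fresh.each(&:record_failure)
p fresh.map(&:failures) # prints [1, 1]
```

With a shared instance, one container's failures count against another; with a factory, each container starts from zero.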