Getting Started

This guide explains how to get started with protocol-url for parsing, manipulating, and constructing URLs in Ruby.

Installation

Add the gem to your project:

$ bundle add protocol-url

Core Concepts

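protocol-url provides a standards-compliant API for working with URLs according to RFC 3986. The library is organized around three main classes:

  • Protocol::URL::Absolute represents a complete URL with a scheme and authority
  • Protocol::URL::Relative represents a relative URL, such as a path without a scheme or authority
  • Protocol::URL::Reference extends relative URLs with a query and fragment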

Additionally, the Protocol::URL::Path module provides low-level utilities for path manipulation, including splitting, joining, simplifying, and expanding paths according to RFC 3986 rules.

Usage

Parse complete URLs with scheme and authority:

require "protocol/url"

# Parse an absolute URL:
url = Protocol::URL["https://api.example.com:8080/v1/users?page=2#results"]
url.scheme      # => "https"
url.authority   # => "api.example.com:8080"
url.path        # => "/v1/users"
url.query       # => "page=2"
url.fragment    # => "results"

Parse relative URLs and references:

# Parse a relative URL:
relative = Protocol::URL["/api/v1/users"]
relative.path   # => "/api/v1/users"

# Parse a reference with query and fragment:
reference = Protocol::URL["/search?q=ruby#top"]
reference.path      # => "/search"
reference.query     # => "q=ruby"
reference.fragment  # => "top"
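
For intuition about where each component boundary falls, here is a minimal sketch based on the regular expression given in RFC 3986, Appendix B. It is not how protocol-url parses URLs. The constant URI_REFERENCE and the helper split_reference are hypothetical names used only for illustration.

```ruby
# Regular expression adapted from RFC 3986, Appendix B (anchored to the whole string).
URI_REFERENCE = %r{\A(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?\z}

# Hypothetical helper (not part of protocol-url): split a URI reference
# into its five generic components. Missing components are nil.
def split_reference(string)
  match = URI_REFERENCE.match(string) or raise ArgumentError, "not a URI reference"

  {
    scheme: match[2],
    authority: match[4],
    path: match[5],
    query: match[7],
    fragment: match[9],
  }
end

parts = split_reference("https://api.example.com:8080/v1/users?page=2#results")
relative_parts = split_reference("/search?q=ruby#top")
```

An absolute URL yields all five components, while a relative reference such as "/search?q=ruby#top" has no scheme or authority, which is the distinction the classes above capture.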

Constructing URLs

Build URLs programmatically:

# Create an absolute URL:
url = Protocol::URL::Absolute.new("https", "example.com", "/api/users")
url.to_s  # => "https://example.com/api/users"

# The authority can include port and userinfo:
url = Protocol::URL::Absolute.new("https", "user:pass@api.example.com:8080", "/v1")
url.to_s  # => "https://user:pass@api.example.com:8080/v1"

# Create a reference with components:
reference = Protocol::URL::Reference.new("/api/search", "q=ruby&limit=10", "results")
reference.to_s  # => "/api/search?q=ruby&limit=10#results"

Combining URLs

URLs can be combined following RFC 3986 resolution rules:

# Combine absolute URL with relative path:
base = Protocol::URL["https://example.com/docs/guide/"]
relative = Protocol::URL::Relative.new("../api/reference.html")

result = base + relative
result.to_s  # => "https://example.com/docs/api/reference.html"

# Absolute paths replace the base path:
absolute_path = Protocol::URL::Relative.new("/completely/different/path")
result = base + absolute_path
result.to_s  # => "https://example.com/completely/different/path"
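
One way to sanity-check expected results is to compare them against Ruby's standard uri library, which also resolves references following RFC 3986. The sketch below uses only the standard library, not protocol-url, and the variable names are illustrative.

```ruby
require "uri"

# Standard-library cross-check of the resolution results shown above.
stdlib_base = URI("https://example.com/docs/guide/")

# A relative path is merged with the base directory, then dot segments are removed:
merged = stdlib_base.merge("../api/reference.html").to_s

# A reference starting with "/" replaces the base path entirely:
replaced = stdlib_base.merge("/completely/different/path").to_s
```

Here merged is "https://example.com/docs/api/reference.html" and replaced is "https://example.com/completely/different/path", matching the results above.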

Path Manipulation

The Protocol::URL::Path module provides utilities for working with URL paths:

Splitting and Joining Paths

# Split paths into components:
Protocol::URL::Path.split("/a/b/c")     # => ["", "a", "b", "c"]
Protocol::URL::Path.split("a/b/c")      # => ["a", "b", "c"]
Protocol::URL::Path.split("a/b/c/")     # => ["a", "b", "c", ""]

# Join components back into paths:
Protocol::URL::Path.join(["", "a", "b", "c"])  # => "/a/b/c"
Protocol::URL::Path.join(["a", "b", "c"])      # => "a/b/c"
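
The empty strings in these results encode the leading and trailing slashes. The same convention can be seen with plain Ruby strings, shown here as an illustration only. Whether the library uses String#split internally is an assumption not confirmed by this guide.

```ruby
# Plain-Ruby illustration of the component convention (not the library's code).
# A negative limit keeps trailing empty fields, so a trailing slash survives:
leading  = "/a/b/c".split("/", -1)   # => ["", "a", "b", "c"]
trailing = "a/b/c/".split("/", -1)   # => ["a", "b", "c", ""]

# Without the limit, Ruby drops the trailing empty string and the slash is lost:
lossy = "a/b/c/".split("/")          # => ["a", "b", "c"]

# Joining restores the original string when the empty strings are kept:
rejoined = trailing.join("/")        # => "a/b/c/"
```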

Simplifying Paths

Remove dot segments (. and ..) from paths:

# Simplify a path:
components = ["a", "b", "..", "c", ".", "d"]
simplified = Protocol::URL::Path.simplify(components)
# => ["a", "c", "d"]

# Works with absolute paths:
components = ["", "a", "b", "..", "..", "c"]
simplified = Protocol::URL::Path.simplify(components)
# => ["", "c"]
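
To make the dot-segment rules concrete, the following is a minimal sketch of the idea in plain Ruby. The helper simplify_components is hypothetical and is not the library's implementation. It assumes the component convention shown earlier, where a leading empty string marks an absolute path, and it does not try to cover every edge case (for example, trailing dot segments).

```ruby
# Hypothetical sketch of dot-segment removal over path components.
def simplify_components(components)
  output = []

  components.each do |segment|
    case segment
    when "."
      # "." refers to the current directory, so it contributes nothing.
      next
    when ".."
      if output.empty? || output.last == ".."
        # Relative path with nothing left to pop: keep the "..".
        output << ".."
      elsif output == [""]
        # Already at the root of an absolute path: ignore the "..".
        next
      else
        # Step up one directory.
        output.pop
      end
    else
      output << segment
    end
  end

  output
end

relative_result = simplify_components(["a", "b", "..", "c", ".", "d"])
absolute_result = simplify_components(["", "a", "b", "..", "..", "c"])
```

With these inputs the sketch produces ["a", "c", "d"] and ["", "c"], the same results as the examples above.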

Expanding Paths

Merge two paths according to RFC 3986 rules:

# Expand a relative path against a base:
result = Protocol::URL::Path.expand("/a/b/c", "../d")
# => "/a/d" ("c" is popped first, then ".." removes "b")

# Handle complex relative paths:
result = Protocol::URL::Path.expand("/a/b/c/d", "../../e/f")
# => "/a/e/f"

# Relative references that start with "/" replace the base path:
result = Protocol::URL::Path.expand("/a/b/c", "/x/y/z")
# => "/x/y/z"

The expand method has an optional pop parameter (default: true) that controls whether the last component of the base path is removed before merging:

# With pop=true (default), behaves like URI resolution:
Protocol::URL::Path.expand("/a/b/file.html", "other.html")
# => "/a/b/other.html"

# With pop=false, treats base as a directory:
Protocol::URL::Path.expand("/a/b/file.html", "other.html", false)
# => "/a/b/file.html/other.html"
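
The effect of the pop parameter can be sketched in a few lines of plain Ruby. The helper expand_path below is hypothetical and only approximates the behavior described in this guide. It assumes an absolute base path and does not handle every edge case (for example, a relative path that ends in a dot segment).

```ruby
# Hypothetical sketch of path expansion with an optional pop step.
def expand_path(base, relative, pop = true)
  # An empty relative path leaves the base unchanged:
  return base if relative.empty?

  # A relative path starting with "/" replaces the base entirely:
  return relative if relative.start_with?("/")

  components = base.split("/", -1)

  if pop
    # Drop the last component (the "file"), keeping only the directory:
    components.pop
  elsif components.last == ""
    # Treat the base as a directory; drop the empty trailing component:
    components.pop
  end

  relative.split("/", -1).each do |segment|
    case segment
    when "."
      next
    when ".."
      # Never pop the leading "" that marks the root:
      components.pop if components.size > 1
    else
      components << segment
    end
  end

  components.join("/")
end

popped   = expand_path("/a/b/file.html", "other.html")
unpopped = expand_path("/a/b/file.html", "other.html", false)
```

Here popped is "/a/b/other.html" and unpopped is "/a/b/file.html/other.html", mirroring the two calls above.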

Converting to Local File System Paths

Convert URL paths to local file system paths safely:

# Convert URL path to local file system path:
Protocol::URL::Path.to_local_path("/documents/report.pdf")
# => "/documents/report.pdf"

# Handles percent-encoded characters:
Protocol::URL::Path.to_local_path("/files/My%20Document.txt")
# => "/files/My Document.txt"

# Security: Preserves percent-encoded path separators
# This prevents directory traversal attacks:
Protocol::URL::Path.to_local_path("/folder/safe%2Fname/file.txt")
# => "/folder/safe%2Fname/file.txt"
# %2F (/) and %5C (\) are NOT decoded, preventing them from creating
# additional path components in the file system
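
The idea of decoding everything except encoded separators can be sketched as follows. The helper decode_keeping_separators is hypothetical and is not the library's implementation. It assumes the decoded bytes form valid UTF-8.

```ruby
# Hypothetical sketch: percent-decode a path but keep %2F and %5C encoded.
def decode_keeping_separators(path)
  # Work on raw bytes so multi-byte sequences decode correctly:
  decoded = path.b.gsub(/%(\h\h)/) do |escape|
    byte = Regexp.last_match(1).hex

    # Leave "/" (0x2F) and "\" (0x5C) encoded so they cannot
    # introduce additional path components:
    (byte == 0x2F || byte == 0x5C) ? escape : byte.chr
  end

  # Assumption: the decoded bytes are UTF-8 text.
  decoded.force_encoding(Encoding::UTF_8)
end

plain = decode_keeping_separators("/files/My%20Document.txt")
safe  = decode_keeping_separators("/folder/safe%2Fname/file.txt")
```

In this sketch plain becomes "/files/My Document.txt", while safe is returned unchanged because its only escape is an encoded slash.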

Working with References

The Protocol::URL::Reference class extends relative URLs with query parameters and fragments. For detailed information on working with references, see the Working with References guide.

Quick example:

# Create a reference with query and fragment:
reference = Protocol::URL::Reference.new("/api/users", "status=active", "results")
reference.to_s  # => "/api/users?status=active#results"

# Update components immutably:
updated = reference.with(query: "status=inactive")
updated.to_s  # => "/api/users?status=inactive#results"

URL Encoding

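The library handles URL encoding automatically for path components. The Protocol::URL::Encoding module exposes the underlying helpers for cases where you need to encode or decode values yourself: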

require "protocol/url/encoding"

# Escape path components (preserves slashes):
escaped = Protocol::URL::Encoding.escape_path("/path/with spaces/file.html")
# => "/path/with%20spaces/file.html"

# Escape query parameters:
escaped = Protocol::URL::Encoding.escape("hello world!")
# => "hello%20world%21"

# Unescape percent-encoded strings:
unescaped = Protocol::URL::Encoding.unescape("hello%20world%21")
# => "hello world!"
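
Percent-encoding itself is a simple transformation: each byte outside an allowed set is written as % followed by two hexadecimal digits. The sketch below uses the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~") as the allowed set. The exact set the library uses is an assumption not stated in this guide, and both helper names are hypothetical.

```ruby
# Hypothetical sketch of percent-encoding with the RFC 3986 unreserved set.
def escape_component(string)
  # Operate on bytes so multi-byte characters become several %XX escapes:
  string.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format("%%%02X", byte.ord) }
end

# Escape each path segment separately so the "/" separators are preserved:
def escape_path_segments(path)
  path.split("/", -1).map { |segment| escape_component(segment) }.join("/")
end

escaped_value = escape_component("hello world!")
escaped_path  = escape_path_segments("/path/with spaces/file.html")
```

With this allowed set, escaped_value is "hello%20world%21" and escaped_path is "/path/with%20spaces/file.html".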

Practical Examples

Building API URLs

# Build a base API URL. The trailing slash marks /v2/ as a directory,
# so relative paths are appended instead of replacing "v2":
base = Protocol::URL::Absolute.new("https", "api.example.com", "/v2/")

# Add resource paths:
users_endpoint = base + Protocol::URL::Relative.new("users/")
users_endpoint.to_s  # => "https://api.example.com/v2/users/"

# Add specific resource ID:
user_detail = users_endpoint + Protocol::URL::Relative.new("123")
user_detail.to_s  # => "https://api.example.com/v2/users/123"

URL Normalization

Clean up URLs by simplifying paths:

# URL with redundant path segments:
messy = Protocol::URL["https://example.com/a/b/../c/./d"]

# The path is automatically simplified:
messy.path  # => "/a/c/d"
messy.to_s  # => "https://example.com/a/c/d"

Best Practices

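Choose the Right Class

  • Use Protocol::URL::Absolute for complete URLs with a scheme and authority
  • Use Protocol::URL::Relative for paths that will be resolved against a base
  • Use Protocol::URL::Reference when you need a query or fragment

Path Manipulation

When manipulating paths:

  • Remember that expand pops the last component of the base path by default; pass pop=false to treat the base as a directory
  • Keep trailing slashes on directory paths, since they change how relative paths resolve
  • Use Protocol::URL::Path.to_local_path to convert URL paths to file system paths, because it leaves %2F and %5C encoded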

Encoding

  • The library handles encoding automatically for path components
  • Use Protocol::URL::Encoding methods directly when you need explicit control
  • Remember that spaces become %20 in paths and + or %20 in query strings
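
The difference between the two space encodings can be seen with Ruby's standard library, independent of protocol-url. This sketch uses URI.encode_www_form_component (form encoding, used in query strings) and ERB::Util.url_encode (plain percent-encoding). The variable names are illustrative.

```ruby
require "uri"
require "erb"

# Form encoding, common in query strings, writes a space as "+":
form_encoded = URI.encode_www_form_component("hello world!")

# Plain percent-encoding writes a space as "%20":
percent_encoded = ERB::Util.url_encode("hello world!")

# Form decoding accepts both spellings of a space:
from_form    = URI.decode_www_form_component(form_encoded)
from_percent = URI.decode_www_form_component(percent_encoded)
```

Here form_encoded is "hello+world%21" and percent_encoded is "hello%20world%21", and both decode back to "hello world!". A literal "+" in a path is not a space, so form decoding should not be applied to path components.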

Common Pitfalls

Pop Behavior in Path Expansion

The expand method pops the last path component by default to match RFC 3986 URI resolution:

# This might be surprising:
Protocol::URL::Path.expand("/api/users", "groups")
# => "/api/groups" (not "/api/users/groups")

# To prevent popping, use pop=false:
Protocol::URL::Path.expand("/api/users", "groups", false)
# => "/api/users/groups"

Empty Paths

Empty relative paths return the base unchanged:

base = Protocol::URL::Reference.new("/api/users")
same = base.with(path: "")
same.to_s  # => "/api/users" (unchanged)

Trailing Slashes

Trailing slashes are preserved and have semantic meaning:

# Directory (trailing slash):
Protocol::URL::Path.expand("/docs/", "page.html")
# => "/docs/page.html"

# File (no trailing slash):
Protocol::URL::Path.expand("/docs", "page.html")
# => "/page.html" (pops "docs")