Regular Expressions

Pattern matching, substitution, and text processing powered by PCRE2.

PCRE2-Powered

Strada uses PCRE2 (Perl Compatible Regular Expressions) for full Perl-style regex support, including lookahead, lookbehind, named captures, lazy quantifiers, and more. If PCRE2 is not available at build time, Strada falls back to POSIX Extended Regular Expressions with reduced functionality.

Pattern Matching

Use the =~ operator to test if a string matches a pattern, and !~ for negated matching:

my str $text = "Hello, World!";

# Match operator
if ($text =~ /World/) {
    say("Found World!");
}

# Negated match
if ($text !~ /Goodbye/) {
    say("No Goodbye found");
}

The =~ operator returns 1 if the pattern matches, 0 otherwise. The !~ operator returns the opposite.

Pattern Syntax

Anchors

Pattern	Meaning
`^`	Start of string (or line in `/m` mode)
`$`	End of string (or line in `/m` mode)
`\b`	Word boundary
`\B`	Non-word boundary

# ^ matches start of string
if ("hello world" =~ /^hello/) {
    say("Starts with hello");
}

# $ matches end of string
if ("hello world" =~ /world$/) {
    say("Ends with world");
}

# \b matches word boundaries
if ("cat catalog" =~ /\bcat\b/) {
    say("Found standalone 'cat'");
}

Character Classes

Pattern	Meaning
`.`	Any character except newline (unless `/s` flag)
`\d`	Digit (0-9)
`\D`	Non-digit
`\w`	Word character (a-z, A-Z, 0-9, _)
`\W`	Non-word character
`\s`	Whitespace (space, tab, newline)
`\S`	Non-whitespace
`[abc]`	Character class (a, b, or c)
`[^abc]`	Negated class (not a, b, or c)
`[a-z]`	Character range

if ($text =~ /\d+/) {
    say("Contains digits");
}

if ($text =~ /[aeiou]/) {
    say("Contains a vowel");
}

if ($text =~ /[A-Z][a-z]+/) {
    say("Contains capitalized word");
}

Quantifiers

Pattern	Meaning
`*`	Zero or more (greedy)
`+`	One or more (greedy)
`?`	Zero or one (greedy)
`*?`	Zero or more (lazy/non-greedy)
`+?`	One or more (lazy/non-greedy)
`??`	Zero or one (lazy/non-greedy)
`{n}`	Exactly n times
`{n,}`	n or more times
`{n,m}`	Between n and m times

if ($text =~ /a+/) {
    say("One or more a's");
}

if ($text =~ /\d{3}-\d{4}/) {
    say("Phone number format");
}

# Lazy quantifier: match shortest possible
if ($html =~ /<.*?>/) {
    say("Found first HTML tag");
}

Alternation and Grouping

Pattern	Meaning
`|`	Alternation (or)
`(...)`	Grouping and capture
`(?:...)`	Non-capturing group
`(?P<name>...)`	Named capture group

if ($text =~ /cat|dog/) {
    say("Found cat or dog");
}

if ($text =~ /(hello|hi) world/) {
    say("Greeting found");
}

# Non-capturing group (no capture overhead)
if ($text =~ /(?:https?|ftp):\/\//) {
    say("URL protocol found");
}

Lookahead and Lookbehind

Zero-width assertions that match a position without consuming characters:

Pattern	Meaning
`(?=...)`	Positive lookahead
`(?!...)`	Negative lookahead
`(?<=...)`	Positive lookbehind
`(?<!...)`	Negative lookbehind

# Positive lookahead: digits followed by "px"
if ("width: 100px" =~ /\d+(?=px)/) {
    say("Found pixel value");
}

# Negative lookahead: digits NOT followed by "px"
if ("count: 42" =~ /\d+(?!px)/) {
    say("Found non-pixel number");
}

# Positive lookbehind: digits preceded by "$"
if ("price: $99" =~ /(?<=\$)\d+/) {
    say("Found price amount");
}

# Negative lookbehind: word NOT preceded by "un"
if ("happy" =~ /(?<!un)happy/) {
    say("Not unhappy!");
}
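
One subtlety of negative lookahead is worth a short sketch: because \d+ can backtrack, /\d+(?!px)/ still matches part of a pixel value. The example relies only on standard PCRE2 backtracking behavior and on captures()[0] holding the full match; the tightened pattern is one possible fix, not the only one.

```strada
my str $css = "width: 100px";

# \d+ first grabs "100"; the lookahead fails because "px" follows,
# so the engine backtracks to "10" (followed by "0px", not "px") and succeeds.
if ($css =~ /\d+(?!px)/) {
    my array @m = captures();
    say($m[0]);  # 10
}

# Also forbidding a following digit rules out the partial match.
if ($css !~ /\d+(?!\d|px)/) {
    say("No non-pixel number");
}
```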

Flags and Modifiers

Flags are specified after the closing delimiter:

Flag	Meaning
`i`	Case-insensitive matching
`m`	Multi-line mode (`^` and `$` match line boundaries)
`s`	Single-line/dotall mode (`.` matches newline)
`x`	Extended mode (whitespace and comments ignored in pattern)
`g`	Global (for substitution — replace all occurrences)
`e`	Evaluate replacement as expression (substitution only)

# Case-insensitive matching
if ($text =~ /hello/i) {
    say("Found hello (any case)");
}

# Multi-line mode
my str $lines = "line1\nline2";
if ($lines =~ /^line2/m) {
    say("Found line2 at start of line");
}

# Dotall mode: . matches newlines
if ($data =~ /start.*end/s) {
    say("Matched across lines");
}

# Extended mode: whitespace ignored, allows comments
if ($text =~ /
    \d{4}   # year
    -       # separator
    \d{2}   # month
    -       # separator
    \d{2}   # day
/x) {
    say("Date format matched");
}

Capture Variables ($1 - $9)

After a successful regex match, capture groups are available via $1 through $9:

my str $date = "2024-01-15";

if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    say("Year: " . $1);    # 2024
    say("Month: " . $2);   # 01
    say("Day: " . $3);     # 15
}

if ("hello world" =~ /(\w+)\s+(\w+)/) {
    say($1);  # "hello"
    say($2);  # "world"
}

$1–$9 return undef if the group doesn't exist. They are syntactic sugar for captures()[N].

The captures() Function

Use captures() to get all capture groups as an array. Use this when you need the full match or more than 9 groups:

my str $date = "2024-01-15";

if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    my array @parts = captures();
    say("Full match: " . $parts[0]);   # 2024-01-15
    say("Year: " . $parts[1]);        # 2024
    say("Month: " . $parts[2]);       # 01
    say("Day: " . $parts[3]);         # 15
}
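
As an illustration of the "more than 9 groups" case, the sketch below uses a contrived pattern with eleven groups. It assumes captures() returns every group in order, as described above; groups 10 and 11 have no $N shorthand, so they are read from the array.

```strada
if ("abcdefghijk" =~ /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)/) {
    my array @g = captures();
    say($g[9]);   # i (same as $9)
    say($g[10]);  # j
    say($g[11]);  # k
}
```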

Capture Indices

captures()[0] is the full match (entire matched string)
captures()[1] / $1 is the first capture group
captures()[2] / $2 is the second capture group, and so on

Always check that the pattern matched before accessing captures, and save the result immediately since captures are cleared on the next match:

if ($text =~ /(\w+)@(\w+)/) {
    my array @saved = captures();  # Save immediately
    my str $user = $saved[1];
    my str $domain = $saved[2];
    say("User: " . $user . ", Domain: " . $domain);
}

Named Captures

Use (?P<name>...) to create named capture groups, then retrieve them with named_captures():

my str $date = "2024-01-15";

if ($date =~ /(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/) {
    my hash %nc = named_captures();
    say("Year: " . $nc{"year"});    # 2024
    say("Month: " . $nc{"month"});  # 01
    say("Day: " . $nc{"day"});      # 15
}

# Parse a log line with named captures
my str $log = "[2024-01-15 10:30:45] ERROR: connection refused";

if ($log =~ /\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<message>.*)/) {
    my hash %nc = named_captures();
    say("Time: " . $nc{"timestamp"});
    say("Level: " . $nc{"level"});
    say("Msg: " . $nc{"message"});
}

Note: Named captures require PCRE2. If Strada was built with POSIX regex fallback, named_captures() returns an empty hash.

Substitution

Use s/// for search and replace:

my str $text = "Hello World";

# Replace first occurrence
$text =~ s/World/Strada/;
say($text);  # Hello Strada

# Replace all occurrences with /g flag
my str $repeated = "cat cat cat";
$repeated =~ s/cat/dog/g;
say($repeated);  # dog dog dog

# Case-insensitive replacement
my str $greeting = "HELLO world";
$greeting =~ s/hello/Hi/i;
say($greeting);  # Hi world

# Backreferences in replacement
my str $swapped = "John Smith";
$swapped =~ s/(\w+) (\w+)/$2, $1/;
say($swapped);  # Smith, John

Case modification: Strada does not support \U, \L, or \E in replacement strings. Use the uc() and lc() functions instead.

The /e Modifier — Evaluate Replacement

The /e flag evaluates the replacement as an expression instead of a literal string:

my str $text = "apples: 3, oranges: 7";

# Double all numbers
$text =~ s/(\d+)/$1 * 2/eg;
say($text);  # "apples: 6, oranges: 14"

# Uppercase matched words
my str $words = "hello world";
$words =~ s/(\w+)/uc(captures()[1])/eg;
say($words);  # "HELLO WORLD"

# /eig - evaluate, case-insensitive, global
my str $mixed = "Hello HELLO hello";
$mixed =~ s/(hello)/lc(captures()[1])/eig;
say($mixed);  # "hello hello hello"

Variable Interpolation

Variables are interpolated inside regex patterns when prefixed with $:

my str $search = "world";

# Variable is interpolated into the pattern
if ($text =~ /$search/) {
    say("Found it!");
}

$ as Anchor vs Variable

The $ character has two meanings inside a pattern:

End-of-string anchor (end of line in /m mode): when at the end of the pattern or not followed by a word character
Variable interpolation: when followed by a variable name (a word character)

# $ as end-of-line anchor
if ($text =~ /end$/) {
    say("Ends with 'end'");
}

# $ as variable interpolation
my str $var = "test";
if ($text =~ /$var/) {
    say("Found test");
}

# Both together: variable followed by anchor
my str $suffix = "world";
if ($text =~ /$suffix$/) {
    say("Text ends with world");
}

Escaping Special Characters

To match literal special characters, escape them with a backslash:

# Match literal dot
if ($filename =~ /\.txt$/) {
    say("Text file");
}

# Match literal dollar sign
if ($price =~ /\$\d+/) {
    say("Price found");
}

# Common metacharacters that need escaping:
#   . * + ? [ ] { } ( ) | ^ $ \

Built-in Functions

Strada provides several built-in functions for regex and string operations.

match()

Simple pattern match that returns 1 or 0. Also sets capture variables.

my int $found = match($text, "pattern");

if (match($email, "@")) {
    say("Looks like an email");
}

capture()

Get all capture groups as an array (pass the string and pattern as arguments):

my array @groups = capture($text, "(\\d+)-(\\d+)");
# $groups[0] = full match, $groups[1] = first group, etc.
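
A short usage sketch, assuming the array layout described in the comment above (index 0 is the full match, groups start at 1). The input string is made up for illustration.

```strada
my array @groups = capture("555-1234", "(\\d+)-(\\d+)");
say($groups[0]);  # 555-1234
say($groups[1]);  # 555
say($groups[2]);  # 1234
```

Because the pattern is passed as a string rather than a /.../ literal, backslashes are doubled, as with split().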

captures()

Returns capture groups from the last =~ match as an array:

if ($text =~ /(\w+)\s+(\w+)/) {
    my array @all = captures();
    say($all[0]);  # Full match
    say($all[1]);  # First group (same as $1)
    say($all[2]);  # Second group (same as $2)
}

named_captures()

Returns named capture groups from the last =~ match as a hash:

if ($text =~ /(?P<first>\w+)\s+(?P<last>\w+)/) {
    my hash %nc = named_captures();
    say($nc{"first"});  # First name
    say($nc{"last"});   # Last name
}

split()

Split a string by a regex pattern:

# Split on whitespace
my array @words = split("\\s+", $text);

# Split on comma
my array @parts = split(",", $csv);

Important: split() uses regex patterns. To split on literal special characters, escape them with a double backslash:

# Split on literal dot
my array @parts = split("\\.", $ip_address);

# Split on literal pipe
my array @items = split("\\|", $data);

join()

Join array elements with a separator (not regex, but commonly paired with split()):

my str $result = join(", ", @items);
say($result);  # "apple, banana, cherry"
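
The two functions are often used together. A minimal round-trip sketch, using only the split() and join() call shapes shown above; the variable names are illustrative.

```strada
# Split an IPv4 address on literal dots, then rejoin with dashes.
my str $ip = "192.168.1.10";
my array @octets = split("\\.", $ip);
say(join("-", @octets));  # 192-168-1-10
```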

replace() and replace_all()

Functional alternatives to the s/// operator:

# replace() — regex replacement (returns new string)
my str $new = replace($text, "world", "Strada");

# replace_all() — plain string replacement (NOT regex)
my str $cleaned = replace_all($text, "foo", "bar");

replace() vs replace_all() vs s///:

replace() uses regex and returns a new string (first match only). Supports $1, $2 backreferences in the replacement.
replace_all() uses plain string matching (NOT regex) and returns a new string.
s/// uses regex and modifies the variable in place. Add /g for global.
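
The difference is easiest to see with a metacharacter in the search text. The sketch below follows the behavior described in the list above; it assumes replace_all() replaces every occurrence, as its name suggests, so no expected output is shown for that call.

```strada
my str $version = "1.2.3";

# replace(): "." is a regex metacharacter, so it matches the first character.
say(replace($version, ".", "-"));     # -.2.3

# Escaping the dot makes replace() target the first literal dot.
say(replace($version, "\\.", "-"));   # 1-2.3

# replace_all(): "." is taken literally, so no escaping is needed.
say(replace_all($version, ".", "-"));

# s///g: regex, modifies the variable in place, every match.
$version =~ s/\./-/g;
say($version);                        # 1-2-3
```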

Transliteration (tr///, y///)

The tr/// operator (alias y///) performs character-by-character transliteration. Unlike s///, it does not use regular expressions — it maps individual characters from a search list to a replacement list.

Basic Syntax

$str =~ tr/SEARCHLIST/REPLACEMENTLIST/flags;
$str =~ y/SEARCHLIST/REPLACEMENTLIST/flags;  # Alias

Each character in the search list is replaced by the character at the corresponding position in the replacement list. Ranges like a-z are expanded.

Examples

my str $text = "Hello World";

# Lowercase to uppercase
$text =~ tr/a-z/A-Z/;
say($text);  # "HELLO WORLD"

# ROT13 encoding
my str $secret = "Hello World";
$secret =~ tr/A-Za-z/N-ZA-Mn-za-m/;
say($secret);  # "Uryyb Jbeyq"

# Replace vowels with stars
my str $censored = "hello";
$censored =~ tr/aeiou/*****/;
say($censored);  # "h*ll*"

Flags

Flag	Meaning
`c`	Complement — transliterate characters NOT in the search list
`d`	Delete — delete characters found in search list that have no corresponding replacement
`s`	Squeeze — collapse duplicate replacement characters into a single one
`r`	Return — return a modified copy, do not change the original

Delete (/d)

# Delete all digits
my str $data = "abc123def456";
$data =~ tr/0-9//d;
say($data);  # "abcdef"

# Delete all non-alphanumeric characters
my str $clean = "hello, world! #2024";
$clean =~ tr/a-zA-Z0-9//cd;
say($clean);  # "helloworld2024"

Squeeze (/s)

# Collapse repeated spaces
my str $spaced = "hello    world    foo";
$spaced =~ tr/ / /s;
say($spaced);  # "hello world foo"

Return (/r)

# Get a copy without modifying original
my str $original = "hello world";
my str $upper = ($original =~ tr/a-z/A-Z/r);
say($original);  # "hello world" (unchanged)
say($upper);     # "HELLO WORLD"

Complement (/c)

# Replace all non-letters with dash
my str $text = "hello 123 world!";
$text =~ tr/a-zA-Z/-/c;
say($text);  # "hello-----world-"

# Combine complement + squeeze: replace non-letters with single dash
my str $slug = "Hello, World! 2024";
$slug =~ tr/a-zA-Z/-/cs;
say($slug);  # "Hello-World-"

tr/// vs s///

tr/// maps characters one-to-one — it does not use regex and is much faster for simple character replacements. Use s/// when you need regex patterns or variable-length replacements.
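
A brief side-by-side sketch of that choice, using only the operators and flags documented above; the strings are made up for illustration.

```strada
my str $name = "snake_case_name";

# One character maps to one character: tr/// is enough.
my str $spaced = ($name =~ tr/_/ /r);
say($spaced);  # snake case name

# Same result with s///g, going through the regex engine.
my str $copy = "snake_case_name";
$copy =~ s/_/ /g;
say($copy);    # snake case name

# One character becomes two: this needs s///, since tr/// maps one-to-one.
my str $scoped = "snake_case_name";
$scoped =~ s/_/::/g;
say($scoped);  # snake::case::name
```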

Common Patterns

Email Validation

func is_valid_email(str $email) int {
    return match($email, "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
}

Phone Number

func is_phone(str $phone) int {
    return match($phone, "^\\d{3}[-.]?\\d{3}[-.]?\\d{4}$");
}

URL Extraction

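func extract_urls(str $text) array {
    my array @urls = ();
    # Each pass captures the first remaining URL, then removes it from $text
    # so the next pass finds the following one. The loop ends when none remain.
    while ($text =~ /(https?:\/\/\S+)/) {
        push(@urls, $1);
        $text =~ s/https?:\/\/\S+//;
    }
    return @urls;
}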

Whitespace Cleanup

# Trim leading whitespace
$text =~ s/^\s+//;

# Trim trailing whitespace
$text =~ s/\s+$//;

# Collapse multiple spaces to a single space
$text =~ s/\s+/ /g;
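
The first two substitutions can be wrapped in a helper. This is a sketch only: trim is a hypothetical name, and it assumes that str is accepted as a return type and that a parameter can be modified locally, as extract_urls above does with $text.

```strada
# Hypothetical helper: strips whitespace from both ends of a string.
func trim(str $s) str {
    $s =~ s/^\s+//;   # leading whitespace
    $s =~ s/\s+$//;   # trailing whitespace
    return $s;
}

say("[" . trim("  hello world  ") . "]");  # [hello world]
```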

PCRE2 Feature Summary

Feature	Syntax	Status
Lazy quantifiers	`*?`, `+?`, `??`	Supported
Non-capturing groups	`(?:...)`	Supported
Named captures	`(?P<name>...)`	Supported
Positive lookahead	`(?=...)`	Supported
Negative lookahead	`(?!...)`	Supported
Positive lookbehind	`(?<=...)`	Supported
Negative lookbehind	`(?<!...)`	Supported
Word boundaries	`\b`, `\B`	Supported
Backreferences	`$1`–`$9`	Supported
Character class shortcuts	`\d`, `\w`, `\s`	Supported
Flags	`/i`, `/m`, `/s`, `/x`, `/g`, `/e`	Supported
Possessive quantifiers	`*+`, `++`	Not supported
Unicode properties	`\p{...}`	Not supported

Performance Tips

Regex Performance

Compiled PCRE2 patterns are cached (128-slot cache) for fast repeated matching
Use anchors (^, $) when possible to limit the search space
Prefer index() for simple substring searches (no regex overhead)
Use replace_all() for literal string replacement (faster than regex)
Use non-capturing groups (?:...) when you don't need the captured text

Troubleshooting

Pattern not matching?

Check anchor usage — ^ and $ match start/end of the entire string by default
Verify escaping — special characters need a backslash
Use the /i flag for case-insensitive matching
Test pattern components separately

Captures not working?

Ensure the pattern actually matched before calling captures() or using $1–$9
Remember captures()[0] is the full match — groups start at index 1
Captures are cleared on each new match — save them immediately
$1–$9 return undef if the group doesn't exist

if ($text =~ /(\w+)/) {
    my array @saved = captures();  # Save immediately!

    # Don't do another match before using @saved
    say("First word: " . $saved[1]);
}
