Regular Expressions

Pattern matching, substitution, and text processing powered by PCRE2.

PCRE2-Powered

Strada uses PCRE2 (Perl Compatible Regular Expressions) for full Perl-style regex support, including lookahead, lookbehind, named captures, lazy quantifiers, and more. If PCRE2 is not available at build time, Strada falls back to POSIX Extended Regular Expressions with reduced functionality.

Pattern Matching

Use the =~ operator to test if a string matches a pattern, and !~ for negated matching:

my str $text = "Hello, World!";

# Match operator
if ($text =~ /World/) {
    say("Found World!");
}

# Negated match
if ($text !~ /Goodbye/) {
    say("No Goodbye found");
}

The =~ operator returns 1 if the pattern matches, 0 otherwise. The !~ operator returns the opposite.

Pattern Syntax

Anchors

Pattern   Meaning
^         Start of string (or line in /m mode)
$         End of string (or line in /m mode)
\b        Word boundary
\B        Non-word boundary

# ^ matches start of string
if ("hello world" =~ /^hello/) {
    say("Starts with hello");
}

# $ matches end of string
if ("hello world" =~ /world$/) {
    say("Ends with world");
}

# \b matches word boundaries
if ("cat catalog" =~ /\bcat\b/) {
    say("Found standalone 'cat'");
}

Character Classes

Pattern   Meaning
.         Any character except newline (unless /s flag)
\d        Digit (0-9)
\D        Non-digit
\w        Word character (a-z, A-Z, 0-9, _)
\W        Non-word character
\s        Whitespace (space, tab, newline)
\S        Non-whitespace
[abc]     Character class (a, b, or c)
[^abc]    Negated class (not a, b, or c)
[a-z]     Character range

if ($text =~ /\d+/) {
    say("Contains digits");
}

if ($text =~ /[aeiou]/) {
    say("Contains a vowel");
}

if ($text =~ /[A-Z][a-z]+/) {
    say("Contains capitalized word");
}
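A negated class is often the simplest way to say "everything up to the next delimiter". The following is a minimal sketch that uses only constructs shown on this page; the sample string and the $row variable are illustrative.

```strada
my str $row = "alpha,beta,gamma";

# [^,]+ matches one or more characters that are not commas
if ($row =~ /^([^,]+),/) {
    say("First field: " . $1);   # First field: alpha
}
```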

Quantifiers

Pattern   Meaning
*         Zero or more (greedy)
+         One or more (greedy)
?         Zero or one (greedy)
*?        Zero or more (lazy/non-greedy)
+?        One or more (lazy/non-greedy)
??        Zero or one (lazy/non-greedy)
{n}       Exactly n times
{n,}      n or more times
{n,m}     Between n and m times

if ($text =~ /a+/) {
    say("One or more a's");
}

if ($text =~ /\d{3}-\d{4}/) {
    say("Phone number format");
}

# Lazy quantifier: match shortest possible
if ($html =~ /<.*?>/) {
    say("Found first HTML tag");
}
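The difference between greedy and lazy quantifiers is easiest to see by running both against the same input. This sketch assumes a PCRE2 build and reads the full match from captures()[0]; the sample string is illustrative.

```strada
my str $html = "<b>bold</b>";

# Greedy: .* runs to the last ">" in the string
if ($html =~ /<.*>/) {
    say(captures()[0]);   # <b>bold</b>
}

# Lazy: .*? stops at the first ">"
if ($html =~ /<.*?>/) {
    say(captures()[0]);   # <b>
}
```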

Alternation and Grouping

Pattern         Meaning
|               Alternation (or)
(...)           Grouping and capture
(?:...)         Non-capturing group
(?P<name>...)   Named capture group

if ($text =~ /cat|dog/) {
    say("Found cat or dog");
}

if ($text =~ /(hello|hi) world/) {
    say("Greeting found");
}

# Non-capturing group (no capture overhead)
if ($text =~ /(?:https?|ftp):\/\//) {
    say("URL protocol found");
}
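Non-capturing groups also keep capture numbering stable, because they are skipped when groups are counted. The sketch below contrasts the two forms on an illustrative URL.

```strada
my str $url = "ftp://example.org";

# (?:...) groups the alternatives but is not numbered,
# so the host name is group 1
if ($url =~ /(?:https?|ftp):\/\/(\w+)/) {
    say($1);   # example
}

# With a capturing group, the protocol takes $1 and the host moves to $2
if ($url =~ /(https?|ftp):\/\/(\w+)/) {
    say($1);   # ftp
    say($2);   # example
}
```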

Lookahead and Lookbehind

Zero-width assertions that match a position without consuming characters:

Pattern    Meaning
(?=...)    Positive lookahead
(?!...)    Negative lookahead
(?<=...)   Positive lookbehind
(?<!...)   Negative lookbehind

# Positive lookahead: digits followed by "px"
if ("width: 100px" =~ /\d+(?=px)/) {
    say("Found pixel value");
}

# Negative lookahead: digits NOT followed by "px"
if ("count: 42" =~ /\d+(?!px)/) {
    say("Found non-pixel number");
}

# Positive lookbehind: digits preceded by "$"
if ("price: $99" =~ /(?<=\$)\d+/) {
    say("Found price amount");
}

# Negative lookbehind: word NOT preceded by "un"
if ("happy" =~ /(?<!un)happy/) {
    say("Not unhappy!");
}
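One caveat with negative lookahead after a greedy quantifier: the engine may backtrack and match a shorter run of digits, so /\d+(?!px)/ can still match inside "100px". The sketch below assumes a PCRE2 build and shows the effect along with one possible fix; the sample strings are illustrative.

```strada
# Backtracking lets \d+ give up its last digit, so this matches "10"
if ("width: 100px" =~ /\d+(?!px)/) {
    say(captures()[0]);   # 10
}

# Also rejecting a following digit closes that loophole
if ("width: 100px" !~ /\d+(?!\d|px)/) {
    say("No non-pixel number");
}

if ("count: 42" =~ /\d+(?!\d|px)/) {
    say(captures()[0]);   # 42
}
```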

Flags and Modifiers

Flags are specified after the closing delimiter:

Flag   Meaning
i      Case-insensitive matching
m      Multi-line mode (^ and $ match line boundaries)
s      Single-line/dotall mode (. matches newline)
x      Extended mode (whitespace and comments ignored in pattern)
g      Global (substitution only: replace all occurrences)
e      Evaluate replacement as expression (substitution only)

# Case-insensitive matching
if ($text =~ /hello/i) {
    say("Found hello (any case)");
}

# Multi-line mode
my str $lines = "line1\nline2";
if ($lines =~ /^line2/m) {
    say("Found line2 at start of line");
}

# Dotall mode: . matches newlines
if ($data =~ /start.*end/s) {
    say("Matched across lines");
}

# Extended mode: whitespace ignored, allows comments
if ($text =~ /
    \d{4}   # year
    -       # separator
    \d{2}   # month
    -       # separator
    \d{2}   # day
/x) {
    say("Date format matched");
}
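To see what /s changes, the same pattern can be tried with and without it. This is a sketch only: combining flags on a match (here /si) is assumed to work the same way as the combined substitution flags such as /eg shown elsewhere on this page.

```strada
my str $data = "start\nend";

# Without /s, . stops at the newline, so the pattern does not match
if ($data !~ /start.*end/) {
    say("No match without /s");
}

# Assumed: flags combine, so /si is dotall plus case-insensitive
if ($data =~ /START.*END/si) {
    say("Matched with /si");
}
```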

Capture Variables ($1 - $9)

After a successful regex match, capture groups are available via $1 through $9:

my str $date = "2024-01-15";

if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    say("Year: " . $1);    # 2024
    say("Month: " . $2);   # 01
    say("Day: " . $3);     # 15
}

if ("hello world" =~ /(\w+)\s+(\w+)/) {
    say($1);  # "hello"
    say($2);  # "world"
}

$1 through $9 return undef if the group doesn't exist. They are syntactic sugar for captures()[N].

The captures() Function

Use captures() to get all capture groups as an array. Use this when you need the full match or more than 9 groups:

my str $date = "2024-01-15";

if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    my array @parts = captures();
    say("Full match: " . $parts[0]);   # 2024-01-15
    say("Year: " . $parts[1]);        # 2024
    say("Month: " . $parts[2]);       # 01
    say("Day: " . $parts[3]);         # 15
}

Capture Indices
  • captures()[0] is the full match (entire matched string)
  • captures()[1] / $1 is the first capture group
  • captures()[2] / $2 is the second capture group, and so on

Always check that the pattern matched before accessing captures, and save the result immediately since captures are cleared on the next match:

if ($text =~ /(\w+)@(\w+)/) {
    my array @saved = captures();  # Save immediately
    my str $user = $saved[1];
    my str $domain = $saved[2];
    say("User: " . $user . ", Domain: " . $domain);
}
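The reason for saving captures immediately is that any later match overwrites them. A minimal sketch of that behavior, with illustrative sample strings:

```strada
if ("key=value" =~ /(\w+)=(\w+)/) {
    my array @saved = captures();   # copy before any other match runs

    # This second match replaces the current captures
    if ("2024" =~ /(\d+)/) {
        say($1);   # 2024
    }

    # The saved copy still holds the first match
    say($saved[1]);   # key
    say($saved[2]);   # value
}
```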

Named Captures

Use (?P<name>...) to create named capture groups, then retrieve them with named_captures():

my str $date = "2024-01-15";

if ($date =~ /(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/) {
    my hash %nc = named_captures();
    say("Year: " . $nc{"year"});    # 2024
    say("Month: " . $nc{"month"});  # 01
    say("Day: " . $nc{"day"});      # 15
}

# Parse a log line with named captures
my str $log = "[2024-01-15 10:30:45] ERROR: connection refused";

if ($log =~ /\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<message>.*)/) {
    my hash %nc = named_captures();
    say("Time: " . $nc{"timestamp"});
    say("Level: " . $nc{"level"});
    say("Msg: " . $nc{"message"});
}

Note: Named captures require PCRE2. If Strada was built with POSIX regex fallback, named_captures() returns an empty hash.

Substitution

Use s/// for search and replace:

my str $text = "Hello World";

# Replace first occurrence
$text =~ s/World/Strada/;
say($text);  # Hello Strada

# Replace all occurrences with /g flag
my str $repeated = "cat cat cat";
$repeated =~ s/cat/dog/g;
say($repeated);  # dog dog dog

# Case-insensitive replacement
my str $greeting = "HELLO world";
$greeting =~ s/hello/Hi/i;
say($greeting);  # Hi world

# Backreferences in replacement
my str $swapped = "John Smith";
$swapped =~ s/(\w+) (\w+)/$2, $1/;
say($swapped);  # Smith, John

Case modification: Strada does not support \U, \L, or \E in replacement strings. Use the uc() and lc() functions instead.

The /e Modifier — Evaluate Replacement

The /e flag evaluates the replacement as an expression instead of a literal string:

my str $text = "apples: 3, oranges: 7";

# Double all numbers
$text =~ s/(\d+)/$1 * 2/eg;
say($text);  # "apples: 6, oranges: 14"

# Uppercase matched words
my str $words = "hello world";
$words =~ s/(\w+)/uc(captures()[1])/eg;
say($words);  # "HELLO WORLD"

# /ei - evaluate with case-insensitive match
my str $mixed = "Hello HELLO hello";
$mixed =~ s/(hello)/lc(captures()[1])/eig;
say($mixed);  # "hello hello hello"

Variable Interpolation

Variables are interpolated inside regex patterns when prefixed with $:

my str $search = "world";

# Variable is interpolated into the pattern
if ($text =~ /$search/) {
    say("Found it!");
}

$ as Anchor vs Variable

The $ character has two meanings inside a pattern: it is either the end-of-string anchor or the start of an interpolated variable name:

# $ as end-of-line anchor
if ($text =~ /end$/) {
    say("Ends with 'end'");
}

# $ as variable interpolation
my str $var = "test";
if ($text =~ /$var/) {
    say("Found test");
}

# Both together: variable followed by anchor
my str $suffix = "world";
if ($text =~ /$suffix$/) {
    say("Text ends with world");
}
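Escapes, interpolation, and the anchor can appear in one pattern. The sketch below assumes the parser treats $ext as a variable and the trailing $ as the anchor, by analogy with the /$suffix$/ example; the variable names are illustrative.

```strada
my str $ext = "txt";
my str $filename = "notes.txt";

# \. is a literal dot, $ext is interpolated, and the final $ is the anchor
if ($filename =~ /\.$ext$/) {
    say("Has the ." . $ext . " extension");
}
```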

Escaping Special Characters

To match literal special characters, escape them with a backslash:

# Match literal dot
if ($filename =~ /\.txt$/) {
    say("Text file");
}

# Match literal dollar sign
if ($price =~ /\$\d+/) {
    say("Price found");
}

# Common metacharacters that need escaping:
#   . * + ? [ ] { } ( ) | ^ $ \

Built-in Functions

Strada provides several built-in functions for regex and string operations.

match()

Simple pattern match that returns 1 or 0. Also sets capture variables.

my int $found = match($text, "pattern");

if (match($email, "@")) {
    say("Looks like an email");
}
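Because match() takes its pattern as a string, backslashes must be doubled, unlike in a /.../ literal. A short sketch of the two equivalent forms, using an illustrative version string:

```strada
my str $version = "v2.15";

# Regex literal: single backslashes
if ($version =~ /\d+\.\d+/) {
    say("Matched with =~");
}

# String pattern: each backslash is doubled inside the quotes
if (match($version, "\\d+\\.\\d+")) {
    say("Matched with match()");
}
```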

capture()

Get all capture groups as an array (pass the string and pattern as arguments):

my array @groups = capture($text, "(\\d+)-(\\d+)");
# $groups[0] = full match, $groups[1] = first group, etc.
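As a concrete illustration of the index layout, here is a sketch with a literal input; the sample string is illustrative.

```strada
my array @groups = capture("555-1234", "(\\d+)-(\\d+)");

say($groups[0]);   # 555-1234 (full match)
say($groups[1]);   # 555
say($groups[2]);   # 1234
```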

captures()

Returns capture groups from the last =~ match as an array:

if ($text =~ /(\w+)\s+(\w+)/) {
    my array @all = captures();
    say($all[0]);  # Full match
    say($all[1]);  # First group (same as $1)
    say($all[2]);  # Second group (same as $2)
}

named_captures()

Returns named capture groups from the last =~ match as a hash:

if ($text =~ /(?P<first>\w+)\s+(?P<last>\w+)/) {
    my hash %nc = named_captures();
    say($nc{"first"});  # First name
    say($nc{"last"});   # Last name
}

split()

Split a string by a regex pattern:

# Split on whitespace
my array @words = split("\\s+", $text);

# Split on comma
my array @parts = split(",", $csv);

Important: split() uses regex patterns. Because the pattern is passed as a string, escape literal special characters with a double backslash:

# Split on literal dot
my array @parts = split("\\.", $ip_address);

# Split on literal pipe
my array @items = split("\\|", $data);

join()

Join array elements with a separator (not regex, but commonly paired with split()):

my str $result = join(", ", @items);
say($result);  # "apple, banana, cherry"
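split() and join() are often used as a pair to normalize delimited text. This sketch splits on a comma with optional surrounding whitespace and rejoins with a uniform separator; the input string is illustrative.

```strada
my str $csv = "apple , banana,cherry";

# Split on a comma with optional whitespace on either side
my array @items = split("\\s*,\\s*", $csv);

say(join(", ", @items));   # apple, banana, cherry
```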

replace() and replace_all()

Functional alternatives to the s/// operator:

# replace() — regex replacement (returns new string)
my str $new = replace($text, "world", "Strada");

# replace_all() — plain string replacement (NOT regex)
my str $cleaned = replace_all($text, "foo", "bar");

replace() vs replace_all() vs s///:
  • replace() uses regex and returns a new string (first match only). Supports $1, $2 backreferences in the replacement.
  • replace_all() uses plain string matching (NOT regex) and returns a new string.
  • s/// uses regex and modifies the variable in place. Add /g for global.
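The three forms can be compared on the same input. The sketch below follows the behavior described in the list above and assumes that replace_all() replaces every occurrence, as its name suggests; the sample IP address is illustrative.

```strada
my str $ip = "10.0.0.1";

# replace(): regex, first match only; the dot must be escaped
say(replace($ip, "\\.", "-"));     # 10-0.0.1

# replace_all(): literal text, so no escaping is needed
say(replace_all($ip, ".", "-"));   # 10-0-0-1

# s///g: regex, every occurrence, modifies $ip itself
$ip =~ s/\./-/g;
say($ip);                          # 10-0-0-1
```

Note that an unescaped "." passed to replace() would be treated as the regex wildcard and replace the first character instead.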

Transliteration (tr///, y///)

The tr/// operator (alias y///) performs character-by-character transliteration. Unlike s///, it does not use regular expressions — it maps individual characters from a search list to a replacement list.

Basic Syntax

$str =~ tr/SEARCHLIST/REPLACEMENTLIST/flags;
$str =~ y/SEARCHLIST/REPLACEMENTLIST/flags;  # Alias

Each character in the search list is replaced by the character at the corresponding position in the replacement list. Ranges like a-z are expanded.

Examples

my str $text = "Hello World";

# Lowercase to uppercase
$text =~ tr/a-z/A-Z/;
say($text);  # "HELLO WORLD"

# ROT13 encoding
my str $secret = "Hello World";
$secret =~ tr/A-Za-z/N-ZA-Mn-za-m/;
say($secret);  # "Uryyb Jbeyq"

# Replace vowels with stars
my str $censored = "hello";
$censored =~ tr/aeiou/*****/;
say($censored);  # "h*ll*"

Flags

Flag   Meaning
c      Complement: transliterate characters NOT in the search list
d      Delete: delete characters found in the search list that have no corresponding replacement
s      Squeeze: collapse runs of the same replacement character into a single one
r      Return: return a modified copy and leave the original unchanged

Delete (/d)

# Delete all digits
my str $data = "abc123def456";
$data =~ tr/0-9//d;
say($data);  # "abcdef"

# Delete all non-alphanumeric characters
my str $clean = "hello, world! #2024";
$clean =~ tr/a-zA-Z0-9//cd;
say($clean);  # "helloworld2024"

Squeeze (/s)

# Collapse repeated spaces
my str $spaced = "hello    world    foo";
$spaced =~ tr/ / /s;
say($spaced);  # "hello world foo"

Return (/r)

# Get a copy without modifying original
my str $original = "hello world";
my str $upper = ($original =~ tr/a-z/A-Z/r);
say($original);  # "hello world" (unchanged)
say($upper);     # "HELLO WORLD"

Complement (/c)

# Replace all non-letters with dash
my str $text = "hello 123 world!";
$text =~ tr/a-zA-Z/-/c;
say($text);  # "hello-----world-"

# Combine complement + squeeze: replace non-letters with single dash
my str $slug = "Hello, World! 2024";
$slug =~ tr/a-zA-Z/-/cs;
say($slug);  # "Hello-World-"

tr/// vs s///

tr/// maps characters one-to-one — it does not use regex and is much faster for simple character replacements. Use s/// when you need regex patterns or variable-length replacements.
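The two operators combine naturally. As a sketch, a URL slug can be built with tr/// for the per-character work and s/// for the variable-length trimming; the input string is illustrative.

```strada
my str $slug = " Hello, World! ";

# tr///: turn each run of non-alphanumerics into a single dash
$slug =~ tr/a-zA-Z0-9/-/cs;
say($slug);   # -Hello-World-

# s///: strip leading and trailing dashes (variable length, so regex)
$slug =~ s/^-+//;
$slug =~ s/-+$//;

# tr///: lowercase the result
$slug =~ tr/A-Z/a-z/;
say($slug);   # hello-world
```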

Common Patterns

Email Validation

func is_valid_email(str $email) int {
    return match($email, "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
}

Phone Number

func is_phone(str $phone) int {
    return match($phone, "^\\d{3}[-.]?\\d{3}[-.]?\\d{4}$");
}
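A usage sketch, assuming is_phone() from above is in scope; the phone numbers are made up for illustration.

```strada
# Separators are optional and may be "-" or "."
if (is_phone("555-123-4567")) {
    say("Valid with dashes");
}

if (is_phone("555.123.4567")) {
    say("Valid with dots");
}

if (is_phone("5551234567")) {
    say("Valid without separators");
}

# Only seven digits, so the anchored pattern fails
my int $short = is_phone("555-1234");   # 0
```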

URL Extraction

func extract_urls(str $text) array {
    my array @urls = ();
    while ($text =~ /(https?:\/\/[^\s]+)/) {
        push(@urls, $1);
        $text =~ s/https?:\/\/[^\s]+//;
    }
    return @urls;
}
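A usage sketch, assuming extract_urls() from above is in scope; the URLs are placeholders.

```strada
my str $page = "Docs: https://example.com/docs and http://example.org/faq today";

my array @urls = extract_urls($page);

say(join(", ", @urls));   # https://example.com/docs, http://example.org/faq
```

Each pass of the loop captures the leftmost URL and then deletes it from the function's copy of the text, so the loop ends once no URLs remain.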

Whitespace Cleanup

# Trim leading whitespace
$text =~ s/^\s+//;

# Trim trailing whitespace
$text =~ s/\s+$//;

# Collapse runs of whitespace (spaces, tabs, newlines) to a single space
$text =~ s/\s+/ /g;
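The two trim substitutions can be wrapped in a helper. This is a sketch: trim() is a hypothetical name, and the str return type is assumed by analogy with the int and array return types used in the functions above.

```strada
# Hypothetical helper: strip leading and trailing whitespace
func trim(str $s) str {
    $s =~ s/^\s+//;   # leading whitespace
    $s =~ s/\s+$//;   # trailing whitespace
    return $s;
}

say("[" . trim("  hello  ") . "]");   # [hello]
```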

PCRE2 Feature Summary

Feature                     Syntax                   Status
Lazy quantifiers            *?, +?, ??               Supported
Non-capturing groups        (?:...)                  Supported
Named captures              (?P<name>...)            Supported
Positive lookahead          (?=...)                  Supported
Negative lookahead          (?!...)                  Supported
Positive lookbehind         (?<=...)                 Supported
Negative lookbehind         (?<!...)                 Supported
Word boundaries             \b, \B                   Supported
Backreferences              $1 - $9                  Supported
Character class shortcuts   \d, \w, \s               Supported
Flags                       /i, /m, /s, /x, /g, /e   Supported
Possessive quantifiers      *+, ++                   Not supported
Unicode properties          \p{...}                  Not supported

Performance Tips

Regex Performance
  • Compiled PCRE2 patterns are cached (128-slot cache) for fast repeated matching
  • Use anchors (^, $) when possible to limit the search space
  • Prefer index() for simple substring searches (no regex overhead)
  • Use replace_all() for literal string replacement (faster than regex)
  • Use non-capturing groups (?:...) when you don't need the captured text

Troubleshooting

Pattern not matching?

Captures not working?

Captures are overwritten by the next match, so copy them right after the match succeeds:

if ($text =~ /(\w+)/) {
    my array @saved = captures();  # Save immediately!

    # Don't do another match before using @saved
    say("First word: " . $saved[1]);
}