Regular Expressions
Pattern matching, substitution, and text processing powered by PCRE2.
Strada uses PCRE2 (Perl Compatible Regular Expressions) for full Perl-style regex support, including lookahead, lookbehind, named captures, lazy quantifiers, and more. If PCRE2 is not available at build time, Strada falls back to POSIX Extended Regular Expressions with reduced functionality.
Pattern Matching
Use the =~ operator to test if a string matches a pattern, and !~ for negated matching:
my str $text = "Hello, World!";
# Match operator
if ($text =~ /World/) {
say("Found World!");
}
# Negated match
if ($text !~ /Goodbye/) {
say("No Goodbye found");
}
The =~ operator returns 1 if the pattern matches, 0 otherwise. The !~ operator returns the opposite.
Pattern Syntax
Anchors
| Pattern | Meaning |
|---|---|
| ^ | Start of string (or line in /m mode) |
| $ | End of string (or line in /m mode) |
| \b | Word boundary |
| \B | Non-word boundary |
# ^ matches start of string
if ("hello world" =~ /^hello/) {
say("Starts with hello");
}
# $ matches end of string
if ("hello world" =~ /world$/) {
say("Ends with world");
}
# \b matches word boundaries
if ("cat catalog" =~ /\bcat\b/) {
say("Found standalone 'cat'");
}
Character Classes
| Pattern | Meaning |
|---|---|
| . | Any character except newline (unless /s flag) |
| \d | Digit (0-9) |
| \D | Non-digit |
| \w | Word character (a-z, A-Z, 0-9, _) |
| \W | Non-word character |
| \s | Whitespace (space, tab, newline) |
| \S | Non-whitespace |
| [abc] | Character class (a, b, or c) |
| [^abc] | Negated class (not a, b, or c) |
| [a-z] | Character range |
if ($text =~ /\d+/) {
say("Contains digits");
}
if ($text =~ /[aeiou]/) {
say("Contains a vowel");
}
if ($text =~ /[A-Z][a-z]+/) {
say("Contains capitalized word");
}
Quantifiers
| Pattern | Meaning |
|---|---|
| * | Zero or more (greedy) |
| + | One or more (greedy) |
| ? | Zero or one (greedy) |
| *? | Zero or more (lazy/non-greedy) |
| +? | One or more (lazy/non-greedy) |
| ?? | Zero or one (lazy/non-greedy) |
| {n} | Exactly n times |
| {n,} | n or more times |
| {n,m} | Between n and m times |
if ($text =~ /a+/) {
say("One or more a's");
}
if ($text =~ /\d{3}-\d{4}/) {
say("Phone number format");
}
# Lazy quantifier: match shortest possible
if ($html =~ /<.*?>/) {
say("Found first HTML tag");
}
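To make the difference between greedy and lazy matching concrete, the sketch below runs both forms against the same input and prints the captured text. It uses only constructs shown elsewhere on this page. The sample string is made up for illustration, and the results in the comments assume the PCRE2 backend.

```strada
my str $html = "<b>bold</b>";

# Greedy: .* runs to the last ">" in the string
if ($html =~ /(<.*>)/) {
    say("Greedy: " . $1);   # <b>bold</b>
}

# Lazy: .*? stops at the first ">"
if ($html =~ /(<.*?>)/) {
    say("Lazy: " . $1);     # <b>
}
```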
Alternation and Grouping
| Pattern | Meaning |
|---|---|
| \| | Alternation (or) |
| (...) | Grouping and capture |
| (?:...) | Non-capturing group |
| (?P<name>...) | Named capture group |
if ($text =~ /cat|dog/) {
say("Found cat or dog");
}
if ($text =~ /(hello|hi) world/) {
say("Greeting found");
}
# Non-capturing group (no capture overhead)
if ($text =~ /(?:https?|ftp):\/\//) {
say("URL protocol found");
}
Lookahead and Lookbehind
Zero-width assertions that match a position without consuming characters:
| Pattern | Meaning |
|---|---|
| (?=...) | Positive lookahead |
| (?!...) | Negative lookahead |
| (?<=...) | Positive lookbehind |
| (?<!...) | Negative lookbehind |
# Positive lookahead: digits followed by "px"
if ("width: 100px" =~ /\d+(?=px)/) {
say("Found pixel value");
}
# Negative lookahead: digits NOT followed by "px"
if ("count: 42" =~ /\d+(?!px)/) {
say("Found non-pixel number");
}
# Positive lookbehind: digits preceded by "$"
if ("price: $99" =~ /(?<=\$)\d+/) {
say("Found price amount");
}
# Negative lookbehind: word NOT preceded by "un"
if ("happy" =~ /(?<!un)happy/) {
say("Not unhappy!");
}
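One subtlety with a negative lookahead placed after a quantifier is that the engine can backtrack to a shorter match that satisfies the assertion. The sketch below shows that /\d+(?!px)/ still matches inside "100px", and one possible way to tighten the pattern. The sample string is illustrative, and the behavior described assumes the PCRE2 backend (lookarounds are not part of POSIX ERE).

```strada
my str $css = "width: 100px";

# \d+ first takes "100", fails the lookahead, then backtracks to "10",
# which is followed by "0" rather than "px", so the match succeeds.
if ($css =~ /(\d+)(?!px)/) {
    say("Loose pattern matched: " . $1);   # 10
}

# Also rejecting a following digit rules out the shorter match.
if ($css !~ /\d+(?!\d|px)/) {
    say("Strict pattern found no non-pixel number");
}
```

The second pattern fails at every starting position in this input, because each candidate run of digits is followed either by another digit or by "px".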
Flags and Modifiers
Flags are specified after the closing delimiter:
| Flag | Meaning |
|---|---|
| i | Case-insensitive matching |
| m | Multi-line mode (^ and $ match line boundaries) |
| s | Single-line/dotall mode (. matches newline) |
| x | Extended mode (whitespace and comments ignored in pattern) |
| g | Global (for substitution: replace all occurrences) |
| e | Evaluate replacement as expression (substitution only) |
# Case-insensitive matching
if ($text =~ /hello/i) {
say("Found hello (any case)");
}
# Multi-line mode
my str $lines = "line1\nline2";
if ($lines =~ /^line2/m) {
say("Found line2 at start of line");
}
# Dotall mode: . matches newlines
if ($data =~ /start.*end/s) {
say("Matched across lines");
}
# Extended mode: whitespace ignored, allows comments
if ($text =~ /
\d{4} # year
- # separator
\d{2} # month
- # separator
\d{2} # day
/x) {
say("Date format matched");
}
Capture Variables ($1 - $9)
After a successful regex match, capture groups are available via $1 through $9:
my str $date = "2024-01-15";
if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
say("Year: " . $1); # 2024
say("Month: " . $2); # 01
say("Day: " . $3); # 15
}
if ("hello world" =~ /(\w+)\s+(\w+)/) {
say($1); # "hello"
say($2); # "world"
}
$1–$9 return undef if the group doesn't exist. They are syntactic sugar for captures()[N].
The captures() Function
Use captures() to get all capture groups as an array. Use this when you need the full match or more than 9 groups:
my str $date = "2024-01-15";
if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
my array @parts = captures();
say("Full match: " . $parts[0]); # 2024-01-15
say("Year: " . $parts[1]); # 2024
say("Month: " . $parts[2]); # 01
say("Day: " . $parts[3]); # 15
}
- captures()[0] is the full match (entire matched string)
- captures()[1] / $1 is the first capture group
- captures()[2] / $2 is the second capture group, and so on
Always check that the pattern matched before accessing captures, and save the result immediately since captures are cleared on the next match:
if ($text =~ /(\w+)@(\w+)/) {
my array @saved = captures(); # Save immediately
my str $user = $saved[1];
my str $domain = $saved[2];
say("User: " . $user . ", Domain: " . $domain);
}
Named Captures
Use (?P<name>...) to create named capture groups, then retrieve them with named_captures():
my str $date = "2024-01-15";
if ($date =~ /(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/) {
my hash %nc = named_captures();
say("Year: " . $nc{"year"}); # 2024
say("Month: " . $nc{"month"}); # 01
say("Day: " . $nc{"day"}); # 15
}
# Parse a log line with named captures
my str $log = "[2024-01-15 10:30:45] ERROR: connection refused";
if ($log =~ /\[(?P<timestamp>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<message>.*)/) {
my hash %nc = named_captures();
say("Time: " . $nc{"timestamp"});
say("Level: " . $nc{"level"});
say("Msg: " . $nc{"message"});
}
If the pattern has no named groups, named_captures() returns an empty hash.
Substitution
Use s/// for search and replace:
my str $text = "Hello World";
# Replace first occurrence
$text =~ s/World/Strada/;
say($text); # Hello Strada
# Replace all occurrences with /g flag
my str $repeated = "cat cat cat";
$repeated =~ s/cat/dog/g;
say($repeated); # dog dog dog
# Case-insensitive replacement
my str $greeting = "HELLO world";
$greeting =~ s/hello/Hi/i;
say($greeting); # Hi world
# Backreferences in replacement
my str $swapped = "John Smith";
$swapped =~ s/(\w+) (\w+)/$2, $1/;
say($swapped); # Smith, John
Strada does not support \U, \L, or \E in replacement strings. Use the uc() and lc() functions instead.
The /e Modifier — Evaluate Replacement
The /e flag evaluates the replacement as an expression instead of a literal string:
my str $text = "apples: 3, oranges: 7";
# Double all numbers
$text =~ s/(\d+)/$1 * 2/eg;
say($text); # "apples: 6, oranges: 14"
# Uppercase matched words
my str $words = "hello world";
$words =~ s/(\w+)/uc(captures()[1])/eg;
say($words); # "HELLO WORLD"
# /eig: evaluate the replacement, match case-insensitively, replace all occurrences
my str $mixed = "Hello HELLO hello";
$mixed =~ s/(hello)/lc(captures()[1])/eig;
say($mixed); # "hello hello hello"
Variable Interpolation
Variables are interpolated inside regex patterns when prefixed with $:
my str $search = "world";
# Variable is interpolated into the pattern
if ($text =~ /$search/) {
say("Found it!");
}
$ as Anchor vs Variable
The $ character has two meanings inside a pattern:
- End-of-string anchor (end of line in /m mode): when at the end of the pattern or not followed by a word character
- Variable interpolation: when followed by a variable name (a word character)
# $ as end-of-string anchor
if ($text =~ /end$/) {
say("Ends with 'end'");
}
# $ as variable interpolation
my str $var = "test";
if ($text =~ /$var/) {
say("Found test");
}
# Both together: variable followed by anchor
my str $suffix = "world";
if ($text =~ /$suffix$/) {
say("Text ends with world");
}
Escaping Special Characters
To match literal special characters, escape them with a backslash:
# Match literal dot
if ($filename =~ /\.txt$/) {
say("Text file");
}
# Match literal dollar sign
if ($price =~ /\$\d+/) {
say("Price found");
}
# Common metacharacters that need escaping:
# . * + ? [ ] { } ( ) | ^ $ \
Built-in Functions
Strada provides several built-in functions for regex and string operations.
match()
Simple pattern match that returns 1 or 0. Also sets capture variables.
my int $found = match($text, "pattern");
if (match($email, "@")) {
say("Looks like an email");
}
capture()
Get all capture groups as an array (pass the string and pattern as arguments):
my array @groups = capture($text, "(\\d+)-(\\d+)");
# $groups[0] = full match, $groups[1] = first group, etc.
captures()
Returns capture groups from the last =~ match as an array:
if ($text =~ /(\w+)\s+(\w+)/) {
my array @all = captures();
say($all[0]); # Full match
say($all[1]); # First group (same as $1)
say($all[2]); # Second group (same as $2)
}
named_captures()
Returns named capture groups from the last =~ match as a hash:
if ($text =~ /(?P<first>\w+)\s+(?P<last>\w+)/) {
my hash %nc = named_captures();
say($nc{"first"}); # First name
say($nc{"last"}); # Last name
}
split()
Split a string by a regex pattern:
# Split on whitespace
my array @words = split("\\s+", $text);
# Split on comma
my array @parts = split(",", $csv);
split() uses regex patterns. To split on literal special characters, escape them with a double backslash:
# Split on literal dot
my array @parts = split("\\.", $ip_address);
# Split on literal pipe
my array @items = split("\\|", $data);
join()
Join array elements with a separator (not regex, but commonly paired with split()):
my str $result = join(", ", @items);
say($result); # "apple, banana, cherry"
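As a small illustration of pairing the two functions, the sketch below splits a loosely formatted list on a regex and rejoins it with a different separator. The input string and variable names are made up for this example. The pattern requires a comma, so it never matches an empty string.

```strada
my str $csv = "apple , banana,cherry";

# Split on a comma with optional surrounding whitespace
my array @fields = split("\\s*,\\s*", $csv);

# Rejoin with a different separator
say(join(" | ", @fields));   # apple | banana | cherry
```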
replace() and replace_all()
Functional alternatives to the s/// operator:
# replace() — regex replacement (returns new string)
my str $new = replace($text, "world", "Strada");
# replace_all() — plain string replacement (NOT regex)
my str $cleaned = replace_all($text, "foo", "bar");
- replace() uses regex and returns a new string (first match only). Supports $1, $2 backreferences in the replacement.
- replace_all() uses plain string matching (NOT regex) and returns a new string.
- s/// uses regex and modifies the variable in place. Add /g for global.
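The practical difference shows up when the text to replace contains a regex metacharacter. The sketch below is based only on the descriptions above, and the version string is invented for illustration. It replaces dots three ways: the dot is escaped wherever the argument is a regex and left bare where it is a plain string.

```strada
my str $version = "1.2.3";

# replace(): the pattern is a regex, so the dot is escaped to make it literal.
# Only the first match is replaced.
my str $first = replace($version, "\\.", "-");
say($first);      # 1-2.3

# replace_all(): plain string matching, so "." needs no escaping.
my str $all = replace_all($version, ".", "-");
say($all);

# s///g: regex, modifies $version in place, replaces every match.
$version =~ s/\./-/g;
say($version);    # 1-2-3
```

Passing an unescaped "." to replace() would instead match any character, which is a common source of surprising results.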
Transliteration (tr///, y///)
The tr/// operator (alias y///) performs character-by-character transliteration. Unlike s///, it does not use regular expressions — it maps individual characters from a search list to a replacement list.
Basic Syntax
$str =~ tr/SEARCHLIST/REPLACEMENTLIST/flags;
$str =~ y/SEARCHLIST/REPLACEMENTLIST/flags; # Alias
Each character in the search list is replaced by the character at the corresponding position in the replacement list. Ranges like a-z are expanded.
Examples
my str $text = "Hello World";
# Lowercase to uppercase
$text =~ tr/a-z/A-Z/;
say($text); # "HELLO WORLD"
# ROT13 encoding
my str $secret = "Hello World";
$secret =~ tr/A-Za-z/N-ZA-Mn-za-m/;
say($secret); # "Uryyb Jbeyq"
# Replace vowels with stars
my str $censored = "hello";
$censored =~ tr/aeiou/*****/;
say($censored); # "h*ll*"
Flags
| Flag | Meaning |
|---|---|
| c | Complement: transliterate characters NOT in the search list |
| d | Delete: delete characters found in search list that have no corresponding replacement |
| s | Squeeze: collapse duplicate replacement characters into a single one |
| r | Return: return a modified copy, do not change the original |
Delete (/d)
# Delete all digits
my str $data = "abc123def456";
$data =~ tr/0-9//d;
say($data); # "abcdef"
# Delete all non-alphanumeric characters
my str $clean = "hello, world! #2024";
$clean =~ tr/a-zA-Z0-9//cd;
say($clean); # "helloworld2024"
Squeeze (/s)
# Collapse repeated spaces
my str $spaced = "hello   world    foo";
$spaced =~ tr/ / /s;
say($spaced); # "hello world foo"
Return (/r)
# Get a copy without modifying original
my str $original = "hello world";
my str $upper = ($original =~ tr/a-z/A-Z/r);
say($original); # "hello world" (unchanged)
say($upper); # "HELLO WORLD"
Complement (/c)
# Replace all non-letters with dash
my str $text = "hello 123 world!";
$text =~ tr/a-zA-Z/-/c;
say($text); # "hello-----world-"
# Combine complement + squeeze: replace non-letters with single dash
my str $slug = "Hello, World! 2024";
$slug =~ tr/a-zA-Z/-/cs;
say($slug); # "Hello-World-"
tr/// maps characters one-to-one — it does not use regex and is much faster for simple character replacements. Use s/// when you need regex patterns or variable-length replacements.
Common Patterns
Email Validation
func is_valid_email(str $email) int {
return match($email, "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
}
Phone Number
func is_phone(str $phone) int {
return match($phone, "^\\d{3}[-.]?\\d{3}[-.]?\\d{4}$");
}
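Validation and extraction often go together. As a sketch, the hypothetical helper below (format_phone is not a built-in, and the str return type is assumed by analogy with the int and array return types used on this page) reuses the same pattern with capture groups to normalize a number. It returns an empty string when the input does not match.

```strada
func format_phone(str $phone) str {
    # Same shape as is_phone(), with each digit group captured
    if ($phone =~ /^(\d{3})[-.]?(\d{3})[-.]?(\d{4})$/) {
        return "(" . $1 . ") " . $2 . "-" . $3;
    }
    return "";
}

say(format_phone("555.123.4567"));   # (555) 123-4567
```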
URL Extraction
func extract_urls(str $text) array {
my array @urls = ();
while ($text =~ /(https?:\/\/[^\s]+)/) {
push(@urls, $1);
$text =~ s/https?:\/\/[^\s]+//;
}
return @urls;
}
Whitespace Cleanup
# Trim leading whitespace
$text =~ s/^\s+//;
# Trim trailing whitespace
$text =~ s/\s+$//;
# Collapse runs of whitespace (spaces, tabs, newlines) into a single space
$text =~ s/\s+/ /g;
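The three substitutions can be combined into one helper. This is a minimal sketch: normalize_whitespace is a hypothetical name, and the str return type is assumed by analogy with the other function signatures on this page.

```strada
func normalize_whitespace(str $text) str {
    $text =~ s/^\s+//;      # trim leading whitespace
    $text =~ s/\s+$//;      # trim trailing whitespace
    $text =~ s/\s+/ /g;     # collapse internal runs to one space
    return $text;
}

say("[" . normalize_whitespace("  hello   world \n") . "]");   # [hello world]
```

Trimming first means the final substitution only has to deal with whitespace between words.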
PCRE2 Feature Summary
| Feature | Syntax | Status |
|---|---|---|
| Lazy quantifiers | *?, +?, ?? | Supported |
| Non-capturing groups | (?:...) | Supported |
| Named captures | (?P<name>...) | Supported |
| Positive lookahead | (?=...) | Supported |
| Negative lookahead | (?!...) | Supported |
| Positive lookbehind | (?<=...) | Supported |
| Negative lookbehind | (?<!...) | Supported |
| Word boundaries | \b, \B | Supported |
| Backreferences | $1–$9 | Supported |
| Character class shortcuts | \d, \w, \s | Supported |
| Flags | /i, /m, /s, /x, /g, /e | Supported |
| Possessive quantifiers | *+, ++ | Not supported |
| Unicode properties | \p{...} | Not supported |
Performance Tips
- Compiled PCRE2 patterns are cached (128-slot cache) for fast repeated matching
- Use anchors (^, $) when possible to limit the search space
- Prefer index() for simple substring searches (no regex overhead)
- Use replace_all() for literal string replacement (faster than regex)
- Use non-capturing groups (?:...) when you don't need the captured text
Troubleshooting
Pattern not matching?
- Check anchor usage: ^ and $ match start/end of the entire string by default
- Verify escaping: special characters need a backslash
- Use the /i flag for case-insensitive matching
- Test pattern components separately
Captures not working?
- Ensure the pattern actually matched before calling captures() or using $1–$9
- Remember captures()[0] is the full match; groups start at index 1
- Captures are cleared on each new match, so save them immediately
- $1–$9 return undef if the group doesn't exist
if ($text =~ /(\w+)/) {
my array @saved = captures(); # Save immediately!
# Don't do another match before using @saved
say("First word: " . $saved[1]);
}