Regular Expressions Mastery Across Languages

Figure: Architecture Overview

Introduction

Regular expressions (regex) provide powerful pattern matching for text processing. This guide covers regex syntax (character classes, quantifiers, anchors, groups and capturing, lookaheads and lookbehinds) with practical examples for validation, extraction, and replacement in JavaScript, Python, and C#. Java's java.util.regex supports the same core syntax, although backslashes must be doubled inside Java string literals (for example "\\d+").

Prerequisites

Requirement	Details
Basic setup and tooling	A runtime for the language you want to follow: JavaScript (browser or Node.js), Python 3, C# (.NET), or Java

Figure: Code pattern examples for regular expressions mastery across languages: syntax comparison, idiomatic approaches, performance characteristics, and common pitfalls.

Figure: Best practices implementation for regular expressions mastery across languages: error handling, testing strategies, maintainability patterns, and documentation standards.

Figure: Production readiness checklist for regular expressions mastery across languages: logging, monitoring, performance tuning, and security hardening.

Basic Patterns

Literal Characters and Metacharacters

Simple matching:

// JavaScript
const text = "Hello World";

// Literal match
/Hello/.test(text);  // true
/hello/.test(text);  // false (case-sensitive by default)

// Case-insensitive flag
/hello/i.test(text);  // true

// Match any single character (.)
/H.llo/.test("Hello");   // true
/H.llo/.test("Hallo");   // true
/H.llo/.test("H123lo");  // false (. matches one char)

// Escape metacharacters
/contoso\.com/.test("contoso.com");  // true
/contoso\.com/.test("contosoXcom");  // false (\. matches only a literal dot)
/\$19\.99/.test("$19.99");           // true

// Metacharacters requiring escape: . ^ $ * + ? { } [ ] \ | ( )
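When the literal text comes from data, such as user input, escaping it by hand is error-prone. Python's re.escape adds a backslash before every metacharacter for you. The sketch below is minimal, and the helper name contains_literal is hypothetical:

```python
import re

def contains_literal(needle, haystack):
    """Hypothetical helper: search for needle as plain text, not as a pattern."""
    # re.escape turns '.' into '\.', '$' into '\$', and so on
    return re.search(re.escape(needle), haystack) is not None

print(re.escape("contoso.com"))                        # contoso\.com
print(contains_literal("$19.99", "Total: $19.99"))     # True
print(contains_literal("contoso.com", "contosoXcom"))  # False (the dot stays literal)
```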

Character Classes

Predefined classes:

# Python
import re

# \d = digit [0-9]
re.search(r'\d+', 'Order 12345')  # Matches '12345'

# \w = word character [a-zA-Z0-9_]
re.search(r'\w+', 'hello_world')  # Matches 'hello_world'

# \s = whitespace [ \t\n\r\f\v]
re.search(r'\s+', 'hello   world')  # Matches '   '

# Negated classes:
# \D = non-digit [^0-9]
re.search(r'\D+', 'Order 12345')   # Matches 'Order '
# \W = non-word character [^a-zA-Z0-9_]
re.search(r'\W+', 'hello, world')  # Matches ', '
# \S = non-whitespace
re.search(r'\S+', '   hello')      # Matches 'hello'

# Custom character class
re.search(r'[aeiou]', 'hello')  # Matches 'e' (first vowel)
re.search(r'[0-9]', 'abc123')   # Matches '1'
re.search(r'[^0-9]', '123abc')  # Matches 'a' (first non-digit)

# Ranges
re.search(r'[a-z]+', 'Hello')         # Matches 'ello'
re.search(r'[A-Z]+', 'Hello')         # Matches 'H'
re.search(r'[a-zA-Z]+', 'Hello123')   # Matches 'Hello'
re.search(r'[0-9a-fA-F]+', 'FF00AA')  # Matches 'FF00AA' (hex)
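The bracketed ranges in the comments above describe ASCII behavior. In Python 3, `\d`, `\w`, and `\s` are Unicode-aware when the pattern is a str, so they also match characters from other scripts. Passing `re.ASCII` restricts them to the ASCII sets. The sample text in this small sketch is made up:

```python
import re

text = "Order 42, Arabic-Indic digits: ٤٢"

# Default (Unicode) matching: \d also matches other scripts' decimal digits
unicode_digits = re.findall(r'\d+', text)

# re.ASCII limits \d to [0-9] (and \w, \s to their ASCII sets)
ascii_digits = re.findall(r'\d+', text, flags=re.ASCII)

print(unicode_digits)  # ['42', '٤٢']
print(ascii_digits)    # ['42']
```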

C# examples:

using System.Text.RegularExpressions;

// Character class matching
Regex.IsMatch("Hello123", @"[a-z]+");      // true ("ello" matches; IsMatch searches anywhere)
Regex.IsMatch("Hello123", @"[a-zA-Z]+");   // true
Regex.IsMatch("user@contoso.com", @"[\w@.]+");  // true

// Extract digits
var match = Regex.Match("Price: $199.99", @"\d+\.\d+");
Console.WriteLine(match.Value);  // "199.99"

Quantifiers

Repetition Patterns

Basic quantifiers:

// JavaScript
const patterns = {
  '*': 'Zero or more',
  '+': 'One or more',
  '?': 'Zero or one (optional)',
  '{n}': 'Exactly n times',
  '{n,}': 'At least n times',
  '{n,m}': 'Between n and m times'
};

// Examples
/\d+/.test('123');       // true - one or more digits
/\d*/.test('');          // true - zero or more digits
/colou?r/.test('color'); // true - 'u' is optional
/colou?r/.test('colour');// true

// Specific counts
/\d{4}/.test('2025');         // true - exactly 4 digits
/\d{2,4}/.test('99');         // true - 2 to 4 digits
/\d{2,4}/.test('12345');      // true - matches first 4
/\w{3,}/.test('hello');       // true - at least 3 word chars

// Phone number pattern
/\d{3}-\d{3}-\d{4}/.test('555-123-4567');  // true
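`test()` succeeds on any matching substring, which is why `/\d{2,4}/` accepts '12345'. When a length rule is meant to cover the whole input, Python's `re.fullmatch` expresses that directly. A small sketch of the difference:

```python
import re

# re.search finds the first matching substring
partial = re.search(r'\d{2,4}', '12345')
print(partial.group())  # 1234

# re.fullmatch succeeds only if the entire string matches
print(re.fullmatch(r'\d{2,4}', '12345'))            # None (five digits)
print(re.fullmatch(r'\d{2,4}', '123') is not None)  # True
```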

Greedy vs lazy (non-greedy):

# Python
import re

text = "<div>Content</div><div>More</div>"

# Greedy (default) - matches as much as possible
greedy = re.search(r'<div>.*</div>', text)
print(greedy.group())  # '<div>Content</div><div>More</div>'

# Lazy (non-greedy) - matches as little as possible
lazy = re.search(r'<div>.*?</div>', text)
print(lazy.group())  # '<div>Content</div>'

# Password validation (8-20 chars)
pattern = r'^.{8,20}$'
re.match(pattern, 'password123')  # Valid
re.match(pattern, 'short')        # None (too short)
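The greedy/lazy difference is easiest to see with `re.findall` and a capture group. The greedy version returns one item spanning both elements, while the lazy version returns one item per element. This only demonstrates quantifier behavior; real HTML is better handled by a parser.

```python
import re

text = "<div>Content</div><div>More</div>"

greedy_items = re.findall(r'<div>(.*)</div>', text)
lazy_items = re.findall(r'<div>(.*?)</div>', text)

print(greedy_items)  # ['Content</div><div>More']
print(lazy_items)    # ['Content', 'More']
```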

Anchors and Boundaries

Position Matching

Start and end anchors:

// JavaScript
// ^ = start of string
// $ = end of string

/^Hello/.test('Hello World');   // true
/^Hello/.test('Say Hello');     // false

/World$/.test('Hello World');   // true
/World$/.test('World is big');  // false

// Exact match (start + end)
/^Hello World$/.test('Hello World');      // true
/^Hello World$/.test('Hello World!');     // false
/^Hello World$/.test('Say Hello World');  // false

// Validate format exactly
const emailPattern = /^[\w.-]+@[\w.-]+\.\w{2,}$/;
emailPattern.test('user@contoso.com');  // true
emailPattern.test('invalid email');     // false
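Anchors are not identical across engines. In JavaScript without the m flag, `$` matches only at the end of the input. Python's `$` also matches just before a final newline, so a `^...$` validator accepts '123\n'. `re.fullmatch` or the `\Z` anchor closes that gap. A sketch, with an input string made up for illustration:

```python
import re

value = "123\n"  # e.g. a line read from a file, newline included

print(bool(re.match(r'^\d+$', value)))    # True: $ matches before the trailing newline
print(bool(re.match(r'^\d+\Z', value)))   # False: \Z is the absolute end of the string
print(bool(re.fullmatch(r'\d+', value)))  # False
```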

Word boundaries:

# Python
import re

# \b = word boundary (between \w and \W, or between \w and the start/end of the string)
# \B = non-word boundary

text = "The cat in the cathedral"

# Match whole word 'cat'
re.search(r'\bcat\b', text)         # Matches 'cat' (standalone)
re.search(r'\bcat\b', 'cathedral')  # None (part of word)

# Find all whole words
words = re.findall(r'\b\w+\b', "Hello, world! How are you?")
print(words)  # ['Hello', 'world', 'How', 'are', 'you']

# Replace whole word only
result = re.sub(r'\bcat\b', 'dog', text)
print(result)  # "The dog in the cathedral"
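`\B` (non-word boundary) is the inverse of `\b`. This short sketch runs three variants over one made-up string and uses `re.finditer` to report where each match starts. The helper name starts is hypothetical.

```python
import re

text = "cat concatenate cathedral"

def starts(pattern):
    """Hypothetical helper: start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

print(starts(r'\bcat\b'))  # [0]      standalone word only
print(starts(r'\bcat'))    # [0, 16]  'cat' and the start of 'cathedral'
print(starts(r'\Bcat\B'))  # [7]      'cat' inside 'concatenate'
```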

Groups and Capturing

Parentheses for Grouping

Capturing groups:

// JavaScript
// ( ) = capturing group

const text = "John Doe (555-1234)";
const pattern = /(\w+) (\w+) \((\d{3}-\d{4})\)/;
const match = text.match(pattern);

console.log(match[0]);  // "John Doe (555-1234)" - full match
console.log(match[1]);  // "John" - first capture group
console.log(match[2]);  // "Doe" - second capture group
console.log(match[3]);  // "555-1234" - third capture group

// Named capturing groups (ES2018)
const namedPattern = /(?<firstName>\w+) (?<lastName>\w+) \((?<phone>[\d-]+)\)/;
const namedMatch = text.match(namedPattern);

console.log(namedMatch.groups.firstName);  // "John"
console.log(namedMatch.groups.lastName);   // "Doe"
console.log(namedMatch.groups.phone);      // "555-1234"

Python named groups:

# Python
import re

# Named groups with ?P<name>
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, 'Date: 2025-05-15')

print(match.group('year'))   # '2025'
print(match.group('month'))  # '05'
print(match.group('day'))    # '15'

# Access as dictionary
print(match.groupdict())
# {'year': '2025', 'month': '05', 'day': '15'}

# Extract email components
email_pattern = r'(?P<user>[\w.-]+)@(?P<domain>[\w.-]+)\.(?P<tld>\w+)'
email_match = re.search(email_pattern, 'user@contoso.com')

print(email_match.group('user'))    # 'user'
print(email_match.group('domain'))  # 'contoso' (backtracks so the last '.' matches \.)
print(email_match.group('tld'))     # 'com'
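Named groups also work in replacements and backreferences. In Python, `\g<name>` inserts a group in a `re.sub` replacement, and `(?P=name)` repeats a group inside the pattern. A sketch over made-up sample text:

```python
import re

DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
text = 'Released 2025-05-15, patched 2025-06-02.'

# One dict per match
dates = [m.groupdict() for m in DATE.finditer(text)]
print(dates)
# [{'year': '2025', 'month': '05', 'day': '15'}, {'year': '2025', 'month': '06', 'day': '02'}]

# \g<name> refers to a named group in the replacement string
us_style = DATE.sub(r'\g<month>/\g<day>/\g<year>', text)
print(us_style)  # Released 05/15/2025, patched 06/02/2025.

# (?P=name) is a backreference to a named group inside the pattern
doubled = re.search(r'\b(?P<word>\w+) (?P=word)\b', 'this is is a typo')
print(doubled.group())  # is is
```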

C# named groups:

// C#
using System.Text.RegularExpressions;

var pattern = @"(?<area>\d{3})-(?<exchange>\d{3})-(?<number>\d{4})";
var match = Regex.Match("555-123-4567", pattern);

if (match.Success)
{
    Console.WriteLine(match.Groups["area"].Value);      // "555"
    Console.WriteLine(match.Groups["exchange"].Value);  // "123"
    Console.WriteLine(match.Groups["number"].Value);    // "4567"
}

Non-capturing groups:

// (?: ) = non-capturing group (for grouping without capturing)

// Without non-capturing group
const withCapture = /(\d{3})-(\d{3})-(\d{4})/.exec('555-123-4567');
console.log(withCapture);  // ['555-123-4567', '555', '123', '4567']

// With non-capturing group
const withoutCapture = /(?:\d{3})-(\d{3})-(\d{4})/.exec('555-123-4567');
console.log(withoutCapture);  // ['555-123-4567', '123', '4567']

// Useful for alternation
/(https?|ftp):\/\//.test('https://contoso.com');  // true
/(?:https?|ftp):\/\//.test('ftp://files.com');    // true
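Non-capturing groups change results in Python in one place people often trip over. When a pattern contains capturing groups, `re.findall` returns the groups' text instead of the whole match. The sample text in this sketch is made up:

```python
import re

text = 'See https://contoso.com and ftp://files.com'

# With a capturing group, findall returns only the group's text
captured = re.findall(r'(https?|ftp)://\S+', text)

# With a non-capturing group, findall returns the whole match
whole = re.findall(r'(?:https?|ftp)://\S+', text)

print(captured)  # ['https', 'ftp']
print(whole)     # ['https://contoso.com', 'ftp://files.com']
```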

Lookaheads and Lookbehinds

Zero-Width Assertions

Positive lookahead (?=):

// JavaScript
// (?= ) = positive lookahead (match if followed by pattern)

// Password must contain digit
/^(?=.*\d).{8,}$/.test('password123');  // true
/^(?=.*\d).{8,}$/.test('password');     // false

// Password must contain uppercase AND lowercase AND digit
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/.test('Pass1234');  // true
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/.test('password1'); // false

// Extract word before comma
/\w+(?=,)/.exec('apple,banana,orange');  // ['apple']
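The password rule above translates directly to Python. The lookaheads run at position 0 without consuming input, so each checks the whole string, and then `.{8,}` consumes it. This is a minimal sketch; the names STRONG_PASSWORD and is_strong are hypothetical.

```python
import re

# Each (?=...) tests a condition without consuming characters
STRONG_PASSWORD = re.compile(r'(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}')

def is_strong(password):
    # fullmatch anchors both ends, replacing ^ and $
    return STRONG_PASSWORD.fullmatch(password) is not None

print(is_strong('Pass1234'))   # True
print(is_strong('password1'))  # False (no uppercase letter)
print(is_strong('Pa1'))        # False (too short)
```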

Negative lookahead (?!):

# Python
import re

# (?! ) = negative lookahead (match if NOT followed by pattern)

# Find 'q' not followed by 'u'
pattern = r'q(?!u)'
re.findall(pattern, 'Iraq Qatar queue')  # ['q'] (only in Iraq; matching is case-sensitive, so 'Q' is skipped)

# Username: 3-16 letters, digits, or underscores, but cannot start with a digit
username_pattern = r'^(?!\d)[a-zA-Z0-9_]{3,16}$'
re.match(username_pattern, 'user123')   # Valid
re.match(username_pattern, '123user')   # None (starts with digit)

Positive lookbehind (?<=):

# Python
import re

# (?<= ) = positive lookbehind (match if preceded by pattern)

# Find price (digits after $)
pattern = r'(?<=\$)\d+(?:\.\d{2})?'
re.findall(pattern, 'Items: $19.99, $5, $150.00')
# ['19.99', '5', '150.00']

# Extract @mentions (word characters after @)
mentions_pattern = r'(?<=@)\w+'
text = "Hello @alice and @bob_123!"
re.findall(mentions_pattern, text)  # ['alice', 'bob_123']
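Lookbehind support differs by engine. Python's re module requires each lookbehind to have a fixed width, so alternatives of different lengths inside one lookbehind are rejected. .NET and modern JavaScript engines generally accept variable-width lookbehind. One workaround in Python is an alternation of separate fixed-width lookbehinds. The sample text in this sketch is made up:

```python
import re

text = 'Cost: $5 or USD 7, not 9'

# Alternatives of width 1 ('$') and 4 ('USD ') inside one lookbehind are rejected
try:
    re.compile(r'(?<=\$|USD )\d+')
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True

# Workaround: two separate fixed-width lookbehinds, either of which may succeed
amounts = re.findall(r'(?:(?<=\$)|(?<=USD ))\d+', text)

print(variable_width_rejected)  # True
print(amounts)                  # ['5', '7']
```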

Negative lookbehind (?<!):

// C#
using System.Text.RegularExpressions;

// (?<! ) = negative lookbehind (match if NOT preceded by pattern)

// Find numbers not preceded by $. Also excluding a preceding digit stops the
// engine from starting mid-number and matching "00" inside "$100",
// which a plain (?<!\$)\d+ would do.
var pattern = @"(?<![\$\d])\d+";
var matches = Regex.Matches("Price: $100 and 50 items", pattern);
// "100" is skipped because it follows $; "50" is matched

foreach (Match match in matches)
{
    Console.WriteLine(match.Value);  // "50"
}
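The same pitfall exists in Python. A negative lookbehind only inspects the character immediately before the point where a match starts. As a result, `(?<!\$)\d+` can start one digit later and match the tail of '$100'. A sketch comparing the naive and stricter patterns:

```python
import re

text = 'Price: $100 and 50 items'

# Naive: the match can begin at the first '0', which is preceded by '1', not '$'
naive = re.findall(r'(?<!\$)\d+', text)

# Also refusing a preceding digit stops matches from starting mid-number
fixed = re.findall(r'(?<![$\d])\d+', text)

print(naive)  # ['00', '50']
print(fixed)  # ['50']
```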

Practical Examples

Email Validation

Basic email pattern:

// JavaScript
const emailPattern = /^[\w.-]+@[\w.-]+\.\w{2,}$/;

// Valid emails
emailPattern.test('user@contoso.com');      // true
emailPattern.test('john.doe@company.org');  // true
emailPattern.test('test_123@sub.domain.co.uk');  // true

// Invalid emails
emailPattern.test('invalid');               // false
emailPattern.test('@contoso.com');          // false
emailPattern.test('user@');                 // false

// More comprehensive email validation
const strictEmail = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
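The two JavaScript patterns above differ in more than length. In this Python translation (`fullmatch` stands in for `^...$`), the basic pattern accepts consecutive dots in the domain because `[\w.-]+` allows them, while the stricter pattern requires each domain label to start and end with a letter or digit. Neither pattern checks whether an address can actually receive mail. A sketch:

```python
import re

BASIC_EMAIL = re.compile(r"[\w.-]+@[\w.-]+\.\w{2,}")
STRICT_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

for address in ['user@contoso.com', 'user@contoso..com']:
    basic = BASIC_EMAIL.fullmatch(address) is not None
    strict = STRICT_EMAIL.fullmatch(address) is not None
    print(f'{address}: basic={basic}, strict={strict}')
# user@contoso.com: basic=True, strict=True
# user@contoso..com: basic=True, strict=False
```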

Phone Number Formats

Multiple formats:

# Python
import re

def validate_phone(phone):
    """Validate US phone number in various formats."""
    patterns = [
        r'^\d{3}-\d{3}-\d{4}$',           # 555-123-4567
        r'^\(\d{3}\) \d{3}-\d{4}$',       # (555) 123-4567
        r'^\d{10}$',                       # 5551234567
        r'^\+1-\d{3}-\d{3}-\d{4}$',       # +1-555-123-4567
    ]
    return any(re.match(pattern, phone) for pattern in patterns)

# Test
print(validate_phone('555-123-4567'))    # True
print(validate_phone('(555) 123-4567'))  # True
print(validate_phone('5551234567'))      # True
print(validate_phone('invalid'))         # False

# Extract and normalize phone numbers
def extract_phone(text):
    """Extract phone number and normalize to XXX-XXX-XXXX format."""
    pattern = r'(?:\+1[-.]?)?\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})'
    match = re.search(pattern, text)
    if match:
        return f'{match.group(1)}-{match.group(2)}-{match.group(3)}'
    return None

print(extract_phone('Call me at (555) 123-4567'))  # '555-123-4567'
print(extract_phone('Phone: 555.123.4567'))        # '555-123-4567'


URL Parsing

Extract URL components:

```javascript
// JavaScript
const urlPattern = /^(https?):\/\/([^:\/\s]+)(?::(\d+))?(\/[^\s]*)?$/;

const url = 'https://contoso.com:8080/path/to/page?query=value';
const match = url.match(urlPattern);

if (match) {
  console.log('Protocol:', match[1]);  // 'https'
  console.log('Domain:', match[2]);    // 'contoso.com'
  console.log('Port:', match[3]);      // '8080'
  console.log('Path:', match[4]);      // '/path/to/page?query=value'
}

// Extract all URLs from text
const text = "Visit https://contoso.com or http://test.org for more info";
const urls = text.match(/https?:\/\/[^\s]+/g);
console.log(urls);  // ['https://contoso.com', 'http://test.org']
```
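For comparison, here is a Python translation of urlPattern with named groups, next to the standard library's `urllib.parse.urlsplit`. The parser is a different technique, not regex. It also separates the query string and converts the port to an integer, which makes it the safer choice for real URLs.

```python
import re
from urllib.parse import urlsplit

url = 'https://contoso.com:8080/path/to/page?query=value'

# Python translation of the JavaScript urlPattern, with named groups
URL = re.compile(r'(?P<protocol>https?)://(?P<domain>[^:/\s]+)(?::(?P<port>\d+))?(?P<path>/\S*)?')
parts = URL.fullmatch(url).groupdict()
print(parts)
# {'protocol': 'https', 'domain': 'contoso.com', 'port': '8080', 'path': '/path/to/page?query=value'}

# The standard-library parser splits off the query and returns the port as an int
split = urlsplit(url)
print(split.hostname, split.port, split.path, split.query)
# contoso.com 8080 /path/to/page query=value
```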

Data Extraction

Parse log files:

# Python
import re
from datetime import datetime

log_pattern = r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)'

log_lines = [
    '2025-05-05 14:30:00 [INFO] Application started',
    '2025-05-05 14:30:15 [ERROR] Database connection failed',
    '2025-05-05 14:30:20 [WARN] Retrying connection',
]

for line in log_lines:
    match = re.match(log_pattern, line)
    if match:
        timestamp = datetime.strptime(match.group('timestamp'), '%Y-%m-%d %H:%M:%S')
        level = match.group('level')
        message = match.group('message')
        print(f'{level}: {message} at {timestamp}')
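When the log arrives as one string rather than a list of lines, `re.MULTILINE` lets `^` and `$` anchor at every line, so one compiled pattern can scan the whole text. This sketch tallies levels with `collections.Counter`. The sample text reuses the lines above plus one extra, made-up ERROR line.

```python
import re
from collections import Counter

LOG_LINE = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)$',
    re.MULTILINE,  # ^ and $ match at each line boundary
)

log_text = """2025-05-05 14:30:00 [INFO] Application started
2025-05-05 14:30:15 [ERROR] Database connection failed
2025-05-05 14:30:20 [WARN] Retrying connection
2025-05-05 14:30:25 [ERROR] Database connection failed"""

level_counts = Counter(m['level'] for m in LOG_LINE.finditer(log_text))
errors = [m['message'] for m in LOG_LINE.finditer(log_text) if m['level'] == 'ERROR']

print(level_counts.most_common())  # [('ERROR', 2), ('INFO', 1), ('WARN', 1)]
print(errors)                      # ['Database connection failed', 'Database connection failed']
```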


Extract data from HTML:

```csharp
// C#
using System.Text.RegularExpressions;

// Extract all links from HTML
var html = @"
<a href='/home'>Home</a>
<a href='https://contoso.com'>Example</a>
<a href='/contact'>Contact</a>
";

var linkPattern = @"<a\s+href=['""]([^'""]+)['""]>([^<]+)</a>";
var matches = Regex.Matches(html, linkPattern);

foreach (Match match in matches)
{
    var url = match.Groups[1].Value;
    var text = match.Groups[2].Value;
    Console.WriteLine($"{text}: {url}");
}
// Output:
// Home: /home
// Example: https://contoso.com
// Contact: /contact
```
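The C# pattern above expects href to be the first attribute. It also accepts mismatched quotes, such as `'...\"`, and it is case-sensitive. This Python sketch loosens those points. A backreference (`\1`) requires the closing quote to match the opening one, `[^>]*?` allows other attributes before href, and `re.IGNORECASE` accepts uppercase tags. It is still not an HTML parser; for untrusted markup, a parser such as Python's html.parser is more robust. The sample HTML is made up.

```python
import re

html = """
<a href='/home'>Home</a>
<a class="nav" href="https://contoso.com">Example</a>
<A HREF='/contact'>Contact</A>
"""

# (['"]) captures the opening quote; \1 requires the same quote to close the value
LINK = re.compile(r'''<a\s[^>]*?href=(['"])(.*?)\1[^>]*>(.*?)</a>''', re.IGNORECASE | re.DOTALL)

links = [(m.group(3), m.group(2)) for m in LINK.finditer(html)]
for text, url in links:
    print(f'{text}: {url}')
# Home: /home
# Example: https://contoso.com
# Contact: /contact
```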

String Replacement

Find and replace:

// JavaScript
// Simple replacement
'hello world'.replace(/world/, 'JavaScript');  // 'hello JavaScript'

// Global replacement (all occurrences)
'foo bar foo'.replace(/foo/g, 'baz');  // 'baz bar baz'

// Case-insensitive replacement
'Hello WORLD'.replace(/world/gi, 'JavaScript');  // 'Hello JavaScript'

// Replacement with capturing groups
const date = '2025-05-15';
const formatted = date.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');
console.log(formatted);  // '05/15/2025'

// Replacement with function
const text = 'Total: $100, Tax: $8, Shipping: $5';
const doubled = text.replace(/\$(\d+)/g, (match, amount) => {
  return '$' + (parseInt(amount) * 2);
});
console.log(doubled);  // 'Total: $200, Tax: $16, Shipping: $10'

Python substitution:

# Python
import re

# Simple substitution
re.sub(r'apple', 'orange', 'I like apple pie')  # 'I like orange pie'

# Using captured groups ("Name: " is part of the match, so the replacement restores it)
text = 'Name: John Doe, Age: 30'
result = re.sub(r'Name: (\w+) (\w+)', r'Name: \2, \1', text)
print(result)  # 'Name: Doe, John, Age: 30'

# Substitution with function
def uppercase_match(match):
    return match.group().upper()

text = 'hello world from python'
result = re.sub(r'\b\w+\b', uppercase_match, text)
print(result)  # 'HELLO WORLD FROM PYTHON'

# Remove HTML tags
html = '<p>Hello world!</p>'
clean = re.sub(r'<[^>]+>', '', html)
print(clean)  # 'Hello world!'
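JavaScript's `replace()` changes only the first match unless the g flag is set. Python's `re.sub` works the other way: it replaces every match by default. The `count` argument limits the number of replacements, and `re.subn` also reports how many were made. A short sketch:

```python
import re

text = 'foo bar foo'

# count limits the number of replacements (the default, 0, means replace all)
first_only = re.sub(r'foo', 'baz', text, count=1)

# subn returns the new string and the number of substitutions made
replaced, n = re.subn(r'foo', 'baz', text)

print(first_only)   # baz bar foo
print(replaced, n)  # baz bar baz 2
```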


Language-Specific Features

JavaScript Flags

```javascript
// i = case-insensitive
/hello/i.test('HELLO');  // true

// g = global (find all matches)
'foo bar foo'.match(/foo/g);  // ['foo', 'foo']

// m = multiline (^ and $ match line boundaries)
const text = 'Line 1\nLine 2';
text.match(/^Line/gm);  // ['Line', 'Line']

// s = dotAll (. matches newlines)
/hello.world/s.test('hello\nworld');  // true

// u = unicode
/\u{1F600}/u.test('😀');  // true

// y = sticky (matches at exact position)
const pattern = /foo/y;
pattern.lastIndex = 4;
pattern.test('foo foo');  // true (matches at position 4)
```
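Python maps most of these flags to re constants. It has no g flag, because `findall` and `finditer` already return every match, and no sticky flag. The closest analogue to sticky is a compiled pattern's `match(string, pos)`, which tries a match only at `pos`. A sketch of the equivalents:

```python
import re

text = 'Line 1\nLine 2'

# re.IGNORECASE ~ i, re.MULTILINE ~ m, re.DOTALL ~ s
print(bool(re.search(r'hello', 'HELLO', re.IGNORECASE)))          # True
print(re.findall(r'^Line', text, re.MULTILINE))                    # ['Line', 'Line']
print(bool(re.search(r'hello.world', 'hello\nworld', re.DOTALL)))  # True

# Inline flags work too: (?i) at the start of the pattern
print(bool(re.search(r'(?i)hello', 'HELLO')))  # True

# Closest to y (sticky): match() with a pos argument only tries that position
foo = re.compile(r'foo')
print(foo.match('foo foo', 4) is not None)  # True
print(foo.match('foo foo', 3) is not None)  # False (position 3 is a space)
```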

Python re Module

import re

# Compile pattern for reuse
pattern = re.compile(r'\d+')
pattern.findall('123 abc 456')  # ['123', '456']

# Verbose mode (comments and whitespace ignored)
email_pattern = re.compile(r'''
    [\w.-]+    # username
    @          # at symbol
    [\w.-]+    # domain
    \.         # dot
    \w{2,}     # TLD
''', re.VERBOSE)

# Methods
re.search(pattern, string)   # Find first match
re.match(pattern, string)    # Match at start
re.findall(pattern, string)  # Find all matches (list)
re.finditer(pattern, string) # Find all matches (iterator)
re.sub(pattern, repl, string)  # Replace
re.split(pattern, string)    # Split by pattern
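`re.split` is more flexible than `str.split`. It can split on a whole class of delimiters. With a capturing group, it keeps the delimiters in the result. `maxsplit` caps the number of splits. A short sketch:

```python
import re

# Split on any run of commas, semicolons, or whitespace
fields = re.split(r'[,;\s]+', 'a, b;c  d')

# A capturing group keeps the delimiters in the result
tokens = re.split(r'([+-])', '3+4-2')

# maxsplit limits the number of splits
head_tail = re.split(r'\s+', 'one two three', maxsplit=1)

print(fields)     # ['a', 'b', 'c', 'd']
print(tokens)     # ['3', '+', '4', '-', '2']
print(head_tail)  # ['one', 'two three']
```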

C# Regex Options

Figure: Visual Studio C# – CodeLens, refactoring, and build output.

using System.Text.RegularExpressions;

// RegexOptions enumeration
var pattern = @"hello";

// Case-insensitive
Regex.IsMatch("HELLO", pattern, RegexOptions.IgnoreCase);

// Multiline
var text = "Line 1\nLine 2";
Regex.Matches(text, @"^Line", RegexOptions.Multiline);

// Compiled (faster for repeated use)
var compiled = new Regex(@"\d+", RegexOptions.Compiled);

// Timeout (prevent catastrophic backtracking)
var regex = new Regex(@"a+b+c+", RegexOptions.None, TimeSpan.FromSeconds(1));

Best Practices

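Start Simple: Begin with basic patterns and add complexity gradually
Test Thoroughly: Use regex testers (regex101.com, regexr.com) and keep a set of inputs that must and must not match
Use Non-Capturing Groups: (?:...) when you don't need the captured text
Prefer Lazy Quantifiers for Delimited Spans: .*? stops at the first closing delimiter; for real HTML/XML, use a parser instead of regex
Escape Metacharacters: Escape . $ ^ * + ? { } [ ] \ | ( ) when you mean them literally, or let re.escape (Python), Regex.Escape (C#), or Pattern.quote (Java) escape literal text for you
Comment Complex Patterns: Use verbose mode (re.VERBOSE in Python, RegexOptions.IgnorePatternWhitespace in C#) or code comments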
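One way to make "test thoroughly" concrete is a small table of inputs and expected results that runs alongside the code. This sketch reuses the username rule from the negative lookahead section, with `fullmatch` in place of `^...$`. The case list is illustrative, not exhaustive.

```python
import re

# 3-16 letters, digits, or underscores, not starting with a digit
USERNAME = re.compile(r'(?!\d)[a-zA-Z0-9_]{3,16}')

# Expected result for each input, including both length limits
CASES = {
    'user123': True,
    '123user': False,     # starts with a digit
    'ab': False,          # too short
    'abc': True,          # minimum length
    'a' * 16: True,       # maximum length
    'a' * 17: False,      # too long
    'snake_case': True,
    'has space': False,   # space is not allowed
}

failures = [s for s, expected in CASES.items()
            if (USERNAME.fullmatch(s) is not None) != expected]
print(failures or 'all cases pass')  # all cases pass
```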

Validation and Versioning

Last validated: April 2026
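Validate examples against the regex engines and runtime versions you target before production rollout; support for features such as lookbehind, named groups, and Unicode properties differs between engines and versions.
Keep runtime and library versions pinned in automation pipelines and review them quarterly.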

Security and Governance Considerations

Apply least-privilege access using RBAC roles and just-in-time elevation for admin tasks.
Store secrets in managed secret stores and avoid embedding credentials in scripts or source files.
Enable audit logging, data protection policies, and periodic access reviews for regulated workloads.

Cost and Performance Notes

Define budgets and alerts, then monitor usage and cost trends continuously after go-live.
Baseline performance with synthetic and real-user checks before and after major changes.
Scale resources with measured thresholds and revisit sizing after usage pattern changes.

Key Takeaways

Character classes ([a-z], \d, \w) match specific character sets
Quantifiers (*, +, ?, {n,m}) control repetition
Anchors (^, $, \b) match positions, not characters
Groups () capture submatches, (?:) groups without capturing
Lookaheads and lookbehinds ((?=...), (?!...), (?<=...), (?<!...)) enable zero-width assertions
Named groups improve readability and maintenance

Next Steps

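Learn atomic groups (?>...) for performance optimization (supported in .NET, Java, and Python 3.11+, but not in JavaScript)
Explore Unicode properties (\p{L}, \p{N}) for international text (supported in .NET, Java, and JavaScript with the u flag; Python's re module does not support them)
Master conditional patterns (?(condition)yes|no) (supported in .NET and Python, not in JavaScript or Java)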
Study catastrophic backtracking and prevention strategies

Additional Resources

Match patterns, not headaches.
