Regular Expressions

A regular expression (or regex) is a tiny pattern language for describing text. You use it to ask questions like "does this string contain a number?", to find parts of a string, or to replace matching chunks. Ruby has regex built right into the language.

Writing a Regex

The short form uses forward slashes:

$ irb
>> /cat/
=> /cat/
>> /cat/.class
=> Regexp
>> exit

/cat/ is a Regexp literal. You can also build one with Regexp.new('cat'), but the slash form is what you will see almost everywhere.
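Regexp.new is mostly useful when part of the pattern is only known at run time. A minimal sketch of both forms, where the variable name animal is made up for illustration:

```ruby
# `animal` is a hypothetical value that might come from elsewhere in a program.
animal = 'cat'

# Build the pattern from a string at run time.
from_string = Regexp.new(animal)

# The slash form supports interpolation as well.
from_literal = /#{animal}/

puts from_string.source                    # cat
puts from_string.match?('my cat sleeps')   # true
puts from_literal.match?('my cat sleeps')  # true
```

Both variables hold ordinary Regexp objects, so everything that follows on this page works the same way with either form.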

Matching a String

The =~ operator returns the index at which the first match starts (counting from 0), or nil if there is no match:

$ irb
>> /cat/ =~ 'my cat sleeps'
=> 3
>> /dog/ =~ 'my cat sleeps'
=> nil
>> exit

If you only want a yes or no answer, match? is clearer:

$ irb
>> 'my cat sleeps'.match?(/cat/)
=> true
>> 'my cat sleeps'.match?(/dog/)
=> false
>> exit

If you want the match itself, use match. It returns a MatchData object or nil:

$ irb
>> 'my cat sleeps'.match(/cat/)
=> #<MatchData "cat">
>> 'my cat sleeps'.match(/dog/)
=> nil
>> exit
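In a script, these three calls usually sit inside a conditional. The following is only a sketch of how that might look; the variable names line, index, and md are illustrative:

```ruby
line = 'my cat sleeps'

# =~ returns an index (truthy, even when it is 0) or nil.
index = line =~ /cat/
puts "found at index #{index}" if index

# match? answers yes or no without building a MatchData object.
puts 'no dog here' unless line.match?(/dog/)

# match returns MatchData or nil, so assign and test in one step.
if (md = line.match(/cat/))
  puts "matched #{md[0].inspect} starting at #{md.begin(0)}"
end
```

Because nil is falsy in Ruby, all three results can be used directly as conditions.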

Metacharacters

A regex is only useful when the pattern is more flexible than a plain string. Metacharacters give you that flexibility. The most useful ones for a beginner:

.

any single character except a newline

*

the preceding item, zero or more times

+

the preceding item, one or more times

?

the preceding item, zero or one time

\d

a digit (0 to 9)

\w

a word character (letter, digit, underscore)

\s

a whitespace character (space, tab, newline)

[abc]

any one character from the set

^

start of a line (which includes the start of the string)

$

end of a line (which includes the end of the string)

|

alternative (either / or)

A few examples in irb:

$ irb
>> 'abc123'.match?(/\d/)
=> true
>> 'abc'.match?(/\d/)
=> false
>> 'hello world'.match?(/^hello/)
=> true
>> 'hello world'.match?(/world$/)
=> true
>> 'cat or dog'.match?(/cat|dog/)
=> true
>> exit
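The session above skips a few entries from the list. As a rough sketch, here they are in script form; the sample strings are made up for illustration:

```ruby
puts 'color'.match?(/colou?r/)    # true:  ? makes the "u" optional
puts 'colour'.match?(/colou?r/)   # true
puts 'caat'.match?(/ca+t/)        # true:  + needs at least one "a"
puts 'ct'.match?(/ca+t/)          # false
puts 'ct'.match?(/ca*t/)          # true:  * also accepts zero "a"s
puts 'cut'.match?(/c.t/)          # true:  . stands for any one character
puts 'bag'.match?(/b[aeiou]g/)    # true:  one character from the set
puts 'bxg'.match?(/b[aeiou]g/)    # false
puts 'a b'.match?(/\w\s\w/)       # true:  word char, whitespace, word char
```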

Curly braces set an exact count:

$ irb
>> '555-1234'.match?(/\d{3}-\d{4}/)
=> true
>> 'abc'.match?(/\d{3}/)
=> false
>> exit

Agentic Coding Tip: ^ / $ Anchor to Line, Not String

This is one of the most commonly missed points about Ruby regex, and AI agents often get it wrong on the first try. In Ruby, ^ matches the start of any line, not just the start of the string, and $ matches the end of any line. This matters the moment the input can contain a newline.

$ irb
>> "admin\nhacker".match?(/^admin$/)
=> true
>> exit

That string is not admin. It’s admin, then a newline, then hacker. If you use /^admin$/ to decide whether to grant privileges, an attacker who can inject a newline into the input has just bypassed your check.

The anchors you almost always want are \A (the very start of the string) and \z (the very end of the string). Neither one treats a newline as a boundary:

$ irb
>> "admin\nhacker".match?(/\Aadmin\z/)
=> false
>> "admin".match?(/\Aadmin\z/)
=> true
>> exit

Claude will happily write /^admin$/ for a validation because that’s what the regex tutorials in the agent’s training corpus showed. The resulting code passes every test a beginner writes, yet it is exploitable. This same trap is why Rails' validates :email, format: { with: /\A[^@\s]+@[^@\s]+\z/ } uses \A / \z rather than ^ / $. Rails' format validator goes a step further and raises an ArgumentError when the pattern contains ^ or $, unless you pass multiline: true.

Rule to add to your project’s CLAUDE.md:

When a Ruby regex is used for validation or authorization
(as opposed to finding a pattern inside larger text), anchor
with `\A` / `\z`, never `^` / `$`. `\A` / `\z` mean "start
and end of the whole string." `^` / `$` mean "start and end
of any line," which newlines in the input can bypass. Reserve
`^` / `$` for text processing where line boundaries are the
explicit intent.
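To see the rule at work, here is a small sketch that puts both anchor styles side by side. The method names admin_name? and admin_name_unsafe? are hypothetical and exist only for this comparison:

```ruby
# Hypothetical helper: accepts only the exact string "admin".
def admin_name?(input)
  input.match?(/\Aadmin\z/)
end

# The same check with line anchors, shown only to expose the bug.
def admin_name_unsafe?(input)
  input.match?(/^admin$/)
end

puts admin_name?("admin")                 # true
puts admin_name?("admin\nhacker")         # false
puts admin_name_unsafe?("admin\nhacker")  # true: the newline slips through
puts admin_name?("admin\n")               # false: \z allows no trailing newline
```

A test suite that only feeds single-line strings to admin_name_unsafe? would never notice the difference, which is why the rule is worth writing down.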

Capture Groups

Parentheses around part of a pattern create a capture group. The matched text is available through MatchData:

$ irb
>> md = /(\d{4})-(\d{2})-(\d{2})/.match('Today is 2026-04-19.')
=> #<MatchData "2026-04-19" 1:"2026" 2:"04" 3:"19">
>> md[0]
=> "2026-04-19"
>> md[1]
=> "2026"
>> md[2]
=> "04"
>> md[3]
=> "19"
>> exit

md[0] is the whole match, md[1] is the first group, and so on.
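In a script you will usually want the groups in named variables, and you need to handle the case where nothing matched. One possible sketch, using MatchData#captures and variable names chosen for illustration:

```ruby
text = 'Today is 2026-04-19.'

if (md = text.match(/(\d{4})-(\d{2})-(\d{2})/))
  # captures returns only the groups, without the whole match at index 0.
  year, month, day = md.captures
  puts "year=#{year} month=#{month} day=#{day}"
else
  puts 'no date found'
end
```

The groups come back as strings, so call to_i on them if you need numbers.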

Regex in String Methods

Regex really shines when combined with String methods like gsub, sub, scan, and split:

$ irb
>> 'Call 030-123456 or 040-789'.scan(/\d+/)
=> ["030", "123456", "040", "789"]
>> 'hello world'.gsub(/l/, 'L')
=> "heLLo worLd"
>> 'one, two,three , four'.split(/\s*,\s*/)
=> ["one", "two", "three", "four"]
>> exit

That last one is a practical little gem: it splits on commas while tolerating any amount of whitespace around them.
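The session above does not show sub. As a quick sketch of how it differs from gsub (the variable name greeting is illustrative):

```ruby
greeting = 'hello world'

puts greeting.sub(/l/, 'L')    # heLlo world  (first match only)
puts greeting.gsub(/l/, 'L')   # heLLo worLd  (every match)

# Both return a new string; the original is unchanged.
puts greeting                  # hello world
```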

Case-Insensitive Matching

A modifier letter after the closing slash changes how the pattern behaves. The most useful is i for case-insensitive:

$ irb
>> 'Hello World'.match?(/hello/)
=> false
>> 'Hello World'.match?(/hello/i)
=> true
>> exit

Escaping Special Characters

If you want to match a literal . or ? or (, escape it with a backslash:

$ irb
>> 'example.com'.match?(/example\.com/)
=> true
>> 'examplexcom'.match?(/example\.com/)
=> false
>> exit

Without the backslash, . would match any character, so 'examplexcom' would match too, which is almost never what you want.
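When the text to match comes from a variable, typing the backslashes by hand is not an option. Ruby's Regexp.escape adds them for you. A small sketch, assuming a variable called domain that stands in for outside input:

```ruby
# `domain` stands in for text that comes from outside the program.
domain = 'example.com'

# Regexp.escape returns a string with the special characters backslashed.
escaped = Regexp.escape(domain)
puts escaped                         # example\.com

# Interpolate the escaped text into a pattern.
pattern = /\A#{escaped}\z/
puts pattern.match?('example.com')   # true
puts pattern.match?('examplexcom')   # false
```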

A Practical Example

A common beginner task: pull all hashtags out of a tweet-like string.

$ irb
>> text = 'Learning #ruby and #rails, loving #programming!'
=> "Learning #ruby and #rails, loving #programming!"
>> text.scan(/#\w+/)
=> ["#ruby", "#rails", "#programming"]
>> exit

Without a regex, the same job would need a hand-written loop over the characters of the string.
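If you want the tags without the leading #, a capture group can do the trimming. This is just one way to sketch it; the method name hashtags is made up for the example:

```ruby
def hashtags(text)
  # With a capture group, scan returns one array per match,
  # so flatten turns [["ruby"], ["rails"]] into ["ruby", "rails"].
  text.scan(/#(\w+)/).flatten
end

p hashtags('Learning #ruby and #rails, loving #programming!')
# => ["ruby", "rails", "programming"]
p hashtags('no tags here')
# => []
```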

Regex is a deep topic. You do not need to memorize everything. Learn the handful of metacharacters on this page, and reach for a cheat sheet (or a site like https://rubular.com) for anything more complex.