Regular Expressions
A regular expression (or regex) is a tiny pattern language for describing text. You use it to ask questions like "does this string contain a number?", to find parts of a string, or to replace matching chunks. Ruby has regex built right into the language.
Writing a Regex
The short form uses forward slashes:
$ irb
>> /cat/
=> /cat/
>> /cat/.class
=> Regexp
>> exit
/cat/ is a Regexp literal. You can also build one with
Regexp.new('cat'), but the slash form is what you will see almost
everywhere.
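Regexp.new is most useful when the pattern text is only known while the program runs. A minimal sketch of that idea; build_pattern is a hypothetical helper name used only for this example:

```ruby
# build_pattern is a made-up helper for illustration.
# It turns a plain string into a Regexp at runtime.
def build_pattern(text)
  Regexp.new(text)
end

pattern = build_pattern('cat')
p pattern             # /cat/
puts pattern.class    # Regexp
puts pattern.source   # cat
```

The source method gives you back the pattern text without the slashes.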
Matching a String
The =~ operator returns the zero-based position of the first match, or nil if there is none:
$ irb
>> /cat/ =~ 'my cat sleeps'
=> 3
>> /dog/ =~ 'my cat sleeps'
=> nil
>> exit
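Because =~ gives you a position or nil, you can use it directly as a condition. In Ruby, 0 counts as true, so a match at the very start of the string still passes the test. A small sketch, with describe_match as a made-up method name:

```ruby
# describe_match is a hypothetical helper for this example.
def describe_match(pattern, text)
  position = pattern =~ text
  if position
    "found at index #{position}"
  else
    'not found'
  end
end

puts describe_match(/cat/, 'my cat sleeps')   # found at index 3
puts describe_match(/my/, 'my cat sleeps')    # found at index 0
puts describe_match(/dog/, 'my cat sleeps')   # not found
```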
If you only want a yes or no answer, match? is clearer:
$ irb
>> 'my cat sleeps'.match?(/cat/)
=> true
>> 'my cat sleeps'.match?(/dog/)
=> false
>> exit
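Since match? returns true or false, it fits nicely into methods like select. A short sketch (cat_lines is a hypothetical name); note that the pattern matches anywhere in the string, even inside a longer word:

```ruby
# cat_lines is a made-up helper: keep only the strings that contain "cat".
def cat_lines(lines)
  lines.select { |line| line.match?(/cat/) }
end

p cat_lines(['my cat sleeps', 'the dog barks', 'concatenate'])
# ["my cat sleeps", "concatenate"]
```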
If you want the match itself, use match. It returns a MatchData
object or nil:
$ irb
>> 'my cat sleeps'.match(/cat/)
=> #<MatchData "cat">
>> 'my cat sleeps'.match(/dog/)
=> nil
>> exit
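A MatchData object knows more than the matched text. The sketch below uses pre_match and post_match to get the text on either side of the match; split_around is a hypothetical helper name. Because match can return nil, the method checks for that first:

```ruby
# split_around is a made-up helper: text before, the match itself, text after.
def split_around(text, pattern)
  md = text.match(pattern)
  return nil unless md # match returns nil when nothing matches

  [md.pre_match, md[0], md.post_match]
end

p split_around('my cat sleeps', /cat/)   # ["my ", "cat", " sleeps"]
p split_around('my cat sleeps', /dog/)   # nil
```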
Metacharacters
A regex is only useful when the pattern is more flexible than a plain string. Metacharacters give you that flexibility. The most useful ones for a beginner:
| . | any single character except a newline |
| * | the preceding item, zero or more times |
| + | the preceding item, one or more times |
| ? | the preceding item, zero or one time |
| \d | a digit (0 to 9) |
| \w | a word character (letter, digit, underscore) |
| \s | a whitespace character (space, tab, newline) |
| [abc] | any one character from the set |
| ^ | start of the string or line |
| $ | end of the string or line |
| a|b | alternative (either / or) |
A few examples in irb:
$ irb
>> 'abc123'.match?(/\d/)
=> true
>> 'abc'.match?(/\d/)
=> false
>> 'hello world'.match?(/^hello/)
=> true
>> 'hello world'.match?(/world$/)
=> true
>> 'cat or dog'.match?(/cat|dog/)
=> true
>> exit
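The remaining metacharacters work the same way. The strings and patterns below are made up for illustration and show ., *, +, ?, \w, \s, and a character set:

```ruby
p 'color'.match?(/colou?r/)    # true  (? makes the u optional)
p 'colour'.match?(/colou?r/)   # true
p 'gooood'.match?(/go+d/)      # true  (+ needs at least one o)
p 'gd'.match?(/go+d/)          # false
p 'gd'.match?(/go*d/)          # true  (* also allows zero o's)
p 'cut'.match?(/c.t/)          # true  (. stands for any one character)
p 'grey'.match?(/gr[ae]y/)     # true  ([ae] accepts a or e)
p 'a b'.match?(/\w\s\w/)       # true  (word character, whitespace, word character)
```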
Curly braces set an exact repeat count for the preceding item:
$ irb
>> '555-1234'.match?(/\d{3}-\d{4}/)
=> true
>> 'abc'.match?(/\d{3}/)
=> false
>> exit
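One thing to watch for: \d{3} only asks for three digits in a row somewhere in the string, so a longer number passes too. For single-line input, anchoring with ^ and $ makes the count strict. The {2,4} range form in this sketch is standard regex syntax that is not covered elsewhere on this page:

```ruby
p '12345'.match?(/\d{3}/)       # true  (the first three digits are enough)
p '12345'.match?(/^\d{3}$/)     # false (the anchors leave no room for extra digits)
p '123'.match?(/^\d{3}$/)       # true
p '1234'.match?(/^\d{2,4}$/)    # true  ({2,4} means two to four times)
p '1'.match?(/^\d{2,4}$/)       # false
```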
Capture Groups
Parentheses around part of a pattern create a capture group. The
matched text is available through MatchData:
$ irb
>> md = /(\d{4})-(\d{2})-(\d{2})/.match('Today is 2026-04-19.')
=> #<MatchData "2026-04-19" 1:"2026" 2:"04" 3:"19">
>> md[0]
=> "2026-04-19"
>> md[1]
=> "2026"
>> md[2]
=> "04"
>> md[3]
=> "19"
>> exit
md[0] is the whole match, md[1] is the first group, and so on.
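Numbered groups get hard to read once a pattern has more than two or three of them. Ruby also supports named groups, written (?<name>...), which you can look up by name. A sketch using the same date string; DATE_PATTERN is a name chosen for this example:

```ruby
# DATE_PATTERN is a made-up constant name for this example.
DATE_PATTERN = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

md = DATE_PATTERN.match('Today is 2026-04-19.')
p md[:year]     # "2026"
p md['day']     # "19"
p md.captures   # ["2026", "04", "19"]
```

captures returns all groups as an array, without the whole match at index 0.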
Regex in String Methods
Regex really shines when combined with String methods like gsub,
sub, scan, and split:
$ irb
>> 'Call 030-123456 or 040-789'.scan(/\d+/)
=> ["030", "123456", "040", "789"]
>> 'hello world'.gsub(/l/, 'L')
=> "heLLo worLd"
>> 'one, two,three , four'.split(/\s*,\s*/)
=> ["one", "two", "three", "four"]
>> exit
That last one is a practical little gem: it splits on commas while tolerating any amount of whitespace around them.
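The difference between sub and gsub is easy to miss: sub replaces only the first match, gsub replaces all of them. gsub can also take a block whose return value becomes the replacement. A short sketch with made-up input strings:

```ruby
p 'hello world'.sub(/l/, 'L')    # "heLlo world"  (only the first l)
p 'hello world'.gsub(/l/, 'L')   # "heLLo worLd"  (every l)

# With a block, each match is passed in and the block's result replaces it.
doubled = '2 apples and 10 pears'.gsub(/\d+/) { |n| (n.to_i * 2).to_s }
p doubled                        # "4 apples and 20 pears"
```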
Case-Insensitive Matching
A modifier letter after the closing slash changes how the pattern
behaves. The most useful is i, which makes the match case-insensitive:
$ irb
>> 'Hello World'.match?(/hello/)
=> false
>> 'Hello World'.match?(/hello/i)
=> true
>> exit
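The i modifier combines with everything else in the pattern, including anchors. If you build the regex with Regexp.new, the same option is available as the constant Regexp::IGNORECASE. A sketch with a made-up list of names:

```ruby
names = ['Ruby', 'RUBY', 'ruby', 'Rubin']
p names.select { |name| name.match?(/^ruby$/i) }
# ["Ruby", "RUBY", "ruby"]

# The same option, passed to Regexp.new as a constant.
pattern = Regexp.new('ruby', Regexp::IGNORECASE)
p pattern.match?('I like RUBY')   # true
```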
Escaping Special Characters
If you want to match a literal . or ? or (, escape it with a
backslash:
$ irb
>> 'example.com'.match?(/example\.com/)
=> true
>> 'examplexcom'.match?(/example\.com/)
=> false
>> exit
Without the backslash, . would match any single character (except a
newline), and 'examplexcom' would also match, which is almost never
what you want.
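When the text to match comes from a variable, you cannot add the backslashes by hand. Ruby's Regexp.escape does it for you. A sketch of how that might look; literal_pattern is a hypothetical helper name:

```ruby
p Regexp.escape('example.com')   # "example\\.com"

# literal_pattern is a made-up helper: match the given text exactly as written.
def literal_pattern(text)
  Regexp.new(Regexp.escape(text))
end

p literal_pattern('example.com').match?('example.com')   # true
p literal_pattern('example.com').match?('examplexcom')   # false
p literal_pattern('1+1?').match?('what is 1+1?')         # true
```

The doubled backslash in the first result is just how Ruby displays a single backslash inside a string.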
A Practical Example
A common beginner task: pull all hashtags out of a tweet-like string.
$ irb
>> text = 'Learning #ruby and #rails, loving #programming!'
=> "Learning #ruby and #rails, loving #programming!"
>> text.scan(/#\w+/)
=> ["#ruby", "#rails", "#programming"]
>> exit
Without a regex, you would have to walk through the string character by character to get the same result.
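If you want the tags without the leading #, a capture group does the job. When the pattern contains a group, scan returns one small array per match, so flatten turns the result back into a simple list. A sketch, with hashtags as a made-up method name:

```ruby
# hashtags is a hypothetical helper: return tag names without the leading #.
def hashtags(text)
  text.scan(/#(\w+)/).flatten
end

p hashtags('Learning #ruby and #rails, loving #programming!')
# ["ruby", "rails", "programming"]
```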
Regex is a deep topic. You do not need to memorize everything. Learn the handful of metacharacters on this page, and reach for a cheat sheet (or a site like https://rubular.com) for anything more complex.