Python Regex Cheat Sheet

Basic Pattern Syntax

Pattern Meaning:

.   Any character except newline
^   Start of string
$   End of string
*   0 or more repetitions
+   1 or more repetitions
?   0 or 1 repetition
{n} Exactly n repetitions
{n,}    n or more
{n,m}   Between n and m
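
A minimal sketch of the quantifiers and anchors above in action. The sample strings are made up for illustration.

```python
import re

text = "a1 b22 c333"

# {2} consumes exactly two digits per match; leftover single digits are skipped
exact_two = re.findall(r"[0-9]{2}", text)      # ['22', '33']

# {2,} takes the whole run once it has at least two digits
two_or_more = re.findall(r"[0-9]{2,}", text)   # ['22', '333']

# ? makes the preceding character optional
optional_s = re.findall(r"cats?", "cat cats")  # ['cat', 'cats']

# ^ and $ pin the pattern to both ends; . matches any character except a newline
anchored = re.search(r"^a.c$", "abc") is not None   # True
dot_vs_newline = re.search(r"a.c", "a\nc") is None  # True
```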

Character Classes

Pattern Meaning:

[abc]   a, b, or c
[^abc]  NOT a, b, or c
[a-z]   Lowercase letters
[A-Z]   Uppercase letters
[0-9]   Digits
[a-zA-Z0-9_]    Alphanumeric + underscore
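
A short illustration of the character classes above, using made-up input strings. Note that a hyphen placed last inside the brackets is treated as a literal hyphen rather than a range.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")                     # ['e', 'e']

# A negated class is handy for stripping everything except what you want
digits_only = re.sub(r"[^0-9]", "", "tel: 555-0199")         # '5550199'

identifiers = re.findall(r"[a-zA-Z0-9_]+", "foo_1 = bar-2")  # ['foo_1', 'bar', '2']

# A trailing - inside [...] is a literal hyphen
hyphenated = re.findall(r"[a-z-]+", "well-known fact")       # ['well-known', 'fact']
```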

Special Sequences

Pattern Meaning:

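\d  Digit (0–9; in str patterns also other Unicode digits unless re.ASCII is set)
\D  Non-digit
\w  Word char (a-z, A-Z, 0-9, _; in str patterns also Unicode letters and digits unless re.ASCII is set)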
\W  Non-word
\s  Whitespace
\S  Non-whitespace
\b  Word boundary
\B  Not a word boundary
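
The difference between \b and \B is easiest to see side by side. This is a small sketch with an invented sample string.

```python
import re

text = "cat concat cats"

# \b on both sides: only the standalone word
whole_word = re.findall(r"\bcat\b", text)   # ['cat']

# \b at the start only: words that begin with "cat"
word_start = re.findall(r"\bcat\w*", text)  # ['cat', 'cats']

# \B at the start: "cat" must be preceded by another word character
word_end = re.findall(r"\Bcat\b", text)     # ['cat'] (the tail of "concat")

numbers = re.findall(r"\d+", "10 apples, 25 pears")  # ['10', '25']
collapsed = re.sub(r"\s+", " ", "a \t b\n c")        # 'a b c'
```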

Quantifiers (Greedy vs Lazy)

Pattern Meaning:

*   Greedy (max match)
*?  Lazy (min match)
+?  Lazy version of +
{n,m}?  Lazy range

Example:

re.findall(r"<.*?>", "<a> <b>")
# ['<a>', '<b>']
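
To make the contrast concrete, the sketch below runs a greedy and a lazy version of the same pattern on one illustrative string.

```python
import re

html = "<b>bold</b>"

# Greedy: .* runs to the last > it can reach
greedy = re.findall(r"<.*>", html)    # ['<b>bold</b>']

# Lazy: .*? stops at the first > that lets the match succeed
lazy = re.findall(r"<.*?>", html)     # ['<b>', '</b>']

# The same applies to ranges
greedy_range = re.search(r"\d{2,4}", "123456").group()   # '1234'
lazy_range = re.search(r"\d{2,4}?", "123456").group()    # '12'
```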

Groups & Capturing

Pattern Meaning:

(abc)   Capturing group
(?:abc) Non-capturing group
(?P<name>abc)   Named group
\1  Backreference
(?P=name)   Named backreference

Example:

m = re.search(r"(?P<word>\w+)", "hello")
m.group("word")  # 'hello'
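
A few more group features, sketched with made-up inputs: numbered and named backreferences, and how capturing vs non-capturing groups change what re.findall() returns. The group name `quote` is chosen only for this example.

```python
import re

# Numbered backreference: \1 must repeat whatever group 1 captured
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
repeated_word = doubled.group(1)        # 'is'

# Named backreference: the closing quote must equal the opening one
quoted = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi' now")
quoted_text = quoted.group(0)           # "'hi'"
quote_char = quoted.group("quote")      # "'"

# With findall, a capturing group changes the result; (?:...) does not
capturing = re.findall(r"(ab)+", "ababab ab")        # ['ab', 'ab']
non_capturing = re.findall(r"(?:ab)+", "ababab ab")  # ['ababab', 'ab']
```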

Alternation

Pattern Meaning:

a|b Either a or b

Example:

re.findall(r"cat|dog", "cat dog bird")
# ['cat', 'dog']
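
Two details about | are worth a quick sketch: alternatives are tried left to right, and | has the lowest precedence, so anchors bind to one side unless you group. The inputs are illustrative.

```python
import re

# The first alternative that matches wins, even if a later one is longer
first_wins = re.findall(r"a|ab", "ab")     # ['a']
longer_first = re.findall(r"ab|a", "ab")   # ['ab']

# Group to limit the scope of |
grouped = re.findall(r"gr(?:a|e)y", "gray grey")  # ['gray', 'grey']

# Ungrouped, ^cat|dog$ means "starts with cat" OR "ends with dog"
loose = re.search(r"^cat|dog$", "catalog") is not None        # True
strict = re.search(r"^(?:cat|dog)$", "catalog") is not None   # False
```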

Lookarounds

Pattern Meaning:

(?=...) Positive lookahead
(?!...) Negative lookahead
(?<=...)    Positive lookbehind
(?<!...)    Negative lookbehind

Example:
re.findall(r"\w+(?=!)", "Hi! Hello!")
# ['Hi', 'Hello']
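
The other three lookarounds work the same way: they check context without consuming it. A brief sketch on invented sample text:

```python
import re

text = "cost $40, qty 3, fee $5"

# Positive lookbehind: digits that directly follow a $
after_dollar = re.findall(r"(?<=\$)\d+", text)         # ['40', '5']

# Negative lookbehind: whole numbers that do not follow a $
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)   # ['3']

# Negative lookahead: whole words not followed by !
not_before_bang = re.findall(r"\b\w+\b(?!!)", "Hi! Hello there!")  # ['Hello']
```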



Flags



Flag Meaning:

re.IGNORECASE / re.I    Case-insensitive
re.MULTILINE / re.M     ^ and $ per line
re.DOTALL / re.S        . matches newline
re.VERBOSE / re.X       Allow comments



Example:
re.search(r"hello", "HELLO", re.I)
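
A sketch of the remaining flags, including how to combine them with |. The text and the `date` pattern are invented for illustration.

```python
import re

text = "first line\nSecond line"

# Without re.M, ^ only matches at the very start of the string
no_flag = re.findall(r"^\w+", text)          # ['first']
per_line = re.findall(r"^\w+", text, re.M)   # ['first', 'Second']

# re.S lets . cross the newline
without_s = re.search(r"first.*Second", text) is None          # True
with_s = re.search(r"first.*Second", text, re.S) is not None   # True

# Combine flags with |
combined = re.findall(r"^second", text, re.I | re.M)  # ['Second']

# re.X ignores whitespace in the pattern and allows # comments
date = re.compile(r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
""", re.X)
year = date.search("released 2024-05").group("year")  # '2024'
```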



Common re Functions
re.search()
Find first match anywhere:
re.search(r"\d+", "abc123")


re.match()
Match at start only:
re.match(r"\d+", "123abc")
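
How the start-anchored and anywhere variants differ, plus re.fullmatch() for whole-string checks. All three return None on failure, so test the result before calling .group().

```python
import re

at_start = re.match(r"\d+", "123abc")       # match object for '123'
not_at_start = re.match(r"\d+", "abc123")   # None: digits are not at the start
anywhere = re.search(r"\d+", "abc123")      # match object for '123'
whole = re.fullmatch(r"\d+", "123abc")      # None: the whole string must match

# Guard against None before using the match
found = anywhere.group() if anywhere else None    # '123'
position = anywhere.span() if anywhere else None  # (3, 6)
```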


re.findall()
Return all matches:
re.findall(r"\d+", "a1b2c3")
# ['1', '2', '3']
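
One thing that often surprises people: when the pattern contains capturing groups, re.findall() returns the groups rather than the full match. A small sketch with a made-up key=value string:

```python
import re

pairs = "a=1 b=2"

no_group = re.findall(r"\w+=\d+", pairs)        # ['a=1', 'b=2']
one_group = re.findall(r"\w+=(\d+)", pairs)     # ['1', '2']
two_groups = re.findall(r"(\w+)=(\d+)", pairs)  # [('a', '1'), ('b', '2')]

# A list of 2-tuples converts directly to a dict
as_dict = dict(two_groups)                      # {'a': '1', 'b': '2'}
```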


re.finditer()
Iterator of match objects:
for m in re.finditer(r"\d+", "a1b2"):
    print(m.group())


re.sub()
Replace matches:
re.sub(r"\d+", "#", "a1b2")
# 'a#b#'
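
re.sub() also accepts backreferences in the replacement, a count limit, and a function as the replacement. A sketch with illustrative inputs:

```python
import re

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")   # 'host at user'

# count limits how many matches are replaced
first_two = re.sub(r"\d", "#", "a1b2c3", count=2)            # 'a#b#c3'

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22")  # 'a2b44'

# subn also reports how many replacements were made
result, n = re.subn(r"\d+", "#", "a1b2")                     # ('a#b#', 2)
```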


re.split()
Split by regex:
re.split(r"\s+", "a b   c")
# ['a', 'b', 'c']


re.compile()
Precompile pattern:
pattern = re.compile(r"\d+")
pattern.findall("123 abc 456")
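
A compiled pattern carries its flags with it and exposes the same methods as the module-level functions. The name `WORD` below is just an example.

```python
import re

# Flags are given once, at compile time
WORD = re.compile(r"[a-z]+", re.I)

words = WORD.findall("Hello World 42")   # ['Hello', 'World']
masked = WORD.sub("_", "Hi 5")           # '_ 5'
source = WORD.pattern                    # '[a-z]+'

# Compiled search() accepts a start position as its second argument
later = WORD.search("Hello World", 5).group()   # 'World'
```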



Escaping Special Characters
These must be escaped with \:
. ^ $ * + ? { } [ ] \ | ( )


Example:
re.findall(r"\.", "a.b.c")
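
What goes wrong when a special character is left unescaped, sketched with a made-up arithmetic string. Inside [...] most of these characters lose their special meaning.

```python
import re

# Escaped: + is a literal plus sign
literal_plus = re.search(r"1\+1=2", "1+1=2") is not None   # True

# Unescaped: + means "one or more 1s", so the literal text no longer matches
as_quantifier = re.search(r"1+1=2", "1+1=2") is None       # True
quantifier_hit = re.search(r"1+1=2", "11=2") is not None   # True

# Inside a character class, . and + are already literal
in_class = re.findall(r"[.+]", "a.b+c")                    # ['.', '+']
```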



Raw Strings (Important in Python)
Always use r"" for regex:
r"\n"   # newline in regex
"\\n"   # same but harder to read

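The classic case where a missing r prefix silently breaks a pattern is \b: in a normal string it is the backspace character, not a word boundary. A quick sketch:

```python
import re

plain_len = len("\b")    # 1: a single backspace character
raw_len = len(r"\b")     # 2: a backslash followed by 'b'

# Without r, the regex engine looks for literal backspace characters
missed = re.search("\bcat\b", "a cat") is None        # True
found = re.search(r"\bcat\b", "a cat") is not None    # True

# A raw string and a doubled backslash produce the same pattern text
same = r"\d+" == "\\d+"                                # True
```
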
---

### Practical Patterns

Email (simple):
[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+


URL:
https?://[^\s]+


Integer:
^-?\d+$


Floating number:
^-?\d+(\.\d+)?$


Whitespace trim:
^\s+|\s+$
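
A sketch of the patterns above in use. The constant names and sample text are assumptions for illustration, and these patterns are deliberately simple rather than full validators. Note that $ also matches just before a trailing newline, so re.fullmatch() is the stricter check.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")
URL = re.compile(r"https?://[^\s]+")
INTEGER = re.compile(r"^-?\d+$")
FLOAT = re.compile(r"^-?\d+(\.\d+)?$")
TRIM = re.compile(r"^\s+|\s+$")

text = "Contact bob@example.com or see https://example.com/docs today"

emails = EMAIL.findall(text)           # ['bob@example.com']
url = URL.search(text).group()         # 'https://example.com/docs'
is_int = bool(INTEGER.match("-42"))    # True
not_int = bool(INTEGER.match("4.2"))   # False
is_float = bool(FLOAT.match("3.14"))   # True
not_float = bool(FLOAT.match("3."))    # False
trimmed = TRIM.sub("", "  hi  ")       # 'hi'

# $ tolerates one trailing newline; fullmatch does not
loose = bool(INTEGER.match("42\n"))             # True
strict = bool(re.fullmatch(r"-?\d+", "42\n"))   # False
```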


---

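### Common Mistakes in Python re
- re.match() only matches at the start of the string; re.search() scans the whole string
- Lookbehinds must be fixed-width: the pattern inside (?<=...) or (?<!...) cannot vary in length, so no *, +, or ?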
- Use re.escape() for dynamic input: 
    - re.escape("a+b")  # a\+b
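
A sketch of two of these mistakes. `user_input` is a hypothetical user-supplied string, and the lookbehind check reflects the standard library re module, which rejects variable-width lookbehinds at compile time.

```python
import re

user_input = "a+b"  # hypothetical text supplied at runtime

# Unescaped, + acts as a quantifier and the wrong text is matched
unescaped = re.findall(user_input, "a+b aab")              # ['aab']

# Escaped, the input is matched literally
escaped = re.findall(re.escape(user_input), "a+b aab")     # ['a+b']

# A variable-width lookbehind is rejected when the pattern is compiled
try:
    re.compile(r"(?<=a+)b")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
```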

---

### Mini Reference Summary
\d \w \s      → digit, word, space
^ $           → start, end
* + ?         → repeat
() (?:)       → groups
|             → OR
(?=) (?!)     → lookahead
(?<=) (?<!)   → lookbehind
