Open Source Blog
News about Google's open source student programs and software releases
RE2: a principled approach to regular expression matching
Thursday, March 11, 2010
Regular expressions
are one of computer science's shining examples of the benefits of good computer science theory. They were originally developed by theorists as a way to describe infinite sets, but
Ken Thompson
introduced them to programmers as a way to describe text patterns in his implementation of the text editor
QED
for
CTSS
.
Dennis Ritchie
followed suit in his own implementation of QED, for
GE-TSS
. Thompson and Ritchie would go on to create Unix, and they brought regular expressions with them. By the late 1970s, regular expressions were a key feature of the Unix landscape, in tools such as ed, sed, grep, egrep, awk, and lex. They remain a key feature of the open source landscape today, in those venerable Unix tools and at the core of new languages like Perl, Python, and JavaScript.
The feature-rich regular expression implementations of today are based on a backtracking search with a
potential for exponential run time
and unbounded stack usage. At Google, we use regular expressions as part of the interface to many external and internal systems, including
Code Search
,
Sawzall
, and
Bigtable
. Those systems process large amounts of data; exponential run time would be a serious problem. On a more practical note, these are multithreaded C++ programs with fixed-size stacks: the unbounded stack usage in typical regular expression implementations leads to stack overflows and server crashes. To solve both problems, we've built a new regular expression engine, called
RE2
, which is based on automata theory and guarantees that searches complete in linear time with respect to the size of the input and in a fixed amount of stack space.
Today, we released RE2 as an open source project. It's a mostly drop-in replacement for PCRE's C++ bindings
and is available under a BSD-style license. See the
RE2 project page
for details.
By Russ Cox, Software Engineering Team
Labels
gsoc
423
releases
182
conference
90
gci
80
ghop
55
meetups
49
Linux
28
GSoC Meetups
25
Python
25
students
25
project hosting
24
open source release
23
hackathon
21
App Engine
16
C++
16
Git
14
OSCON
14
JavaScript
13
library
13
Eclipse
12
KDE
12
games
12
GNOME
11
student programs
11
testing
11
Android
10
R
10
BSD
9
Go
9
Java
9
accessibility
9
education
9
security
9
Chrome
8
HTML5
8
Subversion
8
awards
8
maps
8
Chromium
7
GSoC 10 Things
7
Google Cloud Platform
7
Google Earth
7
Selenium
7
database
7
fonts
7
licensing
7
usability
7
Django
6
Google I/O
6
Samba
6
contest
6
documentation
6
events
6
machine learning
6
science
6
Dart
5
Free Software Foundation
5
GCC
5
Gerrit
5
Haskell
5
government
5
standards
5
statistics
5
Creative Commons
4
GNU
4
GitHub
4
Perl
4
mentors
4
mobile
4
protocol buffers
4
season of usability
4
webdriver
4
BioJS
3
C
3
CSS
3
Google Compute Engine
3
JSON
3
Mercurial
3
PHP
3
Unicode
3
fun propulsion lab
3
internationalization
3
open data
3
patents
3
profiles
3
translation
3
Haiku
2
Kubernetes
2
Objective-C
2
deep learning
2
hardware
2
ios
2
k8s
2
metabrainz
2
research
2
time zones
2
wrap-up
2
.NET
1
ADC
1
BigQuery
1
C#
1
FOSSASIA
1
Firebase
1
GIS
1
Google App Engine
1
HUES
1
LIDAR
1
NRNB
1
Neural Networks
1
OpenMRS
1
PowerShell
1
Ruby
1
SCoRe
1
SLAM
1
Science Journal
1
ZuriHac
1
algorithms
1
angular
1
artificial intelligence
1
audio
1
australia
1
bazel
1
big data
1
cardboard
1
clojure
1
compression
1
computer vision
1
dataset
1
debugging
1
energy
1
functional programming
1
gcloud
1
gmail
1
images
1
language
1
lisp
1
logo
1
making
1
melange
1
musicbrainz
1
natural language
1
nmap
1
peer bonus program
1
performance
1
report card
1
robotics
1
sugar labs
1
ui automation
1
virtual reality
1
webvr
1
zopfli
1
Archive
2016
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Subscribe
Google Summer of Code
on
Google Code-in
on
Follow @gsoc
Visit
Google Open Source Programs Office
for more information.