Regex Patterns

This module contains a modest but growing collection of regular expressions for extracting things like URLs, monetary values, and more.

Data:

URL_REGEX

A compiled regular expression for extracting (probably) valid URLs.

DOMAIN_REGEX

A compiled regular expression for extracting domains from URLs.

HTTP_REGEX

A compiled regular expression for finding HTTP/S prefixes.

US_DOLLAR_REGEX

A compiled regular expression for finding USD monetary amounts.

TITLEWORD_REGEX

A compiled regular expression for finding basic title-cased words.

NUMBER_REGEX

A compiled regular expression for finding raw numbers.

NONALPHA_REGEX

A compiled regular expression for finding non-alphanumeric characters.

URL_REGEX = re.compile('((?:https?:\\/\\/(?:www\\.)?)?[-a-zA-Z0-9@:%._\\+~#=]{1,4096}\\.[a-z]{2,6}\\b(?:[-a-zA-Z0-9@:%_\\+.~#?&//=]*))')

A compiled regular expression for extracting (probably) valid URLs.
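
A minimal usage sketch follows. The pattern is copied from the definition above so the snippet runs on its own; the sample text and variable names are illustrative assumptions. Note that the scheme is optional, so bare domains are matched too.

```python
import re

# Pattern copied verbatim from the URL_REGEX definition above.
URL_REGEX = re.compile('((?:https?:\\/\\/(?:www\\.)?)?[-a-zA-Z0-9@:%._\\+~#=]{1,4096}\\.[a-z]{2,6}\\b(?:[-a-zA-Z0-9@:%_\\+.~#?&//=]*))')

# Hypothetical sample text containing one full URL and one bare domain.
text = "Visit https://www.example.com/path?x=1 or example.org for details"

# findall() returns the contents of the single capturing group for each match.
urls = URL_REGEX.findall(text)
print(urls)  # ['https://www.example.com/path?x=1', 'example.org']
```

Because the pattern is permissive, anything shaped like `word.tld` is treated as a URL, which is presumably why the description says "(probably) valid".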

DOMAIN_REGEX = re.compile('(?:http[s]?\\:\\/\\/)?(?:www(?:s?)\\.)?([\\w\\.\\-]+)(?:[\\\\\\/](?:.+))?')

A compiled regular expression for extracting domains from URLs. It can be useful in a pinch, but we recommend using pewtils.http.extract_domain_from_url() instead.
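
As a rough sketch of how the pattern behaves, the snippet below redefines it locally (copied from the definition above) and reads the domain from the first capturing group. The sample URLs and variable names are assumptions made for illustration.

```python
import re

# Pattern copied verbatim from the DOMAIN_REGEX definition above.
DOMAIN_REGEX = re.compile('(?:http[s]?\\:\\/\\/)?(?:www(?:s?)\\.)?([\\w\\.\\-]+)(?:[\\\\\\/](?:.+))?')

# The scheme and "www." prefix are optional and are not captured;
# group(1) holds the domain portion that follows them.
with_scheme = DOMAIN_REGEX.match("https://www.example.com/some/page").group(1)
bare = DOMAIN_REGEX.match("example.org").group(1)

print(with_scheme)  # example.com
print(bare)         # example.org
```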

HTTP_REGEX = re.compile('^http(?:s)?\\:\\/\\/')

A compiled regular expression for finding HTTP/S prefixes.
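
One plausible use is detecting or stripping the scheme from a URL string. The sketch below copies the pattern from the definition above; the sample URLs and variable names are illustrative assumptions.

```python
import re

# Pattern copied verbatim from the HTTP_REGEX definition above.
HTTP_REGEX = re.compile('^http(?:s)?\\:\\/\\/')

# The pattern is anchored with ^, so it only matches at the start of the string.
stripped_https = HTTP_REGEX.sub("", "https://example.com")
stripped_http = HTTP_REGEX.sub("", "http://example.com/page")

# Other schemes are left alone because nothing matches.
has_http_prefix = HTTP_REGEX.match("ftp://example.com") is not None

print(stripped_https)   # example.com
print(stripped_http)    # example.com/page
print(has_http_prefix)  # False
```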

US_DOLLAR_REGEX = re.compile('(\\$(?:[1-9][0-9]{0,2}(?:(?:\\,[0-9]{3})+)?(?:\\.[0-9]{1,2})?))\\b')

A compiled regular expression for finding USD monetary amounts.
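
The following sketch shows the kinds of amounts the pattern picks up. The pattern is copied from the definition above; the sample sentences are assumptions. Since the first digit must be 1-9, amounts below one dollar such as $0.99 are not matched.

```python
import re

# Pattern copied verbatim from the US_DOLLAR_REGEX definition above.
US_DOLLAR_REGEX = re.compile('(\\$(?:[1-9][0-9]{0,2}(?:(?:\\,[0-9]{3})+)?(?:\\.[0-9]{1,2})?))\\b')

# Hypothetical sample text with comma-grouped amounts and optional cents.
text = "The budget grew from $1,200.50 to $3,000,000 last year"
amounts = US_DOLLAR_REGEX.findall(text)
print(amounts)  # ['$1,200.50', '$3,000,000']

# The leading digit must be 1-9, so a leading zero prevents a match.
sub_dollar = US_DOLLAR_REGEX.findall("only $0.99")
print(sub_dollar)  # []
```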

TITLEWORD_REGEX = re.compile('\\b([A-Z][a-z]+)\\b')

A compiled regular expression for finding basic title-cased words.
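
To illustrate what "basic" means here, the sketch below (pattern copied from the definition above, sample sentence assumed) shows that only words made of one uppercase ASCII letter followed by lowercase ASCII letters are returned; all-caps tokens are skipped.

```python
import re

# Pattern copied verbatim from the TITLEWORD_REGEX definition above.
TITLEWORD_REGEX = re.compile('\\b([A-Z][a-z]+)\\b')

# Hypothetical sample sentence; "DC" is all caps and so is not title-cased.
text = "Pew Research Center is based in Washington DC"
title_words = TITLEWORD_REGEX.findall(text)
print(title_words)  # ['Pew', 'Research', 'Center', 'Washington']
```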

NUMBER_REGEX = re.compile('\\b([0-9]+)\\b')

A compiled regular expression for finding raw numbers.
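
A short sketch of the pattern in use, with the pattern copied from the definition above and assumed sample inputs. Matches come back as strings, and a decimal such as 3.14 is split into two separate digit runs because the period is a word boundary.

```python
import re

# Pattern copied verbatim from the NUMBER_REGEX definition above.
NUMBER_REGEX = re.compile('\\b([0-9]+)\\b')

# Hypothetical sample sentence containing standalone integers.
text = "Surveyed 1500 adults in 2020 across 50 states"
numbers = NUMBER_REGEX.findall(text)
print(numbers)  # ['1500', '2020', '50']

# Convert to int explicitly if numeric values are needed.
as_ints = [int(n) for n in numbers]
print(as_ints)  # [1500, 2020, 50]

# Decimals are not treated as one number.
decimal_parts = NUMBER_REGEX.findall("3.14")
print(decimal_parts)  # ['3', '14']
```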

NONALPHA_REGEX = re.compile('[^\\w]')

A compiled regular expression for finding non-alphanumeric characters. Because the pattern negates \w, underscores count as word characters and are not matched.
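
A common use for a pattern like this is stripping punctuation and whitespace. The sketch below copies the pattern from the definition above; the sample strings and variable names are assumptions for illustration.

```python
import re

# Pattern copied verbatim from the NONALPHA_REGEX definition above.
NONALPHA_REGEX = re.compile('[^\\w]')

# Remove every character that is not a letter, digit, or underscore.
cleaned = NONALPHA_REGEX.sub("", "Hello, World! (2024)")
print(cleaned)  # HelloWorld2024

# Underscores are part of \w, so they survive while the hyphen is removed.
with_underscore = NONALPHA_REGEX.sub("", "snake_case-name")
print(with_underscore)  # snake_casename
```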