Tuesday, January 25, 2011

Regular Expression

my_column has values like Example: AA001
Query to list any character not matching the format of ([A-Za-z][A-Za-z][0-9][0-9][0-9]) will be listed

select my_column from my_table where my_column !~ '([A-Za-z][A-Za-z][0-9][0-9][0-9])';

Matching a list of characters

Unlike the previous examples, these next regular expressions carry out queries using a pattern defined in a "class," a list of characters enclosed by brackets, [ ]. Think of a class as a simplified way of expressing logical ORs within a SQL statement.



Study the results returned by the queries:

SELECT record FROM myrecords WHERE record ~ '[a]';
SELECT record FROM myrecords WHERE record ~ '[A]';
SELECT record FROM myrecords WHERE record ~* '[a]';
SELECT record FROM myrecords WHERE record ~ '[ac]';
SELECT record FROM myrecords WHERE record ~ '[ac7]';
SELECT record FROM myrecords WHERE record ~ '[a7A]';
SELECT record FROM myrecords WHERE record ~* '[ac7]';

What happens when you search for a character, the letter z, that is not present in any string in the table?

SELECT record FROM myrecords WHERE record ~ '[z]';
SELECT record FROM myrecords WHERE record ~ '[z7]';

This statement excludes all strings with the characters 4 OR a:

SELECT record FROM myrecords WHERE record !~ '[4a]';

The use of the dash, -, in the class implies a search range--that is, to return all strings having any digits between 1 and 4:

SELECT record FROM myrecords WHERE record ~ '[1-4]';

Here are a few more range examples:

SELECT record FROM myrecords WHERE record ~ '[a-c5]';
SELECT record FROM myrecords WHERE record ~* '[a-c5]';
SELECT record FROM myrecords WHERE record ~ '[a-cA-C5-7]';

Matching two or more characters in a pattern

The previous examples made pattern searches one character or class at a time. However, you can use many classes in a pattern. These three statements return the same string from the example table:

SELECT record FROM myrecords WHERE record ~ '3[a]';
SELECT record FROM myrecords WHERE record ~ '[3][a]';
SELECT record FROM myrecords WHERE record ~ '[1-3]3[a]';

These two statements also return the same records:

SELECT record FROM myrecords WHERE record ~ '[23][a]';
SELECT record FROM myrecords WHERE record ~ '[2-3][a]';

Always keep in mind the potential for confusion. This next returns nothing because there is no such character string, ac, in any of the records in the example table:

SELECT record FROM myrecords WHERE record ~ '[a][c]';

However, introducing a range for each class returns the records:

SELECT record FROM myrecords WHERE record ~ '[a-b][b-c]';

Excluding strings in a class

To create a class of characters to exclude, insert the caret, ^, immediately after the left bracket of a class. Inserting at any other point other than as the first character in the class means you're looking for the caret in a string rather than excluding the class' character pattern.

Notice that queries can still return strings containing those excluded characters. Class exclusions prevent explicit searches using those characters.

This next statement excludes all digits from 0 to 9 from the target search. In other words, this expression returns strings that don't include digits.

SELECT record FROM myrecords WHERE record ~ '[^0-9]';

Making choices

The pipe | denotes alternation. In other words, it's a logical OR for pattern searches. Suppose you want to return all records with strings that begin with a or end with c:

SELECT record FROM myrecords WHERE record ~ '^a|c$';

This statement returns strings beginning either with a or 5, or ending with c:

SELECT record FROM myrecords WHERE record ~ '^a|c$|^5';

This next example performs a targeted search excluding digits and all lowercase letters. Control characters, spaces, and uppercase letters can all appear in the search pattern:

SELECT record FROM myrecords WHERE record ~ '[^0-9|^a-z]';

Repeating characters (quantifiers)

Sometimes you need to find strings that have repeated sets of the same character(s). You can find them with quantifiers--represented by the characters *, ?, and + as well as by digits enclosed within braces { }.

To find a sequence of 0 or more matches, use *:

SELECT record FROM myrecords WHERE record ~ 'a*'';

To find a sequence of one or more matches, use +:

SELECT record FROM myrecords WHERE record ~ 'b+';

To find a sequences of zero or one occurrence, use ?:

SELECT record FROM myrecords WHERE record ~ 'a?';

To find a sequences of exactly # matches, use {#}:

SELECT record FROM myrecords WHERE record ~ '[0-9]{3}';

To find a sequences of exactly # or more matches, use {#,}:

SELECT record FROM myrecords WHERE record ~ '[0-9]{4,}';

To find a sequences of # through ## (inclusive) matches, where # does not exceed ##, use {#, ##}:

SELECT record FROM myrecords WHERE record ~ '[a-c0-9]{2,3}';

1 comment:

Samir Sinha said...

Hi,
How to find the exact string search using regular expression as I want to search values which starts exactly with 'd-sam' and ends with a numeric value as 'd-sam1' or 'd-sam2' not as 'd-same'.
Thanks
Samir