Site Moved

This site has moved to a new location - Bin-Blog. All new posts will appear at the new location.

Case Conversion using Regular Expressions in Perl

As most of you are already aware, Perl has very powerful regular expression support. You can do things with regular expressions in Perl that are difficult or impossible in many other languages. One good example of this is case conversion. You can match a string and change its case as you write it back. For example, you can convert 'case conversion using regular expressions in perl' to 'Case Conversion Using Regular Expressions In Perl' using just a regular expression substitution.

I recently needed this when I was converting one of my old sites, BinnyVA, from a pure HTML site to a site with a PHP backend. I also wanted to make the code valid while I was converting it - and this involved making the tags and attributes lowercase. For this job, I turned to an old trusted friend of mine - Perl.

I had never used this particular feature of regular expressions before - so I had to search for some time to find the answer. Unfortunately, I did not find any articles on this topic - the best I could find was a small reference within a tutorial about Perl. So, I am creating a post on this topic - to aid future searchers on the same quest.

As an example, let us take this sentence.

the baby's blood type? human, mostly.

We want to convert it to title case, i.e.

The Baby's Blood Type? Human, Mostly.

The regular expression to do this is...

s/(?<!')(\b)([a-z])/$1\u$2/g;

This is what it does...

s/
(?<!') #The preceding char must not be an apostrophe - otherwise the 's' in "baby's" becomes 'S'
(\b) #The position before the word must be a word boundary(\b)
([a-z]) #Get the first lowercase letter after the word boundary
/  #The Replacements...
$1  #Put the word boundary back in.
\u$2 #'\u' uppercases the next character
/g;

The full program looks like this...

#!/usr/bin/perl
$_ = 'the baby\'s blood type? human, mostly.';

s/(?<!')(\b)([a-z])/$1\u$2/g;

print;
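
If you want to experiment with this, here is a minimal sketch of the same idea with the pattern written under the /x flag, so the comments can live inside the regular expression itself. The subroutine name title_case is made up for this sketch, and the lookbehind for an apostrophe is an assumption about the result you want: a plain \b also matches between the apostrophe and the 's' in "baby's", which would uppercase that 's' as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): uppercase the first
# letter of each word, leaving a letter right after an apostrophe alone.
sub title_case {
    my ($text) = @_;
    $text =~ s/
        (?<!')      # not directly after an apostrophe (keeps "baby's")
        \b          # zero-width word boundary, nothing to capture
        ([a-z])     # the first lowercase letter of the word
    /\u$1/gx;
    return $text;
}

print title_case("the baby's blood type? human, mostly."), "\n";
```

Only the pattern side of s/// understands /x comments; the replacement side is an ordinary double-quoted string, so it stays on one line.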

Try converting

computer, did we bring batteries? computer?

to

COMPUTER! Did we bring batteries? computer?

We can do this using the regexp...

s/(computer), ([a-z])/\U$1\E! \u$2/;

See the part \U$1\E? This uppercases every character from the \U escape to the \E. In this case, it uppercases the whole captured word, giving 'COMPUTER'.
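
As a runnable sketch of that substitution (the variable names $line and $shout are chosen just for this example), the following copies the sentence and rewrites the copy. There is no /g flag, so only the first match is changed and the trailing 'computer?' is left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'computer, did we bring batteries? computer?';

# \U ... \E uppercases all of $1; \u uppercases only the first character of $2.
# Without /g, only the first match is rewritten.
(my $shout = $line) =~ s/(computer), ([a-z])/\U$1\E! \u$2/;

print "$shout\n";   # COMPUTER! Did we bring batteries? computer?
```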

You can use the following escape sequences to change the case.

\l (lowercase L) - Lowercase next character
\u - Uppercase next character
\L - Lowercase until \E
\U - Uppercase until \E
\E - End case modification
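
These escapes are not limited to the replacement part of s///; they work in any double-quoted string. Here is a small sketch that tries each one (the sample words and variable names are just placeholders for this example).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $lower = 'perl';
my $upper = 'REGEX';

my $ucfirst = "\u$lower";               # Perl
my $lcfirst = "\l$upper";               # rEGEX
my $uc_span = "\U$lower\E and $lower";  # PERL and perl
my $lc_span = "\L$upper\E and $upper";  # regex and REGEX
my $title   = "\u\L$upper";             # Regex (lowercase all, then uppercase first)

print "$_\n" for $ucfirst, $lcfirst, $uc_span, $lc_span, $title;
```

The last line shows that the escapes can be combined: \L lowercases the whole word and \u then uppercases its first character.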

I tried to do this in other languages like PHP and could not find a way to do it with the replacement string alone. What about your favorite language? Can you convert the case of a string using just a regular expression?

3 Comments:

Nithin Raghuveer said...

How about:

s/([a-z])(([a-z]|[A-Z])+)/sprintf("%s%s", uc("$1"),lc("$2"))/ge;

OR

s/(\S)([\S]+)/sprintf("%s%s", uc("$1"),lc("$2"))/ge;

It converts to titlecase:
1. miXEd cASe eNtrIes and multiple words with mixed case
2. In the second expression, entries beginning with numbers are allowed

Unknown said...

GNU sed supports case conversions: see The s Command in the GNU sed Manual.

outis said...

Note that "\b" is a zero-width assertion; it matches between characters, rather than a character, hence there's no need to capture it. In regex-speak, it's equivalent to:

(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))

As for title case, Nithin Raghuveer's examples can be simplified using "\L":

s/\b(\w)(\w+)/\u$1\L$2/g;