Site Moved

This site has moved to a new location - Bin-Blog. All new posts will appear at the new location.

Bin-Blog

Case Conversion using Regular Expressions in Perl

As most of you already know, Perl has very powerful regular expression support. You can do things with regular expressions in Perl that are awkward or impossible in many other languages. One good example is case conversion: you can match a string and change its case in the replacement part of the substitution. For example, you can convert 'case conversion using regular expressions in perl' to 'Case Conversion Using Regular Expressions In Perl' using just a regular expression.

I recently had the need for this when I was converting one of my old sites, BinnyVA, from a pure HTML site to a site with a PHP backend. I also wanted to make the code valid when I was converting it - and this involved making the tags and attributes lowercase. For this job, I turned to an old trusted friend of mine - Perl.

I had never used this particular feature of regular expressions before - so I had to search for some time to find the answer. Unfortunately, I did not find any articles on this topic - the best I could find was a small reference within a tutorial about Perl. So, I am creating a post for this topic - to aid future searchers in the same quest.
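A minimal sketch of that tag-lowercasing job, not the original conversion script. The function names are made up for the example, and it assumes simple tags whose attribute values contain neither '>' nor text that looks like 'name='. The \L...\E escape it relies on is listed further down in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lowercase the attribute names inside one tag.
sub lowercase_attrs {
    my ($tag) = @_;
    # A name followed by '=' is treated as an attribute; values are left alone.
    $tag =~ s/(\s)([A-Za-z-]+)(\s*=)/$1\L$2\E$3/g;
    return $tag;
}

# Hypothetical helper: lowercase tag and attribute names in an HTML string.
sub lowercase_tags {
    my ($html) = @_;
    # '<P' or '</DIV' becomes '<p' or '</div'.
    $html =~ s/(<\/?)([A-Za-z][A-Za-z0-9]*)/$1\L$2\E/g;
    # Hand every complete tag to lowercase_attrs() and use its return value.
    $html =~ s/(<[^<>]+>)/lowercase_attrs($1)/ge;
    return $html;
}

print lowercase_tags('<A HREF="Index.html" CLASS="Nav">Home</A>'), "\n";
# prints: <a href="Index.html" class="Nav">Home</a>
```

Attribute values and the text between tags are left untouched; only the names change case.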

As an example, let us take this sentence.

the baby's blood type? human, mostly.

We want to convert it to title case, i.e.

The Baby's Blood Type? Human, Mostly.

The regular expression to do this is...

s/(^|\s)([a-z])/$1\u$2/g;

This is what it does...

s/
(^|\s) #Match the start of the string or the whitespace before a word
([a-z]) #Get the first lowercase letter of the word
/  #The Replacements...
$1  #Put the preceding whitespace back in.
\u$2 #'\u' uppercases the next character
/g;

Note that a plain word boundary(\b) will not do here - there is also a boundary between the apostrophe and the 's' in "baby's", so it would give "Baby'S".

The full program looks like this...

#!/usr/bin/perl
use strict;
use warnings;

$_ = "the baby's blood type? human, mostly.";

s/(^|\s)([a-z])/$1\u$2/g;

print "$_\n";

Try converting

computer, did we bring batteries? computer?

to

COMPUTER! Did we bring batteries? computer?

We can do this using the regexp...

s/(computer), ([a-z])/\U$1\E! \u$2/;

See the part \U$1\E? This will uppercase every character from the \U escape to \E. In this case, it uppercases the full word 'computer' to 'COMPUTER'.

You can use the following escape sequences to change the case.

  • \l (lowercase L) - Lowercase next character
  • \u - Uppercase next character
  • \L - Lowercase until \E
  • \U - Uppercase until \E
  • \E - End case modification
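To see the escapes side by side, here is a small sketch that applies each one to every word of the same string. The %convert table and the sample text are only there for the illustration. The last entry combines \u and \L, which gives a title case that also fixes stray capitals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry applies one escape to every word of the string it is given.
my %convert = (
    '\l'   => sub { my ($s) = @_; $s =~ s/(\w+)/\l$1/g;     return $s },  # hello wORLD
    '\u'   => sub { my ($s) = @_; $s =~ s/(\w+)/\u$1/g;     return $s },  # Hello WORLD
    '\L'   => sub { my ($s) = @_; $s =~ s/(\w+)/\L$1\E/g;   return $s },  # hello world
    '\U'   => sub { my ($s) = @_; $s =~ s/(\w+)/\U$1\E/g;   return $s },  # HELLO WORLD
    '\u\L' => sub { my ($s) = @_; $s =~ s/(\w+)/\u\L$1\E/g; return $s },  # Hello World
);

my $text = 'Hello wORLD';
for my $escape (sort keys %convert) {
    printf "%-5s %s\n", $escape, $convert{$escape}->($text);
}
```

The comments next to each entry show the result for the sample string 'Hello wORLD'.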

I tried to do it in other languages like PHP and could not find a way to do it with just a regular expression. What about your favorite language? Can you convert the case of a string using just a regular expression?

Inline Perl code in HTML

I would always recommend PHP over Perl when it comes to web development. PHP was created with just one purpose - web development. Perl, on the other hand, can be used in many different ways - one of which is web development. I began using Perl long before I began using PHP. I have been using PHP for just over a year but I have been using Perl for at least 3 years now. I have even written a tutorial on CGI using the Perl language.

One of the main advantages of using PHP is that you can embed PHP code inside HTML files - just give it the extension '.php' and the server will parse it and serve the resulting page. In Perl, you will have to embed HTML code in Perl's print statements - this is a much tougher approach.

Solution

There are some programs like BML which will let you embed Perl script into HTML files - LiveJournal uses this approach. But the trouble with this approach is that you will have to have access to the server's 'httpd.conf' file. You will have to configure the server to parse all the files with the '.bml' extension with a script provided by you.

I tried to solve this problem a few months ago - I made a small script called H2P that will parse all files with the extension '.perl'. This doesn't need any server side configuration, so the installation process was much simpler. The only necessary things were '.htaccess' and the 'mod_rewrite' Apache module. Unfortunately, this project failed and failed miserably. And since I had the PHP option to fall back on, I did not go to great lengths to try to fix it.

Even though that project was a failure, I created this little gem. This function will let you execute a small bit of Perl code embedded inside an HTML file like this...


<p>This is just html code. <?perl print "This is Perl code."; ?>
Html code once again.</p>

The final output will be like this.


<p>This is just html code. This is Perl code.
Html code once again.</p>

Code


sub showFile {
 my $file = shift; #The first argument is the file name.
 #Three-argument open with a lexical filehandle - safer than the bareword 'IN'.
 open(my $in, '<', $file) or die "Can't read display page '$file' : $!";
 my @lines = <$in>;
 close $in;

 my $count = 0;
 
 for(my $i=0; $i<scalar(@lines); $i++) {
  my $line = $lines[$i];
  
  if($line =~ /<\?perl/i) {
   #If there is Perl code embedded in this line, collect the lines up to the closing '?>'
   my $code = "";
   my $j;
   for($j=$i; $j<scalar(@lines); $j++) {
    my $this_line = $lines[$j];
    $code .= $this_line;
    last if($this_line =~ /\?>/);
   }
   if($code =~ /(.*)<\?perl(.+?)\?>(.*)/is) { #Take out the code.
    my $text_before_code = $1;
    $code = $2;
    my $text_after_code = $3;
    
    print $text_before_code;
    
    #Execute the code.
    eval $code;
    die $@ if $@; #Die if there was an error.
 
    print $text_after_code;
   }
   $i = $j;
 
  } else {
   print $line;
  }
 }
}

#Print the contents of a file after parsing it.
showFile('file.html');

Problems

No Automatic Parsing
Does not parse files with a specific extension automatically - you will have to pass the file through the 'showFile(file_name)' function.
Local scope
The code within the <?perl ... ?> is 'eval'ed, so every variable declared with 'my' in that block exists only within that block. If you have used the 'use strict;' option(and you must), this will limit you very much. If you must use a variable from another block, you will have to make it a global variable using 'use vars qw($variable_name);'. Still, better than nothing.
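The scope problem is easy to reproduce outside the HTML parser. In this sketch, run_block is a hypothetical stand-in for one <?perl ... ?> block, and 'our' is used in place of 'use vars' - both give you a package variable, so treat it as an alternative rather than what showFile requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for one <?perl ... ?> block: eval the code and
# return its value together with any error message.
sub run_block {
    my ($code) = @_;
    my $value = eval $code;
    return ($value, $@);
}

# A 'my' variable lives only inside the block that declares it.
my ($first)          = run_block('my $count = 5; $count');
my ($second, $error) = run_block('$count + 1');
print "first block : $first\n";
print "second block: failed under strict\n" if $error;

# A package variable declared with 'our' survives from block to block.
run_block('our $total = 5;');
my ($third) = run_block('our $total; $total + 1');
print "shared total: $third\n";
```

The second block fails because 'use strict' is still in effect inside the eval and $count was never declared there. The 'our' version works, at the cost of repeating the declaration in every block that uses the variable.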

SedGUI Program is Complete

Remember the SedGUI script we talked about earlier? I have finished the program. I know that it is a bit late, but think of it as my Christmas present for you.

This is a GUI version of the Sed(Stream EDitor) program in Unix. Sed is a very powerful and very useful program - that said, it is very user unfriendly. So this is a GUI version of the Sed program - you get the functionality of Sed but with a better interface.

The program is in Perl Tk - around 900 lines long. It is fairly flexible but kind of slow. Download SedGUI from my Bin-Co website and try it out. Remember it is a Beta release, so if you notice any problems with it, please let me know.

Learning Sed - and Making SedGUI

I am trying to learn Sed(Stream EDitor). Sed is basically a small(but powerful) utility which you can use to filter files based on regular expressions. You can use it to

  • Find the lines that match the given regular expression.
  • Replace an expression with another.
  • Get the file contents from one point to another - the points can be a line number or even a regular expression.
  • And much more...
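Since Perl is where this is heading anyway, here is a rough sketch of the first three uses written as Perl functions. These are not taken from SedGUI; the function names and the sample lines are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the lines that match (like sed -n '/re/p').
sub matching_lines {
    my ($re, @lines) = @_;
    return grep { /$re/ } @lines;
}

# Replace the first match on every line (like sed 's/re/text/').
sub replace_in_lines {
    my ($re, $to, @lines) = @_;
    return map { (my $copy = $_) =~ s/$re/$to/; $copy } @lines;
}

# Lines from the first match of $start to the next match of $end
# (like sed -n '/start/,/end/p'); the end is checked from the next line on.
sub line_range {
    my ($start, $end, @lines) = @_;
    my ($inside, @out) = (0);
    for my $line (@lines) {
        if (!$inside) {
            next unless $line =~ /$start/;
            $inside = 1;
            push @out, $line;
            next;
        }
        push @out, $line;
        $inside = 0 if $line =~ /$end/;
    }
    return @out;
}

my @file = ("BEGIN one\n", "two\n", "END three\n", "four\n");
print matching_lines(qr/t/, @file);
print replace_in_lines(qr/BEGIN/, 'begin', @file);
print line_range(qr/two/, qr/END/, @file);
```

The replacement text is used literally here, so references like $1 inside it are not expanded.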

I am using the tutorial at http://www.grymoire.com/Unix/Sed.html. There is also a nice collection of tutorials at http://sed.sourceforge.net/grabbag/tutorials/.

Anyway, since I know regular expressions, I did not have any trouble learning the stuff. The trouble is using it. Don't get me wrong - Sed is a very powerful and very useful program - that sed(sorry, couldn't resist), it is very user unfriendly. If you know Sed, you know that it has no GUI - it is a program that can only be run from the command line. You define the regexps that must be matched and the file to be used, and call the program. You can either see the results on screen or save them to another file. If you are saving them to another file - like with this command...

sed 's/BEGIN/begin/' <old >new

you have no clue whether anything was changed. Another problem is that if there are multiple commands you wish to execute, the code quickly becomes unmanageable.
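On the 'no clue whether anything was changed' point: in Perl the substitution operator returns the number of replacements it made, so a tool can report it. A minimal sketch, with a made-up function name and sample lines (this is not SedGUI code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply a substitution to every line and report how many lines changed.
# s/// returns a true value only when it replaced something.
sub replace_and_count {
    my ($re, $to, @lines) = @_;
    my $changed = 0;
    for my $line (@lines) {          # $line aliases our own copy in @lines
        $changed++ if $line =~ s/$re/$to/;
    }
    return ($changed, @lines);
}

my ($changed, @new) = replace_and_count(qr/BEGIN/, 'begin',
    "BEGIN a\n", "b\n", "BEGIN c\n");
print "$changed line(s) changed\n";
print @new;
```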

Despite all these issues, I could not abandon Sed - it is very useful for me. So I have decided to make my own little Sed-like tool - SedGUI. Since this involves a lot of regular expressions, Perl is the obvious choice for the language. I will use Perl Tk to create a small GUI for the program. I have started the work on this program - I already have a working prototype. It is nowhere near release stable - so don't wait around for the download link. This is what I have done till now... It is nowhere near complete - I have just had one day of coding. I will let you know as soon as I release a beta version. Till then, use Sed.

Perl Source Compressor for JavaScript

I created this script in the middle of the last project - Blogger Post Calendar. I wanted to speed up the loading of the external JavaScript file - so I created a script that would automatically compress JavaScript code. And Perl was the obvious choice as its language.

The code can be downloaded from the JavaScript Source Compressor page.

Features

  • Removes Comments - both //... and /* ... */ style comments.
  • Removes Whitespace - newlines, spaces and tabs.
  • Replaces all variable names with a shorter version(a,b,...,z,aa,ab,...). This feature is optional.
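To give a feel for how these features can be approached, here is a naive sketch. It is not the code from SourceCompressor; the function names are made up, and whitespace is collapsed to single spaces rather than removed completely. The short names come for free from Perl's string increment, which steps through a, b, ..., z, aa, ab on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive comment and whitespace removal. It does not understand string
# literals, so a '//' inside a string (a URL, say) would be cut as well.
sub strip_source {
    my ($js) = @_;
    $js =~ s{/\*.*?\*/}{}gs;    # /* ... */ comments, even across lines
    $js =~ s{//[^\n]*}{}g;      # // comments up to the end of the line
    $js =~ s/\s+/ /g;           # collapse newlines, tabs and spaces
    $js =~ s/^\s+|\s+$//g;      # trim both ends
    return $js;
}

# Generate the first $count short names: a, b, ..., z, aa, ab, ...
sub short_names {
    my ($count) = @_;
    my $name = 'a';
    my @names;
    push @names, $name++ for 1 .. $count;
    return @names;
}

print strip_source("var a = 1; // one\n/* two */ var b = 2;\n"), "\n";
# prints: var a = 1; var b = 2;
print join(',', (short_names(28))[25 .. 27]), "\n";
# prints: z,aa,ab
```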

perl SourceCompressor.pl --help

SourceCompressor is a perl script that can be used to compress the Javascript source file to a much smaller file. Replaces large variable names with smaller, generic names and removes unwanted white spaces.

Usage

perl compressor.pl <js_file.js> [options]
Example :
perl compressor.pl settings.js -display -verbose -change-name

Command Line Options

The first argument MUST be the file name of the JavaScript file.
-verbose      Show details of what is going on.
-readable     Use \n instead of ';' as the command separator.
-change-name  Rename the variables to a shorter version. Enabled by default.
-remove-vars  Remove the 'var' keyword - this is asking for trouble.
-print        Outputs the compressed data instead of writing it to a file.

To Do

  • BUG: Changes the text if the variable name appears in the comments/strings/other places.
  • The variables that appear most often should get the shortest names.
  • Change the variables that don't use the 'var' declaration method. - Is this possible?
  • Implement unremovable comments - needed for copyright statements.
  • Change the names of the functions/classes too?
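The second item could be approached along these lines. This is only a sketch under the assumption that the occurrence counts are already known; the function name and the sample variable names are made up, and nothing here is part of the current script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Give the most frequently used variables the shortest names.
# Takes a list of (variable name => number of occurrences) pairs.
sub assign_names {
    my (%count) = @_;
    my $short = 'a';
    my %map;
    # Most frequent first; ties are broken alphabetically to keep it stable.
    for my $var (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
        # A real tool must also skip reserved words such as 'do', 'if', 'in'.
        $map{$var} = $short++;
    }
    return %map;
}

my %names = assign_names(itemIndex => 2, totalPrice => 10, counter => 5);
print "$_ => $names{$_}\n" for sort keys %names;
```

With the sample counts, totalPrice becomes a, counter becomes b and itemIndex becomes c.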
