Wednesday, June 19, 2013

Useful Regular Expressions for Notepad++

Add a semicolon to the end of each line that doesn't already end with one
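One way to express this find/replace is sketched below with Python's re module rather than Notepad++. The pattern and the function name add_semicolons are illustrative assumptions, and lines are assumed to end in \n.

```python
import re

def add_semicolons(text: str) -> str:
    # (?m) makes $ match at the end of every line, not only at the end of the text.
    # ([^;\n])$ captures the last character of a non-empty line when it is not ';'.
    return re.sub(r"(?m)([^;\n])$", r"\1;", text)

print(add_semicolons("SELECT 1\nSELECT 2;\n\nSELECT 3"))
```

Empty lines are left alone, because the pattern needs at least one character before the end of the line.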

Add a particular line before each line

I had a requirement where the document contained lines like
D:\abc\yogesh.sql
and each one had to be converted into


EXECUTING yogesh.sql
D:\abc\yogesh.sql



Let me explain the regular expression
((.+)\\(.+))
\1 represents the complete matched line, i.e. D:\abc\yogesh.sql
\2 represents the text matched by the first .+, i.e. D:\abc (the first .+ is greedy, so it runs up to the last backslash)
\3 represents the text matched by the second .+, i.e. yogesh.sql
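To see how the groups line up, here is a hedged Python sketch that applies the same pattern with re.sub. The replacement EXECUTING \3\n\1 is an assumption about what goes in the Replace box; it produces the two-line output shown above.

```python
import re

# The post's pattern: \1 = whole line, \2 = folder, \3 = file name.
# .+ does not match a newline, so each line is matched on its own.
PATTERN = re.compile(r"((.+)\\(.+))")

def add_executing_line(text: str) -> str:
    # Assumed replacement: an "EXECUTING <file>" line, then the original line.
    return PATTERN.sub(r"EXECUTING \3\n\1", text)

print(add_executing_line("D:\\abc\\yogesh.sql"))
```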

Append * to end of each line
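A Python sketch of the same idea follows (append_star is an illustrative name). The lookbehind (?<=.) is an addition of mine: it stops Python from also appending * at the empty position after a trailing newline, at the cost of skipping empty lines.

```python
import re

def append_star(text: str) -> str:
    # (?m)$ matches at the end of each line; (?<=.) requires a non-newline
    # character just before that position, so empty lines are not touched.
    return re.sub(r"(?m)(?<=.)$", "*", text)

print(append_star("first\nsecond"))
```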

Remove * from end of each line
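The reverse operation, sketched in Python (remove_trailing_star is an illustrative name). The asterisk has to be escaped because * is a quantifier in regular expressions.

```python
import re

def remove_trailing_star(text: str) -> str:
    # \* is a literal asterisk; (?m)$ anchors it to the end of each line.
    return re.sub(r"(?m)\*$", "", text)

print(remove_trailing_star("first*\nsecond*"))
```

Only the last asterisk on each line is removed; \*+$ would strip a whole run of them.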

Swap MM and dd (month and day) in a date


You can reorder \1, \2 and \3 in the replacement as needed.
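The date layout isn't stated here, so this Python sketch assumes dates written as MM/dd/yyyy (a hypothetical format) and simply swaps the first two groups in the replacement:

```python
import re

# Assumed layout: two-digit month / two-digit day / four-digit year.
DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

def swap_month_day(text: str) -> str:
    # \2 (day) and \1 (month) are written back in swapped order; \3 (year) stays last.
    return DATE.sub(r"\2/\1/\3", text)

print(swap_month_day("06/19/2013"))
```

Running it twice restores the original, since the swap is its own inverse.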

Remove C++ comments from the file

This removes single-line comments from the file.
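One plausible single-line pattern, sketched in Python, deletes everything from // to the end of the line. Treat it as an assumption rather than the exact Notepad++ pattern: it will also cut a // that appears inside a string literal, such as a URL.

```python
import re

def remove_line_comments(text: str) -> str:
    # . does not match a newline, so .* stops at the end of the current line.
    return re.sub(r"//.*", "", text)

print(remove_line_comments("int a; // counter\nint b;"))
```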
Removing multiline comments is a three-step process:
First, replace every \r\n with a custom string of your choice that you are sure does not already appear in the file (for example mynextlineReplacementString), so that each multiline comment sits on a single line.

Second, find each string between /* and */ and replace it with an empty string.

Please note that regular expressions are greedy: quantifiers such as * and + match as much text as they can.
For example:

/th.*s/ -- This starts matching where it finds th and keeps going until the last s in the string.
If the string is "this string matches the values",
it will match the complete string, whereas the requirement was to match only the word "this".
http://www.regular-expressions.info/javascriptexample.html

Solution: use /th[^s]*s/ or /th.*?s/ instead.
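A quick Python check of the greedy pattern and the two alternatives on the example sentence (the variable names are illustrative). Here th[^s]*s is the general form of the negated-class approach, since it allows any number of non-s characters between th and s.

```python
import re

s = "this string matches the values"

greedy = re.search(r"th.*s", s).group()      # runs to the last 's' in the string
lazy = re.search(r"th.*?s", s).group()       # stops at the first 's' after "th"
negated = re.search(r"th[^s]*s", s).group()  # cannot cross an 's' at all

print(greedy)
print(lazy)
print(negated)
```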
Use the following to remove multiline comments.


Let me explain the regular expression
\/       # a forward slash
\*       # followed by an asterisk
[^\*\/]+ # one or more characters that are neither * nor / (a comment containing either one will not match)
\*\/     # the closing */, included in the selection
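Here is that pattern applied with Python's re.sub, as a sketch (strip_simple_comments is an illustrative name, and the escapes inside the character class are dropped because Python doesn't need them). The second call shows the limitation of the [^\*\/] class: a comment that contains * or / is left in place.

```python
import re

# Same pattern as above: /*, then characters other than * and /, then */.
SIMPLE_COMMENT = re.compile(r"/\*[^*/]+\*/")

def strip_simple_comments(text: str) -> str:
    return SIMPLE_COMMENT.sub("", text)

print(strip_simple_comments("a /* c */ b"))    # comment removed
print(strip_simple_comments("x /* a*b */ y"))  # unchanged: '*' inside the comment
```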

Another way to achieve the same effect, which also copes with * or / inside a comment, is
\/\*.*?\*\/
// Non-greedy quantifiers have the same syntax as regular greedy ones, 
// except with the quantifier followed by a question-mark. 
// For example, a non-greedy pattern might look like: "/A[A-Z]*?B/". 
// In English, this means "match an A, followed by only as many capital 
// letters as are needed to find a B."

// One little thing to look out for is the fact that 
// the pattern "/[A-Z]*?./" will always match zero capital letters. 
// If you use non-greedy quantifiers, watch out for matching 
// too little, which is a symmetric danger.

\/       # a forward slash
\*       # followed by an asterisk
.*?\*\/  # match as few characters as needed to reach the first */, then the */ itself
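A Python sketch contrasting the non-greedy pattern with a greedy one (strip_comments_lazy is an illustrative name). With .*? each comment is removed separately; with plain .* the match would run from the first /* to the last */ and swallow the code in between.

```python
import re

# Non-greedy: .*? stops at the first */ after each /*.
LAZY_COMMENT = re.compile(r"/\*.*?\*/")

def strip_comments_lazy(text: str) -> str:
    return LAZY_COMMENT.sub("", text)

sample = "x /* a*b */ y /* c */ z"
print(strip_comments_lazy(sample))
print(re.sub(r"/\*.*\*/", "", sample))  # greedy version, for comparison
```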

Finally, replace mynextlineReplacementString with \r\n (with Extended search mode selected) to restore the line breaks.
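The three steps can be sketched in Python as below. It assumes \n line endings (what Python's text mode gives you) rather than the \r\n used by Notepad++ on Windows, and remove_multiline_comments is an illustrative name.

```python
import re

# Placeholder string from the post; it must not already occur in the text.
PLACEHOLDER = "mynextlineReplacementString"

def remove_multiline_comments(text: str) -> str:
    # Step 1: put everything on one line by swapping each newline for the placeholder.
    flat = text.replace("\n", PLACEHOLDER)
    # Step 2: delete each /* ... */ with the non-greedy pattern.
    flat = re.sub(r"/\*.*?\*/", "", flat)
    # Step 3: turn the placeholder back into newlines.
    return flat.replace(PLACEHOLDER, "\n")

print(remove_multiline_comments("int a;\n/* one\ntwo */\nint b;"))
```

In Python the placeholder is optional: re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL) lets . match newlines directly and gives the same result.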

Remove the leading numbers from each line
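A hedged Python sketch, assuming "numbers" means digits at the very start of a line, optionally followed by spaces or tabs (strip_leading_numbers is an illustrative name):

```python
import re

def strip_leading_numbers(text: str) -> str:
    # ^\d+ removes the digits at the start of each line; [ \t]* also drops the
    # spaces or tabs after them. \s is avoided because it would eat newlines too.
    return re.sub(r"(?m)^\d+[ \t]*", "", text)

print(strip_leading_numbers("1 first\n22\tsecond\nthird"))
```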


Remove the lines that do not contain a dot (.) character


Let me explain the regular expression
^            # start of the line
([^\.]+)     # one or more characters other than a dot
$            # end of the line
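The same idea in Python, as a sketch (blank_lines_without_dot is an illustrative name). Matching lines are emptied rather than deleted, so blank lines remain. One adjustment is needed: in Python a negated class such as [^.] also matches newlines, so \n is added to the class to keep each match on a single line.

```python
import re

def blank_lines_without_dot(text: str) -> str:
    # ^[^.\n]+$ : a whole non-empty line made only of characters other than '.'.
    return re.sub(r"(?m)^[^.\n]+$", "", text)

print(blank_lines_without_dot("a.txt\nREADME\nb.sql"))
```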

5 comments:

  1. nice explanation but very complex to understand :)

    1. Well, that depends. I find it easy... You need to know the basics of regex.
      If you don't know the basics of regex, don't try to understand it; simply use it in Notepad++ :)

  2. Works great - you just saved me a lot of work. Thanks!

    1. You can suggest more scenarios, which can be useful. I can add more here...

      This can be really helpful to people who don't know regular expressions.

  3. How would I delete just the ending semicolon in this string without disturbing the rest of it?
    9;10;2009;Hamilton Bulldogs;Rockford IceHogs;

    Thanks!
