A useful Regular Expression rule


Make a folder called "Domain" and give it this rule. Then, for every item that has a URL associated with it in the "Net Location" folder, the "Domain" value will be the root of that URL, so that, for example, all stories clipped from a particular newspaper will have the domain "www.guardian.co.uk". The underlying URL will still be in the "Net Location" folder, so that double-clicking on the text will still take you there. But the Domain is often much more informative to human eyes.

The auto-assign rule for the "Domain" folder is this:

++:!+-:{1}:[Net Location]~~='http://([^/]+).*'

Reading left to right:
++: says "This is a slang rule"
!+- says "Run it after every item change"
{1} says "Set the value of this folder to whatever ends up between the first set of brackets in the regular expression that follows"

The Regular Expression itself is the last bit of the rule:
'http://([^/]+).*'
which extracts from an entry in the "Net Location" folder everything after the initial
http://
up to the next forward slash.

This is because [^/] means "Any character but a forward slash" and following it with a plus sign means "One or more consecutive characters that are not forward slashes". Wrapping the whole lot in brackets means "Remember this bit for later use", so that it can be copied into the "Domain" folder. Finally, the ".*" just means "anything at all up to the end of the line". In this instance it is thrown away; in other rules, it might be used.
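
For readers who want to try the pattern outside the program, here is a minimal sketch in Python. It assumes Python's re module treats this pattern the same way the rule engine does; that has not been checked. The function name extract_domain and the sample URL path are made up for illustration.

```python
import re

# The same pattern as in the rule, compiled with Python's re module.
DOMAIN_PATTERN = re.compile(r"http://([^/]+).*")

def extract_domain(url):
    """Return the text between 'http://' and the next slash, or None if the URL does not match."""
    match = DOMAIN_PATTERN.match(url)
    # group(1) is the first bracketed part, the equivalent of {1} in the rule.
    return match.group(1) if match else None

print(extract_domain("http://www.guardian.co.uk/technology/story.html"))
# prints: www.guardian.co.uk
```

Note that the pattern only recognises addresses beginning with http://, so anything else comes back as None in this sketch.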

A further, rather sillier example


We could change this rule so that it captured both the domain and the filename from the original URL.

We already know that the domain is everything between
http://
and the next forward slash.
Let's assume, then, that the filename is everything after the last forward slash in the URL.
So we modify the capturing regex
[Net Location]~~='http://([^/]+).*'
so that the end reads ([^/]+).+/(.+)
The new part of the expression, .+/ , matches everything between the first and the last forward slashes. It is written differently from the first part because we wanted the first regex to stop matching at the first slash it came to; we want this one to stop matching at the last slash. I hope the difference is clear. In this imaginary url:
http://thefirstbit/could/be/followed/by/many/slashes/before/theurl
([^/]+) matches thefirstbit and the following .+/ matches /could/be/followed/by/many/slashes/before/ which leaves theurl for the final (.+)
Finally, wrapping the end of the Regular Expression in brackets makes it available just like the first part, the domain. Since the Domain was {1}, the filename will be {2}

So, we can rewrite the whole rule as ...

++:!+-:'From a file called {2} found in {1}':[Net Location]~~='http://([^/]+).+/(.+)'

which would give a column showing, rather verbosely, more detail about where a text snippet had come from.
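
The two-group version can be tried the same way. This is again only a sketch under the assumption that Python's re module matches the way the rule engine does; describe_source is a hypothetical name, and the {1} and {2} placeholders are imitated with ordinary string formatting.

```python
import re

# Pattern from the rewritten rule: group 1 is the domain, group 2 the filename.
FILE_PATTERN = re.compile(r"http://([^/]+).+/(.+)")

def describe_source(url):
    """Build the verbose description from the two captured groups, or return None if there is no match."""
    match = FILE_PATTERN.match(url)
    if match is None:
        return None
    domain, filename = match.groups()
    # Stands in for 'From a file called {2} found in {1}' in the rule.
    return "From a file called {} found in {}".format(filename, domain)

print(describe_source("http://thefirstbit/could/be/followed/by/many/slashes/before/theurl"))
# prints: From a file called theurl found in thefirstbit
```

One edge case shows up in Python at least: because .+ must match at least one character, a URL with a single path segment such as http://host/file makes the first group give up its last character, so the result reads "found in hos". Writing the middle part as .*/ avoids that in Python; whether the rule engine here behaves identically is an assumption that would need testing.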

Contributor


Taken from the Yahoo Groups Post "Very Simple Rule Example" by
Andrew Brown