Re: Using ExtractPath to Assign a Property

From: Bob Stewart <rstewart(at)>
Date: Sat Dec 15 2001 - 17:18:09 GMT
At 02:27 PM 12/14/01 -0800, you wrote:
>At 02:09 PM 12/14/01 -0800, Bob Stewart wrote:
>>I am using ExtractPath to assign a searchable metaname and this works fine:
>>[from config file:]
>>MetaNames forum
>>ExtractPath forum regex !^.*/forums/load/([^/]+)/.*$!$1!
>>Is there any way to assign extracted path info to a property?
>No, not right now.  Not a bad idea.  I guess I figured one would just use
>the same regular expression in the CGI script to extract out that path part
>from the swishdocpath property.
>It will be a little work (more finding the time).  The code that does
>ExtractPath -- index_path_parts() -- would need to get passed info on the
>file entry so the property is added to the correct file.  But once that's
>done, then it's just a matter of calling addDocProperty().  A few lines of
>code, that's all.
>How bad do you need it?  Can you use a regexp in your CGI to pull out the


It really isn't a big deal, so I wouldn't bother. I'm using the "-S prog" option to load the data through a script, so it's easy to have the script create a meta tag and insert it into the HTML before passing it to Swish. I want to be able to sort on this property.
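
For anyone reading this in the archive, here is a rough sketch of that workaround in Python. It is not my actual script. The function names add_forum_meta and prog_record are made up for illustration, and the "Path-Name" and "Content-Length" headers are what I understand the "-S prog" input format to expect, so check them against the Swish-e documentation. I'd also assume the config needs a "PropertyNames forum" line for the value to be stored as a sortable property.

```python
import re
from html import escape

# Same pattern as the ExtractPath line in the config, written in Python syntax.
FORUM_RE = re.compile(r"^.*/forums/load/([^/]+)/.*$")

# Matches the opening <head> tag (with or without attributes), but not <header>.
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_forum_meta(path, html):
    """Insert a forum meta tag after <head> when the path matches (hypothetical helper)."""
    match = FORUM_RE.match(path)
    if match is None:
        return html
    tag = '<meta name="forum" content="%s">' % escape(match.group(1), quote=True)
    # Only the first <head> tag is touched; a document without one is returned unchanged.
    return HEAD_RE.sub(lambda m: m.group(0) + tag, html, count=1)


def prog_record(path, html):
    """Build one document record for "-S prog" (header names are an assumption)."""
    body = add_forum_meta(path, html).encode("utf-8")
    header = "Path-Name: %s\nContent-Length: %d\n\n" % (path, len(body))
    return header.encode("utf-8") + body
```

The script would write each record to standard output, for example with sys.stdout.buffer.write(prog_record(path, html)). Content-Length is computed from the encoded bytes after the tag is inserted, since the length has to describe what is actually sent.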

BTW, the "-S prog" option makes a big difference, as does the "-e" (economy mode) option. With both, I was able to index about 40,000 pages on a fairly busy server that has only 256MB of RAM.

>BTW - since you are using ExtractPath, please offer any suggestions on
>improvements.  I've never been thrilled about having to match the entire
>path, but I'm not so sure if that bothers others.

I think it works fine. I don't think it matters that it has to match the entire path.

One thing the documentation could make clearer: after a successful match, subsequent ExtractPath statements for the same metaname appear to be ignored.

Say I have two paths:
..and I want the metaname "forum" set to foo for both.

This seems to be the way that works:

MetaNames forum
ExtractPath forum regex !^.*/forums/load/([^/]+)/.*$!$1!
ExtractPath forum regex !^.*/forums/([^/]+)/.*$!$1!

Even though /forums/load/foo/bar.html would also match the second pattern, that statement is ignored once the first one has matched.
That's fine; it just wasn't clear to me from the docs, so I had to try it each way.
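
To illustrate why the order matters, here is a small Python simulation of the "first match wins" behavior as I observed it. This is not how Swish-e is implemented; extract_forum is a made-up function, and /forums/foo/bar.html is just a hypothetical second path for the example.

```python
import re

# The two ExtractPath patterns from the config, in the order they are listed.
PATTERNS = [
    re.compile(r"^.*/forums/load/([^/]+)/.*$"),
    re.compile(r"^.*/forums/([^/]+)/.*$"),
]


def extract_forum(path, patterns=PATTERNS):
    """Return the capture from the first pattern that matches; later ones are skipped."""
    for pattern in patterns:
        match = pattern.match(path)
        if match:
            return match.group(1)
    return None


print(extract_forum("/forums/load/foo/bar.html"))  # foo
print(extract_forum("/forums/foo/bar.html"))       # foo (hypothetical path)
# With the order reversed, the broader pattern matches first and captures "load".
print(extract_forum("/forums/load/foo/bar.html", PATTERNS[::-1]))  # load
```

So the more specific pattern has to come first; otherwise the broader one would set the metaname to "load" for the first path.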

Overall, the documentation is very well done, especially considering how new many of the features are.

The one feature I would find most valuable is incremental indexing. Parts of my site are updated constantly, so the site needs to be reindexed daily.

Thanks again!

  Bob Stewart  

  The Virtual Mirror
Received on Sat Dec 15 17:18:16 2001