Re: Using ExtractPath to Assign a Property

Date: Sat Dec 15 2001 - 17:18:09 GMT
>>I am using ExtractPath to assign a searchable metaname and this works fine:
>>[from config file:]
>>MetaNames forum
>>ExtractPath forum regex !^.*/forums/load/([^/]+)/.*$!$1!
>>Is there any way to assign extracted path info to a property?
>No, not right now.  Not a bad idea.  I guess I figured one would just use
>the same regular expression in the CGI script to extract out that path part
>from the swishdocpath property.
>It will be a little work (more finding the time).  The code that does
>ExtractPath -- index_path_parts() -- would need to get passed info on the
>file entry so the property is added to the correct file.  But once that's
>done, then it's just a matter of calling addDocProperty().  A few lines of
>code, that's all.
>How bad do you need it?  Can you use a regexp in your CGI to pull out the


It really isn't a big deal, so I wouldn't bother. I'm using the "-S prog" option to load the data through a script, so it's easy to have the script create a meta tag and insert it into the HTML before passing it to Swish. The reason I asked is that I want to be able to sort on this property.
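
A rough sketch of that pre-processing step in Python follows. It is illustrative only and not the actual script: the names forum_from_path and add_forum_meta are hypothetical, the page is assumed to have an ordinary <head> tag, and the part that writes the document out in the form "-S prog" expects is left out.

```python
import html
import re

# Same pattern as the ExtractPath line quoted above.
FORUM_RE = re.compile(r"^.*/forums/load/([^/]+)/.*$")

# Matches the opening <head> tag (with or without attributes), but not <header>.
HEAD_RE = re.compile(r"(<head(?:\s[^>]*)?>)", re.IGNORECASE)


def forum_from_path(path):
    """Return the forum name captured from the path, or None if no match."""
    m = FORUM_RE.match(path)
    return m.group(1) if m else None


def add_forum_meta(path, page):
    """Insert a <meta name="forum"> tag right after the opening <head> tag."""
    forum = forum_from_path(path)
    if forum is None:
        return page  # path does not match: leave the page untouched
    tag = '<meta name="forum" content="%s">' % html.escape(forum, quote=True)
    # Only the first <head> tag is touched; pages without one come back unchanged.
    return HEAD_RE.sub(lambda m: m.group(1) + tag, page, count=1)


page = "<html><head><title>t</title></head><body>hi</body></html>"
print(add_forum_meta("/forums/load/foo/bar.html", page))
```

For the value to be sortable, forum would presumably also have to be listed under PropertyNames in the config, not only MetaNames; that is an assumption here, not something this sketch checks.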

BTW, the "-S prog" option makes a big difference, as does "-e" (economy mode). I was able to index about 40,000 pages on a fairly busy server that has only 256MB of RAM.

>BTW - since you are using ExtractPath, please offer any suggestions on
>improvements.  I've never been thrilled about having to match the entire
>path, but I'm not so sure if that bothers others.

I think it works fine. I don't think it matters that it has to match the entire path.

One thing that could be clearer in the documentation: it seems that after a successful match, subsequent ExtractPath statements for the same metaname are ignored.

Say I have two paths:
..and I want the metaname "forum" set to foo for both.

This seems to be the way that works:

MetaNames forum
ExtractPath forum regex !^.*/forums/load/([^/]+)/.*$!$1!
ExtractPath forum regex !^.*/forums/([^/]+)/.*$!$1!

Even though /forums/load/foo/bar.html would also match the second pattern, the second statement is ignored for that path.
That's fine; it just wasn't clear to me from the docs, so I had to try it each way.
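
The behaviour as observed can be modelled with a small Python sketch. This is only a model of what the tests suggest, not Swish-e's code: extract_forum is a hypothetical name, and /forums/foo/bar.html is a made-up second path used purely for illustration.

```python
import re

# The two ExtractPath patterns from the config above, in the same order.
PATTERNS = [
    r"^.*/forums/load/([^/]+)/.*$",
    r"^.*/forums/([^/]+)/.*$",
]


def extract_forum(path, patterns=PATTERNS):
    """Return $1 from the first pattern that matches; later patterns are skipped."""
    for pattern in patterns:
        m = re.match(pattern, path)
        if m:
            return m.group(1)
    return None


print(extract_forum("/forums/load/foo/bar.html"))                  # foo
print(extract_forum("/forums/foo/bar.html"))                       # foo
print(extract_forum("/forums/load/foo/bar.html", PATTERNS[::-1]))  # load
```

The last call shows why the order matters: with the more general pattern first, the same path yields "load" instead of "foo", so the more specific statement has to come first.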

Overall, the documentation is very well done, especially considering how new many of the features are.

The one feature I would find most valuable is incremental indexing. Parts of my site are updated constantly, so it needs to be reindexed daily.

Thanks again!

  Bob Stewart  

  The Virtual Mirror
