Well, this has perplexed me for about 4 hours. Briefly, your regex
didn't work for me.
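
For reference, the suggested pattern can be checked against the sample URL outside swish-e. This is a minimal sketch in Python, not swish-e itself: Python's re module stands in for swish-e's regex engine (an assumption), and extract_uid is a hypothetical helper name. The second pattern spells the digit class POSIX-style, in case \d is not supported by the engine in use; that too is an assumption, not something verified against swish-e.

```python
import re

# Pattern from the suggestion in this thread. Python's `re` stands in for
# swish-e's regex engine here, which is an assumption.
SUGGESTED = r"^.+uid1=(\d+)$"

# Same pattern with a POSIX-style digit class, in case `\d` is unsupported
# by the engine in use (assumption, not verified against swish-e).
PORTABLE = r"^.+uid1=([0-9]+)$"


def extract_uid(url: str, pattern: str = SUGGESTED) -> str:
    """Hypothetical helper: return the captured digits, or url unchanged."""
    # re.sub leaves the input untouched when the pattern does not match.
    return re.sub(pattern, r"\1", url)


if __name__ == "__main__":
    url = "http://domain.com/appfile.ext?_uid1=999"
    print(extract_uid(url))
    print(extract_uid(url, PORTABLE))
```

If both patterns behave the same here but only one works in swish-e, the difference is more likely in the regex dialect than in the pattern's logic.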
I had been running swish-e-2.5.1-2004-06-24 when I posted my question.
I've used a succession of ExtractPath statements, allowing me to strip
the property down to '?999'. When I try to remove the '?' with

    ExtractPath uid1 remove /?

I get a segmentation fault:

    ./swish-index: line 5: 1503 Segmentation fault

I haven't had any luck quoting or unquoting the '?' -- this gives me a
regex compilation error when I try to spider.
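
One possible explanation for the compilation error is that a bare '?' is a quantifier in most regex dialects and has to be escaped to match literally. The sketch below shows that behaviour in Python. Python's re module stands in for whatever engine swish-e uses (an assumption), compiles and strip_question_mark are hypothetical helper names, and whether swish-e's remove treats its argument as a regex is not confirmed here.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: report whether `pattern` is a valid regex."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def strip_question_mark(value: str) -> str:
    """Hypothetical helper: remove every literal '?' from `value`."""
    # re.escape turns '?' into an escaped form that matches it literally.
    return re.sub(re.escape("?"), "", value)


if __name__ == "__main__":
    print(compiles("?"))      # bare quantifier with nothing to repeat
    print(compiles(r"\?"))    # escaped, matches a literal question mark
    print(strip_question_mark("?999"))
```

This does not explain the segmentation fault, which looks like a separate problem in that build.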
So I backed down to 2.4.2, but my content (a PDF download referenced
by _uid1=999) would not get indexed; user supplied function #1 died with
'Can't locate object method "export_to_level" via package
"MP3::Tag" at /usr/local/lib/swish-e/perl/SWISH/Filter.pm line 662.'
So I tried today's daily, but got failures indexing regular files:
Can't call method "as_string" on an undefined value at
/usr/local/lib/swish-e/spider.pl line 780.
Suggestions welcome.
On Thursday, October 28, 2004, at 12:22 PM, Bill Moseley wrote:
> On Thu, Oct 28, 2004 at 12:18:35PM -0700, Bill Conlon wrote:
>> Given a url such as:
>>
>> http://domain.com/appfile.ext?_uid1=999
>>
>> I want to store the number 999 in the index.
>
> Look at man SWISH-CONFIG. I didn't try this, but something like:
>
> ExtractPath uid1 regex !^.+uid1=(\d+)$!$1!
>
>
>
>>
>> So I set up spider.config:
>>
>> PropertyNames uid1
>> ExtractPath uid1 remove "http://domain.com/appfile.ext"
>> ExtractPath uid1 remove uid1=
>>
>> First ExtractPath gives ?_uid1=999
>> Second ExtractPath gives ?_999
>>
>> How do I remove ?_
>>
>> thx
>>
>>
>
> --
> Bill Moseley
> moseley@hank.org
Received on Thu Oct 28 14:30:17 2004