Sorry, forgot the obvious:
SWISH-E 2.4.3
IndexFile /u/snipped/privdata/live/search/standard.index
IndexReport 0
IndexDir spider.pl
SwishProgParameters default http://www.snipped.com/
MetaNames swishdocpath keywords author description
PropertyNames keywords author description
IndexOnly .html
NoContents .zip .gz .Z .sit .cpt .jpg .jpeg .gif .xbm .au .mov .mpg .mp2 .mp3 .dir .drx .ra .rpm .ram .pdf .ps
IgnoreLimit 90 50
IgnoreWords CVS
On Jul 11, 2008, at 1:11 AM, Jo Rhett wrote:
> So while debugging a different problem I looked at my httpd logs and
> realized something I'd apparently missed before. The swish-e spider
> is looping over the same files dozens and dozens of times, each time
> with different query arguments. Because every link on the site
> carries a query string containing the page it came from and a unique
> id for the visitor (and a dynamic toolbar has links to every page),
> each page is indexed N-1 times, where N is the number of pages on
> the site.
>
> Is there an option to tell the swish spider to ignore the query string
> when considering URLs? I realize that this would be inappropriate
> for many sites, but it is essential for this site, so an option would
> be very useful.
>
> --
> Jo Rhett
> Net Consonance : consonant endings by net philanthropy, open source
> and other randomness
>
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
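To illustrate what the requested option would do: the spider would key its "already seen" check on the URL with the query string removed, so every link to the same page collapses to one fetch. The sketch below is only a model of that idea in Python, not swish-e's spider.pl (which is Perl) and not an existing spider.pl setting. The site layout, the `from`/`uid` argument names, the example.com host, and the helper names `canonical_url` and `simulate_links` are all hypothetical. It assumes, as on this site, that the query string never changes the content of the page.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment so URLs that differ only in
    tracking arguments map to the same key.
    Assumption: the query string never changes the page content."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def simulate_links(n_pages: int, visitor_id: str = "abc123") -> list[str]:
    """Hypothetical site: a toolbar on every page links to every other page,
    and each link carries the referring page plus a visitor id."""
    base = "http://www.example.com"  # placeholder host, not the real site
    links = []
    for src in range(n_pages):
        for dst in range(n_pages):
            if src != dst:
                links.append(f"{base}/page{dst}.html?from=page{src}&uid={visitor_id}")
    return links


if __name__ == "__main__":
    links = simulate_links(5)
    raw = set(links)                              # what a query-sensitive spider sees
    deduped = {canonical_url(u) for u in links}   # what a query-blind spider sees
    print(len(raw), len(deduped))
```

With 5 pages the raw set holds 20 distinct URLs (each page reached through N-1 = 4 different query strings), while the normalized set holds 5, one per page. Stripping the whole query string is deliberately blunt, which is why it would only make sense as an opt-in option for sites like this one.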
--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source
and other randomness
Received on Fri Jul 11 04:16:40 2008