Thanks for your patience. Modifying test_url and adding this line to
spider.config:
my ($filter_sub, $response_sub) = swish_filter();
*has certainly resolved* the multiple error messages and the unreadable indexing
output. Indexing now progresses smoothly UNTIL it stalls (twice now) at
a particular .pdf file, with these final lines of screen output:
vvvvvvvvvvvvvvvv HEADERS for http://localhost:104/Annette/Nehos%20bill%20may.pdf vvvvvvvvvvvvvvvvvvvvv
---- Request ------
GET http://localhost:104/Annette/Nehos%20bill%20may.pdf
Accept-Encoding: gzip, x-gzip, deflate
From: swish(at)not-real.user.failed.to.set.email.invalid
Referer: http://localhost:104/Annette/
User-Agent: swish-e http://swish-e.org/
---- Response ---
Status: 200 OK
Date: Sat, 17 Mar 2012 13:32:15 GMT
Accept-Ranges: bytes
ETag: "3a40ada-303d5-4abc872fbe5f7"
Server: Apache
Content-Length: 197589
Content-Type: application/pdf
Last-Modified: Wed, 31 Aug 2011 07:55:17 GMT
Client-Date: Sat, 17 Mar 2012 13:32:15 GMT
Client-Peer: 127.0.0.1:104
Client-Response-Num: 7
^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +Fetched 4 Cnt: 802 GET http://localhost:104/Annette/Nehos%20bill%20may.pdf 200 OK application/pdf 197589 parent:http://localhost:104/Annette/ depth:4
?Testing 'filter_content' user supplied function #1 'http://localhost:104/Annette/Nehos%20bill%20may.pdf'
Warning: Unknown header line: '/html>Path-Name: http://localhost:104/Annette/Nehos Bill_files/' from program spider.pl
err: External program failed to return required headers Path-Name:
.
So all I end up with is swish_2.index.prop.temp and swish_2.index.temp.
Very disappointing to be almost there, but not quite!
What can I do?
Michael
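As Peter points out further down, FileRules applies only to -S fs, so with spider.pl the exclusion has to go into test_url instead. Below is a rough sketch of a spider.config along those lines, combining a custom test_url with the swish_filter() line Michael added. It assumes spider.pl's usual @servers layout; base_url is lifted from the log, and the image rule and the "log of hours" regex are illustrative guesses, not the configuration actually used in this thread.

```perl
# spider.config: a sketch only. base_url comes from the log in this thread;
# the "log of hours" pattern is an ASSUMPTION about how those zip files are named.
# spider.pl loads this file itself, which is where swish_filter() is defined.
use URI::Escape qw(uri_unescape);

my ($filter_sub, $response_sub) = swish_filter();

@servers = (
    {
        base_url => 'http://localhost:104/Annette/',

        # test_url receives a URI object; return true to follow the URL.
        test_url => sub {
            my ($uri) = @_;
            my $path = uri_unescape($uri->path);    # turn "%20" back into spaces
            return 0 if $path =~ /\.(?:gif|jpe?g|png)$/i;          # skip images (assumed)
            return 0 if $path =~ /log\s+of\s+hours[^\/]*\.zip$/i;  # skip "log of hours" zips
            return 1;
        },

        test_response  => $response_sub,    # response check returned by swish_filter()
        filter_content => $filter_sub,      # filter callback returned by swish_filter()
    },
);

1;
```

Decoding the path first matters because, as the log shows, spaces in file names reach the spider as %20, which a regex written with literal spaces would not match.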
Would it help to run spider.pl and capture its output to a file? Then later
index the documents in that output file.
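A minimal sketch of how a captured file could also help with diagnosis, assuming spider.pl writes swish-e's -S prog layout: header lines such as Path-Name and Content-Length, a blank line, then exactly Content-Length bytes of content, with the next record's headers following immediately. The warning in Michael's log, where "/html>" runs straight into a Path-Name: header, is consistent with one document's content not matching its Content-Length, so walking the file record by record can point at the offending document. The script name check_spider_output.pl and the file name spider_output.txt are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical helper: check_spider_output.pl
# Usage (file names are illustrative):
#   ./spider.pl spider.config > spider_output.txt
#   perl check_spider_output.pl spider_output.txt
use strict;
use warnings;    # needs Perl 5.10+ for the // operator

# Walk a stream of records (headers, blank line, Content-Length bytes of content)
# and return a list of problems. An empty list means every record lined up.
sub check_stream {
    my ($fh) = @_;
    binmode $fh;                       # Content-Length counts bytes, not characters
    my $prev   = 'start of stream';    # last record that parsed cleanly
    my $record = 0;
    while (1) {
        my %h;
        while (defined(my $line = <$fh>)) {
            last if $line =~ /^\r?\n\z/;                  # blank line ends the headers
            if ($line =~ /^([\w-]+):\s*(.*?)\r?\n?\z/) {
                $h{$1} = $2;
            }
            else {
                chomp $line;
                return ("after $prev: unknown header line '$line'");
            }
        }
        last unless %h;                                   # clean end of stream
        $record++;
        my $name = $h{'Path-Name'} // '(no Path-Name)';
        my $want = $h{'Content-Length'};
        return ("record $record ($name): missing or bad Content-Length")
            unless defined $want && $want =~ /^\d+\z/;
        my $got = read($fh, my $body, $want) // 0;
        return ("record $record ($name): Content-Length $want but only $got bytes left")
            if $got != $want;
        $prev = "record $record ($name)";
    }
    return;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my @problems = check_stream($fh);
    close $fh;
    if (@problems) { print "$file: $_\n" for @problems }
    else           { print "$file: all records consistent\n" }
}
```

The first problem printed names the last record that still lined up, which is where to look. If the file checks out, the "index it later" step could be something like swish-e -c web_2.conf -S prog -i stdin < spider_output.txt; reading from stdin this way is described in swish-e's -S prog documentation, but confirm it against your installed version.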
On Sat, Mar 17, 2012 at 8:42 AM, Peter Karman <peter(at)not-real.peknet.com> wrote:
> Dr Michael Daly wrote on 3/16/12 6:55 PM:
> > I have tried multiple combinations to exclude zip files with the words
> > 'log of hours' as part of the file name:
> > --
> > Bad directive on line #40 of file...web_2.conf: FileRules filename regex /^log.hours.\.zip$/i
>
> According to the documentation, FileRules is only available for the -S fs
> feature. Are you trying to use it with the spider?
>
> http://swish-e.org/docs/swish-config.html#filerules
>
> See my other email about modifying test_url in your spider config.
>
> --
> Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
> _______________________________________________
> Users mailing list
> Users(at)not-real.lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
>
--
Bill Moseley
moseley(at)not-real.hank.org
Received on Sat Mar 17 2012 - 14:00:42 GMT