Re: [swish-e] Not working: FileRules filename (regex or contains)

From: Dr Michael Daly
Date: Sun, 18 Mar 2012 00:50:31 +1100 (EST)
Thanks for your patience. Modifying test_url and adding this line to
spider.config:
my ($filter_sub, $response_sub ) = swish_filter();
*has certainly resolved* the multiple error messages and the unreadable
indexing output. Indexing now progresses smoothly until it stalls (twice
now) at a particular .pdf file, with these final lines of screen output:

vvvvvvvvvvvvvvvv HEADERS for
http://localhost:104/Annette/Nehos%20bill%20may.pdf vvvvvvvvvvvvvvvvvvvvv

---- Request ------
GET http://localhost:104/Annette/Nehos%20bill%20may.pdf
Accept-Encoding: gzip, x-gzip, deflate
From: swish(at)not-real.user.failed.to.set.email.invalid
Referer: http://localhost:104/Annette/
User-Agent: swish-e http://swish-e.org/


---- Response ---
Status: 200 OK
Date: Sat, 17 Mar 2012 13:32:15 GMT
Accept-Ranges: bytes
ETag: "3a40ada-303d5-4abc872fbe5f7"
Server: Apache
Content-Length: 197589
Content-Type: application/pdf
Last-Modified: Wed, 31 Aug 2011 07:55:17 GMT
Client-Date: Sat, 17 Mar 2012 13:32:15 GMT
Client-Peer: 127.0.0.1:104
Client-Response-Num: 7

^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^

>> +Fetched 4 Cnt: 802 GET 
http://localhost:104/Annette/Nehos%20bill%20may.pdf  200 OK
application/pdf 197589 parent:http://localhost:104/Annette/ depth:4
?Testing 'filter_content' user supplied function #1
'http://localhost:104/Annette/Nehos%20bill%20may.pdf'

Warning: Unknown header line: '/html>Path-Name:
http://localhost:104/Annette/Nehos Bill_files/' from program spider.pl
err: External program failed to return required headers Path-Name:
.

So I am left with only swish_2.index.prop.temp and swish_2.index.temp.
Very disappointing to be almost there, but not quite!

What can I do?
Michael



Would it help to run spider.pl and capture to a file?  Then later index
the files in that output file.
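
If the spider output is captured to a file as suggested, the capture can also be inspected for the kind of failure shown in the warning above, where swish-e found leftover document text ('/html>') where it expected a Path-Name: header. The following is only a rough sketch, not part of swish-e: it assumes the captured stream is a sequence of records, each with "Name: value" header lines, a blank line, and then exactly Content-Length bytes of content. The function name check_stream and the file name spider.out are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk a captured "-S prog" style stream record by record.
# Returns (number of cleanly parsed records, error message or undef).
sub check_stream {
    my ($fh) = @_;
    my ($count, $previous) = (0, '(start of stream)');

    until (eof $fh) {
        my %header;
        while (defined(my $line = <$fh>)) {
            $line =~ s/\r?\n\z//;
            last if $line eq '';    # a blank line ends the header block
            $line =~ /^([A-Za-z-]+):\s*(.*?)\s*$/
                or return ($count, "unexpected header line '$line' after $previous");
            $header{lc $1} = $2;
        }
        for my $name ('path-name', 'content-length') {
            return ($count, "missing $name after $previous")
                unless defined $header{$name};
        }
        my $want = $header{'content-length'};
        return ($count, "bad Content-Length '$want' for $header{'path-name'}")
            unless $want =~ /^\d+\z/;

        # Skip exactly the number of bytes the record claims to contain.
        my $got = read($fh, my $body, $want);
        return ($count, "short body for $header{'path-name'}")
            unless defined $got && $got == $want;

        $previous = $header{'path-name'};
        $count++;
    }
    return ($count, undef);
}

# Usage (file name is a placeholder): perl check_stream.pl spider.out
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my ($count, $error) = check_stream($fh);
    print "$count record(s) parsed cleanly\n";
    print "Stream out of sync: $error\n" if defined $error;
}
```

The document named in the "after ..." part of the message is the last one that parsed cleanly, so the record following it is the one whose declared length and actual content disagree.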



On Sat, Mar 17, 2012 at 8:42 AM, Peter Karman <peter(at)not-real.peknet.com> wrote:

> Dr Michael Daly wrote on 3/16/12 6:55 PM:
> > I have tried multiple combinations to exclude zip files with the words
> > 'log of hours' as part of the file name:
> > --
> > Bad directive on line #40 of file...web_2.conf:        FileRules filename
> > regex /^log.hours.\.zip$/i
>
> according to the documentation, FileRules is only available for the
> -S fs feature. Are you trying to use it with the spider?
>
>  http://swish-e.org/docs/swish-config.html#filerules
>
> See my other email about modifying test_url in your spider config.
>
> --
> Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
> _______________________________________________
> Users mailing list
> Users(at)not-real.lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
>
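
For readers following the thread, a test_url callback that skips such zip files might look roughly like the fragment below. This is a sketch under assumptions, not the configuration actually used here: the base_url and email values are placeholders, the pattern is a guess at how 'log of hours' appears in the URL path (spaces typically arrive as %20), and the test_response/filter_content wiring follows the swish_filter() line quoted earlier in the thread.

```perl
# Fragment of a spider config file (e.g. spider.config); values are placeholders.
my ($filter_sub, $response_sub) = swish_filter();

@servers = (
    {
        base_url       => 'http://localhost:104/',       # placeholder
        email          => 'swish@domain.invalid',        # placeholder
        test_response  => $response_sub,
        filter_content => $filter_sub,

        # test_url receives a URI object; return false to skip the link.
        # Assumed pattern: "log of hours" with %20 or a space/underscore/
        # hyphen between the words, anywhere in a path that ends in .zip.
        test_url => sub {
            my ($uri) = @_;
            return 0
                if $uri->path =~ /log(?:%20|[ _-])of(?:%20|[ _-])hours.*\.zip$/i;
            return 1;
        },
    },
);

1;
```

Because the check runs in test_url, matching links are never requested at all, which is the spider-side counterpart of what FileRules does for -S fs.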



--
Bill Moseley
moseley(at)not-real.hank.org
Received on Sat Mar 17 2012 - 14:00:42 GMT