On Tue, Jul 05, 2005 at 07:42:57AM -0700, McQuiggin, Kevin wrote:
> My error in writing the URL, I have the correct syntax in the links!
Was the error in the email, or in the file you are indexing? It couldn't be in
your email, since you would have followed these instructions carefully:
http://swish-e.org/docs/install.html#when_posting_please_provide_the_following_information_
and only cut-n-pasted your examples. ;)
So, what's the question? How to use file:// URLs?
moseley@bumby:~/apache$ cat index.html
<html>
<head><title>doc1</title>
</head>
<body>
<a href="file:///home/moseley/apache/test.pdf">testpdf</html>
</body>
</html>
moseley@bumby:~/apache$ SPIDER_DEBUG=url,failed /usr/local/lib/swish-e/spider.pl default file:///home/moseley/apache/index.html >/dev/null
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
-- Starting to spider: file:///home/moseley/apache/index.html --
>> +Fetched 0 Cnt: 1 GET file:///home/moseley/apache/index.html 200 OK text/html 126 parent: depth:0
>> +Fetched 1 Cnt: 2 GET file:///home/moseley/apache/test.pdf 200 OK application/pdf 1636685 parent:file:///home/moseley/apache/index.html depth:1
Summary for: file:///home/moseley/apache/index.html
Connection: Close: 1 (0.5/sec)
Connection: Keep-Alive: 1 (0.5/sec)
Total Bytes: 43,932 (21966.0/sec)
Total Docs: 2 (1.0/sec)
Unique URLs: 2 (1.0/sec)
application/pdf->text/html: 1 (0.5/sec)
text/html: 1 (0.5/sec)
moseley@bumby:~/apache$ FILTER_DEBUG=1 /usr/local/lib/swish-e/spider.pl default file:///home/moseley/apache/index.html >/dev/null
[...]
>> Starting to process new document: application/pdf
++Checking filter [SWISH::Filters::Doc2txt=HASH(0x84e7138)] for application/pdf
++Checking filter [SWISH::Filters::Doc2html=HASH(0x84e65f0)] for application/pdf
++Checking filter [SWISH::Filters::Pdf2HTML=HASH(0x84f2e8c)] for application/pdf
++ application/pdf *WAS* filtered by SWISH::Filters::Pdf2HTML=HASH(0x84f2e8c)
Final Content type for file:///home/moseley/apache/test.pdf is text/html
>Filter SWISH::Filters::Pdf2HTML=HASH(0x84f2e8c) converted from [application/pdf] to [text/html]
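
If you want to check which file:// links a page actually exposes before pointing the spider at it, something like the following may help. It is only a rough Python sketch of the link-following step seen in the debug output above, not how spider.pl (which is Perl) does it; the names LinkCollector and file_links are made up for this example, and the page and base URL are the ones from the session above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL.

    Hypothetical helper for illustration; not part of swish-e.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin keeps absolute URLs and resolves relative ones
                # against the URL of the page they were found in.
                self.links.append(urljoin(self.base_url, value))


def file_links(html, base_url):
    """Return the file:// links found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return [url for url in collector.links if urlparse(url).scheme == "file"]


# The same page as index.html in the session above, copied verbatim.
PAGE = """<html>
<head><title>doc1</title>
</head>
<body>
<a href="file:///home/moseley/apache/test.pdf">testpdf</html>
</body>
</html>
"""

BASE = "file:///home/moseley/apache/index.html"
links = file_links(PAGE, BASE)
print(links)  # ['file:///home/moseley/apache/test.pdf']
```

A relative link such as href="test.pdf" would resolve to the same URL here, because it is joined against the file:// URL of the page it was found in.
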
--
Bill Moseley
moseley@hank.org
Received on Tue Jul 5 11:01:42 2005