On Tue, Jan 06, 2004 at 10:17:06AM -0800, Kaplan, Andrew H. wrote:
> I have set up our webserver such that the swish.cgi page comes up when
> a person wants to retrieve a document. When the text is entered the
> results screen does appear with the appropriate links to the documents
> in question. However, users are unable to access the documents.
Seems like if they can't be accessed then they are not appropriate links.
> The results screen does show the names of the files with their extensions, ie:
> pdf, doc, etc. Immediately under
> the files the word NULL appears in parentheses.
That NULL is in the FAQ. See the swish.cgi docs.
> The information about the file
> including its modification date,
> size, and path also appears. Clicking on the file causes the error screen
> Not Found -- The requested url was not found on this
> to appear.
Well, that's just a web server issue -- you have to make sure the paths
point to the right locations.
You can rewrite the path when indexing (in the swish-e config file)
with ReplaceRules, and you can also prepend text to each path with a
setting in the swish.cgi config file.
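A minimal sketch of the indexing side, assuming the files were indexed by
file system path. The directory /var/www/html and the host www.example.com
are hypothetical placeholders, not taken from your setup, so substitute
whatever your web server really uses:

```
# swish-e config file (sketch; the directory and URL below are hypothetical)
IndexDir /var/www/html/docs

# Rewrite the path stored in the index so that search results link to a
# URL the web server can serve, rather than to a file system path.
ReplaceRules replace "/var/www/html/" "http://www.example.com/"
```

After changing the config you have to reindex for the new paths to show up
in the results. If you would rather leave the index alone, the swish.cgi
docs describe the setting for prepending text to each path at search time.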
> The files that are being indexed are either Adobe pdf, MS-Word doc, MS-Excel
> xls, and htm documents. They all have
> spaces between the words in their titles. The server itself has the catdoc,
> xls2csv, and xpdf programs installed.
Spaces between the words in their "titles"? Or do you mean file names? I suspect you
mean file names. You don't give much detail, so I can't know for sure, but here's
an example of indexing a file with spaces in its name:
Notice that the href is correct:
moseley@bumby:~/apache$ echo "hello" > "file with space.txt"
moseley@bumby:~/apache$ swish-e -i "file with space.txt" -v0
moseley@bumby:~/apache$ GET http://localhost/apache/swish.cgi?query=hello | grep txt
<dt>1 <a href="file%20with%20space.txt">file with space.txt</a> <small>-- rank: <b>1000</b></small></dt>
<tr><td><small>Document Path:</small></td><td><small> <b>file with space.txt</b></small></td></tr>
> What do I need to do to correct this problem? Thanks.
Post something like the few lines above that demonstrates the problem.
Here's another example with spidering:
moseley@bumby:~/apache$ cp test.pdf "test pdf with spaces.pdf"
moseley@bumby:~/apache$ /usr/local/lib/swish-e/spider.pl default http://localhost/apache/test%20pdf%20with%20spaces.pdf | swish-e -S prog -i stdin -v0
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
Summary for: http://localhost/apache/test%20pdf%20with%20spaces.pdf
Total Bytes: 12,593 (12593.0/sec)
Total Docs: 1 (1.0/sec)
Unique URLs: 1 (1.0/sec)
moseley@bumby:~/apache$ GET http://localhost/apache/swish.cgi?query=the | grep pdf
<dt>1 <a href="http://localhost/apache/test%20pdf%20with%20spaces.pdf">http://localhost/apache/test pdf with spaces.pdf</a> <small>-- rank: <b>1000</b></small></dt>
<tr><td><small>Document Path:</small></td><td><small> <b>http://localhost/apache/test pdf with spaces.pdf</b></small></td></tr>
Received on Tue Jan 6 21:55:26 2004