Re: Unable to retrieve documents

From: Stephen Terapak <terapaks(at)not-real.terapak.com>
Date: Wed Jan 07 2004 - 15:07:34 GMT
Did you try rewriting the path in the config file used for indexing, with
this directive:

ReplaceRules replace "/homepages/3/d90791125/htdocs/wsc90791133/amazon/html" "http://www.speedydvds.com"

I could not get to the docs either until I put that in -- it is fully
documented in the directions.
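
To make the effect of that line concrete, here is a rough Python sketch of what a literal "replace" rule does to an indexed path. It assumes the rule is a plain string substitution; the function name apply_replace_rule and the file name index.html are made up for illustration and are not part of swish-e.

```python
# Illustration only: mimics a literal "replace" rule applied to an indexed path.
# Assumption: the rule swaps every literal occurrence of the first string
# for the second. apply_replace_rule is a hypothetical helper, not swish-e code.
DOC_ROOT = "/homepages/3/d90791125/htdocs/wsc90791133/amazon/html"
BASE_URL = "http://www.speedydvds.com"


def apply_replace_rule(path, old=DOC_ROOT, new=BASE_URL):
    """Return path with each literal occurrence of old replaced by new."""
    return path.replace(old, new)


# index.html is a made-up file name used only for this example.
print(apply_replace_rule(DOC_ROOT + "/index.html"))
```

A path that does not contain the filesystem prefix comes back unchanged, so the rule only touches documents under that directory.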

hope that helps.

steve
----- Original Message ----- 
From: "Kaplan, Andrew H." <AHKAPLAN@PARTNERS.ORG>
To: "Multiple recipients of list" <swish-e@sunsite.berkeley.edu>
Sent: Wednesday, January 07, 2004 9:55 AM
Subject: [SWISH-E] Re: Unable to retrieve documents


> I modified the swish.cgi and swish.conf files and I have made some
> progress.
>
> The links no longer have the NULL statement. However, the files are still
> inaccessible. When I check the URL for the file, it indicates the file is
> in the cgi-bin directory when in reality it is in the documentation
> directory.
> The swish.cgi file is located in the cgi-bin directory, and the swish.conf
> file is in the documentation directory.
>
> When I created the index, I was in the documentation directory, and the
> syntax that was used was the following:
> /usr/local/bin/swish-e -c swish.conf -v 3
>
> I've included the two files in this e-mail.
>
> The 'spaces' that I mentioned in the previous e-mail refer to the
> filenames. For example, one file that has been indexed is:
>
> Windows Workstation Environment Variables for IDL.pdf
>
>
>
>
> -----Original Message-----
> From: swish-e@sunsite.berkeley.edu
> [mailto:swish-e@sunsite.berkeley.edu]On Behalf Of Bill Moseley
> Sent: Tuesday, January 06, 2004 4:55 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: Unable to retrieve documents
>
>
> On Tue, Jan 06, 2004 at 10:17:06AM -0800, Kaplan, Andrew H. wrote:
> > I have set up our webserver such that the swish.cgi page comes up when
> > a person wants to retrieve a document.  When the text is entered the
> > results screen does appear with the appropriate links to the documents
> > in question.  However, users are unable to access the documents.
>
> Seems like if they can't be accessed then they are not appropriate
> links.
>
>
> > The results screen does show the names of the files with their
> > extensions, ie: pdf, doc, etc. Immediately under the files the word
> > NULL appears in parentheses.
>
> That NULL is in the FAQ.  See the swish.cgi docs.
>
>
> > The information about the file including its modification date,
> > size, and path also appears. Clicking on the file causes the error
> > screen
> >
> > Not Found -- The requested url was not found on this server
> >
> > to appear.
>
> Well, that's just a web server issue -- you have to make sure the paths
> point to the right locations.
>
> You can rewrite the path when indexing (in the swish-e config file)
> with ReplaceRules, and you can also prepend text to each path with a
> setting in the swish.cgi config file.
>
> > The files that are being indexed are either Adobe pdf, MS-Word doc,
> > MS-Excel xls, and htm documents. They all have spaces between the
> > words in their titles. The server itself has the catdoc, xls2csv, and
> > xpdf programs installed.
>
> Spaces between the words in their "titles"?  Or do you mean file names?
> I suspect you mean file names.  You don't give many details so I can't
> know for sure, but here's an example of indexing files with a space:
>
>
> Notice that the href is correct:
>
> moseley@bumby:~/apache$ echo "hello" >  "file with space.txt"
>
> moseley@bumby:~/apache$ swish-e -i "file with space.txt" -v0
>
> moseley@bumby:~/apache$ GET http://localhost/apache/swish.cgi?query=hello | grep txt
>         <dt>1 <a href="file%20with%20space.txt">file with space.txt</a> <small>-- rank: <b>1000</b></small></dt>
> <tr><td><small>Document Path:</small></td><td><small> <b>file with space.txt</b></small></td></tr>
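
The %20 in the href above is ordinary URL percent-encoding of the spaces in the file name. As a small sketch of that encoding step (not how swish.cgi itself is implemented), Python's standard urllib.parse.quote does the same thing; href_for is a hypothetical helper name used only here.

```python
from urllib.parse import quote


def href_for(filename, base=""):
    """Percent-encode a file name for an href, optionally under a base URL.

    Only the file name is encoded; base is assumed to be a valid URL prefix.
    """
    return base + quote(filename)


# Spaces become %20, while the visible link text keeps the plain name.
print(href_for("file with space.txt"))
print(href_for("test pdf with spaces.pdf", base="http://localhost/apache/"))
```

The link target carries the encoded name, and the text shown to the user stays as the original file name with spaces.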
>
>
> > What do I need to do to correct this problem? Thanks.
>
> Something like the above few lines that demonstrate the problem.
>
> Here's another example with spidering:
>
> moseley@bumby:~/apache$ cp test.pdf "test pdf with spaces.pdf"
>
> moseley@bumby:~/apache$ /usr/local/lib/swish-e/spider.pl default http://localhost/apache/test%20pdf%20with%20spaces.pdf | swish-e -S prog -i stdin -v0
> /usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
>
> Summary for: http://localhost/apache/test%20pdf%20with%20spaces.pdf
> Total Bytes: 12,593  (12593.0/sec)
>  Total Docs:      1  (1.0/sec)
> Unique URLs:      1  (1.0/sec)
>
> moseley@bumby:~/apache$ GET http://localhost/apache/swish.cgi?query=the | grep pdf
>         <dt>1 <a href="http://localhost/apache/test%20pdf%20with%20spaces.pdf">http://localhost/apache/test pdf with spaces.pdf</a> <small>-- rank: <b>1000</b></small></dt>
> <tr><td><small>Document Path:</small></td><td><small> <b>http://localhost/apache/test pdf with spaces.pdf</b></small></td></tr>
>
>
>
>
>
> -- 
> Bill Moseley
> moseley@hank.org
>
>
>
>
> *********************************************************************
> Due to deletion of content types excluded from this list by policy,
> this multipart message was reduced to a single part, and from there
> to a plain text message.
> *********************************************************************
Received on Wed Jan 7 15:07:43 2004