I definitely checked. I've run and re-run the search, changing only the
use_cookies line, and it either works (indexes the PDF fine) or breaks
(as below) depending on whether that line is present.
I've tried adding another PDF, even though I know the original is fine,
and it works or breaks in the same way, depending on that same line.
I don't know what to make of this.
[mailto:email@example.com] On Behalf Of Bill Moseley
Sent: Tuesday, December 06, 2005 4:58 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: duplicate entries in DB after regex performed on
On Tue, Dec 06, 2005 at 04:30:41PM -0500, Chad Day wrote:
> http://dev.website.org/index.php?option=content&task=view&id=5 - Using
> HTML2 parser - (39 words)
> d= - Using HTML2 parser - (33 words)
> Error: May not be a PDF file (continuing anyway)
> Error (0): PDF file is damaged - attempting to reconstruct xref
> Error: Couldn't find trailer dictionary
> Error: Couldn't read xref table
That just looks like a broken pdf. Did you check?
> http://dev.website.org/files/Joomla Quick Start 1.0.pdf - Using HTML2
> parser - (no words indexed)
> d=9 - Using HTML2 parser - (33 words)
> If I remove the use_cookies => 1, line from my spider.conf, it works
> fine and I return to having the issue of the PHPSESSIDs.
My guess is that with cookies you are indexing different files -- or
your site has some kind of problem.
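
One way to test that guess is to save what the server returns for the
PDF URL with and without the session cookie, then look at the first
bytes of each. The sketch below is only an illustration in Python, not
part of spider.pl: the function names are hypothetical, and it assumes
the "May not be a PDF file" error means the "%PDF-" header marker was
not found near the start of the data.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if a PDF header marker appears near the start of the data."""
    # Assumption: checking the first 1024 bytes is enough for this rough test.
    return b"%PDF-" in data[:1024]


def classify_body(data: bytes) -> str:
    """Roughly label a response body as 'pdf', 'html', or 'unknown'."""
    if looks_like_pdf(data):
        return "pdf"
    head = data[:1024].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical sample bodies standing in for the two saved responses.
    print(classify_body(b"%PDF-1.4\n..."))
    print(classify_body(b"<html><body>Login</body></html>"))
```

If the copy fetched with cookies comes back as "html" while the copy
fetched without them is "pdf", the site is serving a different
document to the spider once the cookie is set.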
Received on Wed Dec 7 06:44:42 2005