Re: [swish-e] Swish-e not indexing doc or PDF files

From: William M Conlon <bill(at)not-real.tothept.com>
Date: Wed Feb 27 2008 - 07:21:10 GMT
Are you using xPDF? Adobe keeps changing the PDF file format, and as I
recall, xPDF will not read the latest versions of Adobe PDF documents.
We save all of our PDFs as version 4 or 5, I think.

Bill
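
One way to check the suggestion above before re-indexing is to look at the version a PDF declares in its header. Every PDF starts with a line of the form %PDF-x.y, and a complete file normally ends with an %%EOF marker. The sketch below is only an illustration: the function names are hypothetical, it is not part of Swish-e or xPDF, and it assumes that "version 4 or 5" means Acrobat 4 or 5 compatibility (PDF 1.3 or 1.4). A missing %%EOF is treated here as a hint that the download may be truncated, which is one possible cause of a "PDF file is damaged" message, not a confirmed diagnosis for this thread.

```python
import re
from typing import Optional

# Matches the version in a PDF header such as b"%PDF-1.4".
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the declared PDF version (e.g. "1.4"), or None if no header is found."""
    # The header is normally at byte 0; search the first 1024 bytes to
    # tolerate a little leading junk.
    match = _HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return match.group(1).decode() + "." + match.group(2).decode()


def has_eof_marker(data: bytes) -> bool:
    """Return True if %%EOF appears within the last 1024 bytes."""
    return b"%%EOF" in data[-1024:]


def summarize_pdf(data: bytes) -> str:
    """Build a one-line, human-readable summary of the two checks above."""
    version = pdf_header_version(data)
    if version is None:
        return "no %PDF header found (possibly not a PDF at all)"
    ending = "has %%EOF" if has_eof_marker(data) else "no %%EOF (possibly truncated)"
    return "PDF " + version + ", " + ending
```

To try it on a real file, read the bytes with open(path, "rb").read() and pass them to summarize_pdf. If the reported version is newer than what the installed xPDF supports, re-saving the document at a lower compatibility level, as described above, is worth testing.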


On Feb 26, 2008, at 10:17 PM, Liam Buchanan wrote:

> Hi,
> I just tried indexing a pdf from a url (.cfm page) link on the local
> machine and got these errors:
>
> Accept-Ranges: bytes
> ETag: "0315492fcfec21:58ec"
> Server: Microsoft-IIS/5.0
> Content-Length: 2660012
> Content-Type: application/pdf
> Last-Modified: Thu, 10 Apr 2003 01:00:26 GMT
> Client-Date: Wed, 27 Feb 2008 06:14:24 GMT
> Client-Peer: 127.0.0.1:5865
> Client-Response-Num: 1
> X-Powered-By: ASP.NET
>
> ^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> >> +Fetched 1 Cnt: 2 GET http://172.16.100.241/dsdweb/v3/guis/templates/content/Errmsg.pdf  200 OK application/pdf 2660012 parent:http://172.16.100.241/dsdweb/v3/guis/templates/content/testpage.cfm depth:1
> http://172.16.100.241/dsdweb/v3/guis/templates/content/testpage.cfm - Using HTML2 parser -  (54 words)
> http://172.16.100.241/dsdweb/v3/guis/templates/content/Errmsg.pdf - Using HTML2 parser - Error (0): PDF file is damaged - attempting to reconstruct xref table...
> Error: Top-level pages object is wrong type (null)
> Error: Couldn't read page catalog
>  (no words indexed)
>
> -------------
>
> Can anyone suggest the issue here?
>
> Thanks !!!
>
>
>
> -----Original Message-----
> From: users-bounces@lists.swish-e.org
> [mailto:users-bounces@lists.swish-e.org] On Behalf Of Peter Karman
> Sent: Saturday, 23 February 2008 3:45 AM
> To: Swish-e Users Discussion List
> Subject: Re: [swish-e] Swish-e not indexing doc or PDF files
>
>
>
> On 02/20/2008 07:50 PM, Liam Buchanan wrote:
>> Hi,
>> Thanks for the information.
>> I tried to do a trace but it didn't come up with anything unusual.
>>
>> Below is my spider.pl file conf
>> Please let me know if there is anything in there I am missing or
>> should be taken out. The proxy reference needs to be in there for it
>> to work.
>
>
> I guess I'm not following you. The example you gave works? What is a
> 'trace'?
>
> --
> Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users

Received on Wed Feb 27 02:21:20 2008