Re: [swish-e] XML parsing not returning Title

From: Robinson Craig <Craig.Robinson(at)>
Date: Tue Dec 04 2007 - 01:41:07 GMT
>On 11/28/2007 08:18 AM, Peter Karman wrote:
>> On 11/27/2007 04:56 PM, Robinson Craig wrote:
>>> I've run the same config and files with 2.4.5 (current stable 
>>> release) installed on a DEV box (in readiness for deployment out to 
>>> PROD), with the same result (incidentally with no parsing errors).
>> what indexing method (-S) are you using? Can you paste the exact 
>> command you are using to index?
>nevermind. I see the issue.
>You need to add:
> PropertyNameAlias swishtitle title
>to your config.
>The HTML parser knows about the special '<title>' tagset and uses that 
>for the swishtitle property. The XML parser doesn't know about it. 
>Since you are indexing .pdf files with the HTML parser (is that what 
>you really want?), the .pdf docs get the title magic, but the .html 
>docs (or anything else parsed with the XML parser) needs a little help
>knowing which tag to use as the swishtitle.
>Peter Karman  .  peter(at)  .
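For reference, a config excerpt along the lines Peter describes might look like the sketch below. Only the PropertyNameAlias line comes from his reply; the IndexContents lines that send .pdf files to the HTML parser and .html files to the XML parser are an assumption reconstructed from his description, not the actual config from this thread.

```
# ASSUMPTION: illustrative parser mapping, reconstructed from Peter's description.
# .pdf files are assumed to be converted to HTML by a filter first (not shown).
IndexContents HTML* .pdf
IndexContents XML* .html .htm

# From Peter's reply: tell the XML parser that the <title> element
# supplies the swishtitle property (the HTML parser already does this).
PropertyNameAlias swishtitle title
```

With the alias in place, documents handled by the XML parser should get their swishtitle from the title element, the same way the HTML-parsed .pdf documents already do.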

Hi all,

Further information on this issue: it looks like my problem is a version
issue. I have successfully compiled and installed the latest stable
versions of Swish-e and Xpdf on a Solaris 10 machine (SWISH-E 2.4.5,
pdftotext version 3.02), and it behaves as expected when using the
exact same config file as my current installation (SWISH-E 2.2.3,
pdftotext version 3.01). It looks like I'll have to dig deeper to see
what the differences actually are, but it's a step in the right
direction at least.

Nevertheless, my question still stands: is there a "standard" way of
indexing PDF content and metadata?

Cheers, Craig



Users mailing list
Received on Mon Dec 3 20:41:14 2007