Re: [swish-e] XML parsing not returning Title

From: Robinson Craig <Craig.Robinson(at)>
Date: Tue Dec 04 2007 - 01:41:07 GMT
>On 11/28/2007 08:18 AM, Peter Karman wrote:
>> On 11/27/2007 04:56 PM, Robinson Craig wrote:
>>> I've run the same config and files with 2.4.5 (current stable 
>>> release) installed on a DEV box (in readiness for deployment out to 
>>> PROD), with the same result (incidentally with no parsing errors).
>> what indexing method (-S) are you using? Can you paste the exact 
>> command you are using to index?
>nevermind. I see the issue.
>You need to add:
> PropertyNameAlias swishtitle title
>to your config.
>The HTML parser knows about the special '<title>' tagset and uses that 
>for the swishtitle property. The XML parser doesn't know about it. 
>Since you are indexing .pdf files with the HTML parser (is that what 
>you really want?), the .pdf docs get the title magic, but the .html 
>docs (or anything else parsed with the XML parser) needs a little help
>knowing which tag to use as the swishtitle.
>Peter Karman  .  peter(at)
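
For reference, here is roughly how the quoted suggestion might sit in a
config file. This is only a sketch: the PropertyNameAlias line is taken
from Peter's reply, while the two IndexContents lines are assumptions
reconstructed from his description of which parser handles which
extension, and may not match the actual config.

    # Assumed parser assignments (illustrative, not the real config):
    # .html files go through the XML parser, .pdf output through the
    # HTML parser.
    IndexContents XML .html
    IndexContents HTML .pdf

    # From the reply above: have the XML parser treat <title> as the
    # swishtitle property.
    PropertyNameAlias swishtitle title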

Hi all,

Further information on this issue: it looks like my problem is a version
issue. I have successfully compiled and installed the latest stable
versions of Swish-e and Xpdf on a Solaris 10 machine (SWISH-E 2.4.5,
pdftotext version 3.02), and it behaves as expected when using the
exact same config file as my current installation (SWISH-E 2.2.3,
pdftotext version 3.01). It looks like I'll have to dig deeper to see what
the differences actually are, but it's a step in the right direction at least.

Nevertheless, my question still stands: is there a "standard" way of
indexing PDF content and metadata?

Cheers, Craig


