I would approach this differently.
I would take the text extracted from each PDF and the contents of its
related .bib file and combine them into one virtual XML (or HTML)
document to hand to swish-e via -S prog.
For example:
<bibdoc>
<bibinfo>
<tag>content</tag>
<tagN>contentN</tagN>
</bibinfo>
<doc>
[output of pdftotext here]
</doc>
</bibdoc>
(Use HTML tags instead if you index the documents as HTML.)
Then you could configure each of your BibTeX fields as both a metaname
and a property, and search or retrieve BibTeX info by specific field.
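
A minimal sketch of such a generator, in Python, might look like the
block below. It is an illustration, not a tested swish-e setup: the
function names (parse_bib_fields, build_bibdoc, prog_record) are
hypothetical, the BibTeX parsing is deliberately naive (one
field = "value" or field = {value} per line, no nested braces, no
unquoted values), and the header names follow my reading of the -S prog
documentation, so check them against your swish-e version.

```python
import re
from xml.sax.saxutils import escape

# Naive single-line BibTeX field matcher: name = "value", or name = {value},
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*[{"](.*?)[}"]\s*,?\s*$', re.MULTILINE)


def parse_bib_fields(bib_text):
    """Return a dict of lower-cased field names to values (simple entries only)."""
    return {name.lower(): value for name, value in FIELD_RE.findall(bib_text)}


def build_bibdoc(fields, fulltext):
    """Wrap the BibTeX fields and the pdftotext output in one XML document."""
    info = "\n".join(
        f"<{name}>{escape(value)}</{name}>" for name, value in fields.items()
    )
    return (
        "<bibdoc>\n<bibinfo>\n" + info + "\n</bibinfo>\n"
        "<doc>\n" + escape(fulltext) + "\n</doc>\n</bibdoc>\n"
    )


def prog_record(path, xml):
    """Prefix the document with the headers a -S prog program prints (assumed)."""
    body = xml.encode("utf-8")
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # length in bytes, not characters
        "Document-Type: XML*\n"
        "\n"
    )
    return header.encode("utf-8") + body
```

A driver script would call these for each .bib/.pdf pair (running
pdftotext itself) and write every record to stdout. With the field names
listed under MetaNames and PropertyNames in the config file, a query such
as swish-e -w 'editor=einstein' should then match only the editor field.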
>
> Hi all,
>
> I recently learned about swish-e and have started using it today. The
> problem I am faced with is to search in scientific articles:
>
> - their full text, usually in .pdf format
> - their bibliographic data in BibTeX format
>
> The way I have organized things is to give each article a separate
> .bib file and a .pdf file and create a single index. That way I can just:
>
> swish-e -w "Einstein 2005" | grep -i bib
>
> to find all .bib files with the words 'Einstein 2005' in them. Fine.
> After browsing the documentation I learned that wildcards are only
> supported at the end of a word, and that is rather annoying. Suppose,
> for example, I want to search for those .bib files where this Einstein
> figure is an editor. That is, match lines like:
>
> editor = "Einstein",
>
> in my .bib files. I had hoped that
>
> swish-e -w "editor.*einstein"
>
> would work, but that's obviously not the case. I've browsed the web a
> bit but haven't found a satisfying solution yet. Is anyone here using
> swish-e to index BibTeX data already? Thoughts on how to deal with this
> kind of search? Any help is greatly appreciated.
>
> Cheers,
>
> Bas
>
>
> --
> <Bas.vanGils@cs.ru.nl> - GPG Key ID: 2768A493 -
> http://www.cs.ru.nl/~basvg
> Radboud University Nijmegen, Institute for Computing and Information
> Sciences
>
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu Dec 1 07:23:25 2005