I would approach this a different way.
I would take the content of my PDF and the content of the related .bib
file and create a virtual XML (or HTML) file for handing to swish-e -S,
with each BibTeX field in its own tag and the output of pdftotext as the
body text (use HTML tags if indexing as HTML).
Then you could configure each of your bibtex fields as metanames and
properties and search/retrieve bibtex info by specific field.
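
A rough sketch of what such a feeder could look like, in Python. Everything
named here (parse_bib_fields, build_virtual_xml, prog_record, the <doc> and
<fulltext> tags) is made up for illustration, and the regex only handles
simple one-line fields such as editor = "Einstein", so treat it as a starting
point rather than a real BibTeX parser. The header block follows my
understanding of what swish-e expects from an external program (Path-Name
and Content-Length, then a blank line, then the document); check the docs
for your version.

```python
import re
from xml.sax.saxutils import escape

# Matches simple one-line fields such as:  editor = "Einstein",
# Multi-line values and @string macros are NOT handled (assumption: simple files).
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*["{](.*?)["}]\s*,?\s*$')


def parse_bib_fields(bib_text):
    """Return a dict of lowercased field names to values from a .bib entry."""
    fields = {}
    for line in bib_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def build_virtual_xml(fields, fulltext):
    """Wrap each BibTeX field in its own tag, plus the PDF text in <fulltext>."""
    parts = ["<doc>"]
    for name, value in sorted(fields.items()):
        parts.append("<%s>%s</%s>" % (name, escape(value), name))
    parts.append("<fulltext>%s</fulltext>" % escape(fulltext))
    parts.append("</doc>")
    return "\n".join(parts)


def prog_record(path, xml):
    """Prefix the document with the headers swish-e reads from a feeder program."""
    body = xml.encode("utf-8")
    header = "Path-Name: %s\nContent-Length: %d\nDocument-Type: XML*\n\n" % (
        path,
        len(body),  # length in bytes, not characters
    )
    return header.encode("utf-8") + body
```

The fulltext argument would be whatever "pdftotext article.pdf -" prints. The
script would write one prog_record() per article to standard output, and with
something like "MetaNames editor author year" (and matching PropertyNames) in
the config, a search along the lines of swish-e -w editor=einstein should then
match only the editor field.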
> Hi all,
> I recently learned about swish-e and have started using it today. The
> problem I am faced with is searching scientific articles:
> - their full text, usually in .pdf format
> - their bibliographic data in BibTeX format
> The way I have organized things is to give each article a separate .bib
> and a .pdf file and create a single index. That way I can just:
> swish-e -w "Einstein 2005" | grep -i bib
> to find all .bib files with the words 'Einstein 2005' in them. Fine. After
> browsing the documentation I learned that wildcards are only supported at
> the end of a word, and that is rather annoying. Suppose, for example, I want
> to search those .bib files where this Einstein figure is an editor. That is,
> match lines like:
> editor = "Einstein",
> in my .bib files. I had hoped that
> swish-e -w "editor.*einstein"
> would work, but that's obviously not the case. I've browsed the web a bit
> but haven't found a satisfying solution yet. Is anyone here using swish-e to
> index BibTeX data already? Thoughts on how to deal with this kind of search?
> Any help is greatly appreciated.
> <Bas.vanGils@cs.ru.nl> - GPG Key ID: 2768A493 -
> Radboud University Nijmegen Institute for Computing and Information
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu Dec 1 07:23:25 2005