It's 20MB of TXT files. I'm at home so I can't access
any of them to show you. They're 14 years of press
releases, and all of these text files follow the same
format. The first non-empty line is the author. Further
down, there's a marker like RELEASE: or EDITOR'S NOTE:
along with the document title.
I wrote a Perl script that reads the text file from
STDIN, grabs the author and title, and prints (to STDOUT,
right?) an HTML page with the document title as the HTML
<title> and the author as metadata.
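A minimal sketch of a script along these lines, not the actual
gettxttitle.pl. It assumes the author is the first non-empty line and
that the title is the text following RELEASE: or EDITOR'S NOTE: on the
same line; the real files may differ. The sub names (escape_html,
extract_fields, render_html) are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Return (author, title) for the lines of one press release.
# Assumption: the author is the first non-empty line, and the title is
# whatever follows "RELEASE:" or "EDITOR'S NOTE:" on the same line.
sub extract_fields {
    my @lines = @_;
    my ($author, $title) = ('', '');
    for my $line (@lines) {
        (my $text = $line) =~ s/^\s+|\s+$//g;   # trim both ends
        next if $text eq '';
        $author = $text if $author eq '';
        if ($title eq '' && $text =~ /^(?:RELEASE|EDITOR'S NOTE):\s*(.*)$/) {
            $title = $1;
        }
    }
    return ($author, $title);
}

# Wrap the original text in an HTML page with <title> and author metadata.
sub render_html {
    my ($author, $title, @lines) = @_;
    my $body = escape_html(join '', @lines);
    return join '',
        "<html>\n<head>\n",
        "<title>", escape_html($title), "</title>\n",
        '<meta name="author" content="', escape_html($author), "\">\n",
        "</head>\n<body>\n<pre>\n", $body, "</pre>\n</body>\n</html>\n";
}

# Run only when executed as a script, not when loaded by another file.
unless (caller) {
    my @lines = <>;    # reads files named in @ARGV, or STDIN if none
    print render_html(extract_fields(@lines), @lines);
}
```

Because the input is read through `<>`, the same script works whether
the file arrives on STDIN or is named as a command-line argument.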
My reasons for doing this: the documents need to be
searchable by author, or within one author's
documents. I understand how this is done in the search
CGI script. Also, the CGI results page needs to show
the document's title, not the filename (I think
txt files show the filename.txt).
I figured converting to HTML would be the best way to
achieve this. When I tested my script, I did...
$ perl gettxttitle.pl < 1996-032.txt
and the output was the HTML that I was after. However,
when I plug it into my SWISH-E settings, it hangs on
the first file as if it's taking a long time to
If I need to supply more information, I can provide
more examples at work tomorrow.
Thanks SO much for the help!
--- Bill Moseley <email@example.com> wrote:
> On Mon, Aug 09, 2004 at 07:52:59AM -0700, Alan Ivey
> > How does a SWISH-E FileFilter need to be written?
> > What I mean is, how should I handle the input and output?
> > Is it STDIN and STDOUT or what?
> > I'll post my script if I need to, but it's pretty long
> > and I figured I'd spare all of you. :)
> What are you trying to filter?
> Bill Moseley
Received on Mon Aug 9 12:12:49 2004