Re: Ignore question

From: Gentile, Jeff
Date: Thu Feb 13 2003 - 19:43:14 GMT
Thanks for your response. This is exactly the answer that I was looking for...
The is almost exactly what I would use, slightly modified to process only
my technote directory, and provides me an easy way to handle the file attachments that
I allow, including pdfs nicely enough, and exclude those I can't index, like data captures.
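
A minimal sketch of that kind of attachment filtering, assuming an allow-list of file extensions. The extensions shown and the names `INDEXABLE`, `wanted`, and `technote_files` are hypothetical and not taken from the distribution's example. A PDF that passes the filter would still need converting to text before it is handed to swish.

```python
import os

# Hypothetical allow-list: adjust to the attachment types you actually permit.
# Anything not listed (data captures, for example) is skipped.
INDEXABLE = {".txt", ".html", ".pdf"}


def wanted(path):
    """Return True if the file's extension is on the allow-list."""
    ext = os.path.splitext(path)[1].lower()
    return ext in INDEXABLE


def technote_files(root):
    """Yield every indexable file under root, in a stable order."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if wanted(path):
                yield path
```

An allow-list rather than a deny-list means a new, unknown attachment type is left out of the index until someone decides it belongs there.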


Funny enough, I spent some time before starting this project under its current architecture 
looking at a db, mod_perl, mason, templating, or even PHP. I didn't go these routes for a 
couple reasons... I didn't know perl a month ago, and wanted to learn it for its other 
uses besides cgi/php type applications. Mod_perl and mason seemed like more than I needed 
to learn initially, and would slow my go-live date, although I may end up there in the future. 
I didn't go with templating since it seemed easier for me to pick a style, develop, and 
template later if I wanted to change the interface. The html is quite simple (1 table 
format, and some forms) and only spiffed up by a CSS.

That all being said, I will likely be going to a database before any of these options.
However, right now, believe it or not, the 107 technotes that are applicable to our new
product set that this is for take up a whopping So I guess I have some time 
before I need to look at putting everything in the database, and handling the attachments
in some serialized fashion...

Next phase is to learn GraphViz to get my front-end flowchart dynamic so I can allow
folks to do child add/delete functions...

Again, thanks for your help. It looks like I'm off to the races now.


Re: Ignore question

On Thu, 13 Feb 2003, Gentile, Jeff wrote:

> I am using SWISH to search a knowledge base (read: text files) for my
> support department that has a cgi/perl front end...  all html is
> within the script.

As this is your first post, you should know that I ramble on about both
on-topic and off-topic things. So...

Think about moving to something like apache/mod_perl + mason (if you think
page-centric is the way to go) or + Template-Toolkit (which I feel is more
data / code driven).  Or even PHP for quick development and much faster
processing than perl/CGI.

> The main page is a image-mapped flow chart, each
> box leading to a "leaf" page pertaining to that (sub)category. Each
> leaf has various description fields that are associated with the
> category by filename.

Also sounds like you need a real database instead.

> Question:
> I am trying to get SWISH to Ignore the header (first 5 lines) of the
> tech notes. However, even if there was a feature that was the reverse
> of "TruncateDocSize" to allow me to skip the first 5 lines, that
> wouldn't work, because of the "description" files that do not have
> this header and are associated by name.

Take a look at the prog-bin directory in the distribution.  What you do is
write a simple program that reads and parses your tech notes and only
passes to swish the data you want indexed.  prog-bin/ is a very
simple example.
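
A rough sketch of such a program, written in Python for illustration and not the script shipped in prog-bin. It assumes swish's `-S prog` input format as I recall it: `Path-Name`, `Content-Length`, and `Last-Mtime` headers, an optional `Document-Type`, a blank line, then the content. Check the header names against the documentation for your version. The `.desc` suffix used to recognize description files is a hypothetical convention, as are all the function names.

```python
import os
import sys

HEADER_LINES = 5  # the tech notes carry a five-line header we do not want indexed


def strip_header(text, header_lines=HEADER_LINES):
    """Return text with its first header_lines lines removed."""
    return "".join(text.splitlines(keepends=True)[header_lines:])


def is_description(path):
    # Hypothetical convention: description files end in ".desc" and have no header.
    return path.endswith(".desc")


def swish_record(path, body, mtime):
    """Build one document record: headers, a blank line, then the content."""
    data = body.encode("utf-8")
    head = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(data)}\n"  # length in bytes, not characters
        f"Last-Mtime: {int(mtime)}\n"
        "Document-Type: TXT*\n"
        "\n"
    )
    return head.encode("utf-8") + data


def main(root):
    """Walk root and write one record per file to standard output."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            if not is_description(path):
                text = strip_header(text)
            record = swish_record(path, text, os.path.getmtime(path))
            sys.stdout.buffer.write(record)
```

To use it as a script, call `main()` with the tech note directory and point swish at the script with the prog input method. Because the header is stripped before swish ever sees the text, no TruncateDocSize-style option is needed, and description files pass through untouched.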

Back to my off-topic comments, if you were using a templating system like
Template-Toolkit that separates the code from the output you would already
have (or maybe you do) a module that fetches and parses the notes into a
nice perl data structure.  That module could be used with your
presentation layer (using Template-Toolkit) to generate HTML, or text, or
a form for editing, or by a small program to use while indexing with

Bill Moseley
