IMO the Filter directives should not include a content type.
There should be a separate directive for that.
(Please see the thread dated 2000-07-18.)
>IMO we need a Conf directive, like:
> ContentType .php3$ HTML
> ContentType .html$ HTML
> ContentType .html. HTML
> ContentType .txt$ TEXT
> ContentType .pdf$ TEXT (returned by filter)
> ContentType .xml XML
>
>
>Also, the reverse configuration would be possible (maybe better):
>
> NoContents .avi .mpeg .wav .some-junk # only path will be stored...
> IndexContents HTML .html .htm .shtml .htm. .html. .shtml. # index as HTML
> IndexContents XML .xml
> IndexContents WAP .wap .wml
> IndexContents TXT .txt .txt.
> IndexContents TXT .pdf .poc .dot .xls # (filters are returning TXT)
>
> FileFilter .doc doc-filter.sh
> FileFilter .dot doc-filter.sh
> FileFilter .pdf pdf-filter.sh
> FileFilter .xls xls-filter.sh
>
>This would make "IndexOnly" obsolete and would result in a redesign of
> the index/parser engine... (would be a major change...). But if this
> is done in a modular design, new parser engines could be installed
> in the future. So it could be easy to decide to
> add a new parser engine (e.g. for WAP files) or to handle this via
> external filters.
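
To make the quoted proposal concrete, here is a rough sketch of how such directives could be read into lookup tables. It is written in Python only for brevity; parse_config() and plan_for() are made-up names, not SWISH-E functions, and treating unknown extensions as "path only" is my own assumption. Patterns with a trailing dot (such as ".html.") are not handled here.

```python
# Hypothetical sketch: none of these function names exist in SWISH-E.
CONFIG = """
NoContents .avi .mpeg .wav            # only the path will be stored
IndexContents HTML .html .htm .shtml
IndexContents XML .xml
IndexContents TXT .txt .pdf .doc      # filters are returning TXT
FileFilter .doc doc-filter.sh
FileFilter .pdf pdf-filter.sh
"""


def parse_config(text):
    """Build extension -> content type and extension -> filter tables."""
    content_types = {}  # ".html" -> "HTML"; None means "store path only"
    filters = {}        # ".pdf" -> "pdf-filter.sh"
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        directive, *args = line.split()
        if directive == "NoContents":
            for ext in args:
                content_types[ext] = None
        elif directive == "IndexContents":
            doc_type, *exts = args
            for ext in exts:
                content_types[ext] = doc_type
        elif directive == "FileFilter":
            ext, program = args
            filters[ext] = program
        else:
            raise ValueError(f"unknown directive: {directive}")
    return content_types, filters


def plan_for(path, content_types, filters):
    """Return (content type or None, filter program or None) for a path."""
    for ext, doc_type in content_types.items():
        if path.endswith(ext):
            return doc_type, filters.get(ext)
    return None, None  # assumption: unknown extensions are stored path-only
```

With this layout the content type and the filter are looked up independently, so a filter line never has to say what it returns.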
-----Original Message-----
From: jmruiz@boe.es [mailto:jmruiz@boe.es]
Sent: Monday, August 21, 2000 10:22 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: ishtml()
Hi David
On 18 Aug 2000, at 19:57, David Norris wrote:
> I think ishtml() might qualify as a bug. It doesn't seem to help
> anything. Do you see any problems with assuming everything to be HTML?
> No one seems to mention whether they think it is good or bad. As
> SWISH-E becomes more powerful I think assuming plain text is very
> limiting.
>
I totally agree. I am thinking of adding more directives to the config
file. One of them could be:
DefaultFileType Value
Possible values are: txt, html, xml, wap ...
If Value is html, ishtml() can always return 1.
To maintain backwards compatibility, the default value should be txt.
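
A minimal sketch of the DefaultFileType idea, in Python for brevity rather than in SWISH-E's own C. read_default_file_type() is a made-up helper, and the extension check in the fallback branch is only an assumption about what ishtml() does today.

```python
# Hypothetical sketch of the proposed behaviour, not SWISH-E's actual code.

def read_default_file_type(config_lines):
    """Return the DefaultFileType value, or "txt" when the directive is absent."""
    default = "txt"  # backwards-compatible default
    for line in config_lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] == "DefaultFileType":
            default = parts[1].lower()
    return default


def ishtml(filename, default_file_type):
    """With an html default, every file is treated as HTML."""
    if default_file_type == "html":
        return 1
    # Assumption: otherwise decide from the file extension.
    return 1 if filename.lower().endswith((".html", ".htm", ".shtml")) else 0
```

An index built with "DefaultFileType html" would then parse every file as HTML, while configs without the directive keep the current text behaviour.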
> For example, all filters are assumed to be text. I am using many
> filters which return HTML.
>
In the same way, we can extend FileFilter to:
FileFilter <file-ext> <filter-program> <file-type>
If no file-type is given, then DefaultFileType should be used.
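
As a sketch, parsing the extended FileFilter line with that fallback could look like this (Python for brevity; parse_file_filter() is a made-up name, and the third field is treated as optional, as described above):

```python
# Hypothetical sketch; parse_file_filter() is not part of SWISH-E.

def parse_file_filter(line, default_file_type="txt"):
    """Parse 'FileFilter <file-ext> <filter-program> [<file-type>]'."""
    parts = line.split()
    if len(parts) not in (3, 4) or parts[0] != "FileFilter":
        raise ValueError(f"not a valid FileFilter line: {line!r}")
    ext, program = parts[1], parts[2]
    # Fall back to DefaultFileType when no file-type is given.
    file_type = parts[3].lower() if len(parts) == 4 else default_file_type
    return ext, program, file_type
```

Existing two-argument FileFilter lines keep working, and a filter that returns HTML only needs "html" appended to its line.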
>
> I plan to spend some time on the stemmer.c and soundex.c this weekend.
> I have been busy during the week.
Good luck.
cu
Jose
Received on Mon Aug 21 04:49:44 2000