Bill Moseley wrote:
>
> So, if you are getting that then maybe your version of the filter
> is not looking at the correct content type?
>
> Did you try running
>
> swish-filter-test -verbose http://local.dev.port.com/pdf/real_ccr.pdf
>
> Interesting. Filters can get disabled if they abort (by calling die).
> In Filter.pm it does this:
>
> That traps an exception in the individual filter. Are you seeing that
> warning? It would give an error message. And then after that point
> the filter would not be used.
>
> If that's what is happening then that error message would be very
> helpful.
I've run the index twice (with the max file size bumped down to 1MB) and
the failure message appeared in the same spot both times.
http://local.dev.port.com/portnyou/agendas/publ040826.asp
- Using HTML2 parser -
-Skipped http://local.dev.port.com/pdf/audi_shee_040722.pdf
due to 'filter_content' user supplied function #1 death
'Skipping http://local.dev.port.com/pdf/audi_shee_040722.pdf
due to content type: application/pdf may be binary'
I've run swish-filter-test against the first PDF to fail with "death"
and against the PDF that was filtered just before the first failure.
Both filtered successfully.
I did not find any error messages regarding the filter being disabled.
You can test the above PDF by using test.portofoakland.com instead of
local.dev.port.com.
P.S. I'm still unable to get the Descriptions to work for non-PDF pages.
I've spidered the site with PDF filtering off via the test_url option
and I still can't get the descriptions to appear. There must be something
odd about our HTML pages that trips up the indexer.
e.g.
bat file:
"C:\Program Files\SWISH-E\swish-e.exe" -S prog -v 3 -c "C:\Program Files\SWISH-E\indexes\Port\port.config" -f "C:\Program Files\SWISH-E\indexes\Port\index.swish-e"
port.config:
DefaultContents HTML*
StoreDescription HTML* <body> 320
IndexDir perl.exe
TmpDir "C:\\Progra~1\\SWISH-E\\indexes\\Tmp\\"
SwishProgParameters "C:\\Progra~1\\SWISH-E\\lib\\swish-e\\spider.pl" default "http://local.dev.port.com"
ReplaceRules remove http://local.dev.port.com
You can run this against test.portofoakland.com after dumbing down
the test_url to skip PDFs, then run a search against the created index
file. I still get no descriptions.
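For anyone reproducing this, here is roughly what I mean by dumbing down
test_url. It is only a sketch, not my exact file: it assumes a spider
config file (the name SwishSpiderConfig.pl is hypothetical here) that is
passed to spider.pl in place of "default", and the email value is a
placeholder.

```perl
# SwishSpiderConfig.pl (hypothetical name): passed to spider.pl instead of "default"
@servers = (
    {
        base_url => 'http://test.portofoakland.com/',
        email    => 'swish@example.com',    # placeholder address

        # Called for each URL before it is fetched.
        # Return true to fetch the URL, false to skip it.
        test_url => sub {
            my ( $uri, $server ) = @_;
            return $uri->path !~ /\.pdf$/i;    # skip anything ending in .pdf
        },
    },
);

1;
```

With a file like this, the last two SwishProgParameters values (default and
the URL) would be replaced by the path to the config file.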
Received on Fri Sep 24 13:39:23 2004