Hi Klaus,
At 02:03 AM 12/19/2001 -0800, Klaus Hollenbach wrote:
>Is it somehow possible to make swishe not index php-code in a html-file?
>I switched off the IndexComments directive, but the php-code seems still
>be indexed.
"not index"? What version are you running? 2.1-dev, I hope.
It's more helpful if you post examples. I can't get swish to index either
PHP or ASP code. Wouldn't you rather spider the site, so that ASP/PHP fills in
your pages and you index what people actually see?
Anyway:
> cat 1.php
<html><head><title>PHP Test</title></head>
<body>
<?php echo "Hello World<p>"; ?>
</body></html>
> cat c
Defaultcontents HTML
> ./swish-e -c c -i 1.php -v 0 -T indexed_words
Indexing Data Source: "File-System"
Adding:[1:swishdefault(1)] 'php' Pos:1 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'test' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Indexing done!
> cat c
Defaultcontents HTML2
ParserWarnLevel 9
> ./swish-e -c c -i 1.php -v 0 -T indexed_words
Indexing Data Source: "File-System"
Adding:[1:swishdefault(1)] 'php' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'test' Pos:3 Stuct:0x7 ( HEAD TITLE FILE )
1.php:3: error: htmlParseStartTag: invalid element name
<?php echo "Hello World<p>"; ?>
^
Indexing done!
>
>Is there a similar directive that avoids indexing of non-html-code? E.g.
>everything that is enclosed within asp-style tags
>( <% ... %> ) ?
> cat 1.html
<title>titleword</title>
foo
<% comment %>
> cat c
Defaultcontents HTML
> ./swish-e -c c -i 1.html -v 0 -T indexed_words
Indexing Data Source: "File-System"
Adding:[1:swishdefault(1)] 'titleword' Pos:1 Stuct:0x3 ( TITLE FILE )
Adding:[1:swishdefault(1)] 'foo' Pos:2 Stuct:0x1 ( FILE )
Indexing done!
> cat c
Defaultcontents HTML
IndexComments yes
> ./swish-e -c c -i 1.html -v 0 -T indexed_words
Indexing Data Source: "File-System"
Adding:[1:swishdefault(1)] 'titleword' Pos:1 Stuct:0x3 ( TITLE FILE )
Adding:[1:swishdefault(1)] 'foo' Pos:2 Stuct:0x1 ( FILE )
Indexing done!
> cat c
Defaultcontents HTML2
> ./swish-e -c c -i 1.html -v 0 -T indexed_words
Indexing Data Source: "File-System"
Adding:[1:swishdefault(1)] 'titleword' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'foo' Pos:5 Stuct:0x9 ( BODY FILE )
Indexing done!
> cat c
Defaultcontents HTML2
IndexComments yes
> ./swish-e -c c -i 1.html -v 0 -T indexed_words
Indexing Data Source: "File-System"
Adding:[1:swishdefault(1)] 'titleword' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'foo' Pos:5 Stuct:0x9 ( BODY FILE )
Indexing done!
Now enable libxml2 parser warnings:
> cat c
Defaultcontents HTML2
ParserWarnLevel 9
IndexComments yes
> ./swish-e -c c -i 1.html -v 0 -T indexed_words
Indexing Data Source: "File-System"
Adding:[1:swishdefault(1)] 'titleword' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
1.html:3: error: htmlParseStartTag: invalid element name
<% comment %>
^
Adding:[1:swishdefault(1)] 'foo' Pos:5 Stuct:0x9 ( BODY FILE )
Indexing done!
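If the parser warnings bother you, or you want to be certain no server-side code ever reaches the indexer, one option is to strip those blocks from the files before indexing. The following is only a rough sketch in Python, not anything swish-e provides: the function name strip_server_code and the regular expression are my own illustration. It assumes every block is properly closed and that "?>" or "%>" never appears inside a string literal in the code.

```python
import re

# Hypothetical helper, not part of swish-e.
# Matches PHP-style <? ... ?> and ASP-style <% ... %> blocks,
# including blocks that span several lines.
SERVER_BLOCK = re.compile(r"<\?.*?\?>|<%.*?%>", re.DOTALL)


def strip_server_code(html: str) -> str:
    """Remove server-side blocks but keep their newlines,
    so line numbers in later parser messages still match the file."""
    def keep_newlines(match: re.Match) -> str:
        return "\n" * match.group(0).count("\n")

    return SERVER_BLOCK.sub(keep_newlines, html)


if __name__ == "__main__":
    page = "<title>titleword</title>\nfoo\n<% comment %>\n"
    print(strip_server_code(page))
```

An unterminated block (for example a PHP file that never closes its last "<?php") is left untouched by this sketch, so check for that case if your files use it.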
Bill Moseley
mailto:moseley@hank.org
Received on Wed Dec 19 14:14:24 2001