On Mon, May 19, 2008 at 03:55:15PM +0530, Saubhagya Srivastava wrote:
> I suppose the http://validator.w3.org/. validates for XHTML, my HTML files
> can easily open in browser and the tags are intact. The
> http://validator.w3.org/.gives an error even for many sites which are
> perfectly running.
Browsers are forgiving. That doesn't mean your markup is right.
> 1. Output not readable, do we have some program to read that output?
> 2. Cannot index website, says : No such file or directory .
That means you are not telling it a valid path.
> If possible send me a complete example.
You already have a complete example. Did you check the examples in
I believe the INSTALL document has a few examples. The list archives
have many, I'm sure.
> PROBLEM DESCRIPTION is as follows :
> C:\SWISH-E\bin>swish-e.exe -c Config_http.txt
> Indexing Data Source: "File-System"
> Indexing "http://www.download.com"
> Warning: Invalid path 'http://www.download.com': No such file or directory
You need to read the documentation again. You are telling Swish to
index a file called http://www.download.com. You don't have a *file*
called that on your computer.
> Another config.txt is attached that indexes from html files but it gives
> some tags errors like " ; " is expected at some places in html file, whereas
> the html file is perfect and opens in browser. Following is the output:
The html file is not perfect because it's giving you those messages:
> Guide.html - Using DEFAULT (HTML2) parser - D:HTML_files/Guide.html:1374:
> error: htmlParseEntityRef: expecting ';'
> write << SettingsVersion << volume <<
&lt without the ";" is not a valid entity.
But that's not preventing swish from indexing (you will just end up
with a word spelled "lt" in your index).
I'd recommend validating and fixing your html. But if you don't feel
able then you can change the logging level to suppress that error.
See "ParserWarnLevel" in SWISH-CONFIG.
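
If you want to find the offending spots yourself before re-indexing,
something like the following Python sketch might help. It is not part
of Swish-e; the function name find_bare_entities and the regular
expression are my own assumptions, and it only looks for named
entities such as &lt that are missing the closing ";". It will also
flag a raw "&" in URLs (e.g. "?a=1&b=2"), which is usually worth
fixing too.

```python
import re

# Matches "&name" that is not followed by ";" (for example "&lt" instead of "&lt;").
# Assumption: only named entities are checked; numeric ones like "&#60" are ignored.
BARE_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9;])")


def find_bare_entities(text):
    """Return a list of (line_number, entity_name) for each '&name' lacking ';'."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BARE_ENTITY.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Small illustration with made-up input; for a real file, read its contents
# into a string and pass that string to find_bare_entities() instead.
sample = "fine: &lt; and &amp;\nwrite &lt&lt SettingsVersion\n"
for lineno, name in find_bare_entities(sample):
    print(f"line {lineno}: '&{name}' has no closing ';'")
```

With the sample text above, only the second line is reported, once for
each "&lt" that lacks its ";".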
> *In this case the output generated is not readable; it has something like
> Japanese characters. This is the main problem.*
What output? Maybe your terminal's charset does not match the output
Received on Mon May 19 18:27:59 2008