The Texinfo software produces invalid HTML documents. The following gives some detail.
In around , I created an HTML version of the GNU Emacs Lisp Reference Manual for my website, with corrected HTML and cleaned-up CSS, so that the pages are valid HTML documents and the CSS is handcrafted for better online presentation.
While creating this cleaned-up HTML version, I found several problems in the HTML that the texinfo software outputs. Here's a summary. I hope others who want to convert GNU docs to valid HTML might benefit, or that texinfo developers might fix these problems.
Problems with texinfo generated HTML, with respect to HTML 4 transitional:
<p><hr></div>, which is invalid.
<meta http-equiv="refresh" content="0; url=Autoload.html#autoload%20cookie">. The file needs to have a proper DTD declaration, header tags, etc. A large number of these files have names that start with “Definition-of-”.
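As a rough illustration of the redirect problem, here is a minimal Python sketch that wraps a bare meta-refresh stub in a complete HTML 4.01 Transitional document. The function name `wrap_redirect`, the page title, and the fallback link in the body are my own assumptions for illustration, not anything texinfo provides.

```python
import re

# Matches a meta-refresh tag; captures the delay and the target after "url=".
META_REFRESH = re.compile(
    r'<meta\s+http-equiv="refresh"\s+content="(\d+);\s*url=([^"]+)"\s*/?>',
    re.IGNORECASE,
)

DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
           '"http://www.w3.org/TR/html4/loose.dtd">')


def wrap_redirect(stub):
    """Return a complete HTML 4.01 Transitional page around a meta-refresh stub."""
    match = META_REFRESH.search(stub)
    if match is None:
        raise ValueError("no meta refresh tag found")
    delay, target = match.groups()
    return "\n".join([
        DOCTYPE,
        "<html>",
        "<head>",
        f'<meta http-equiv="refresh" content="{delay}; url={target}">',
        "<title>Redirect</title>",  # a title is required in HTML 4
        "</head>",
        "<body>",
        # Fallback link for browsers that ignore the refresh (an addition of this sketch).
        f'<p><a href="{target}">{target}</a></p>',
        "</body>",
        "</html>",
    ])
```

Calling `wrap_redirect` on the stub shown above yields a page that starts with the DTD declaration and keeps the same redirect target.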
Problems with respect to HTML4 strict:
<ol type=1 start=1> should just be <ol>.
<ol type=1 start=0> is not supported in HTML 4 strict. Starting the counter at 0 is a bit complicated in HTML 4 strict with CSS, and not well supported by browsers. Best to just use <ol start="0"> and use the HTML 4 loose DTD.
</p></blockquote> but missing an opening <p>.
blockquote should contain block-level tags. For example, <blockquote><p>…</p></blockquote> is valid, but <blockquote>…</blockquote> is not valid. In the elisp manual, this is violated often in places where there's the string <b>Common Lisp note:</b>.
</dl></p></blockquote>, which is invalid. (about 100 files)
<div align="center">. In HTML 4 strict, there is no “align” attribute.
<table summary=""><tr align="left"><td valign="top" width="5%"></td><td valign="top" width="30%">
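Some of the items above are mechanical enough to fix with search and replace. The following Python sketch handles the two <ol> forms and <div align="center">. The function name `fix_strict_markup` is hypothetical, and replacing align="center" with a text-align style is only roughly equivalent, so treat this as a starting point under those assumptions.

```python
import re

# Each entry is (pattern, replacement). Patterns accept quoted or unquoted values.
FIXES = [
    # type=1 start=1 is the default numbering, so a bare <ol> is equivalent.
    (re.compile(r'<ol\s+type="?1"?\s+start="?1"?\s*>', re.IGNORECASE), "<ol>"),
    # Keep the zero start, drop the redundant type. Still needs the loose DTD.
    (re.compile(r'<ol\s+type="?1"?\s+start="?0"?\s*>', re.IGNORECASE),
     '<ol start="0">'),
    # Move centering from the "align" attribute into a style attribute.
    (re.compile(r'<div\s+align="?center"?\s*>', re.IGNORECASE),
     '<div style="text-align: center">'),
]


def fix_strict_markup(html_text):
    """Apply each substitution in turn and return the cleaned markup."""
    for pattern, replacement in FIXES:
        html_text = pattern.sub(replacement, html_text)
    return html_text
```

The mismatched </p> and bare blockquote cases are not covered here, since fixing them safely needs a real parser rather than regular expressions.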
General HTML issues
<div class="block"> with CSS
In the elisp manual (one node per HTML page, roughly 850 HTML pages), there are 70 (local) links to other GNU documents. The local links are nice in that they provide cross-reference, but if one hosts only the elisp doc, all these local links will be dead.
Therefore, it would be nice to have a way, perhaps at the texinfo level, to mark links that cross-reference external docs, or perhaps at the HTML conversion level, an option to filter local links, so that local links can be replaced with non-links (such as “See Emacs manual node on Abbrev”) or with full http links to the right URI at gnu.org.
The vast majority of the 70 local links in the elisp doc are references to the Emacs doc, but there are 6 that refer to widget, ses, cl, libc.
I presume that people who wish to host a GNU doc do not want to host the entire set of GNU's documentation.
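The link filter suggested above could look something like this Python sketch. It assumes cross-manual links have the shape <a href="../MANUAL/NODE.html#ANCHOR">, which you should check against your own output. The `MANUAL_BASE` table, its single URL entry, and the function name `rewrite_cross_links` are illustrative assumptions, not part of texinfo.

```python
import re

# Assumed mapping from manual directory name to an online base URL.
# The entry below is illustrative; verify it before relying on it.
MANUAL_BASE = {
    "emacs": "https://www.gnu.org/software/emacs/manual/html_node/emacs/",
}

# Assumed shape of a cross-manual link: <a href="../MANUAL/NODE.html#ANCHOR">TEXT</a>
CROSS_LINK = re.compile(
    r'<a\s+href="\.\./([^/"]+)/([^"]+)">(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_cross_links(html_text):
    """Turn local cross-manual links into full links or plain text."""
    def replace(match):
        manual, target, text = match.groups()
        base = MANUAL_BASE.get(manual)
        if base is None:
            # No known online home: keep the text, drop the dead link.
            return f"{text} (in the {manual} manual)"
        return f'<a href="{base}{target}">{text}</a>'

    return CROSS_LINK.sub(replace, html_text)
```

Links within the same manual have no leading "../" and are left untouched.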
Also, texinfo still uses the convention of backtick ` and straight single quote ' to emulate curly quotes “ ” or ‘ ’. It also uses other ASCII kludges, such as => instead of ⇒. The ability to display these characters has been widely available on commercial platforms since the mid 1990s, and on Linux since about 2003 or so (emacs itself has supported Unicode to a practical degree since emacs 21, released in 2001). It is perhaps time to update the GNU doc convention to UTF-8 and use the proper characters.
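For anyone who wants to make this substitution on existing text, here is a small Python sketch. The function name `modernize` is hypothetical, and the patterns assume the quoted text contains no backtick, straight quote, or newline.

```python
import re

# ``text'' becomes “text”; handled first so the single-quote rule
# does not match inside a double-quote pair.
DOUBLE_QUOTE = re.compile(r"``([^`'\n]+)''")
# `text' becomes ‘text’.
SINGLE_QUOTE = re.compile(r"`([^`'\n]+)'")


def modernize(text):
    """Replace ASCII quote emulation and => with Unicode characters."""
    text = DOUBLE_QUOTE.sub(r"“\1”", text)
    text = SINGLE_QUOTE.sub(r"‘\1’", text)
    return text.replace("=>", "⇒")
```

This should be applied to prose only. Run over code samples, it could corrupt Lisp backquote syntax.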