Programing: GNU Texinfo Problems; Invalid HTML
The Texinfo software produces invalid HTML documents. The following gives some detail.
Around , I created an HTML version of the GNU Emacs Lisp Reference Manual for my website, with corrections to the HTML and cleaned-up CSS, so that the pages are valid HTML documents and the CSS is handcrafted for better online presentation.
The version can be seen here: GNU Emacs Lisp Reference Manual. The story of my motivation is documented here (warning: rant): A Record of Frustration in IT Industry.
In the process of creating this cleaned-up HTML version, I found several problems in the HTML generated by the texinfo software. Here's a summary. I hope others who want to convert GNU docs to valid HTML will benefit, or that the texinfo developers will fix these problems.
Problems with texinfo-generated HTML, with respect to HTML 4 Transitional:
- There's no DOCTYPE declaration.
- When there's a footnote, it is generated as <p><hr></div>, which is invalid.
- There are about 44 files whose sole content is a meta redirect like this: <meta http-equiv="refresh" content="0; url=Autoload.html#autoload%20cookie">. These files need the proper DTD declaration, header tags, etc. Many of these files have names starting with “Definition-of-”.
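The DOCTYPE and redirect-file problems are mechanical enough to patch after the fact. Below is a minimal Python sketch, assuming each page is handled as a string: add_doctype prepends the W3C HTML 4.01 Transitional DOCTYPE when a page lacks one, and wrap_redirect turns a bare meta-refresh file into a complete document. The function names, the placeholder title, and the UTF-8 charset line are my own choices for illustration, not anything texinfo provides. The footnote markup is not handled here.

```python
import re

# HTML 4.01 Transitional DOCTYPE, as published by the W3C.
DOCTYPE_TRANSITIONAL = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '  "http://www.w3.org/TR/html4/loose.dtd">'
)

# Captures the URL inside content="0; url=..." of a meta refresh tag.
REFRESH_URL = re.compile(r'content="\d+;\s*url=([^"]+)"', re.IGNORECASE)


def add_doctype(html: str) -> str:
    """Prepend the Transitional DOCTYPE if the page has none."""
    if html.lstrip().upper().startswith("<!DOCTYPE"):
        return html
    return DOCTYPE_TRANSITIONAL + "\n" + html


def wrap_redirect(meta_tag: str, title: str = "Redirect") -> str:
    """Turn a bare meta-refresh file into a complete HTML 4.01 document.

    HTML 4 requires a TITLE inside HEAD, so one is supplied (the default
    text is a placeholder). A plain link goes in the body for user agents
    that ignore meta refresh.
    """
    match = REFRESH_URL.search(meta_tag)
    if match is None:
        raise ValueError("no meta refresh URL found")
    url = match.group(1)
    return "\n".join([
        DOCTYPE_TRANSITIONAL,
        "<html>",
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
        f"<title>{title}</title>",
        meta_tag.strip(),
        "</head>",
        "<body>",
        f'<p><a href="{url}">{url}</a></p>',
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    bare = '<meta http-equiv="refresh" content="0; url=Autoload.html#autoload%20cookie">'
    print(wrap_redirect(bare))
```

For the “Definition-of-” pages, a natural title would be the target node's name, but that has to be chosen per file.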
Problems with respect to HTML 4 Strict:
- <ol type=1 start=1> should just be <ol>.
- <ol type=1 start=0> is not supported in HTML 4 Strict. Starting the counter at 0 is somewhat complicated in HTML 4 Strict with CSS, and not well supported by browsers. It's best to just use <ol start="0"> with the HTML 4 loose (Transitional) DTD.
- Sometimes there's </p></blockquote> but no opening <p>.
- In HTML 4 Strict, content inside blockquote must consist of block-level elements. For example, <blockquote><p>…</p></blockquote> is valid, but <blockquote>…</blockquote> is not. In the elisp manual, this is often violated where the string <b>Common Lisp note:</b> appears.
- Many blockquotes contain the sequence </dl></p></blockquote>, which is invalid (about 100 files).
- A few files contain <div align="center">. In HTML 4 Strict, div has no “align” attribute.
- The “width” attribute on td is not supported, as in: <table summary=""><tr align="left"><td valign="top" width="5%"></td><td valign="top" width="30%">
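Several items in this list can be patched with plain string rewrites. The Python sketch below handles four of them, assuming the markup appears exactly as quoted above; the class name "center" is an arbitrary placeholder that needs a matching CSS rule. The missing <p> and the blockquote-content problems need case-by-case judgment, so they are left out.

```python
import re

# (pattern, replacement) pairs for the specific strings quoted above.
# This is a targeted patch list, not a general HTML repair tool.
STRICT_FIXES = [
    # type=1 and start=1 are the defaults, and both attributes are
    # deprecated in HTML 4.01 Strict.
    (re.compile(r"<ol type=1 start=1>"), "<ol>"),
    # A <dl> implicitly closes an open <p>, so this </p> matches nothing.
    (re.compile(r"</dl></p></blockquote>"), "</dl></blockquote>"),
    # DIV has no align attribute in Strict; use a class plus CSS instead.
    (re.compile(r'<div align="center">'), '<div class="center">'),
    # TD has no width attribute in Strict; drop it, keep other attributes.
    (re.compile(r'(<td\b[^>]*?)\s+width="[^"]*"'), r"\1"),
]


def fix_for_strict(html: str) -> str:
    """Apply each rewrite in order to the whole page."""
    for pattern, replacement in STRICT_FIXES:
        html = pattern.sub(replacement, html)
    return html
```

The valign on td and align on tr are left alone on purpose: HTML 4.01 Strict still allows them on table rows and cells. The <ol type=1 start=0> case is also untouched, since the suggestion above is <ol start="0"> under the Transitional DTD.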
General HTML issues
- The CSS is plastered into every page. It should be in one shared CSS file instead.
- It should declare UTF-8 as the charset, so that it doesn't need to encode so many characters as HTML entities.
- The blockquote tag is used heavily, multiple times on almost every page, just for indentation. It shouldn't be. Use <div class="block"> with CSS instead.
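These three suggestions can be applied in one pass per page. A minimal Python sketch, assuming each page is a string and that every blockquote really is used only for indentation (a genuine quotation would lose its meaning). The stylesheet name elisp.css and the 2em margin are hypothetical choices. Declaring UTF-8 only relabels the file; the bytes must actually be UTF-8, which pure ASCII with entities already is.

```python
import re

# Matches an embedded <style> element and captures its CSS text.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style>\s*", re.S | re.I)
# Matches an existing Content-Type meta, so it can be replaced.
OLD_CHARSET = re.compile(r'<meta\s+http-equiv="Content-Type"[^>]*>\s*', re.I)
NEW_CHARSET = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
# Hypothetical rule standing in for blockquote's default indentation.
BLOCK_CSS = "div.block { margin-left: 2em; }"


def clean_page(html: str, css_href: str = "elisp.css") -> tuple[str, str]:
    """Return (cleaned_html, css_text) for one page."""
    # 1. Collect the embedded CSS, dropping the <!-- --> hiding markers.
    chunks = [m.group(1).replace("<!--", "").replace("-->", "").strip()
              for m in STYLE_BLOCK.finditer(html)]
    css = "\n".join(chunks + [BLOCK_CSS])
    html = STYLE_BLOCK.sub("", html)

    # 2. Declare UTF-8 and link the shared stylesheet right after <head>.
    html = OLD_CHARSET.sub("", html)
    head_extra = (NEW_CHARSET + "\n"
                  + f'<link rel="stylesheet" type="text/css" href="{css_href}">\n')
    html = re.sub(r"<head\b[^>]*>\s*", lambda m: m.group(0) + head_extra,
                  html, count=1, flags=re.I)

    # 3. Replace indentation-only blockquotes with a styled div.
    html = re.sub(r"<blockquote\b[^>]*>", '<div class="block">', html, flags=re.I)
    html = re.sub(r"</blockquote>", "</div>", html, flags=re.I)
    return html, css
```

Since the same CSS is repeated on every page, one would write css_text from a single page to elisp.css and discard the copies from the rest.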
In the elisp manual (one node per HTML page, roughly 850 HTML pages), there are 70 local links to other GNU documents. The local links are nice in that they provide cross-references, but if one hosts only the elisp doc, all these local links will be dead.
Therefore, it would be nice to have, perhaps at the texinfo level, markers on links that cross-reference external docs, or, at the HTML conversion level, an option to filter local links, so that they can be replaced with non-links (such as “See Emacs manual node on Abbrev”) or with full http links to the right URI at gnu.org.
The vast majority of the 70 local links in the elisp doc point to the Emacs manual, but the following point to the widget, ses, cl, and libc manuals:
- problem link: elisp/Buttons.html ../widget/index.html
- problem link: elisp/Customization-Types.html ../widget/index.html
- problem link: elisp/Defining-New-Types.html ../widget/index.html
- problem link: elisp/Function-Safety.html ../ses/index.html
- problem link: elisp/Lisp-History.html ../cl/index.html
- problem link: elisp/Locales.html ../libc/Locales.html
- problem link: elisp/Time-Parsing.html ../libc/Formatting-Calendar-Time.html
I presume that people who wish to host GNU docs do not want to host the entire set of GNU's documentation.
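Until the conversion offers such an option, the links can be post-processed. This Python sketch rewrites hrefs of the form ../MANUAL/NODE.html to absolute gnu.org URLs, and falls back to a plain-text reference (a non-link) for manuals it does not know. The URL table is an assumption based on how gnu.org lays out these manuals and should be checked before use; the pattern also only matches the simple <a href="..."> form.

```python
import re

# ASSUMPTION: where each sibling manual lives on gnu.org. Verify these
# bases before relying on them.
MANUAL_BASES = {
    "emacs": "https://www.gnu.org/software/emacs/manual/html_node/emacs/",
    "widget": "https://www.gnu.org/software/emacs/manual/html_node/widget/",
    "ses": "https://www.gnu.org/software/emacs/manual/html_node/ses/",
    "cl": "https://www.gnu.org/software/emacs/manual/html_node/cl/",
    "libc": "https://www.gnu.org/software/libc/manual/html_node/",
}

# A link into a sibling manual, e.g. <a href="../libc/Locales.html">Locales</a>.
LOCAL_LINK = re.compile(r'<a href="\.\./([\w-]+)/([^"]+)">(.*?)</a>', re.S)


def rewrite_local_links(html: str) -> str:
    def replace(m: re.Match) -> str:
        manual, target, text = m.group(1), m.group(2), m.group(3)
        base = MANUAL_BASES.get(manual)
        if base is not None:
            # Known manual: point at the same node on gnu.org.
            return f'<a href="{base}{target}">{text}</a>'
        # Unknown manual: degrade to a plain-text reference.
        node = target.split("#", 1)[0].removesuffix(".html")
        return f"{text} (see the {manual} manual, node {node})"

    return LOCAL_LINK.sub(replace, html)
```

Links within the elisp manual itself (no ../ prefix) are not matched, so they stay as they are.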
Use of ASCII
Also, texinfo still uses the convention of backtick ` and straight single quote ' to emulate curly quotes “ ” or ‘ ’. It also uses other ASCII kludges such as => instead of ⇒. The ability to display these characters has been widely available on commercial platforms since the mid 1990s, and on Linux since about 2003 (Emacs itself has supported Unicode to a practical degree since Emacs 21, released in 2001). It is perhaps time to update the GNU doc convention to UTF-8 and use the proper characters.
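A rough Python sketch of such a conversion applied to generated HTML, assuming the ASCII conventions appear literally in the prose. It skips <pre>, <code>, and <samp> regions, because in Lisp code ' and ` are real syntax (quote and backquote) and must stay as they are. Prose that happens to contain a stray backtick or '' would still be converted, so the output needs review.

```python
import re

# Code regions are left untouched: in Lisp, ' is quote and ` is backquote.
CODE_REGION = re.compile(
    r"(<pre\b.*?</pre>|<code\b.*?</code>|<samp\b.*?</samp>)", re.S | re.I)
# `text' -> ‘text’ (the backtick/straight-quote convention).
SINGLE_QUOTED = re.compile(r"`([^`']*)'")


def to_typographic(prose: str) -> str:
    """Replace ASCII quote and arrow kludges in a run of prose."""
    prose = prose.replace("``", "“").replace("''", "”")
    prose = SINGLE_QUOTED.sub(r"‘\1’", prose)
    # "=>" may appear literally or HTML-escaped as "=&gt;".
    return prose.replace("=&gt;", "⇒").replace("=>", "⇒")


def convert_page(html: str) -> str:
    # With one capturing group, re.split alternates prose, code, prose, ...
    pieces = CODE_REGION.split(html)
    return "".join(p if i % 2 else to_typographic(p)
                   for i, p in enumerate(pieces))
```

The page then has to be saved as UTF-8 with a matching charset declaration, or the new characters must be written as entities.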