Emacs GNU Texinfo Problems; Invalid HTML
The Texinfo software produces invalid HTML documents. The following gives some detail.
Around , i created an HTML version of the GNU Emacs Lisp Reference Manual for my website. I corrected the HTML so the pages are valid HTML documents, and I replaced the CSS with a handcrafted one for better online presentation.
The version can be seen here: GNU Emacs Lisp Reference Manual. The story of my motivation is documented here (warning: rant): A Record of Frustration in IT Industry.
While creating this cleaned-up HTML version, i found several problems in the HTML that the texinfo software outputs. Here's a summary. I hope it helps others who want to convert GNU docs to valid HTML, or that texinfo developers will fix these problems.
Problems with texinfo-generated HTML, with respect to HTML 4 Transitional:
- There's no DOCTYPE declaration.
- When there's a footnote, the output contains <p><hr></div>, which is invalid.
- About 44 files contain nothing but a meta redirect like this: <meta http-equiv="refresh" content="0; url=Autoload.html#autoload%20cookie">. Each of these files needs a proper DOCTYPE declaration, a head section, etc. Many of these files' names start with “Definition-of-”.
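Until texinfo emits complete documents, the redirect-only files can be patched afterwards. Below is a minimal Python sketch. It assumes each such file contains a single meta refresh tag and nothing else. The function name wrap_redirect and the default title are my own choices, not anything texinfo provides. The DOCTYPE string is the W3C's published one for HTML 4.01 Transitional.

```python
import re

# W3C DOCTYPE for HTML 4.01 Transitional.
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
           '"http://www.w3.org/TR/html4/loose.dtd">')

# A file whose only content is one meta refresh tag (surrounding whitespace allowed).
REDIRECT_ONLY = re.compile(r'^\s*(<meta\s+http-equiv="refresh"[^>]*>)\s*$', re.IGNORECASE)
TARGET_URL = re.compile(r'url=([^"]*)"', re.IGNORECASE)


def wrap_redirect(text, title="Redirect"):
    """Return a complete HTML document for a bare meta redirect, or None otherwise."""
    m = REDIRECT_ONLY.match(text)
    if m is None:
        return None
    meta = m.group(1)
    # Also give a plain link, for browsers that ignore meta refresh.
    target_match = TARGET_URL.search(meta)
    target = target_match.group(1) if target_match else ""
    return "\n".join([
        DOCTYPE,
        "<html>",
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
        meta,
        "<title>%s</title>" % title,
        "</head>",
        "<body>",
        '<p><a href="%s">%s</a></p>' % (target, target),
        "</body>",
        "</html>",
        "",
    ])
```

You would run it over every .html file and write back only the non-None results. Files that don't match the pattern are left untouched.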
Problems with respect to HTML 4 Strict:
- <ol type=1 start=1> should just be <ol>, since decimal numbering starting at 1 is the default.
- <ol type=1 start=0> is not supported in HTML 4 Strict. Starting the counter at 0 is a bit complicated in HTML 4 Strict with CSS, and browsers don't support it well. It's best to just use <ol start="0"> with the HTML 4 loose (Transitional) DTD.
- Sometimes there's </p></blockquote> with no matching opening <p>.
- In HTML 4 Strict, a blockquote may contain only block-level elements. For example, <blockquote><p>…</p></blockquote> is valid, but a <blockquote>…</blockquote> that holds bare text or inline tags is not. In the elisp manual, this rule is often violated where the string <b>Common Lisp note:</b> appears.
- Many blockquotes contain the sequence </dl></p></blockquote>, which is invalid (about 100 files).
- A few files contain <div align="center">. In HTML 4 Strict, div has no “align” attribute.
- HTML 4 Strict does not support the “width” attribute on td, but texinfo generates tables like this: <table summary=""><tr align="left"><td valign="top" width="5%"></td><td valign="top" width="30%">
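To find the blockquote problems above across some 850 pages, a small checker helps. Below is a rough Python sketch built on the standard library's html.parser. It reports two things: text or inline tags sitting directly inside a blockquote, and end tags with no matching open tag, as in </p></blockquote> with no opening <p>. It is a heuristic, not a validator. It does not model HTML's implied end tags, so it misses some errors the W3C validator reports, such as a </p> after </dl> when the p was implicitly closed before the dl. The names check_blockquotes and BlockquoteChecker are made up for illustration.

```python
from html.parser import HTMLParser

# Elements HTML 4.01 Strict allows directly inside <blockquote> (%block; plus script).
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "pre", "dl",
              "div", "noscript", "blockquote", "form", "hr", "table", "fieldset",
              "address", "script"}
# Elements that never have an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class BlockquoteChecker(HTMLParser):
    """Collect Strict-mode blockquote problems in one page (heuristic)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # names of currently open elements
        self.problems = []

    def _directly_in_blockquote(self):
        return bool(self.stack) and self.stack[-1] == "blockquote"

    def handle_starttag(self, tag, attrs):
        if self._directly_in_blockquote() and tag not in BLOCK_TAGS:
            self.problems.append("<%s> directly inside <blockquote>" % tag)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open element.
            while self.stack.pop() != tag:
                pass
        elif tag not in VOID_TAGS:
            self.problems.append("stray </%s>" % tag)

    def handle_data(self, data):
        if self._directly_in_blockquote() and data.strip():
            self.problems.append("text %r directly inside <blockquote>" % data.strip())


def check_blockquotes(html):
    checker = BlockquoteChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

For example, check_blockquotes('<blockquote><pre>x</pre></p></blockquote>') returns ['stray </p>'].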
General HTML issues
- The CSS is plastered into every page. It should be a single external CSS file instead.
- It should declare UTF-8 as the charset, so it doesn't need to encode so many characters as HTML entities.
- The blockquote tag is used heavily, several times on almost every page, just for indentation. That's not what blockquote is for. Use <div class="block"> with CSS instead.
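The three general fixes can also be applied after conversion. Below is a minimal Python sketch. It assumes each page has one embedded <style> block, and that every blockquote is used only for indentation. A real quotation would need to stay a blockquote, so check that assumption first. The file name manual.css and the function name clean_page are placeholders.

```python
import re

STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)
META_CHARSET = re.compile(r"(<meta[^>]*charset=)[\w-]+", re.IGNORECASE)
UTF8_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def clean_page(html, css_href="manual.css"):
    """Apply the general fixes to one texinfo-generated page (a sketch)."""
    link = '<link rel="stylesheet" type="text/css" href="%s">' % css_href
    # 1. Replace the embedded <style> block with a link to one shared stylesheet.
    html = STYLE_BLOCK.sub(lambda m: link, html, count=1)
    # 2. Declare UTF-8. ASCII is a subset of UTF-8, so relabeling an ASCII page is safe.
    if META_CHARSET.search(html):
        html = META_CHARSET.sub(r"\g<1>utf-8", html, count=1)
    else:
        html = re.sub(r"<head>", lambda m: "<head>\n" + UTF8_META, html,
                      count=1, flags=re.IGNORECASE)
    # 3. Use a styled div for indentation instead of <blockquote>.
    html = re.sub(r"<blockquote\b[^>]*>", '<div class="block">', html, flags=re.IGNORECASE)
    html = re.sub(r"</blockquote>", "</div>", html, flags=re.IGNORECASE)
    return html
```

The shared stylesheet then needs a rule such as div.block {margin-left: 3em}, with whatever indent you prefer.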
In the elisp manual (one node per HTML page, roughly 850 HTML pages), there are 70 local links to other GNU documents. These local links are nice because they provide cross-references, but if one hosts only the elisp doc, all of them will be dead.
Therefore, it would be nice to have one of two things. At the texinfo level, there could be a way to mark links that cross-reference external docs. At the HTML conversion level, there could be an option to filter local links. Either way, local links could then be replaced with non-links (such as “See Emacs manual node on Abbrev”) or with full http links to the right URL at gnu.org.
The vast majority of the 70 local links in the elisp doc point to the Emacs doc, but 7 point to the widget, ses, cl, and libc manuals:
- problem link: elisp/Buttons.html ../widget/index.html
- problem link: elisp/Customization-Types.html ../widget/index.html
- problem link: elisp/Defining-New-Types.html ../widget/index.html
- problem link: elisp/Function-Safety.html ../ses/index.html
- problem link: elisp/Lisp-History.html ../cl/index.html
- problem link: elisp/Locales.html ../libc/Locales.html
- problem link: elisp/Time-Parsing.html ../libc/Formatting-Calendar-Time.html
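Until texinfo offers such an option, these links can be filtered after conversion. Below is a minimal Python sketch. It assumes links into sibling manuals are written exactly as <a href="../manual/Node.html">…</a>, with no other attributes. The gnu.org base URLs in MANUAL_BASES are my assumption about where each manual lives online, so verify them before use. When absolute=False, or when a manual is missing from the mapping, the dead link becomes plain text instead.

```python
import re

# Assumed online locations of the sibling manuals; verify before relying on them.
MANUAL_BASES = {
    "emacs":  "https://www.gnu.org/software/emacs/manual/html_node/emacs/",
    "widget": "https://www.gnu.org/software/emacs/manual/html_node/widget/",
    "ses":    "https://www.gnu.org/software/emacs/manual/html_node/ses/",
    "cl":     "https://www.gnu.org/software/emacs/manual/html_node/cl/",
    "libc":   "https://www.gnu.org/software/libc/manual/html_node/",
}

# A link into a sibling manual, e.g. <a href="../libc/Locales.html">Locales</a>.
LOCAL_LINK = re.compile(r'<a\s+href="\.\./([\w-]+)/([^"]*)">(.*?)</a>', re.DOTALL)


def fix_cross_manual_links(html, absolute=True):
    """Rewrite ../manual/Node.html links as absolute URLs or plain text."""
    def repl(m):
        manual, page, text = m.group(1), m.group(2), m.group(3)
        base = MANUAL_BASES.get(manual)
        if absolute and base is not None:
            return '<a href="%s%s">%s</a>' % (base, page, text)
        # Drop the dead link but keep a plain-text pointer to the manual.
        return "%s (%s manual)" % (text, manual)
    return LOCAL_LINK.sub(repl, html)
```

With absolute=False, 'See <a href="../libc/Locales.html">Locales</a>.' becomes 'See Locales (libc manual).'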
I presume that people who wish to host GNU docs do not want to host GNU's entire documentation set.
Use of ASCII
Also, texinfo still uses the convention of a backtick ` and a straight single quote ' to emulate the curly quotes “ ” and ‘ ’. It also uses other ASCII kludges, such as => instead of ⇒. Commercial platforms have been able to display these characters since the mid 1990s, and Linux systems since about 2003. Emacs itself has supported Unicode to a practical degree since Emacs 21, released in 2001. It is perhaps time to update the GNU doc convention to UTF-8 and use the proper characters.
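Converting existing output is mostly mechanical. Below is a rough Python sketch that turns ``…'' into “…”, `…' into ‘…’, and => into ⇒. The function name unicode_punctuation is hypothetical. Apply it only to prose, never to code examples. In elisp examples, the backquote and quote characters are Lisp syntax, and => can appear in real code.

```python
import re

# Texinfo-style ``double'' quotes; non-greedy so each pair is matched separately.
DOUBLE = re.compile(r"``(.*?)''", re.DOTALL)
# Texinfo-style `single' quotes, with no quote characters inside.
SINGLE = re.compile(r"`([^`']*)'")


def unicode_punctuation(text):
    """Replace ASCII stand-ins with the proper Unicode characters (prose only)."""
    # Do double quotes first, so `` and '' are not taken as pairs of single quotes.
    text = DOUBLE.sub(lambda m: "\u201c" + m.group(1) + "\u201d", text)
    text = SINGLE.sub(lambda m: "\u2018" + m.group(1) + "\u2019", text)
    # ASCII => standing in for U+21D2 RIGHTWARDS DOUBLE ARROW.
    return text.replace("=>", "\u21d2")
```

For example, unicode_punctuation("``foo'' and `bar'") returns “foo” and ‘bar’.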