HTML5 or XHTML
1. Introduction
HTML has been revised to include more support for rich internet applications (RIA), graphics and video, mobile computing, and other recent developments. The new version is known as HTML5 and also includes better support for semantic markup (sections, sidebars, etc.). The revision was developed by the WHATWG, the Web Hypertext Application Technology Working Group ( http://www.whatwg.org/ ). (Just to make life a little more complicated, HTML5 may not include all the new features of the now abandoned XHTML2.0.)
HTML5 may be 'served' or 'serialized' as either HTML (which, of course, is more forgiving) or as XHTML (more complex, but also has support for namespaces, MathML, etc.). These two ways are referred to as HTML5 and XHTML. The effort to revise XHTML to XHTML2 was aborted in 2010.
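To make the semantic markup mentioned above a little more concrete, here is a minimal sketch of a page that uses some of the new sectioning elements (header, nav, section, article, aside, footer). The headings, ids and text are placeholders invented for illustration and are not taken from any real site.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Semantic markup sketch</title>
</head>
<body>
  <!-- header and nav: introductory and navigation content (placeholder links) -->
  <header>
    <h1>Site title</h1>
    <nav><a href="#news">News</a></nav>
  </header>
  <!-- section groups related content; article marks a self-contained item -->
  <section id="news">
    <h2>News</h2>
    <article>
      <h3>First story</h3>
      <p>Story text goes here.</p>
    </article>
  </section>
  <!-- aside: the "sidebar" mentioned in the text -->
  <aside>
    <h2>Related links</h2>
    <p>Sidebar content goes here.</p>
  </aside>
  <footer><p>Contact information goes here.</p></footer>
</body>
</html>
```

In HTML4.01 most of these regions would have been written as generic div elements with class names; the new elements let the markup itself say what each region is for.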
HTML5 is now supported for the most part in all modern browsers (just include the <!doctype html> declaration), but some browsers do not yet support all of it. (A feature of HTML5 becomes part of the standard when it is supported by at least two browsers, so not all browsers need to support all features.)
The best place to check support for specific features of HTML5 and CSS3 is CanIUse.com, and the best set of tables is at QuirksMode.org, with separate tables for mobile browsers.
The next sections of this document tell you what you need to know for HTML5 pages and for XHTML pages, and provide a list of references. That list is valuable for those who must use XHTML (e.g. for ChemML, ebXML and other applications which need XML) or who need to work on a site written in XHTML. For the most part, everyone else is using HTML5. A more complete set of references is available on my Web Centric Resources page.
2. HTML5 documents begin
<!DOCTYPE html>
<html lang='en'>
You may write either <!DOCTYPE or <!doctype in the first tag, but (unlike HTML4.01) you must include a doctype. It also ensures that browsers render the page in standards mode rather than quirks mode, i.e. according to the current HTML rules.
Anything transmitted with the MIME type text/html will be rendered as HTML5. The W3C recommends this for most authors as it is compatible with older browsers (see section 1.4.1 in http://www.w3.org/TR/html5/introduction.html ).
The lang attribute is optional and is specified in the html tag as lang='en'. (If not specified, it defaults to the lang value of the parent.)
The charset is specified in the FIRST element after <head>:
<meta charset="utf-8">
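Putting the pieces of this section together, a minimal HTML5 page might look like the sketch below. The title and paragraph text are placeholders. The second paragraph is an assumed example of typical use: a lang attribute on a child element overrides the value inherited from the html element.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- the charset declaration comes first inside head -->
  <meta charset="utf-8">
  <title>Minimal HTML5 page</title>
</head>
<body>
  <!-- inherits lang="en" from the html element -->
  <p>This paragraph is in English.</p>
  <!-- overrides the inherited language for this element only -->
  <p lang="fr">Ce paragraphe est en français.</p>
</body>
</html>
```

Served with the MIME type text/html, a file like this is handled by the browser's HTML parser.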
3. XHTML documents begin
<html
xmlns='http://www.w3.org/1999/xhtml'
xml:lang='en'>
Anything transmitted with the MIME type application/xhtml+xml (or another XML MIME type such as application/xml) will be processed by the XML processor in the web browser, i.e. rendered as XHTML. (See the same reference.)
In HTML5, the DOM now is more than a way to manipulate the page (an API); each element in the DOM now has a meaning or semantics attached to it.
The lang attribute is mandatory and is specified in the html tag as xml:lang='en'.
The W3C recommends that you do NOT include the XML declaration (often loosely called a processing instruction) in XHTML pages:
<?xml version="1.0" encoding="UTF-8" ?>
This is because some user agents will render (display) this line as text. Without it, the charset defaults to UTF-8 encoding (or possibly UTF-16), which is just fine.
Note: You will need this declaration for XML documents (see Unit 4, Ch. 7 of this course).
Please also see the references on doctypes and XML processors.
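For comparison, a minimal XHTML page following the points above might look like the sketch below. It assumes the file is served as application/xhtml+xml (many servers map the .xhtml extension to that type, but check your own server's configuration). The title, the text and the small MathML fragment are placeholders chosen for illustration; the MathML part shows the namespace support mentioned in the introduction.

```html
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <!-- no XML declaration on the first line, as recommended above -->
  <title>Minimal XHTML page</title>
</head>
<body>
  <!-- XML rules apply: every element is closed, every attribute value is quoted -->
  <p>First line<br />second line</p>
  <!-- a second namespace (MathML) embedded in the page -->
  <p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>x</mi><mo>+</mo><mn>1</mn>
    </math>
  </p>
</body>
</html>
```

Because an XML processor is used, a single unclosed tag or unquoted attribute makes the browser report an error instead of guessing, which is the main practical difference from the more forgiving HTML syntax.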
4. References:
http://www.w3.org/TR/html5/ has the current (June 2008) working draft for HTML5. A later version (August 2008) is at http://www.whatwg.org/specs/web-apps/current-work/ and this version includes a good description of how all the various HTMLs and XHTMLs are related.
http://www.w3.org/MarkUp/xhtml-roadmap/ has the plan for all XHTML modifications; see especially the table near the bottom.
http://www.w3.org/TR/html5-diff/ has the differences between HTML4.01 and HTML5, including information on the new elements in HTML5 in the Language section. (Also see the first reference, or http://www.ibm.com/developerworks/xml/library/x-html5/ for a sophisticated introduction, or http://www.runwalsoft.com/blog/?p=15 for a gentler version. The wiki on these differences is at http://wiki.whatwg.org/wiki/HTML_vs._XHTML )
http://simon.html5.org/html5-elements has all elements and attributes of HTML5 (click on the item of interest in the left column), but has no revision date, so I don't know if it is staying current or not.
http://www.whatwg.org/ is the site of the group developing X/HTML5. They run a wiki at http://wiki.whatwg.org/wiki/Main_Page and an FAQ at http://wiki.whatwg.org/wiki/FAQ
http://xhtml.com/en/future/conversation-with-x-html-5-team/ is a gentle introduction to HTML5. (User agent means things like browsers.)
http://xhtml.com/en/future/x-html-5-versus-xhtml-2/ explains the differences between HTML5 and XHTML2.0.
http://www.w3.org/QA/2008/01/html5-is-html-and-xml.html explains the HTML5 vs XHTML difference.
http://meyerweb.com/eric/thoughts/category/tech/xhtml/ has the writings of Eric Meyer (the great guru of CSS) on XHTML and HTML5. He is always worth reading (e.g. http://meyerweb.com/eric/thoughts/2008/06/02/the-missing-link/ ).
http://www.w3.org/QA/2002/04/valid-dtd-list.html has a list of all valid doctype declarations and which are recommended when. It was last revised in 2007, so it does NOT have the X/HTML5 recommendations; I suggest you check it periodically. Note that the W3C now recommends that you NOT include the <?xml version ... > line on your XHTML pages.
http://www.w3.org/QA/2008/03/html-charset.html has more information than you probably want to know about charsets and encoding.
Here is what the W3C has to say on this matter:
First of all, they have a clear, short discussion of HTML, CSS and XHTML at
http://www.w3.org/standards/webdesign/htmlcss
A more technical discussion follows:
1.6 HTML vs XHTML
This section is non-normative.
This specification defines an abstract language for describing documents and applications, and some APIs for interacting with in-memory representations of resources that use this language.
The in-memory representation is known as "DOM HTML", or "the DOM" for short.
There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.
The first such concrete syntax is the HTML syntax. This is the format suggested for most authors. It is compatible with most legacy Web browsers. If a document is transmitted with an HTML MIME type ...
The second concrete syntax is the XHTML syntax, which is an application of XML. When a document is transmitted with an XML MIME type ...
The DOM, the HTML syntax, and XML cannot all represent the same content. For example, namespaces cannot be represented using the HTML syntax, but they are supported in the DOM and in XML. Similarly, documents that use the noscript ...
http://dev.w3.org/html5/spec/Overview.html
My note: the XHTML5 name did not stick.