There Is No Document Outline Algorithm

I figured I would state the entire argument in the title. After all, as of this writing, and for the last seven-plus years, the statement is accurate as far as the browsers are concerned.

I am penning this as sort of a follow-up to my post from 2013, The Truth about “The Truth About Multiple H1 Tags”. Even after that post helped kick off an update to the W3C HTML5 specification, it is not reflected in current tutorials and informative pieces.

Bad Information Persists

The appeal of ignoring heading levels for developers and authors is pretty compelling when you do not know how or where that content will appear (or it appears in many different locations). In particular, the all-<h1> approach appeals to many for its simplicity.

It makes sense why a developer might see this advice, or hear about it through misinformed articles, and never look back. This advice is a free pass. Many content authors don’t even know where their content will appear, making an all-<h1> approach feel like the safe approach.

Unfortunately, despite all the activity in the standards world along with the lack of activity on the part of the browsers, many developers continue to be unaware that this imparts no benefits to users and even harms many of those users. I run into this repeatedly when I answer questions on Stack Overflow, when I talk to developers in real life, and even from generally trusted outlets:

This was part of the spec, and it was “revoked”, which is not a nice thing to do. And it was revoked not because they considered that it was a bad idea, but because of screen readers not implementing it correctly.

This is a common position, captured succinctly in this one example.

Disregarding the fact that it was never part of the final W3C spec, that the spec had a warning for three years, that nobody considered the algorithm a bad idea, that screen readers had nothing to do with it, and that browsers not implementing it is different from correctly implementing it, there is one statement that belies the issue at hand.

“Not a nice thing to do” is a value judgment. It presumes that the specification’s primary beneficiaries are developers when in reality it is about users. It also presumes that it is acceptable to give developers advice (that harms some users) that has never been supported.

Like it or not, browsers are not moving on this feature and citing the purely theoretical document outline does nothing to move it forward. We as developers need to resolve this while still making it easy for content authors.

Update: February 13, 2017

There is a new issue opened against the W3C specification to try to understand how the outline algorithm is supposed to work so a polyfill can be created. This is sometimes a first step to getting support built into browsers. Read more at the issue, Update outline algorithm #794.

One Alternative (in Two Parts)

The average web developer would rather not have to think about mapping the appropriate heading level for every potential re-use of content. Authors should not have to think about it at all.

Server Side

Way back in the early oughts (actually, 1999–2000) I wrote a CMS (Content Management System) based on delimited text files. It was a lark. I wanted to teach myself some programming skills and my brother needed a mini-CMS while he was overseas.

I quickly ran into the heading issue that HTML5 tried to solve — sometimes his content would be re-used elsewhere in the layout, and the headings would not make sense anymore. But I solved it. I solved it without any fancy frameworks or libraries or HTML5 retooling.

Every content container carried a variable (this was all server-side code). That variable was a number reflecting its nesting level on the page. That number was then used to replace the number in any <h#> levels in the content (the content was chunked enough that there was not more than one heading).
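To make the idea concrete, here is a minimal sketch in JavaScript. It is not the code from that old CMS (which this post does not show); the function name setHeadingLevel and the sample markup are made up for illustration. It assumes the content chunk is trusted, simple HTML, since a regular expression is not a general-purpose HTML parser.

```javascript
// Hypothetical helper (not the original CMS code): rewrite every <h1>-<h6>
// in a content chunk to the level assigned to the container that holds it.
function setHeadingLevel(html, level) {
  // Clamp to the range HTML actually defines (h1 through h6).
  const safeLevel = Math.min(6, Math.max(1, Math.trunc(level)));
  // Match opening and closing heading tags; attributes are left untouched.
  // The \b keeps <header> and <hr> from matching.
  return html.replace(
    /<(\/?)h[1-6]\b/gi,
    (_match, slash) => `<${slash}h${safeLevel}`
  );
}

// Example: the same chunk placed in a container at nesting level 3.
const chunk = '<h1 class="title">Latest News</h1><p>Story text.</p>';
console.log(setHeadingLevel(chunk, 3));
// <h3 class="title">Latest News</h3><p>Story text.</p>
```

The template, not the author, decides the level: the author writes one heading per chunk and the container's number is applied when the page is assembled.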

I carried that technique forward into projects on much beefier CMSes and never had to worry about training authors how to manage chunked content on their home pages (and similar chunked pages). The move to HTML5 never made me consider an all-<h1> solution, partly because I knew the outline was not supported.

Client Side

Since so much of the content on modern sites comes in via client-side scripting, the code would simply need to be updated to run in the browser — this assumes you don’t mind offloading simple processing to thousands of users across uncontrollable run-times. But then if you are relying on client-side scripts to render a page you have already made your decision.

The following embedded code shows an HTML document that uses an all-<h1> structure. With a (not production-ready) chunk of JavaScript and some custom data- attributes on the sectioning containers, I re-write the <h1>s to reflect a document outline appropriate for this content. I use some CSS generated content to include the heading level after the text of each heading so you can easily see which is which. If the script does not work, you will see black headings sans parentheses for all.

Conceivably you can let your authors continue an all-<h1> approach, while your templates just tweak the structure based on attributes you embed in the layout.

See the Pen Dynamic Heading Level Demo by Adrian Roselli (@aardrian) on CodePen.

You can see (steal, fix) the script at CodePen directly, or you can view it as a full page and make sure your assistive technology (such as a screen reader in this case) can navigate the corrected heading structure as you expect.
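For readers who cannot load the embed, the following is a rough sketch of the same general approach. It is not the script from the CodePen; the attribute name data-heading-level and both function names are assumptions chosen for illustration, and like the demo it is not production-ready.

```javascript
// Hypothetical attribute name; the CodePen demo may use something different.
const LEVEL_ATTR = "data-heading-level";

// Walk up from a heading to the nearest ancestor that declares a level.
// Falls back to 1 when no container declares a usable one.
function levelFor(node) {
  for (let el = node.parentElement; el; el = el.parentElement) {
    const level = Number.parseInt(el.getAttribute(LEVEL_ATTR) ?? "", 10);
    if (level >= 1 && level <= 6) return level;
  }
  return 1;
}

// Swap each <h1> for an <h#> at the computed level,
// carrying over its attributes and child nodes.
function relevelHeadings(root) {
  for (const oldHeading of Array.from(root.querySelectorAll("h1"))) {
    const level = levelFor(oldHeading);
    if (level === 1) continue; // already correct, leave it alone
    const newHeading = root.ownerDocument.createElement(`h${level}`);
    for (const attr of Array.from(oldHeading.attributes)) {
      newHeading.setAttribute(attr.name, attr.value);
    }
    newHeading.append(...oldHeading.childNodes);
    oldHeading.replaceWith(newHeading);
  }
}

// Only run where a DOM exists (a browser), after the content is in place.
if (typeof document !== "undefined" && document.body) {
  relevelHeadings(document.body);
}
```

With markup such as a section carrying data-heading-level="2" that wraps an all-<h1> chunk, the heading inside would be replaced with an <h2>. Replacing the element, rather than adding role and aria-level attributes, keeps the result as plain <h#> markup.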

Another Alternative

We can work to get browsers to support another new element, the <h> element (proposed in April, which was probably based on Gez Lemon’s 2004 suggestion). Browsers would still need to implement some sort of document outline algorithm, but in this case a new element means no need to rewrite existing <h#> logic.

That will require the developer community to come together as it did for the <picture> element. It can be done, it just requires some effort.

Update: January 18, 2017

While the issue opened last April has since been closed (since it was about a language change in the spec), a new issue was opened specifically for adding an <h> element.

Minutiae

The statement in the title of this post is not new. It has been discussed for at least three years in standards bodies. It has been ignored by browsers for longer (more than seven years), though the browser bugs linked at the end are only a couple years old. Anyone who claims this is a recent change has not confirmed that with the W3C specification in two years.

The following links are just evidence I have needed to provide repeatedly to demonstrate these points. I guess they are more for me to easily reference from future Stack Overflow answers.

To recap, the Document Outline Algorithm was never a recommendation in a final W3C spec. There was a warning explicitly against authors relying on it, though the outline language was retained for browsers to understand how to implement support (eventually).

Regardless of whether you like the idea of the document outline algorithm, it does not reflect reality since no user agent supports it.

Update: January 23, 2018

WHATWG has maintained the fiction of the document outline despite no implementations. An issue against the spec was opened in 2015 to rectify this, where it has languished.

Until today. New conversation has started, with an eye toward accessibility. So there is promise we will either see a more workable proposal for browsers (to ignore?) and/or acknowledgment that the current WHATWG definition needs to be scrapped.

Update: March 1, 2019

I was made aware that MDN has an entire section on using the document outline algorithm, so I edited the page to drop a warning into it in three spots.

Update: October 15, 2019

An effort is underway at WHATWG to try to resurrect <hgroup> (Alternative take on hgroup #5002), an element dropped from the W3C version of HTML in 2013 and never supported in any browser. If you pay attention to the description, however, it is the latest effort to try to get a Document Outline Algorithm into the WHATWG HTML5 specification.

I made a pitch, which got some positive emoji responses, but a dismissive response from the OP:

Alternative proposal:

  1. Declare that <hgroup> on its own does nothing;
  2. Mint <h>.

Then <hgroup> does not modify <h#>, thereby leaving existing structures intact and not modifying author intent for explicitly-chosen heading levels.

Given use of <hgroup> I have seen in the wild, this approach will not make any existing heading structures less accessible.

If the effort here is to justify <hgroup>, and by extension try again at a Document Outline Algorithm, then let’s mint a new required child element, <h>. The <h> element can then get its nesting level from the algorithm proposed here.

This has the advantage of keeping existing heading parsing logic in place and compartmentalizing the logic of this new effort at a Document Outline Algorithm without blowing up 30 years of existing content and rules. It may also make uptake in user agents a bit easier to swallow.

We can lean on a previous effort to mint <h> to kick this off.

For the sub-heading concept, we can argue that any non-<h> non-phrasing-content-element child of <hgroup> is a de facto sub-head, whether it is a <div> or a <p> (probably more thought required there).

I suspect there will never be support for <h>.

Update: January 7, 2020

In a new post at Smashing Magazine, Why You Should Choose HTML5 <article> Over <section>, Bruce Lawson reaffirms that there is no document outline algorithm and that no, you should not pepper your page with <h1>s.

Update: January 25, 2020

Ongoing efforts at WHATWG to create a functional outline algorithm that browsers want to, and can, implement continue despite two years with no progress. Well, there was progress insofar as Mozilla tried, and failed, to implement the latest effort. Remember, W3C never had a document outline algorithm in a final spec, though WHATWG did (that email implies it was in a spec that browsers implemented), even though it never reflected reality.

It’s been 7 years of no browser support. This latest effort trying to resurrect <hgroup> as the new keeper of the Document Outline Algorithm may not end that drought.

Update: February 10, 2020

Steve Faulkner has wrapped up the history and current situation (as detailed in my January 25 update above) in his aptly-titled post A decade of heading backwards.

Update: April 6, 2022

Interestingly, the first version of HTML discussed, and dropped, heading levels that adapted to sectioning:

Should we support headers for which the level is implicitly defined by nestable section elements?*2 We could also support autonumbering of headers. Unfortunately, on further investigation these ideas proved trickier than thought at first, and so have been dropped from this draft.

2. For example with <H> for headers and <SECTION> for nestable sections.

Update: April 9, 2022

Thanks to Ramón Corominas’ memory, I was able to confirm that IE9 / JAWS 13 announced <h1>s in simple nested <section>s at an appropriate depth. Chrome 99 / JAWS 13 did not; Firefox 91 ESR refused to work with JAWS 13 at all. I did not record more complex constructs, but it started to fall apart pretty quickly as I adjusted the nesting to match things I have seen in real life. For timeline context, JAWS 13 was released in late 2011, while the algorithm was still a draft.

I found it! 🙂 It was in May 2012, and the combination was: Firefox 10/IE 9 + JAWS 13, but it only worked when using <h1> for every heading. <hgroup> had no support at all

If any <h2>-<h6> were used within a section, the level was incorrectly increased, and any headings with a calculated level higher than 6 were no more interpreted as headings

I recorded a video that uses this HTML, pulled from the WHATWG HTML specification examples for headings and sections:

 <h1>Apples</h1>
 <p>Apples are fruit.</p>
 <section>
  <h1>Taste</h1>
  <p>They taste lovely.</p>
  <section>
   <h1>Sweet</h1>
   <p>Red apples are sweeter than green ones.</p>
  </section>
 </section>
 <section>
  <h1>Color</h1>
  <p>Apples come in various colors.</p>
 </section>

JAWS 13 with Internet Explorer 9.

This is a case where IE9 was not exposing the nesting level information (IE9 does not expose heading semantics in the accessibility layer), but JAWS was using heuristics to try to support the draft specification.

Steve Faulkner gave some context:

The JAWS implementation was flawed and they couldn’t get it right, so they pulled it.

The JAWS implementation was sponsored by Rich S/IBM in discussion with me at the time

Despite one of the WHATWG HTML editors asserting this week that “The problem is about the mismatch with accessibility tech”, it looks like some accessibility tech tried to match the draft specification in 2011 and rolled it back.

This is all on the radar again since Léonie Watson is trying to get some help from WHATWG on publishing the January 2021 HTML Review Draft as a W3C Candidate Recommendation after Steve (and I along with others) raised an objection since it contains the fictional Document Outline Algorithm.

Update: April 18, 2022

Back in 2015, one of the WHATWG HTML contributors suggested a preference for removing the Document Outline Algorithm from the WHATWG HTML specification instead of adding a warning, but the editor at the time disagreed, stating a full re-write was necessary. Then 7 years of no movement from WHATWG.

Last week the now-current WHATWG HTML editor, after the failure of anyone to get the outline algorithm implemented in the last 7 years, pivoted back to the 2015 plan, though once again implying he would not do it.

So Steve Faulkner did. Steve filed WHATWG HTML PR #7829, “removes outline algorithm”. On Easter Sunday. If all goes well, maybe it won’t be another 7 years for this to be merged.

Steve also points out that User Agent default CSS style sheets do not visually honor the Document Outline Algorithm (something folks have incorrectly asserted for years):

See the Pen incomplete implementation of outline styles by steve faulkner (@stevef) on CodePen.

Update: July 1, 2022

The Document Outline Algorithm is now gone from the WHATWG HTML specification.

It took 6¾ years from when Steve Faulkner first opened the issue, with the intervening time seeing piles of evidence ignored, the backing of dozens of experts, spec editor gatekeeping, a pull request, and help shepherding it through the WHATWG process, but Steve pulled it off.

If you see any tools, editors, articles, “experts”, etc., pitching the Document Outline Algorithm, remind them they are wrong (and have been).

Update: July 7, 2022

Bruce has provided some context as well, which is far shorter than my stove-piped post, in Why the HTML Outlining Algorithm was removed from the spec – the truth will shock you!. This part hits home:

One of the reasons I liked having a W3C versioned specification for HTML is that it would reflect the reality of what browsers do on the date at the top of the spec. A living standard often includes things that aren’t yet implemented. And the worse thing about having zombie stuff in a spec is that lots of developers believe (in good faith) that it accurately reflects what’s implemented today.

With the version-less WHATWG spec, the update is only there if people remember to look. So it might be some time before folks believe it. Even after years of evidence.

Update: August 24, 2023

Don’t blame screen readers for this. As I noted above, a screen reader was the first to try to implement the document outline algorithm when the browsers would not. It was a screen reader that proved the algorithm was untenable.

Update: September 19, 2023

The 2019 suggestion from Mu-An Chiou for headinglevelstart has fresh activity coming out of TPAC. Scott mentioned current limitations in screen readers along with designating an upper limit.

Note that this in no way is an effort to try to revive the failed Document Outline Algorithm.

26 Comments

I had occasion to look back for this already in the course of 24 hours or so and I didn’t find this in your links, so I’m including it in the comments just in case anyone is interested https://discourse.wicg.io/t/html5-h-custom-element/438/39

In response to Brian Kardell.

Thanks. I totally spaced on that after you showed it to me. Appreciate you linking it.

In the interest of generating a solid outline for screen readers, I was wondering if it was okay practice to blend explicit els with role="heading" + aria-level="x", and then hide it for non-screenreaders; e.g.

    Main Site Navigation

The idea being to have presentational structures that don’t translate to screen readers without excluding their users from understanding. (also hopefully google doesn’t ding us)

Gregory

In response to Gregory.

Short answer: No.

Longer answer: Not all screen reader users are blind.

Yet longer answer: If you can already put the appropriate heading level into an aria-level attribute, then you can use the correct <h#>, so this seems like a lot of extra effort for a potentially confusing (and SEO-risky) approach.

Dude, you’re a bozo. Plain and simple. Even if there’s “no such thing as document outline”, just by using a document outline can help tell you if the site is complete shit. You can actually tell by just visiting the site you want to outline. 99% of the time, it completely matches the shittiness that is displayed. Almost as bad as your writing. Almost.

In response to Alf.

Alf, other than your points about me being a bozo and my poor writing, I am not sure I understand what you are saying. On the first two points I agree, though. So kudos on being right in your opening and close!

Hi Adrian,
I just recently found your blog and the information here is really helpful. Thank you so much.
In short, does it mean that there is always only one H1, which is the title, in a web page?
Chris

Chris Wong

In response to Chris Wong.

Chris, yes, you have distilled my position well. Have a single <h1> per page, and that <h1> should correspond to the value of the <title> (excluding the site name, marketing tagline, etc).

Perhaps this is not the place nor time to ask (i preemptively apologize if so) but is it allowed to use multiple <header> elements on a page?
And should <section> elements be used?

In response to J Redhead.

J Redhead, it is totally fine to use multiple <header>s on a page. Note that only the first instance of <header> under <body> will be considered a banner landmark. <section> elements are fine to use, but my rule of thumb is not to use them unless they would also get an appropriate <h#> heading.

Thanks a lot for the article, Adrian!

I am really confused when MDN says “the outline algorithm should not be used to convey document structure to users.” Does it mean that developers should not use html sectioning elements in their code to structure the html markup for better accessibility. Sorry, I am totally lost!

In response to SS.

I added that to convey that the proposed Document Outline Algorithm as described in this post (where use of sectioning elements resets heading levels) should not be used, and that we should use the same best practices we have used for 20+ years.

Wow, this article has been an exciting journey to read. Thank you for your commitment to documenting this bizarre part of the web’s history that most people will not even notice.

Theo

If the document outline algorithm is now removed completely from the spec, is there any purpose to sectioning elements? Is there any context where they would be recommended, or used in preference to to tags? Aside from the fictional and now additionally removed document outline algorithm, do they have any semantic value?

Merce Lutzker

In response to Merce Lutzker.

If the document outline algorithm is now removed completely from the spec, is there any purpose to sectioning elements?

Yes. Sectioning elements have worked for years to help users (screen reader users only so far) jump to regions or landmarks (see below) of a page. The WHATWG HTML specification addresses some of this in 4.3 Sections.

Is there any context where they would be recommended, or used in preference to to tags?

I think you included some HTML elements in there, but did not escape the < and >, so I don’t know what they were. What I can say is that yes, the context for using them is marking up page landmarks — page header, footer, navigation, search, and main regions as well as rare cases of other named regions (landmarks). WebAIM has a very brief introduction to regions. TetraLogical goes into more detail on landmarks (I am using “named regions” interchangeably with “landmarks”). Léonie Watson demonstrates with a screen reader in this 2019 video.

Aside from the fictional and now additionally removed document outline algorithm, do they have any semantic value?

Again, without knowing which elements you mean, HTML regions have a ton of value. Named regions, <section> and <article> with accessible names, also have value but only if used sparingly.

In response to Adrian Roselli.

Thank you! I’m, as you may be able to tell, a very naive dev at the moment, so this understanding helps me a lot. Rereading, my question seems sort of silly, but essentially, the big idea is that, while the sectioning elements added in HTML5 do not contribute to a document outline as originally specified, they are useful as landmarks?

Merce Lutzker

In response to Merce Lutzker.

Yes.

There is no good reason to only limit yourself to a single H1 tag on your website. Instead, you should use H1, H2, etc. tags to correctly reflect the semantic structure of your content. If your content is such that it has more than one top-level heading (which is not uncommon), then you should use more than one H1 tag, as appropriate to correctly represent the semantic structure of your content.

In response to Ned.

There is no good reason to only limit yourself to a single H1 tag on your website.

Users might disagree. SEO practitioners have definitely disagreed (but SEO, meh).

If your content is such that it has more than one top-level heading (which is not uncommon), then you should use more than one H1 tag, as appropriate to correctly represent the semantic structure of your content.

Generally you want your <h1> to correspond to the <title>. And since a single page should be about one primary topic, you should not need more than a single <h1>. At some level, this is about copywriting and not code.

In response to Adrian Roselli.

The first h1 does not have to match the title of the web page, and there is no rule against using multiple h1’s when it makes semantic sense to do so.

E.g. A menu title should not be h2, because that would, semantically, connect it with the h1, which would be inaccurate. A menu is another h1, because it typically contains content unrelated to the article. E.g. First article should have an h1. If a page, for whatever reason, contains multiple articles, then they should each have their own h1. Of course you probably would not do that, but the example is valid and should demonstrate a point.

In response to JacobSeated.

The first h1 does not have to match the title of the web page, […]

Correct, the <h1> is not required to match the <title>. Years (decades) of user testing by me and my contemporaries show that matching them is generally better for all users, however.

[…] and there is no rule against using multiple h1’s when it makes semantic sense to do so.

Also agreed, there is no rule against it. It’s just that it generally does not make “semantic” sense to do so. I put “semantic” in quotes because that term is doing a lot of work for what really comes down to, in this case, “thematic sense for a user within the context of the content on the page and extant on the site versus their experience and expectation from across the web.”

I am curious – this article now has some years on it, but is still referred to on MDN. Are there any news regarding the document outline algorithm? Have screen readers caught up?

In response to Brian.

Brian, this article indeed has some years on it but was also updated in July 2022. That may account for why it is referenced on MDN. Also the fact that it covers the history of the document outline algorithm, conveys challenges inherent in the original proposal, reports when the document outline algorithm was cut from both W3C HTML and WHATWG HTML, and outlines how this is not a screen reader issue to address. Covering all that is outside the scope of MDN, but the material here can be useful for those who want to dive into it all.

Are there any news regarding the document outline algorithm?

Yes, it was removed from the WHATWG HTML specification on July 1, 2022.

Have screen readers caught up?

Screen readers were never responsible for implementing the document outline algorithm, as that comes from the browser. That being said, JAWS 13 tried it by overriding the browser and rolled it back because the algorithm proved to be unworkable.

You might note that each of those links points to content within this post, meaning it already answers your questions.

This link, however, points to a different post: Blaming Screen Readers ×5

Maybe I don’t understand what is talked about in this blogpost. I learnt HTML in school and HTML 5 was very new, Edge did not exist, Chrome was not well-known, so I learnt HTML 4 and we used numbered heading tags only.

This post made me curious and when I test the sectioning feature with only using headings in the MDN playground (Chromium), it will work like numbered headings. It also can be mixed, i.e. differently numbered headings in the same section are recognized as nested. Cool!

I find the archived outline algorithm hard to understand in the way it is presented, I did not bother understanding all of it, it sounds similar to a depth-first tree search, but I think, the behaviour as I see in the MDN playground could come close to what the original authors intended. I did not test it in other places though.

Christoph

I appreciate your article and the effort you’ve gone to to clarify all of this and counter some of the bad information that’s floating around.

That said, I’ll see your There Is No Document Outline Algorithm, and raise you My Home Page Does Not Have a Level One Heading. This is because my home page is not a document. I have been reading and reading on this subject and here is the set of rules I’ve come up with which I am apparently supposed to follow:

– each page should have exactly one level one heading
– the level one heading should reflect the overall purpose of the page itself
– the level one heading text should not be provided by the alt attribute of an image
– the level one heading should not be rendered invisible with CSS
– your web site must be accessible to non-sighted users, which means correctly using ancient h1 to h6 elements because browser vendors can’t be bothered to implement an algorithm for intelligent layouts
– your web site must be beautiful for sighted users
– if you use level two headings with no level one heading, the Earth will explode in a ball of fire
– every time you put more than one level one heading on a web page, a kitten dies

Trouble is, these rules are incoherent. They cannot all be followed simultaneously in many common cases (the preeminent example being an organization’s home page). Make enough rules, and eventually, everyone is just going to start behaving like a criminal.

I suppose I could put <h1>Information About My Business</h1> somewhere prominent on my web site. But you know what? I’m not gonna do that. Nobody is going to do that. Because it looks ridiculous. And why does it look ridiculous? Because documents and web pages are a Venn diagram, not a singular category. Some documents are not web pages; some web pages are not documents, and my business’ web site is not a document. It’s some combination of a business card, a mailbox, a phone book entry and a billboard. Hence, it doesn’t have a sensible document outline.

Now, of course I would still like to provide something approximating such an outline for use by people with screen readers, for example. As irritating as hN tags are, I will use them happily if it makes things more accessible for someone. What I won’t do is compromise the visual appeal of my web site for sighted users.

Forgive me if I’m jumping to conclusions here; perhaps you’re not on the “don’t hide your h1 tag with CSS and also don’t wrap it around your logo” train. But everyone else seems to be. Would you recommend doing one of these things? Or something else? Putting an h1 tag on my web site that is both semantically accurate and visible is simply out of the question.

Lincoln

Lincoln, those rules you have conveyed are more like potential constraints that may be imposed by a client or designer. A content-first approach can address a lot of these.

That being said, a couple notes:

  • Ideally the <h1> should correspond to the <title>
  • <h1>-<h6> are “ancient” because they were in the first cut of HTML and they work well; there really is no good reason for authors to still struggle with them given their ~30 years of documented use cases.

It may be because of my approach to web development (content first), but what you have outlined does not seem incoherent. The rules can generally be followed all at once, and when there are valid exceptions (often well documented) you can skip some of them.

As for your assertion that your business web site is not a document, each of those types of content you outlined are documents. Documents constrained by their prior medium but with different restrictions on the web.

Stating you won’t compromise the visual appeal of your site, however, presumes what you think looks good translates to usable for all your users, sighted or not. I maintain that a typical web site (excluding art sites, fan sites, weird sites, Homestar Runner, etc.) is there for an audience to use, not just as an edifice to design purity. As such, some user affordances are design constraints which should be embraced.

That being said, there are certainly cases where it is ok to break the rules — provided why they are broken is understood and has intent behind it. The home page of this site, for example, visually hides the <h1>.
