The basics of HTML and CSS are often the very first lessons in “teach yourself to code” tutorials. This makes sense, as these two languages build the basic visuals of a website and are much easier for a beginner to wrap their head around than, say, creating an array in Ruby. These tutorials teach just enough to get you up and running so that you can move on to the fun stuff, but they rarely move beyond the simplest of elements, much less teach us why semantic HTML is important. It makes sense to start here, but the fact that these tutorials typically only teach very basic HTML, combined with the fact that many developers never return to these “basics,” means that our knowledge stays, well, basic. I hate to admit that I’m one of these people - I’d rather dedicate my time to learning cool things like flexbox (with my favorite frog-based game) and beautiful CSS gradients than learn about all the elements HTML5 gives us beyond header, footer and section. But alas, my mentor here at 8th Light has (I suspect) caught me using one too many div elements, and it’s high time I went back and augmented my HTML knowledge, which at this point has been the very minimum needed to hand-code a website. This might fly at some companies, but 8th Light is all about doing things right and not cutting corners, meaning it’s time for me to give up my constant use of divs and learn to do things properly now as an apprentice, before I ever have the chance to subject a client to an abundance of div tags.

Semantic HTML

Semantics is the study of the meaning of words and phrases in language, and semantic elements are simply elements that have a meaning. In HTML, a semantic element describes its meaning. For example, the header element is describing itself - “header” is both its name and its meaning. Alternatively, the element div is non-semantic, because “div” is not a word that tells us anything about its content. span is the other commonly used non-semantic HTML element.

Why Bother with Semantic HTML?

Using proper semantic elements in our HTML is important for a lot of reasons. First, it makes it easier for us to go back and change things later. Since the markup is easier to understand, a new person on the project can also pick up the code and learn it much faster. Second, it helps us when writing our CSS styles - we can change the look of the site without recoding all of the HTML.

In addition to making life easier for human developers, semantic HTML is also useful for browsers and other software that reads the page. Two important things follow from this. First, visually impaired people are able to have screen readers read the page to them properly, so semantic HTML aids accessibility. Second, search engines can better understand what the content is about and rank your site more accurately, so semantic HTML helps with SEO.

Non-semantic HTML

Non-semantic HTML elements include div and span.

div is short for “division,” as in, it divides your page into chunks. A div creates a block-level element (so essentially there is a line break before and after it), while span tags are inline and used for a small chunk of HTML inside a line (for example, a few words in a paragraph). I was unable to figure out the official meaning behind the term “span,” but think it might have to do with the literal meaning of “span,” which is “the full extent of something from end to end,” as in “the bridge spans a quarter mile over the river.” This might make sense considering a span element is inline and spans exactly the content we wrap in it (as opposed to the block-level div, which by default starts on a new line and stretches across the full available width, meaning it covers more than the content we put in it).

I made a quick CodePen to show the advantage of using proper semantic HTML over divs. Other than being the right way to do things, better for screen readers and therefore accessibility, and SEO-aiding, it also results in less code, which is always better.
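To make the comparison concrete, here is a minimal sketch of the same page skeleton written twice. This is my own illustration rather than the contents of the pen, and the class names and text are made up.

```html
<!-- Non-semantic version: every region is a div, told apart only by class names -->
<div class="header">
  <div class="nav">
    <a href="/">Home</a>
    <a href="/blog">Blog</a>
  </div>
</div>
<div class="content">
  <div class="post">
    <div class="post-title">Learning semantic HTML</div>
    <div class="post-body">Divs everywhere.</div>
  </div>
</div>
<div class="footer">Contact me</div>

<!-- Semantic version: the element names describe the regions themselves -->
<header>
  <nav>
    <a href="/">Home</a>
    <a href="/blog">Blog</a>
  </nav>
</header>
<main>
  <article>
    <h1>Learning semantic HTML</h1>
    <p>Elements with meaning.</p>
  </article>
</main>
<footer>Contact me</footer>
```

The second version needs no class names just to say what each region is, which is where the savings in code come from.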

Categories of Semantic HTML

Basic Elements

html

The html element is the root of the document. All other elements have to be descendants of this element.

Document Metadata

Metadata contains information about the page (as opposed to everything that comes after in this list, which actually makes something appear on the screen). Metadata includes information about styles, scripts, and data that help browsers and search engines render and use the page.

base

The base element specifies the base URL to use for all the relative URLs that are contained within a document. There’s only one base element per document.

head

The head element holds the rest of the metadata within it, including the page’s title and links to its style sheets, scripts, and Google Fonts.

link

The link element shows relationships between the current document and an external resource and is what is commonly used to link to style sheets. This element is explained further later on.

title

The title element defines the title of the document, which is shown in the browser bar or page’s tab. The title can only contain text.

style

The style element contains style information for a document, written as CSS directly inside the page. (To pull in a separate CSS file, use the link element instead.)

meta

The meta element is for any metadata that can’t be represented by one of the other metadata elements listed above. This often includes the line <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> which designates the character set, as well as the line <meta name="viewport" content="width=device-width, initial-scale=1.0"> which mobile browsers need in order for CSS media queries (which are used for responsive design styles) to behave as expected.
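Putting the metadata elements together, a head might look something like the sketch below. The URLs, file names and title are placeholders I made up, and I’ve used the shorter meta charset form, which does the same job as the http-equiv line.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Character set (short form of the http-equiv Content-Type line) -->
    <meta charset="UTF-8">
    <!-- Lets mobile browsers use the device width for layout -->
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Semantic HTML notes</title>
    <!-- Relative URLs after this point resolve against this (hypothetical) base -->
    <base href="https://example.com/blog/">
    <!-- External style sheet, loaded with link -->
    <link rel="stylesheet" href="css/style.css">
    <!-- Inline CSS, written directly inside style -->
    <style>
      body { font-family: sans-serif; }
    </style>
  </head>
  <body>
    <p>Only this paragraph shows up on the screen.</p>
  </body>
</html>
```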

Sectioning

Content section semantic HTML tags include body, header, footer, nav, section, h1, h2, h3, h4, h5, h6, article, aside and address.

These are very straightforward. The body tag encompasses the entire visible content of your document and sits directly inside the html tag, after head. The header element specifies a header for your document or section, and the footer element specifies the footer. The section tag groups together thematically related content. h1, h2, h3, h4, h5 and h6 are headings, with h1 being the most prominent and having the largest default style, and each heading after it decreasing in importance and size. The nav element is short for “navigation” and defines a block of navigation links that will take the user to other main pages. An aside element defines content aside from the content it’s placed in. The information in an aside should be related to its surrounding content. The article tag specifies independent, self-contained content, like a news article, blog post, or comment. Finally, the address element supplies contact information for its nearest article or body ancestor.

While this is all pretty obvious, there are a few things that often get skipped over in basic HTML tutorials. First, it’s a common misconception that you should only use header, footer and aside once per page, but that is not the case. W3Schools states: “the header element should be used as a container for introductory content. You can have several header elements in one document.” They provide the following example:

header example
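Here’s my own minimal sketch of the same idea (not W3Schools’ exact markup; the headings and text are invented). The page has one header and footer, and the article inside it has its own.

```html
<body>
  <!-- Header for the whole page -->
  <header>
    <h1>My Blog</h1>
    <p>Notes from an apprentice</p>
  </header>

  <article>
    <!-- Header for this article only -->
    <header>
      <h2>Why semantic HTML matters</h2>
      <p>Posted by a hypothetical author</p>
    </header>
    <p>The post itself goes here.</p>
    <!-- Footer for this article only -->
    <footer>
      <p>Filed under: HTML</p>
    </footer>
  </article>

  <!-- Footer for the whole page -->
  <footer>
    <p>Site-wide footer</p>
  </footer>
</body>
```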

Also, the aside element is not only used for sidebars (though this example is so overused that you’d be forgiven for thinking this was the case). The aside tag is for a section of the page that has content that is tangentially related to the rest, but should be separate from that content. Examples outside of a sidebar include an author biography, profile information, and related links.

Finally, the article tag often gets replaced with the section tag due to misunderstanding about what the article tag is actually for, as well as confusion about these two tags in general. article is for more than just what we think of as an “article” (like a news article); it is for any independent, self-contained content, which also includes things like a blog post or a comment on an article or blog post. On the other hand, section is for grouping distinct sections of content or functionality. While the exact difference is heavily debated, my understanding is that section is broader than article. This is a good example of how a section could hold all the blog content, with an article used for each blog post.

article example
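As a rough sketch of that arrangement (the post titles and text are made up), one section groups the blog and each post gets its own article:

```html
<section>
  <h2>Blog</h2>

  <!-- Each post is self-contained, so each one is an article -->
  <article>
    <h3>First post</h3>
    <p>This post would still make sense if it were lifted out of the page.</p>
  </article>

  <article>
    <h3>Second post</h3>
    <p>So would this one.</p>
  </article>
</section>
```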

Grouping Content

Again, a fair few of these were new to me, despite the fact that I took a (paid, in-person) front-end dev class where I coded a few websites by hand. Lots of these get covered in the basic online tutorials, some of them don’t, and even the really basic ones have lots of interesting rules and abilities that the casual developer might not know about. Let’s dig into it.

p

A p tag makes a paragraph. This might be the most well-known content tag out there. But did you know that this is more structural than logical? List elements (ol and ul) cannot be the children of a p element. So if we had this code: p and ul example

only the first line, “I need groceries” would be red. In order for the entire thing to be red, we would have to replace the p elements with a div element. Alternatively, we could close the p tags and then write our CSS to style all the elements, but that would require more code.
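To illustrate what is going on, here is a hedged sketch with a made-up grocery list. The first block is what we might write; the second is roughly what the browser builds from it, because the ul start tag automatically closes the open p.

```html
<style>
  p { color: red; }
</style>

<!-- What we wrote: a list nested inside a paragraph -->
<p>
  I need groceries
  <ul>
    <li>Eggs</li>
    <li>Milk</li>
  </ul>
</p>

<!-- Roughly what the browser builds: the p ends before the ul starts,
     so the list items are not inside the red paragraph -->
<p>I need groceries</p>
<ul>
  <li>Eggs</li>
  <li>Milk</li>
</ul>
<p></p>
```

The stray closing p tag at the end of what we wrote becomes an empty paragraph, which is one more reason to avoid this pattern.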

It’s also important to remember that we shouldn’t use a p element if there’s something else that is more specific that fits, for example the address element to write an address.

hr

The hr tag is used to represent a paragraph-level thematic break, such as a transition to a new topic in a reference book or a scene change in a story. By default it renders as a line across the width of the page with some blank space above and below it. It does not have a closing tag.

This example:

hr example

Results in this:

hr example

The hr tag makes that line. Most people would probably want to create a border that they could style to the desired width and color, but this is also an option. Here’s what the line really looks like (I zoomed way in to be able to see it properly):

hr rendered zoomed in

pre

The pre tag represents a block of preformatted text. Essentially, things like line breaks and spacing that you would have in a text editor like Pages or Word are preserved exactly as you typed them. For example, the spacing and layout of the words in the following poem is an important part of the poem and needs to be preserved. Instead of using all sorts of CSS to get it to look like this, we can just copy and paste into Sublime (my preferred text editor for writing code) surrounded by the pre tags. (This is also useful when you’re writing a code example and want the example to maintain the spacing/indenting rules of that language.) The HTML code looks like this (I simply copied text from my Pages document into Sublime, which kept all the weird spacing):

pre example

Results in this:

pre rendered
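For a smaller, self-contained illustration, here is a sketch using a throwaway verse of my own (not the poem above). The indentation and line breaks inside pre show up on the page exactly as typed.

```html
<pre>
Roses are red,
    violets are blue,
        this indentation
            is preserved too.
</pre>
```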

blockquote and cite

The blockquote tag represents a section that is quoted from another source. (Don’t get this confused with the q tag, which you can read about in the next section.) The content of a blockquote must be quoted from another source, which can be cited with the cite tag if possible. The cite tag represents the title of a work. An example of both together is:

blockquote example

Only works can be cited, and people are not works, so the cite tag in this case goes around the name of the book, not the author. It’s also important to note that a citation is not a quote. (Read about quotes in the q element, which are inline rather than block quotes, in the next section.) If you’re creating quote symbols in spoken text, you wouldn’t cite the quote. For example, this is wrong:

wrong cite examples

This is correct:

correct cite example
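As a minimal sketch of the rule (using a quote of my own choosing rather than the one pictured), the cite tag wraps the title of the book and the author’s name stays outside it:

```html
<blockquote>
  <p>It is a truth universally acknowledged, that a single man in possession
  of a good fortune, must be in want of a wife.</p>
</blockquote>
<!-- The work is cited; the person is not -->
<p>Jane Austen, <cite>Pride and Prejudice</cite></p>
```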

figure and figcaption

The best way to think about the figure tag is to think about a textbook or manual that has different figures throughout the text. The figure could be an image, chart, graph, or some other type of information. Usually the author will discuss a topic and write something like “see figure 4.1.” Figure 4.1 will relate to this writing and also have a caption, like “a variety of method cards.” The figure is the figure and the caption is the figcaption.

figure elements are usually related to the surrounding flow but are self-contained. A figure should be self-contained so that it can easily be moved around, from a page to its own dedicated page to an appendix. Also, it’s recommended that figure elements be given their own name so that they can easily be referenced (“see figure 4.1”) as opposed to identified by where they are placed in the page (“see the figure to the right”) in case the figure gets moved later. The first figcaption child represents the caption of the figure. If there’s no figcaption, there’s no caption.

For example:

figure and figcaption example
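A minimal sketch of the textbook scenario described above might look like this. The image file name and the id are placeholders I invented.

```html
<p>The cards we used are shown in <a href="#figure-4-1">figure 4.1</a>.</p>

<!-- The figure is referenced by name, so it can be moved without breaking the text -->
<figure id="figure-4-1">
  <img src="method-cards.jpg" alt="Several method cards spread out on a table">
  <figcaption>Figure 4.1: A variety of method cards.</figcaption>
</figure>
```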

ol, ul and li

The ol element stands for “ordered list,” as in a list that is numbered, such as the steps in an experiment. ul stands for “unordered list,” such as a grocery list. li stands for “list item” and is the child of ol and ul. The li element's end tags can be omitted if the li element is followed by another li element or if there is no more content in the parent element.

An unordered list results in a list with bullet points. For example, this code:

ul example

Results in:

ul rendered

By default, the listed items in an ol element will be numbered starting with one. For example, this code:

ol example

Has this numbered result:

ol rendered

There is also the option to “reverse” the list. For example, if you were making a list of the top 5 best shows on Netflix right now, and wanted the list to start with 5 and end with 1, you could have:

ol reversed example

to result in:

ol reversed rendered

For even more control, you can give your li elements a value. The following code has the same result as the image above:

ol value example
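Here is a short sketch of both approaches side by side, with placeholder list items and three entries instead of five. Both lists should be numbered 3, 2, 1.

```html
<!-- Counts down because of the reversed attribute -->
<ol reversed>
  <li>Third place</li>
  <li>Second place</li>
  <li>First place</li>
</ol>

<!-- Same numbering, set explicitly with value on each li -->
<ol>
  <li value="3">Third place</li>
  <li value="2">Second place</li>
  <li value="1">First place</li>
</ol>
```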

dl, dt and dd

Since these html elements are not commonly taught, it’s best to just say they are for creating certain types of lists and start with an example.

The code:

dl dt dd example

Results in:

dl dt dd rendered

dl stands for “description list” and this element represents a list of things that are associated.

The dt element represents a term, name, or part of a description within a description list (the dl element). I couldn’t confirm this, but think that dt stands for “description term.”

The dd element represents a description, definition, or value.

The dl requires an opening and closing tag. A dt element, on the other hand, doesn’t need an end tag if it’s immediately followed by another dt element or by a dd element. The dd element doesn’t need a closing tag if it’s immediately followed by another dd element or by a dt element, or if there is no more content in the parent element.
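For reference, a minimal description list might look like this. It is a sketch with terms borrowed from earlier in this post, not the example pictured above.

```html
<dl>
  <!-- Each dt is a term; the dd that follows describes it -->
  <dt>div</dt>
  <dd>A generic block-level container with no meaning of its own.</dd>
  <dt>span</dt>
  <dd>A generic inline container with no meaning of its own.</dd>
</dl>
```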

main

The main element is used to signify what the main content of a page is. It’s different from section, article and nav in that it’s not sectioning content. This means that it doesn’t contribute to the document outline. What it does do is help screen readers understand where the main content is. It came about because people thought that since we had a header, footer and aside, we should also have something that represents the main part of the content. Before this, people often created a div and gave it a class of “main” or “content.”
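As a sketch, here is the old div approach next to main. The class name and content are illustrative only.

```html
<!-- Before: a div with a class name stood in for the main content -->
<div class="main">
  <h1>Semantic HTML</h1>
  <p>The content that is unique to this page.</p>
</div>

<!-- Now: main says the same thing without needing a class -->
<main>
  <h1>Semantic HTML</h1>
  <p>The content that is unique to this page.</p>
</main>
```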

Text level semantics

a

The a element is most commonly seen with the href attribute to create a link. In that case, the a is the hypertext anchor. The a element without an href is also used as a placeholder for where a link could go later, or to stand in for the current page’s entry in a set of navigation links.

The a element can wrap around entire paragraphs, lists, tables, etc, as well as entire sections, so long as there aren’t any buttons, links or other interactive content within it.

em and i

The em element represents emphasis in content. The placement of emphasis on certain words in a sentence can change that sentence’s meaning, and the em tag achieves that stress in HTML. For example:

em example

puts the emphasis on “You’re.”

The em element is not the same as generic italics. For italics, i is used. The i element represents a span of text that carries a different mood or voice or indicates a different type of text, such as taxonomic designation, a technical term, a word or phrase from another language, or a thought.

Often, a class attribute is used on the i element to identify why the element is being used (for example, a word in a different language as opposed to a taxonomic term). This is done so that if it needs to be changed at a later date, the author doesn’t have to search the entire document and annotate each use.

Example of i with a class:

i example

Although both em and i result in the same style of slanted text, it’s important to use the right element, and a class name if necessary, for screen readers and for ease of change later on.
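A small sketch of the difference, with sentences and class names I made up. Moving em changes what the sentence stresses, while i marks text as a different kind of text.

```html
<!-- em: the stress changes the meaning of the sentence -->
<p><em>You’re</em> the one who ate the cake.</p>
<p>You’re the one who ate the <em>cake</em>.</p>

<!-- i: a taxonomic name and a foreign word, each labeled with a class -->
<p>The house cat is <i class="taxonomy">Felis catus</i>.</p>
<p>She ordered a <i class="foreign" lang="fr">croissant</i>.</p>
```

With the classes in place, the taxonomic terms could later be restyled on their own without touching the foreign words.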

strong and b

Much like em and i, strong is to indicate strong importance, seriousness or urgency. On the other hand, the b element is used to represent a span of text to which attention needs to be drawn for utilitarian purposes but does not convey any extra importance. An example of this might be a keyword in a text. The b element is also often used with a class to identify why the element is being used and make changes later on easier. Again, although they both result in what we think of as “bold” text, it’s important to use whichever better fits the purpose for screen readers.
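The same kind of sketch for strong and b, again with invented text and class name:

```html
<!-- strong: the warning really is more important than the text around it -->
<p><strong>Warning:</strong> this action cannot be undone.</p>

<!-- b: keywords that should stand out but carry no extra importance -->
<p>Semantic HTML helps with <b class="keyword">accessibility</b> and <b class="keyword">SEO</b>.</p>
```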

small

The small element represents small print and other side comments that should have a smaller text than the text around it.

s

The s element represents content that is no longer relevant or accurate, and results in a strikethrough of the text.

q

The q element represents quoted text. However, using q elements to mark up quotations is optional if you’d rather use explicit quotation punctuation. The q element should not be used when quotation marks are there for something other than a quote, for example when someone is being sarcastic.

dfn

The dfn element represents the defining instance of a term. For example,

dfn example

In this case, “Irregardless” becomes italicized.

abbr

The abbr element represents an abbreviation or acronym. The explanation of the abbreviation/acronym is optional and is done with the title attribute. Doing this results in the explanation appearing when the user hovers their mouse over the word. For example,

abbr-example

Results in a small box with the words “World Health Organization” appearing when the user hovers over “WHO”. (Unfortunately, this was impossible to get a screenshot of, but run this through on your machine to see exactly what happens.)
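If you want to try it yourself, a minimal version looks something like this (the surrounding sentence is my own):

```html
<!-- Hovering over WHO shows the title text as a tooltip -->
<p>The <abbr title="World Health Organization">WHO</abbr> publishes health guidelines.</p>
```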

ruby, rt and rp

The ruby tag specifies a ruby annotation. A ruby annotation is a small extra text that’s attached to the main text to explain its pronunciation or meaning. The rt element gives the information and rp (which is optional) defines what to show for browsers that don’t support ruby annotations. It’s common in Japanese text.

In this example, the character needs an explanation, so the explanation is put in the rt tags.

ruby example

When rendered, it looks like this:

ruby rendered

There are lots of rules about how to handle things like compound words, nesting, ancestry, and other things, so if you’re going to use this, be sure to look at its HTML spec.
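For a sense of the basic shape, here is a minimal sketch of my own, annotating the Japanese word for “kanji” with its reading. The rp elements hold the parentheses that show up only in browsers without ruby support.

```html
<!-- Supporting browsers show "kanji" in small text above the characters;
     others fall back to: 漢字(kanji) -->
<ruby>漢字<rp>(</rp><rt>kanji</rt><rp>)</rp></ruby>
```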

data

The data element represents its contents, along with a form of those contents in the value attribute that is machine-readable. The value attribute must be present and must be a representation of the element’s contents in a machine-readable format.

The data element is used when the computer needs some information but the user doesn’t. For example:

data example

In this example, the computer can understand the value given (which is the UPC code of each product). The user doesn’t necessarily want to see this, but it needs to be there so that the computer can do other things with it.
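A sketch of that pattern, with product names and numbers that are entirely made up:

```html
<ul>
  <!-- The user sees the name; scripts can read the product number from value -->
  <li><data value="0001">Mini Ketchup</data></li>
  <li><data value="0002">Jumbo Ketchup</data></li>
  <li><data value="0003">Mega Jumbo Ketchup</data></li>
</ul>
```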

time

The time element is used to mark up a time in a way that machines can read. It doesn’t show up as anything special to users (but that doesn’t mean it’s not still important!). The content in a time element can include dates, times, time zones and durations. We use the “datetime” attribute when specifying the machine-readable version of a date or time.

For example:

time example
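Here is a hedged sketch covering a date, a time of day and a duration. The event details are invented.

```html
<!-- The visible text is for people; datetime is the machine-readable version -->
<p>The meetup is on <time datetime="2024-03-15">March 15</time>
   at <time datetime="19:30">7:30pm</time>.</p>

<!-- Durations use the PT... form: PT45M means 45 minutes -->
<p>The talk lasts <time datetime="PT45M">45 minutes</time>.</p>
```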

code

The code element changes the way a word looks to indicate that it’s code and not normal text. It represents a fragment of computer code. (Note that there is no formal way to specify which language the code is written in.)

For example, writing:

code example

Creates this result:

code rendered

The “time” word looks different because it was in the code element, and this difference signifies to the user that this is a fragment from a programming language.

var

The var element represents a variable. It could be a variable in a mathematical expression or programming language, or an identifier representing a constant, a symbol for a physical quantity, a function parameter or just a placeholder for a variable. Text inside var tags appears italicized to the user.

samp

The samp element represents sample output from a computer program. For example:

samp example

Results in:

samp rendered

kbd

The kbd element is short for “keyboard” and represents keyboard input. This is useful when giving directions. The HTML spec gives the following example:

kbd example

Just like with the code element, the text inside of kbd has a different style when rendered:

kbd rendered
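To see the four computer-related elements from the last few sections (code, var, samp and kbd) side by side, here is a small sketch of my own. The function name, output and shortcut are hypothetical.

```html
<!-- code for the fragment of code, var for the variable inside it -->
<p>Call <code>area(<var>r</var>)</code> to get the area of a circle with radius <var>r</var>.</p>

<!-- samp for what the program prints back -->
<p>The program prints <samp>Area: 12.57</samp>.</p>

<!-- kbd for what the user types or presses -->
<p>Press <kbd>Ctrl</kbd> + <kbd>S</kbd> to save.</p>
```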

sub and sup

The sub element represents a subscript and the sup element represents a superscript.

For example,

sub and sup example

renders to:

sup and sub rendered

u

The u element represents text with a non-textual annotation. This could include labeling a piece of text as a proper name in Chinese or labeling the text as misspelt. The u element makes text underlined. It’s important to not use this when a user might think that the underlined text is a link, as links are often underlined to show that they are clickable. Also, there’s usually some other markup element that’s more appropriate to use than u, such as em for emphasis. The u element is rarely used.

mark

The mark element represents text that is marked (highlighted) for reference purposes. It results in the text having a yellow background. For example,

mark example

Renders as:

mark rendered
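A minimal version to try yourself (the sentence is my own):

```html
<!-- Browsers give mark a yellow background by default -->
<p>Remember to buy <mark>milk</mark> on the way home.</p>
```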

bdi and bdo

The bdi element represents text that needs to be isolated from its surroundings for the purposes of bidirectional text formatting. This is useful when working with user-generated content where we can’t be sure of the directionality, perhaps because it could be written in a language that reads right to left. The example from the HTML spec uses a name written in Arabic:

bdi example

Because the bdi element was used, these names and post numbers output to an unordered list with bullet points as expected. But if the bdi tags weren’t there, the number “3” would jump to be next to the word “User.” This reordering comes from the browser’s bidirectional text algorithm when it lays out the line, not from anything we can see in the markup.

The bdo element allows the author to override the Unicode bidirectional algorithm by explicitly specifying a direction override for its children. This is done by specifying the “dir” attribute to equal “ltr” for left-to-right or “rtl” for right to left.
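Here is a sketch of both elements. The list is modeled on the spec’s example as I remember it, so treat the exact names and numbers as illustrative; the bdo sentence is my own.

```html
<ul>
  <!-- bdi isolates each user name so an Arabic name can't pull the post count out of place -->
  <li>User <bdi>jcranmer</bdi>: 12 posts.</li>
  <li>User <bdi>hober</bdi>: 5 posts.</li>
  <li>User <bdi>إيان</bdi>: 3 posts.</li>
</ul>

<!-- bdo forces a direction: this sentence is displayed with its characters right to left -->
<p><bdo dir="rtl">This text is displayed right to left.</bdo></p>
```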

br

A br element represents a line break. It should only be used when the text actually calls for a line break, such as in poems and addresses. It’s often used to create more space in styles, but the correct way to do this would be to use CSS to specify things like margin and padding. It should not be used for separating thematic groups in a paragraph.

wbr

The wbr element represents a line break opportunity, as opposed to an actual line break. If you wanted to have a bunch of words all be together without spacing but still wanted them to be able to wrap in a readable fashion, you would use this.

For example,

wbr example

renders as

wbr rendered

but leaves the opportunities for line breaks to occur if the screen size is small enough to need it.
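A common place for this is a long URL, as in this sketch with a made-up address. The text shows as one unbroken string unless the container is too narrow, in which case it may wrap at any wbr.

```html
<p>https://example.com/<wbr>a/<wbr>very/<wbr>long/<wbr>path/<wbr>to/<wbr>a/<wbr>page</p>
```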

Links and Link Types

In HTML, link types indicate the relationship between two documents. One links to the other using a link element, an a element or an area element.

link

The link element specifies the relationship between the current document and an external one. One of the most popular ways to use this is to import a CSS stylesheet to an index.html page. For example:

<link rel="stylesheet" href="CSS/style.css">

The “rel” (relationship) value in this case is “stylesheet” but it can be set to a lot of different values. For example, a rel of “help” would indicate that the link is leading to a help resource about the whole page. A value of “author” indicates that the link leads to information about the author or how to contact her, and “license” leads to licensing information.

a

The a element is the anchor element, and defines a hyperlink to a location on any other page on the web or to a location on the same page. It can also be used to create an anchor point at various spots in a page so that links aren’t limited to connecting only to the top of the page. The HTML specs say that creating anchor points with the a element’s name attribute is obsolete (an id on any element does the same job), but with the current trend of “endless scroll” style pages, in-page anchors have made a resurgence.

There are several attributes that can be given to the a element. The “href” attribute is required for linking to external pages. The “target” attribute is commonly used and dictates where to display the linked source. When “target” is set to “_blank” the link will open in a new tab.
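A quick sketch of both kinds of link. The URL and id are placeholders, and rel="noopener" is an addition of mine that is commonly recommended alongside target="_blank".

```html
<!-- External link that opens in a new tab -->
<a href="https://example.com/" target="_blank" rel="noopener">Visit example.com</a>

<!-- In-page link: jumps to the element whose id is "conclusion" -->
<a href="#conclusion">Skip to the conclusion</a>

<h2 id="conclusion">Conclusion</h2>
```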

area

The area element is less commonly used than link and a but still worth knowing about. It creates a clickable “hotspot” on an image. This must be done within the map element. The map element is given a name, and the image points at that map by setting its “usemap” attribute to “#” followed by the name. Each area element inside the map then uses its “shape” and “coords” attributes to say which part of the image is clickable. W3Schools has a great example of this with an image of a planet in space. The planet is clickable while the black space around it is not.
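Here is a rough sketch of how the pieces fit together. This is my own markup rather than the W3Schools example, and the file names, map name and coordinates are invented.

```html
<!-- usemap points at the map named "planetmap" -->
<img src="planet.jpg" alt="A planet in space" usemap="#planetmap" width="400" height="300">

<map name="planetmap">
  <!-- A circular hotspot: center x, center y, radius (in image pixels) -->
  <area shape="circle" coords="200,150,80" href="planet.html" alt="The planet">
</map>
```

Clicks inside the circle follow the link, while clicks anywhere else on the image do nothing.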

Conclusion

At the end of the day, learning all the nitty gritty of semantic HTML is dull, but extremely important. First because it’s key to screen readers and accessibility for people who are low-vision/blind, second because of factors like SEO, and third because if you want to write quality, sustainable code, you have to do things by the book. After spending a few days poring over the HTML specs, I have a piece of advice: aid your understanding of some of the weirder elements with W3Schools. Towards the end of writing this post I had one of their “Try it!” windows open and was dumping the examples from the specs into it to see what happened. A shortcoming of the specs is that they don’t actually show you what an example results in. The mark element is explained as “marked or highlighted” but I didn’t realize until I actually rendered the example that the marked area literally has a bright yellow background. A detailed and technical explanation is all well and good, but be sure to actually run their examples or you might not actually understand what’s going on. Also, watch for some strangely political examples. There was a pro-atheism thread through the spec examples, as well as one that seems to be throwing shade at W3Schools. Regardless of all of this, understanding that HTML is a lot more than just headers and p tags is important, and spending a day reading the specs and testing out the examples will make you a much more solid front end developer.
