HTML Basics

This page contains all 9 sections of the beginners tutorial on one page. If you want the other version, click here.

Click on a heading below to jump straight to the section you want, or scroll down and read it all in order.

  1. Introduction
  2. Page Layout
  3. Logical Text Commands
  4. Text Formatting
  5. Links
  6. Lists
  7. Images
  8. Tables
  9. Special Characters

1: Introduction

This tutorial is intended for those people who have little to no HTML coding experience. If you already understand the concepts presented here, jump to the intermediate tutorial.

All the beginners pages use only the tags taught in the beginners tutorial, so doing a 'view source' will not present you with anything not taught in this tutorial and will instead provide you with a real, working example of everything I cover here.

The extreme basics: HTML stands for HyperText Markup Language. An HTML page is essentially plain text with 'tags' marking off sections for the browser. Tags are enclosed in angle brackets: < >. There is usually (with a few exceptions) a starting tag and an ending tag: they differ only by a '/' before the name of the tag (the browser has to know which tag it's ending, after all!). The text to be modified goes between the starting and ending tags.

You can format your 'source code' any way you like: the browser will treat words as words and any number of spaces as a single space. To modify how the browser displays the text, you need the tags.



2: Page Layout

The very first element you'll see if you view the source of this page is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

This specifies exactly which HTML specification I'm using, and where the program rendering the page should look if it's not familiar with that specification. The specification itself is designed for a program or programmer, and it takes a lot of work to get through. Most web authors don't need to know it exactly. This code also tells the validator which version it should be checking your page against.

Strictly speaking, this declaration isn't necessary for a browser to display the page. But if you don't want to choke the validator, you'll need it.

After that, you need a specific set of tags:

<HTML>
<HEAD>
<TITLE> Insert your page title here </TITLE>
</HEAD>
<BODY>
The body of the page - the part that's displayed in the browser window
- goes here.
</BODY>
</HTML>

To break that down:
HTML: this tag surrounds the entire document. It basically tells the browser that everything between the starting and ending tags is to be interpreted as an HTML document.
HEAD: this tag surrounds the heading information of the document. This includes the title of the page as well as more advanced tags, stylesheet definitions, and scripts. Basically, things that don't go into the browser window.
TITLE: this tag surrounds the text that you want to be displayed in the title bar of your browser window. This must be between the two HEAD tags.
BODY: this tag surrounds the part of the document that is to be rendered and displayed in the main browser window.

Any time you want to put a note in your page, but don't want it to be displayed (for example, if you want to note where to change something later), you can put a comment into the text. Of course, anyone can see the comments just by doing a 'view source', so don't put anything you want to keep private in the comments!

A comment looks like this:

<!-- this is a comment. Well, really it's not, because you can see it ;-) -->

The special beginning and ending indicate that this is not a normal tag. If it weren't for the '!--' and '--' at the beginning and end, the browser would try to interpret it as a tag. Now normally if a browser encounters a tag it doesn't understand, it just ignores it, but it's still not a good idea to put unknown tags in your code.

Go on, start up your favourite text editor and try it out. (Not a word processor, but a text editor. If what you use doesn't produce plain-ASCII text you'll have trouble. If you're on Windows, Notepad works just fine.)

Type anything at random - spaces and newlines anywhere - and notice when you view the page that the browser interprets any number of spaces or newlines as a single space. This makes it possible to format your document in a manner easy to maintain, while not affecting how it looks. If you do a 'view source' on this page, you'll see a couple of comments. One of them is in the middle of a paragraph, but since the browser doesn't care about excess spaces, it renders it as though I had not put anything there.
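
As a rough sketch of that experiment (the wording and spacing are made up for illustration, not copied from this page's source), a body typed like this:

<BODY>
These     words are
spread over


several lines, <!-- a note that nobody sees in the browser window -->
with a comment in the middle.
</BODY>

should display as one ordinary run of text with single spaces between the words, and without the comment.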



3: Logical Text Commands

Why are they called 'logical' text commands? Well, that's to distinguish them from text formatting commands. HTML is designed to mark your pages according to what the structure of the content is, not what you want it to look like. Because, you see, you don't know - you have no way of knowing - what kind of browser is going to be rendering your page. It might be Netscape or IE at one of several screen resolutions - all of which changes your page layout. It might be a text-to-speech browser, which doesn't want to know how pretty your page is, but needs to know how it's logically laid out.

The very first 'logical' (or 'structural' if you prefer) tag that you should learn is the "Paragraph" tag.

<P> paragraph text goes here </P>

Strictly speaking, you don't need the closing tag - the browser will assume that you've ended one paragraph when it sees the beginning tag of the next one. It's still a good idea to include it, just for consistency's sake.

The 'P' tag usually renders the text flush left with an empty line between paragraphs. It is not recommended to force an indent of your paragraphs by putting special 'whitespace' characters. Would you really want to type

<P>&nbsp;&nbsp;&nbsp;

at the beginning of every paragraph, just to get a three-space indent? (&nbsp; is the 'non-breaking space' character. It has its uses, which I'll get to later)

If you want a line break but you don't want the empty line, you should use the BR tag. This forces a line break at the given spot. (There is no closing tag)

<P>Start a paragraph<BR>
The second line, but still in the same paragraph</P>

Now, the large titles at the top of my pages are structural elements, called 'headers'. I don't care how exactly the browser renders them, as long as it indicates that text's importance somehow. In graphical browsers, that's usually done by increasing the size depending on the header level. Other types of browsers do it differently. For example, lynx centers header one, puts header two flush left, and indents headers three through six by an increasing amount. In the browser you are using to read this page, the different header levels render as follows:

Header level one, the most important

Header level two: first subheading

Header level three

Header level four

Header level five
Header level six

The code used is:

<H1> Header one text </H1>

and so on, changing the number in the tag for each level. Don't forget to close the tag with the right number, or your entire document will be formatted as a header!
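
A small sketch of how headers and paragraphs might fit together in a test document (the header text and paragraphs are just placeholder wording):

<H1> My Test Page </H1>
<P> An opening paragraph under the main header. </P>
<H2> A Subsection </H2>
<P> Some text for the subsection. </P>
<H3> A Smaller Point </H3>
<P> Text under the third-level header. </P>

Each closing tag carries the same number as its opening tag.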

So go on, put some paragraphs and headers into your test document. Save it and see how it looks.

Now, there are some other logical text markup tags that some people use for formatting. The EM tag usually renders as <EM>italics</EM>, but in a text-to-speech browser it may change the tone of the generated voice instead - an effect that format tags probably won't cause. The way the word 'italics' appears in that sentence is how your browser renders the 'emphasis' tag. On lynx on my computer it's purple, and on Netscape it's italic.

Another tag is STRONG which usually renders as <STRONG>boldface</STRONG>. (On lynx, this also renders as purple)
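
For illustration, here is roughly what the source for text marked up with these two tags could look like (the sentence itself is only an example):

<P> You <EM>really</EM> should save your work, and you <STRONG>must</STRONG> close your tags. </P>

How the two marked words look, or sound, is up to the browser.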

The other basic logical text tags are CITE, BLOCKQUOTE, and Q.

Now, a quote (the Q tag) is used for a short quotation inside the flow of the paragraph, and a citation (the CITE tag) marks the source being referred to. For example, the W3C says in the <CITE>'document type definition'</CITE> that it <Q>excludes the presentation attributes and elements</Q> in favour of style sheets.

Now, the quote shown didn't appear to have any changes made to the formatting in my copy of Netscape, but lynx put double quotes around it. The citation showed up in Netscape as italics, and in lynx as purple. I can't test IE, because it doesn't run on Linux.

Alternatively, if you have a long quotation that you want to separate a bit more clearly from the rest of the text, you can use the BLOCKQUOTE tag.

<BLOCKQUOTE> Now if this were actually a quotation, I would have included a CITE tag either before or after this block, to give credit where credit is due. But this is just me rambling to fill up space, so no CITE is needed. </BLOCKQUOTE>

So now, if you look at the source to this page, you'll see that it's the BLOCKQUOTE tags that put in the indent there.

And there's one last logical text tag that I have to include, since I promised that I wouldn't use any tags in the tutorial that I don't describe - the CODE tag. I used this to show what was HTML code. On Netscape, it's displayed as a monospaced font (the 'i' and 'm' are the same width) and in lynx there's no special indication.
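
As a hedged example of the CODE tag in use (the sentence is invented for illustration), note that the angle brackets of a tag being shown have to be typed as character codes, which are covered in the special characters section:

<P> Use the <CODE>&lt;BR&gt;</CODE> tag to force a line break. </P>

Without the codes, the browser would treat the BR as a real tag and break the line instead of showing it.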



4: Text Formatting

There are actually only a few tags that affect only formatting: TT, I, B, BIG, and SMALL. Since you've come this far you probably know that you need to surround the text you want affected like <TT>this for 'teletype' or monospaced font</TT>, <I>this for italic font</I>, <B>this for bold font</B>, <BIG>this for a bigger font</BIG>, <SMALL>and this for a smaller font.</SMALL>

Naturally, none of that rendered properly in lynx, because it has one size and one face for its fonts. Netscape did it properly, though. For the most part, you should probably use the logical tags rather than the formatting tags - EM and STRONG, CODE instead of TT, and so on (well, if you're formatting code with that tag). They'll usually look the same, and with the logical tags, a non-visual browser will know if the font changes are associated with anything important, or if they're just there to look pretty.

And there's one more tag used in formatting - although it doesn't have anything to do with text. That's the horizontal rule, or HR tag. It draws a horizontal line across your screen, like so:

<HR>


There have traditionally been a whole lot of things you could do with the HR tag - adjust the width and thickness, to name a few. That's all handled by stylesheets now, though.

You're probably wondering where the FONT tag is, aren't you? No, it's not in the intermediate or advanced section, because formatting tags like that are deprecated in HTML 4.01 and are not part of the Strict specification used here. To change fonts, you should use style sheets - this separates the logical markup from the formatting markup, and makes it a lot easier to maintain the pages. I'm covering beginner's style sheets in the intermediate section.
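
Just to give a taste of what that looks like, here is a minimal sketch of a style sheet placed in the HEAD of a page. The font and the choice of the P tag are arbitrary examples, and the details are left to the intermediate tutorial:

<HEAD>
<TITLE> Style sheet sketch </TITLE>
<STYLE type="text/css">
P { font-family: sans-serif; }
</STYLE>
</HEAD>

This asks the browser to show every paragraph in a sans-serif font, without touching the paragraph tags themselves.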



5: Links

In this section, there is only one tag to learn - plus a few attributes for that tag, and a bit of theory. The A tag (for anchor) is what defines a hyperlink. One end of the hyperlink is the 'source' anchor. You see this as a link that you can click on. The other end of the hyperlink is either a file or another anchor. If it's a file, all you have to do is name it, and the browser and server will take care of everything. However, you can also create destination anchors at certain points inside a document, then point a source anchor at one of them. When you click on the source anchor, the browser will load the destination page (if necessary) and put you at the precise spot in the document that contains the destination anchor.

To define a source anchor, you need to include the 'href' attribute with the name of the destination. A link to the W3C would look like: <A href="http://www.w3.org/"> click here for the W3C</A>. Notice that it is only in the opening tag that you need to specify the attributes - the closing tag only has an 'A' in it. This is true of all tags - most of them have optional attributes, all of which are put in the opening tag.

If the destination is in the same directory as the source, your 'href' is a lot simpler - it's just the name of the file. I use this all over the place in these pages, because all the tutorial pages are in the same directory, plus the one directory for images. A link to another page in the same directory would look like: <A href="index.html"> click here for the tutorial index</A>

If your destination is a named anchor inside a file, there are two steps to carry out, instead of the one for each case listed above. First, you have to create the named destination anchor: <A name="somedestination"> this is the destination </A>. Then, you have to create the source anchor: <A href="filename.html#somedestination"> click here to go to the destination </A>.

This example assumes that your destination is in a different file in the same directory. If the destination is in the same file, you can leave off the 'filename.html' part and just put '#somedestination'. This method can be seen in use in the 'all on one page' beginners tutorial, as a table of contents. If the destination is not in the same directory, you can use the first method - the full URL - and append '#somedestination' after the end of the filename.
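
Putting the cases together, a sketch might look like this. The file name 'page2.html', the domain 'www.example.com', and the anchor name 'lists' are made up for illustration:

<P> <A name="lists">Lists</A> are covered here. </P>

<P> <A href="#lists">Jump to lists in this file</A> </P>
<P> <A href="page2.html#lists">Jump to lists in another file in this directory</A> </P>
<P> <A href="http://www.example.com/page2.html#lists">Jump to lists on another site</A> </P>

The first line is a destination anchor. The other three are source anchors; each one only works if the file it names contains a destination anchor called 'lists'.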



6: Lists

There are two main kinds of lists that I'll be introducing here: ordered lists and unordered lists. Ordered lists display a number before each list item, and unordered lists display some sort of bullet, star, or dash before each list item.

Since this is best displayed using an example, I will jump straight to a demonstration:

A short unordered list, as code:

<UL>
<LI> list item number one
<LI> list item number two
<LI> list item number three
</UL>

A short unordered list, as displayed:

  * list item number one
  * list item number two
  * list item number three

A short ordered list, as code:

<OL>
<LI> list item number one
<LI> list item number two
<LI> list item number three
</OL>

A short ordered list, as displayed:

  1. list item number one
  2. list item number two
  3. list item number three

You may have noticed that the tags are mnemonics - OL for Ordered List, and so on. An awful lot of the tags are like that, so it's really not cryptic.

Naturally, you can put anything you want to as a list item. I used links as list items in the index for this tutorial, for example. If you're not sure how I did something, look at the source for the page! Because I'm only using tags I cover in the tutorial, you should be able to figure out what I'm doing pretty quick. So keep playing with your test HTML page. Come on, add a list or three to it. Try nesting them!

Testing nested lists, as code:

<OL>
  <LI> Top level item one
  <LI> Top level item two
    <UL>
      <LI> nested item one - a sub-list of item two
      <LI> nested item two
    </UL>
  <LI> Top level item three
</OL>

Testing nested lists, as displayed:

  1. Top level item one
  2. Top level item two
       * nested item one - a sub-list of item two
       * nested item two
  3. Top level item three

Just make sure to keep your start and end tags organised. This is where formatting the HTML source comes in handy. You may notice that in both my sample code and in my actual source, I indent each successive level on the list. This is so that it's immediately obvious at a glance which item belongs where.
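
As one more sketch, this is roughly how a list of links can serve as a table of contents. The anchor names 'intro' and 'layout' are assumptions for the example, and each one needs a matching destination anchor further down the page:

<OL>
  <LI> <A href="#intro">Introduction</A>
  <LI> <A href="#layout">Page Layout</A>
</OL>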



7: Images

Images certainly make a web page prettier than plain text. But you have to balance a few things when putting images on your page. As I've mentioned before, some people use text-to-speech browsers. Some people use text-only browsers (like lynx). Some people have really slow modems. Some people turn off automatic image loading. At any rate, you have to make sure that clicking on pretty pictures isn't the only way to get around. A graphical toolbar with a text one right underneath it is a decent workaround to one of these issues.

The IMG tag is also one of the few that does not have a closing tag. Makes sense, because it's an item in itself, and not modifying the text it surrounds like a lot of other tags do. Like the A tag, it has attributes that are required inside the tag. A typical use is as follows:

<IMG src="images/flag.gif" alt="canadian flag">

Which loads:

[image: canadian flag]

The src attribute stands for 'source' - the file containing the image. The alt attribute stands for 'alternate' - what the browser displays if it is incapable of using the picture. Always remember the alt attribute!
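
To illustrate the earlier point about not relying on pictures alone, here is a sketch of an image used as a link with a plain text link underneath it. The image file 'images/home.gif' is hypothetical:

<P> <A href="index.html"><IMG src="images/home.gif" alt="tutorial index"></A><BR>
<A href="index.html">Tutorial index</A> </P>

A browser that can't show the picture still has the alt text and the text link to work with.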



8: Tables

HTML tables stretch and shrink to fit their contents. There is rarely any need to specify how big the table should be - especially since you don't know the screen size of the person viewing your page.

Tables are very easy to define. You simply put entries for the rows, and within each row you put entries for the data. The browser formats the table to fit what you put into it. Pretty simple, eh? Ok, example time:

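+----------------+-------------+-------------+
| row 1 col 1    | row 1 col 2 | row 1 col 3 |
+----------------+-------------+-------------+
| rows 2,3 col 1 | row 2 col 2 | row 2 col 3 |
|                +-------------+-------------+
|                | row 3 cols 2,3            |
+----------------+---------------------------+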

So how did I do that? Well, here's the code:

<TABLE border=3>
<TR><TD>row 1 col 1</TD><TD>row 1 col 2</TD><TD>row 1 col 3</TD></TR>
<TR><TD rowspan=2>rows 2,3 col 1</TD><TD>row 2 col 2</TD><TD>row 2 col 3</TD></TR>
<TR><TD colspan=2>row 3 cols 2,3</TD></TR>
</TABLE>

And now to dissect that code. First, the TABLE tags. You need to define where the table starts and ends. Pretty obvious so far. There's also the attribute 'border' in the TABLE tag - this lets you specify the number of pixels to use as a border around your table. I put 3 in there so you could see how the table was laid out. If you want the borders to be invisible, put 0. I'd recommend putting a non-zero value in at first, and when you have all the rows and columns set up the way you want them, then you can change the border width. Otherwise it's a pain to figure out where you went wrong, because you can't see...

Second, each row is enclosed by TR tags (TR = table row). This tells the browser when to start a new row.

Third, inside each row are multiple TD tags (TD = table data). These define each cell. Between the starting and ending TD tags for each cell, you put the data you want displayed in the cell.

So far so good? Play with that for a bit.
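
If the example above is too much to start with, a plainer table with no spanning cells is a reasonable first experiment. The cell contents here are placeholders:

<TABLE border=1>
<TR><TD>top left</TD><TD>top right</TD></TR>
<TR><TD>bottom left</TD><TD>bottom right</TD></TR>
</TABLE>

This should give two rows of two cells each.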

Inside a TD tag, you can put the rowspan and colspan attributes - one, both, or neither. They do exactly as they say - cause the cell in question to span multiple rows or columns. In the example above, because there is a cell which spans two rows starting in row 2, the first cell defined in row 3 isn't in column 1 but in column 2. The browser knows that row 3 column 1 is already taken, by the cell above. Same with colspan: the browser knows that column 3 is used by the cell starting in column 2 (row 3) so you don't have to specify anything for there, either.

But lynx apparently doesn't support rowspan and colspan, because the table wasn't rendered properly. That right there is a good argument against using tables for layout - when someone with lynx or a non-graphical browser (like that text-to-speech browser I keep talking about) goes to a page like that, the browser won't know what to make of it, and the result will probably be extra confusing for the reader.



9: Special Characters

There are lots of special characters that can be used in an HTML document that aren't available directly on the keyboard. Then there are some that are on the keyboard, but have special meaning in HTML (like < and >). To display these characters anyway, you have to type a code: < is &lt; and > is &gt;. Of course, to display that, I had to use the code for &, which is &amp;.

Anyhow, the special characters all start with an & and end with a ;. In between, there's a short mnemonic that describes the character: amp for the ampersand, lt for 'less than', gt for 'greater than', nbsp for 'non-breaking space'. The nbsp character is especially useful if you need a space but don't want it to ever make a line break there. For example, if you mention 'Mr. John Q. Public', you probably don't want a line break right after the 'Mr.' so you'd type the name 'Mr.&nbsp;John Q. Public'.

A list off the top of my head of special characters:

<    &lt;
>    &gt;
&    &amp;
ç    &ccedil;
é    &eacute;
è    &egrave;
"    &quot;
å    &aring;
ñ    &ntilde;
æ    &aelig;
à    &agrave;
á    &aacute;
ì    &igrave;
í    &iacute;

and so on. Some browsers are broken with respect to special characters, unfortunately. This is a problem with the browser, not the specification! Alternatively, there are numeric codes that correspond to these, but they're harder to remember. You can look those up as well, if you're so inclined.
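
For example, here are a few named codes next to the numeric codes that should produce the same characters (the number is the character's code in decimal):

&amp;     &#38;
&lt;      &#60;
&gt;      &#62;
&nbsp;    &#160;
&eacute;  &#233;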

By the way, don't worry about memorising these. If you need them, you can go look them up. That's what I do.




This page was created by janra on 18 February, 2000.