The Paradox

The World Wide Web

The World Wide Web is a great way to find information, misinformation, and entertainment, presented to you by your web browser (mostly either Netscape or IE) and fetched from some computer that could be anywhere in the world. But how does your browser know where to go? It's actually not much more complicated than looking up someone's phone number in the telephone directory.

What you see...

When you go to a web page, you generally see something formatted, probably colourful, most likely with pictures and sometimes with fancy fonts, animations, and all that. But what the computer sees is quite different. Behind all those fancy fonts is plain old unformatted text, with a bunch of garbage inserted here and there. (You can take a look at what the computer sees by going up to the menu: View -> Page Source in Netscape and something similar in IE.) The browser reads the inserted 'tags' and interprets them as instructions on how to display the page - what font to use, where to put paragraph breaks, and so on.

The stuff mixed in with the text of the page is called HTML, or HyperText Markup Language. What this means is that the tags 'mark' certain areas of text and turn them into 'hypertext'. Now the page is more than a book that you read on your computer screen: if you see something that interests you and it's a 'hyperlink', you can select it and go read more about that subject immediately, instead of just adding it to your list of things to try and find at the library the next time you visit.
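To make the idea of tags a little more concrete, here is a minimal sketch in Python of a program reading a page the way a browser does: it separates the plain text from the tags and picks out the hyperlinks. The sample page and the class name PageReader are made up for illustration; a real browser does far more than this.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects the readable text and the hyperlink targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # the plain text between the tags
        self.links = []       # the href value of every <a> tag

    def handle_starttag(self, tag, attrs):
        # An <a href="..."> tag is what marks a piece of text as a hyperlink.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


# A made-up page fragment, shown the way the computer sees it.
SAMPLE_PAGE = '<p>Read about <a href="/howitworks/theweb.html">the web</a> and <b>more</b>.</p>'

reader = PageReader()
reader.feed(SAMPLE_PAGE)
reader.close()

plain_text = "".join(reader.text_parts)
print(plain_text)    # the text with the tags stripped out
print(reader.links)  # where the hyperlinks point
```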

Originally, hypertext was used on university networks for scientific papers and the like. Then students at those universities used it to put up personal pages, and eventually it was picked up by companies, who treat it like an animated brochure. Along the way, it picked up a lot of extra features: fonts, pictures, tables, and the like. It can be used to make very beautiful or very horrible pages.

How you get it...

When you select a hyperlink in your web browser, you tell your browser to go fetch that file and format it for you. Now, in order for your browser to fetch the file, it needs to know three things: where to go, what to ask for, and how to ask. To display the file, it needs to know how to read HTML, but I'm not going to talk about that here. Maybe later.

Where to go fetch it from

The link you click on is mostly taken up by the server name and path of the page you want. The server name is the name of the computer the page is on, and the path specifies where on that computer the page is stored. As an example, let's use the address of this page. Look up in the address bar of your browser; you should see something that looks like this: http://www.paradox.null/howitworks/theweb.html

The first four letters tell your browser what language to speak when it asks for the page; I'll talk about that shortly. The next part, 'www.paradox.null', is the name of the computer. First, your browser has to find that computer. But computers actually find each other using a series of four numbers, so your browser has to translate the name into those numbers. To do this it contacts a name server, a computer that contains lists of domain names and the numeric addresses they point to. Your browser asks the name server, 'hey, who is this www.paradox.null guy, and where do I find it?'

But one of the beauties of the internet is that no one name server has to know every single address. Most likely, the first name server your browser asks will reply, 'I don't know, but that other name server over there knows about .null.' .null, in this case, is the most generic, or 'top', level of the domain name. So your browser will go over to this other name server, and ask it the same question: 'hey, who is this www.paradox.null guy, and where do I find it?' That name server also replies 'I don't know, but that other name server over there knows about .paradox.null.' Now we've gotten into the second level, but we're still not there yet. So your browser heads over to yet another name server, and asks it the same question. This time, it replies 'yeah, I know that one; it's at 142.179.14.97' and your browser happily heads off to contact that computer - rather like finally finding the phone number. Once you dial the phone, the system takes care of making sure you're connected to the right phone, and it's the same with the internet: the system makes sure your computer connects to the right computer, as long as it has the right numeric address.
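The chain of 'I don't know, but ask over there' replies can be sketched as a toy simulation in Python. Nothing here touches the network: the three name servers, their labels, and the resolve function are all hypothetical stand-ins, and only the final address comes from the example above.

```python
# Hypothetical name servers. Each one knows either a final address
# or which other server to refer the question to.
NAME_SERVERS = {
    "first-server": {"referrals": {"null": "null-server"}, "addresses": {}},
    "null-server": {"referrals": {"paradox.null": "paradox-server"}, "addresses": {}},
    "paradox-server": {
        "referrals": {},
        "addresses": {"www.paradox.null": "142.179.14.97"},
    },
}


def resolve(name, start="first-server", max_hops=10):
    """Follow referrals from server to server until one knows the address."""
    server = start
    asked = [server]
    for _ in range(max_hops):
        zone = NAME_SERVERS[server]
        if name in zone["addresses"]:
            # 'Yeah, I know that one.'
            return zone["addresses"][name], asked
        # 'I don't know, but that other name server over there knows.'
        next_server = None
        for suffix, target in zone["referrals"].items():
            if name == suffix or name.endswith("." + suffix):
                next_server = target
                break
        if next_server is None:
            raise LookupError(f"nobody knows about {name}")
        server = next_server
        asked.append(server)
    raise LookupError(f"gave up on {name} after {max_hops} referrals")


address, asked = resolve("www.paradox.null")
print(address)  # the numeric address the last server handed back
print(asked)    # every server that was asked along the way
```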

Of course, it doesn't actually talk to the name server like that; it has to ask a certain way, and get responses in a certain way, otherwise there is confusion.
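For the curious, here is roughly what 'asking a certain way' looks like. This is a sketch of building a name server question in the standard DNS wire format (a 12-byte header, then the name split into length-prefixed pieces, then the record type and class). The function name and the query ID 0x1234 are arbitrary choices for illustration, and the sketch only builds the bytes; it does not send them anywhere.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build the bytes of a DNS question asking for a host's numeric address."""
    # Header: ID, flags (0x0100 = 'please look this up for me'),
    # then four counts: 1 question, 0 answers, 0 authority, 0 additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # The name is sent one piece at a time, each prefixed by its length,
    # and finished off with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    )
    qname += b"\x00"

    # Type 1 = an address record, class 1 = the internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


packet = build_dns_query("www.paradox.null")
print(len(packet))
print(packet)
```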

Once your browser knows what computer to ask, it has to know what to ask for. This is the part after the computer's name: /howitworks/theweb.html. That means that it's in the directory 'howitworks', and the file is called 'theweb.html'.
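Splitting the address into those pieces can be sketched with Python's standard library. This only illustrates the pieces described above; it is not meant to show how any particular browser does it.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.paradox.null/howitworks/theweb.html")

print(parts.scheme)    # how to ask
print(parts.hostname)  # where to go
print(parts.path)      # what to ask for

# The path itself splits into a directory and a file name.
directory, _, filename = parts.path.rpartition("/")
print(directory)
print(filename)
```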

How to ask for it

But your browser can't just ask any old way. Computers aren't smart; they respond to questions in a form they're used to and get confused by anything else. The 'language' your browser uses to ask for pages is 'http', or HyperText Transfer Protocol. The language is actually amazingly simple: your browser connects to the server, then says 'GET path'. So for this page, your browser could have said 'GET /howitworks/theweb.html'.
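Here is a small sketch of that conversation, run entirely on your own machine so it needs no internet connection. It starts a throwaway server on 127.0.0.1, connects to it, and says 'GET path' by hand. The handler class, the fetch function, and the reply text are all invented for this example. It speaks the older HTTP/1.0 so that the server hangs up once it has answered, and it also sends a 'Host' line naming the server, which modern servers expect.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """A throwaway server that just reports which path it was asked for."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path):
    """Connect, say 'GET path', and read until the server hangs up."""
    with socket.create_connection((host, port), timeout=5) as conn:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = fetch("127.0.0.1", server.server_port, "/howitworks/theweb.html")
server.shutdown()
server.server_close()

print(reply.decode("ascii"))
```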

But your browser actually sends a little bit more information - where you came from, called the 'Referer' (apparently when writing up the standards, proofreading is optional) and the type of browser you're using, called the 'User-Agent'. So often what it sends looks something like:


GET /howitworks/theweb.html HTTP/1.1
Referer: http://www.paradox.null/howitworks/
User-Agent: Mozilla/3.01Gold (Macintosh; I; 68K)

(That User-Agent string is what my computer sends; they're very easy to spoof. I don't actually use an old Mac and an old Netscape, I use junkbuster.)
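Putting a request like that together is just a matter of joining lines of text. The sketch below assumes a hypothetical helper called build_request. It also adds a 'Host' line, which HTTP/1.1 requires even though the listing above leaves it out. Since the headers are just text the browser chooses to send, it also shows why they are so easy to spoof.

```python
def build_request(path, host, headers=None):
    """Assemble the text of a GET request, one 'Name: value' line per header."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with carriage return + line feed,
    # and an empty line marks the end of the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "/howitworks/theweb.html",
    "www.paradox.null",
    {
        "Referer": "http://www.paradox.null/howitworks/",
        # Any string at all can go here; the server just has to take our word for it.
        "User-Agent": "Mozilla/3.01Gold (Macintosh; I; 68K)",
    },
)
print(request.decode("ascii"))
```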

When the server gets that, it goes looking for the file your browser asked for, then returns it in one big stream of letters, starting with a short status report, the date, and information on the file that follows:


HTTP/1.1 200 OK
Date: Mon, 22 Jan 2001 07:14:23 GMT
Server: Apache/1.3.12 (Unix) mod_perl/1.24
Last-Modified: Sat, 20 Jan 2001 01:39:27 GMT
ETag: "170921-338-3a68ec4f"
Accept-Ranges: bytes
Content-Length: 824
Connection: close
Content-Type: text/html

The headers are followed by an empty line and then the page itself. The server transmits everything in a format a human can read, instead of some coded gibberish that would take less space but make it harder for programmers to figure out what's wrong. At first glance it may look like gibberish anyhow, but each line has the name of a piece of information (such as Date:), then the value (such as 'Mon, 22 Jan 2001 07:14:23 GMT'), so it's actually pretty easy to read. Take the date, for example: many computers store dates as the number of seconds since a certain date (midnight, Jan 1, 1970), but that's a whole lot harder for a human to read than the expanded form the server actually sends. Email works the same way.
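Because every line is just a name, a colon, and a value, picking the reply apart takes only a few lines of Python. The sketch below uses a shortened copy of the reply shown above; the parse_head function is a made-up helper, not part of any library. It also converts the readable date back into the seconds-since-1970 form, to show how much less friendly that number is.

```python
from email.utils import parsedate_to_datetime

# A shortened copy of the server's reply from the example above.
RESPONSE_HEAD = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 22 Jan 2001 07:14:23 GMT\r\n"
    "Content-Length: 824\r\n"
    "Content-Type: text/html\r\n"
)


def parse_head(head):
    """Split a reply into its status code, status text, and named values."""
    status_line, *header_lines = head.strip().split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are not case sensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


code, reason, headers = parse_head(RESPONSE_HEAD)
print(code, reason)
print(headers["content-type"])

# The same moment as a count of seconds since midnight, Jan 1, 1970.
seconds = int(parsedate_to_datetime(headers["date"]).timestamp())
print(seconds)
```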

This essay is copyrighted, and may be reproduced under the following Creative Commons license:
[Attribution required][No derivative works]

This page was last modified Wednesday March 08, 2006