How does the Internet work?

By Robert Niles
Published: January 8, 2013 at 10:51 AM (MST)
The following is an excerpt from my guidebook for journalists who want to start a publishing business: How to Make Money Publishing Community News Online [$6.99 from Amazon.com]

A website is simply a collection of documents and programs that reside on a computer somewhere. On a very simple website, the site's webpages are individual documents on that computer, written in what's called Hypertext Markup Language, or HTML. Each of those webpage documents is identified by a unique name, called a Uniform Resource Locator, or URL. That computer will use what is called the Hypertext Transfer Protocol, or HTTP, to deliver a copy of that document to your computer when you ask for it. That's what allows you to read that webpage.

Here's an example of a URL:

http://www.robertniles.com/stats/stdev.shtml

When you type that into a Web browser (such as Safari or Chrome), you are asking your computer to use hypertext transfer protocol (the http:// part) to make a request to the computer that hosts the www.robertniles.com website, to find the stats folder on that computer, and to deliver a copy of the stdev.shtml file inside the stats folder to your computer. The HTML code in that file contains the instructions to your browser that tell it what text and images to display for that webpage.
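If you'd like to see those pieces pulled apart, here is a minimal Python sketch that splits the example URL into its parts and then builds the kind of text request a browser might send. The function name build_get_request is made up for this illustration, and a real browser sends many more header lines than the two shown here.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Build a bare-bones HTTP/1.1 GET request for the given URL.

    Illustrative helper only: real browsers add many more headers.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"      # which document we want
        f"Host: {parts.hostname}\r\n"   # which website we are asking
        "Connection: close\r\n"
        "\r\n"                          # blank line ends the request
    )


url = "http://www.robertniles.com/stats/stdev.shtml"
parts = urlsplit(url)
print(parts.scheme)    # the protocol: http
print(parts.hostname)  # the computer to contact: www.robertniles.com
print(parts.path)      # the folder and file: /stats/stdev.shtml

request = build_get_request(url)
print(request)
```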

How does your HTTP request get to that computer? Your computer needs to be connected to the Internet, through some Internet Service Provider (ISP). Connections can be wireless, over a Wi-Fi network or a cellular data network, such as EDGE, 3G, 4G or LTE. Or they can be wired, such as a connection you might have at home or the office, perhaps using a cable, DSL or fiber optic network. Your request goes from your computer (or smartphone) to the ISP, which then routes it across what's called an Internet backbone connection to the ISP that hosts the computer you're trying to contact.

The computer you're trying to contact is running a piece of software called a Web server, which manages incoming connections and finds and returns those HTML files that other computers are requesting. The Web server knows where to send the page you requested because your computer (or smartphone), like every other device connected to the Internet, has what's called an Internet Protocol address, or IP address.
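To make the Web server's job a little more concrete, here is a toy Python sketch of its core decision: read the first line of a request, look for the matching document, and answer with either the page or an error. The DOCUMENTS dictionary and the handle_request function are invented for this example. A real Web server reads files from disk, talks over a network connection and handles far more cases.

```python
# Hypothetical stand-in for the files stored on the server's disk.
DOCUMENTS = {
    "/stats/stdev.shtml": "<html><body><h1>Standard deviation</h1></body></html>",
}


def handle_request(request_line: str, documents: dict[str, str]) -> str:
    """Return a simplified HTTP response for one request line."""
    try:
        method, path, _version = request_line.split()
    except ValueError:
        # The line did not have exactly three parts.
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    if method != "GET":
        return "HTTP/1.1 405 Method Not Allowed\r\n\r\n"
    body = documents.get(path)
    if body is None:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    return (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "\r\n"
        f"{body}"
    )


print(handle_request("GET /stats/stdev.shtml HTTP/1.1", DOCUMENTS))
print(handle_request("GET /missing.html HTTP/1.1", DOCUMENTS))
```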

An IP address looks like this, 10.11.14.10, and it's assigned by your ISP. Depending upon your connection, your device might get a different IP address each time you go online. The computer hosting the website you're trying to get has an IP address, too. In fact, the domain name of every website is simply an English-language alias for the domain's IP address. After all, it's much easier to remember www.violinist.com than it is to remember 67.199.102.3. (But if you really want to be a geek, you also can type a website's IP address into a browser, instead of its domain name. Go ahead and type http://67.199.102.3 into Safari or Internet Explorer and see what happens! Just be aware that this won't work for every site, since many websites now share a single IP address.)
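Under the dotted notation, an IP address of this kind is just one 32-bit number. Here is a small Python sketch, using the standard ipaddress module, that shows the number behind each of the two example addresses. The describe function is a name made up for this illustration. One detail worth knowing: addresses that begin with 10. are reserved for private networks, so an address like 10.11.14.10 is one you would typically see on a home or office network rather than on the public Internet.

```python
import ipaddress


def describe(address: str) -> dict:
    """Summarize an IP address: its version, integer form and privacy."""
    ip = ipaddress.ip_address(address)
    return {
        "version": ip.version,        # 4 for addresses like the ones here
        "as_integer": int(ip),        # the single number behind the dots
        "is_private": ip.is_private,  # True for reserved private ranges
    }


for example in ("10.11.14.10", "67.199.102.3"):
    print(example, describe(example))
```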

An ISP uses what's called the Domain Name System (or DNS) to look up what IP address corresponds to a specific domain name, whenever you make a request. ISPs have Domain Name Servers that keep a record of recently requested domain names and their IP addresses (and that ask other DNS servers about the ones they don't have on hand), so that they can find and make those connections as quickly as possible, usually within milliseconds.
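Here is a rough Python sketch of the "keep a record so the next lookup is fast" idea. The CachingResolver class is invented for this example, and the five-minute lifetime is an arbitrary assumption. By default it asks the operating system for the address through socket.gethostbyname, which needs a working Internet connection. Real DNS servers are considerably more involved.

```python
import socket
import time
from typing import Callable


class CachingResolver:
    """Remember recent domain-name lookups for a limited time."""

    def __init__(
        self,
        upstream: Callable[[str], str] = socket.gethostbyname,
        ttl_seconds: float = 300.0,  # assumed lifetime of a cached answer
    ) -> None:
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[str, float]] = {}

    def resolve(self, domain: str) -> str:
        now = time.monotonic()
        cached = self._cache.get(domain)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: answer from memory
        address = self._upstream(domain)  # otherwise ask the upstream
        self._cache[domain] = (address, now + self._ttl)
        return address


# Usage (requires a network connection):
#   resolver = CachingResolver()
#   print(resolver.resolve("www.violinist.com"))
```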

When you "buy" a domain name, you are actually just contracting with a domain name registrar to associate a particular domain name exclusively with a computer of your choosing (your website host) for a specific period of time, typically from one to 10 years, after which you can renew the registration.

In the Web's early days, whenever you wanted to make a change to your website, you would use File Transfer Protocol (FTP) to upload a new HTML document from your computer to the Web server hosting your website. You would use a piece of software called an FTP client to make that connection and transfer the files. Your website (ideally) would be protected by a username/password combination that would allow only you to make FTP connections to your Web server, even as it allowed everyone to use an HTTP connection to view the content of those files.

These days, websites typically are quite a bit more complex than a simple collection of HTML documents sorted into folders on a Web server. Most major websites use a Content Management System (or CMS) that can create virtual HTML documents on the fly. Modern websites aren't simply a collection of unchanging (or "flat") documents anymore — they're computer programs that generate highly customizable webpages and online presentations, under the direction of their publishers of course.

The CMS should have a user-friendly interface that allows a publisher with no programming experience to create a highly sophisticated-looking website, with powerful functionality. Few publishers use every feature that a CMS can deliver. Part of your job as a publisher will be to pick which functions and tools you think will best allow you to address your readers' real needs. Only rarely will you use FTP to update a website these days. Almost always, you'll use the interface of your CMS to add or edit content on your website. (That's what's happening when you use a service such as Blogger. The Web forms you fill out to post an entry on Blogger are simply the user interface to Blogger's CMS.)

On modern websites, a URL can do more than just request a specific document on the server — it can pass information to the CMS that allows it to create a customized page on the fly. That way, a website publisher doesn't have to create a new HTML file for every new post he or she writes. Just use the CMS interface to input your story, which will be stored in a database on the server. The website CMS then can take a single story-template file on the server and plug in the content from any of those stories you've posted to create what looks to the reader like a regular webpage. Consider a URL such as this:

http://www.robertniles.com/story.php?article=1003&ad=4

That URL asks the Web server that hosts www.robertniles.com to display the article with the identification number 1003, using the story.php template. And, oh yeah, go ahead and show the ad from advertiser number 4 when creating this webpage, too.
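Here is a simplified Python sketch of what a CMS might do with a URL like that one: read the article and ad numbers from the part after the question mark, fetch the matching content, and pour it into a single template. The STORIES and ADS dictionaries stand in for the CMS database, and their contents, like the template and the render_story function, are invented for this illustration. (The real site's template is written in PHP. Python is used here only to show the idea.)

```python
from html import escape
from string import Template
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-ins for the CMS database tables.
STORIES = {1003: {"headline": "Example headline", "body": "Example story text."}}
ADS = {4: "Ad from advertiser number 4"}

# One template serves every story on the site.
STORY_TEMPLATE = Template("<h1>$headline</h1>\n<p>$body</p>\n<aside>$ad</aside>")


def render_story(url: str) -> str:
    """Build a story page from the article and ad numbers in the URL."""
    query = parse_qs(urlsplit(url).query)
    article_id = int(query["article"][0])
    ad_id = int(query.get("ad", ["0"])[0])  # the ad number is optional here
    story = STORIES[article_id]
    return STORY_TEMPLATE.substitute(
        headline=escape(story["headline"]),
        body=escape(story["body"]),
        ad=escape(ADS.get(ad_id, "")),
    )


page = render_story("http://www.robertniles.com/story.php?article=1003&ad=4")
print(page)
```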

A CMS can create pages customized to an individual reader, too. Websites can leave a short line of text — called a "cookie" — in the memory of a browser when a user visits a page on the site. Cookies usually store an identification number, which the website can use to customize content on the site for that individual reader. Let's say a website records an identification number for each reader who visits the site. It does this by first asking the reader's Web browser if it is storing any cookies from that site. If the browser doesn't report a cookie, the website knows that the reader is a first-time visitor, and asks the browser to set a cookie from the site.

But if the browser already has a cookie from the site, it can report back to the website the identification number stored in the cookie. That way, the website knows that reader "100001" is back again and looking at the site.
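That back-and-forth can be sketched in a few lines of Python, using the standard http.cookies module. The identify_visitor function and the idea of numbering visitors from 100001 are assumptions made for this example. The function receives whatever cookie text the browser sent and returns the visitor's number, plus a cookie for the browser to store if the visitor is new.

```python
from http.cookies import SimpleCookie
from itertools import count
from typing import Iterator, Optional, Tuple


def identify_visitor(
    cookie_header: str, next_ids: Iterator[int]
) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, cookie_to_set).

    cookie_to_set is None when the browser already reported a cookie.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if "VISITORID" in cookies:
        # Returning reader: the browser reported its stored number.
        return cookies["VISITORID"].value, None
    # First-time reader: pick a new number and ask the browser to keep it.
    visitor_id = str(next(next_ids))
    reply = SimpleCookie()
    reply["VISITORID"] = visitor_id
    reply["VISITORID"]["path"] = "/"
    return visitor_id, reply["VISITORID"].OutputString()


ids = count(100001)  # hypothetical source of new visitor numbers
first = identify_visitor("", ids)
second = identify_visitor("VISITORID=100001", ids)
print(first)
print(second)
```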

A website can store in its own database a whole bunch of information about user number 100001, or any other visitor to the site. It could store the time that visitor last came to the site, allowing the website to show that visitor only new posts or comments since his or her last visit. If user number 100001 creates an account on the website, it could record if and when the user logs into the site, allowing the site to customize its display according to that user's preferences. The website also could track what user 100001 clicks on, and which pages he or she views, allowing the website to make a more educated guess about which types of ads the user might be most willing to click.
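As one example of that kind of record keeping, here is a minimal Python sketch that remembers when each visitor last came to the site and returns only the posts published since then. It uses an in-memory SQLite database. The table layout, the function names and the plain-number timestamps are all assumptions made for this illustration.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two small tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE visitors (visitor_id TEXT PRIMARY KEY, last_visit INTEGER)")
    db.execute("CREATE TABLE posts (title TEXT, posted_at INTEGER)")
    return db


def new_posts_for(db: sqlite3.Connection, visitor_id: str, now: int) -> list:
    """Return titles posted since the visitor's last visit, then record this visit."""
    row = db.execute(
        "SELECT last_visit FROM visitors WHERE visitor_id = ?", (visitor_id,)
    ).fetchone()
    last_visit = row[0] if row else 0  # first-time visitors see everything
    titles = [
        title
        for (title,) in db.execute(
            "SELECT title FROM posts WHERE posted_at > ? ORDER BY posted_at",
            (last_visit,),
        )
    ]
    db.execute(
        "INSERT OR REPLACE INTO visitors (visitor_id, last_visit) VALUES (?, ?)",
        (visitor_id, now),
    )
    return titles


db = open_db()
db.execute("INSERT INTO posts VALUES ('First post', 100)")
print(new_posts_for(db, "100001", now=150))  # first visit
db.execute("INSERT INTO posts VALUES ('Second post', 200)")
print(new_posts_for(db, "100001", now=250))  # return visit
```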

All major browsers allow users to control the setting of cookies, or to delete them altogether. But if a reader chooses not to allow cookies, he or she generally won't be able to stay logged into a website, and most sites will treat that user like a first-time visitor on every visit.

TECH STUFF

If you're curious, here's what one example cookie looks like:

.violinist.com TRUE /bakery FALSE 1999999999 VISITORID 100001

What does that mean? Here's the technical explanation: This cookie says that it has been set by "violinist.com", and only that Web domain can read and act upon the cookie. The "TRUE" means that the cookie also can be read by subdomains of that domain, such as www.violinist.com, and the "/bakery" means that the cookie can be read by any webpage in the "bakery" directory on Violinist.com. The "FALSE" means that the reader doesn't need to be using a secure (HTTPS) connection for the website to read and act upon the cookie. The "1999999999" tells the browser when to delete this cookie. (It's a date, written as the number of seconds since Jan. 1, 1970 at midnight GMT. This one falls on May 18, 2033.) The "VISITORID" is the name of the variable that this cookie is storing, and "100001" is the value of the "VISITORID" variable.
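If you want to decode a line like that yourself, here is a small Python sketch. It assumes the line follows the seven-field layout that browsers have traditionally used for cookie files (domain, subdomain flag, path, secure flag, expiration, name, value). The CookieRecord type and the parse_cookie_line function are names invented for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CookieRecord(NamedTuple):
    domain: str
    include_subdomains: bool  # second field, assuming the cookie-file layout
    path: str
    secure: bool              # True if a secure connection is required
    expires: datetime
    name: str
    value: str


def parse_cookie_line(line: str) -> CookieRecord:
    """Split one cookie-file line into its seven named fields."""
    domain, subdomains, path, secure, expires, name, value = line.split()
    return CookieRecord(
        domain=domain,
        include_subdomains=(subdomains == "TRUE"),
        path=path,
        secure=(secure == "TRUE"),
        # Seconds since Jan. 1, 1970 (GMT), converted to a calendar date.
        expires=datetime.fromtimestamp(int(expires), tz=timezone.utc),
        name=name,
        value=value,
    )


record = parse_cookie_line(".violinist.com TRUE /bakery FALSE 1999999999 VISITORID 100001")
print(record.name, "=", record.value)
print("expires:", record.expires.isoformat())
```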

A modern CMS will handle cookie management for you — it's not something that you'll need to set manually as a website publisher. But you should know what's happening "under the hood" of your CMS, to best be able to use its abilities to meet your community's needs as a publisher. As we move forward in this book — and as you move forward in your publishing business — I hope that you'll use this chapter to help you keep straight the alphabet soup of acronyms you'll find in online publishing: HTTP, URL, HTML, IP, ISP, DNS, FTP, CMS, etc. You'll see them again and again as you work in this field.

Robert Niles also can be found at http://www.themeparkinsider.com

© Robert Niles.