Lesson 1: Search Engines

Introduction

This is a series of lessons for photographers who are serious about publishing their photography on-line. The tools and techniques described will be relatively inexpensive, and free resources will be used whenever possible. That said, expect a minimum expense of $25-$35/month, with some larger up-front costs for software. Free services will get you started, but eventually you will need to pay for services from CDNs, hosting providers, domain name registrars, and potentially more.

Before we start, I should warn you: learning on-line publishing is hard, especially if you are not already familiar with how search engines work. The more you study, the better your results will be. Expect it to be time consuming. You will have to learn many new things that aren’t intuitive, and much of it is rather boring. If you don’t truly understand the basics you will get lost, so I promise to do my best to cover them well. If there is anything you aren’t sure about, please ask. I have tried to help others in the past with mixed results; after a few lessons they usually figure out that it is harder than they planned. This is not expert-level material. It is, in my opinion, the minimum amount of working knowledge you need to be an effective on-line publisher.

If you stay committed, the results will come. If you are starting to sense that this is really hard (and it is), and that it is more than you bargained for (which it may be), that is okay. There are simpler methods to share your photographs with others. This guide is intended for those who want to reach an intermediate level of knowledge about on-line publishing with photography. I’m going to assume you are a regular user of the Internet, but do not know much about how web pages or search engines function. If you already understand these topics, feel free to skim through the first few lessons.

On to lesson 1 of 99,000….

Anatomy Of The Web

Let’s start with a basic introduction to how search engines work. The best way to learn how they work is to learn how they obtain information from a web site, and to understand that we need to know what a web site is. The Internet and the web are not the same thing. The Internet is a global telecommunications network that has many services running on top of it, and “the web” is just one of those services (there are thousands of them). A web site is a collection of files that exist on a web server connected to the Internet. The pages themselves are just text files. When you navigate to a web page in your browser, your computer downloads a copy of one of these text files from the server into its memory.

What makes these files special is that they contain text markers, written in a language called HTML (HyperText Markup Language), that affect how a web page is displayed. The markers tell your browser how to draw (or render) a page. For example, suppose I want to make some text appear bold. If I simply write “this is bold”, the web browser doesn’t know it should be bold. I have to tell the browser to display the text as bold, and that is done by adding these little markers: <b>this is bold</b>. So “<b>” is an HTML marker that starts the bold text, and “</b>” is the marker that ends it.

There are many different markers that change how a browser displays a web page. Some make images appear; others embed a video, draw a table, or change colors. You do not need to know or learn much HTML. What you do need to know is that all the stuff that makes up the formatting of a web page is actually just text. Modern web sites have made it more complicated by adding very complex formatting rules, dynamic scripting languages, and more. Those things are all standard fare for the web site of today, but it is best to start with the basics. A web page is text (content) mixed with a whole bunch of other text that changes the formatting. This will become very important to understand in later lessons.
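To make the idea of "content mixed with formatting text" concrete, here is a minimal sketch in Python that splits a tiny page into its markers and its readable text. The page itself and the class name TextAndMarkers are made up for illustration; the only real tool used is Python's built-in html.parser module. Real browsers and search engines do far more than this.

```python
from html.parser import HTMLParser


class TextAndMarkers(HTMLParser):
    """Collects the markers (tags) and the readable text separately."""

    def __init__(self):
        super().__init__()
        self.markers = []  # names of the opening markers, in order
        self.text = []     # pieces of readable content, in order

    def handle_starttag(self, tag, attrs):
        self.markers.append(tag)

    def handle_data(self, data):
        # Skip pieces that are only whitespace.
        if data.strip():
            self.text.append(data.strip())


# A made-up page: content mixed with HTML markers.
page = "<html><body><h1>Sunset Photos</h1><p>This is <b>bold</b> text.</p></body></html>"

parser = TextAndMarkers()
parser.feed(page)
parser.close()

print(parser.markers)         # the formatting markers
print(" ".join(parser.text))  # the content a reader actually sees
```

Everything in the first list is formatting; everything in the second line is content. Both came out of the same plain text file.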

Creepy Crawlers

Search engines like Google have computer programs called spiders, crawlers, or robots that scan the entire web. These systems are literally downloading the entire web. Big search engines like Google might visit every web site in the world every few days, depending on a variety of factors. Some web sites might be downloaded every few minutes! Needless to say, these web crawlers are massive computer systems designed to keep track of the entire web. Google is one of several large search engines that do this. It is a marvel of modern life.
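The crawling idea can be sketched in a few lines of Python. This is a toy, not how Google or any real crawler is built: the "web" here is a made-up dictionary called FAKE_WEB with four invented pages, nothing is fetched over the network, and the names LinkFinder and crawl are hypothetical. It only shows the basic loop of downloading a page, finding its links, and following them.

```python
from collections import deque
from html.parser import HTMLParser

# A made-up, in-memory "web": address -> page text. No network is used.
FAKE_WEB = {
    "/home": '<a href="/gallery">Gallery</a> <a href="/about">About</a>',
    "/gallery": '<a href="/home">Home</a> <a href="/sunset">Sunset</a>',
    "/about": "<p>About the photographer</p>",
    "/sunset": "<p>Sunset over the bay</p>",
}


class LinkFinder(HTMLParser):
    """Collects the address inside every <a href="..."> marker."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Download the start page, then every page reachable through links."""
    seen = {start}
    queue = deque([start])
    downloaded = {}
    while queue:
        address = queue.popleft()
        page = FAKE_WEB.get(address)
        if page is None:
            continue  # a link to a page that does not exist
        downloaded[address] = page
        finder = LinkFinder()
        finder.feed(page)
        for link in finder.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return downloaded


pages = crawl("/home")
print(sorted(pages))
```

Starting from a single page, the crawler ends up with a copy of every page it could reach by following links.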

Using proprietary methods, these web crawler programs examine the text of a web page and figure out which text elements are related to things like images and videos, and which are related to formatting. The crawlers filter out the relevant text elements that contain actual content and store them in huge indexes, linked to an image or page. Each element of the index is ranked according to how relevant its text is to a certain set of search terms. This ensures that the most relevant data ends up at the top of the search results. Figuring out how to get your web pages ranked higher in search results is called SEO (search engine optimization). Because the crawlers visit your site quite frequently, they are sure to pick up your latest changes. Those changes will affect how your site is indexed and ranked. What I’ve described here is a vast over-simplification, but we are keeping it basic.
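Real search engines keep their indexing and ranking methods private, so the following is only a toy illustration of the idea, not anyone's actual algorithm. It assumes the markers have already been stripped out, builds an index from each word to the pages that contain it, and ranks pages by simply counting how often the search terms appear. The pages, addresses, and function names are all invented for this sketch.

```python
import re
from collections import Counter, defaultdict

# Made-up pages with the formatting markers already removed.
PAGES = {
    "/sunset": "Sunset over the bay. A golden sunset photograph.",
    "/forest": "Morning fog in a pine forest photograph.",
    "/about": "About the photographer.",
}


def words(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Map each word to the pages containing it, with a count per page."""
    index = defaultdict(Counter)
    for address, text in pages.items():
        for word in words(text):
            index[word][address] += 1
    return index


def search(index, query):
    """Rank pages by how many times the search terms appear in them."""
    scores = Counter()
    for word in words(query):
        for address, count in index.get(word, {}).items():
            scores[address] += count
    return [address for address, _ in scores.most_common()]


index = build_index(PAGES)
print(search(index, "sunset photograph"))
```

The page that mentions the search terms most often comes first. Notice that "/about" does not show up at all, because this toy treats "photographer" and "photograph" as different words.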

Lots of people have tried to figure out how big search engines like Google and Bing “read” text on a web page and rank results. There are many people who want their pages ranked higher than everyone else’s, and they are willing to pay big. By figuring out how web crawlers work, a programmer can cheat the system by sending crawlers misleading text. Many organizations use dirty tricks to achieve really high rankings. I have found, however, that it is far more effective to not get tricky at all. Search engines like Google reward good behavior. Good behavior means feeding the indexers exactly what they need and not trying to “trick” the crawlers into giving you a higher rank. Software engineers at companies like Google spend a great deal of time fighting back against those who are trying to trick the ranking system. If they think you are up to no good, they can wipe you out of the search results entirely!

Summary

What you need to know:

  • Web pages are just text.
  • Web crawlers scan the entire web downloading all that text.
  • The crawler figures out which parts of the web page text are important, and stores them in a huge index.
  • The search engine ranks the entries in the index against a certain set of search terms.
  • The goal of SEO is to get to the top of the ranking without resorting to dirty tricks.

As a photographer, here is what you need to be thinking about:

  • A photographer needs a way to connect text elements to photographs so that search engine crawlers can tell how they are related. A web crawler might download your photograph, but it doesn’t understand what is in it. The text does that.
  • It is important that text elements make sense to the crawler by containing search terms that search engines will care about. This helps ensure your data is ranked appropriately.
  • There are no tricks or quick fixes. Most search engines, most notably Google, reward good behavior. You will have to do the hard work by ensuring every element on your web site has the right text associated with it.
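As a small preview of what "text associated with a photograph" can look like to a crawler, here is a hedged sketch. It assumes the text is attached using the standard HTML alt attribute on an image marker, which is only one of several possible ways to do it and not necessarily the approach later lessons will take. The file names and the class name ImageText are invented.

```python
from html.parser import HTMLParser


class ImageText(HTMLParser):
    """Pairs each image file name with the text attached to it, if any."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (file name, description) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            self.images.append((attributes.get("src"), attributes.get("alt", "")))


# Two made-up images: one described with text, one not.
page = (
    '<img src="sunset.jpg" alt="Golden sunset over the bay">'
    '<img src="IMG_0042.jpg">'
)

finder = ImageText()
finder.feed(page)
finder.close()

for filename, description in finder.images:
    print(filename, "->", description or "(no text for a crawler to read)")
```

The first photograph comes with words a crawler can index. The second is just a file name, which tells the crawler almost nothing about what is in the picture.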

That is it for this round. Next we will talk about how to apply this understanding of search engines in a photographic workflow by associating text with an image.