How Search Engines Crawl Your Images
In previous issues of the Joomla! Community Magazine, we explained that when search engine spiders crawl your site:
- their requests are first subject to any rules in your .htaccess file, which the server applies,
- they then look for a file called robots.txt for more rules, and finally
- they apply these rules as they crawl your site.
Some search engines are very sophisticated in the way they classify images. Depending on the particular engine, the indexing considerations might include image dimensions, file size, color, geotagging information, Optical Character Recognition (OCR) of any text in the image, and automatic detection of whether the image is a portrait (face recognition). Even after all this, your images may not be indexed yet, because the engine also attempts to find duplicates of your images, even when they have been cropped or changed.
In an earlier article, "Htaccess and Robots.txt file", we discussed why it is important to remove the line "Disallow: /images/" from Joomla!'s robots.txt file. This line, as we wrote before, forbids search engines to crawl (cache) your images folder and to show its contents on a Search Engine Results Page (SERP).
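As a rough way to confirm the effect of that line, the sketch below uses Python's standard urllib.robotparser module to ask whether a crawler may fetch an image path. The two robots.txt bodies and the image path are illustrative assumptions, not the exact contents of Joomla!'s file, and real search engines may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Hypothetical robots.txt that still blocks the images folder.
BLOCKING = """User-agent: *
Disallow: /administrator/
Disallow: /images/
"""

# The same hypothetical file after removing the "Disallow: /images/" line.
OPEN = """User-agent: *
Disallow: /administrator/
"""

IMAGE_PATH = "/images/trees/italian-cypress.jpg"  # illustrative path

if __name__ == "__main__":
    print("blocked file allows image:", can_crawl(BLOCKING, IMAGE_PATH))
    print("edited file allows image:", can_crawl(OPEN, IMAGE_PATH))
```

Running the same check against your own robots.txt text is a quick sanity test after you edit the file.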
How Your Images Are Downloaded
As a visitor navigates around your Joomla! site, your server will send him or her the files that are required in order to correctly render your page. If you can reduce the number of requests for these files sent to your website's server, you achieve two benefits: one, your site uses less bandwidth to transfer files to the visitor's browser, saving you hosting costs; and two, your visitor has a faster experience of your site.
There are many ways to use a browser cache to speed up your site, and no doubt our readers will share some of theirs in their comments below. But let's consider one example. What happens if we modify Joomla!'s default .htaccess file (htaccess.txt) to include the following code, and a browser makes a second page request of our server?
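# Let browsers reuse cached files for 2,592,000 seconds (30 days); needs mod_headers
Header set Cache-Control "max-age=2592000"
# Remove the ETag header from responses
Header unset ETag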
If no modifications are made to Joomla!'s default .htaccess file:
- The visitor's browser requests a page
- The server finds all the corresponding files
- The server reads them and responds
- The visitor's browser downloads the files and displays the page
- This process repeats when the visitor's browser requests another page
Browser caching by Last-Modified Date or entity tags ("ETags")
If the .htaccess file is modified to send ETags or the last-modified date:
- The visitor's browser requests a page with a date or ETag to check against
- The server finds all the corresponding files, checking also for the date or an ETag
- The server responds that there are no modified files (HTTP status 304 Not Modified).
- The browser doesn't download the files. Instead, it loads them from the local cache and displays the page
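The validation steps above can be sketched as a small decision function. This is a simplified illustration of how a server might choose between 200 and 304, not how Apache implements it; the function name, the sample ETag value, and the sample date are all hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def response_status(request_headers: dict, etag: str, last_modified: datetime) -> int:
    """Return 304 if the browser's cached copy is still valid, otherwise 200."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # An ETag sent by the browser takes precedence over the date check.
        return 304 if client_etag == etag else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # Unchanged since the date the browser remembered: nothing to resend.
        return 304 if last_modified <= parsedate_to_datetime(since) else 200
    # First visit: no validator was sent, so the full file is returned.
    return 200

# Hypothetical file metadata on the server.
ETAG = '"abc123"'
LAST_MODIFIED = datetime(2012, 1, 2, 10, 0, 0, tzinfo=timezone.utc)
LAST_MODIFIED_HEADER = format_datetime(LAST_MODIFIED, usegmt=True)

if __name__ == "__main__":
    print(response_status({}, ETAG, LAST_MODIFIED))
    print(response_status({"If-None-Match": ETAG}, ETAG, LAST_MODIFIED))
    print(response_status({"If-Modified-Since": LAST_MODIFIED_HEADER}, ETAG, LAST_MODIFIED))
```

Note that the browser still makes a request in this scheme; only the file transfer is saved.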
Browser caching using max-age
If we use the code supplied above, which sets a maximum age of 2,592,000 seconds (30 days, i.e. roughly one month) for cached files:
- The visitor's browser requests a page and checks in the local cache for the files.
- If the files are present and not expired, the browser will load them without a request to the server.
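To make the arithmetic concrete, here is a minimal sketch of the freshness test a browser cache applies with max-age. The function name and the timestamps are assumptions for illustration; real browsers also account for other headers.

```python
MAX_AGE_SECONDS = 2592000  # the value used in the Cache-Control header above

def is_fresh(stored_at: float, now: float, max_age: int = MAX_AGE_SECONDS) -> bool:
    """A cached file is fresh while its age is below max-age (all values in seconds)."""
    return (now - stored_at) < max_age

SECONDS_PER_DAY = 24 * 60 * 60
MAX_AGE_DAYS = MAX_AGE_SECONDS / SECONDS_PER_DAY  # 2,592,000 / 86,400 = 30 days

if __name__ == "__main__":
    print("max-age in days:", MAX_AGE_DAYS)
    # One day after caching: served from the local cache without a request.
    print(is_fresh(stored_at=0, now=1 * SECONDS_PER_DAY))
    # Thirty-one days after caching: expired, so the browser asks the server again.
    print(is_fresh(stored_at=0, now=31 * SECONDS_PER_DAY))
```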
In this example, we have disabled ETags (entity tags) because we want to stop the browser from validating files. Another method is to enable ETags and add a statement to check whether the Apache web server module mod_expires is loaded. You can download a great master .htaccess file by Nicholas Dionysopoulos.
The Image Files
Now that we're clear on how we'll serve the image files, let's talk about the images themselves. While there are many possible attributes which you can add to the HTML IMG tag, two are required: SRC and ALT. SRC is the path to the image file, which the browser needs in order to display it. ALT is crucially important to SEO, as it specifies alternative text to display for the image. It is also important in making your pages accessible to people with visual impairments who may be using screen reading software to visit your site. ALT text will be shown to people who have disabled images in their browser, too.
The ALT Attribute: Describing Your Image
Since a search engine robot doesn't have human eyeballs, providing a short description also helps it understand and verify your image content. It is a way to help search engines decide which are the most relevant images to a user's search query.
What should go into your ALT text? A concise description that closely matches the name of your image. Three to seven words are optimal. If you need a few more words for the sake of accuracy that's OK, but many more are not. Resist the temptation to stuff your ALT text with keywords. It's also helpful if the ALT text relates closely to the text surrounding the image and to the article title.
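If you want to check a page against this guideline, a small script can list each image and judge its ALT text. The sketch below uses Python's standard html.parser module; the class name, the verdict labels, and treating the three-to-seven-word range as hard limits are our own simplifications, so read a "long" verdict as a prompt to review rather than an error.

```python
from html.parser import HTMLParser

class ImageAltAudit(HTMLParser):
    """Collect (src, verdict) pairs for every IMG tag in a page."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = (attributes.get("alt") or "").strip()
        words = len(alt.split())
        if words == 0:
            verdict = "missing"
        elif words < 3:
            verdict = "short"
        elif words > 7:
            verdict = "long"
        else:
            verdict = "ok"
        self.findings.append((attributes.get("src"), verdict))

def audit(html: str) -> list:
    """Return the audit findings for a string of HTML."""
    parser = ImageAltAudit()
    parser.feed(html)
    parser.close()
    return parser.findings

if __name__ == "__main__":
    sample = (
        '<IMG SRC="trees/european/italian-cypress.jpg" '
        'ALT="Italian Cypress trees on a Tuscan hillside">'
        '<img src="banner.jpg">'
    )
    for src, verdict in audit(sample):
        print(src, verdict)
```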
Joomla! Image SEO Checklist:
- alt attribute
- folder name and file name (use hyphens for name of the images)
- text surrounding an image
- position on the page
- unique images
- geotagging with EXIF or IPTC (latitude and longitude)
- rich snippets
- text in image (OCR)
- JPEGs preferred
- anchor text matches image alt text
- backlink your photos from sharing sites (Flickr, Picasa, Facebook)
The SRC Attribute: Naming Your Image
Name your image descriptively. Arranging your images logically in descriptive folders and giving each file a name which matches its subject can also help a search engine determine the relevancy of your image. It's better to use hyphens than underscores to separate multiple words in your file name. For instance, in an article about trees, the tag for your illustrating image might read:
<IMG SRC="trees/european/italian-cypress.jpg" ALT="Italian Cypress trees on a Tuscan hillside">
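If you rename many images at once, the hyphen convention is easy to automate. The helper below is a minimal sketch; its name and the default "jpg" extension are assumptions for illustration. It lowercases the title, strips accents, and joins the words with hyphens.

```python
import re
import unicodedata

def hyphenated_name(title: str, extension: str = "jpg") -> str:
    """Turn a descriptive title into a lowercase, hyphen-separated file name."""
    # Strip accents so that only plain ASCII letters and digits remain.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of other characters (spaces, underscores, punctuation)
    # with a single hyphen, then trim hyphens from both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return f"{slug}.{extension}"

if __name__ == "__main__":
    print(hyphenated_name("Italian Cypress"))
    print(hyphenated_name("italian_cypress tree"))
```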
Page Position
The text surrounding an image should describe it accurately in order to improve your image SEO. This includes not just captions but also where in your article you place your image, since your image should match your content. Place your important inline images close to the top of the article for best SEO results. If your image is placed too low on the page, it will have a smaller impact, much as footer links might.
Understand that your images are part of the content that makes your site unique. Just as your text is more powerful when it can only be found on your site, whenever possible your images should be unique. If you reuse an image from elsewhere, it's possible that a search engine has indexed it already and ranked it higher.
Geotagging with EXIF or IPTC
Geotagging means adding metadata to the image to specify where the image subject is located, such as latitude and longitude or an address. The two most common standards for embedding geotag metadata in your image are EXIF and IPTC.
EXIF (Exchangeable image file format) is the method that we recommend because it is a standard with which we can embed geographical coordinates (latitude and longitude) for search engines. IPTC (International Press Telecommunications Council Information Interchange Model) is a method with which we embed words such as city name, state and country. To geotag an image, you can use Google's free software Picasa.
One free Joomla! Extension that deserves special mention is developer Jan Pavelka's wonderful Phoca Gallery. It offers a wealth of possibilities, such as the opportunity to geotag images and Categories too, and includes options for setting metatags and descriptions of images and Categories.
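EXIF stores each coordinate as degrees, minutes and seconds plus a hemisphere reference (N/S for latitude, E/W for longitude), whereas maps usually give you decimal degrees. The conversion can be sketched as follows; the function name is hypothetical, the sample coordinates are only a rough illustrative point in Tuscany, and writing the values into an image file still requires a geotagging tool.

```python
def to_exif_gps(decimal_degrees: float, is_latitude: bool) -> tuple:
    """Convert decimal degrees to (reference, degrees, minutes, seconds)."""
    if is_latitude:
        reference = "N" if decimal_degrees >= 0 else "S"
    else:
        reference = "E" if decimal_degrees >= 0 else "W"
    # Work in whole hundredths of a second to avoid results like 59.999 or 60.0.
    hundredths = round(abs(decimal_degrees) * 3600 * 100)
    degrees, remainder = divmod(hundredths, 3600 * 100)
    minutes, remainder = divmod(remainder, 60 * 100)
    seconds = remainder / 100
    return reference, degrees, minutes, seconds

if __name__ == "__main__":
    # Illustrative coordinates only.
    print(to_exif_gps(43.5, is_latitude=True))
    print(to_exif_gps(11.25, is_latitude=False))
```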
For more information about geotagging, we recommend the article "Ranking for Keyword + Cityname in Multiple Geographies."
Google's Rich Snippets present a great opportunity to make your search result stand out on a SERP and to take up a few extra lines in the results. This is a topic larger than the space we have available to discuss it here, but we would recommend the article "Introducing Rich Snippets".
JPEGs Are Preferred
Search engines not only love JPEGs, they love to find big, detail-rich images, too. But as we know, very large files can really degrade the speed of a website. The way to solve this problem is to use thumbnails and link to the full-sized image. There are many Extensions in the Joomla! Extensions Directory that help to create and display image thumbnails. This also gives you the opportunity to add a description of your image again, with anchor text.
Match Your Alt Attribute With Your Anchor Text
When your image is a link to another resource, such as a new page or another image, it’s wise to make sure that the IMG tag attributes relate to the name of the link. Let’s look at our cypress tree code example again:
<A HREF="order-italian-cypress-trees.html"><IMG SRC="trees/european/italian-cypress.jpg" ALT="Italian Cypress trees on a Tuscan hillside">Italian Cypress trees on a Tuscan hillside</A>
Notice that the requested page in the link, the image source, and the alternative image text all repeat the words Italian, cypress, and trees. This reinforces to a search engine that the image is very likely to be a photo of Italian Cypress trees.
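The overlap described here can be checked mechanically. The sketch below, with a hypothetical helper name, splits the link target, the image path and the ALT text into lowercase words and returns the words that appear in all three. It only illustrates the idea of reinforcement and makes no claim about how a search engine weighs these signals.

```python
import re

def shared_words(href: str, src: str, alt: str) -> set:
    """Return the words that the link target, image path and ALT text have in common."""
    def words(text: str) -> set:
        # Split on anything that is not a letter or digit (hyphens, slashes, dots, spaces).
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    return words(href) & words(src) & words(alt)

if __name__ == "__main__":
    common = shared_words(
        "order-italian-cypress-trees.html",
        "trees/european/italian-cypress.jpg",
        "Italian Cypress trees on a Tuscan hillside",
    )
    print(sorted(common))
```

An empty result for one of your own linked images is a hint that the file name, link and ALT text are describing different things.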
Text In Your Image (OCR)
You're probably aware that changing default fonts for security CAPTCHA images makes it harder for spammers' scripts to read those letters. Some search engines employ similar, even better technology for Optical Character Recognition to read text in an image. We are not aware of any research on how much impact this has on the SEO of your website.
You can boost the SEO of your image by posting your images on photo sharing sites such as Flickr, Picasa or Facebook.
If Content Is King, Pay Your Homage
Your article text is not the only door from a search engine that a potential visitor can use to enter your site. High quality images are valuable content that can lead visitors to your site in many ways. Choosing your images wisely and coding for them with care helps people find your website and makes their experience of it more pleasant. By improving your website with good code and fast performance, you can open new ways for visitors to find you.