Danny Yee >> Web Design

Filesystems and URLs


Files and Directories

The underlying file-structure on the server may be largely invisible to the end-user, but it is still worth thinking carefully about.

First of all, it's not completely invisible to the user: URLs are visible. It is important to keep URLs short so people can easily cut and paste them into and out of mail. Where possible, file and directory names should be words: search engines look at these and people can remember them. Keep names in lowercase, avoid spaces, apostrophes, and other characters that can confuse humans or software, and don't mix dashes and underscores.

Encode as much information as possible into the file structure - give files and directories sensible and informative names. While it is also possible to put metadata into the file itself, it's not guaranteed that you (or others working on the site) will remember to do this, and such metadata can be hard to view. In contrast, a file has to have a name and be in some nested set of directories, and its location and name can be viewed using ordinary file-system tools.

Make use of the hierarchical file-system. Even if it appears silly to create a new directory just to put one "index.html" file into, this allows room for future expansion. It is also important to have files named "index.html" (or whatever your server defaults to when a directory is requested) so "URL chopping" will work.
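
"URL chopping" is what a visitor does when they delete the end of an address to move up a level. As a rough sketch, the function below lists the addresses such a visitor would try; each of them should return a sensible page, which is what the default index file provides. The address on example.org and the function name are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def chopped_urls(url):
    """Return the directory URLs reached by repeatedly chopping `url`."""
    parts = urlsplit(url)
    # Path segments, ignoring empty strings from leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    results = []
    # Drop the last segment each time, ending at the site root.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        results.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return results


# Hypothetical address, for illustration only.
for candidate in chopped_urls("http://example.org/reviews/history/rome.html"):
    print(candidate)
```

Requesting each printed address and checking for a successful response is one way to find directories that are missing an index file.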

Specifics:
  • Keep URLs under 80 characters, so they can be cut-and-pasted easily and without errors. (Not many people use curses-based Unix mailers these days, but a disproportionate number of those who do run web sites.)
  • If you need to give different people access to different sections of a site, it will be much easier if the site is sensibly split into directories. Also, access permissions and .htaccess files work on a per-directory basis in Apache.
  • Ars Technica has an excellent article on filesystem metadata.
  • Prefer dashes to underscores - Google doesn't parse underscores as token separators. [Update: this is probably not true any more.]
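
The naming rules in this section lend themselves to an automated check. The following is only a minimal sketch: the function name, the messages, and the example addresses are invented for illustration, and it checks just the rules stated above (length, lowercase, spaces and apostrophes, mixed separators).

```python
import re
from urllib.parse import urlsplit

MAX_URL_LENGTH = 80  # URLs should stay under this many characters


def lint_url(url):
    """Return a list of problems found in `url` (empty if none)."""
    problems = []
    if len(url) >= MAX_URL_LENGTH:
        problems.append("URL is 80 characters or longer")
    path = urlsplit(url).path
    if path != path.lower():
        problems.append("path contains uppercase letters")
    # Spaces and apostrophes may appear raw or percent-encoded.
    if re.search(r"[ ']|%20|%27", path):
        problems.append("path contains spaces or apostrophes")
    if "-" in path and "_" in path:
        problems.append("path mixes dashes and underscores")
    return problems


# Hypothetical addresses, for illustration only.
print(lint_url("http://example.org/book-reviews/rome.html"))
print(lint_url("http://example.org/My Files/it's_a-test.html"))
```

Run over a list of every address on a site, a check like this makes it easy to spot names that drift away from the conventions.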

Examples:

  • The file names for my book reviews use an unfortunate mix of capitals and lowercase, and some of the older ones are way too long. This is a legacy from pre-web delivery. I have since changed to shorter names, but have kept the mixed case for consistency.

Search Engines and Links

Links to your pages from other sites are valuable. Not only will they bring visitors, but they will contribute to your site's ranking in citation measures like Google PageRank. So you should not break links by changing file names or removing old files.

Every page should have a TITLE metadata field that is descriptive without any context. TITLEs are used in search engine result lists, in bookmarks, and by automated link-creation software. So people will often view page titles in isolation from everything else on the page or site.
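
Whether a title is descriptive out of context needs a human eye, but missing and overlong titles can be found mechanically. A minimal sketch using Python's standard HTML parser follows; the class and function names are illustrative, and the limit of 60 is the rough figure suggested in the specifics below, not a hard rule.

```python
from html.parser import HTMLParser

TITLE_LIMIT = 60  # approximate; search engines vary


class TitleExtractor(HTMLParser):
    """Collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_title(html):
    """Return (title, problems) for an HTML document given as a string."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, as a browser would when displaying it.
    title = " ".join(parser.title.split())
    problems = []
    if not title:
        problems.append("missing or empty TITLE")
    elif len(title) > TITLE_LIMIT:
        problems.append("TITLE longer than %d characters" % TITLE_LIMIT)
    return title, problems
```

For example, calling check_title() on the contents of each HTML file in a site gives a quick list of pages whose titles need attention.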

Every page must have its own URL. Implementations of frames where the location doesn't change as one changes pages are inherently broken, as they make it difficult or impossible to bookmark or link to specific pages.

Avoid having multiple addresses for the same page. With search engines doing more sophisticated ranking, you risk having your page appear twice in rankings (at positions 30 and 50, perhaps) instead of once in the top 10.
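
One way to spot duplicate addresses is to reduce each URL to a single canonical form and see which ones collide. The sketch below handles only a few common variants (host name case, an explicit index file, a trailing fragment). It assumes the server's default file is "index.html" and that addresses carry no user name or password; the function name and example address are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the server serves this file when a directory is requested.
DEFAULT_INDEX = "index.html"


def canonical_url(url):
    """Map some common variants of an address to a single form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Host names are case-insensitive (assumes no user:password part).
    host = parts.netloc.lower()
    path = parts.path or "/"
    # /dir/index.html and /dir/ normally name the same page.
    if path.endswith("/" + DEFAULT_INDEX):
        path = path[: -len(DEFAULT_INDEX)]
    # Fragments are never sent to the server, so drop them.
    return urlunsplit((scheme, host, path, parts.query, ""))


# Hypothetical address, for illustration only.
print(canonical_url("http://Example.org/reviews/index.html#top"))
```

Links within the site should then use the canonical form consistently, so that other sites copy one address and not several.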

Specifics:

  • If you must relocate directories or files, or remove files, or move your entire site, use server redirects to make sure requests are redirected to the most appropriate place. Do not use <META REFRESH> tags, since they will trap users (by stopping the back button from working) and search engines will ignore them. And don't use JavaScript redirects: not all browsers support JavaScript and not all users enable it.

  • TITLEs should be kept to around 60 characters. Many search engines truncate around there, and bookmark lists rarely fit more.

  • It is best not to use frames at all. Many major search engines do not index pages that use frames.
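
To make the redirect advice concrete, here is a minimal sketch of what a server-side redirect does. On a real site this is normally a line or two of server configuration (in Apache, for instance, the Redirect directive) rather than code; the Python below only illustrates the behaviour. The paths in the two tables are hypothetical.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical moves, for illustration: old path -> new path.
MOVED_FILES = {"/old-review.html": "/reviews/rome.html"}
MOVED_DIRS = {"/bookrev/": "/reviews/"}


def redirect_target(path):
    """Return the new location for a moved path, or None if it has not moved."""
    if path in MOVED_FILES:
        return MOVED_FILES[path]
    # A moved directory carries everything beneath it to the new place.
    for old_prefix, new_prefix in MOVED_DIRS.items():
        if path.startswith(old_prefix):
            return new_prefix + path[len(old_prefix):]
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer requests for moved paths with a permanent redirect."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 tells browsers and search engines the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

    do_HEAD = do_GET
```

To try it, pass RedirectHandler to http.server.HTTPServer and call serve_forever(). Because the redirect happens in the HTTP response itself, it works without JavaScript and leaves the back button working.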

Last modified: December 2000