available to the search engines is worth the potential loss of traffic to the rest of
• Since non-HTML documents will often be downloaded onto searchers’ hard
drives, it’s possible that your content could be used in ways you don’t condone.
If you’re concerned about this, don’t put these documents on your site. At the very
least, be sure that every document is clearly marked with authorship information,
a copyright notice, and your web address.
• Non-HTML documents may contain confidential information hidden in the
metadata that you don’t wish to make public, including things like tracked
changes, comments, and speaker notes. It’s always a good idea from a security
standpoint to review metadata for your documents before posting them in public
view. Workshare’s free software, Trace, available at
, can help you weed out potential problems.
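If you’d like a rough idea of what a metadata review looks for, here is a minimal sketch in Python. It is not how Trace works; it is only an illustration, and it assumes your files are in the newer Office formats (.docx or .pptx), which are really ZIP packages of XML parts. The function name `hidden_content_report` and the simple text checks it uses are our own inventions for this example, and the checks are heuristics that will miss things a dedicated tool would catch.

```python
import zipfile
import xml.etree.ElementTree as ET

# Parts of a .docx/.pptx package that commonly carry hidden content.
# (Assumption: Office Open XML files only; legacy .doc/.ppt and PDF are not covered.)
RISKY_PARTS = {
    "word/comments.xml": "reviewer comments",
    "ppt/notesSlides/": "speaker notes",
}


def hidden_content_report(path_or_file):
    """Return a list of plain-English warnings for one .docx or .pptx file."""
    warnings = []
    with zipfile.ZipFile(path_or_file) as package:
        names = package.namelist()

        # Flag any package part whose name starts with a risky prefix.
        for prefix, label in RISKY_PARTS.items():
            if any(name.startswith(prefix) for name in names):
                warnings.append(f"contains {label}")

        # Tracked changes appear as <w:ins ...> and <w:del ...> elements in the
        # main Word document part. A byte search is a crude heuristic.
        if "word/document.xml" in names:
            body = package.read("word/document.xml")
            if b"<w:ins " in body or b"<w:del " in body:
                warnings.append("contains tracked changes")

        # Core properties record who created and last edited the file.
        if "docProps/core.xml" in names:
            root = ET.fromstring(package.read("docProps/core.xml"))
            for element in root.iter():
                tag = element.tag.split("}")[-1]  # strip the XML namespace
                if tag in ("creator", "lastModifiedBy") and element.text:
                    warnings.append(f"{tag}: {element.text}")
    return warnings
```

You could run `hidden_content_report("whitepaper.docx")` (a hypothetical file name) on each document before you upload it, and take a closer look at anything that comes back with warnings.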
With metadata in your pages and content rich with keywords, your non-HTML
documents may turn out to be healthy sources of targeted traffic for your site!
Thursday: Content Thieves
You’re starting to develop a lovely collection of content on your website, but is somebody
else nibbling at your piece of the pie? Unfortunately, the Internet remains something
of a Wild West for copyright law. Other websites might steal your content simply
by cutting and pasting, or they may use scraping, a more sophisticated technique of
automatically grabbing content from your web pages, to steal material from your site
and put it up on theirs.
You want to be aware of content thieves, not just because they are using your
content to compete with you for search engine visibility, but also because they may be
damaging your brand. An employer of ours once discovered that another company had
repurposed large chunks of our website’s marketing content, but hadn’t even taken the
time to change all of the instances of our company name! If your content is stolen by a
similarly pathetic character, unwitting users might actually think that they are visiting
your website, and that’s something you certainly don’t want.
If you feel it’s for the best, remove non-HTML files from your website or exclude them from indexing.
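One common way to exclude files from indexing is a robots.txt rule. The sketch below assumes you have gathered your non-HTML files into a single folder, here a hypothetical `/downloads/` directory on the placeholder domain `www.example.com`, and uses Python’s standard `urllib.robotparser` module to confirm that the rule blocks what you expect. The helper name `is_crawlable` is ours. Keep in mind that robots.txt only asks well-behaved robots to stay out; it does not protect the files themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask all robots to stay out of /downloads/,
# the folder assumed to hold the PDFs, Word files, and so on.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/downloads/brochure.pdf",
        "http://www.example.com/index.html",
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {status}")
```

Putting the documents in one folder keeps the rule simple and avoids relying on wildcard patterns, which not every robot interprets the same way.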