Referrals create positive feedback loop

This humble weblog has seen a threefold increase in visitors since last
weekend. Ktheory.com went from roughly 500 visitors per day to more than
1,500 yesterday. The reason, unfortunately, is not a sudden appreciation of my
panache and web-savvy wit. It has to do with the way my public referrer logs
are generated, and with quirks in Google's PageRank software (but mostly my
referrer logs).



Traffic to ktheory.com triples over the weekend

More detailed traffic

These pictures of my site statistics show that in the past four days, traffic
to ktheory.com has roughly tripled. Note that these graphics were made on Wednesday
morning, Feb. 19, hence the relatively small number of visitors so far that day.


Background

Referrer information is built into the protocol that runs the web (HTTP).
When you click on a link to go to a new site, the new site is sent the address
(URL) of the referring site.
Web servers are usually configured to keep logs of referrer information so the
site owner can see where visitors are coming from. Referrer information seems
ancillary to the web protocol, since surfers could still move from site to site
even if the owners didn't know whence they came or where they were going. But
these logs strongly affect the way the web is organized. For example, large sites
can better understand how users navigate their pages and then change the navigation
and architecture to make the site more usable.
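
To make the mechanism concrete, here is a minimal Python sketch of what a referrer log boils down to: read the `Referer` header (HTTP spells it with one "r") out of each request and count hits per referring host. The function names and the sample requests are made up for illustration; a real web server does this for you in its access log.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_from_request(raw_request):
    """Return the Referer header value from raw HTTP request text, or None."""
    # Skip the request line ("GET / HTTP/1.1") and scan the header lines.
    for line in raw_request.splitlines()[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "referer":
            return value.strip()
    return None


def tally_referring_hosts(raw_requests):
    """Count requests per referring host, ignoring requests with no referrer."""
    counts = Counter()
    for raw in raw_requests:
        referrer = referrer_from_request(raw)
        if referrer:
            # hostname is None for malformed URLs; bucket those together.
            counts[urlsplit(referrer).hostname or "(unknown)"] += 1
    return counts
```

Feeding it a handful of raw requests gives back a tally such as "two hits from www.google.com", which is all a site owner needs to see where visitors are coming from.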


Also, weblog relationships rely heavily on referrer logs since that is the
primary way to see who's linking to your weblog. New technologies such as trackback
extend referrer information to track online conversations. Refer and LGF referrers
are scripts that make your referrer logs publicly viewable (rather than private
text files).


The Cause

I use LGF referrers on ktheory.com. The way it works is I insert one line of code
at the top of each page that processes the referrer information, then loads the
rest of the page. (You can't see the code if you view the page source since it
is parsed by my server, not your browser.) The reason for my recent spike in traffic
is that I included the code on the same page that lists my referrers.
It makes sense that when I post about Frenchie Davis, Google would index
ktheory.com and send people searching for "frenchie davis" to ktheory.com. But
Google also indexes my referrer logs, since I make them public. So, for every
person who visited ktheory.com looking for Frenchie Davis, her name also cropped
up as a link in my referrer logs. Here is a static version of what my recent
referrers page looked like this morning. Below is a graphic showing the top
search keywords that bring surfers to my weblog.


The top 20 search keywords that point to my web site
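
Keyword lists like this one come straight out of the referring URLs: a search engine puts the search phrase in the query string of its results page, and that URL arrives as the referrer. A rough Python sketch of the extraction follows. The list of query-string keys is an assumption (`q` is the one Google uses; the others are guesses at what other engines might use), and the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed query-string keys that may carry the search phrase.
SEARCH_KEYS = ("q", "query", "p")


def search_terms(referrer):
    """Return the search phrase embedded in a referrer URL, or None."""
    if not referrer:
        return None
    # parse_qs decodes "+" and %-escapes, so "frenchie+davis" becomes
    # "frenchie davis".
    query = parse_qs(urlsplit(referrer).query)
    for key in SEARCH_KEYS:
        values = query.get(key)
        if values:
            return values[0].strip().lower()
    return None
```

Because the phrase sits in the URL itself, any page that prints its referrers as links also prints the search phrases, which is exactly the text a search engine then indexes.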


Currently, my referrer log is listed as the #3 hit on Google for the search
"frenchie davis nude" (screen grab). I'm even the #1 hit for "frenchie davis
naked" (screen grab). What about that Earthlink search page (screen grab) that
appears in Google's results? That's there because it was also in my referrer
logs, and PageRank didn't realize it was recursively including another version
of itself in its results. (The screen grabs are really low-res to save disk
space.)


"Frenchie Davis nude " is not the only search term that occurs frequently
in my referrer logs. "kim possible porn" came from an asexual reference
I made to the Disney cartoon Kim Possible (and later commented
on the unusual Google hits). "aaron carter naked" and "gay aaron
carter", as in the 16-year-old pop singer, came from a reference to former
president Jimmy Carter. "nick carter naked" is also complicated, since I never
explicitly mentioned the Backstreet Boy. He cropped up because both Nick Denton
and Aaron Carter appear in my referrer logs.


The issue is kind of confusing, so I'll try to word it as clearly as possible:
When somebody visits ktheory.com from a search engine using popular search terms,
those search terms get repeated in my referrer logs. Then, when Google indexes
my referrer log, it sees many instances of the search term, and increases the
rank of my referrer log. That in turn drives more traffic to the log, which causes
the search terms to appear more frequently. It's a positive feedback loop
in which the output amplifies the source (like the screeching noise created when
you hold a microphone next to its speaker). Because of this, my referrer page
(show-refs.php) has become the most frequently visited
page at ktheory.com.
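
The loop can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration, not a measurement from my logs: the "recent referrers" page is taken to show only the previous day's hits, search traffic to that page is taken to be proportional to how often the phrase appears on it, and the numbers (`post_hits`, `gain`) are invented.

```python
def simulate(days, post_hits=10, gain=1.5, log_self=True):
    """Return daily search visits to the referrer page under a toy model.

    post_hits: daily search visitors who land on the actual post
               (each one adds the phrase to the referrer page).
    gain:      assumed search visits to the referrer page per mention
               of the phrase on that page.
    log_self:  whether visits to the referrer page are themselves logged.
    """
    log_hits = 0
    history = []
    for _ in range(days):
        # Mentions of the phrase on the referrer page today: hits on the
        # post, plus yesterday's hits on the referrer page if it logs itself.
        mentions = post_hits + (log_hits if log_self else 0)
        # More mentions, higher ranking, more visits to the referrer page.
        log_hits = int(gain * mentions)
        history.append(log_hits)
    return history
```

With self-logging on, each day's visits feed the next day's mentions and the series keeps climbing; with it off, visits stay flat at `gain * post_hits`. Whether the real loop grows or settles depends on a gain nobody outside Google can measure, so treat the shape of the curve, not the numbers, as the point.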

show-refs.php is getting way too much attention


Normally, my usage looks more like the graphic below. The files in the /tmpl/
directory get included on every page at ktheory.com.

This is what my page statistics should look like


I asked myself, is this even a problem? I mean, thousands of people are now
visiting my site. Isn't that what most webloggers want? Well, not exactly. People
searching for pornography of underage pop stars and cartoons are not the people
whose favor I wish to curry. From a less judgmental perspective, I'm making the
web more inconvenient to use by giving people the wrong content. It is a quirk
of poor programming that is driving traffic to my site. The idea of what constitutes
a legitimate visitor is fuzzy, since visitors could come to my site for myriad
reasons. But I'm pretty sure these visitors aren't looking for referrer stats
for some weblog that once happened to mention their pop star.


The Solution

Once I decided that I should change my page to discourage inappropriate traffic,
my first idea was to stop search engines from indexing my referrer logs. But that's
not ideal, since someone could legitimately be trying to track a meme via referrer
logs. The best solution was simply to stop my referrer page from logging visits
to itself. Removing this one line of code from my referrer page is the difference
between 1,500 and 500 visitors per day:

<?php include("~/www/lgf-reflog.php");
?>
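
The same fix can also be expressed as a guard inside the logging code rather than by leaving the include off one page. The Python sketch below is an alternative illustration, not how LGF referrers works; `REFERRER_PAGE`, `should_log`, and `record_hit` are hypothetical names, and the path is a guess based on the script name show-refs.php.

```python
# Hypothetical path of the public referrer page (assumed, not from the
# actual site configuration).
REFERRER_PAGE = "/show-refs.php"


def should_log(request_path, referrer):
    """Decide whether a hit belongs in the public referrer log."""
    if not referrer:
        return False  # no referrer was sent, so there is nothing to record
    # Drop any query string before comparing paths.
    path = request_path.split("?", 1)[0]
    # Never log visits to the referrer page itself: that is the feedback path.
    return path != REFERRER_PAGE


def record_hit(log, request_path, referrer):
    """Append the referrer to the log list if the hit qualifies."""
    if should_log(request_path, referrer):
        log.append(referrer)
    return log
```

A central guard has the advantage that the referrer page stays protected even if the logging include is later added to every template by default.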


While this is a marginal problem with web programming in general, it illustrates
potential problems of indexing, searching, and tracking online conversations.
Webloggers in particular often exhibit recursive behavior by blogging about
recent goings-on in the weblogging community. These positive feedback loops give
rise to the power law patterns that exist on several levels of the internet.
While such patterns generally serve us well, they mean that small anomalies can
quickly get exaggerated, leading thousands of surfers to a confusing dead end.
