More Clouds

Last week I wrote PHP scripts to generate tag clouds for The Rhetoric’s sidebar. A tag cloud is a kind of weighted list in which the words run together in one continuous block, like a paragraph, rather than appearing one per line. Each word is usually displayed in a font weight and/or size that corresponds to how often it occurs.
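A minimal sketch of that idea, written in Python rather than the PHP the post describes: each tag's count is mapped linearly onto a font size, and the tags are emitted alphabetically as one continuous run of HTML spans. The 10 to 32 pixel range, the function names, and the inline-style markup are illustrative assumptions, not the markup The Rhetoric actually uses.

```python
import html


def linear_sizes(counts, min_px=10, max_px=32):
    """Map each tag's count linearly onto a font size in pixels."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid dividing by zero when every count is equal
    return {
        tag: round(min_px + (n - lo) * (max_px - min_px) / span)
        for tag, n in counts.items()
    }


def render_cloud(counts):
    """Emit the tags alphabetically as one continuous run of sized spans."""
    sizes = linear_sizes(counts)
    return " ".join(
        f'<span style="font-size:{sizes[tag]}px">{html.escape(tag)}</span>'
        for tag in sorted(counts)
    )


print(render_cloud({"travel": 12, "cats": 3, "sunset": 7}))
```

For these sample counts, "cats" (3 uses) and "travel" (12 uses) land at the ends of the range, 10px and 32px, and "sunset" (7 uses) falls in between at 20px.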

The first cloud I wrote was for my Flickr tags. Each picture in my photostream has several tags that describe it; the tags let people find the photo through search and find similar pictures in my stream and in other people’s. The script uses the Phlickr PHP library to fetch the tags from the Flickr API and writes the cloud to the sidebar.

The second cloud was for this blog’s post categories. Each post has one or more categories that let readers find other posts on similar subjects. Instead of calling a web API, I wrote SQL statements in PHP that query The Rhetoric’s MySQL database for the category names.
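A sketch of the kind of query involved, assuming a hypothetical two-table schema (a `categories` table plus a `post_categories` join table) that is not WordPress's real schema. It uses Python's built-in `sqlite3` module with an in-memory database so it runs on its own, whereas the post's script talks to MySQL from PHP. Counting posts per category yields the frequencies a cloud needs.

```python
import sqlite3

# Hypothetical schema for illustration only; not WordPress's actual tables.
SCHEMA = """
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE post_categories (post_id INTEGER, category_id INTEGER);
"""


def category_counts(conn):
    """Return {category name: number of posts filed under it}."""
    rows = conn.execute(
        """
        SELECT c.name, COUNT(pc.post_id)
        FROM categories AS c
        JOIN post_categories AS pc ON pc.category_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return dict(rows)  # each row is a (name, count) pair


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, "PHP"), (2, "Photography"), (3, "Books")],
)
# Post 11 is filed under two categories.
conn.executemany(
    "INSERT INTO post_categories VALUES (?, ?)",
    [(10, 1), (11, 1), (11, 2), (12, 1), (13, 3)],
)
print(category_counts(conn))
```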

The two PHP scripts retrieve their data and build their data structures in very different ways, but the code that builds the clouds was almost identical. So I moved that code into functions in a separate PHP file. Now a change to that one file propagates instantly to every cloud.
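A Python sketch of that split, with hypothetical input shapes: one loader flattens per-photo tag lists into counts, another turns `(name, count)` rows into a dictionary, and a single shared `build_cloud` function (a hypothetical name) only ever sees the resulting `{tag: count}` mapping. Because both sources are normalized to the same shape first, a change to `build_cloud` reaches every cloud.

```python
from collections import Counter


def counts_from_photos(photos):
    """Source 1 (hypothetical shape): a list of photos, each a list of tags."""
    return Counter(tag for tags in photos for tag in tags)


def counts_from_rows(rows):
    """Source 2 (hypothetical shape): (name, count) rows from a SQL query."""
    return {name: count for name, count in rows}


def build_cloud(counts, min_px=10, max_px=32):
    """Shared cloud builder: returns (tag, font size) pairs, alphabetized."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid dividing by zero when every count is equal
    return [
        (tag, round(min_px + (counts[tag] - lo) * (max_px - min_px) / span))
        for tag in sorted(counts)
    ]


photo_cloud = build_cloud(counts_from_photos([["sunset", "beach"], ["sunset"]]))
post_cloud = build_cloud(counts_from_rows([("PHP", 3), ("Books", 1)]))
print(photo_cloud)  # [('beach', 10), ('sunset', 32)]
print(post_cloud)   # [('Books', 10), ('PHP', 32)]
```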

While browsing the O’Reilly site I found a paper titled “Building Tag Clouds in Perl and PHP”. I wanted to see how the author had designed his tag cloud algorithms, so I bought the paper and quickly read all 46 pages. Our algorithms are almost identical. The one major improvement I made to my clouds after reading it was to use logarithms when calculating sizes and weights. Tag counts, like many things, tend to follow a power curve: a handful of tags are used far more often than the rest, so with linear scaling they dwarf everything else. Scaling the sizes and weights logarithmically flattens that curve.

In my code I use an associative array to store each tag and its count, whereas the author of the O’Reilly paper uses multidimensional arrays. To me an associative array is cleaner, simpler, and faster. So he had one better idea, and so did I. It was $10 well spent.
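To see why logarithms help, here is a Python sketch that scales the same power-law-shaped counts both linearly and by `math.log`. The counts are made-up illustration data, and the tags live in a plain dictionary, Python's counterpart to a PHP associative array. This is one common way to apply the idea, not necessarily the exact formula from the paper or the post's scripts.

```python
import math


def scaled_sizes(counts, transform, min_px=10, max_px=32):
    """Map transform(count) for each tag linearly onto [min_px, max_px]."""
    values = {tag: transform(n) for tag, n in counts.items()}
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo or 1.0  # all tags equally popular: give them all min_px
    return {
        tag: round(min_px + (v - lo) * (max_px - min_px) / span)
        for tag, v in values.items()
    }


# Made-up counts that grow by a factor of ten, a power-law-like spread.
counts = {"misc": 1, "photos": 10, "php": 100, "wordpress": 1000}

linear = scaled_sizes(counts, lambda n: n)
logged = scaled_sizes(counts, math.log)  # counts must be >= 1 for math.log
print(linear)  # {'misc': 10, 'photos': 10, 'php': 12, 'wordpress': 32}
print(logged)  # {'misc': 10, 'photos': 17, 'php': 25, 'wordpress': 32}
```

With linear scaling the 10-use and 100-use tags are barely bigger than the 1-use tag; on a log scale the four tags spread evenly across the size range.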

The author of the O’Reilly paper, Jim Bumgardner, used tags from his del.icio.us account in one of his examples. I had wanted to do the same for my sidebar but couldn’t find good documentation at del.icio.us. Using Jim’s code as a reference, I wrote a PHP script that pulls my del.icio.us tags from an RSS feed, processes them with MagpieRSS, and outputs them to the sidebar using my cloud-generation functions. It worked perfectly in testing, but not after I moved it to The Rhetoric. After hours of frustrating troubleshooting I went to bed. The solution came to me just after I crawled under the covers, and it was crystal clear by morning. WordPress, The Rhetoric’s blog engine, must be including my script from inside a wrapper function, I thought, which would make my tags variable local to that function rather than global. I typed “global” in front of my tags variable and, voilà, it worked!
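The feed-to-counts step can be sketched in Python with the standard library's `xml.etree.ElementTree` standing in for MagpieRSS. The sketch assumes each `<item>` lists its tags in `<category>` elements; the sample feed is made up, and a real bookmarking feed may encode tags differently (for example, several tags in one element), in which case the parsing would need adjusting.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample feed. Assumption: each <item> lists its tags as <category>
# elements; a real bookmarking feed may encode tags differently.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>My bookmarks</title>
    <item>
      <title>PHP manual</title>
      <category>php</category>
      <category>reference</category>
    </item>
    <item>
      <title>MagpieRSS</title>
      <category>php</category>
      <category>rss</category>
    </item>
  </channel>
</rss>"""


def tag_counts_from_rss(xml_text):
    """Count tag occurrences across all items in an RSS feed."""
    root = ET.fromstring(xml_text)
    return Counter(
        cat.text.strip()
        for item in root.iter("item")
        for cat in item.findall("category")
        if cat.text  # skip empty <category/> elements
    )


print(tag_counts_from_rss(SAMPLE_RSS))
```

The result has the same tag-to-count shape as the other clouds' data, so it can go through the same cloud-building code.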
