BlogFlare.com - Dodging Bullets On The Back End
December 26th, 2006 | by Scott Weaver |Often times, to get a site rolling, you must sort of throw code together that actually works for the time and you have to figure that later on, you’ll come up with a better solution if there is one. Well, such is the case on Blog Flare.
When I launched the site on the 6th of this month, I built a script that is executed hourly via cron to update all the ranks for the site. The script itself consists of a SELECT statement grabbing all the categories, an UPDATE LEFT JOIN on the table that holds all the blogs and the table that holds all the statistical data in order to provide a per-category rank, and finally an UPDATE LEFT JOIN on those tables for an overall ranking.
The result? Well, for the data of 79 blogs, the script execution time comes in just under a minute (well, an average of 47 seconds currently). To be fair, the table holding the hits/impressions is already over half a million lines deep, so this is a great indicator as to why it takes so long; however, I did build the script so it would do most of its execution on the server-side, thus minimizing the transfer of data between client and server.
Now, I realize that the obvious solution to this quandary is to build a nice Stored Procedure to take care of the job, but the server for BlogFlare is not currently running MySQL 5.0+ so that is currently not an option.
Not to fear! After dishing over a few possible solutions with a member of my wife’s family, he finally suggested I try a temporary table solution. I decided I’d code it up tonight and do a test-run on live data (that’s my style, homie). So I did just that and BAM- all the statistical data stored in the blogs table was wiped out due to the inferior method I had created, sans the data of the last category. I wasn’t worried, as I could still run the original script and repair all of this, so I did. What was hilarious about the newer method was that not only did it mess up the data, but it also took longer to run (clocking in at an average of 53 seconds).
You see, my worry (which is rational) is that when the site has, oh I don’t know .. 1,000 blogs, my script will then take about 10 minutes to update. That’s not actually a huge deal. I mean, it’s no picnic but it’s not the end of the world. The end of the world will come when I get up to 10,000 blogs and the script takes 100 minutes (or 2 hours) to execute! Yikes!!!
At any given time, the site has the capacity to display 999 blogs per category. So since the site has 34 actual categories, that comes out to a 33,966 blog capacity. At .59 seconds per blog to do an update, that comes out to a total time of 20,207 seconds or just over 14 hours total execution time for an update script that is run on an hourly basis.
Yeah, so now you see my problem. The upside is that I can always buy a dedicated server for BlogFlare.com, once it becomes popular, and I can give it its own memory and cpu cycles. That, and I will surely have MySQL 5.0 installed. This will drastically help me out, but until then, I am stuck trying to figure out a way to crunch the data hourly without taking too much time to do it (a few seconds preferably).
One approach I have not tried is taking the JOIN statements out of the equation and just doing the updates via PHP itself. Perhaps in trying to save myself transfer time, I have inadvertently created an unnecessary burden on mysql processing.


