<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Durability, Scalability, Availability</title>
	<atom:link href="http://www.jessesanford.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jessesanford.com</link>
	<description></description>
	<lastBuildDate>Fri, 05 Feb 2010 04:46:13 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Installing xdebug on centos 5</title>
		<link>http://www.jessesanford.com/2010/02/04/installing-xdebug-on-centos-5/</link>
		<comments>http://www.jessesanford.com/2010/02/04/installing-xdebug-on-centos-5/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 04:36:59 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jessesanford.com/?p=45</guid>
		<description><![CDATA[Performance profiling php in my opinion is best done with xdebug and kcachegrind (install it via APT on an Ubuntu virtual machine and save yourself some trouble. I know you can get it to work with xwindows natively on a mac but damn it&#8217;s not easy and it certainly isn&#8217;t as pretty as using it [...]]]></description>
			<content:encoded><![CDATA[<p>Performance profiling PHP is, in my opinion, best done with xdebug and kcachegrind. (Install kcachegrind via APT on an Ubuntu virtual machine and save yourself some trouble. I know you can get it to work with X Windows natively on a Mac, but it&#8217;s not easy and it certainly isn&#8217;t as pretty as using it in KDE 4.)</p>
<p>How to performance profile an app is an art not covered here. Seriously if you want to start somewhere I suggest reading this article over at Zend developer zone (It&#8217;s from 2007 but it&#8217;s still relevant):</p>
<p><a href="http://devzone.zend.com/article/2899-Profiling-PHP-Applications-With-xdebug">http://devzone.zend.com/article/2899-Profiling-PHP-Applications-With-xdebug</a></p>
<p>Also there is a great article if you can find it from the September 2004 edition of php|Architect&#8230; If you can&#8217;t find it on the internet email me and I will send you a PDF version of it. It is a complete and comprehensive article and should not be missed if you are working with xdebug at all.</p>
<p>So finally back on topic:</p>
<p>Installing xdebug is pretty easy assuming you have php/pecl/pear installed&#8230; If you don&#8217;t I suggest you use yum and if you don&#8217;t have that installed then just follow this tutorial: <a href="http://www.matteomattei.com/en/install-yum-and-php-pear-on-centos-5">http://www.matteomattei.com/en/install-yum-and-php-pear-on-centos-5</a></p>
<p>Anyway once you have pecl installed you can run the following:</p>
<blockquote><p># pear install pecl/xdebug</p>
<p>downloading xdebug-2.0.5.tgz &#8230;<br />
Starting to download xdebug-2.0.5.tgz (289,234 bytes)<br />
&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;done: 289,234 bytes<br />
67 source files, building<br />
running: phpize<br />
Configuring for:<br />
PHP Api Version:         20041225<br />
Zend Module Api No:      20050922<br />
Zend Extension Api No:   220051025<br />
/usr/bin/phpize: /tmp/tmpOcEvNL/xdebug-2.0.5/build/shtool: /bin/sh: bad interpreter: Permission denied<br />
Cannot find autoconf. Please check your autoconf installation and the $PHP_AUTOCONF<br />
environment variable is set correctly and then rerun this script.</p></blockquote>
<p>Whoops!</p>
<p>Looks like something is up with running /tmp/tmpOcEvNL/xdebug-2.0.5/build/shtool</p>
<p>I have seen this type of build error before when using other package management systems, and it turns out that what I suspected was true: the /tmp directory was mounted with the noexec option, so nothing under it can be executed.</p>
<p>See here:</p>
<blockquote><p># mount -l | grep /tmp<br />
simfs on /tmp type simfs (rw,noexec)<br />
simfs on /var/tmp type simfs (rw,noexec)</p></blockquote>
<p>So let&#8217;s remount it with the exec switch:</p>
<blockquote><p># mount -o remount,exec /tmp</p></blockquote>
<p>And now let&#8217;s see what mount -l says:</p>
<blockquote><p># mount -l | grep /tmp<br />
simfs on /tmp type simfs (rw)<br />
simfs on /var/tmp type simfs (rw,noexec)</p></blockquote>
<p>So we can see that /tmp no longer has noexec listed.</p>
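<p>If you would rather script this check than eyeball it, here is a minimal Python sketch that looks for noexec in a line of mount output. The function names are made up for illustration, and it assumes the options always sit inside the last pair of parentheses, as they do in the output above.</p>

```python
def mount_options(mount_line):
    """Return the option list from one line of `mount -l` output."""
    # The options are the comma-separated text inside the last parentheses.
    start = mount_line.rindex("(") + 1
    end = mount_line.rindex(")")
    return mount_line[start:end].split(",")


def is_noexec(mount_line):
    """True when the mount point on this line is mounted with noexec."""
    return "noexec" in mount_options(mount_line)


before = "simfs on /tmp type simfs (rw,noexec)"
after = "simfs on /tmp type simfs (rw)"
print(is_noexec(before), is_noexec(after))
```

<p>Feeding it the two /tmp lines shown above distinguishes the state before and after the remount.</p>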
<p>Ok let&#8217;s try and install via pecl again:</p>
<blockquote><p># pear install pecl/xdebug<br />
downloading xdebug-2.0.5.tgz &#8230;<br />
Starting to download xdebug-2.0.5.tgz (289,234 bytes)<br />
&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;done: 289,234 bytes<br />
67 source files, building<br />
running: phpize<br />
Configuring for:<br />
PHP Api Version:         20041225<br />
Zend Module Api No:      20050922<br />
Zend Extension Api No:   220051025<br />
building in /var/tmp/pear-build-root/xdebug-2.0.5<br />
running: /tmp/tmp3tHVR2/xdebug-2.0.5/configure<br />
checking for egrep&#8230; grep -E<br />
checking for a sed that does not truncate output&#8230; /bin/sed<br />
checking for gcc&#8230; gcc<br />
checking for C compiler default output file name&#8230; a.out<br />
checking whether the C compiler works&#8230; configure: error: cannot run C compiled programs.<br />
If you meant to cross compile, use `&#8211;host&#8217;.<br />
See `config.log&#8217; for more details.<br />
ERROR: `/tmp/tmp3tHVR2/xdebug-2.0.5/configure&#8217; failed</p></blockquote>
<p>Well, we&#8217;re not in the clear yet. The error looks surprisingly similar: the build now happens in /var/tmp/pear-build-root, and /var/tmp is still mounted noexec. Let&#8217;s remount /var/tmp without the noexec option as well:</p>
<blockquote><p># mount -o remount,exec /var/tmp</p></blockquote>
<p>And again the pecl install:</p>
<blockquote><p># pear install pecl/xdebug<br />
downloading xdebug-2.0.5.tgz &#8230;<br />
Starting to download xdebug-2.0.5.tgz (289,234 bytes)<br />
&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;..done: 289,234 bytes<br />
67 source files, building<br />
running: phpize<br />
Configuring for:<br />
PHP Api Version:         20041225<br />
Zend Module Api No:      20050922<br />
Zend Extension Api No:   220051025<br />
building in /var/tmp/pear-build-root/xdebug-2.0.5<br />
running: /tmp/tmpkWolpH/xdebug-2.0.5/configure<br />
checking for egrep&#8230; grep -E<br />
checking for a sed that does not truncate output&#8230; /bin/sed<br />
checking for gcc&#8230; gcc<br />
checking for C compiler default output file name&#8230; a.out<br />
checking whether the C compiler works&#8230; yes<br />
checking whether we are cross compiling&#8230; no<br />
checking for suffix of executables&#8230;</p>
<p>&lt;SNIP&gt;</p>
<p>Build process completed successfully<br />
Installing &#8216;/var/tmp/pear-build-root/install-xdebug-2.0.5//usr/lib64/php/modules/xdebug.so&#8217;<br />
install ok: channel://pecl.php.net/xdebug-2.0.5<br />
You should add &#8220;extension=xdebug.so&#8221; to php.ini</p></blockquote>
<div>Nice. Ok so now it worked. Notice the path that it was installed to (on this 64 bit machine it was): /usr/lib64/php/modules/xdebug.so</div>
<div>We will need that later when we put it in our new /etc/php.d/xdebug.ini: (the contents of which I will explain in another blog post but for now the comments inside of it should be self explanatory)</div>
<blockquote><p>#extension_dir = "/usr/local/lib/php/20060613"</p>
<p>#doc_root="/usr/local/apache2/htdocs/web"</p>
<p>#the following was added by jesse to fix the path variables when run in cgi mode<br />
#cgi.fix_pathinfo=0</p>
<p>zend_extension="/usr/lib64/php/modules/xdebug.so"<br />
#xdebug.remote_enable=1</p>
<p>;JESSE: the following was taken from: http://code.google.com/p/syslogr-utils/wiki/XdebugHelper<br />
;When using Eclipse with PDT and xdebug &#8212; make sure to change your<br />
;tools &#8211;&gt; addons &#8211;&gt; xdebug helper &#8211;&gt; preferences &#8211;&gt; idekey = ECLIPSE_DBGP<br />
;(This is the setting that XDEBUG_SESSION_START is set to on your web browser URL)</p>
<p>;JESSE: I have put the following notes in from some research into profiling with webgrind:<br />
;(they were taken from: http://www.chrisabernethy.com/php-profiling-xdebug-webgrind/ )</p>
<p>;NOTE: usually use cachegrind.out.%t.%p when I want one output file per script run,<br />
;but if I want to run a script multiple times and see the aggregate numbers<br />
;I use cachegrind.out.%s and set xdebug.profiler_append = on</p>
<p>;NOTE: A Nice to know, for different cachegrind files per host you can use %H in the<br />
;profiler_output_name, according to<br />
;http://www.xdebug.org/docs/all_settings#trace_output_name</p>
<p>;Always profile scripts with xdebug:<br />
;xdebug.profiler_enable = 1</p>
<p>#xdebug.profiler_enable=1</p>
<p>;Alternatively, enable profiling with GET/POST parameter XDEBUG_PROFILE,<br />
;e.g. http://localhost/samplepage.php?XDEBUG_PROFILE:<br />
;xdebug.profiler_enable_trigger = 1</p>
<p>xdebug.profiler_enable_trigger=1</p>
<p>xdebug.profiler_output_dir="/tmp/xdebug/"</p>
<p>;the following forces xdebug to append to the outfile rather than<br />
;overwrite it on each exec of a script<br />
;xdebug.profiler_append=On</p>
<p>;the patterns for the names of the outfiles<br />
;xdebug.profiler_output_name = cachegrind.out.%s<br />
xdebug.profiler_output_name = cachegrind.out.%t.%p</p>
<p>;the following opens debug output links in textmate<br />
xdebug.file_link_format = "txmt://open?url=file://%f&amp;line=%l"</p>
<p>xdebug.remote_host="127.0.0.1"<br />
xdebug.remote_port=9000<br />
xdebug.remote_handler="dbgp"<br />
xdebug.remote_mode=req<br />
xdebug.idekey=1</p></blockquote>
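<p>To make the notes about profiler_output_name a bit more concrete, here is a rough Python sketch of how the pattern turns into a filename. This is not xdebug code: expand_output_name and the sample values are hypothetical, and the way %s is flattened is only my approximation of what xdebug does. It shows why %t.%p yields a new file per run while %s reuses one file per script.</p>

```python
def expand_output_name(pattern, timestamp, pid, script):
    """Expand a few xdebug-style specifiers in a profiler_output_name pattern."""
    values = {
        "%t": str(timestamp),  # timestamp in seconds
        "%p": str(pid),        # process id
        # Assumption: the script path with slashes and dots flattened.
        "%s": script.replace("/", "_").replace(".", "_"),
    }
    for spec, value in values.items():
        pattern = pattern.replace(spec, value)
    return pattern


# One file per run: the timestamp and pid change from run to run.
print(expand_output_name("cachegrind.out.%t.%p", 1265344619, 4242, "/var/www/index.php"))
# One file per script: the same name every run, useful with profiler_append.
print(expand_output_name("cachegrind.out.%s", 1265344619, 4242, "/var/www/index.php"))
```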
<p>Now check the php-cli and see if xdebug shows up in its ini dump.</p>
<blockquote><p>php -i | grep xdebug<br />
/etc/php.d/xdebug.ini,<br />
xdebug<br />
xdebug support =&gt; enabled<br />
xdebug.auto_trace =&gt; Off =&gt; Off<br />
xdebug.collect_includes =&gt; On =&gt; On<br />
xdebug.collect_params =&gt; 0 =&gt; 0<br />
xdebug.collect_return =&gt; Off =&gt; Off<br />
xdebug.collect_vars =&gt; Off =&gt; Off<br />
xdebug.default_enable =&gt; On =&gt; On<br />
xdebug.dump.COOKIE =&gt; no value =&gt; no value</p>
<div>&lt;snip&gt;</div>
</blockquote>
<p>Looking good!</p>
<p>Oh, and obviously restart Apache so the web server picks up the new extension: /etc/init.d/httpd restart</p>
<p>Sweet.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jessesanford.com/2010/02/04/installing-xdebug-on-centos-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Drupal polls don&#8217;t play well with full page caching or how a 3-tier architecture works.</title>
		<link>http://www.jessesanford.com/2009/12/17/why-drupal-polls-dont-play-well-with-full-page-caching-and-basicly-how-a-3-tier-architecture-works/</link>
		<comments>http://www.jessesanford.com/2009/12/17/why-drupal-polls-dont-play-well-with-full-page-caching-and-basicly-how-a-3-tier-architecture-works/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 19:56:25 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jessesanford.com/?p=39</guid>
		<description><![CDATA[I had a client who didn&#8217;t understand why caching was breaking things on their site. Of particular nuisance was the polls. To be fair, grasping why something as easy as polls works without caching and then does not work with caching can be confusing. Here is the email I wrote to help educate all of [...]]]></description>
			<content:encoded><![CDATA[<p>I had a client who didn&#8217;t understand why caching was breaking things on their site. Particularly troublesome were the polls. To be fair, grasping why something as simple as a poll works without caching and then breaks with caching can be confusing. Below you will see the contents of the email I wrote to help educate all of us.</p>
<p>&lt;snip&gt;</p>
<p>To begin, I think it is necessary to explain why things are cached and why caching has an impact on editorial workflow and functionality. Specifically, this is in reference to polls. It&#8217;s going to take a bit to grasp what is happening, so please read slowly and bear with me.</p>
<p>Full web page caching or HTTP acceleration is done in part to effectively reduce the load experienced by an application and also to mask performance bottlenecks in the application to users.</p>
<p>Without caching, every user request eventually triggers some action performed by the business logic in the application and ultimately by the database. That action can be one or THOUSANDS of connections and queries. Those thousands of queries per user request are normally NOT unique: one request will normally cause exactly the same connections and queries as the next request for the same content. So as the number of users goes up, the number of requests goes up, and ultimately the number of connections and queries on the database goes up (far more dramatically, however).</p>
<p>Here are some scenarios:</p>
<p>1 user makes 1 request for 1 page that takes 500 queries to render = 500 queries on the database.</p>
<p>1 user makes 3 requests for 3 different pages each taking 500 queries to render = 1500 queries on the database.</p>
<p>2 users each make 3 requests for 3 different pages each taking 500 queries to render = 3000 queries.</p>
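<p>The arithmetic in these scenarios is just a product. Here is a small Python sketch of it (total_queries is a name I made up for illustration, and the 500 queries per page is the assumed figure from the scenarios):</p>

```python
def total_queries(users, requests_per_user, queries_per_page):
    """Queries hitting the database when nothing is cached."""
    return users * requests_per_user * queries_per_page


# The three scenarios above, each page costing 500 queries.
for users, requests in [(1, 1), (1, 3), (2, 3)]:
    print(users, "user(s),", requests, "request(s) each:", total_queries(users, requests, 500))
```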
<p>It&#8217;s linear when you look at it in this fashion, but if you think about how the database actually performs transactions you will see that the performance hit grows MUCH faster than linearly.</p>
<p>The database performs each transaction in a fashion that is sometimes blocking, allowing only a single transaction to request a certain record at a time (rare in read-only scenarios), OR the I/O needed to service requests can cause the processor to wait while the hard disks are saturated (common in high-volume, data-intensive applications). This wait time experienced by the processor can quickly add up to minutes experienced by an individual request. Minutes can become tens of minutes, even hours, as queries stack up waiting for the hard disks to return the data to the queries in the queue ahead of them.</p>
<p>There are many ways to solve this problem but the first is always to remove the waste.</p>
<p>Why perform all those non-unique calculations over and over again when you can just do them once, store the results and then return those stored results when the same non-unique request comes back again?</p>
<p>That is caching.</p>
<p>There are many levels of caching.</p>
<p>We can cache the query results for each query performed on the database. Every single one of the thousands of unique queries is currently kept cached automatically by the database for a certain period of time until it is thrown away to make room for &#8220;Hotter&#8221; queries.</p>
<p>We can also cache the results of the business logic processing (in the case of web applications this is usually some sort of HTML or XML output). Those results can be used to produce an actual .html file on the hard disk.</p>
<p>This is called &#8220;page level&#8221; caching.</p>
<p><strong>Hotness explained </strong>(this is not unique to databases this is for most simple caching algorithms)</p>
<p>If a certain request is made for a certain piece of data, let&#8217;s assign it an importance value of 1.</p>
<p>If a subsequent request is made for that same data, let&#8217;s add 1 to that importance value so that it is now 2.</p>
<p>If no other request is made for that data, its value stays at 2 for some period of time.</p>
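<p>As a rough Python sketch, this bookkeeping is nothing more than a counter per piece of data. The record_request helper is hypothetical and the keys are just example filenames:</p>

```python
from collections import Counter

hotness = Counter()


def record_request(key):
    """Add 1 to the importance value of the requested piece of data."""
    hotness[key] += 1
    return hotness[key]


record_request("AKURALEJD")
record_request("AKURALEJD")
record_request("ZNFNALHE")
print(hotness.most_common())
```

<p>The data requested twice ends up with a value of 2 and sorts ahead of the data requested once.</p>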
<p>Now let&#8217;s graph this importance or hotness value:</p>
<p>The X axis is some delineation of the volume of data. Let&#8217;s just presume that all your data is in a single directory and it all has random alphabetical filenames. Then X would possibly start at AKURALEJD and end at ZNFNALHE. NOTE: the filenames are RANDOMLY assigned to the data.</p>
<p>The Y axis is the hotness or importance value assigned to each piece of data so the Hotter the data the higher it is plotted on the Y axis.</p>
<p>What you will find is that SOME portion of the data is hotter than others. The plot of said hotness follows a bell curve.</p>
<p>So why not just cache ALL possible data ever produced by an application?</p>
<p>Memory is limited. We can&#8217;t possibly account for every single unique view of the data, SO we can only keep a portion of that data in the cache. Therefore we draw vertical lines on our graph starting at the middle of the bell curve and move them apart until the content contained between them is equal to the amount of available memory. After that the &#8220;tail ends&#8221; remain un-cached.</p>
<p><img class="alignnone size-full wp-image-43" title="ttl_cache_bell_curve1-300x210" src="http://www.jessesanford.com/wp-content/uploads/2009/12/ttl_cache_bell_curve1-300x210.png" alt="ttl_cache_bell_curve1-300x210" width="300" height="210" /></p>
<p>Now what happens when what&#8217;s hot changes? Well we have to make room in the cache so that this new data can be put in. How do we know what to throw out of the cache to make room? Well we introduce a lifetime or a TTL (Time To Live) on each element in the cache.</p>
<p>So in its most basic form a TTL will tell the cache to &#8220;Throw out anything older than X&#8221;.</p>
<p>That means that as time goes on, if something has NOT been requested for longer than the TTL, it will eventually make its way out of the cache.</p>
<p>This aging process also allows for the content that is cached to be &#8220;refreshed&#8221; or updated periodically. SO if you add NEW data or change data in the application then those changes will eventually make their way into the cache and ultimately be seen by people making requests for cached data.</p>
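<p>Here is a minimal Python sketch of a TTL cache, under a simplifying assumption: an entry expires a fixed time after it was stored. Real caches differ in the details. The TTLCache class and the injectable clock are my own illustration, the clock being there only so the example can fake the passage of time.</p>

```python
import time


class TTLCache:
    """Tiny TTL cache sketch: entries older than `ttl` seconds are dropped."""

    def __init__(self, ttl, clock=time.time):
        self.ttl = ttl
        self.clock = clock   # injectable so the example can fake time
        self.entries = {}    # key -> (stored_at, value)

    def set(self, key, value):
        self.entries[key] = (self.clock(), value)

    def get(self, key):
        if key not in self.entries:
            return None
        stored_at, value = self.entries[key]
        if self.clock() - stored_at > self.ttl:
            del self.entries[key]  # throw out anything older than the TTL
            return None
        return value


now = [0]
cache = TTLCache(ttl=60, clock=lambda: now[0])
cache.set("/home", "old homepage")
now[0] = 30
print(cache.get("/home"))
now[0] = 61
print(cache.get("/home"))
```

<p>At 30 seconds the stored page is still served. At 61 seconds it has aged out, so the next request would have to go back to the application and pick up any changes.</p>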
<p>Ok so that all sounds good so why all the issues with caching?</p>
<p>Before we get into this we need to take an even further step back. Let&#8217;s look at how our content is uniquely named.</p>
<p>When it comes to the web, our content is uniquely named by its URL (uniform resource locator). See: <a href="http://en.wikipedia.org/wiki/Uniform_Resource_Locator">http://en.wikipedia.org/wiki/Uniform_Resource_Locator</a></p>
<p>With a unique URL the cache is able to determine &#8220;what is what&#8221; or more technically what the &#8220;STATE&#8221; of the application is for that URL. <em>The web used to be so simple.</em></p>
<p>What webservers do with URLs:<br />
I will briefly explain the two most common functions a web server performs. These are:</p>
<p>Responding to GET requests. (commonly associated with clicking a link)</p>
<p>Responding to POST requests. (commonly associated with pushing a submit button)</p>
<p>NOTE that BOTH are &#8220;REQUESTS&#8221;. Therefore, if you click a link OR click a submit button you are &#8220;REQUESTING&#8221; something from the webserver. In the case of a POST request, most times you are also sending something extra to the webserver (for instance a contact form OR your answers to a poll!)</p>
<p>Once the webserver receives your request it will check to see if it is a GET or a POST, then it will check to see what file was requested and then if the file is of a certain type it will sometimes offload the request to some application server (for instance PHP running as a cgi process). This application server (PHP for instance) will then process the request (taking into account the information in the GET or POST, making needed database queries etc.) finally the application server returns a response back to the webserver which will in turn return the response to your browser.</p>
<p>That response may or may not return unique information, depending on what was passed in the GET or POST request to the webserver (and ultimately to the application server). NOTE: In the past ALL requests, both GET and POST, caused the browser to refresh the page. <em>THAT fact alone is what defined WEB 1.0.</em></p>
<p>Enter AJAX.</p>
<p>Now web 2.0. We no longer have unique URLs for every piece of content. More importantly, we don&#8217;t have page refreshes upon every request. The same URL/page might look one way when you first visit it and then, let&#8217;s just say (for the sake of a timely example), you submit a poll. If the submission of that poll does not navigate you away from the page to a new URL, then with page level caching you will never see the results of the poll, because your submission will only return the exact same cached page that you first saw, since the URL is the same. THEREFORE you will see the UNSUBMITTED poll once again.</p>
<p>So then how do we deal with caching AND AJAX?</p>
<p>We do so in one of two ways.</p>
<p>1) By maintaining STATE or at least some subset of the state on the client side (within the browser&#8217;s cookies).</p>
<p>OR</p>
<p>2) By caching only small parts of a page that don&#8217;t change and allowing the other parts of a page to remain UNCACHED. This is commonly called &#8220;partial caching&#8221;.</p>
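<p>To give a feel for option 2, here is a very small Python sketch of partial caching. All names are hypothetical and the &#8220;rendering&#8221; is just string building: the header is rendered once and reused, while the poll is rendered fresh on every request.</p>

```python
fragment_cache = {}
header_renders = 0


def render_header():
    """Stand-in for an expensive part of the page that never changes."""
    global header_renders
    header_renders += 1
    return "[header]"


def render_poll(user_has_voted):
    """The part that depends on the visitor, so it is never cached."""
    return "[poll results]" if user_has_voted else "[poll form]"


def render_page(user_has_voted):
    if "header" not in fragment_cache:
        fragment_cache["header"] = render_header()
    return fragment_cache["header"] + render_poll(user_has_voted)


print(render_page(False))
print(render_page(True))
print(header_renders)
```

<p>The header is only built once, yet each visitor still sees the correct state of the poll.</p>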
<p>So why not use partial caching on my project?</p>
<p>Partial caching is entirely application dependent. It has to be architected into the application from the start. You unfortunately cannot &#8220;just add&#8221; partial caching because someone has to take the time to determine which portions of an application CAN be cached and which portions CANNOT. Most times off-the-shelf web applications cannot have partial caching added on top of them. Most traditional CMS applications and legacy web applications are not Web 2.0 &#8220;savvy&#8221; enough.</p>
<p>Drupal falls into a gray area. Some modules are smart enough. Others are not. In general you cannot use partial caching with drupal without some legwork.</p>
<p>SO we use the page level caching described above. Now what?</p>
<p>Well now we have to take into account the functionality implications of a page looking the same way no matter what every time you visit it. Sounds like it doesn&#8217;t fix anything, and it doesn&#8217;t, without some tricks.</p>
<p>To start, we can tell the cache to NEVER respond to a POST request with cached content. That means that every time someone submits a poll, they are shown the results: the results are generated each time by the application server and the unique response is sent back to the browser.</p>
<p>Why does the poll still NOT work on the site then?</p>
<p>The long and the short of it:</p>
<p>It does. BUT unfortunately the poll is one of those modules that was NOT written to be used with page level caching.</p>
<p>What&#8217;s happening is this:</p>
<p>The first time you visit the site (or at least before ever taking the poll) you get there by typing www.yourpage.com into your browser. That is a GET request and the homepage is retrieved from the cache. You see the poll and you ARE allowed to vote and you DO see your results. Your request is sent via a POST to the application servers and your response is returned by the application servers&#8230; NOT the cache because it was a POST request.</p>
<p>Then you navigate away from that page and everything is fine. Then let&#8217;s say you come back to the home page by clicking the logo in the top left. That is a GET request. The homepage is returned FROM THE CACHE this time and you do NOT see your results in the poll. You see a poll that has not been taken yet. Because that is how the homepage was cached originally.</p>
<p>SO then you try to take the poll again and this time it doesn&#8217;t let you submit. Why? Because the poll module in Drupal was built to only allow ONE vote per person per poll. That means that you CAN&#8217;T submit the poll more than once.</p>
<p>So since you have already submitted and since the poll is cached as unsubmitted you are stuck with an unsubmittable poll.</p>
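<p>The whole sequence can be replayed in a few lines of Python. This is only a toy model with made-up names, not Drupal code: a page cache keyed by URL that answers GET requests, POST requests that always go through to the application, and a poll that allows one vote per person.</p>

```python
page_cache = {}  # URL -> cached page, shared by all visitors
votes = set()    # users who have voted (one vote per person)


def render_home(user):
    return "results" if user in votes else "unsubmitted poll"


def handle(method, url, user):
    if method == "POST":
        # POST requests always bypass the cache and reach the application.
        if user in votes:
            return "vote rejected"
        votes.add(user)
        return render_home(user)
    # GET requests are answered from the page cache when possible.
    if url not in page_cache:
        page_cache[url] = render_home(user)
    return page_cache[url]


steps = [
    handle("GET", "/", "alice"),   # first visit: page cached as unsubmitted
    handle("POST", "/", "alice"),  # the vote goes to the application
    handle("GET", "/", "alice"),   # back on the homepage: stale cached copy
    handle("POST", "/", "alice"),  # the second attempt is refused
]
print(steps)
```

<p>The third step is the confusing one: the visitor has voted, but the cached homepage still shows the poll as it looked before anyone voted.</p>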
<p>DARN! How do we fix this?</p>
<p>The easy way: Make the polls allow more than one vote per person per poll. Really why is that so bad?</p>
<p>The hard way: Rewrite the poll module correctly to refresh itself via another Ajax call for an un-cached version of the results. This means essentially writing a new poll module.</p>
<p>I hope that explains why the polls are not working. Maybe you picked up a thing or two along the way.</p>
<p>&lt;/snip&gt;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jessesanford.com/2009/12/17/why-drupal-polls-dont-play-well-with-full-page-caching-and-basicly-how-a-3-tier-architecture-works/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More research on Youtube racism with Hadoop and Nutch</title>
		<link>http://www.jessesanford.com/2009/11/29/more-research-on-youtube-racism-with-hadoop-and-nutch/</link>
		<comments>http://www.jessesanford.com/2009/11/29/more-research-on-youtube-racism-with-hadoop-and-nutch/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 18:16:36 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jessesanford.com/?p=37</guid>
		<description><![CDATA[I will go into the details later but I am thoroughly convinced that youtube is one of the largest most racist places on the internet. You would be hard-pressed to find a more high profile site with the N-word used so frequently in a derogatory sense. Anyway more recently in parallel I have been looking [...]]]></description>
			<content:encoded><![CDATA[<p>I will go into the details later, but I am thoroughly convinced that YouTube is one of the largest, most racist places on the internet. You would be hard-pressed to find a more high-profile site with the N-word used so frequently in a derogatory sense. In parallel, I have recently been looking for some way of getting my hands on enough data to put Hadoop to the test. About a week ago the two ideas converged in my head and I had a vision of creating an application that would consume all of the comments on YouTube (or at least as many as it can swallow) and then, using a little linguistic regexp magic, figure out who is the MOST racist user on YouTube.</p>
<p>I started out by writing the basic mapper in Python in my previous post. It is only good for a single video, however. I knew I would need a spider and I was planning on writing my own (Python has some awesome DOM and XML parsing: Beautiful Soup, minidom, expat) but decided that for future use it might be better to make use of something based on Lucene. I have been using Solr quite a bit lately but was hoping for something a little more lightweight. Luckily I was able to remember the name of the spider that came out shortly after Lucene was made available. Nutch! It hasn&#8217;t had the glory that Solr has had in recent years (I don&#8217;t understand why these two projects exist and don&#8217;t converge?) but it turns out that it is now based on Hadoop by default! In fact, unbeknownst to me, it looks like Hadoop was actually spawned by Nutch?!</p>
<p>Anyway, I spent some time setting up Nutch on a Hadoop cluster today and I have to say it is still not the easiest thing in the world to work with. I am still new to Hadoop and was following this <a href="http://wiki.apache.org/nutch/NutchHadoopTutorial">Nutch tutorial</a> very closely. Unfortunately it glosses over setting up the slave nodes, so I plan to fill in the details here and offer them to the community.</p>
<p>One thing that it fails to mention, and that I should have known from my other Hadoop tutorials (never underestimate the minutiae!): HDFS configurations need to use FULLY qualified hostnames! You can&#8217;t just use your machine name, even though you may be able to ping each machine on a local subnet with just the hostname, and even though you can telnet from the slave machine to, say, an httpd instance running on the namenode. You WILL get a silent failure when trying to telnet to the namenode daemon&#8217;s port. Check your firewalls of course, but even if they are turned off on both your master and slave it is quite possible you will just see:</p>
<p>$telnet master 9000</p>
<p>telnet: connect to address 192.168.0.1: Connection refused</p>
<p>however if you try the same thing from the master machine itself it works!? (assuming your name node is currently running and supposedly listening on port 9000)</p>
<p>This of course threw me for a loop and I suspected everything from firewalls to bad interconnects between my machines. Strange that I never thought about the hostnames. I guess it was because all other services between them have been working fine. If anyone has any idea why HDFS has a problem with this I would love to know.</p>
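<p>For anyone who wants to sanity check their own setup, here is a small Python sketch of the two checks I ended up doing by hand. Both helper names are made up, the &#8220;fully qualified&#8221; test is only a rough heuristic (it looks for at least one dot-separated domain part), and the example hostnames are placeholders.</p>

```python
import socket


def is_fully_qualified(hostname):
    """Rough check: the name has at least two non-empty dot-separated parts."""
    parts = hostname.rstrip(".").split(".")
    return len(parts) >= 2 and all(parts)


def can_connect(host, port, timeout=3.0):
    """Scripted equivalent of `telnet host port`: True if the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(is_fully_qualified("master"))
print(is_fully_qualified("master.example.com"))
```

<p>On a slave node you could then call can_connect with the namenode&#8217;s hostname and port 9000 to see whether the daemon is reachable under that name.</p>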
]]></content:encoded>
			<wfw:commentRss>http://www.jessesanford.com/2009/11/29/more-research-on-youtube-racism-with-hadoop-and-nutch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Map-Reduce, Hadoop, Hadoop Streaming, Python and racism</title>
		<link>http://www.jessesanford.com/2009/11/27/map-reduce-hadoop-hadoop-streaming-python-and-racism/</link>
		<comments>http://www.jessesanford.com/2009/11/27/map-reduce-hadoop-hadoop-streaming-python-and-racism/#comments</comments>
		<pubDate>Fri, 27 Nov 2009 22:25:08 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jessesanford.com/?p=34</guid>
		<description><![CDATA[Here is my Python script that grabs the latest 1000 comments (the API only allows access to the latest 1000, unfortunately) and then checks them against a regexp that matches known racist words. Right now it is just looking for the N-word. This script will be one of the inner MAP tasks in [...]]]></description>
			<content:encoded><![CDATA[<p>Here is my Python script that grabs the latest 1000 comments (the API only allows access to the latest 1000, unfortunately) and then checks them against a regexp that matches known racist words. Right now it is just looking for the N-word. This script will be one of the inner MAP tasks in a series of Map-Reduce steps.</p>
<pre>#!/usr/bin/env python
# Hadoop Streaming mapper (Python 2, gdata client library).
# Reads one YouTube video id per line on stdin and prints one
# tab-separated "author, comment" record per matching comment.
import sys
import gdata.youtube
import gdata.youtube.service
import re

# re.DOTALL lets the pattern match when the word is not on the first line of a comment
racist_pattern = re.compile('.*igger.*', re.IGNORECASE | re.DOTALL)
#import pprint
#pp = pprint.PrettyPrinter(indent=4)
yt_service = gdata.youtube.service.YouTubeService()
#yt_service.developer_key = ""     #turns out the developer key isn't necessary
urlpattern = 'http://gdata.youtube.com/feeds/api/videos/%s/comments?start-index=%d&amp;max-results=50'

for line in sys.stdin:
    video_id = line.strip()
    index = 1
    url = urlpattern % (video_id, index)
    #print url
    while url:
        if index &lt; 20:
            comment_feed = yt_service.GetYouTubeVideoCommentFeed(uri=url)
            for comment in comment_feed.entry:
                text = comment.content.text or ''   # guard against empty comments
                if racist_pattern.match(text):
                    # print already appends the newline that ends the record
                    print '%s\t%s' % (comment.author[0].name.text, text)
            # GetNextLink() returns None on the last page, which ends the loop
            next_link = comment_feed.GetNextLink()
            url = next_link.href if next_link else None
            index += 1
        else:
            #currently the google youtube gdata api will not support over 1000 comments
            url = 'http://gdata.youtube.com/feeds/api/videos/%s/comments?start-index=951&amp;max-results=49' % video_id
            comment_feed = yt_service.GetYouTubeVideoCommentFeed(uri=url)
            for comment in comment_feed.entry:
                text = comment.content.text or ''
                if racist_pattern.match(text):
                    print '%s\t%s' % (comment.author[0].name.text, text)
            break
</pre>
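<p>The mapper emits tab-separated author and comment records, so a later step has to add them up per author. Here is a rough sketch of what such a reducer could look like in Python 3. The function names are hypothetical, and counting everything in memory is a simplification that ignores the fact that Hadoop Streaming hands a reducer its input sorted by key.</p>

```python
import sys
from collections import Counter


def count_by_author(lines):
    """Count matching comments per author from tab-separated mapper output."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines between records
        author, _, _comment = line.partition("\t")
        counts[author] += 1
    return counts


def run(in_stream, out_stream):
    """Write one tab-separated author and count per line, highest count first."""
    for author, total in count_by_author(in_stream).most_common():
        out_stream.write("%s\t%d\n" % (author, total))
```

<p>To use it as a streaming reducer you would end the file with <code>run(sys.stdin, sys.stdout)</code>.</p>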
]]></content:encoded>
			<wfw:commentRss>http://www.jessesanford.com/2009/11/27/map-reduce-hadoop-hadoop-streaming-python-and-racism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Varnish and Squid working together&#8230; What?!</title>
		<link>http://www.jessesanford.com/2009/11/12/varnish-and-squid-what/</link>
		<comments>http://www.jessesanford.com/2009/11/12/varnish-and-squid-what/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 23:43:35 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Real World]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Systems Engineering]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jessesanford.com/?p=23</guid>
		<description><![CDATA[I have been using Varnish for quite some time and have always wished that there was some way for Varnish to know to serve &#8220;Stale&#8221; pages when the upstream application servers are swamped. There is actually a feature request for this on the Varnish Trac system here. NOTE: this feature should not really be necessary [...]]]></description>
			<content:encoded><![CDATA[<p>I have been using Varnish for quite some time and have always wished that there was some way for Varnish to know to serve &#8220;Stale&#8221; pages when the upstream application servers are swamped. There is actually a feature request for this on the Varnish Trac system <a href="http://varnish.projects.linpro.no/ticket/369" target="_blank">here</a>. NOTE: this feature should not really be necessary unless you have overestimated the ability of your application servers to handle your traffic. However, even after proper capacity planning, sometimes you get, well&#8230; DUGG. We all know the &#8220;digg effect&#8221; (formerly referred to as the &#8220;slashdot effect&#8221;) and its repercussions (500, Guru meditation, Houston we have a problem!). There are many ways to skin a cat, but none would be as simple as this (considering we have an existing Varnish setup). I should note that simply getting &#8220;Dugg&#8221; or &#8220;Slashdotted&#8221; normally wouldn&#8217;t take down a site with a proper reverse proxy setup based on Varnish. If your TTL is appropriate and you are using an appropriate <a href="http://varnish.projects.linpro.no/wiki/VCLExampleGrace" target="_blank">GRACE</a> value (for you Squid readers: &#8220;<a href="http://www.igvita.com/2009/08/05/masking-latency-failures-with-squid/comment-page-1/" target="_blank">stale-while-revalidate</a>&#8221;) then you will probably not saturate your app servers. Unfortunately, if your content is good and your UI is right then maybe, just maybe, a certain percentage of your new readers will stick around. And here is where it gets scary for the app servers. Maybe, just maybe, your new readers will start to navigate in ways that your cache is not used to. Maybe they will start to hit those really OLD articles that haven&#8217;t been requested in months! If you plot your site&#8217;s content against its popularity, it will look something like this:</p>
<p><img class="alignnone size-medium wp-image-27" title="Cache Bell Curve" src="http://www.jessesanford.com/wp-content/uploads/2009/11/ttl_cache_bell_curve1-300x210.png" alt="Cache Bell Curve" width="300" height="210" /></p>
<p>No matter what you do there will always be something that falls into those &#8220;long tails&#8221;. If your traffic patterns shift suddenly, you can very well start to make a lot more requests to your upstream servers than you (or more importantly your reverse proxy) expected.</p>
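<p>To put some rough numbers on the long-tail point, here is a toy model in Python. It assumes a Zipf-like popularity curve and a cache that effectively holds only the 100 most popular of 1000 articles. Those figures, the 50% shift and the function names are all made up for illustration and are not measurements from a real site.</p>

```python
def zipf_shares(n_articles, exponent=1.0):
    """Share of requests per article, most popular first (Zipf-like, assumed)."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, n_articles + 1)]
    total = sum(raw)
    return [weight / total for weight in raw]


def hit_ratio(shares, cached_count):
    """Fraction of requests served from cache if the first cached_count articles are cached."""
    return sum(shares[:cached_count])


def blend(shares, uniform_fraction):
    """Mix the given shares with uniform browsing across every article."""
    flat = 1.0 / len(shares)
    return [(1.0 - uniform_fraction) * s + uniform_fraction * flat for s in shares]


normal = zipf_shares(1000)
shifted = blend(normal, 0.5)  # half of the traffic now wanders through the archive
normal_hits = hit_ratio(normal, 100)
shifted_hits = hit_ratio(shifted, 100)
print("hit ratio before: %.2f, after: %.2f" % (normal_hits, shifted_hits))
print("backend requests grow by a factor of %.1f" % ((1 - shifted_hits) / (1 - normal_hits)))
```

<p>In this toy model the share of requests that reach the application servers roughly doubles, even though the total number of visitors has not changed at all.</p>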
<p>Back to the task at hand. What can I do while I wait for the Varnish team to put this feature through? EASY&#8230; use Squid! There are so many debates over which reverse proxy is currently the fastest, which one is easier to set up or integrate with legacy apps, etc. <a href="http://varnish.projects.linpro.no/wiki/ArchitectNotes" target="_blank">I&#8217;m</a> <a href="http://deserialized.com/reverse-proxy-performance-varnish-vs-squid-part-1/" target="_blank">certainly</a> <a href="http://deserialized.com/reverse-proxy-performance-varnish-vs-squid-part-2/" target="_blank">NOT</a> <a href="http://dotimes.com/iscale/2008/04/benchmark-caching-of-varnish-and-squid-again.html" target="_blank">trying</a> <a href="http://wfelipe.wordpress.com/2009/08/13/squid-vs-varnish/" target="_blank">to</a> <a href="http://www.kitchensoap.com/2008/06/24/varnish-and-squid-again/" target="_blank">get</a> <a href="http://t-a-w.blogspot.com/2007/04/varnish-vs-squid-assembly-still-matters.html" target="_blank">into</a> <a href="http://seankelly.tv/blog/blogentry.2007-03-02.4768602564" target="_blank">that</a>! In fact I will skirt the issue entirely by saying this: when the features are right and you can afford to use it, then why not? NOW don&#8217;t get me wrong. Afford can mean a lot of things. Take it as you will. I personally HATE using software, ANY software, when I don&#8217;t have to. In fact I try to design my stacks to be as small as possible. As a general rule LESS SOFTWARE IS BETTER! It means less maintenance, less quality assurance&#8230; less hassle! However, there are situations like the one I described above when you are put between a rock and a hard place. I can either:</p>
<p>A) Swap Varnish out completely and start using squid.</p>
<p>B) Augment my http acceleration layer with squid.</p>
<p>C) Buy more application servers and avoid the issue.</p>
<p>I wish, I wish, I wish C was always an option. Unfortunately not all clients can afford to simply throw more money at the problem. If I had my choice I would scale horizontally off to the&#8230;horizon. SO I now get to choose between A and B. A is what my Sysadmin gut feeling (about never using more software than necessary) is telling me to do. BUT A also has the Test Engineer in me screaming &#8220;You will have to test everything all over again!&#8221;</p>
<p>Sooo here is another instance where the REAL WORLD comes crashing down on good systems engineering. B is the cheapest, most cost-effective solution. It could be said that maintaining another piece of software over time is going to be more costly than the upfront cost of swapping out Varnish entirely. But consider this&#8230;the Varnish feature that I was mentioning earlier&#8230; has already been assigned. It is only a matter of time before someone decides to pick it up and implement it. Hell, I might even go ahead and do it if I can find the time. (BTW if you&#8217;re reading this months past the publish date of this post then you should definitely check that Trac ticket and see what has become of it.)</p>
<p>B it is. Now I am going to have to dust off my Squid skills and install that beast again. (Of course I couldn&#8217;t get through an article about Varnish and Squid without some opinion&#8230;. Setting up Squid is not the easiest thing in the world!)</p>
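<p>For reference, the behavior I keep wishing for (serve a stale page when the backend cannot answer) can be modeled in a few lines of Python. This is a toy sketch and not how Varnish or Squid implement it; <code>GraceCache</code>, <code>BackendError</code>, <code>ttl</code> and <code>grace</code> are names I made up for the example.</p>

```python
import time


class BackendError(Exception):
    """Raised by the fetch callable when the upstream server cannot answer."""


class GraceCache:
    """Serve fresh entries from cache; fall back to stale ones if the backend fails."""

    def __init__(self, fetch, ttl, grace, clock=time.monotonic):
        self._fetch = fetch      # callable taking a url and returning a body
        self._ttl = ttl          # seconds an entry counts as fresh
        self._grace = grace      # extra seconds a stale entry may still be served
        self._clock = clock
        self._store = {}         # url -> (body, stored_at)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[1] <= self._ttl:
            return entry[0]      # fresh hit, backend untouched
        try:
            body = self._fetch(url)
        except BackendError:
            if entry is not None and now - entry[1] <= self._ttl + self._grace:
                return entry[0]  # stale, but still inside the grace window
            raise                # nothing usable in cache, pass the failure on
        self._store[url] = (body, now)
        return body
```

<p>The point of the design is that a swamped backend only hurts requests for pages that have never been cached or that have aged past the grace window. Everything else keeps getting an answer, just a slightly old one.</p>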
]]></content:encoded>
			<wfw:commentRss>http://www.jessesanford.com/2009/11/12/varnish-and-squid-what/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When good code goes BAD or The Unfortunate Misuse of Good Software Development Principles</title>
		<link>http://www.jessesanford.com/2009/11/09/when-good-code-goes-bad-or-the-unfortunate-misuse-of-good-software-development-principles/</link>
		<comments>http://www.jessesanford.com/2009/11/09/when-good-code-goes-bad-or-the-unfortunate-misuse-of-good-software-development-principles/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 06:29:31 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Real World]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[acceptance testing]]></category>
		<category><![CDATA[ActiveRecord]]></category>
		<category><![CDATA[automated testing]]></category>
		<category><![CDATA[code reuse]]></category>
		<category><![CDATA[code review]]></category>
		<category><![CDATA[Martin Fowler]]></category>
		<category><![CDATA[oop]]></category>
		<category><![CDATA[ORM]]></category>
		<category><![CDATA[overloaded methods]]></category>
		<category><![CDATA[pair programming]]></category>
		<category><![CDATA[peer review]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[regression testing]]></category>
		<category><![CDATA[single responsibility principle]]></category>
		<category><![CDATA[software development]]></category>
		<category><![CDATA[unit testing]]></category>

		<guid isPermaLink="false">http://www.jessesanford.com/?p=8</guid>
		<description><![CDATA[I had a bit of a quandary today as I reviewed some code, and I decided to write up this little post on the experience. In essence it came down to choosing between code clarity and code reuse. I am a huge fan of both and they almost never conflict; in fact they almost always exist in a symbiotic state. But not this time.]]></description>
			<content:encoded><![CDATA[<p>I had a bit of a quandary today as I reviewed some code, and I decided to write up this little post on the experience. In essence it came down to choosing between <a href="http://www.fulltablescan.com/index.php?/archives/140-Toms-First-Law-Code-Readability-is-the-Key-to-Long-term-Software-Quality.html">code</a> <a href="http://haacked.com/archive/2007/04/20/write-readable-code-by-making-its-intentions-clear.aspx">readability</a> and <a href="http://en.wikipedia.org/wiki/Code_reuse">code</a> <a href="http://devlicio.us/blogs/tim_barcz/archive/2009/03/09/real-life-code-reuse.aspx">reuse</a>. I am a huge fan of both and they almost never conflict; in fact they almost always exist in a symbiotic state. But not this time.</p>
<p>The title of this post has been buzzing around in my head recently as I have been doing a lot of performance tuning on a very large and HIGHLY trafficked web site of a major print publication. Anyway, I was recently reading (reference coming soon! until then dig through my delicious links) a blog post on the topic of performance tuning and why it is silly how often people micro-tune. Think of that in the same negative respect that you think of micro-managing. It&#8217;s a waste of time and resources. The moral: even though performance tuning is normally a good thing, after a while it loses its value. You spend your money on the low-hanging fruit (AHEM more hardware!)</p>
<p>Back to today. The code that I was reviewing had a few very cryptic stanzas that included a call to a function that consumed a single integer parameter which, after thorough inspection, simply ended up determining whether the query it finally triggers against the database is ascending or descending. Now I honestly doubt this developer (no matter how junior) actually thought they would be increasing the performance of the application by including this integer parameter. In fact I am sure that they assumed that they were doing a good thing by reusing code (which I am a HUGE proponent of normally!) but when it came time for another developer to get into the code to augment it slightly, it took exponentially more time than it would have if the original developer had simply a) created the database queries inline or b) created two separate, nearly identical functions. Instead (fyi we are using an <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a> that follows the <a href="http://en.wikipedia.org/wiki/Active_record_pattern">ActiveRecord</a> <a href="http://martinfowler.com/eaaCatalog/activeRecord.html">Pattern</a>) the object has one single method <code>getNextArticle(int foo)</code> that returns both the previous AND the next article! How confusing! Anyway, the best solution that required the least new untested code at this point was to wrap that function with two new functions that map to the integer values. Now we have:</p>
<p><code><br />
/*Note that the following method has a different signature than the original and thus can coexist with it thanks to the <a href="http://en.wikipedia.org/wiki/Method_overloading">method overloading</a> feature of Java. The Article return type is assumed here for illustration.*/<br />
public Article getNextArticle(){<br />
 return this.getNextArticle(1);<br />
}<br />
</code></p>
<p><code><br />
public Article getPreviousArticle(){<br />
  return this.getNextArticle(2);<br />
}<br />
</code></p>
<p>I can already hear someone out there saying: &#8220;That&#8217;s silly why not re-factor the whole object so that it has:</p>
<p><code>getNextArticle()</code> </p>
<p><code>getPreviousArticle()</code></p>
<p>wrapping some new function that is called with a String parameter like ASC or DESC.&#8221; Well, to put it simply: time. Regression testing takes time and we would have to do a heck of a lot more of it on this project, which unfortunately does not yet have 100% unit test coverage (to my DISDAIN) and does not have an automated testing setup. So the moral of the story is this: even though code reuse is almost always a GOOD sign of proper software development practices, sometimes it can lead to poor readability, which then becomes more of a maintenance problem than the code reuse actually solves. In retrospect it would have been great to have found this code before it went through thorough user acceptance testing. However, due to the accelerated nature of meeting client demands and working on unrealistic project schedules, we were not able to do enough peer <a href="http://en.wikipedia.org/wiki/Code_review">code reviews</a>, nor were we able to create a team large enough to do <a href="http://en.wikipedia.org/wiki/Pair_programming">pair programming</a>.</p>
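<p>For what it is worth, the fuller refactoring that I did not have time for might look roughly like the sketch below. It is written in Python rather than Java to keep it short, and everything in it is hypothetical: the <code>SortOrder</code> enum, the in-memory list of ids standing in for the database table, and the method names.</p>

```python
from enum import Enum


class SortOrder(Enum):
    ASC = "ASC"
    DESC = "DESC"


class Article:
    def __init__(self, article_id, all_ids):
        self.article_id = article_id
        self._all_ids = all_ids  # stand-in for the articles table

    def _adjacent(self, order):
        """Single shared lookup; the direction is named instead of being a magic integer."""
        if order is SortOrder.ASC:
            candidates = [i for i in self._all_ids if i > self.article_id]
            return min(candidates) if candidates else None
        candidates = [i for i in self._all_ids if i < self.article_id]
        return max(candidates) if candidates else None

    def get_next_article(self):
        return self._adjacent(SortOrder.ASC)

    def get_previous_article(self):
        return self._adjacent(SortOrder.DESC)
```

<p>The lookup logic still exists only once, so the reuse is kept, but a caller only ever sees two methods whose names say what they do.</p>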
<p>AHH how the <em>real world</em> always ruins principles, techniques, patterns and paradigms that work so well on paper!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jessesanford.com/2009/11/09/when-good-code-goes-bad-or-the-unfortunate-misuse-of-good-software-development-principles/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First Post</title>
		<link>http://www.jessesanford.com/2009/11/08/first-post/</link>
		<comments>http://www.jessesanford.com/2009/11/08/first-post/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 05:06:37 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jessesanford.com/?p=3</guid>
		<description><![CDATA[I am planning a few articles on my work specializing in highly durable, highly available, horizontally scalable open source architectures pertaining not only to the web but to any massively concurrent application. Any questions, comments or topic suggestions are gladly accepted.]]></description>
			<content:encoded><![CDATA[<p>After nearly a decade of &#8220;the cobbler&#8217;s children have no shoes,&#8221; I have decided to pick this blog back up again and write about my experiences as a software architect. I am planning a few articles on my work specializing in highly durable, highly available, horizontally scalable open source architectures pertaining not only to the web but to any massively concurrent application. Any questions, comments or topic suggestions are gladly accepted.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jessesanford.com/2009/11/08/first-post/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
