Had to talk through another php mod_fcgid setup today. It is amazing to me how difficult it is to find clear information on this setup out there on the internet.
If any of the below are wrong please tell me. I am not always right but I have studied this setup quite a bit and have come across many permutations and the below finally make sense.
The most MEMORY efficient way to use PHP with Apache is via the WORKER MPM with a FastCGI module spawning PHP-CGI processes. I specify MEMORY here because I am pretty sure that CPU-wise mod_php is still faster. I can only assume that is because all of the data structures etc. that PHP uses are directly accessible to the Apache processes servicing the request. Faster still is probably compiling mod_php directly into Apache rather than using the shared module.
The most MEMORY efficient way of using PHP-CGI is to use as FEW parent processes as possible and as many child processes as possible under each parent. This is especially true when using an opcode cache like APC, because the children of one PHP parent can share a single cache. There is a known bug with mod_fcgid here: it is unable to share memory between the FastCGI processes it spawns. mod_fastcgi on the other hand does not have this bug. Why not use the mod_fastcgi module instead? Because unfortunately it is a dead project (nothing since 2003): http://freshmeat.net/projects/mod_fastcgi/
Thus APC is not very effective when using mod_fcgid. It just can’t share its cache across subsequent requests for the same PHP code if those requests get routed to separate php-cgi processes. The mod_fcgid documentation puts it this way:
“PHP child process management (PHP_FCGI_CHILDREN) should always be disabled with mod_fcgid, which will only route one request at a time to application processes it has spawned; thus, any child processes created by PHP will not be used effectively. (Additionally, the PHP child processes may not be terminated properly.) By default, and with the environment variable setting PHP_FCGI_CHILDREN=0, PHP child process management is disabled.
The popular APC opcode cache for PHP cannot share a cache between PHP FastCGI processes unless PHP manages the child processes. Thus, the effectiveness of the cache is limited with mod_fcgid; concurrent PHP requests will use different opcode caches.”
Anyway, I used the following config on a relatively low traffic 4 GB virtual instance with a VERY memory intensive PHP app.
<IfModule mod_fcgid.c>
AddHandler fcgid-script .fcgi
</IfModule>
## Sane place to put sockets and shared memory file
#FcgidIPCDir run/mod_fcgid #to use sockets on unix or pipes on windows
#FcgidProcessTableFile run/mod_fcgid/fcgid_shm #to use shared memory on unix
<IfModule mod_fcgid.c>
## Apache does not allow a comment on the same line as a directive, so each comment sits on the line above its directive.
## PHPRC is the directory php-cgi searches for php.ini. It is also set in the wrapper script per virtual host; both ways pass the variable on to the cgi instances.
## FcgidInitialEnv PHPRC "/etc"
FcgidInitialEnv PHPRC /etc/php5/cgi
#"By default, PHP FastCGI processes exit after handling 500 requests, and they may exit after this module has already connected to the application and sent the next request. When that occurs, an error will be logged and 500 Internal Server Error will be returned to the client. This PHP behavior can be disabled by setting PHP_FCGI_MAX_REQUESTS to 0, but that can be a problem if the PHP application leaks resources. Alternatively, PHP_FCGI_MAX_REQUESTS can be set to a much higher value than the default to reduce the frequency of this problem. FcgidMaxRequestsPerProcess can be set to a value less than or equal to PHP_FCGI_MAX_REQUESTS to resolve the problem."
## Same limit as FcgidMaxRequestsPerProcess below, just as an environment variable. Also set in the wrapper script. Due diligence.
FcgidInitialEnv PHP_FCGI_MAX_REQUESTS 5000
## Retires a process after this many requests to take care of memory leaks. Must be less than or equal to the PHP_FCGI_MAX_REQUESTS value sent to the cgi process.
FcgidMaxRequestsPerProcess 5000
## Allows a process to sit idle for 5 mins
FcgidIdleTimeout 300
## How often (in seconds) mod_fcgid scans for idle processes
FcgidIdleScanInterval 120
## Allows php to spin for 50 mins before throwing a 500 error
FcgidBusyTimeout 3000
## How often mod_fcgid scans for busy processes
FcgidBusyScanInterval 120
## How often mod_fcgid scans for processes in an error state
FcgidErrorScanInterval 60
## How often to scan for zombie processes
FcgidZombieScanInterval 60
## Max time a process is kept alive (2 hours)
FcgidProcessLifeTime 7200
## Max php-cgi processes in TOTAL for the whole server
FcgidMaxProcesses 15
## Max processes per process class
FcgidMaxProcessesPerClass 15
## Min processes per process class to leave lying around
FcgidMinProcessesPerClass 15
## Max time to wait while connecting to a cgi process
FcgidIPCConnectTimeout 3600
## Max time to wait while reading from or writing to a cgi process
FcgidIPCCommTimeout 3600
## Output buffer between mod_fcgid and the cgi process, in bytes. Flushed to the client when full.
FcgidOutputBufferSize 128
</IfModule>
<VirtualHost *:80>
<IfModule mod_fcgid.c>
## Often these are the www or apache user and group
SuexecUserGroup some_unprivileged_user_with_rights_to_webroot some_unprivileged_group_with_rights_to_webroot
## "This directive enables special SCRIPT_NAME processing which allows PHP to provide additional path information. The setting of FcgidFixPathinfo should mirror the cgi.fix_pathinfo setting in php.ini."
FcgidFixPathinfo 1
<Directory "/var/www/html">
Options +ExecCGI
AllowOverride All
AddHandler fcgid-script .php
## Location of your suexeced wrapper shell script
FcgidWrapper /var/www/php-fcgi-scripts/php-fcgi-starter .php
Order allow,deny
Allow from all
</Directory>
</IfModule>
</VirtualHost>
#!/bin/sh
# Directory php-cgi searches for php.ini. Also set in the httpd.conf; I think this one just stomps the one set there.
PHPRC=/etc/php5/cgi/
export PHPRC
# Max number of requests per php-cgi process (see the mod_fcgid documentation quote about PHP_FCGI_MAX_REQUESTS in the httpd.conf above). Also set in the httpd.conf so I think these are redundant.
export PHP_FCGI_MAX_REQUESTS=5000
# 0 disables PHP's own child process management, as the mod_fcgid documentation quoted earlier recommends. PHP then forks no children of its own; mod_fcgid spawns and manages every php-cgi process (up to 15 in this example).
export PHP_FCGI_CHILDREN=0
exec /usr/bin/php-cgi #the php-cgi binary
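The rule quoted from the mod_fcgid documentation (FcgidMaxRequestsPerProcess less than or equal to PHP_FCGI_MAX_REQUESTS) is easy to get wrong when the two values live in different files. Here is a minimal sketch of that check in Python. The function name is made up for illustration, and treating a non-positive FcgidMaxRequestsPerProcess as "no limit" is my assumption, so double check it against the documentation for your version.

```python
def request_limits_consistent(fcgid_max_requests, php_max_requests):
    """Return True if mod_fcgid retires a process before PHP exits on its own.

    fcgid_max_requests: value of FcgidMaxRequestsPerProcess
    php_max_requests:   value of PHP_FCGI_MAX_REQUESTS (0 disables PHP's limit)
    """
    if php_max_requests == 0:
        # PHP never exits on its own, so any mod_fcgid limit is safe.
        return True
    if fcgid_max_requests <= 0:
        # Assumption: a non-positive value means mod_fcgid applies no limit,
        # so PHP could exit while mod_fcgid still routes requests to it.
        return False
    return fcgid_max_requests <= php_max_requests


# The values used in the config above.
print(request_limits_consistent(5000, 5000))
```

With the values from this post both limits are 5000, so the check passes. Raising only the mod_fcgid side would bring back the intermittent 500 errors the documentation describes.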
http://www.linode.com/forums/viewtopic.php?t=2982&postdays=0&postorder=asc&start=15
http://www.magentocommerce.com/boards/viewthread/29264/
http://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html
http://2bits.com/articles/apache-fcgid-acceptable-performance-and-better-resource-utilization.html
When using Varnish/Squid/nginx or any reverse proxy + cache / HTTP accelerator on top of a modern web framework / web application that has its own caching layer, you may run into some strange, seemingly uncontrollable TTLs on your content. I have come to rely heavily on Varnish in the last year and Squid before that. If your reverse proxy is set up defensively (and in my opinion it should be) then you probably tend to somewhat ignore the HTTP headers coming out of the application layer and put an additional TTL on top of all of the (non-dynamic) content.
A for instance: there is a certain editorial site I maintain where I cache everything in Varnish for 30 mins… period. This means that once an editor pushes the publish button and the content is picked up into the cache it won’t change for half an hour. OR WILL IT? Turns out that since the application layer also has a cache lifetime of something like 5 mins, the object in Varnish may not refresh until well past that half hour. How? Well that’s where it gets difficult to explain. It all depends on when the change gets published with respect to the application server’s cache expiry on the object AND Varnish’s cache expiry on the object AND the time of the next request for the object after Varnish’s cache expiry has happened. HAH! Before we go any further a picture is needed.
As you can see from the graph there are two sawtooth-looking lines representing the two types of caches in use on this site. I have made the assumption that an editor publishes a change to an object (say the homepage of a site) every half hour. Really they may publish 25 times within that half hour but let’s just keep it simple.
So in the best case an editor publishes a change. That publish event happens RIGHT BEFORE the cache in Varnish AND the cache in the application server expire for that object, and the user request comes in RIGHT AFTER both have expired, so the user request flows all the way upstream to the application server, which re-renders the page with the new content, caches it and sends it back to Varnish, which also caches it and then sends it to the client. That is the BEST POSSIBLE case. That RARELY happens.
What normally happens is that a user request comes in, hits the Varnish cache and is served a moderately old object. I say moderately because on average the object is somewhere near the middle of its TTL. IF the Varnish TTL has expired on an object, what is next most likely is that the app server is somewhere in the middle of the TTL for the object within ITS cache, so the app server will just serve the cached response and Varnish will do what it knows best and CACHE THE OBJECT FOR ANOTHER CACHE CYCLE. That means ANOTHER 30 mins in this scenario. Hmmm, editors are not too happy about this situation.
Thing is, it gets worse. What CAN happen is that the editor publishes a change immediately AFTER the app server rolls over its cache for the object, so the app server has just re-cached the old content and will not see the editor’s change until that cycle ends. The re-cached old content is then served to Varnish, which could (as in the last example) roll over its own cache RIGHT at the end of the app server’s cache cycle for the object BUT BEFORE the app server’s cache actually expires. As explained in the last example, the app server serves Varnish the old object, Varnish re-caches the old object, and that old object then sits in the Varnish cache for the extra cache cycle. SO in the worst case the editor’s story doesn’t get picked up for what is essentially the sum of ONE FULL CACHE CYCLE of EACH cache: the app server’s TTL plus Varnish’s TTL.
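The best and worst cases above can be checked with a little arithmetic. This is a minimal sketch in Python under a simplifying assumption that is mine, not something measured on the site: each cache refreshes on a fixed cycle the instant its TTL expires (steady traffic), and the two cycles can be offset from each other by any whole number of minutes. The function and parameter names are made up for illustration.

```python
def next_refresh(t, phase, ttl):
    """Earliest refresh time >= t for a cache that refreshes at phase + k * ttl."""
    cycles = -(-(t - phase) // ttl)  # integer ceiling of (t - phase) / ttl
    return phase + cycles * ttl


def staleness(publish_at, app_phase, app_ttl, varnish_phase, varnish_ttl):
    """Minutes between the publish and the first fresh copy landing in Varnish."""
    # The app cache only sees the change at its first refresh after the publish.
    app_fresh_at = next_refresh(publish_at, app_phase, app_ttl)
    # Varnish only sees it at its first refresh after the app cache is fresh.
    varnish_fresh_at = next_refresh(app_fresh_at, varnish_phase, varnish_ttl)
    return varnish_fresh_at - publish_at


def worst_case(app_ttl, varnish_ttl):
    """Brute-force every whole-minute alignment of the two cache cycles."""
    return max(
        staleness(0, app_phase, app_ttl, varnish_phase, varnish_ttl)
        for app_phase in range(app_ttl)
        for varnish_phase in range(varnish_ttl)
    )


print(staleness(0, 0, 5, 0, 30))  # best case: both caches roll over at the publish
print(worst_case(5, 30))          # worst alignment of the two cycles
```

With a 5 minute app cache and a 30 minute Varnish cache this prints 0 for the best case and 33 for the worst. On a whole-minute grid that is just under the 35 minutes you get from adding the two TTLs together.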
All of this is VERY hard to explain to laypeople. Hence the picture above with the perty colorful stars.
I hope this helps someone out there. I know I will use it over and over again.
P.S. I do realize you can always just get rid of the application server’s cache. Varnish is performing the exact same function so why bother? Fact is I simplified this example to HTML buffer caching in both layers so that it would be easier to understand. What is really going on is that the application layer is not caching the HTML buffer. It is actually caching the results of the database queries in memcache for a period of time. In addition there are “partials” strewn throughout the app that are cached independently of context within the app, which gives finer grained control over areas that change less frequently than Varnish allows for. BTW I do realize Varnish supports ESI, allowing certain of these partials to be re-calculated at the app server even when Varnish serves the rest of the page from its cache. I hope to utilize ESI in the future but that will require an app re-write. ALSO I would love to have the application server send cache invalidation requests upstream to Varnish when its own cache cycles roll over on objects. Unfortunately the off-the-shelf app we are using is not that smart and again it would require a re-write to get that in there.
I have been using Varnish for quite some time and have always wished that there was some way for Varnish to know to serve “stale” pages when the upstream application servers are swamped. There is actually a feature request for this on the Varnish Trac system here. NOTE: this feature should not really be necessary unless you have overestimated the ability of your application servers to handle your traffic. However, even after proper capacity planning sometimes you get, well… DUGG. We all know the “digg effect” (formerly referred to as the “slashdot effect”) and its repercussions (500, Guru Meditation, Houston we have a problem!). There are many ways to skin a cat, but none would be as simple as this (considering we have an existing Varnish setup).
I should note that simply getting “Dugg” or “Slashdotted” normally wouldn’t take down a site with a proper reverse proxy setup based on Varnish. If your TTL is appropriate and you are using an appropriate GRACE value (for you Squid readers: “stale-while-revalidate”) then you will probably not saturate your app servers. Unfortunately if your content is good and your UI is right then maybe, just maybe, a certain percentage of your new readers will stick around. And here is where it gets scary for the app servers. Maybe, just maybe, your new readers will start to navigate in ways that your cache is not used to. Maybe they will start to hit those really OLD articles that haven’t been requested in months! If you think about your site’s content vs its popularity it will look something like this:

No matter what you do there will always be something that falls into those “long tails”. If your traffic patterns shift suddenly you can very well start to make a lot more requests to your upstream servers than you (or more importantly your reverse proxy) expected.
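To put rough numbers on the long tail, here is a minimal sketch in Python. It assumes article popularity follows a Zipf-like curve (request rate proportional to 1 / rank ** exponent) and that the cache only ever holds the most popular articles. The curve, the article counts and the function name are all assumptions for illustration, not measurements from my site.

```python
def hit_ratio(num_articles, cached_top_n, exponent=1.0):
    """Fraction of requests served from cache if only the top N articles are cached."""
    # Assumed popularity curve: the article at a given rank gets 1 / rank ** exponent.
    weights = [1 / rank ** exponent for rank in range(1, num_articles + 1)]
    return sum(weights[:cached_top_n]) / sum(weights)


# Usual traffic: steep curve, most requests land on the popular articles.
print(round(hit_ratio(1000, 100, exponent=1.0), 2))
# New readers digging through old articles: the curve flattens out.
print(round(hit_ratio(1000, 100, exponent=0.5), 2))
```

The flatter the curve, the smaller the share of requests that land on whatever the cache is holding, and every miss is one more request for the app servers.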
Back to the task at hand. What can I do while I wait for the Varnish team to put this feature through? EASY… use Squid! There are so many debates over which reverse proxy is currently the fastest, which one is easier to set up or integrate with legacy apps etc. I’m certainly NOT trying to get into that! In fact I will skirt the issue entirely by saying this: when the features are right and you can afford to use it then why not? NOW don’t get me wrong. Afford can mean a lot of things. Take it as you will. I personally HATE using software, ANY software, when I don’t have to. In fact I try to design my stacks as small as possible. As a general rule LESS SOFTWARE IS BETTER! It means less maintenance, less quality assurance… less hassle! However there are situations like the one I described above when you are put between a rock and a hard place. I can either:
A) Swap Varnish out completely and start using squid.
B) Augment my http acceleration layer with squid.
C) Buy more application servers and avoid the issue.
I wish, I wish, I wish C was always an option. Unfortunately not all clients can afford to simply throw more money at the problem. If I had my choice I would scale horizontally off to the… horizon. SO I now get to choose between A and B. A is what my Sysadmin gut feeling (about never using more software than necessary) is telling me to do. BUT A also has the Test Engineer in me screaming “You will have to test everything all over again!”
Sooo here is another instance where the REAL WORLD comes crashing down on good systems engineering. B is the cheapest, most cost effective solution. It could be said that maintaining another piece of software over time is going to be more costly than the upfront cost of swapping out Varnish entirely. But consider this… the Varnish feature that I was mentioning earlier… has already been assigned. It is only a matter of time before someone decides to pick it up and implement it. Hell, I might even go ahead and do it if I can find the time. (BTW if you’re reading this months past the publish date of this post then you should definitely check that Trac ticket and see what has become of it.)
B it is. Now I am going to have to dust off my Squid skills and install that beast again. (Of course I couldn’t get through an article about Varnish and Squid without some opinion… Setting up Squid is not the easiest thing in the world!)