Monday, September 26, 2011

Setup Caching and Proxy with Nginx in Centos/Fedora P2


user  nobody nobody;
worker_processes     4;
worker_rlimit_nofile 8192;
pid /var/run/nginx.pid;
events {
  worker_connections 2048;
}
 http {
    include       mime.types;
    default_type  application/octet-stream;
     log_format main ‘$remote_addr – $remote_user [$time_local] ‘
                    ‘”$request” $status  $body_bytes_sent “$http_referer” ‘
                    ‘”$http_user_agent” “$http_x_forwarded_for”‘;
    access_log  logs/nginx_access.log  main;
    error_log  logs/nginx_error.log debug;
    server_names_hash_bucket_size 64;
    sendfile on;
    tcp_nopush     on;
    tcp_nodelay    off;
    keepalive_timeout  30;
     gzip  on;
    gzip_comp_level 9;
    gzip_proxied any;
     proxy_buffering on;
    proxy_cache_path /usr/local/nginx/proxy levels=1:2 keys_zone=one:15m inactive=7d max_size=1000m;
    proxy_buffer_size 4k;
    proxy_buffers 100 8k;
    proxy_connect_timeout      60;
    proxy_send_timeout         60;
    proxy_read_timeout         60;
     include /usr/local/nginx/vhosts/*.conf;
}
Lets take a moment to review some of the more important options in nginx.conf before we move along…
user nobody nobody;
If you are running this on a server with an apache install or other software using the user ‘nobody’, it might be wise to create a user specifically for nginx (i.e: useradd nginx -d /usr/local/nginx -s /bin/false)
worker_processes 4;
This should reflect the number of CPU cores which you can find out by running ‘cat /proc/cpuinfo | grep processor‘ — I recommend a setting of at least 2 but no more than 6, nginx is VERY efficient.
proxy_cache_path /usr/local/nginx/proxy … inactive=7d max_size=1000m;
The ‘inactive’ option is the maximum age of content in the cache path and the ‘max_size’ is the maximum on disk size of the cache path. If you are serving up lots of object heavy content such as images, you are going to want to increase this.
proxy_send|read_timeout 60;
These timeout values are important, if you run any scripts through admin interfaces or other maintenance URL’s, these values will cause the proxy to time them out — that said increase them to sane values as appropriate, anything more than 300 is probably excessive and you should consider running such tasks from cronjobs.
Apache style MaxClients
Finally, maximum amount of connections, or MaxClients, that nginx can accept is determined by worker_processes * worker_connections/2 (2 fd per session) = 8192 MaxClients in our configuration.
Moving along we need to create two paths that we defined in our configuration, the first is the content caching folder and the second is where we will create our vhosts.
# mkdir /usr/local/nginx/proxy /usr/local/nginx/vhosts /usr/local/nginx/client_body_temp /usr/local/nginx/fastcgi_temp  /usr/local/nginx/proxy_temp
# chown    nobody.nobody    /usr/local/nginx/proxy    /usr/local/nginx/vhosts       /usr/local/nginx/client_body_temp /usr/local/nginx/fastcgi_temp   /usr/local/nginx/proxy_temp
Lets go ahead and get our initial vhosts file created, my template is available from here and should be saved to ‘/usr/local/nginx/vhosts/myforums.com.conf’, the contents of which are as follows:

Setup Caching and Proxy with Nginx in Centos/Fedora P3


server {
    listen 80;
    server_name myforums.com alias www.myforuns.com;
     access_log  logs/myforums.com_access.log  main;
    error_log  logs/myforums.com_error.log debug;
     location / {
        proxy_pass http://10.10.6.230;
        proxy_redirect     off;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
         proxy_cache               one;
        proxy_cache_key         backend$request_uri;
        proxy_cache_valid       200 301 302 20m;
        proxy_cache_valid       404 1m;
        proxy_cache_valid       any 15m;
        proxy_cache_use_stale   error timeout invalid_header updating;
    }
     location /admin {
        proxy_pass http://10.10.6.230;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    }
}
The obvious changes you want to make are ‘myforums.com’ to whatever domain you are serving, you can append multiple aliases to the server_name string such as ‘server_name domain.com alias www.domain.com alias sub.domain.com;‘. Now, lets take a look at some of the important options in the vhosts configuration:
listen 80;
This is the port which nginx will listen on for this vhost, by default unless you specify an IP address with it, you will bind port 80 on all local IP’s for nginx — you can limit this by setting the value as ‘listen 10.10.3.5:80;‘.
proxy_pass http://10.10.6.230;
Here we are telling nginx where to find our content aka the backend server, this should be an IP and it is also important to not forget setting the ‘proxy_set_header Host’ option so that the backend server knows what vhost to serve.
proxy_cache_valid
This allows us to define cache times based on HTTP status codes for our content, for 99% of traffic it will fall under the ’200 301 302 20m’ value. If you are running allot of dynamic content you may want to lower this from 20m to 10m or 5m, any lower defeats the purpose of caching. The ’404 1m’ value ensures that not found pages are not stored for long in case you are updating the site/have a temporary error but also prevent 404′s from choking up the backend server. Then the ‘any 15m’ value grabs all other content and caches it for 15m, again if you are running a very dynamic site you may want to lower this.
proxy_cache_use_stale
When the cache has stale content, that is content which has expired but not yet been updated, nginx can serve this content in the event errors are encountered. Here we are telling nginx to serve stale cache data if there is an error/timeout/invalid header talking to the backend servers or if another nginx worker process is busy updating the cache. This is really useful in the event your web server crashes, as to clients they will receive data from the cache.
location /admin
With this location statement we are telling nginx to take all requests to ‘http://myforums.com/admin’ and pass it off directly to our backend server with no further interaction — no caching.
That’s it! You can start nginx by running ‘/usr/local/nginx/sbin/nginx’, it should not generate any errors if you did everything right! To start nginx on boot you can append the command into ‘/etc/rc.local’. All you have to do now is point the respective domain DNS records to the IP of the server running nginx and it will start proxy-caching for you. If you wanted to run nginx on the same host as your Apache server you could set Apache to listen on port 8080 and then adjust the ‘proxy_pass’ options accordingly as ‘proxy_pass http://127.0.0.1:8080;’.
Extended Usage:
If you wanted to have nginx serve static content instead of Apache, since it is so horrible at it, we need to declare a new location option in our vhosts/*.conf file. We have two options here, we can either point nginx to a local path with our static content or have nginx cache our static content then retain it for longer periods of time — the later is far simpler.
Serve static content from a local path:
        location ~* ^.+.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$ {
            root   /home/myuser/public_html;
            expires 1d;
        }
In the above, we are telling nginx that our static content is located at ‘/home/myuser/public_html’, paths must be relative!! When a user requests ‘http://www.mydomain.com/img/flyingpigs.jpg’, nginx will look for it at ‘/home/myuser/public_html/img/flyingpigs.jpg’ . The expires option can have values in seconds, minutes, hours or days — if you have allot of dynamic images on your site then you might consider an option like 2h or 30m, anything lower defeats the purpose. Using this method has a slight performance benefit over the cache option below.
Serve static content from cache:
         location ~* ^.+.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$ {
             proxy_cache_valid 200 301 302 120m;
             expires 2d;
             proxy_pass http://10.10.6.230;
             proxy_cache one;
        }
With this setup we are telling nginx to cache our static content just like we did with the parent site itself, except that we are defining an extended time period for which the content is valid/cached. The time values are, content is valid for 2h (nginx updates cache) and every 2 days the content expires (client browsers cache expires and requests again). Using this method is simple and does not require copying static content to a dedicated nginx host.
We can also do load balancing very easily with nginx, this is done by setting an alias for a group of servers, we then define this alias in place of addresses in our ‘proxy_pass’ settings. In the ‘upstream’ option shown below, we want to list all of our web servers that load should be distributed across:
   upstream my_server_group {
    server 10.10.6.230:8000 weight=1;
    server 10.10.6.231:8000 weight=2 max_fails=3  fail_timeout=30s;
    server 10.10.6.15:8080 weight=2;
    server 10.10.6.17:8081
  }
This must be placed in the ‘http { }’ section of the ‘conf/nginx.conf’ file, then the server group can be used in any vhost. To do this we would replace ‘proxy_pass http://208.76.83.135;’ with ‘proxy_pass http://my_server_group;’. The requests will be distributed across the server group in a round-robin fashion with respect to the weighted values, if any. If a request to one of the servers fails, nginx will try the next server until it finds a working server. In the event no working servers can be found, nginx will fall back to stale cache data and ultimately an error if that’s not available.

Conclusion:

This has turned into a longer post than I had planned but oh well, I hope it proves to be useful. If you need any help on the configuration options, please check out http://wiki.nginx.org, it covers just about everything one could need.
Although I noted this nginx setup is deployed on a Xen guest (CentOS 5.4, 2GB RAM & 2 CPU cores), it proved to be so efficient, that these specs were overkill for it. You could easily run nginx on a 1GB guest with a single core, a recycled server or locally on the Apache server. I should also mention that I took apart the MySQL replication cluster and am now running with a single MySQL server without issue — down from 4.