Monday, November 17, 2008

Small surprise from Git 1.6

After installing Git 1.6.0.4 on my Leopard box I noticed that lots of git-* executables are finally gone in favor of the single git binary. I like it, as I never felt that the qmail or old-Git style of many small executables was right. By the way, if you're on a Mac, don't forget to turn on the bash_completion variant when installing git-core from MacPorts:

sudo port install git-core +doc +bash_completion

Also set up aliases (git st really saves your fingers), colors and personal data:

git config --global alias.st status
git config --global alias.ci commit
git config --global alias.co checkout
git config --global alias.br branch
git config --global color.ui auto

git config --global user.name "Your name"
git config --global user.email your-email@your-domain.com

Finally, some .bash_profile goodness to show the current branch in the bash prompt and add colors (the __git_ps1 function comes from the git completion script):

PS1='\[\033[01;32m\]\h\[\033[01;34m\] \w\[\033[31m\]$(__git_ps1 "(%s)") \[\033[01;34m\]$\[\033[00m\] '

Wednesday, November 12, 2008

Rails and Amazon EC2

While we were planning the launch of CookEatShare, the question of good Rails hosting was heavily discussed. We considered MediaTemple, Amazon EC2 and a couple of other solutions (hard to remember right now, possibly Slicehost). In the end we chose EC2.

Now that I have a lot of experience deploying to EC2, let's try to answer the question: is Amazon EC2 good for Rails applications?

OK, first let's see what EC2 is. The official definition is "a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers." But what does that mean for the application maintainer?

In practice, it's an AWS (Amazon Web Services) account with a couple of associated keys that allow you to launch and shut down instances. You can treat an instance as a physical box with given characteristics, with one nuance: if your instance fails (or you terminate it manually), all your data is lost. That is why EC2 is positioned and used more as a platform for streamlined tasks such as video processing or distributed calculations. It may not sound appropriate for a web application, but let's turn to the bright side.

As an advantage, that non-persistence teaches discipline. When you know that your data will be lost if (I would even say 'when', not 'if') something goes wrong, you'll definitely have backups. You'll also pay more attention to the deployment scheme, especially to make adding instances later easier (we haven't scaled to a second instance yet but can't wait to try!). And it's fun to have a Capistrano task that builds nginx from source and compiles the fair proxy balancer in!

To guard sensitive data (the database, user-uploaded content in case you don't store it on S3), the Amazon EBS service can be useful. It's just a block device that can be attached to an instance and mounted as a usual partition. It's persistent and pretty robust, but since we started using EBS our MySQL server has been behaving strangely from time to time. Whether this is an Amazon problem or some bug in our configuration, I still don't know. Anyway, I keep trying to fix it.

Instances are pretty stable: the previous one worked for almost a year (with one hang that was solved by rebooting) until I manually terminated it after moving to a new one (due to a deployment change and a distribution upgrade).

Disadvantages are not very numerous so far: the email rejection problem (see below), strange EBS behavior with MySQL (again, the real cause is still unknown), more effort required to organize the deployment, and not all Linux distributions being officially supported. That's all I can remember right now.

So, EC2 can be a decent platform for deploying Rails applications. Yet I wouldn't recommend it unless you back it up with an application restore scheme that lets you recover quickly in case of an instance failure.

Useful facts:

  • if you reboot an instance, the data won't be lost; only termination leads to data loss;
  • keeping configuration files (like app server settings, MySQL configs, mail server settings) in the application itself is a very good idea, especially if you need to launch another instance quickly;
  • sending out email from EC2 can be tricky: most servers will reject your email because of blacklisted IPs. This excellent article really helped me set up our mail infrastructure;
  • it's better to start with an Elastic IP to make moving from instance to instance seamless; otherwise you will be assigned a dynamic IP that cannot be transferred to another instance.

EC2 solutions:

  • Ubuntu images for EC2: probably the best Ubuntu AMIs available;
  • EC2 on Rails: a very good deployment solution; my own deployment plugin was heavily inspired by this work;
  • Rubber: another decent alternative. It was tricky to follow and I don't need a multi-instance setup at the moment, yet it's very interesting;
  • Deployer: my own small solution extracted from CookEatShare. It's not general and I doubt that it works out of the box on an arbitrary instance, but I am going to put some care into it in the future. Please fork it if you want to use it and make improvements.

Friday, September 19, 2008

Turning off email delivery for test users

For some reason I happen to like mail server configuration. Maybe it's just Postfix; I haven't tried anything else for ages. This is why the following task seemed interesting to me.

Let's say we have two sites: the first is a beta with the latest features under testing, and the second is the production one. We don't want e-mails to be sent from the beta except to some users who test the site. But we can't just remove the other users, because good infrastructure is critical for the beta site (and it's fun to test with a recent copy of the production data). One good approach is to change the users' emails to something like username@example.com, thus guaranteeing that those emails won't be delivered to a real person. But we can actually go further by telling Postfix to discard all messages for anyone@example.com without even trying to deliver them.

It's easy to do with header checks. So, here's a little snippet:


In /etc/postfix/main.cf:
header_checks = pcre:/etc/postfix/header_checks

In /etc/postfix/header_checks:
/^To:.*@example\.com/ DISCARD
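
The address rewriting step mentioned above could look roughly like the sketch below. It is only an illustration: the beta_email helper, the testers list and the sample addresses are hypothetical names, not code from the real application. It keeps tester addresses intact, points everyone else at example.com, and checks the result against the same pattern used in header_checks.

```ruby
# Hypothetical helper: returns the address a user should have on the beta copy.
# `testers` holds the addresses that must keep receiving real mail.
def beta_email(username, email, testers)
  return email if testers.include?(email.downcase)
  # Usernames are assumed to be unique, so the rewritten addresses are too.
  "#{username}@example.com"
end

# Same pattern as in /etc/postfix/header_checks (note the escaped dot).
HEADER_CHECK = /^To:.*@example\.com/

testers = ["tester@example.org"] # assumed tester address

scrubbed = beta_email("alice", "alice@gmail.com", testers)
kept     = beta_email("tester", "tester@example.org", testers)

puts "To: #{scrubbed}" =~ HEADER_CHECK ? "discarded: #{scrubbed}" : "delivered: #{scrubbed}"
puts "To: #{kept}"     =~ HEADER_CHECK ? "discarded: #{kept}"     : "delivered: #{kept}"
```

In a real application this would run once over the copied production data, right after the beta database is refreshed.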

Tuesday, September 16, 2008

Avoiding simultaneous run for rake tasks

Sometimes we cannot guarantee that a periodic job will finish before its next run, especially if the environment is heavily loaded or the interval between runs is small relative to the size of the data set being processed. This is why I decided to wrap some rake tasks in this helper:


# Runs the block unless a lock file shows that the task is already running.
def if_not_running(task)
  lock_file = File.join(Rails.root.to_s, "tmp", "running-task-#{task.name}.lock")
  if File.exist?(lock_file)
    puts "#{Time.now}: task #{task.name} is already running, skipping"
  else
    FileUtils.touch(lock_file)
    begin
      yield
    ensure
      # Remove the lock file even if the task raises.
      File.delete(lock_file)
    end
  end
end

I wonder if the helper covers all situations (and possible race conditions). Anyway it seems to work fine at the moment.

Usage:


namespace :test do
  task :test => :environment do |t|
    if_not_running(t) do
      puts Rails.env
      sleep 20
    end
  end
end
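
On the race-condition question: a lock file that is checked first and created afterwards leaves a small window where two processes can both pass the check, and a killed process leaves a stale lock behind. One possible alternative is an advisory lock via File#flock, sketched below. The with_task_lock name and its arguments are hypothetical, and the helper takes a plain task name and a directory so that it runs without Rails.

```ruby
require "tmpdir"

# Hypothetical variant of the helper above, based on an advisory file lock.
# Returns true if the block ran, false if another holder had the lock.
def with_task_lock(name, dir = Dir.tmpdir)
  lock_path = File.join(dir, "running-task-#{name.tr(':', '-')}.lock")
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false right away instead of waiting.
    unless lock.flock(File::LOCK_EX | File::LOCK_NB)
      puts "#{Time.now}: task #{name} is already running, skipping"
      return false
    end
    yield
    true
  end
  # The lock is released when the file is closed, even if the process dies.
end

ran = with_task_lock("test:test") { puts "working..." }
puts "block executed: #{ran}"
```

Inside a rake task it could be called as with_task_lock(t.name, File.join(Rails.root.to_s, "tmp")) { ... }. The lock file itself stays on disk; only the lock held on it matters.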

Thursday, September 11, 2008

If your /etc/cron.d/* tasks don't seem to work

Then probably there is no newline at the end of the file. I am working on a set of Capistrano tasks that set up cron jobs, like this:

set :daily_tasks, %w(digest:daily)
set :weekly_tasks, %w(digest:weekly)
set :schedule_tasks, {"do_something" => "*/5 * * * *", "do_something_else" => "* */2 * * *"}

I generate shell scripts in /etc/cron.daily and /etc/cron.weekly for the daily and weekly tasks, and put cron records in /etc/cron.d/* files for the custom schedules.

Here's the line that originally generated an /etc/cron.d entry:

cron_record = "#{schedule} #{user} #{task_body(task)}"

Actually it didn't work. All the files were in place, but they weren't picked up by the cron daemon because there was no trailing newline in cron_record. Not a very obvious error, for sure!

Here's the final version of the helper:

def add_cron_task(task, schedule, user, options = {})
  cron_record = "#{schedule} #{user} #{task_body(task)}\n"
  cron_file = "/etc/cron.d/#{task_name(task)}"
      
  process_and_push_config(cron_file, cron_record, options)
end
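
For illustration, here is a stand-alone sketch of building such records from a schedule hash. It does not use the real task_body and task_name helpers; the cron_record function, the deploy user and the /path/to/app command are assumptions made up for the example. It also checks that a schedule has exactly five time fields, which is a simplification (it would reject shortcuts such as @daily).

```ruby
# Hypothetical stand-alone helper: builds one /etc/cron.d line.
# An /etc/cron.d entry has five time fields, then a user, then the command.
def cron_record(schedule, user, command)
  fields = schedule.split
  unless fields.size == 5
    raise ArgumentError, "expected 5 schedule fields, got #{fields.size}"
  end
  # The trailing newline is what makes cron pick the entry up.
  "#{fields.join(' ')} #{user} #{command}\n"
end

# Assumed values, for illustration only.
schedule_tasks = {
  "do_something"      => "*/5 * * * *",
  "do_something_else" => "0 */2 * * *" # minute 0 of every second hour
}

records = schedule_tasks.map do |task, schedule|
  cron_record(schedule, "deploy", "cd /path/to/app && rake #{task}")
end
puts records
```

Note that a schedule like "* */2 * * *" fires every minute during every second hour; "0 */2 * * *" is the form that runs once every two hours.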

Sunday, September 7, 2008

Pragmatic Thinking and Learning: a great book on learning and self-improving

Sometimes I think software development is so nerve-racking and stressful, with all the deadlines, loud co-workers and the amount of information to process. So I ask myself: how do all these guys out there handle this? Maybe it's just a personal issue? What can we do to improve everyday work? How do we acquire all the knowledge and stay sane?

This is why I really liked the new book from the Pragmatic Programmers, "Pragmatic Thinking and Learning". Not only did it give me answers to those questions, it also uncovered such interesting topics as the human brain, consciousness and the Dreyfus model of learning. Bravo, Andy Hunt!

It's still in Beta but looks pretty release-ready. My highest recommendation, especially for programmers, although it can be useful for anyone who wants to learn.

Thursday, September 4, 2008

Handling deployment issues with Git and Capistrano

Well, it happens sometimes: a big bad chunk of early development changes suddenly leaks into production. But if you have Capistrano and Git, everything turns out to be pretty manageable.

On CookEatShare we use the following deployment scheme:

  • the master branch of our git repository is the development mainline. It's accessible to all team members and used mostly to share intermediate work results; most of us develop in private branches, rebase those onto master, then merge into master and push/pull the changes;
  • the staging branch is used, well, for staging. The master branch is merged into staging when there is something to demonstrate. Usually it's working code with possible bugs here and there;
  • the production branch is the current cookeatshare.com state: staging is merged there only if everything is fine and the known bugs have been fixed. Capistrano deploys to production from that branch.

We don't have a strict release management discipline. We're a startup and everyone is wearing multiple hats, so deployment can be done by anyone who has something to show. Sometimes it really screws up everything.

So, what to do if some very important patch came into production and brought some hairy guests with it? The first step is, of course, cap deploy:rollback. After that our previous state is restored, but there are still two problems:

  • Our important patch is somewhere deep inside of the current mainline and it's still not applied to the production;
  • The production branch is no longer pristine, it has all the undesired commits.

So here's the thing:

cap deploy:rollback
git co production

Now take a look at git log and find the commit with the patch (let's say it is P):

git diff P^1 P > our.patch # Keep patch at hand
git reset --hard PREV # PREV is the pristine production state before the unfortunate merge
git apply our.patch # This can be done in a separate branch forked off the current one if we need to rebase onto other branches

Actually this patch didn't apply on the first try, so I had to make some small changes. Fortunately the patch was tiny: only 5 lines changed in 3 or 4 files.

git ci -a -m "Patch applied" # I use aliases for commit, checkout, status etc.
git push origin production --force # Note '--force', since it's not a fast-forward push
cap deploy

Phew, thanks git for making all of this so easy (and there's maybe an even easier way, if you're not in the middle of the night with the CEO on Skype).

Anyway, to apply the patch correctly, it should be developed on a branch forked from production itself, merged back, and then rebased and propagated to the other branches. But sometimes things are not that nice, and maybe this will help you (if you use separate branches for multistaging).

Finally, some sleep!