Technology


17
Jan 11

Performance benchmarking the node.js backend of our 48h product, WeHearVoices.net

I decided to spend some time looking into improving the performance of the node.js backend of our 48h hackathon product We Hear Voices – a dead-simple but slick feedback tool for websites and web apps (go get it for your site ;) .

Anyway, I didn’t do any performance optimization during Garage48 since we were adding functionality rather than worrying about performance; but as you can see at the end of this post the backend is now optimized for much greater performance.

Here are my initial benchmarks, node.js vs Django:

The unoptimized implementation of the backend hits the database on every request and creates a new MySQL client connection on each load; it does not use any caching. Given this, node is almost 180 requests per second slower than Django:

Node.js requests per second:    275.41 [#/sec] (mean)

Python (front page of site) requests per second:    454.49 [#/sec] (mean)

I decided to tackle the client connections first by implementing MySQL client pooling. Instead of creating new MySQL connections, the server uses a pool of 10 MySQL connections to perform queries.

The resulting performance takes Node.js to a similar level of performance as Django. Note that the queries themselves still run on every request; we have only eliminated the latency of connecting to MySQL.

Node.js requests per second:    417.23 [#/sec] (mean)
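To illustrate the pooling idea, here is a minimal sketch rather than the actual WeHearVoices code. A fixed set of clients is created up front and handed out on request; when all of them are busy, callers wait in a queue. The names Pool, acquire, release and createClient are hypothetical, and createClient stands in for whatever MySQL client library you use.

```javascript
// Generic fixed-size pool. `createClient` is a hypothetical factory that
// returns a connected client; swap in your MySQL library's constructor.
function Pool(size, createClient) {
  this.idle = [];    // clients ready for use
  this.waiting = []; // callbacks waiting for a free client
  for (var i = 0; i < size; i++) {
    this.idle.push(createClient());
  }
}

// Hand a client to `callback` now if one is idle, otherwise queue the caller.
Pool.prototype.acquire = function (callback) {
  if (this.idle.length > 0) {
    callback(this.idle.pop());
  } else {
    this.waiting.push(callback);
  }
};

// Return a client; pass it straight to the next waiter if there is one.
Pool.prototype.release = function (client) {
  if (this.waiting.length > 0) {
    this.waiting.shift()(client);
  } else {
    this.idle.push(client);
  }
};

// Ten connections, as in the post. The object literal is a stand-in
// for a real connection.
var created = 0;
var pool = new Pool(10, function () {
  created += 1;
  return { id: created };
});
```

A request handler would call pool.acquire(), run its query on the client it receives, and call pool.release() from the query callback, so that no request pays the connection setup cost.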

Next, I implemented caching for questions, with a cache lifetime of 30 minutes:

Now node.js is almost twice as fast as Django. Note that with this naive caching, a cached result is not expired immediately when the user changes the questions. For that, we need to expose a way for the frontend to tell the backend that the user updated the question – which is rather simple to implement (later).

Node.js requests per second:    848.21 [#/sec] (mean)
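A time-based cache of this kind can be sketched in a few lines. This is an illustration under assumptions, not the code running on the site: TtlCache and its method names are made up, and the invalidate() method is the kind of hook the frontend could call when a user edits a question.

```javascript
// Minimal time-based cache. `now` is injectable so the expiry logic can be
// exercised without waiting; by default it uses the system clock.
function TtlCache(lifetimeMs, now) {
  this.lifetimeMs = lifetimeMs;
  this.now = now || Date.now;
  this.entries = Object.create(null); // key -> { value, expires }
}

// Return the cached value, or undefined on a miss or an expired entry.
TtlCache.prototype.get = function (key) {
  var entry = this.entries[key];
  if (!entry) return undefined;
  if (this.now() >= entry.expires) {
    delete this.entries[key]; // stale: drop it and report a miss
    return undefined;
  }
  return entry.value;
};

// Store a value; it stays valid for lifetimeMs from now.
TtlCache.prototype.set = function (key, value) {
  this.entries[key] = { value: value, expires: this.now() + this.lifetimeMs };
};

// Explicit invalidation, e.g. when the user updates a question.
TtlCache.prototype.invalidate = function (key) {
  delete this.entries[key];
};

// 30-minute lifetime, as in the post.
var questionCache = new TtlCache(30 * 60 * 1000);
```

On each request the backend would first try questionCache.get() and only query MySQL (and then call set()) on a miss.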

I added some more improvements:

This improves the performance by a further 290 or so requests per second. There is some degradation at the very slowest requests, but most requests are faster (98% complete in less than 192 ms). I still need to look into the performance degradation, although the requests per second tended to remain rather constant even when I tested it with 20 000 requests at a concurrency of 1000 simultaneous requests.

Node.js requests per second:    1135.72 [#/sec] (mean)

Then I refactored the code a bit more, and added request-level caching (which runs the minimum amount of logic while taking into account the necessity of getting and setting per-user cookies and per-referrer questions).

Node.js requests per second:    1879.86 [#/sec] (mean)

Time per request:       53.196 [ms] (mean)

Time per request:       0.532 [ms] (mean, across all concurrent requests)

The very lowest curve, nodejs-hello-world, is the performance of a node.js server which simply returns Hello World. We aren’t quite at that level since we need to do some routing and cookie setting/getting.
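For reference, a Hello World baseline server of this kind looks roughly like the following. This is a sketch of the usual minimal Node.js HTTP server, not necessarily the exact script used for the benchmark; the port number is an arbitrary assumption.

```javascript
var http = require('http');

// Respond to every request with the same plain-text body:
// no routing, no cookies, no database.
function helloHandler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}

var server = http.createServer(helloHandler);
// server.listen(8000); // uncomment to serve; port 8000 is an arbitrary choice
```

Since this server does no work per request, it serves as an upper bound for what the real backend can reach on the same machine.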

There are a few special cases where performance is somewhere between nodejs-p-cache-2 (notably if you have two identical URLs mapped to different questions) and nodejs-p-final; but even those get cached after a couple of questions.

However, I am satisfied that any of our early users will get good performance out of the service. Testing the most recent code with 1000 concurrent users gives similar performance (50-60 ms) with a couple of outliers which take longer.

What this means is that even if you send us a continuous stream of traffic where 1800 users load a page each second, we can still cope with it :) (with some caveats, so maybe 1200+ requests per second would be more appropriate). I think the next problem is to get that traffic…



17
Jan 11

Attending Garage48 Helsinki 2011, my experience

I just came back from Garage48, a 48 hour hackathon organized by Aalto Entrepreneurship Society and Garage48 in Helsinki. I had a ton of fun, met some interesting new people and built something awesome!

Our team, WeHearVoices, built a dead-simple but slick feedback tool for websites and web apps. Go have a look and register if you want a simpler, cheaper tool for collecting feedback from your users. The admin UI is based on an inbox with starred items; the integration only requires that you add two lines to your page. We’ll decide on the next steps based on the feedback we get – so let us know if you want this product and do register/write your email so we can get back to you.

Update (day after Garage48): Some performance benchmarks for the WeHearVoices backend after I spent some time optimizing it.

It was a great experience! I loved being able to wrap up and ship a product that I would want to use in just two hectic days; closing a couple of sites as customers of the product during the hackathon was the cherry on top.

The great thing about our team was that we all shared a vision for a product that got its value from being simple – that made it possible to hold just two meetings during the whole thing (initiation and refinement), and we worked really well together despite having different skills and responsibilities.

I learned a lot more about building stuff with Node.js; got exposure to Django and Ubuntu server (if you follow this blog you’ll know that I started Node.js during Christmas and do my server stuff on Centos) and learned about the way in which other people do web product work. Thank you Jens, Jori and Jukka! And thank you  Garage48 and AaltoES!

You can find the list of startups at the end of this post. And here are some random notes that I want to remember for the next hackathon:

For thinking about ideas:

  • Big-and-visionary vs. small-and-shipped. The bigger your idea, the less likely that you’ll get it shipped. We went with the small-and-shipped, which I think was liked by many entrepreneurs, but projects are evaluated based on both their ambition and their ability to execute on that ambition. If you want to be judged favorably, the sweet spot is probably somewhere in between the two ends with emphasis on working on the business, not the tech. For me though – with my unlimited supply of projects to do – shipping is an epic win.
  • Size isn’t just about scope, it is also about expectations. Think about the  expectations you create, since those determine how people judge your outcome. I think Bookstrap got too much flak from the judges since they weren’t convinced that it was possible to build good-enough accounting software over a hackathon.
  • Judging is probably based on the interests of the sponsors (e.g. promoting their platform) – know their tech and you’ll be more eligible to win.
  • Unusual is interesting. Standing out from all the other products is likely to make you an audience favorite if you can execute (that’s a big if). Montroller did a great job with this!

For future productivity as a developer:

  • Get your server set up in advance; target the big-3 if you can (Ruby, Python, PHP). While setting up a server is insignificant on a long-term project, it does eat some time which could be spent better. We had two servers bought but there was still some setup work which took a couple of hours. I’d like to have that done in advance next time.
  • Have an automated way of doing deployment which uses a DVCS as the frontend. We had a deploy script which was based on Python, which was mostly fine. However, since two (out of four) on our team didn’t know Python and didn’t have it set up on our computers, we ended up having the Python devs use the script while others only used Mercurial. If we had hosted the repositories directly on the server, and just pushed data directly to it (rather than via BitBucket) then we could have probably saved some trouble. Ideally, a push should trigger a standardized set of deployment actions which would be maintained on the server rather than in a client-side deployment script, since it reduces the amount of stuff that people need to learn. I’ll have to think about that, since I’ve written quite a bit about deploying and repo hosting using Mercurial.
  • Learn both git and hg. It seems that hg works well on Windows but rather poorly on OSX, while git works well on OSX but isn’t quite as smooth on Windows (both are awesome on Linux). Hence people tend to know one or the other; if you know the basics of both, it’ll be easier to get started. It’s also a good idea to keep the cheat sheets and recommended configurations for the chosen DVCS somewhere so that people who need to switch can get their environment set up quickly (e.g. username settings, diff tools). I still need to play around more with branching, since I am the only coder on my current projects and haven’t yet tried out the variations.
  • For high productivity, use a shared technology/shared language to connect dev tasks. I wished I had known more about Django, since we used it for the administrative user interface and I probably could have helped more. Everybody knows Javascript, CSS, HTML and database schemas, so those provide a useful way to share work done with others. If you don’t have the time, at least learn the frontend stuff (e.g. templating) since that allows you to contribute more. We ended up specializing so that the backend was written in Node.js, the frontend via Django and the widget design using jQuery; it worked here but I wonder about the next hackathon.
  • Have people who can dedicate time to think about particular aspects of the product; even if in theory you can do frontend or marketing, it’ll be hard to keep switching between roles. Our solution to this was not to worry about marketing at all during the 48hrs – but that wouldn’t work for all projects and we realized later that we ought to at least have had an about-us page.
  • Worry about the iteration you are starting now, not the next one. It’s a tech thing to start thinking too far ahead, since we don’t usually do sprints (e.g. 48h excluding sleep time) but marathons (months+). We did this right in our team: there was very little time wasted on thinking about pie-in-the-sky architectures.

Want to be popular at the next event as a developer?

  • Do real-time technology; everything social benefits from having someone who can do Comet well. I had a lot of possibilities thanks to this!
  • Learn mobile development. Location sensing and various alternative user interaction methods are popular. Doing cross-platform well in particular is impressive.
  • Release your boilerplate code as open source. I did this, though we didn’t get to use the code since we had different tech backgrounds.
  • More good tips here.

The Garage48 ideas, linked & categorized (randomly) for your convenience:

  1. Consumer-oriented
    1. SportsTradr – bet on your favorite teams in sports, win money and share the information with your friends easily
    2. FloFlo - design a flower bouquet online, order it, share the order on social media. Remember occasions when to send flowers.
  2. Social
    1. Aidbook48 - A website to make sense of the development assistance. Support organisations don’t know what others are doing – make a transparent marketplace for sharing information & knowledge.
    2. Readlish - “Social instapaper” for Kindle. App for finding & reading content for Kindle: web articles, links people share etc.
    3. CrapWall - 4chan on Foursquare. Location based messagewall. Post fun photos, links, comments for a venue.
    4. Let’s Meet Here! – Send meeting requests to friends from widget on venue (museum, restaurant, club etc) website to meet at that venue.
  3. Small-business
    1. WeHearVoices – a dead-simple but slick feedback tool for websites and web apps.
    2. Ordimo – Make an order at bar or restaurant via mobile phone, when lazy restaurant staff don’t come to your table and you need a refill.
    3. Bookstrapp – Simple & easy to use accounting software for startups. Basic invoicing & debit/credit to get started.
    4. IdeaHub – Service to find co-founders, match with other startup people, solve problems, create virtual teams fast. Like an online version of Garage48 events.
    5. LapLab – Application to measure and improve lap times for racing car drivers. Build a web-service to visualise lap times based on GPS data and race faster.
  4. Location-based
    1. TranSpotr – See on the map best areas to move in. Visualization of local rent, living conditions, transport etc.
    2. iTaxi – Call a Taxi from mobile, using location based service.
    3. TipTrails - Mobile app that sends your location data in background in real time and notifies about your friends’ tips, postings & locations.
  5. Other: mobile/game
    1. Montroller - Control a robot kit from your mobile phone on Nokia, Android etc.
    2. BlowEm - Mobile game which uses the power of your lungs to blow things (via microphone of mobile phone).



13
Jan 11

Getting started with Useradmin, my Kohana 3 auth admin module

In this post, I discuss the details of how to use my admin module. Hopefully this covers all the stuff you need to know; if not, let me know in the comments. I’m just one guy, so it may be quicker to ask on the Kohana forums for troubleshooting. Previous writing:

  1. Setting up the basic Auth in KO3 (part 1)
  2. An overview of the functionality provided by the Auth module (part 2)
  3. Kohana 3 auth: sample implementation and documentation (part 3)
  4. Getting started with Useradmin, my Kohana 3 auth admin module (part 4; this part)

KO3.1 support is DONE, updating this tutorial to match that now. See part 3 above for changelog.

1. Installation

Changes from 3.x version of the module to 3.1.x compatible version

The useradmin module is no longer offered as a “single application” with Kohana bundled in the same repository. Instead, only the content of the /modules/user directory is now in the repository. This makes it easier to work with the repository in Github.

  • You should copy the module to your /modules directory.
  • You need to have the kohana-email module for sending forgotten password emails for now, if you want to use that feature.
  • Kohana 3.1 no longer bundles the pagination module which is used in the admin interface. You have to get that module and enable it.

Database schema

Import the MySQL schema from /schema.sql. It will create a “useradmin” database. You might want to rename that when you start; in that case you need to change /application/config/database.php.

Watch this space! Right now, the database schema’s passwords do not work out of the box. I’m working on making a script to generate a secure starting SQL file for you… For now, use Auth::instance()->hash('password') and then reset the admin password via your MySQL admin tool.

Module load order

Make sure that Useradmin is loaded before Auth in your bootstrap.php, because otherwise Kohana will not load the correct Model_User (it’ll use the one in Auth, not the one in Useradmin). Example with the minimum required modules:

Kohana::modules(array(
   'user'         => MODPATH.'user',         // Useradmin module
   'auth'         => MODPATH.'auth',         // Basic authentication
   'database'     => MODPATH.'database',     // Database access
   'orm'          => MODPATH.'orm',          // Object Relationship Mapping
   'pagination'   => MODPATH.'pagination',   // Pagination
   'oauth'        => MODPATH.'oauth',        // Kohana-Oauth for Twitter
   'kohana-email' => MODPATH.'kohana-email', // Kohana-Email for email
   ));

The required modules are user, auth, database, ORM, Pagination. Optional modules: Oauth (for Twitter) and kohana-email (for email sending support).

Writable directories

Make sure the /application/logs and /application/cache directories exist and are writable by your server (you’ll get an error if they aren’t).

Copying the static files for performance

The useradmin module now includes a simple media serving capability so that you can get started by just including the module.

However, since it is not a good idea for performance to load CSS and image files via Kohana, you should copy the /public folder to wherever you put your webroot. This way Apache will load it directly (since direct file accesses are preferred in the Kohana default htaccess file).

KO3.1 Auth configuration

In Kohana 3.1, the default hash method is now sha256 instead of sha1. This means that there is no salt_pattern anymore, and that old KO 3.x passwords are not compatible with KO 3.1! See the discussion on this bug for more information. TL;DR: the salt pattern is weak, so if someone steals your database but does not know your salt_pattern, they can deduce it easily and perform a dictionary attack.

Instead, you need to configure your hash_key, which gets passed to hash_hmac() (http://php.net/manual/en/function.hash-hmac.php). You can also use any of the other hash_hmac() supported algorithms if you want to.

Use a random hash_key, for example from: https://www.grc.com/passwords.htm

return array(
	'driver' => 'ORM',
	'hash_method' => 'sha256',
	'hash_key' => NULL, // replace with random string
	'lifetime' => 1209600,
	'session_key' => 'auth_user',
	'users' => array(),
);

Migrating from KO3.x

Watch this space! I’m working on a better migration path than throwing out the old password database…

Important: Note that the password column should be CHAR(64) for sha256.
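To see where the 64 comes from, here is a small sketch of a keyed SHA-256 hash, written in JavaScript (Node.js) rather than PHP. It uses Node's crypto.createHmac, which computes the same HMAC as PHP's hash_hmac('sha256', $password, $hash_key). The function name hashPassword and the example key are assumptions made for illustration.

```javascript
var crypto = require('crypto');

// Keyed SHA-256 hash of a password, hex encoded.
// hashPassword is a hypothetical helper name, not part of Kohana.
function hashPassword(password, hashKey) {
  return crypto.createHmac('sha256', hashKey).update(password).digest('hex');
}

// The key below is a placeholder; use your own random hash_key.
var digest = hashPassword('password', 'replace-with-random-string');

// A SHA-256 digest is 32 bytes, which is 64 hexadecimal characters,
// so it fills a CHAR(64) column exactly.
var digestLength = digest.length;
```

Changing the key changes every digest, which is why passwords hashed with one hash_key cannot be verified after the key is replaced.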

2. Useradmin configuration

By default, reCaptcha support and Facebook logins are disabled, but password reset via email is enabled.

Facebook login

To enable Facebook login, set the “facebook” option to true in config/useradmin.php. Then you need to copy /modules/user/config/facebook.php as /application/config/facebook.php and set app_id and secret to the values you got from Facebook. You need to register your site/app here to get those values. That’s really all it takes; you can then start accepting Facebook logins.

For more info about how Facebook logins work, see my series on implementing Facebook login. No additional database changes are needed if you are using my schema.sql; otherwise you need to add one extra field to your User table (`facebook_user_id` BIGINT( 20 )).

ReCaptcha on registration

To enable a ReCaptcha check on registration, set the “captcha” option to true in config/useradmin.php. Then you need to copy /modules/user/config/recaptcha.php as /application/config/recaptcha.php and set privatekey and publickey to the values you get from reCaptcha. Register for reCaptcha here.

Disabling password reset via email, Facebook login or ReCaptcha on registration

If you want to disable Facebook logins, or disable the password reset via email functionality, then copy /modules/user/config/useradmin.php to /application/config/ and set either “facebook” or “email” to  false. You can also change the address from which the password reset emails are sent in that file.

3. Customization

Creating your own controllers which extend Controller_App

All the controllers in Useradmin inherit from Controller_App in /modules/user/classes/controller/app.php.

You’ll want your own controllers to also inherit from it, since Controller_App defines a before() action which performs the auth checks.

In addition, Controller_App provides support for template rendering: it defaults to using /modules/user/views/template/default.php. If you want to override that template, you’ll want to copy the default.php file to /application/views/template/default.php and modify it.

The Controller classes for Useradmin default to using /modules/user/views/template/useradmin.php. This means that you can have one UI template for Useradmin, and another for the rest of your application.  Alternatively you can integrate the two by adding links to your own template.

Naming your controllers

If you are OK with not using Controller_User, Controller_Admin_User and Controller_App as names in your application, then there will be no naming conflicts and you can just add your controller to /application/classes/controller/.

If you want to extend these controllers without copying, this is possible starting with the Jan 2011 release. Just create /application/classes/controller/user.php and in it do this:

<?php
class Controller_User extends Controller_Useradmin_User {
 // your code here
}

This will simply extend the base class defined in the useradmin module and is useful because you don’t need to copy all of the contents of the useradmin-defined controller when you just want to add a few methods or add/change/override properties.

Setting up auth rules for your controllers

You will want to set up auth rules for your controllers. To set a default rule for your whole controller, set the controller’s $auth_required to a string which is the role that is needed to access the controller (or an array of roles if multiple roles are required). In Kohana, the “login” role is given to all users who are allowed to log in, so you can check for the “login” role if you want to ensure users have logged in.

To set the roles required for individual actions, use $secure_actions. It is an array, indexed by action name, of strings or arrays of roles required for each particular action. Here is an example (you might want to copy the comments from Controller_App as well so that the properties are documented):

class Controller_Chat extends Controller_App {
   public $auth_required = 'login'; // FALSE | string | array
   public $secure_actions = array('chat' => 'admin'); // array( action => role)

   function action_index() { ... }
   function action_chat() { ... }
}

In the example, the login role is required to access the controller, and the “chat” action requires the admin role.
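The rule semantics described above can be sketched as follows, in JavaScript for brevity. This is an illustration under assumptions: the real check lives in Controller_App's before() and may differ in detail, and the names hasRoles and isAllowed are made up for this example.

```javascript
// Does the user hold every role in `required`? `required` may be false
// (no check), a single role name, or an array of role names.
function hasRoles(userRoles, required) {
  if (required === false) return true;
  var needed = Array.isArray(required) ? required : [required];
  return needed.every(function (role) {
    return userRoles.indexOf(role) !== -1;
  });
}

// A request is allowed only if the controller-wide rule passes and, when the
// action has its own entry in secureActions, that rule passes too.
function isAllowed(controller, action, userRoles) {
  if (!hasRoles(userRoles, controller.authRequired)) return false;
  var hasRule = Object.prototype.hasOwnProperty.call(controller.secureActions, action);
  return !hasRule || hasRoles(userRoles, controller.secureActions[action]);
}

// Mirrors the Controller_Chat example: 'login' for the whole controller,
// 'admin' for the chat action.
var chat = { authRequired: 'login', secureActions: { chat: 'admin' } };
```

With these rules, a user holding only the login role can reach the index action but not the chat action, while a user holding both login and admin can reach either.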

4. Using the bundled extras

I’ve bundled three helpers within the module. You can find them in /modules/user/classes/. These are:

Message - A simple helper for setting session messages in controllers. Has two static methods:

  • Message::add($type, $message). Adds a message to the session. Type specifies what the class attribute will say on the output.
  • Message::output(). Returns the messages set in the session as a set of div tags with the message as the content and the type as the class attribute.

Two basic classes are defined in the CSS file: “error” (red div) and “success” (green div). So you can add messages like this: Message::add('success', 'Saved data.') and a div (class="success") will be displayed on the next load. Multiple simultaneous messages are supported. The included template file echoes Message::output() before the content to show the messages.

Appform - An API-compatible stateful extension to the core Form which shows error messages and additional information in context. Appform calls work exactly the same as core Form calls (e.g. Form::select becomes $appform->select). However, it is stateful: you can set $appform->errors to errors, $appform->defaults to default values for fields and $appform->values to values. More details are available on my post discussing Kohana validation with forms. You have to initialize it using $appform = new Appform() in your views.

Helper_Format. Totally nonessential, it just helps format dates in the useradmin views. See the source code for details.

Note that these do not support transparent extension; this is because I think you should re-implement them anyway in your own application.

5. Extending the user model and adding stuff to the user profile

Adding additional fields to Model_User

It’s not mandatory to define your own Model_User, since there is a default one in /modules/user/model/user.php.

Thanks to Kohana 3’s cascading filesystem, you can start simple and work your way up to a more comprehensive rewrite. If you just need an extra field, you can add that in the database. The code for saving users is flexible enough that it will save new fields if they exist in the database.

For the view, you can just copy /modules/user/views/user/profile_edit.php to /application/views/user/profile_edit.php and add the extra field input(s) there.

Extending Model_User with new or improved functions

The Model_User in Useradmin is empty and extends Model_Useradmin_User. To extend Model_User in your app, you can create your custom Model_User in /application/classes/model/user.php, and make it extend Model_Useradmin_User.

Note: Support for extending via Model_Useradmin_User was added in the Jan 2011 release.

Adding associations to Model_User

If you need to add an association to another model, you’ll need to redefine that property of Model_User; make sure you copy the auth associations ($_has_many) from the most recent version of Model_Useradmin_User, then use extends to extend without copying the rest of the code, for example:

class Model_User extends Model_Useradmin_User {

    protected $_has_many = array(
      // New association: post
      'post' => array('model' => 'post'),
      // Copied from Model_Useradmin_User
      'roles' => array('through' => 'roles_users'),
      'user_tokens' => array(),
      );
}

Customizing the user profile view and administrative edit view

Again, you can start implementing your custom views by copying the view files from /modules/user/views/* to /application/views/*, then improving them. Kohana’s cascading filesystem will take care of loading your application view files instead of the module default views.

Appendix: Auth module models

Useradmin is based on the core KO3 Auth. It does not define Model_Role and Model_User_Token. Instead, these are loaded directly from the KO3 core ORM module (they were moved there in KO 3.1). If you want to redefine these models, make sure you extend Model_Auth_Role and Model_Auth_User_Token in your Model_Role/Model_User_Token (see the core Auth module for definitions).


9
Jan 11

Implementing Facebook login (part 3)

Damn Facebook documentation! I finally managed to piece together a better picture – after first getting all that info in the previous posts (part 1, part 2) – about how registration and login can be handled.

Basically, you can’t use the registration tool unless you really want to replace your registration dialog with Facebook’s own version. You can’t redirect to it conditionally, e.g. if the user has Facebook and clicks the login button, since the login will already trigger the Facebook authorization process.

So unless you want two logins (Facebook login link and regular login form) and two registration links (regular registration page and Facebook register page), you can’t use the registration tool. My ideal implementation would just have a single link for everything related to Facebook, but present the full registration form if the user clicks it and has not yet registered with my site… no luck.

Hence, I will do this the traditional way:

  • Use the Javascript SDK to render the Facebook Login link and popup.
  • Ask for additional permissions on the Login.
  • When the authorization returns, process the user login / registration.
  • If the user wants to register using Facebook, then they will still need to fill in additional fields in a separate form after authorizing the information we can read from Facebook.

Preparations

Get an app_id and a secret from Facebook. Register your website/app here to get an app_id and secret.

Add the Facebook namespace. Your HTML tag should contain the additional xmlns:fb attribute so that the Facebook custom tags work with IE:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml">

I did not know you could add XML namespaces to your documents and actually have custom tags which work in all current browsers, but since Facebook does it, it must have been tested well.

Rendering the Facebook login link and popup

<div id="fb-root"></div>
<script>
  window.fbAsyncInit = function() {
    FB.init({
      appId   : '<?php echo $facebook->getAppId(); ?>',
      session : <?php echo json_encode($session); ?>, // don't refetch the session when PHP already has it
      status  : true, // check login status
      cookie  : true, // enable cookies to allow the server to access the session
      xfbml   : true  // parse XFBML
    });
    // whenever the user logs in, we tell our login service
    FB.Event.subscribe('auth.login', function() {
      window.location = "<?php echo URL::site('/user/fb_login') ?>";
    });
  };

  (function() {
    var e = document.createElement('script');
    e.src = document.location.protocol + '//connect.facebook.net/en_US/all.js';
    e.async = true;
    document.getElementById('fb-root').appendChild(e);
  }());
</script>

Asking for additional permissions

You have to do this by adding the perms attribute to the fb:login-button tag. The links in the docs are broken, but the page documenting the login button is here, and the extended permissions you can ask for are listed here.

We’ll just ask for the email address:

<fb:login-button perms="email" size="large"></fb:login-button>

Processing the resulting login or registration

As you can see above, we subscribe to the auth.login event using Javascript and specify that on completion, the user will be redirected to /user/fb_login/.

On that page, we need to:

  1. Check whether a user account exists with the returned Facebook user id.
  2. If no user is found with the FB uid, retrieve the user’s email via the Graph API and check whether a user exists with that email. If so, we will merge the user.
  3. If no user is found with the FB email, create a new account.
  4. If creating a new user fails, then accept defeat and show the regular registration page.
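The four steps above can be sketched as follows. This is a simplified illustration in JavaScript, not the Kohana implementation: the in-memory users array stands in for the user table, the names findBy and facebookLogin are hypothetical, and "creating a new user fails" is modelled simply as Facebook not returning an email address.

```javascript
// In-memory stand-in for the user table; a real app would query the database.
var users = [];

// Find the first user whose `field` equals `value`, or null.
function findBy(field, value) {
  for (var i = 0; i < users.length; i++) {
    if (users[i][field] === value) return users[i];
  }
  return null;
}

// `fbProfile` is assumed to look like { id: '123', email: 'a@example.com' }.
// Returns the logged-in user, or null to signal that the regular
// registration page should be shown.
function facebookLogin(fbProfile) {
  // 1. Existing account already linked to this Facebook user id?
  var user = findBy('facebookUserId', fbProfile.id);
  if (user) return user;

  // 2. Existing account with the same email? Merge by linking the Facebook id.
  user = fbProfile.email ? findBy('email', fbProfile.email) : null;
  if (user) {
    user.facebookUserId = fbProfile.id;
    return user;
  }

  // 4. Without an email we cannot create an account: accept defeat.
  if (!fbProfile.email) return null;

  // 3. Otherwise create a new account linked to the Facebook id.
  user = { email: fbProfile.email, facebookUserId: fbProfile.id };
  users.push(user);
  return user;
}
```

Checking the Facebook user id before the email means a returning user is still recognised after changing their email address on Facebook.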

A complete implementation of this will be available from Bitbucket – soon! I’m releasing it as part of my Useradmin module for Kohana 3.

Logging out

We need to handle logging out sensibly. I prefer that the user explicitly clicks a login button if they do not currently have a session with our application. Automatic relogin is confusing, particularly when logging out usually redirects to the login page.

So we need to handle the case where the person is connected (logged in to Facebook and authorized with us) by displaying a custom login button.

You can find the Facebook logo from the branding site. Of course, a Google search for facebook logo won’t find it, so click here to get the image that you are allowed to use for this purpose.

window.fbAsyncInit = function() {
  FB.init({
    appId   : '<?php echo Kohana::config('facebook')->app_id; ?>',
    status  : true, // check login status
    cookie  : true, // enable cookies to allow the server to access the session
    xfbml   : true  // parse XFBML
  });
  // whenever the user logs in, we tell our login service
  FB.Event.subscribe('auth.login', function() {
    window.location = "<?php echo URL::site('/user/fb_login') ?>";
  });
  // if the user is already logged in, redirect them to the login action
  // they cannot reach the login page if they are already logged in
  // since login() redirects to profile if the user is logged in
  FB.getLoginStatus(function(response) {
    if (response.status == 'connected') {
      document.getElementById('fb-login-li').innerHTML = '<a href="<?php echo URL::site('/user/fb_login') ?>"><img src="/img/fb-login.png"></a>';
    } else {
      document.getElementById('fb-login-li').innerHTML = '<fb:login-button perms="email" size="large"><?php echo __('Login / Register with Facebook')?></fb:login-button>';
      FB.XFBML.parse(document.getElementById('fb-login-li'));
    }
  });
};

Appendix: all fb:login-button attributes

The Facebook documentation does not list the fb:login-button attributes, which is infuriating. I found a forum post (here), which documents the optional attributes of fb:login-button:

  • condition (string): Indicates whether the button is visible or hidden.
  • size (string): Specifies the size of the button. Specify icon to display a favicon only, or small, medium, large, or xlarge. (Default value is medium.)
  • autologoutlink (bool): If true and the user is already connected and has a session, then the button image changes to indicate the user can log out. Clicking the button logs the user out of Facebook and all connected sessions. (Default value is false.)
  • background (string): Specifies the button image to use that is anti-aliased to match the background of your site, whether it’s pure white, light, or dark. Specify white, dark, or light. (Default value is light.) Note: You don’t specify this attribute if you are using v="2".
  • length (string): Specifies which text label to use on a button with size specified as small, medium, large, or xlarge. Specify short for the text label Connect only or long for the text label Connect with Facebook. If you are rendering the login button text by including it within the fb:login-button tags, you don’t specify a length at all. (Default value is short.)
  • onlogin (string): JavaScript code to execute when the user gains a Facebook session (that is, after logging into Facebook and authorizing the site).
  • v (string): Specify "2" to use the latest Facebook Connect login buttons (examples available in the Facebook Connect wizard). Don’t use the attribute if you need to use the original Facebook Connect login buttons.

4
Jan 11

Nginx, php-fpm and node.js install on Centos 5.5

After some time with node.js I recently decided to move to nginx + php-fpm + node.js for my future servers. Here we will be installing:

  • Nginx as a fast HTTP server with reverse proxy to node.js
  • php-fpm for running PHP scripts. The php-fpm (PHP FastCGI sapi) is built into the PHP core but only since 5.3.3, so we need a recent version of PHP.
  • node.js for handling comet and high-concurrency/persistent connections.
  • Monit is used to restart node.js in case of errors.

For all the packages other than node.js (obviously, since it’s new/actively changing), you can now get the most recent version without patching or compiling anything.

Start by updating your packages and uninstalling httpd if you had it installed:

yum update
yum remove httpd

Add repositories for nginx and PHP-fpm

Just add the EPEL and Remi repos:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-5.rpm

Install nginx and PHP-fpm

Remi has the php-fpm package (php 5.3.4); EPEL has nginx:

yum --enablerepo=remi install php-fpm nginx

Some other common packages:

yum --enablerepo=remi install mysql mysql-server php-mysql php-common php-gd php-mbstring php-mcrypt php-xml php-bcmath

PHPunit and PHPdocumentor:

yum --enablerepo=remi install php-channel-phpunit php-pear-PhpDocumentor php-phpunit-PHPUnit

Create a test application

mkdir /var/www/

In /var/www/index.php:

<?php
  phpinfo();

Configure nginx

In /etc/nginx/nginx.conf:

server {
   listen       80;
   server_name  _;
   access_log  logs/host.access.log  main;
   root   /var/www;
   index  index.php index.html index.htm;
   location ~ \.php$ {
      # Security: must set cgi.fix_pathinfo to 0 in php.ini!
      # Load the defaults first, then override individual params below
      include /etc/nginx/fastcgi_params;
      fastcgi_split_path_info ^(.+\.php)(/.+)$;
      fastcgi_pass 127.0.0.1:9000;
      fastcgi_index index.php;
      fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
      fastcgi_param PATH_INFO $fastcgi_path_info;
   }
}

Default fastcgi_params

I start the php location by including /etc/nginx/fastcgi_params. This is because I want to use those as defaults, then override them individually afterwards. It is also recommended to have one root directive per server.

PHP path

I use fastcgi_split_path_info, which became available in nginx 0.7.31, to pass PATH_INFO.

Security settings

IMPORTANT: Change cgi.fix_pathinfo to 0 in php.ini. With the default PHP configuration, PHP tries to guess which file you want when a URL specifies a nonexistent file, and this guessing is a security issue. Setting cgi.fix_pathinfo=0 causes PHP to only try the literal path given. Alternatively, check in nginx that the file exists: if (!-f $request_filename) { return 404; }

See also: Nginx pitfalls.

Start/restart and test

service php-fpm start
service nginx restart

Test by going to your server root; it should show you your phpinfo().

Prepare to install node.js

Excellent post: http://wavded.tumblr.com/post/475957278/hosting-nodejs-apps-on-centos-5

Add a user for node.js:

groupadd -r node
useradd -r --shell /bin/bash --comment 'User for running node.js' -g node  --home /var/lib/node node

I prefer using /var/lib/node rather than /home/node because this is more in line with how other server daemon users are defined (e.g. nginx, mysql). -r is short for --system; --comment sets the GECOS field.

Install node.js

yum install gcc-c++ openssl-devel
wget --no-check-certificate https://github.com/ry/node/tarball/v0.3.3
tar -xzvf ry-node-v0.3.3-0-g57544ba.tar.gz
cd ry-node-v0.3.3-0-g57544bac1
./configure
make
make install
mkdir /var/node

Create a test application

For node.js apps, I prefer to keep the www and node trees separate. Also note that I am running unstable 0.3.3 not 0.2.x, so the example is slightly different. Create the example file /var/node/hello_world/example.js:
var sys = require("sys"),
   http = require("http");
http.createServer(function (request, response) {
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.end("Hello World\n");
}).listen(8000);
sys.puts("Server running at 127.0.0.1:8000");

Configure nginx for node.js (subdirectory approach)

In /etc/nginx/nginx.conf (after the server definition has started):
location /node {
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_set_header Host $http_host;
   proxy_set_header X-NginX-Proxy true;
   proxy_pass http://127.0.0.1:8000/;
   proxy_redirect off;
}

Install monit

yum install monit

Create monit script

/etc/monit.d/hello_world content:

check host hello_world with address 127.0.0.1
    start program = "/usr/local/bin/node /var/node/hello_world/example.js" as uid node and gid node
    stop program  = "/usr/bin/pkill -f 'node /var/node/hello_world/example.js'"
    if failed port 8000 protocol HTTP
        request /
        with timeout 10 seconds
        then restart

Start/restart and test

service monit start
service nginx restart

Test by going to /node on your server.

Also, run ps -Af to verify that node is running with the uid node.

The reboot test

Finally, test everything by restarting your server. A friendly reminder:

chkconfig nginx on
chkconfig monit on
chkconfig php-fpm on
chkconfig mysqld on
chkconfig --list


3
Jan 11

Implementing Facebook login (part 2)

In part 2 of my Facebook login tutorial, I will discuss implementing Facebook registration (see also: part 1 and part 3).

My design goal is to let users either have their own, separate account on the site or link their account with Facebook. In detail:

  • User registration
    • Allow the user to register using Facebook and retrieve data from their profile to help fill in their information.
    • Allow the user to register without using Facebook.
    • Allow the user to add a Facebook profile later on to their account.
    • Treat Facebook user ids just like email addresses: allow only one user account per Facebook user ID.
  • User login / authentication
    • Allow the user to sign in by just clicking a Sign in via Facebook button.
    • Allow the user to sign in using their registered account name, if they have one.

Registration flow

Facebook has recently (Dec 16th 2010)  launched a registration tool for streamlining Facebook-based registration. We will be using it to implement the Facebook-enabled registration. However, since the dialog looks very “facebooky”, and some users will prefer not to have anything to do with Facebook, we will also offer our own registration form which looks more consistent with the site, but has no Facebook stuff in it.

The following diagram illustrates the registration flow with the Facebook alternative:

So basically, on the Login page, we will show a Facebook login link next to the regular login form, as well as normal registration link. If the user clicks the Facebook login link, then they will either be signed on via Facebook, or if they haven’t signed up, they will be shown the Facebook enabled registration page. If the user clicks the regular registration link, they will be shown the non-Facebook registration page. Finally, the user can just login using their username and password on your site using the regular login form.

Designing the Facebook-enabled registration page

If you already have a registration system on your site, you can leave that there. You then need to add the secondary, Facebook-enabled registration page.

The Facebook registration just needs to contain an iframe with the following content:

<iframe src="http://www.facebook.com/plugins/registration.php?
            client_id=YOUR_APP_ID&
            redirect_uri=URLESCAPED_RETURN_PAGE&
            fields=name,email"
       scrolling="auto"
       frameborder="no"
       style="border:none"
       allowTransparency="true"
       width="100%"
       height="310px">
</iframe>

Check out the Facebook documentation on what you can put in the fields. The new registration tool can support custom fields. I will assume here that you just want their name and email.

Receiving the data from the Facebook registration form

As you will notice from the Facebook documentation, the return value of the Facebook registration form is base64-encoded JSON, which includes an encoded (H)MAC (message authentication code) which is separated from the data itself by a single dot character.

First split the string, then decode the data (from Facebook’s example code):

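<?php
  // helper used by Facebook's example: decode URL-safe base64
  function base64_url_decode($input) {
    return base64_decode(strtr($input, '-_', '+/'));
  }

  // the registration form sends its result as the signed_request parameter
  $signed_request = $_REQUEST['signed_request'];
  // split to two variables based on the first dot character
  list($encoded_sig, $payload) = explode('.', $signed_request, 2);
  // decode the data
  $sig = base64_url_decode($encoded_sig);
  $data = json_decode(base64_url_decode($payload), true);

  echo 'Hello '.$data['registration']['name'].'!';
  echo 'Your Facebook user ID is '.$data['user_id'].'.';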

You should also authenticate the return value using the secret key you received when you registered your website with Facebook. Otherwise, someone could fake a registration request and cause problems with user login. You can do it like this:

<?php
  // check the signature in $sig using $payload
  if ($sig !== hash_hmac('sha256', $payload, FACEBOOK_SECRET, TRUE)) {
    // INVALID SIGNATURE! Show an error message.
  }
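For comparison, here is a minimal sketch of the same split, decode and verify steps in node.js. It assumes a current Node.js release (for Buffer.from and crypto.timingSafeEqual); parseSignedRequest and base64UrlDecode are names I made up for illustration, not part of any Facebook SDK.

```javascript
var crypto = require('crypto');

// Hypothetical helper: decode URL-safe base64 ('-' and '_', no padding) into a Buffer.
function base64UrlDecode(input) {
  var b64 = input.replace(/-/g, '+').replace(/_/g, '/');
  while (b64.length % 4 !== 0) { b64 += '='; }
  return Buffer.from(b64, 'base64');
}

// Hypothetical helper: returns the decoded payload object,
// or null if the value is malformed or the signature does not match.
function parseSignedRequest(signedRequest, appSecret) {
  var dot = signedRequest.indexOf('.');
  if (dot === -1) { return null; }
  var encodedSig = signedRequest.slice(0, dot);
  var payload = signedRequest.slice(dot + 1);

  var sig = base64UrlDecode(encodedSig);
  // The HMAC is computed over the still-encoded payload string.
  var expected = crypto.createHmac('sha256', appSecret).update(payload).digest();

  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (sig.length !== expected.length || !crypto.timingSafeEqual(sig, expected)) {
    return null;
  }
  return JSON.parse(base64UrlDecode(payload).toString('utf8'));
}
```

The constant-time comparison is a small extra over the plain !== check; it avoids leaking how many leading bytes of a forged signature were correct.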

Saving the data into an existing user account system

There are a number of things you want to take into account when saving the user data into the system:

  • Facebook does not give you a username for the user unless you ask for one. You can either:
    • Ask for one from the user as a custom field (easy, but I don’t like it since it reduces the convenience of Facebook integration; and what about if that username is taken?).
    • Make it so that you can have user accounts that do not have a username at all.
    • In both cases, you have to make sure that the user cannot log in directly using the old style login, only via Facebook (unless you ask for a password as well).
  • The user might have already registered an account on your site with the same email address. In this case, you don’t want to duplicate the user, and usually you can’t since email addresses are expected to be unique across the system.
  • You need to make database changes to add two pieces of information: first, the user’s Facebook user id and second, the time-limited access token we need to retrieve more data from Facebook (oauth_token in the return value).

Since these are specific to your application, I can’t say much about them. In any case, you want to add facebook_user_id as one field in the user information.

Option 1 would be to set the user name to empty for those users registered from Facebook. You then need to design your application so that it can show the full name of the user instead of the username elsewhere, since this field will now not necessarily be filled in. A properly designed app should not need to search for users by their username, but rather by user ID on your site. You also need to make sure that users with empty usernames/passwords cannot log in.

Option 2 would be to auto-generate a username for the user based on their name. You need to make sure the username is not taken, and if it is, generate an alternative. Then you can allow the user to change their username later, as long as uniqueness checks are maintained. You need to make sure that users without a password cannot log in, or generate a really long random string as the password, hash it and forget the password completely.

Option 3 would be to ask for a username, and perhaps even a password – but then the only benefit is that you get some information typed in automatically, which does not make for a smooth experience (weren’t we trying to avoid filling in fields here?).

I would go with option 2. While it does theoretically open the possibility that someone could guess the random string and the autogenerated username, I think there are easier avenues of attack, such as guessing weak passwords.
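A rough sketch of option 2, written in node.js for illustration. All three function names are hypothetical, and isTaken stands in for whatever lookup your user table offers; a real system should also enforce uniqueness with a database constraint, since two registrations can race each other.

```javascript
var crypto = require('crypto');

// Hypothetical helper: turn a full name into a lowercase ASCII slug.
// Characters outside a-z and 0-9 are collapsed into dots, so non-ASCII
// names lose letters; falls back to "user" if nothing is left.
function slugifyName(fullName) {
  var slug = fullName.toLowerCase()
    .replace(/[^a-z0-9]+/g, '.')
    .replace(/^\.+|\.+$/g, '');
  return slug || 'user';
}

// isTaken is an assumed callback: returns true if the username already exists.
function generateUsername(fullName, isTaken) {
  var base = slugifyName(fullName);
  var candidate = base;
  var suffix = 2;
  while (isTaken(candidate)) {
    candidate = base + suffix;   // jane.doe2, jane.doe3, ...
    suffix += 1;
  }
  return candidate;
}

// A long random string (64 hex characters) to hash, store and then forget.
function randomPassword() {
  return crypto.randomBytes(32).toString('hex');
}
```

The user can later rename themselves, as long as the rename goes through the same uniqueness check.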

Next part

In part 3 of my Facebook login tutorial I go into implementation specifics using the Javascript SDK.


30
Dec 10

Learning node.js: my experiences and helpful resources

Here are my notes from learning node.js over the winter vacation. You can think of this post as my answer to the question: what resources did you find helpful in learning node.js? There is a lot of node.js stuff out there, but few posts on what is noteworthy, popular or tricky - I am sick of seeing posts explaining what node.js is and showing me the same five lines of code.

Structuring code

I recommend reading the series on control flow in node.js from Tim Caswell:

Scope and this are different/tricky in JS, read this and this.

Update Feb 2011:

Update Jan 2011:

Inheritance patterns in Javascript (using a modern style that optimizes well when using the Google Closure compiler).

You’ll probably run into problems with assigning values if you use the class pattern. The problem is that variables added to the prototype are shared among instances. You have to explicitly initialize any per-instance variables in the constructor; otherwise every instance keeps accessing the prototype property, which then acts as if it were static. Obvious in retrospect, but hindsight makes fools of us all.
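A small illustration of the pitfall (Broken and Fixed are made-up names). It bites with mutable values such as arrays and objects, because every instance mutates the single value stored on the prototype:

```javascript
function Broken() {}
Broken.prototype.items = [];   // one array, shared by every instance

function Fixed() {
  this.items = [];             // a fresh array per instance
}

var a = new Broken(), b = new Broken();
a.items.push('x');
console.log(b.items.length);   // prints 1: b sees the item pushed via a

var c = new Fixed(), d = new Fixed();
c.items.push('x');
console.log(d.items.length);   // prints 0
```

Methods still belong on the prototype; only the per-instance state moves into the constructor.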

Modules

Have a look at the node.js modules page on github for a comprehensive list of modules. Install npm to install packages. Or don’t, just drop the repos in a subdirectory and use require, since require in node.js is flexible.

Debugging

Simply printing out stuff can be done using sys.puts(string) (first var sys = require('sys')).

Since you are on V8, you can use JSON.stringify(object) natively. console.log() also works; it automatically prettyprints objects, but only one level deep (no nested objects).

For those rare occasions where JSON.stringify(object) fails (e.g. due to circular references) you can use require('util').inspect(object) as an alternative.
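A quick way to see the difference, using an object that references itself (the variable names are just for this example):

```javascript
var util = require('util');

var node = { name: 'root' };
node.self = node;              // circular reference

var json;
try {
  json = JSON.stringify(node);
} catch (e) {
  json = null;                 // JSON.stringify throws a TypeError on cycles
}

var dump = util.inspect(node); // marks the cycle instead of throwing
console.log(json);
console.log(dump);
```

The exact text util.inspect prints for the cycle differs between node versions, so don't parse it; it is for human eyes only.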

Setting up the server

Update: Here is how I got Node.js up and running on Centos 5 with nginx + monit.

First, you’ll want to decide how you want to build your applications: will you use node.js for the whole stack, or will you mix it with more familiar scripting languages? If you want to mix two technology stacks, I strongly recommend NOT trying to combine node.js with other web app stacks using subdomains, since you’ll have an unlimited amount of trouble due to the same origin policy enforcement that is built into all browsers. You can either solve that problem for all browsers:

  • IE6/7 has no mechanism,
  • IE8/9 has XDomainRequest,
  • Mozilla/Chrome/Opera have XMLHttpRequest Level 2 (note that level 2 is needed for cross-domain support).

Or you can solve the problem on the server:

  1. by only using node.js for all of your stack, running a bare node.js server
  2. by running a mixed stack using Nginx or Apache to proxy node.js requests

I’m not quite ready to start from scratch, given the productivity that my non-node-js stack gives; I did the mixed route with Apache which is adequate for trying things out. If you want to scale a bit more, you’ll want to setup Nginx.

You can proxy requests from Apache to Node.js (second example), or do a rewrite. You’ll want to use a subdirectory for node.js requests so that you don’t have to deal with the same origin policy. Here is what I did:

<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    DocumentRoot /path/to/www/
    ServerName example.com
    ProxyPass /node/ http://127.0.0.1:8001/ retry=0 timeout=120
    <Proxy *>
        Allow from all
    </Proxy>
</VirtualHost>

The "retry=0" parameter prevents Apache from waiting 60 seconds if a node.js response fails (e.g. due to server restart).

Comet with node.js

Have a look at socket.io (github page) for streaming. In my testing, I couldn’t get it to work with IE9 (sending stuff) and Chrome seemed to keep dropping connections. Firefox worked solidly, and so did Chrome after I set it to reconnect, but I couldn’t figure out why IE9 was not working (it could receive messages, but sending them did not seem to work). I am sure that socket.io will get there; reliable real-time transmission in all major browsers just isn’t a neatly solved problem yet. To be fair, my testing was with cross-domain requests.

If you use socket.io, you will probably want to implement your own abstractions over it, since you’ll want to do something with channels. There is the broadcast(message, list_of_excluded_user_ids) method, but you probably want to have more fine-grained control which can be done via additional abstractions.
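As a sketch of what such an abstraction could look like: ChannelHub below is a made-up name, and send is a function you supply yourself (with socket.io it would wrap whatever per-client send call the library gives you). Nothing here is socket.io API.

```javascript
// send is an assumed callback (clientId, message) provided by the caller.
function ChannelHub(send) {
  this.send = send;
  this.channels = Object.create(null);   // channel name -> set of client ids
}

ChannelHub.prototype.subscribe = function (channel, clientId) {
  if (!this.channels[channel]) { this.channels[channel] = Object.create(null); }
  this.channels[channel][clientId] = true;
};

ChannelHub.prototype.unsubscribe = function (channel, clientId) {
  if (this.channels[channel]) { delete this.channels[channel][clientId]; }
};

// Call this on disconnect so dead clients do not accumulate.
ChannelHub.prototype.removeClient = function (clientId) {
  for (var channel in this.channels) {
    delete this.channels[channel][clientId];
  }
};

// Deliver message to every subscriber of channel; returns the number of sends.
ChannelHub.prototype.publish = function (channel, message) {
  var subscribers = this.channels[channel] || {};
  var count = 0;
  for (var clientId in subscribers) {
    this.send(clientId, message);
    count += 1;
  }
  return count;
};
```

Keeping the transport behind a single send function also makes it easier to swap comet libraries later.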

There are also two other promising comet projects: Push-it and Faye. Push-it (github, stackoverflow) is built on top of socket.io. Faye (homepage, github) also looks interesting, but I spotted this to-do on the repo: “Detect failed WebSocket connection and fall back to polling transports”. In other words, it doesn’t seem to work in non-WebSocket browsers (IE6/IE7), something I would like to have (in fact, right now in Dec 2010, WebSockets is disabled in newer FF 4 builds and Opera 11 due to security concerns).

Connecting to MySQL

The top three MySQL bindings are felixge’s node-mysql, Sannis’ node-mysql-libmysqlclient and sidorares’ nodejs-mysql-native.

node-mysql and nodejs-mysql-native are pure node.js clients, while node-mysql-libmysqlclient uses libmysqlclient.

Sannis publishes benchmarks of the prominent node.js MySQL bindings which point to node-mysql-libmysqlclient being the fastest. Looking at the GitHub stats for the different projects, felixge’s implementation is most popular (most followed and forked). Both Sannis and felixge are actively committing (Sannis has daily commits, felixge approximately weekly). sidorares’ last commit is from August (3 months ago; checked Dec 2010) so the project seems to be less active. Regarding performance, felixge notes (prior to developing node-mysql):

“Performance should be a secondary concern here. Show me a realistic MySql scenario where you perform 170k queries / sec against a single database server. Inserting 13k records / sec doesn’t sound like a good use case for MySql either. The mysql driver is pretty unlikely to become a bottleneck in the real world. Anyway, I think there is plenty of room for improvement.”

So basically it comes down to whether you prefer to have an all-node.js solution or a slightly faster libmysqlclient-dependent solution.

I went with node-mysql, which works really nicely. What I am a bit unsure of is how to make sure that the code I write using the library performs well. I’d love to see a better explanation of what goes on inside node-mysql when I do a connect or a query…

Once you get that done, you’ll want to look into connection pooling. Have a look at node-pool.

Or, you might opt to just have one shared persistent connection to MySQL. Apparently, this is what Felix’s company does or did for quite a while.
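To make the pooling idea concrete, here is a tiny hand-rolled pool. This is not node-pool's API, just a sketch of the mechanism; Pool is a made-up name and create is a factory you supply (for MySQL it would open a client connection).

```javascript
// create is an assumed factory returning a new connection object.
function Pool(create, max) {
  this.create = create;
  this.max = max;
  this.idle = [];      // connections ready for reuse
  this.size = 0;       // connections created so far
  this.waiting = [];   // callbacks queued while the pool is exhausted
}

// Hand a connection to callback: immediately if one is idle or can be
// created, otherwise as soon as another caller releases one.
Pool.prototype.acquire = function (callback) {
  if (this.idle.length > 0) {
    callback(this.idle.pop());
  } else if (this.size < this.max) {
    this.size += 1;
    callback(this.create());
  } else {
    this.waiting.push(callback);
  }
};

// Return a connection; pass it straight to the next waiter if there is one.
Pool.prototype.release = function (conn) {
  if (this.waiting.length > 0) {
    this.waiting.shift()(conn);
  } else {
    this.idle.push(conn);
  }
};
```

Usage is acquire, run the query, release in the query callback. With max set to 1 this degenerates into the single shared connection mentioned above. A real pool also has to deal with connections that die while idle, which this sketch ignores.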

Deploying node.js with monitoring

Performance/maturity discussion

Amir Salihefendic from Plurk writes about the performance of node.js and notes that while they were able to serve millions of real customer notifications during an 8-month period, they decided to go with Java and Netty due to current performance problems. Amir attributes the limitations they ran into to the V8 engine, which makes some assumptions that make sense in a browser but are problematic on a server (Igor Sysoev, the author of nginx, summarizes this as “V8 will work well in any program, provided that the program is called Chrome”; via Google Translate). On Hacker News, Amir concludes:

“I think the ultimate perfomance is found in pure java.nio/C/C++ solutions, but I think having a bit slower perfomance and higher abstraction is better since it makes it much easier to maintain and debug the system. My general impression of java.nio is that it’s very low-level and a generally hard to code against and it’s the main reason why we didn’t choose it.

This said, I think node.js offers great usability while perfomance is pretty good. So if I was developing a new comet solution I would give node a go – you can always rewrite to something more low level once you begin to hit limits. IMO going after java.nio directly is a premature optimization and most projects won’t hit limits with node.js.”

I figure this is a fair conclusion.

Testing and debugging

Frameworks

Express.js seems to be the most popular web application framework for node.js. Geddy is another (popular?) alternative. Connect provides middleware for framework development, so it is more bare-bones — but since node.js app development is still in the early stages, I’ve seen it used in many repositories. Express is built on Connect, for example. Check out this review of node.js frameworks from May 2010. Since I opted not to write everything in node.js, I’ll have to get back to you on the experiences with these.

Tutorial series

Other interesting links


27
Dec 10

Implementing Facebook login / single sign-on (part 1)

I am implementing Facebook single sign on for my applications. In this first part of my Facebook authentication tutorial, I discuss the basics of the Facebook authentication process (see also: part 2, part 3).

Getting started: some terminology

If you are like me and haven’t implemented Facebook integration before, the terminology and various APIs can be confusing. And the Facebook developer documentation is a combination of too-much-information and not-enough-explanation.

First, the Facebook documentation talks about different kinds of applications, which are:

  • Facebook canvas applications: Applications-within-Facebook, or what users think of as “Facebook apps”.
    • FBML / FBJS apps (built using FB-specific markup)
    • Iframe canvas apps (built using Javascript and the FB JS API)
  • Facebook desktop applications (rare; basically anything that cannot run within a browser, like a desktop client for Facebook)
  • Facebook web applications (websites with Facebook-integration such as single sign-on and custom Facebook elements within them such as CNN and Digg)

And there are multiple developer tools that Facebook offers:

  • Facebook Connect – no such thing exists anymore (e.g. http://mashable.com/2010/04/21/facebook-kills-facebook-connect/). The new API is called the Graph API, and talking about Facebook Connect is inaccurate.
  • APIs (always used, either on the server side or on the client’s web browser via Javascript)
    • Graph API: The new version of Facebook’s API. The Graph API is much more than just authentication: it is the mechanism which powers Facebook applications and allows you to read and write data to Facebook.
    • Old REST API: The earlier version of Facebook’s API; Facebook does not recommend you use it because they are in the process of deprecating it.
    • FQL (Facebook Query Language): A SQL-like language for querying the Graph API. Supposedly makes using the API easier for apps.
  • Markup
    • FBML (Facebook Markup Language): An HTML-like language for building Facebook applications. FBML is rendered on the Facebook server. Facebook does not recommend that you use it, instead suggesting that you use the Graph API via their Javascript SDK with XFBML.
    • XFBML: FBML-within-(X)HTML pages rendered using Facebook’s Javascript SDK.
  • SDKs
    • Language-specific libraries for querying the API. Other than the Javascript SDK, it’s all server-side and based on HTTP requests.
    • Note that the Javascript SDK allows you to render XFBML embedded on a (X)HTML page! This was confusing because I had done something with FBML before, and didn’t see where exactly the transformation from fb:login-button to HTML was occurring…
  • Other stuff from Facebook:
    • Open Graph Protocol: A convention of meta and other tags which allows you to add metadata to Like button clicks, integrating them with Facebook (e.g. a like on a profile).
    • Social plugins: Ready-made Facebook widgets you can embed using an iframe.

How does Facebook authentication work?

Conceptually, it works like this (picture of user represents page shown to user):

The most important authentication methods are:

  1. Javascript single sign-on: The Javascript single sign-on is simplest, because it provides a function (FB.login) which handles everything from the details of OAuth 2.0 requests to rendering the actual login button. When using the Javascript API, the user id and access token are stored in a cookie (e.g. fbs_app_id) which you can access both server-side and via Javascript in the client.
  2. Web application authentication (using server-side SDKs): When you do web application authentication, you need to redirect the user to Facebook in your app. When the user allows access for your app, Facebook redirects them back and sends the access token and user ID as part of the GET request to your authentication-complete redirection page. Then you can access the user’s information using the returned token.
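For the first method, a hedged sketch of reading the cookie on the server in node.js. It rests on my understanding of the cookie format at the time of writing, which you should check against Facebook's own sample code: the value of fbs_<app id> is a quoted query string, and its sig field is the MD5 of the other key=value pairs (sorted by key, concatenated without separators) followed by the app secret. parseFacebookCookie is a made-up name.

```javascript
var crypto = require('crypto');
var querystring = require('querystring');

// Assumed format (see the note above): returns the parsed fields
// (uid, access_token, ...) or null if the signature does not match.
// cookieValue is the raw value of the fbs_<app id> cookie.
function parseFacebookCookie(cookieValue, appSecret) {
  // Strip the surrounding quotes (and any escaping backslashes) first.
  var args = querystring.parse(cookieValue.replace(/^[\\"]+|[\\"]+$/g, ''));
  var payload = Object.keys(args).sort().filter(function (key) {
    return key !== 'sig';
  }).map(function (key) {
    return key + '=' + args[key];
  }).join('');
  var expected = crypto.createHash('md5').update(payload + appSecret).digest('hex');
  return expected === args.sig ? args : null;
}
```

Never trust the uid in the cookie without this check, since the cookie is under the user's control.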

Next part

In part 2 of my Facebook login tutorial I address database design considerations and look at how Facebook login can be integrated with your existing user management.

In part 3 of my Facebook login tutorial I go into implementation specifics using the Javascript SDK.



22
Dec 10

5 principles of web application development productivity

My early New Year’s resolution is to ship more next year (why wait for next year to start?). With this in mind, I decided to write about increasing productivity in web app development (more posts coming!). In this first part, I will look at the principles behind getting more done quicker. In the next parts, I will look into the points marked with ** in more detail.

Based on a look at the top 400 or so questions tagged “productivity” on Stackoverflow, the five principles of web application development productivity are:

1. Know your tools

Learn all you can about your particular language, framework and tools. Doh!

Explore other people’s solutions to your problems and learn best practices.

Use version control, diff tools, project/bug/feature tracking, virtual machines, and VPS/platform-as-a-service providers when needed.

2. Define your problem well

Get obsessed about your project: think about specifications, UI and implementation mechanisms when you’re not coding. **

Don’t get bogged down on details, and try to identify the most valuable parts of the project. Don’t get too “clever” about features; gold-plating carries an opportunity cost.

If you use something more than three times, make it reusable and think about how future features will influence the architecture.

Invest time in creating tools for recurring and error-prone parts of your project.

3. Keep yourself and your team on track

Eliminate distractions; communicate frequently enough but keep it brief.

Stick to an MVP; avoid scope creep.

Have a detailed to-do list. Keep to-dos small and specific enough that you can tick them off. If you can’t determine whether the task is done, then refactor the description until it is unambiguous and small enough. Finish one task at a time.

Time-box your iterations and tasks (e.g. Pomodoro technique): set deadlines and create pressure by making a commitment (e.g. tell someone).

“Accept that everything is a draft. It helps to get it done. Laugh at perfection. It’s boring and keeps you from being done.”

“The point of being done is not to finish but to get other things done.” In other words, get the first version done and then move on to the next most important thing. Get back to it if feedback indicates it’s important.

Try to keep any projects/project iterations to less than a week long, so that your ideas about how it should work won’t start diverging wildly from how your users actually think. Get feedback from real customers to re-orient yourself. If this is not possible, try to involve someone else to bounce ideas off.

4. Write less code

Think more about what you are writing and how to do it more intelligently.

Think about code organization for code reuse.

Use a language that allows for easy/quick constructs.

Use a framework that provides good classes/functionality for your problem.

Use standardized boilerplate code: CSS frameworks, boilerplate distributions… **

Use libraries: emailing, captchas, admin, accounts… **

5. Create fewer bugs

Use xUnit for your language and do TDD.

Automate testing: e.g. Selenium, continuous integration.

Next part

In the next part I will discuss boilerplate code and problem definition in detail. See you soon!



1
Dec 10

How to deploy web applications using Mercurial

Does deploying changes to your site take too long? Are you tired of manually sorting out the update? Here is how to deploy your projects using Mercurial.

Why?

  • Ease of updating. Mercurial keeps track of the changes and only sends the necessary changes, so you don’t need to worry about transferring files.
  • Make it possible to roll back changes on the deployed site. You can use hg to roll back from a bad update if necessary.
  • Once set up, it’s beautiful. “hg deploy”. How can you not like that?

How is this different from the other guide you wrote about setting up private repo hosting?

  • While the differences aren’t that big, this setup is better for deployment rather than code distribution via repositories:
    • Minimal dependencies. This approach only uses the hg-ssh script from the Mercurial core contrib.
    • Manual configuration. You can set different directories for each repository which allows you to work with your existing webroot setup. However, you will need to manually add new repositories, since hg-ssh does not support adding new repositories remotely.

If you want a private version of Bitbucket (without any additional features, of course), e.g. to be able to remotely init/clone new repositories, check out my other tutorial about setting up private repo hosting.

1. Make sure that the .hg directories are never served to the public

To prevent .hg directories from being accessible to other people, you can take a number of (optional, but recommended) precautions:

Move the web root of your project to a different folder

You can probably alter your project repository so that you have an explicit webroot folder containing only the files that should be accessible to the public (bootstrap + js/css resources). This makes it less likely that you accidentally serve your .hg directory. To do this, move the files within hg using hg move:

mkdir webroot
hg move . webroot

Or you can use hg addremove --similarity=100 after moving the files manually. It will detect identical files as moved.

After this, you may need to update your Apache config to serve from the webroot (e.g. adjust the DocumentRoot directive for the site / virtual host).

Set up Apache not to serve .hg directories

You can also prevent Apache from serving directories with .hg in them by adding the following to your httpd.conf:

<DirectoryMatch \.hg>
   Order allow,deny
   Deny from all
</DirectoryMatch>

Restart Apache if you change httpd.conf.
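
To confirm that the rule took effect, you could request a file inside .hg over HTTP and look at the status code. The following is a minimal sketch, assuming curl is installed. The function name hg_dir_status and the example.com URL are placeholders, and .hg/requires is used only because it normally exists in a repository.

```shell
# Print the HTTP status code returned for a file inside the .hg directory.
# hg_dir_status is a hypothetical helper; pass the base URL of your site.
hg_dir_status() {
    base_url="$1"
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    curl -s -o /dev/null -w '%{http_code}' "${base_url%/}/.hg/requires"
}

# Usage (example.com is a placeholder):
#   hg_dir_status http://example.com
```

A 403 would suggest that Apache is refusing the request, while a 200 means the directory is still being served.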

2. Set up the server

Create the user

# make a system user (-r) with a home directory (-m)
useradd -r -m hg
# lock the user account
usermod -L hg

Set up ssh

# install Mercurial if not previously installed
yum install mercurial
su hg
cd /home/hg/
wget http://www.selenic.com/repo/hg-stable/raw-file/tip/contrib/hg-ssh
chmod u+x ~hg/hg-ssh
mkdir /home/hg/.ssh/
nano /home/hg/.ssh/authorized_keys
# remember to chmod the ssh config
chmod -R 0700 /home/hg/.ssh/

Copy the public key: ssh-rsa …(key data)== to /home/hg/.ssh/authorized_keys, one line for each authorized key.

Add the following in front of each authorized key in /home/hg/.ssh/authorized_keys:

command="~hg/hg-ssh /path/to/repository",no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa ...==

If necessary, enable public key auth in /etc/ssh/sshd_config and add hg to the AllowUsers directive. Restart sshd if you change sshd_config.
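
Typing the command prefix by hand for every key is error-prone, so the authorized_keys entry described above could also be generated. This is a small sketch, not part of the original setup: hg_key_line is a hypothetical helper name, and the key file and paths in the usage comment are placeholders.

```shell
# Build one authorized_keys line that restricts a key to hg-ssh.
# hg_key_line is a hypothetical helper: the first argument is the public key,
# the remaining arguments are the repository paths this key may access.
hg_key_line() {
    pubkey="$1"
    shift
    # "$*" joins the repository paths with single spaces
    printf 'command="~hg/hg-ssh %s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding %s\n' "$*" "$pubkey"
}

# Usage (key file and path are placeholders):
#   hg_key_line "$(cat id_rsa.pub)" /path/to/repository >> /home/hg/.ssh/authorized_keys
```

The sketch assumes that repository paths contain no spaces, since hg-ssh receives them as a space-separated list.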

3. Init the repo

New repo on the remote server

Just run hg init in the new repo folder, then hg clone it on your computer from the remote server:

hg clone -v --debug ssh://hg@server:port/path/to/repo

The -v and --debug flags make the clone show more information about what is going on.

From existing sources

Unfortunately, hg-ssh does not support creating a repository remotely (hg init, or hg clone to the server). If you have an existing repo, you first have to copy it to the repository directory on the server. Since there are many files to move, I recommend compressing the whole directory into a single gzipped archive before moving it, then unpacking it on the server. Run hg log and hg status on the server to check that everything transferred correctly.
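
One way to do the copy is sketched below, assuming tar and scp are available. pack_repo is a hypothetical helper, and every path, user and host name in the usage comment is a placeholder.

```shell
# Pack a repository directory (including its .hg folder) into one archive.
# pack_repo is a hypothetical helper: $1 is the repository directory,
# $2 is the archive file to create.
pack_repo() {
    repo_dir="$1"
    archive="$2"
    # -C switches to the parent directory so the archive holds a single
    # top-level folder named after the repository
    tar -czf "$archive" -C "$(dirname "$repo_dir")" "$(basename "$repo_dir")"
}

# Usage (all names are placeholders):
#   pack_repo ~/projects/mysite /tmp/mysite.tar.gz
#   scp /tmp/mysite.tar.gz user@server:/tmp/
#   # then on the server, in the parent of the target directory:
#   tar -xzf /tmp/mysite.tar.gz
#   cd mysite && hg log -l 3 && hg status
```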

Adding more repos

To add more repositories, simply add another repository path (separated by a space) immediately after the first repo path in .ssh/authorized_keys, and either init a new repo or copy an existing repo.

4. Push updates

After you have the same repository on both the server and locally, you can start pushing changes to the server:

hg push ssh://hg@server:port/path/relative/to/home/dir

If your repo path is not relative to the home dir, you need an extra slash in front of the path:

hg push ssh://hg@server:port//path/relative/to/base/dir

If you run into problems, try connecting via ssh – see my previous post for some tips on this.
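
The difference between the two URL forms above can be shown with a tiny helper. This is only an illustration: hg_ssh_url is a hypothetical name, and the host, port and paths are placeholders.

```shell
# Build an ssh:// URL for hg push.
# hg_ssh_url is a hypothetical helper: $1 = host, $2 = port, $3 = repository path.
# A path starting with "/" is absolute, which yields the double slash after
# the port; any other path is relative to the hg user's home directory.
hg_ssh_url() {
    host="$1"
    port="$2"
    path="$3"
    printf 'ssh://hg@%s:%s/%s\n' "$host" "$port" "$path"
}

# Usage (placeholders):
#   hg push "$(hg_ssh_url server 22 repos/site)"     # relative to the home dir
#   hg push "$(hg_ssh_url server 22 /var/www/site)"  # absolute path
```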

5. Automate hg update

You should automate hg update so that each push causes the repository to be updated automatically to the pushed version. You can automate hg update on the remote repo by adding the following to the remote .hg/hgrc:

[hooks]
changegroup = hg update >&2

Note that output has to be redirected to stderr (or /dev/null), because stdout is used for the data stream, which is why there is the >&2 at the end of the command.

If you get errors when updating automatically, check the permissions. In particular, make sure that .hg/ is owned by hg and that the content is readable by Apache.
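
If you set up several repositories, adding the hook could be scripted. The following is a rough sketch rather than a tested deployment script: install_update_hook is a hypothetical helper, and it assumes that the hgrc does not already contain a [hooks] section with other entries.

```shell
# Append the auto-update hook to a repository's .hg/hgrc unless a
# changegroup hook is already configured.
# install_update_hook is a hypothetical helper: $1 is the repository directory.
install_update_hook() {
    hgrc="$1/.hg/hgrc"
    # Leave the file alone if a changegroup hook already exists
    if [ -f "$hgrc" ] && grep -q '^changegroup' "$hgrc"; then
        return 0
    fi
    # Assumes no [hooks] section exists yet; merge by hand if one does.
    # The leading newline protects against a file without a final newline.
    printf '\n[hooks]\nchangegroup = hg update >&2\n' >> "$hgrc"
}

# Usage (path is a placeholder), run as the hg user on the server:
#   install_update_hook /path/to/repository
```

Running it as the hg user keeps the ownership of .hg/hgrc consistent with the permission note above.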

6. Set up default push or create an alias

If you only push to one location, you can set up the default locations for pull and push in the local .hg/hgrc:

[paths]
default = ssh://hg@servername/reponame
default-push = ssh://hg@servername/reponame

This allows you to simply use “hg push” to deploy.

If you need more than one push location, create an alias in the local .hg/hgrc:

[alias]
deploy = push -v --debug ssh://hg@server:port/path/to/repo

This allows you to use “hg deploy” as an alias for deploying.

What about configuration files?

One simple solution is to use “hg forget” to forget them once you have deployed the configuration files. This means that hg will not track the configuration files, but will not delete them either.