Ruby data object comparison (or why you should never, ever use OpenStruct)

In profiling some code recently, we found a spot where we were using OpenStruct to handle parsed JSON coming back from a web service. OpenStruct is nice because it lets you address the returned object with dot notation, like a normal Ruby object (e.g. `result.foo = "foo"`). We read the following warning in the docs:

This should be a consideration if there is a concern about the performance of the objects that are created, as there is much more overhead in the setting of these properties compared to using a Hash or a Struct.

However, we didn't realize just how bad performance would be. Check out this benchmark comparing Hash, OpenStruct, Struct, and Class:
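The benchmark was originally embedded as a gist; here's a minimal sketch of the kind of comparison we ran. The attribute names and iteration count are illustrative, not the original script:

```ruby
require "benchmark"
require "ostruct"

ITERATIONS = 100_000

# A plain class with the same attributes, for comparison
class Record
  attr_accessor :name, :website
  def initialize(name, website)
    @name = name
    @website = website
  end
end

RecordStruct = Struct.new(:name, :website)

Benchmark.bm(12) do |x|
  x.report("hash:") do
    ITERATIONS.times { h = { name: "a", website: "b" }; h[:name] }
  end
  x.report("openstruct:") do
    ITERATIONS.times { o = OpenStruct.new(name: "a", website: "b"); o.name }
  end
  x.report("struct:") do
    ITERATIONS.times { s = RecordStruct.new("a", "b"); s.name }
  end
  x.report("class:") do
    ITERATIONS.times { c = Record.new("a", "b"); c.name }
  end
end
```

Each report instantiates the object and reads one attribute, which mirrors the create-then-access pattern of handling parsed JSON.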

OpenStruct uses Ruby's method_missing and define_method to simulate "normal" objects, so if you are using these methods frequently in code, check to see if they are really necessary.

Seeing that classes outperformed hashes by a significant margin, I decided to investigate using Ruby 2.0's new keyword arguments when I needed optional parameters on methods. My assumption was that calling a method and passing in the arguments as a hash would incur some performance penalty as each method invocation would result in a new hash being instantiated.

I was surprised to see that using hashes outperformed Ruby 2.0's native keyword arguments by quite a bit:
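The original benchmark was embedded as a gist; a sketch of the comparison looks like this (method names and iteration count are illustrative):

```ruby
require "benchmark"

# Two equivalent methods: one takes an options hash, the other uses
# Ruby 2.0 keyword arguments
def with_hash(opts = {})
  opts[:a].to_s + opts[:b].to_s
end

def with_keywords(a: nil, b: nil)
  a.to_s + b.to_s
end

ITERATIONS = 1_000_000

Benchmark.bm(10) do |x|
  x.report("hash:")     { ITERATIONS.times { with_hash(a: 1, b: 2) } }
  x.report("keywords:") { ITERATIONS.times { with_keywords(a: 1, b: 2) } }
end
```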

My testing was done with Ruby 2.0.0-p353 on a MacBook Retina (2012).

Update - June 2015

It turns out there have been some improvements in the Ruby 2.x codebase that make keyword arguments perform much better. They now beat hashes by roughly 3x:
# Using Ruby 2.2.1p85 on a MacBook Pro (Retina Early 2015) w/ 3.1 GHz i7
                 user     system      total        real
hash:        7.980000   0.510000   8.490000 (  8.631545)
keywords:    2.940000   0.060000   3.000000 (  3.038953)

And just for reference, here is an updated run of the Hash vs OpenStruct vs Struct vs Class comparison:
                    user     system      total        real
hash:           5.490000   0.540000   6.030000 (  6.396844)
openstruct:   127.070000   1.610000 128.680000 (131.303417)
struct:         3.630000   0.040000   3.670000 (  3.691814)
class:          2.350000   0.010000   2.360000 (  2.415393)

Running Ruby tests in parallel using Rake, fork and Docker

In honor of Docker Global Hack Day, here's a recipe for running your Ruby tests in parallel using Docker and Rake on your local development machine.

This was inspired by Nick Gauthier's great post detailing how to parallelize Rails tests using a shell script and Docker. My use case was slightly different (Sinatra, not Rails), and I wanted to have my dependencies running in multiple Docker containers instead of a single one. I also wanted to avoid the cost of running `bundle install` each time the tests run. This means that the Ruby tests will run outside of Docker, pointing to the running Docker containers for their infrastructure dependencies.

To do this, I'm using Vagrant on OS X via the Docker Vagrant distribution. This means that I had to customize the Vagrant VM to include the version of Ruby I needed, install Bundler there, and map a directory to my host file system so I could run tests against the same code I am working on.

Clearly, this won't isolate the Ruby environment in its own Docker container. However, we don't typically experience problems that stem from differences in Ruby or Gems, so this tradeoff was one I was willing to make.

Docker containers

Using this technique you can run multiple sets of Docker containers, each running a discrete portion of your infrastructure (in my case it's Redis and Solr). I'm going to assume you have your own Docker images configured in a working Docker installation (I used Docker 0.6 and 0.7 for my tests).

Rakefile task

We'll be creating a custom Rake task to start the Docker containers and run the tests; here's a sample:
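The original task was embedded as a gist; here's a hedged sketch of the shape it took. The image names, ports, sleep time, and test file layout are assumptions, and the Docker flags follow a modern CLI rather than the 0.6/0.7 syntax of the time:

```ruby
# Rakefile -- sketch of a fork-per-container-set test task
require "rake"

FORKS = 4

task :parallel_test do
  test_files = Dir.glob("test/**/*_test.rb")
  slice_size = [(test_files.size / FORKS.to_f).ceil, 1].max
  groups = test_files.each_slice(slice_size).to_a

  pids = groups.each_with_index.map do |files, i|
    fork do
      # Each fork gets its own Redis and Solr containers on offset ports
      system("docker run -d -p #{6379 + i}:6379 --name redis#{i} redis")
      system("docker run -d -p #{8983 + i}:8983 --name solr#{i} solr")
      sleep 5 # let the processes inside the containers finish starting

      ENV["REDIS_PORT"] = (6379 + i).to_s
      ENV["SOLR_PORT"]  = (8983 + i).to_s
      files.each { |f| require File.expand_path(f) } # MiniTest autoruns
    end
  end

  pids.each { |pid| Process.wait(pid) }
end
```

Each fork reads its ports from the environment so the test helpers can point at the right containers.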

The Rake task can be configured to run as many forks as necessary. Remember, your entire test time will only be as short as your longest individual test, so you may want to break test files down to improve the overall efficiency. The Rake task invokes Docker using a system call, and I found it was important to let the fork sleep for a bit to allow the process within the container to start.

We use MiniTest (included in Ruby 2.0.0) as our test framework, and the `` method invokes the test runner explicitly.

Getting better output

One of the first things you will notice is that output from the tests in each fork gets printed as the tests run, so tests in fork 0 will print output alongside tests from fork 3. This makes it hard to track down errors and view the overall test results. To get around this, let's modify the code to redirect STDOUT and STDERR for each fork while the tests run:
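The modified fork body lived in a gist; here's a self-contained sketch of the technique, with `run_tests` standing in for whatever your fork actually runs:

```ruby
# Sketch: each fork writes to its own log file, and the parent prints
# each log as one contiguous chunk after all forks finish
FORKS = 2

def run_tests(fork_number)
  puts "results from fork #{fork_number}" # stand-in for the test runner
end

pids = []
FORKS.times do |i|
  pids << fork do
    log = File.open("fork_#{i}.log", "w")
    STDOUT.reopen(log)
    STDERR.reopen(log)
    run_tests(i)
    STDOUT.flush
  end
end
pids.each { |pid| Process.wait(pid) }

# Print each fork's output together instead of interleaved
FORKS.times { |i| puts File.read("fork_#{i}.log") }
```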

Now all of the output for each fork gets printed together.

Making sure we get the output

The final step is to make sure our output is available even when we hit an exception somewhere (hidden errors in tests are very bad). We can do this with a begin/rescue/ensure block:
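A sketch of the guarded fork body (the simulated failure is for illustration only):

```ruby
FORKS = 2

pids = []
FORKS.times do |i|
  pids << fork do
    log = File.open("fork_#{i}.log", "w")
    STDOUT.reopen(log)
    STDERR.reopen(log)
    begin
      raise "boom" if i == 1 # simulate a crashing test run
      puts "fork #{i} passed"
    rescue => e
      puts "fork #{i} failed: #{e.message}"
    ensure
      STDOUT.flush # buffered output reaches the log no matter what
    end
  end
end
pids.each { |pid| Process.wait(pid) }

# The logs survive even when a fork hit an exception
FORKS.times { |i| puts File.read("fork_#{i}.log") }
```

The `ensure` clause is what guarantees the flush happens on both the happy and unhappy paths.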

You can view the whole finished task as well.

How much better is it?

In short, a lot. It will depend on your specific scenario, however. In my case, I saw average test times drop from 25-30 minutes to 3-5 minutes. There are some opportunities for improvement as some forks always finish well before others. Splitting test files into smaller, more discrete groups of methods would help distribute the load a bit, but again the whole process can only be as short as your longest test file.

Suggestions for improvements are welcome. You can find me at @palexander.

How to compile and run mosh (Mobile Shell) on Dreamhost servers

mosh is a very nice way to connect to remote servers via the command line. It works in conjunction with ssh for authentication, but after that it's a totally new client/server protocol.

mosh helps people who move around a lot stay connected and also has some advantages over ssh in terms of responsiveness, which is nice on slower connections (like when you're connecting via your cell on a train).

Getting mosh working on Dreamhost isn't totally straightforward, but it is pretty simple if you are comfortable working on the command line. I'll give a step-by-step guide on installing and running mosh 1.2.2 with Protocol Buffers 2.4.1 below (there is definitely room for improvement, as you'll see, and suggestions are welcome).
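The original step-by-step commands were embedded in a gist; the outline below is a reconstruction of the standard build-from-source pattern using the versions from the post. The `~/src` layout and `~/local` prefix are assumptions:

```shell
# Sketch: build into a home-directory prefix, since you have no root
PREFIX=$HOME/local

# 1. Build Protocol Buffers 2.4.1 into the local prefix
cd ~/src/protobuf-2.4.1
./configure --prefix=$PREFIX
make && make install

# 2. Build mosh 1.2.2 against the local protobuf
cd ~/src/mosh-1.2.2
export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig
./configure --prefix=$PREFIX
make && make install

# 3. Make the binaries and libraries visible via your shell profile
echo 'export PATH=$HOME/local/bin:$PATH' >> ~/.bash_profile
echo 'export LD_LIBRARY_PATH=$HOME/local/lib' >> ~/.bash_profile
```

One common workaround when the profile exports don't take effect is to point the client at the server binary explicitly: `mosh --server="$HOME/local/bin/mosh-server" user@yourhost.dreamhost.com`.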

Despite trying both .bash_profile and .profile, I couldn't get the export from the last line to stick. Same thing for the path settings. I'm not sure what kind of shell mosh invokes when it tries to run itself on the server, nor do I know how to get the environment variables to work there. Suggestions welcome!

Installing Drush 4.5 or 5 on Dreamhost Using PHP CLI

There are several guides out there for running Drush on Dreamhost. Most require that you set an alias and run the drush.php file directly. However, you can do something just as easy: make Drush run via the provided shell script, which seems safer to me.

You'll want to do the following steps from the command line (skip to Step 3 if you already have Drush downloaded and just need to make it work properly):

  1. Download drush to your Dreamhost home directory:
  2. Unzip the file:
  3. Edit your ~/.bash_profile file:
    nano ~/.bash_profile
  4. Add the following line at the end:
    export DRUSH_PHP=/usr/local/php5/bin/php
  5. Optional: You can also add an alias for drush in your .bash_profile so that you can just run 'drush' at the command line without typing the full path (i.e. ~/drush/drush):
    alias drush=$HOME/drush_path/drush
  6. From here you can either 'exit' and re-login or just run:
    source ~/.bash_profile
  7. Now you can run drush via the shell script:

Other Dreamhost Guides:

Remove a password from Git's commit history without deleting entire files

Ever accidentally committed a password in a file that you actually need? One of the advantages of using Git is that you can fix your horrible blunder with relative ease.

This example uses some code from Kevin van Zonneveld's helpful post and some knowledge from GitHub's guide to removing sensitive data. Kevin's example uses a new Git repo, but I used a similar process to fix an existing one.

I'm also using GitHub as my Git host, so there may be changes if you need to do this elsewhere.

WARNING: This can break your stuff. Back up and be careful.
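The original commands were embedded as a gist; the core of the approach is `git filter-branch` with a tree filter that rewrites the offending file in every commit. Here's a self-contained sketch against a throwaway repo (the file name and password are made up):

```shell
set -e
# Throwaway repo for demonstration; in real life, run the filter-branch
# step in your own clone
dir=$(mktemp -d) && cd "$dir"
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo

echo "password: hunter2" > config.yml
git add config.yml && git commit -qm "add config"

# Rewrite every commit, replacing the password in config.yml
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --tree-filter \
  "test -f config.yml && sed -i 's/hunter2/REDACTED/g' config.yml || true" \
  -- --all

# Drop the backup refs filter-branch leaves behind, then prune
rm -rf .git/refs/original
git reflog expire --expire=now --all
git gc --prune=now -q

git log -p --all | grep hunter2 || echo "password removed from history"
```

In a real repo you would then force-push (`git push origin --force --all`) and have collaborators re-clone, per GitHub's guide.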

Extending BioPortal's Rails UI

The BioPortal Web UI is primarily built using Ruby on Rails, which is an MVC framework built for the web. The views are plain HTML with the option to include Ruby in tags. We'll walk through a quick tutorial that will enable you to add your own sections of content on top of the existing Web UI. For our example, we'll add a section that will list tools that can be used to work with the ontologies in BioPortal. The Tools section will be a simple list that includes a tool's name, a web address, and a description. This list will be stored in a database and entries can be added, edited, and deleted.

For more guides on working with Rails, see

This tutorial assumes that you have version .4 of the NCBO Virtual Appliance running in your virtualization environment.

  1. Log into Virtual Appliance with the root user
  2. Run cd /srv/ncbo/rails/BioPortal/current
  3. Run script/generate scaffold Tool name:string website:string description:string
  4. For more information on scaffolding, see the Ruby on Rails Guides
  5. You will see some output indicating that Rails is creating files for you. Rails scaffolding will create a default set of view templates, a controller, and a model. It will also create a migration for the corresponding database table. Our example will have three columns, 'name', 'website', and 'description', all of which hold string values.
  6. To actually create the database table, you'll have to run the migration using rake db:migrate RAILS_ENV=production
  7. Now that we have setup our new content area, we can restart the server to get the changes to show up. To do that, run ncborestart
  8. Visiting http://yourappliance/tools will show an empty default page with options to create a new tool. However, it doesn't look like BioPortal and it's missing all of the links to other parts of the site. We'll need to change the layout that the section is using.
  9. If we open the newly-created 'tools_controller.rb' file under app/controllers we can change the layout used by adding layout "ontology" directly under the 'class ToolsController < ApplicationController' line. If you refresh the page you can see it now shows the usual BioPortal layout. You can also read more about Rails layouts.
    • Note: You may need to run ncborestart again for these changes to take effect. To avoid this, you can run Passenger in the development environment, which will reload all controllers and templates on each request.
  10. The Tools section is fully functional at this point; you can add and remove tools using the generated code. The final piece is adding a link at the top of the page for your new section.
  11. The header links are contained in a sprite located here: 'public/images/layout/bp_nav_sprite.png'. This can be extended to include your own button using the original PSD file.
  12. The sprite contains two images per button, one for the "normal" state (white) and one for the "rollover/active" state (gray).
  13. Once you've added the two new images for the button in the sprite, export it as a png replacing the original 'bp_nav_sprite.png' file.
  14. You can add the HTML for the new button in the 'app/views/layouts/_topnav.html.erb' file after the section for 'Projects' (line 50) as follows:
  15. Edit the file 'public/stylesheets/layout/bioportal.css' and look for the section titled 'Topnav header image sprite'. Add the following CSS code to enable the new sprite (this assumes you created the two new images below the existing 'Browse' buttons):
  16. Refreshing the page should show your changes, including the link in the top navigation
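The HTML and CSS snippets for steps 14 and 15 weren't carried over; here are hedged sketches. The element id, sprite offsets, and widths are assumptions — match them to your actual `_topnav.html.erb` markup and sprite dimensions:

```erb
<%# Step 14 sketch: a nav entry after the 'Projects' section in
    app/views/layouts/_topnav.html.erb; the id is an assumption %>
<li>
  <a id="nav_tools" href="/tools">Tools</a>
</li>
```

```css
/* Step 15 sketch: assumes the two new images sit directly below the
   existing 'Browse' images in bp_nav_sprite.png; offsets are made up */
#nav_tools {
  background: url(../images/layout/bp_nav_sprite.png) no-repeat 0 -350px;
}
#nav_tools:hover,
#nav_tools.active {
  background-position: 0 -385px;
}
```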

Update 11/28/11: Clarified instructions and added a screenshot of the Tools section

Barry Smith's Course "An Introduction to Ontology: From Aristotle to the Universal Core" Available Online

Barry Smith, a philosopher turned ontologist, has made a two-day course available online that introduces the concept of ontology and delves into its history, practical application, and connection to computer science. For those wishing to get a firm grasp on what ontologies are and how they're used, I highly recommend that you take the time to give these lectures an in-depth listen.

I find Barry to be a captivating lecturer capable of distilling some very abstract concepts down to understandable, actionable pieces of knowledge. I saw him speak to a group of developers as part of the National Center for Biomedical Ontology's recent meeting, where he urged all developers to at least give his course some attention. From other people in the ontology community, I understand that he has a very particular viewpoint on the ontology world. But I found it useful and hope others will too. Any pointers to other educational pieces are welcome in the comments.

The videos, available for streaming below, are part of a two-day course that Barry teaches. Barry indicates that they are free to use in any capacity, so I took the liberty of uploading them to Viddler (the only free streaming video site I could find that would allow long videos with no special account).

Ontology as a Branch of Philosophy

Ontology and Logic

The Ontology of Social Reality

Why I Am No Longer a Philosopher (or: Ontology Leaving the Mother Ship of Philosophy)

Why Computer Science Needs Philosophy

Ontology and the Semantic Web

Towards a Standard Upper Level Ontology

Ontology and the US Federal Government Data Integration Initiative

New BioPortal Deployment Options

Hopefully this will be of use for those who are deploying BioPortal on their own servers:

The BioPortal team is happy to announce some changes that will ease the life of those who deploy the BioPortal UI application in stand-alone instances. This should make it easier for you to upgrade BioPortal as we release new versions, and it lays down a framework for us to add new functionality without breaking your installations in the future. Previously, things like the autocomplete, found in the Jump To and Form Complete widgets, were hard-coded to a single AJAX back-end. That's no longer the case.

See more: BioPortal UI Now Supports Easier Deployment -- and Internationalization

Adding a Footer with Additional Content to jQuery Autocomplete

I was recently revamping some user-embeddable widgets for BioPortal. In addition to formatting the results in the autocomplete, we decided to include a tag at the bottom with a link to the BioPortal site indicating where the results were coming from.

It wasn't immediately obvious how to insert data into the ac_results div. Using JavaScript to programmatically place the required HTML was the first option I explored, but the autocomplete plugin won't work if information is placed into the div before the results are inserted there.

As a second option, I tried overriding the formatItem function to insert an additional row of data containing the information. This worked, but the additional data was treated like a result entry, meaning you could select it with the mouse or keyboard from the list. I wanted the text to act and appear distinctly different from the results.

Finally, I resorted to modifying the jQuery autocomplete plugin to handle additional HTML passed as a string via a new 'footer' option. In the showResults function, I added a line to check for the footer option and append the data provided, if found.

Then, when creating the autocomplete, you can add the footer information like this:
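The actual patch touched the plugin's showResults function, which builds DOM nodes with jQuery. As an illustration of the logic, here's a string-based sketch that mimics the markup it produces; the function and option names are illustrative:

```javascript
// Sketch of the footer logic: build the results list, then append
// non-selectable footer HTML when the 'footer' option is present
function buildResultsHtml(items, options) {
  var html = "<ul>" + items.map(function (item) {
    return "<li>" + item + "</li>";
  }).join("") + "</ul>";

  // New 'footer' option: extra HTML after the results, styled so it
  // cannot be selected like a result row
  if (options && options.footer) {
    html += '<div class="ac_footer">' + options.footer + "</div>";
  }
  return html;
}

// Usage mirroring the autocomplete options
var out = buildResultsHtml(
  ["Melanoma", "Melanocyte"],
  { footer: 'Results from <a href="http://bioportal.bioontology.org">BioPortal</a>' }
);
console.log(out);
```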

The resulting output is shown above. Suggestions welcome!

Web Services: Allow me to use sub-addressing in my email!

I like using one email account for all of my email needs, and one way I try to cut down on possible spam or unwanted email is to use sub-addressing when I sign up for a new service. It's certainly not foolproof, but it does provide an easy way to filter mail when that "Unsubscribe Me" link suspiciously fails to work.

Sub-addressing is simple to use; that's the main reason I like it. Gmail supports this, as it should, since it's included in the email RFC. To use sub-addressing, simply add a plus sign after the "local" part of your email (the section before the @). So, if you were signing up for Great Minds, you could use this email:

The problem comes in when services try to validate your email before allowing you to sign up. Many, many web services will not allow plus signs when you sign up, with varying degrees of success in handling a user who inputs one. Most services just reject the address as invalid and force you to try again. Some (Safeway) just strip the plus sign out without ever informing you, leaving you with an invalid email address in your account settings and, since you use your email to log in, an invalid username as well.
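Accepting sub-addresses doesn't require anything exotic. As a sketch, here's a permissive Ruby check that keeps the plus sign; it's a simplified pattern for illustration, not a full RFC 5322 grammar:

```ruby
# Simplified email validation that allows sub-addressing ('+' in the
# local part); intentionally loose rather than RFC-complete
VALID_EMAIL = /\A[\w.+-]+@[a-z\d-]+(\.[a-z\d-]+)+\z/i

def valid_email?(address)
  !!(address =~ VALID_EMAIL)
end

valid_email?("you+greatminds@gmail.com")  # => true
valid_email?("you@gmail")                 # => false
```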

Today I was surprised by a new response: a "helpful" employee went in and changed my email for me, thinking it was a typo.

You didn't receive an email with your decline notice and instructions to fix your [company-name] purchase because there was a typo in the email address you entered, specifically [my-email] (but with "[company-name]" written out in the address). I added the correct email address to your account...