NoSQL != No Indexes

MongoMapper has been a godsend for us. It has allowed us to adopt many of the conveniences of Rails’ ActiveRecord while getting the benefits of MongoDB as a backend. But even though MongoDB is super fast, you still need to think about database structure (just document-oriented) and important concepts like indexes.

Sometimes MongoDB is so fast that it’s easy to forget about things like this and assume everything will be fast all the time. We needed to queue up about 100,000 Delayed Jobs for background processing. These jobs were fairly straightforward, but the process went far from ideally at first.

We created an admin method to first load the jobs into the queue:

def self.add_linkedin_industry_to_employers
  Employer.all.each { |employer| employer.delay.update_industry_from_linkedin }
end

We unleashed Hirefire (awesome app, by the way…) on the queue and let 60 Heroku workers loose. As the jobs were loading, they were flying out of the queue. That is, until we got to about 20,000 jobs. Then all hell broke loose. And by “all hell broke loose”, I mean nothing happened… Absolutely NOTHING. I thought that maybe there was a little too much going on, so I stopped most of the workers and let the queue fill up. Bad idea…

Once the jobs were loaded, it was getting late, and the effects of the jobs running weren’t anything to worry about, so I went to bed hoping the queue would be empty in the morning and I would jump for joy at the success of running 100,000 jobs in my sleep, literally…

The morning came and there sat 100,000 queued jobs. In probably 7 hours, nothing had moved (this is all with 60 workers). So clearly something was wrong. So I did what any “sane” developer would do: I cleared the queue and did it all over again, assuming it was just bad luck ;)

Honestly, what came next was a foray of testing and measurement to find out what the hell was going on. In the end, it was one simple problem. An index… A stupid little index. Well, big index in this case. It turns out that with that many jobs in the queue, picking a job off the queue and updating it took so long that nothing was going anywhere.
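To give a feel for why the missing index hurt so much, here's a rough, stdlib-only sketch. It is not Delayed Job's actual query, and it assumes workers want the job with the lowest priority number and the earliest run_at. The Job struct and method names are made up for illustration.

```ruby
# Hypothetical stand-in for a queued job document.
Job = Struct.new(:id, :priority, :run_at)

# Without an index: every pick has to look at every job in the queue.
def next_job_by_scan(jobs)
  jobs.min_by { |job| [job.priority, job.run_at] }
end

# With an index: entries are kept ordered by [priority, run_at] once,
# so the next job is simply the first entry.
def build_index(jobs)
  jobs.sort_by { |job| [job.priority, job.run_at] }
end

def next_job_by_index(index)
  index.first
end

# Builds a queue of the given size with made-up priorities and run times.
def sample_jobs(count)
  Array.new(count) { |i| Job.new(i, i % 5, count - i) }
end

jobs = sample_jobs(100_000)
index = build_index(jobs)
puts next_job_by_scan(jobs).id == next_job_by_index(index).id
```

Both approaches find the same job. The difference is that the scan repeats its full pass for every single pick, and with 60 workers all picking at once against 100,000 jobs, that adds up fast.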

Creating the proper index in the delayed_jobs collection fixed everything:

Delayed::Backend::MongoMapper::Job.ensure_index([[:priority, 1], [:run_at, 1]])

This caused us to review all of our indexes and create an index initializer that contains the indexes for our entire application. In the end, I feel like the application is safer and more fool-proof. Although the solution was simple, getting there was not.

Side note: Review the documentation of all your gems. We found this fix after reviewing the documentation of the gem we use to run Delayed Job on MongoMapper. There it is, front and center. Had we just followed the instructions from the start, we wouldn’t have been in this situation. But like reading manuals, there’s no fun in reading documentation when it can take you on a journey like this.
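The idea behind an index initializer can be sketched like this. It's a simplified, stdlib-only illustration rather than our actual initializer: FakeModel is a hypothetical stand-in for a MongoMapper model that only records what it's asked to index, and the :name index on employers is made up.

```ruby
# Hypothetical stand-in for a model; it only records requested indexes.
class FakeModel
  attr_reader :indexes

  def initialize
    @indexes = []
  end

  # Same call shape as the ensure_index line above; repeated calls are no-ops.
  def ensure_index(spec)
    @indexes << spec unless @indexes.include?(spec)
  end
end

# Applies one central list of index specs, keyed by model.
def apply_indexes(index_specs)
  index_specs.each do |model, specs|
    specs.each { |spec| model.ensure_index(spec) }
  end
end

delayed_jobs = FakeModel.new
employers = FakeModel.new

index_specs = {
  delayed_jobs => [[[:priority, 1], [:run_at, 1]]],
  employers => [[[:name, 1]]] # made-up example index
}

apply_indexes(index_specs)
apply_indexes(index_specs) # safe to run again on every boot
```

Keeping every index in one list means a missing one is easy to spot in code review, instead of at 100,000 queued jobs.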


Guide to Making Your Upgrade to Rails 3.1 Not Suck. (It did for us)

December 20, 2011

Rails 3.1 brought on a set of changes that really excited us. Because we interface with a number of social platforms, the number of asset includes in our application has spun wildly out of control. Combine that with the assets related to the styling and function of our application and the list looks like a short novel (maybe not that long, but let’s assume it is for effect). So the inclusion of the asset pipeline in Rails 3.1 made a lot of sense for us.

The transition was relatively straightforward. However, there were a few “gotchas”, which I’ll point out so your experience is hopefully smoother.

If you have a skeleton Rails application that doesn’t do anything out of the ordinary, everything’s as documented here. We were upgrading a legacy application so the notes at the bottom were particularly helpful.

As part of the process, we moved our static assets that used to reside in the /public directory to a new /app/assets directory. This was straightforward. However, the manifest file that’s created to tell sprockets what to scrunch together requires a little more thought and discussion.

The stock application.js looks like this:

// This is a manifest file that'll be compiled into including all the files listed below.
// Add new JavaScript/Coffee code in separate files in this directory and they'll automatically
// be included in the compiled file accessible from http://example.com/assets/application.js
// It's not advisable to add code directly here, but if you do, it'll appear at the bottom of the
// the compiled file.
//
//= require jquery
//= require jquery_ujs
//= require_tree .

As many articles mention, “require_tree .” instructs sprockets to load all files in the directory (typically /app/assets/javascripts) and include them in your application. However, a problem arises if you have page-specific code in them, because that code will run on every page load. An alternative is to create a directory within /app/assets/javascripts that holds only general scripts for the entire application, the kind that won’t cause a problem if loaded on every page. Combine that with a change in the manifest file like the following, and you’ve got yourself some working assets:

// This is a manifest file that'll be compiled into including all the files listed below.
// Add new JavaScript/Coffee code in separate files in this directory and they'll automatically
// be included in the compiled file accessible from http://example.com/assets/application.js
// It's not advisable to add code directly here, but if you do, it'll appear at the bottom of the
// the compiled file.
//
//= require jquery
//= require jquery_ujs
//= require_directory ./global

This will load only those assets in the /app/assets/javascripts/global directory. For page-specific content, you can use the typical include on that page, pointing directly to a file not loaded by the manifest file, like:

= javascript_include_tag "users/index.js"
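To make the difference between the two directives concrete, here's a rough, stdlib-only sketch of which files each one would pick up. It approximates the directives with Dir.glob rather than calling sprockets itself, and the file names are made up.

```ruby
require "tmpdir"
require "fileutils"

# Approximates `require_tree`: every .js file under base, recursively.
def tree_files(base)
  Dir.glob(File.join(base, "**", "*.js")).map { |f| f.delete_prefix("#{base}/") }.sort
end

# Approximates `require_directory`: only .js files directly inside base.
def directory_files(base)
  Dir.glob(File.join(base, "*.js")).map { |f| f.delete_prefix("#{base}/") }.sort
end

# Builds a throwaway javascripts directory and reports what each approach sees.
def directive_demo
  Dir.mktmpdir do |root|
    %w[global/layout.js global/tracking.js users/index.js].each do |path|
      FileUtils.mkdir_p(File.join(root, File.dirname(path)))
      FileUtils.touch(File.join(root, path))
    end
    { tree: tree_files(root), directory: directory_files(File.join(root, "global")) }
  end
end

p directive_demo
```

The tree version sweeps up users/index.js along with everything else, while the directory version stays inside global, which is exactly what we want for page-specific scripts.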

Gotcha #1

Using a Mongo-related ORM generally requires that you not load ActiveRecord. To do this, the following change is necessary in application.rb:

# require 'rails/all'
require "action_controller/railtie"
require "action_mailer/railtie"
require "active_resource/railtie"

We deploy to Heroku for staging and production. As part of the deployment and slug compilation process, Heroku runs the rake command to precompile assets. Sprockets is loaded as part of the “require ‘rails/all’” call that we commented out above. So if sprockets isn’t loaded, Heroku won’t know how to do this. To include sprockets in the application for Heroku, the following addition should be made to application.rb:

# require 'rails/all'
require "action_controller/railtie"
require "action_mailer/railtie"
require "active_resource/railtie"
require "sprockets/railtie"

Gotcha #2

For some reason, some of the files in our app/assets directory weren’t precompiled by default. To change this for our production environment, we made the following change to production.rb:

# Precompile additional assets (application.js, application.css, and all non-JS/CSS are already added)
config.assets.precompile += ["*.js", "*.css"]

The last line ensures that all JavaScript and CSS files are precompiled as part of Heroku’s precompile process. I’m still unsure why this wouldn’t be the default behavior, but it’s worth noting because it was a necessary change for us, and might be for you. We realized this after we got application error pages on load when we first deployed Rails 3.1 to our staging environment. The good news is that with the new asset pipeline, something like this can’t really slip by: if the application can’t load a particular asset, the entire application will freak out and let you know very quickly, prompting an intense look at what you just broke.
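Here's a small sketch of what the extra patterns buy you. It assumes the precompile list is matched against asset paths glob-style, roughly the way File.fnmatch works; that's an assumption about the internals, not something we verified. The asset paths are made up.

```ruby
# Returns the assets matched by at least one pattern. Without FNM_PATHNAME,
# "*" also matches "/", so "*.js" covers files in subdirectories too.
def precompiled(assets, patterns)
  assets.select { |asset| patterns.any? { |pattern| File.fnmatch(pattern, asset) } }
end

# Made-up asset paths mirroring the examples in this post.
assets = ["application.js", "application.css", "users/index.js", "global/layout.js"]

default_list = ["application.js", "application.css"]
extended_list = default_list + ["*.js", "*.css"]

p precompiled(assets, default_list)
p precompiled(assets, extended_list)
```

Under that assumption, a standalone file like users/index.js only makes it into the precompiled output once the wildcard patterns are added.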

Gotcha #3

This last one was something that baffled us for a few days. There’s very little documentation about it and even less about how to change it. This is partly due to the fact that it’s a behavior caused by the use of MongoMapper, Heroku, and Rails 3.1.

It goes like this: Heroku compiles your application slug without any ENV variables (which we use to specify the URL of our database). The rake task that precompiles the assets loads the entire Rails environment and, as a result, attempts to connect to the database. Because there is no database URL, Heroku, smartly, attempts to create a fill-in connection string based on the database adapter in your Gemfile. The result for us was that it tried to connect to the MongoDB port at localhost. However, there’s nothing on localhost listening on that port. MongoMapper didn’t like that and blew up, which caused the asset compilation process to fail, and thus the application to behave very strangely without assets.

Turns out a simple config change in application.rb will tell the asset process to not initialize on precompile. It’s almost like the Rails gods were looking down on us, knowing this would be an issue and gave us a fix.

config.assets.initialize_on_precompile = false
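For the curious, the failure mode looks roughly like this. It's a stdlib-only sketch, not what Heroku or MongoMapper actually run: MONGO_URL is a hypothetical variable name, the hosted URL is made up, and 27017 is MongoDB's default port.

```ruby
require "uri"

# Fallback used when no database URL is configured (MongoDB's default port).
DEFAULT_MONGO_URL = "mongodb://localhost:27017"

# Returns the host:port the app would try to reach for a given environment.
# MONGO_URL is a hypothetical variable name used for illustration.
def mongo_target(env)
  uri = URI.parse(env.fetch("MONGO_URL", DEFAULT_MONGO_URL))
  "#{uri.host}:#{uri.port}"
end

# At runtime the variable is set, so the app reaches the real database.
puts mongo_target("MONGO_URL" => "mongodb://user:secret@db.example.com:10042/app")

# During slug compilation nothing is set, so the app falls back to localhost.
puts mongo_target({})
```

With initialize_on_precompile turned off, the precompile task never gets far enough to attempt that second connection.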

Conclusion

With the changes mentioned above, we were on our way to smokin’ fast asset loading and far fewer asset includes than our application and users had ever seen.


Meeteor vs. The Social Giants

December 13, 2011

At Meeteor, we spend a fair amount of time leveraging data from various social platforms. Most have standardized the use of OAuth for authentication and this has allowed us to use similar methods for each one, but authentication is only the tip of the iceberg. Once authenticated, the idiosyncrasies of each service rear their head.

Facebook

To avoid the headache of supporting password resets, email confirmations, etc., and since Meeteor makes heavy use of your social graph, we rely on Facebook to manage users.

Because we use Rails, Facebook connection and authentication was a gem away. We started out using our own homegrown authentication code and data querying methods with the OAuth and Typhoeus gems, but quickly realized that proper error handling and keeping up with the ever-changing Facebook API were better left to another tool. Enter Koala. Koala has allowed us to create simple cover methods for getting data to and from Facebook.

gem 'koala'

Once we receive back an access token after the initial authentication, we’re free to make whatever queries against Facebook the user has allowed. For example, to get friends’ data:

def graph(token = nil)
  Koala::Facebook::GraphAPI.new(token)
end

def make_friends_request(user)
  graph(user.fb_token).get_connections('me', 'friends')
end

Koala, conveniently, stuffs the JSON returned from Facebook into a hash that we can then manipulate in our application. By default, Koala will return the ID and name from Facebook. You can specify additional attributes to return if necessary:

def make_detailed_friends_request(token)
  graph(token).get_object("me/friends", {fields: "id,name,education,work"})
end

This allows us to get information about friends’ educational and work experience, which we can then leverage to help users visualize the strength and reach of their networks.
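As a taste of what manipulating those hashes looks like, here's a small sketch that counts how many friends list each employer. The sample data is made up, and its shape (a "work" array with a nested "employer" hash) is our assumption about what comes back, so treat it as illustrative.

```ruby
# Counts how many friends list each employer at least once.
def employer_counts(friends)
  counts = Hash.new(0)
  friends.each do |friend|
    # Friends who share no work history simply have no "work" key.
    names = (friend["work"] || []).map { |job| job.dig("employer", "name") }.compact.uniq
    names.each { |name| counts[name] += 1 }
  end
  counts
end

# Made-up sample in the assumed shape of a friends response.
sample_friends = [
  { "id" => "1", "name" => "Ann", "work" => [{ "employer" => { "name" => "Acme" } }] },
  { "id" => "2", "name" => "Bob", "work" => [{ "employer" => { "name" => "Acme" } },
                                             { "employer" => { "name" => "Globex" } }] },
  { "id" => "3", "name" => "Cy" }
]

p employer_counts(sample_friends)
```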

LinkedIn

Users can also connect their LinkedIn account because not everyone stores their work history in Facebook. I, personally, use LinkedIn as an online resume with complete details of my work experience, while my Facebook profile remains pretty bare. Connecting to LinkedIn once I’m logged in to Meeteor is therefore a one-click approach to filling in my Meeteor profile. Meeteor can then provide me with better recommendations and matches.

Because LinkedIn uses OAuth as well, authentication is very similar to Facebook. Unsurprisingly, we use the LinkedIn gem.

gem 'linkedin'

Once armed with an access token, we can query LinkedIn like this:

require 'rexml/document' # needed for REXML::Document below

def client
  LinkedIn::Client.new(OAUTH_CREDENTIALS[:linkedin][:key], OAUTH_CREDENTIALS[:linkedin][:secret])
end

def make_profile_request(user)
  client = self.client
  client.authorize_from_access(user.linkedin_token, user.linkedin_secret)

  response = client.access_token.get("http://api.linkedin.com/v1/people/~:(id,first-name,last-name,headline,picture-url,specialties,educations,positions,connections,public-profile-url)").body

  # The response body is XML, so parse it into a document we can walk.
  REXML::Document.new(response)
end

There are several things to note here:

  • Our helper method “client” needs our LinkedIn application credentials to connect and ultimately receive data back. LinkedIn uses these credentials to track API usage and engage throttling if you go over the quota for that particular API call.
  • LinkedIn’s API is little more than a simple GET request with the specifics of the request embedded in the URL. However, before you begin querying to your heart’s content, remember that you need an access token before making this request; without one, you won’t get anywhere.
  • The responses from this particular query come back as XML. This has introduced another set of challenges while trying to simplify our codebase. Unfortunately, the code required to iterate through the data that comes back from this request is non-trivial, though fairly predictable for anyone who has dealt with detailed XML responses. I’ll leave it to your imagination to figure out the rest.
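To give a rough idea of the XML iteration mentioned in the last point, here's a minimal sketch using REXML. It isn't our production code: the sample document is made up, and its element layout (a person with nested positions) is our assumption about what the profile response looks like.

```ruby
require "rexml/document"

# Pulls a name and a list of positions out of a profile document.
def profile_from_xml(xml)
  person = REXML::Document.new(xml).root
  name_parts = [person.elements["first-name"], person.elements["last-name"]]
  positions = person.get_elements("positions/position").map do |position|
    # Safe navigation covers positions that omit a title or company.
    { title: position.elements["title"]&.text,
      company: position.elements["company/name"]&.text }
  end
  { name: name_parts.compact.map(&:text).join(" "), positions: positions }
end

# Made-up sample in the assumed shape of a profile response.
sample_xml = <<~XML
  <person>
    <first-name>Jane</first-name>
    <last-name>Doe</last-name>
    <positions total="2">
      <position><title>Engineer</title><company><name>Acme</name></company></position>
      <position><title>Intern</title><company><name>Globex</name></company></position>
    </positions>
  </person>
XML

p profile_from_xml(sample_xml)
```

The real thing has a lot more fields and a lot more nil checks, but the pattern stays the same: walk the elements and build plain hashes the rest of the app can use.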

We’ve noticed that some of the newer LinkedIn API documentation references the use of JSON. For us, this would be ideal, but right now it doesn’t seem to cover the API calls that we use the most.

Twitter

Twitter, like Facebook, makes it pretty simple to get data from its API. It’s like the happy little brother that just goes with the flow. As with both of the above, you must authenticate using OAuth before getting at any data. For this stream, we use the “oauth2” gem:

gem 'oauth2'

After having received an access token, we can get at “friends” (Twitter calls the people you are following “friends”) of a user like:

def oauth_access_token(token, secret)
  OAuth::AccessToken.new(oauth_consumer, token, secret)
end

def make_user_request(user)
  access_token = oauth_access_token(user.twitter_token, user.twitter_secret)
  response = access_token.get("/1/account/verify_credentials.json")

  JSON.parse(response.body)
end

Like Facebook, Twitter uses JSON to send data back from API requests. Rails makes it super easy to consume this data and turn it into hashes that you can quickly and easily manipulate and use elsewhere.
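As a quick illustration, here's what working with a parsed response might look like. The sample body is made up, and the summarize_account helper is hypothetical.

```ruby
require "json"

# Turns a raw JSON response body into a small summary hash.
def summarize_account(body)
  account = JSON.parse(body)
  { handle: "@#{account.fetch("screen_name")}",
    following: account.fetch("friends_count", 0) }
end

# Made-up sample body with a few account fields.
sample_body = '{"screen_name": "meeteor", "friends_count": 42, "followers_count": 7}'

p summarize_account(sample_body)
```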

Conclusion

All these platforms provide a wealth of information about users, almost too much at times. But we’re in the business of making use of this and offering the best possible recommendations we can based on the information we receive. Luckily, that’s our problem and not yours.