Hacker News — harlow's comments

We have a Git repo for each of the services, and we have a separate repo for each of the Fleet Unit files.

> Git push to only one repository.

This refers to a per-service repository in our deployer app.

Previously we had to push code to each of the servers running a service. Now we push to the deployer app and it leverages Fleet to distribute the code across the available boxes.
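To make that concrete, a Fleet unit is just a systemd unit with an optional `[X-Fleet]` section; the sketch below is a hypothetical template (service name, image, and ports are invented, not our actual units):

```ini
# app@.service — hypothetical Fleet unit template for one service
[Unit]
Description=Example API service (name and image are placeholders)
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f app-%i
ExecStart=/usr/bin/docker run --name app-%i -p 3000:3000 example/app
ExecStop=/usr/bin/docker stop app-%i

[X-Fleet]
# Spread instances across machines so one host failure doesn't take out the service
Conflicts=app@*.service
```

With a template like this, `fleetctl start app@1.service app@2.service` schedules instances across the cluster, and the `Conflicts` rule keeps them on separate boxes.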


Hi manigandham, Harlow here (author of the post). You are absolutely correct; serving 2M API requests from a local data store wouldn't be considered "at-scale" these days.

In hindsight I should have added more information about where the actual work is being done -- I definitely missed the mark on parts of this post.

The lion's share of our work is done asynchronously. 2M API requests (lookups) turn into 40M+ background jobs. These jobs fetch, aggregate, and scrub data from a number of downstream providers.


Great question. We have loads of Sinatra services running on CoreOS (using Fleet and Git for deploys): http://blog.clearbit.com/servers-part-one. From there we have a contrib gem to share middleware between the services. These middlewares handle auth, rate limiting, CORS, etc.
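As a rough illustration of the shared-middleware idea, here's a minimal Rack auth middleware of the kind that might live in such a gem. This is a sketch, not Clearbit's actual code: the class name, header handling, and token check are all invented.

```ruby
# Hypothetical shared Rack middleware. Each Sinatra service could opt in
# with:  use SimpleAuth, valid_tokens: ENV["API_TOKENS"].to_s.split(",")
class SimpleAuth
  def initialize(app, valid_tokens:)
    @app = app
    @valid_tokens = valid_tokens
  end

  def call(env)
    # Pull a bearer token out of the Authorization header (placeholder scheme).
    token = env["HTTP_AUTHORIZATION"].to_s.sub(/\ABearer /, "")
    if @valid_tokens.include?(token)
      @app.call(env)
    else
      [401, { "Content-Type" => "application/json" }, ['{"error":"unauthorized"}']]
    end
  end
end
```

Because it's plain Rack, the same class drops into every Sinatra service unchanged, which is what makes sharing it via a gem practical.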


> Can you give any detail on what the end result is? What kind of data do you end up with in redshift, and what kind of queries?

I'll have to get our BI team to create a follow-up post.

In a lot of ways it's the most interesting part of this project. I'm not entirely sure what data we can share; I'll push to get something out.

> What kind of data do you extract from mixpanel?

The mobile devices push user interactions with the app as events to Mixpanel. We pull that data daily into Redshift, and this allows us to run historical reports and discover patterns of user behavior within the mobile app.

> Is all this running continuously, or different schedules for each worker? Any of these event based rather than schedule based?

The Extractors are schedule-based. With Mixpanel, for example, we do a daily dump around 4am (once all the Mixpanel data is available for export).

We push our Rails events to IronMQ, and the scheduler kicks off workers every 15 minutes to pull them off.

The `Transformers` and `Loaders` are event-based: the Extractors kick them off once they've completed their work.
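The schedule-vs-event split above can be sketched in a few lines. This is illustrative only; the class name and callback wiring are made up, not our pipeline's actual code.

```ruby
# Hedged sketch: a scheduler runs the Extractor; the downstream step is
# event-based, kicked off only when extraction completes.
class Extractor
  def initialize(on_complete:)
    @on_complete = on_complete # e.g. enqueues a Transformer job
  end

  # Invoked by the scheduler (daily at 4am for Mixpanel, every 15 min
  # for the Rails events pulled off IronMQ).
  def run
    rows = extract
    # Event-based handoff: fires only after extraction has finished.
    @on_complete.call(rows)
    rows
  end

  private

  def extract
    # Placeholder: would hit Mixpanel's export API or drain a queue here.
    [{ event: "app_open", user: 1 }]
  end
end
```

The point of the callback is that Transformers and Loaders never run on their own clock, so they can't race ahead of a slow extract.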


> Overly short methods can make code much harder to read and follow and actually increase complexity, because now there are many more entry points.

Ideally there wouldn't be any more entry points. Creating multiple short, well-named private methods should make the code easier to follow. A side benefit of well-named methods is that they act as simple documentation for readers.

> Case statements can be a bad code smell, just as is_a?(...), but not in every situation.

More often than not, a case statement will catch my eye during code review. Sometimes they're fine, but they can usually be avoided.
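The most common way to avoid one is the "replace conditional with polymorphism" refactoring. A hedged sketch with invented class names:

```ruby
# Before, the caller might have switched on a symbol:
#   case format
#   when :pdf  then "PDF: #{text}"
#   when :html then "<p>#{text}</p>"
#   end
# After, each format knows how to render itself:
class Pdf
  def render(text)
    "PDF: #{text}"
  end
end

class Html
  def render(text)
    "<p>#{text}</p>"
  end
end

# The caller just delegates; adding a new format means adding a
# class, not editing every case statement that mentions formats.
def export(formatter, text)
  formatter.render(text)
end
```

This is also why `is_a?(...)` checks smell the same way: both are type switches that polymorphism handles for you.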

> Read "Design Patterns" by the GoF and "Refactoring" by Martin Fowler also.

It's so important to read these; they're great for all developers. The idea behind Ruby Science is to show how to apply those techniques in a Ruby on Rails application, with real code, real refactoring, and a history of Git commits showing the exact changes we're making.



