A Peek Behind the Curtain of Our New Open Source Page

We recently launched a new open source page. A separate blog post explains why we chose to highlight particular content; this one covers some of the interesting technical challenges we faced while building it.

Querying contributions with GraphQL

screenshot of the thoughtbot open source page showing some of our most popular projects

A lot of what occurs in the open-source world happens on GitHub, and thoughtbot is no exception. All our repositories are hosted there, so this was a natural place to start. We also wanted to highlight our team’s contributions to other open source projects in the community.

GitHub has a GraphQL API that allows us to query for all the information we need with a single request. We used the graphql-client gem to query the GitHub API. This gem allows us to declare queries as Ruby constants.

OpenSourceStatsQuery = Github::Client.parse <<~GRAPHQL
  # GraphQL query in here
GRAPHQL

We can then run the query like this:

github_response = Github::Client.query(OpenSourceStatsQuery)

This turned out to be a rather large query (there’s a reason it’s elided in the code sample above!). GitHub provides a convenient API explorer that allowed us to experiment and refine the query to get exactly the data we needed before plugging it into our code.
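graphql-client handles the HTTP side for us, but it can help to see what a single request to GitHub's GraphQL API looks like without the gem. The sketch below uses only Ruby's standard library. The query is illustrative rather than the one behind our page, `build_graphql_request` and `run_graphql` are hypothetical helper names, and `token` is assumed to be a GitHub personal access token.

```ruby
require "json"
require "net/http"
require "uri"

# GitHub's GraphQL API lives at a single endpoint; every query is a POST to it.
GITHUB_GRAPHQL_URI = URI("https://api.github.com/graphql")

# Illustrative query only (not the one behind our page): the first 100
# public repositories of an organization, with star and fork counts.
EXAMPLE_QUERY = <<~GRAPHQL
  query($login: String!) {
    organization(login: $login) {
      repositories(first: 100, privacy: PUBLIC) {
        nodes { name stargazerCount forkCount }
      }
    }
  }
GRAPHQL

# Builds (but does not send) the POST request a GraphQL client would make.
def build_graphql_request(query, variables, token)
  request = Net::HTTP::Post.new(GITHUB_GRAPHQL_URI)
  request["Authorization"] = "bearer #{token}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(query: query, variables: variables)
  request
end

# Sends the request and returns the "data" portion of the response,
# raising if GitHub reports GraphQL errors.
def run_graphql(query, variables, token)
  request = build_graphql_request(query, variables, token)
  response = Net::HTTP.start(GITHUB_GRAPHQL_URI.host, GITHUB_GRAPHQL_URI.port, use_ssl: true) do |http|
    http.request(request)
  end
  body = JSON.parse(response.body)
  raise "GraphQL errors: #{body["errors"]}" if body["errors"]
  body.fetch("data")
end
```

Splitting request construction from sending keeps the first half easy to inspect in a test or console session.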

Download numbers

screenshot of banner showing aggregate stats on the thoughtbot open source page

We also wanted download numbers for our projects. These are harder to get than repository metadata because GitHub doesn’t track them. Instead, each package registry, such as RubyGems, npm, or Homebrew, tracks its own.

RubyGems has an API that makes it easy to query downloads by owner. npm’s API, on the other hand, only has an endpoint for fetching stats on a per-package basis. We ended up having to follow a multi-step process:

1. Fetch the list of packages we own on npm
2. Fetch download stats for each package
3. Sum the results

This is less than ideal: it means making N+1 requests to npm (one to list our packages, then one per package, where N is the number of npm packages we own) in addition to the single request to RubyGems.
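Here is a minimal sketch of how the RubyGems request and the npm steps above might fit together. It assumes RubyGems’ `owners/<handle>/gems.json` endpoint, npm’s registry search with a `maintainer:` qualifier, and npm’s `downloads/point` API. The class name `DownloadTotals`, the injected `get_json` lambda, the `last-month` window (the post doesn’t say which period it counts), and the single 250-result page of search results are all assumptions for illustration.

```ruby
require "json"
require "net/http"
require "uri"

# Sums download counts from RubyGems and npm. The HTTP getter is injected so
# the counting logic can be exercised without touching the network.
class DownloadTotals
  RUBYGEMS_OWNER_URL = "https://rubygems.org/api/v1/owners/%s/gems.json"
  NPM_SEARCH_URL = "https://registry.npmjs.org/-/v1/search?text=maintainer:%s&size=250"
  NPM_DOWNLOADS_URL = "https://api.npmjs.org/downloads/point/last-month/%s"

  def initialize(rubygems_owner:, npm_maintainer:, get_json: nil)
    @rubygems_owner = rubygems_owner
    @npm_maintainer = npm_maintainer
    # Default getter performs a real GET and parses the JSON body.
    @get_json = get_json || ->(url) { JSON.parse(Net::HTTP.get(URI(url))) }
  end

  # One request: RubyGems lists every gem for an owner, downloads included.
  def rubygems_total
    gems = @get_json.call(format(RUBYGEMS_OWNER_URL, @rubygems_owner))
    gems.sum { |entry| entry.fetch("downloads") }
  end

  # N + 1 requests: list our packages, then fetch stats for each one.
  def npm_total
    search = @get_json.call(format(NPM_SEARCH_URL, @npm_maintainer))
    names = search.fetch("objects").map { |obj| obj.fetch("package").fetch("name") }
    names.sum do |name|
      @get_json.call(format(NPM_DOWNLOADS_URL, name)).fetch("downloads")
    end
  end

  def total
    rubygems_total + npm_total
  end
end
```

Because the getter is injected, a test can count calls: with two npm packages, `total` makes four requests (one to RubyGems, one npm search, and two npm stats lookups).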

Don’t trust the network

Our page depends on multiple network calls. We don’t want the open source page to crash just because one of these APIs happens to be down. To be more resilient, we cache the sum every time we successfully fetch from one of the APIs.

Now, if an API request fails, we can report the error and fall back to the most recently cached value.

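def total_downloads
  # Make API calls and sum the results
  # (rubygems_downloads and npm_downloads are illustrative helper names)
  total = rubygems_downloads + npm_downloads
  Rails.cache.write("rubygems_total_downloads", total)
  total # Rails.cache.write doesn't return the cached value, so return the sum explicitly
rescue => e
  ErrorTrackingService.capture_exception(e)
  Rails.cache.read("rubygems_total_downloads")
end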
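The snippet above relies on Rails. Here is a framework-free sketch of the same cache-and-fall-back idea as a reusable helper. `MemoryCache` is a hypothetical Hash-backed stand-in for `Rails.cache` (it only implements `read` and `write`), `with_cached_fallback` is a hypothetical name, and the `on_error` lambda stands in for `ErrorTrackingService`.

```ruby
# Hypothetical, minimal stand-in for Rails.cache so the pattern runs in plain Ruby.
class MemoryCache
  def initialize
    @store = {}
  end

  def write(key, value)
    @store[key] = value
    true
  end

  def read(key)
    @store[key]
  end
end

# Runs the block; on success, caches and returns its value. On any
# StandardError, reports the error and returns the last cached value (or nil).
def with_cached_fallback(cache, key, on_error: ->(e) { warn(e.message) })
  value = yield
  cache.write(key, value)
  value
rescue StandardError => e
  on_error.call(e)
  cache.read(key)
end
```

The helper returns `value` explicitly rather than the result of `cache.write`, since a cache write reports success rather than handing back what was stored. If the API has never succeeded, the fallback is `nil`.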

Modeling in Ruby

In Ruby, we have an OpenSourceStats object that takes in all these sources of data and provides a nice set of methods for interacting with it, including derived values such as total GitHub stars on thoughtbot projects. This is the facade pattern.

class OpenSourceStats
  attr_reader :projects, :contributions, :total_downloads

  def initialize(projects:, contributions:, total_downloads:)
    @projects = projects
    @contributions = contributions
    @total_downloads = total_downloads
  end

  def total_stars
    projects.sum(&:stargazer_count)
  end

  def total_forks
    projects.sum(&:fork_count)
  end
end

We also use a class method as an alternate constructor. The OpenSourceStats.fetch method works with other objects to fetch the appropriate data from different sources and then uses it to construct an instance of itself.

class OpenSourceStats
  def self.fetch
    # Work with other objects to fetch data from a variety of sources
    # (assigning the local variables used below), then build an instance of self

    new(
      projects: projects,
      contributions: contributions,
      total_downloads: total_downloads
    )
  end

  # ...
end
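Because `new` accepts plain data, the facade is easy to build from stand-in objects, while `fetch` is the only place that knows where real data comes from. This sketch makes that split explicit by passing collaborators into `fetch`. The `github:` and `downloads:` keywords, their `projects`/`contributions`/`total` methods, and the `Repo` struct are hypothetical stand-ins for the objects our app actually uses.

```ruby
# Hypothetical stand-in for the repository objects returned by the GraphQL client.
Repo = Struct.new(:name, :stargazer_count, :fork_count, keyword_init: true)

class OpenSourceStats
  attr_reader :projects, :contributions, :total_downloads

  def initialize(projects:, contributions:, total_downloads:)
    @projects = projects
    @contributions = contributions
    @total_downloads = total_downloads
  end

  # Alternate constructor: asks collaborators for data, then builds an instance.
  # `github` must respond to #projects and #contributions, and `downloads`
  # to #total (interfaces assumed for this sketch).
  def self.fetch(github:, downloads:)
    new(
      projects: github.projects,
      contributions: github.contributions,
      total_downloads: downloads.total
    )
  end

  def total_stars
    projects.sum(&:stargazer_count)
  end

  def total_forks
    projects.sum(&:fork_count)
  end
end
```

In a test, `OpenSourceStats.new(projects: [Repo.new(name: "example", stargazer_count: 3, fork_count: 1)], contributions: [], total_downloads: 0)` gives a working facade without any network calls.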

Rendering in the view

A facade object like the one we used makes it easy to show values in the view, especially when combined with some of Rails’ number helpers.

<p>
  <%= number_to_human(
    @open_source_stats.total_downloads,
    format: '%n%u',
    precision: 2,
    units: { million: 'M' }
  ) %>
</p>

Future work

We might want to show download numbers on a per-project basis rather than just the aggregate. This would mean matching the data we get from RubyGems and npm with the data we get from GitHub, which raises two interesting questions:

1. How do we want to match the data? Can we get away with naively matching on names? Do we need to parse package-specific files like a .gemspec?
2. How do we want to model this in Ruby? The repo objects currently returned by the graphql-client gem are insufficient. Do we need to introduce something like a Package object that combines metadata from both GitHub and the package registries?
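To make the first question concrete, here is a sketch of the naive approach: normalize names and look each registry package up among the GitHub repositories. `Package`, `match_by_name`, and the hash shapes for repositories and registry entries are hypothetical; this is one possible answer to the second question, not code from our page.

```ruby
# Hypothetical value object combining package-registry and GitHub metadata.
Package = Struct.new(:name, :registry, :downloads, :repo, keyword_init: true)

# Naively pairs each registry package with the GitHub repo of the same name.
# Names are compared case-insensitively, treating "-" and "_" as equivalent;
# packages with no same-named repo get repo: nil.
def match_by_name(repos, registry_entries)
  normalize = ->(name) { name.downcase.tr("_", "-") }
  repos_by_name = repos.to_h { |repo| [normalize.call(repo.fetch(:name)), repo] }

  registry_entries.map do |entry|
    Package.new(
      name: entry.fetch(:name),
      registry: entry.fetch(:registry),
      downloads: entry.fetch(:downloads),
      repo: repos_by_name[normalize.call(entry.fetch(:name))]
    )
  end
end
```

Even this small normalization step shows the limits: an npm package published under a scope (such as `@org/widget`) or a gem named differently from its repository ends up with `repo: nil`, which is where parsing a .gemspec or package.json might become necessary.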