James's Blog

Sharing random thoughts, stories and ideas.

Scale

Posted: Sep 21, 2019

A skilled software developer can hack together an app for sharing photos in a day, and it’ll work perfectly fine for a few hundred, maybe even a few thousand people. But when you want the same app to work for a few hundred million people - the scale of Instagram - it is not so easy. An experienced electronics engineer can probably design a smartphone, source the parts, and assemble it in a week or two (faster in a place like Shenzhen). But if you want to build millions of these devices per month - the scale of Samsung and Apple - it again becomes much trickier.

The scale of an operation can qualitatively change its nature and difficulty. Interestingly, the problems involved in scaling are usually orthogonal (independent, unrelated) to the actual problem you are trying to solve (i.e. the “core problem”). Scaling a photo sharing app is not really about how to take or upload photos, but rather about how to handle many millions of data changes per minute in a consistent and reliable way. Scaling a phone manufacturing business is not really about how to assemble the phone, but rather about supply chain management, shipping, and inventory management. In fact, the reason a single person can build a photo sharing app or assemble a phone so quickly is that many others have already solved most of the unrelated problems of scale for them. Without instant-provisioning cloud servers, mass-produced commodity silicon, or standardized communication protocols between components, even the core problems would not be so quick to solve.

I think it is the problem of scale, and not the core problem, that occupies most of the attention and resources at large or fast-growing organizations. Uber constantly dedicates resources to optimizing routes for individual drivers and riders (part of their core problem), but probably spends more on making sure that each newly optimized algorithm can be deployed to millions of users without compromising quality or speed (a problem of scale that’s not really related to their core problem). Apple has a massive R&D budget for hardware, with a portion spent on creating new or improved components, but more is probably spent on making sure that they can manufacture millions of these new components per month to meet demand.

The same thing happens at the meta-organization level as well. As much as companies want to improve the productivity of individual employees and teams, at a certain stage they will be mostly preoccupied with how to ensure that employees 100 through 200 are as productive as the first 10. This requires processes and organizational structures, things that are orthogonal to (and sometimes even work against) the actual improvement of productivity at the individual level.

And scale isn’t a binary thing, but has stages as well. Beyond the set of “core problems”, there isn’t just a single set of “problems of scale”, but rather many different sets of problems at different levels [1]. The scaling problem of getting from 1 to the first million users may be qualitatively different from the subsequent problem of getting to 100 million users, and may require completely different solutions and tools. Similarly at the organization level, processes and structures that worked scaling from 1 to 10 employees may be very different from what’s needed to get from 10 to 100.

So given this separation of core problems and scale problems, and the orthogonality between them, how do you tackle them? Many organizations choose to specialize and solve the two types of problems independently. A dedicated core product team focuses on improving the value for the individual user, while a separate ops/scale team is responsible for making sure that the core product can be deployed widely and reliably. For the organization itself, this means dedicated operations teams and Chief People Officers. On the other hand, you can also mix the different problems and have everyone tackle them together. This is in part what the DevOps movement has been about: getting the people who used to care only about the core product to care about the issues of scale as well. As with many other complex problems, there is no definitive best solution. But I think being aware of the differences between core problems and scaling problems will help, regardless of the approach chosen.


  1. This is one of the reasons that I personally don’t like comparisons between organizations, and sometimes nations, that vary greatly in size. People often compare some European country (smaller than certain U.S. states) with the United States as if they were on equal footing, and look at the issue with only the “core problem” (e.g. health care) in mind. But if we also consider the scale issue, we realize that the two countries are actually facing very different problems that lie outside the core problem domain.