System design interview

Foreword

As a fairly senior software engineer in a large tech company, I get asked to interview new candidates very often. For some reason, most of the time, I get asked to do the dreaded “system design” question. For those who are not in the industry, a “system design” question is one where the candidate is asked to design an entire system, as opposed to an algorithm or just one part (even the crux) of the problem. The candidate has to consider all relevant parameters, and then come up with a solution that addresses everything.

There’s going to be a bit of rambling in this post, but I promise, there is a financial point to it all.

As usual, a reminder that I am not a financial professional by training — I am a software engineer by training, and by trade. The following is based on my personal understanding, which is gained through self-study and working in finance for a few years.

If you find anything that you feel is incorrect, please feel free to leave a comment, and discuss your thoughts.

Google

Let’s say we are doing a system design interview right now, and the question is:

We know that Google search sometimes returns wrong results. For example, sometimes the results are not personalized enough, returning results that are only relevant to people living on the other side of the world. Other times, it is too personalized, returning results that are just creepy. Describe a potential solution to address this.

As with all system design questions, it is generic, vague, and requires the candidate to think a bit outside the box. There is generally no preset, “optimal” answer, and the solution is an exploration of the space with the interviewer.

Now, let’s say the candidate says something like this:

The problem is that Google cannot possibly understand the nuances of the user’s intent, and so the only solution is to just create a new search engine. Let’s say we have a hypothetical search engine, where the entire repository is on every user’s machine. Then each user can simply run a grep (simple text search) to find the documents with the keywords. Each user can then write a small snippet of code that looks at each document and determines which is preferable.

To which I’ll say:

That’s an interesting idea. But for this idea to work, we’ll need to download the entire repository, which is a representation of the entire web, to every user’s machine. That alone will take decades per machine. Then we need to figure out how to store that much data on a single machine, since no single machine on Earth currently has the disk space for this. We then need to address the issue of how grep can even search through the entire repository fast enough that user requests can be answered promptly. And finally, most people don’t know how to write code. How do you propose we fix that?

Now, a rational candidate (read: someone who isn’t definitely going to fail the interview) will realize the premise of their solution, “download the web and have every user’s machine become a search engine”, is flawed and simply unworkable, even if it can technically solve the stated problem of search result personalization. They will then rethink, and hopefully come up with something better.

But let’s say our candidate says this:

First of all, we need to devote about 10% of humanity to researching better compression methods. If we can, say, compress data at a 1,000,000,000:1 ratio (that is, every piece of data can be compressed to 1 billionth its size on average), then we’ll significantly reduce the number of bytes we need to transfer and store.

Next, we’ll devote another 10% of humanity to researching better network transmission protocols. Currently, the fastest network link is on the order of 200 Tbps. We need to increase that to, say, 200 Zbps (1 Zetta = 1,000,000,000 Tera). This will let us transfer the repository 1 billion times faster.

Then, we’ll need to devote another 10% of humanity to researching permanent storage. The current largest hard drive is about 20 TB; we’ll need to increase that to, say, 2 ZB. This will let us store the entire web on a single hard drive multiple times over, so that we can keep multiple copies for redundancy.

Next, we’ll devote another 10% of humanity to improving and optimizing grep, so that it can work in compressed space, as well as being a few orders of magnitude faster.

Finally, we’ll need to negotiate with every government on Earth, so that every human being is given an undergraduate-level course in computer science and can write their own search engine filtering code snippet.

The good news is, the transmission protocol of our repository is a solved problem. We’ll just put it on the blockchain.
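For the record, the candidate's multipliers are easy to sanity-check. The short Python sketch below uses only the figures quoted in the plan above (200 Tbps to 200 Zbps, 20 TB to 2 ZB) plus the standard SI prefixes; the variable names are mine and purely illustrative.

```python
# SI prefixes as powers of ten.
TERA = 10**12
ZETTA = 10**21

# Figures quoted in the candidate's plan.
link_now_bps = 200 * TERA     # 200 Tbps, today's fastest link per the plan
link_goal_bps = 200 * ZETTA   # 200 Zbps, the target
disk_now_bytes = 20 * TERA    # 20 TB, today's largest hard drive per the plan
disk_goal_bytes = 2 * ZETTA   # 2 ZB, the target

# Improvement factor each research program would have to deliver.
link_factor = link_goal_bps // link_now_bps
disk_factor = disk_goal_bytes // disk_now_bytes

print(f"network speed-up needed: {link_factor:,}x")
print(f"disk capacity increase needed: {disk_factor:,}x")
```

So the network has to get a billion times faster and the hard drive a hundred million times bigger, each from its own 10% of humanity.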

Real world

One constant refrain from blockchain/crypto advocates is that “blockchain can do X better”, where “X” is some random facet of the financial system.

For example, corporate actions such as stock splits can take a day or two to sort out, and often, some broker will forget to update their database, and customers will be confused for a day or two more.

Now, a naive view is that “blockchain can do stock splits better”: just create a new token for the post-split stock, and enforce an exchange of X old tokens for Y new tokens. The change is atomic (for each user), etc. All that good stuff.

Which is great… if the entire world of finance was invented simply to do stock splits. In that case, you have a winner!
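To be fair, the naive "X old tokens for Y new tokens" exchange really is only a few lines. Here is a minimal Python sketch, assuming a plain in-memory dictionary stands in for the ledger; the function name `apply_split`, its parameters, and the sample holders are all hypothetical, and nothing here is an actual blockchain.

```python
def apply_split(balances, x_old, y_new):
    """Exchange every x_old old tokens for y_new new tokens.

    Returns a new balances dict and leaves the input untouched, so the
    whole split either succeeds or raises before anything changes. That
    is a simplification of the per-user atomicity mentioned above.
    """
    new_balances = {}
    for holder, old_tokens in balances.items():
        scaled = old_tokens * y_new
        if scaled % x_old != 0:
            # Fractional shares are one of many cases this sketch ignores.
            raise ValueError(f"{holder} would end up with a fractional token")
        new_balances[holder] = scaled // x_old
    return new_balances


# Hypothetical holders, 2-for-1 split: 1 old token becomes 2 new tokens.
old_balances = {"alice": 10, "bob": 3}
new_balances = apply_split(old_balances, x_old=1, y_new=2)
print(new_balances)
```

Note how much the sketch leaves out: fractional shares are simply rejected, and there is no way to correct a balance after the fact.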

But what if, just what if, we need the financial system to do… other things? Like, say, transact a few billion trades a second? Or handle mutations because, you know, humans make mistakes and typos sometimes need to be fixed? Or provide privacy for the portfolios of private citizens, while providing transparency for the portfolios of certain public entities? Or give regulators and other deputies a chance to veto/correct certain transactions? Or…

It’s still early days

And then you’ll get the “it’s still early days” argument (1). Fine. You have an idea, it’s still in its infancy, great.

But, you know, maybe don’t keep annoying the rest of us with it until you have it all figured out? Or, you know, at least know the parameters your solution must address.

BTW, I have this great idea for solving global warming. First, we need everyone to poop in their pants instead of bathrooms. There are still some kinks, but it’s still early days. Trust me, though, it’ll definitely work.

Footnotes

  1. Bitcoin was invented in 2009, 13 years ago. Blockchain (or Merkle trees) was first invented in 1979, 43 years ago. Cryptography was invented centuries ago. Etc. It’s still early days.
