You’ll Probably Have to Unravel the Abstraction

Published: Sep 2, 2024

<aside> 💡

You shouldn’t need to know the nitty-gritty details, but you’ll probably have to, so it’s good to develop the skill of digging into code.

</aside>

The other day, I was in a meeting about one of my org’s efforts to rewrite a major component of our system. I imagine it’s the same story at any large company: teams built things on the fly that made sense at the time, but now it’s clear that there’s a lot of overlapping functionality between projects without much code reuse. Why not create a shared library? This way, teams can peek under the hood of the shared library if they’d like, but they can otherwise focus on delivering business requirements. My senior manager summed up the gist for me: you should have the option to learn about the technology or library you’re operating, but you shouldn’t be required to learn it if you’re a Product Engineer and just want to ship client value.

I felt iffy about this remark. Specialization isn’t new by any means, but where does that leave the programmer? It feels as though I’m just monkeying around hooking up APIs until something works. Don’t we need to understand what goes on under the hood?

After some thought, I have boiled down why I felt iffy: no, theoretically you don’t need to understand what’s underneath. But if you do any sort of engineering, then practically you will in at least some cases.

Why? Because of two things:

One: abstractions are leaky.

Two: effective software development requires that you use abstractions. You can’t reinvent the wheel every day. At some point, you say: I want to get this app out, and I don’t care what bits are on the circuit board, what the instruction set architecture is, or what programming language we use. My value-add is not in knowing these things but in building on top of them.

So if you use lots of abstractions (point 2), and abstractions are leaky (point 1), the corollary is that you’ll see some leakage! Point 1 is the practical half of this. In theory, an abstraction isn’t leaky, since it should encapsulate everything the client ever needs to know. In practice, abstractions leak because nothing exists in a void: the details underneath still affect how the thing on top behaves.

And if you see leakage, the only way to fix that is by digging underneath the hood.
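To make the idea concrete, here is a small sketch of a leak most of us have hit at some point. It is my own illustration, not something from the meeting: the abstraction is "a float is just a number," and the leak is that binary floating point can’t store 0.1 exactly. The variable names are made up for the example.

```python
import math
from decimal import Decimal

# The abstraction: "a float is just a real number."
total = 0.1 + 0.2

# The leak: the sum is not exactly equal to 0.3.
leaked = total != 0.3

# Digging under the hood: Decimal(float) exposes the exact binary value
# that is actually stored for the literal 0.1.
stored = Decimal(0.1)
is_exact = stored == Decimal("0.1")

# Once you know the cause, the fix is straightforward:
# compare with a tolerance instead of strict equality.
close_enough = math.isclose(total, 0.3)

print(leaked, is_exact, close_enough)
```

You could use floats for years without caring how they are stored, but the first time a comparison like this fails, the only way forward is to learn a little about what sits underneath.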

There are some caveats though.

For one, some abstractions are leakier than others. I doubt I will ever read documentation on how the Python interpreter parses Python code and compiles it to bytecode. Why? Because it’s a battle-tested product that I can rely on, and it has been heavily audited by the open source community, which is the equivalent of having thousands of passionate engineers work on a project. Having a general idea of how interpreters create an AST is enough to satisfy me.

On the other hand, if I’m working on a data pipeline at work that’s throwing errors every 3 days, then I can’t just content myself with an abstraction like “this takes input of rows of form X and outputs rows of form Y that have additional properties which serve Z business purpose.” If the README says it does one thing but IRL it does another, then the README is a faulty abstraction, and the only way to fix it (barring asking a maintainer to do so, if one exists) is to dig into the abstraction yourself.
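Here is a toy version of that situation, as a sketch only. Everything in it is hypothetical: `enrich_rows`, its documented contract, and the row fields are invented for illustration and don’t come from any real pipeline. The point is that a quick check of the documented contract against the actual behavior shows where the README and reality diverge.

```python
def enrich_rows(rows):
    """Hypothetical README contract: returns one output row per input row,
    each with an added 'total' field."""
    out = []
    for row in rows:
        # Undocumented behavior: rows without a price are silently skipped.
        if row.get("price") is None:
            continue
        out.append({**row, "total": row["price"] * row["quantity"]})
    return out


def find_missing_ids(rows):
    """Return the ids of input rows that never made it to the output."""
    produced = {row["id"] for row in enrich_rows(rows)}
    return [row["id"] for row in rows if row["id"] not in produced]


rows = [
    {"id": 1, "price": 2.0, "quantity": 3},
    {"id": 2, "price": None, "quantity": 1},  # violates an unstated assumption
]

missing = find_missing_ids(rows)
print(missing)
```

The check itself doesn’t tell you why row 2 disappeared. For that, you still have to open `enrich_rows` and read it, which is exactly the digging the README was supposed to save you.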

So I guess the iffiness I felt about my manager’s statement comes from the real-world experience I have under my belt, which is that things aren’t always neat or clean.

The way to avoid needing every engineer to dig into each other’s codebase is to create clear SLAs, APIs, and ownership, but this isn’t always realistic. Designing genuinely good APIs requires collaboration at the team lead level, and the reality is that not every project will be important enough to merit sprint cycles to do this formally. Perhaps the most you’ll get is that you’re working on a project with a sister team and they’ll walk you through how their system works, but you still have to figure stuff out yourself. That’s probably why good programmers know a bit about everything: they’re curious, and at some point they need to dig.

At the foot-soldier level (i.e. entry-level), the ideal would be to have APIs and SLAs to operate on, but barring that formal system, the next best strategy is to reach out to the people who created whatever you’re working with. The latent knowledge is in their heads, and it’s more effective to have a guide in the jungle than to go in blind. Many people have written great overviews on how to do this (e.g. Boz).