There’s a tradeoff to code reuse, inasmuch as code written for a different problem is often a poor fit.

Using someone else’s code often requires writing wrapper code to change how certain things work; often, particularly if you have unusual scale requirements or are doing other things that invalidate common assumptions, you will end up writing more code to use someone else’s library than you would if you just implemented the functionality it provides yourself. Of course, if you are performing a very common task under very common constraints, using the same software as your competitors makes sense; it’s only when you go outside the normal range that things begin to fall apart & natural choices start to make less sense than strange choices.

Take, for instance, search. ElasticSearch is a pretty polished and reliable system, as long as you’re not pushing up against any hardware resource limits on your cluster: running out of RAM or hard disk space on any single node of even a large cluster can cause invisible data corruption & cascading failures. In other words, if your scale is normal, ES makes sense; but if you have too much data and you want to wring more performance out of your hardware, you’ll want to homebrew something.
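The corollary is that the limits which matter are per-node, not cluster-wide, so in practice you end up watching each node’s disk headroom yourself rather than trusting the cluster to degrade gracefully. A minimal sketch using the cat allocation API, assuming a cluster reachable on localhost:9200:

```sh
# Eyeball per-node disk headroom; any single node filling up is the
# dangerous case, regardless of how much free space the cluster has overall.
# Host and port are an assumption for illustration.
curl -s 'localhost:9200/_cat/allocation?v&h=node,disk.percent,disk.avail'
```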

Using platforms whose facilities you don’t benefit from is how ecosystems like Hadoop become nightmares. Hadoop makes sense on paper for a small set of very specific problems: bioinformatics workloads, and other situations where you need to do a small amount of simple math on a very large number of very small, wholly independent records. Outside of that domain (if your total amount of data is too small, or individual records are too large, or you have too many machines, or too few machines, or the data is insufficiently independent), running Hadoop on a thousand machines is usually slower than running simple command-line tools on a single machine. Nevertheless, people attach themselves to Hadoop because they feel like they need the power, and start adopting systems like Hive (an RDBMS is the absolute last thing you want to integrate with a MapReduce system), and suddenly something that would normally be a single line of shell and run in 30 seconds on your laptop becomes eight hundred lines of Java and runs for eight hours on a four-hundred-node cluster.
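To make the contrast concrete, here is the kind of job Hadoop gets dragged into: counting how often each value of one field appears across a pile of line-oriented log files. A sketch in shell, where the directory name and field position are assumptions for illustration:

```sh
# Count occurrences of the 9th whitespace-separated field (say, an HTTP
# status code) across every .log file, most frequent first.
# File layout and field index are hypothetical.
awk '{ print $9 }' logs/*.log | sort | uniq -c | sort -rn
```

On data that fits on one disk, this streams through everything in a single pass; the equivalent MapReduce job needs a mapper, a reducer, a driver class, build configuration, and a cluster, and spends most of its wall-clock time on coordination rather than on the arithmetic.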

In other words, often what matters is not writing less code but running less code, and that often means avoiding potentially useful libraries when they would add rather than remove complexity. Determining whether running someone else’s code is appropriate is complicated and can require deep familiarity with the library’s behavior: usually the best way to judge whether a library is worth using is to have already written something very similar yourself & to know from experience how easy or hard certain things are to implement (along with the pitfalls of different implementations). Put another way, unless you’ve reinvented the wheel, you’re at a disadvantage when shopping for the car.