Oh my god. I am living this article right now. I have a team of three, expecting to grow to a team of five by the end of next year.
Note: I was told up front that not all data scientists report to me, but otherwise this is true:
You're a bit confused because you were told all data scientists would report into the data team, but apparently other functions have their own data scientists? You make a note to follow up.
You notice a lot of the code starts with very complicated preprocessing steps, where data has to be fetched from many different systems. There appear to be several scripts that have to be run manually in the right order to run some of these things.
The entirety of the What's Happening So Far section.
In your weekly 1:1s with various stakeholders, you keep finding huge blind spots and opportunities for data to make a difference. You use these things as a forcing function for a lot of core platform work. In particular, many pipelines need to be built to produce “derived” datasets. There's a high upfront cost of those analyses, but subsequent analyses are much easier once the right datasets have been built.
You're starting to lay the most basic foundation of what is most critically needed: all the important data, in the same place, easily queryable. Opening up SQL access and training other teams to use it means a lot of the “SQL translation” goes away.
Okay I'm going to stop copy/pasting this entire article, because I pretty much could. That's how on-the-nose it is for my current job.