I've worked at companies of different sizes as a data scientist (from start-up to mega-corp), and I've seen different ways to evaluate employees' performance (think yearly performance review). In my experience, it's ironically often difficult as a data scientist to demonstrate quantitatively what you have or have not achieved during the year. I'd be curious to hear your input on what works and what doesn't. Evaluation methodologies I have often seen are:

- Demonstration of business impact: here it's up to the data scientist to justify, as best as possible, what business decision (or internal milestone) was made because of an analysis they performed. In theory this makes sense (your focus should be on impacting the business); in practice I don't think I've ever seen a single analysis change the course of anything. Decisions are driven by many factors, and your analysis is only one of them.

- Tool usage: I guess some programmers are evaluated the same way. Basically, you develop a tool for co-workers to perform analyses with; the more the tool is used, the better for you (the assumption being that high usage = high business relevance). Usage is sometimes easier to track and more impartial, but it's often difficult to develop a data science tool covering many use cases, and one frequently ends up with a niche product with low usage.
> Tool usage: I guess some programmers are evaluated the same way; basically, you develop a tool for co-workers to perform analyses with

> often difficult to develop a data science tool covering many use-cases

Yes and yes. If you find the right niche, a tool can be very valuable, and it should be possible to estimate or compare the value the tool provides against the existing process without it.

I worked in a domain where software was used to automate or optimise business decisions as part of large, expensive construction projects. Some components of the work my colleagues and I did could arguably be framed as data science (more accurately, operations research), although a lot of the work was just software development. Occasionally there were small consulting projects for clients where the output was a report summarising some modelling/simulation with recommendations.

The bulk of the work was building software tools used by the client to automate and optimise business decisions. The value of such tools could be evaluated in a few obvious ways: How much labour cost did the tool save the client by automating away previously manual processes? How much value did the tool provide by making better business decisions than the previous process? How much incidental value did it provide by forcing standardisation of previously ad-hoc processes (e.g. capturing the data required as inputs, improving data quality)? How much did the client pay for the tool? (The earlier points would inform this one!)

The tools I worked on were used as part of the planning/design process. When they were effective, these tools directly identified designs that would be cheaper to construct than designs produced by the previous process. The value of these construction savings could be estimated and was much larger than the value from automating previously manual work. In at least one case, prior to a sale to a very major client, a benchmark comparison was run between the client's existing process and the new process using the proposed tool, as part of the business case to fund the sale of the product and the related integration work.
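To make that benchmark idea concrete, here's a rough sketch (Python, with entirely hypothetical numbers and names) of how the two sources of value might be estimated and compared; a real business case would obviously use audited figures rather than placeholders:

    # Back-of-envelope value estimate for a decision-support tool.
    # All figures below are hypothetical placeholders.

    # 1. Automation value: manual hours eliminated per project.
    hours_saved_per_project = 120
    loaded_hourly_rate = 150        # salary + overheads, $/hour
    projects_per_year = 10
    automation_value = (hours_saved_per_project
                        * loaded_hourly_rate * projects_per_year)

    # 2. Decision-quality value: construction cost of designs from the
    #    old process vs. the tool, on the same benchmark scenarios.
    old_process_costs = [4.2e6, 3.8e6, 5.1e6]  # $ per benchmark design
    tool_costs = [3.9e6, 3.6e6, 4.7e6]
    decision_value = sum(o - t for o, t in zip(old_process_costs, tool_costs))

    print(f"Automation savings / year:          ${automation_value:,.0f}")
    print(f"Construction savings on benchmark:  ${decision_value:,.0f}")

In our case the second number dwarfed the first, which is why framing the tool purely as "automation" would have undersold it.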
I think this is a problem most data science teams face, given the hype and the pressure to generate ROI.

DS teams might work on operational improvements or on external customer problems; the same DS team is unlikely to be tasked with both.

Fairly and factually measuring a team's impact is not a new problem. Banks have used transfer pricing models to allocate revenue to non-front-office teams. This requires a lot of buy-in from higher-ups and is very sensitive; management is unlikely to be familiar or comfortable with the notion that a model would calculate the implicit benefit each team brings to the table.

Ideas I've seen attempted, with various outcomes, are:

* If your dashboard saves your internal users time, focus on that, because it means your DS team's time investment translated into X workhours saved per week or month. Multiply by the average salary and you get a cost-saving estimate (a rough sketch of this arithmetic follows the list).

* If your model predicts or calculates something, it's even easier; the same goes if you are forecasting. It's difficult to measure ROI on non-financial investments, but it's feasible.

* If your solution does not address an existing modelling need, problem, or operational bottleneck, and simply modernizes or brings something to the table that was not around before, things are trickier. You need to think about opportunity cost (what the DS team could have been doing instead) and about the company's strategic direction. You also need to address operational risk: if your tool helps minimize it, that's worth something, and it's measurable by comparing the data pre and post launch of the tool (maybe a 6-month window of running both is sufficient to compare and contrast).

If you are looking for early-stage successes to build the DS team's goodwill, focus on the first two bullet points. If you already have buy-in, then time is on your side as long as you are productive.

For conservative companies, I would also advocate going to your internal clients and explicitly asking them to nominate projects, problems, or issues they need help with. If you can help them with your solutions or data pipelines, they will preach for the DS team, doing your work for you. Of course, there are a lot of companies where cliques, not merit, drive decisions. Those companies are going to lose their best people and wither away in time.
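Here is the hours-saved arithmetic from the first bullet as a minimal Python sketch; every input is a hypothetical placeholder, and the pre/post comparison from the third bullet can be framed the same way (same calculation, run on the 6-month windows before and after launch):

    # Hours-saved valuation for an internal dashboard.
    # All inputs are hypothetical placeholders.
    users = 25                       # internal users of the dashboard
    hours_saved_per_user = 2.0       # hours/week vs. the old manual process
    loaded_hourly_rate = 2000 / 40   # $/hour from a loaded weekly salary

    weekly_saving = users * hours_saved_per_user * loaded_hourly_rate
    annual_saving = weekly_saving * 48   # working weeks per year

    print(f"Estimated saving: ${weekly_saving:,.0f}/week, "
          f"${annual_saving:,.0f}/year")

It's crude, but it's the kind of number management actually remembers from a performance review.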
Question related to this:

I'm applying to data science jobs (straight out of grad school), and I'm not really sure how to market myself. The problem I keep running into is that data science is such a broad term that it's hard to express how I can provide value to a company without speaking in empty generalities. From my reading of job postings, what's called "data science" at one company is "software engineer" at another and "machine learning developer" at a third. How do working data scientists view their role within a company, and in particular, how do they differentiate their purpose from the tools they use?