Becoming a data scientist
by Omoju Miller
I am a senior data scientist with a Ph.D. Like most data scientists, I didn't get my doctorate in computer science or data science. Instead, I got my doctorate in computer science education. This essay is about the transition from education researcher to data scientist.
Step one: Know the material
First things first, to make the transition, you've got to know the material. Data scientists need to be experts in machine learning and analytics.
I found machine learning easier to grok than analytics because I have a master’s degree in intelligent systems. It had been a while since my master's, so I needed to take some refresher courses. I am dating myself here, but when I took my machine learning classes, there was no scikit-learn, pandas, or NumPy. I wrote my algorithms in C, C++, and MATLAB. To refresh myself, I took some ML courses through Udacity. The good thing is that nothing has really changed; the only difference is the availability of better tools. There are now lots of R and Python packages available for machine learning. As a result, this time around, I focused more on general problem-solving.
With regard to analytics, as someone who advises startups, I understood the concepts broadly, but I lacked an understanding of their business application. I found three books extremely useful in gaining an application-based understanding of analytics: The Lean Startup, Lean Analytics, and Data Science for Business.
Step two: Get the interview
Don't be embarrassed to let people know that you are on the job market. While I was searching, I shared it on Twitter. I got quite a few calls with great companies through social media.
My advice is to find companies where your doctoral research can be an additional asset. If you did a Ph.D. in computational biology, a data scientist position at a bioengineering company would be a good fit. Go out of your way to have informal interviews with the data teams at these companies. Learn about their challenges and see where your skills and interests can help solve some of them. Befriend the data teams of these companies without necessarily looking for employment there. These data teams are going to be part of your professional network. The sooner you see yourself as their potential colleague and start adding value to the ecosystem, the sooner you will find the right fit.
Step three: Do the interviews
The biggest hurdle in gaining a foothold in the industry is learning to excel at the interviews. As with any interviewing process, it is not fun, and for scientists with Ph.D.s it is even harder because we are not juniors. Instead, we are "untested" seniors.
To ease my transition into industry, I applied for a post-doctoral data science fellowship. This was by far the most important decision I made with regard to entering the industry. From January till June of this year, I was a data science fellow at Insight Data Science in Palo Alto, California. What the fellowship provided was ample access to leading industry data science teams. This fellowship was not centered around learning the material; it assumed you already knew the material but needed a guide to walk you through the process of gaining your first data science role. The fellowship is completely free to fellows, but fellows are responsible for supporting themselves for its duration.
The best part of my fellowship was working with my fellow fellows. We came from quantitative doctoral programs; some of us were even professors. For weeks we worked on our data projects and coached each other in interview prep. My cohort numbered around thirty. We all went through the interview process together. Having that group helped tremendously in accepting the inevitable rejection that comes with the interview process.
For my part, I started interviewing before I applied to the fellowship. I went on four interviews before the fellowship. For three of them, I got rejected at the data challenge stage, and for the final one, I got rejected after the onsite. I went to these interviews blind to the process. Even though those rejections hurt, they got me used to the language and thinking around data and products. It was these rejections that led me to apply to the fellowship. If I could do it again, I would go straight to the fellowship. By the time I finished with Insight, I was better prepared to interview. I understood the interview process as a dialogue between the company and me about what we could jointly do with data.
Companies whose interview process seemed designed around surfacing the old-school "most intelligent" candidate didn't work for me. These are companies that ask you to code a nearest-neighbor algorithm from scratch and things like that. I was more interested in interviewing with companies whose data challenges were directly related to the kinds of data that were particular to that company.
Interviewing, in general, is hard. Interviewing as an untested senior is even harder. My advice is for candidates to take their time and know that ultimately they will match with the right company. If possible, have a lot of money saved up and start the process before you graduate. Realize that your training as a computational scientist puts you on a track where you can use your skills to do great things.
Step four: Choose a company
Be very careful where you choose to put your talent. While job searching, you can be tempted out of desperation to accept a position that does not interest you. As difficult as it is to say no to a company that is interested in you but that you are not keen on, be brave and say no. If you are not naturally curious about the space the company is in, it will make your job as a data scientist painful.
In conclusion, if you are a doctoral student, make sure you understand your machine learning. If possible, take some classes at the business school of your institution if it has one. That will help you learn and understand the business case for data. When you are ready, apply for a post-doctoral fellowship like Insight Data Science. In the end, if this is something you want to do, know that there is a space for you. With patience and time, you will find the right match.
In my case, I ended up joining GitHub, a development platform for managing the process of software development. I ultimately made that decision based on my interests as well as my research background. For the entirety of my academic career, I spent my time studying the nature of computational intelligence. For the first half, I studied how to make machines reason like humans; for the latter half, I studied how to make humans reason like machines.
Good luck with your transition from academia to industry.