A good business is all about people, process, and product, and above all I think people are the most important. And as we advance into a new age where data is king and we can make increasingly good predictions and have a significantly better understanding of our customers, we need to have the right data people on our team.
The core of a Big Data team is the data scientist: someone who knows how to understand data, is inquisitive enough to answer questions you do not yet know to ask, and in general can lead the charge in putting data to work in your business. But how do you go about hiring this person? What talents and traits do successful data scientists share that you should look for during the interview process?
I think above everything else there are three key elements of talent you will want to look for when hiring for the position of data scientist on your team. They are:
Someone who understands statistics and in particular has a good sense of large numbers.
A competency in statistics is certainly helpful, because the laws of mathematics and observation do not simply go away when you involve Big Data in the equation. But at the same time, many statisticians lack experience with the vast quantities of data involved in teasing insights out of hundreds of millions of unstructured records. Ideally, you want to hire someone who has a thorough understanding of statistics but can also extend it to working with billions of objects at a time. You want someone who understands how to tease out trends that traditional mathematical models do not necessarily support. You also want someone who is, for lack of a better way to describe it, more connected to the real world. So while mathematical experience and competence are helpful, you may want to avoid candidates with advanced mathematical degrees and no experience outside academia in order to get a more well-rounded hire who can contribute immediately to your projects and workload.
Someone who is familiar with database design and implementation.
This sort of context will prove very helpful, as the candidate can learn about your existing warehouses and then have a good sense of where unstructured data could be layered on top of that existing structured data in his or her analyses and presentations. In addition, new techniques can be tested and put into production using existing investments in database products like Oracle, MySQL, and SQL Server, and a working knowledge of these will be useful as your data scientist begins working on new initiatives. Often these sorts of experiments and tests can be run with smaller quantities of data in traditional solutions, so an ability to get projects off the ground without spending a lot of money on specialized lab investments is a strong selling point for a candidate. Additionally, many NoSQL software vendors build SQL-like languages into their products to appeal to traditional database administrators who have no desire to learn a MapReduce-like language. Knowledge of traditional SQL will continue to pay dividends, allowing data scientists to integrate well with the database professionals you already have on staff.
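To make the idea of a small, inexpensive experiment concrete, here is a minimal sketch of testing a question against a handful of rows in a traditional SQL database. It uses SQLite, which ships with Python, purely as a stand-in for products like Oracle, MySQL, or SQL Server; the `orders` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

def average_order_by_region(rows):
    """Load sample rows into an in-memory database and average them by region."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing to install
    # Hypothetical table: one row per order, with a region and an amount.
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, AVG(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result

# A tiny, made-up sample standing in for a slice of a real warehouse.
sample = [("east", 10), ("east", 30), ("west", 40)]
print(average_order_by_region(sample))  # [('east', 20.0), ('west', 40.0)]
```

The same GROUP BY query would look almost identical in any of the larger database products, which is the point: the idea can be tried on a small sample before anyone pays for specialized infrastructure.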
Someone who has a basic understanding of scripting and can work out problems in pseudocode.
Are you unfamiliar with pseudocode? It is essentially the prototype of a software routine or algorithm, where you lay out the logic of how the software would achieve its objective without getting bogged down in the specific syntax of the language being used. Instead, the logic is written out in plain English to demonstrate the flow and the methods used to arrive at a destination. If your data scientist understands scripting (Python, for instance, is hot right now in the data arena) and can also pseudocode his or her way through teasing out queries and building reports and analyses, then you have a solid candidate who is not afraid to show his or her logic and explain how he or she is going to get through a problem. It is also easier for your existing staff to collaborate with this new hire without all of them needing to understand the same language. Your data scientist can talk about how to approach a problem and hand off the pseudocode to an IT colleague who knows Python, and away they both go.
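As a rough illustration of that hand-off, the sketch below shows a few lines of pseudocode (written as comments) followed by one possible Python translation. The task, the function name `top_customers`, and the sample orders are all invented for this example; a real project would have its own data and requirements.

```python
from collections import Counter

# Pseudocode, as the data scientist might write it in plain English:
#   for each order record:
#       add the order amount to that customer's running total
#   sort customers by total, largest first
#   report the top N customers

def top_customers(orders, n):
    """Return the n customers with the largest total spend."""
    totals = Counter()
    for customer, amount in orders:
        totals[customer] += amount  # running total per customer
    return totals.most_common(n)    # sorted by total, largest first

# Made-up sample data: (customer, order amount) pairs.
orders = [("alice", 30), ("bob", 20), ("alice", 25), ("carol", 50), ("bob", 5)]
print(top_customers(orders, 2))  # [('alice', 55), ('carol', 50)]
```

Notice that the pseudocode says nothing about counters or tuples. Someone who has never written Python can still review the logic, and the person doing the translation is free to pick whatever language or library suits the job.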
These are just some qualities I believe are important when looking for a data scientist. Remember, when you are working with Big Data you want to make sure you are hiring the right person for the job!