Can Citizen Data Scientists be successful?

I came across a great article from the Harvard Business Review. The author, Kalyan Veeramachaneni, was part of a research group at MIT trying to understand what it would take for businesses to realise the full potential of their data by using machine learning techniques.

The MIT research produced four principles, which are laid out well in the article.

Two of the four principles were:


  • “Stick with simple models”, and
  • “Explore more problems”


These are interesting observations, and they feed into my thinking at the moment. Although there is a need for data scientists to test and deliver highly accurate models for certain operational uses, I don’t think business users are engaging with data enough. The amount of data that organisations hold and continue to generate, together with the exponentially growing ability of machines to process that data, is not being met with a corresponding increase in the questions being asked of it. There just aren’t enough technically trained data scientists with a deep understanding of business to address this.


Enter the Citizen Data Scientist

Gartner defines a citizen data scientist “as a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics”. These are business users who have the ability to ask questions of the data – What is it that we don’t know? What questions have not been answered yet? – but who don’t have the technical capability to connect to data, prepare it and build models to answer those questions.


What’s stopping us?

There are two main issues standing in the way of the Citizen Data Scientist:

The first is a lack of understanding of statistical modelling and of the types of questions that can be asked of data. Let’s be honest. I’m sure Stats wasn’t the favourite subject of most sales and marketing professionals who are in business today! Organisations need to invest in training business users in data analysis techniques so they can understand what potential lies in the data. This doesn’t need to be a lengthy process. A one- or two-day training course can be enough to get business users thinking about the possibilities and opportunities for analytics.

The second is technology. Teams need to take advantage of the huge leaps that have been made in tools that enable analytics process automation. With these tools, any business user, provided they know where to look, can blend and prepare data sets and apply predictive analytics and machine learning models. These models help people generate deeper insights into their business by identifying the actual reasons for trends in the data, rather than relying on opinion applied to a graphical summary of historical data.

Think about using predictive analytics to pick up the tiny differences between transactions that indicate whether they are fraudulent, or about forecasting the net profit from a customer with a future lifetime value model at the click of a button.

These answers don’t have to be 100% accurate. Just by getting business users to investigate data, the sheer volume of questions and answers that are going to be generated will uncover significant value. As long as logic and business understanding are applied to the answers, this can only be good for business.