Myths on Data Science

10 Myths of Data Science Exploded Totally


Many myths float around data science, giving its roles and the related industry a false aura.  They are built by people who highlight, or even magnify, the benefits of data science with a handful of use cases.  I use my own term, “verbal experts”, for such people in the market.

In this article, I try to take down some of these notions and give a clearer picture of what data science really is.  Here are some misconceptions:

 


    1. The Best Tools Lead to Quick Returns

Many vendors claim that their data analytics tools turn business experts into data analytics experts.  Likewise, many training organizations offer Python or R courses and link them to the title “data scientist”.

First of all, programming is not the core of data science.  Depending on the case, I use IBM Cognos, IBM SPSS or Microsoft Power BI together with R or Python for my clients’ data analytics work.

Beyond that, the tools themselves do not determine the quality of the work.  Quite a number of young people simply pick up a tool and “create” a result set from it without following best practice.  It is common to see nonsensical charts crammed with elements, and predictive analyses that use either a decision tree or a regression for everything.  Understanding how a certain technique works will help you become a better data scientist.  This is why we encourage everyone to learn algorithms from scratch.  Learn how changing a certain parameter will impact the final model.  This will eventually pay off when you’re working on a large-scale project in the industry.
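To make the point about parameters concrete, here is a minimal sketch written from scratch in plain Python.  It assumes a one-feature ridge regression without an intercept, and the toy data and the penalty values are my own illustrative choices rather than figures from a real project.  It shows how raising a single parameter (the penalty `lam`) pulls the fitted slope towards zero.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form slope of a one-feature ridge regression (no intercept).

    Minimises sum((y - w*x)^2) + lam * w^2, which gives
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data that roughly follows y = 2x (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit the same model with an increasing penalty and compare the slopes.
slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, w in slopes.items():
    print(f"lambda={lam:6.1f}  slope={w:.3f}")
```

With no penalty the slope is the ordinary least-squares fit, and each larger penalty gives a smaller slope.  A tool will happily run with any of these values, so it is up to you to know what the parameter does.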

    2. The IT Department Should Be Ready for Data Science Now

Most data science projects (and even business intelligence or data warehouse projects) fail if only the IT department participates.  Technical people may never understand the business pain points, even though they work in the organization, because their role is not directly related to business operations.  They work on the system side to support those operations.  There may be some IT experts with business knowledge, but the IT department should not be expected to bridge the gap in both business knowledge and the ability to analyze data.  Without proper training, a team of IT experts is unlikely to build a successful analytics platform.  Many techniques are involved, and I will try to share more on the related topics later.

    3. More Data, Higher Value Generated

Data is definitely the most important input to data science applications.  However, quantity is not the only issue; quality is more critical.  For example, if only 80% of the records in the source data are guaranteed to be correct, you can’t expect your prediction results to be better than 80%.  In short, good results depend on good-quality source data in a reasonable volume.  Volume also changes the approach: you can analyze 100 million transactions a day for a public transport corporation, but analyzing 3.5 billion Google search results is a very different matter.  Google may analyze a sample rather than all of them.
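Both points can be illustrated with a small simulation.  This is only a sketch under my own assumptions: binary labels, exactly 20% of the recorded labels flipped, and made-up transaction amounts.  The first part shows that even a model that knows the truth scores only 80% when it is checked against such records.  The second part shows that a 1% random sample can estimate an average closely.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Part 1: quality. The true labels are unknown to us in practice;
# the recorded labels have exactly 20% of the entries flipped.
n = 10_000
true_labels = [rng.randint(0, 1) for _ in range(n)]
wrong = set(rng.sample(range(n), n // 5))
recorded = [1 - y if i in wrong else y for i, y in enumerate(true_labels)]

# A "perfect" model predicts the true label every time, yet measured
# against the recorded data it agrees on only 80% of the records.
perfect_predictions = true_labels
measured_accuracy = sum(
    p == r for p, r in zip(perfect_predictions, recorded)
) / n
print(f"measured accuracy of a perfect model: {measured_accuracy:.2f}")

# Part 2: volume. Estimate the average transaction amount from a 1% sample.
amounts = [rng.uniform(1.0, 100.0) for _ in range(100_000)]
sample = rng.sample(amounts, len(amounts) // 100)
full_mean = sum(amounts) / len(amounts)
sample_mean = sum(sample) / len(sample)
print(f"full mean={full_mean:.2f}  sample mean={sample_mean:.2f}")
```

The sample only works this well because it is drawn at random; a biased sample would not give the same comfort.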

    4. Business Intelligence Equals Data Science

Some people use Business Intelligence tools to produce reports or analytics and then call themselves “Data Scientists”.  Business Intelligence covers only “basic” data analytics.  Data Science, however, spans many disciplines, with processes, calculation methods and systems that extract knowledge and insights from data in all forms, both structured and unstructured.  Because it involves advanced analytics, it is closer to data mining than to Business Intelligence.

    5. Business Domain Experts Will Translate to the Data Science Domain

Many experienced people in different industries have years of operational and/or management practice.  However, moving into data science means changing your domain entirely, and it requires a strong background and applied ability in the fields below:

  • Computer Science
  • Mathematics
  • Statistics
  • Programming

Personally, I graduated a long time ago with a bachelor’s degree in Information Systems (related to Computer Science, but more focused on databases).  My first job was as a programmer, and I found myself very interested in data analytics.  I then joined the SAS (Hong Kong) consulting team as a junior staff member to start my transition.  Years later, I moved to a bank to do risk analytics while studying part-time for a Master’s degree in Finance, which helped me build strong knowledge in statistics.  So, the transition is never a short journey.  Even now, while managing a small team, I am still taking courses on Coursera and edX to keep up with the latest technologies and techniques.

    6. A Ph.D. Degree Is the Primary Requirement for Data Science

First of all, it is very important to check the research background of the doctorate holder.  A business management expert could hold a doctoral degree and have very good management skills for data science projects, but he should not be expected to design a single data analytics solution.  Data Science is a combination of different skills put into application.  Data is never analyzed on a white paper, but with computer software, programming code, algorithms and so on.  That said, it is good to have a strong academic background in fundamental mathematics, statistics and/or computing skills and techniques.

    7. A Full-time Data Science Degree Is a Must for Making a Transition

Many people apply for data science jobs after graduating with a Data Science degree and think of themselves as the best experts in the market.  Personally, I receive tens of such CVs from different countries every month, so it is important to differentiate yourself from the worldwide competition.  In my opinion and experience, any technical skill can be trained, and the best training environment is on the job, with both formal training and practice in data science projects.

    8. Deep Learning Requires Computational Power that Only Top Companies Have

A deep learning model will always run more efficiently on powerful hardware.  However, you do not need a supercomputer to work with deep learning; my demo machine for face recognition is just a PC with 16 GB of RAM.  It is also important to understand the difference between Deep Learning, Machine Learning and AI, which I covered in a previous article.  In addition, you can get the required computing power flexibly from cloud vendors such as Amazon AWS, Google Cloud, IBM SoftLayer and Microsoft Azure.

    9. Data Science Is Only About Building Predictive Models

Predictive modeling is an important part of data science, but it is not the full picture.

A typical data science lifecycle is:

  • Understanding the problem statement
  • Hypothesis building
  • Data collection
  • Verifying the data
  • Data cleaning
  • Exploratory analysis
  • Designing the model
  • Testing/Verifying the model
    • If an error is found, head back to the verification or cleaning stage
  • Putting it into production (deploying the model)
  • Transforming the data result into action (such as suggesting a new travel route for a truck)
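The lifecycle above can be sketched as a tiny pipeline.  Everything in it is hypothetical: the function names, the toy trip records and the through-the-origin fit are my own simplifications, chosen only to show how the stages hand over to each other and how a failed verification sends the data back to cleaning.

```python
from statistics import mean


def collect():
    # Hypothetical raw records: (distance_km, minutes). None marks a missing value.
    return [(1.0, 3.1), (2.0, 5.9), (3.0, None),
            (4.0, 12.2), (5.0, 14.8), (6.0, 18.1)]


def verify(rows):
    """Data verification: every record must be complete."""
    return all(minutes is not None for _, minutes in rows)


def clean(rows):
    """Data cleaning: drop the incomplete records."""
    return [(km, minutes) for km, minutes in rows if minutes is not None]


def fit(rows):
    """Model design: least-squares slope through the origin (minutes per km)."""
    return sum(km * m for km, m in rows) / sum(km * km for km, _ in rows)


def evaluate(slope, rows):
    """Model testing: mean absolute error on held-out records."""
    return mean(abs(slope * km - m) for km, m in rows)


def run_lifecycle(max_rounds=3):
    rows = collect()
    for _ in range(max_rounds):
        if not verify(rows):
            rows = clean(rows)  # head back to the cleaning stage
            continue
        train, test = rows[:-2], rows[-2:]
        slope = fit(train)
        return {"slope": slope,
                "error": evaluate(slope, test),
                "rows_used": len(rows)}
    raise RuntimeError("data never passed verification")


result = run_lifecycle()
# Action: use the model, for example to estimate a new 10 km route.
estimate_minutes = result["slope"] * 10.0
print(result, estimate_minutes)
```

Notice that the model itself is two lines of the sketch; the rest is collection, verification, cleaning, testing and acting on the result.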

Nothing in a real project is as straightforward as it is in your classroom or virtual classroom.

    10. Data Science Competitions Translate to Real-Life Projects

Yes and no.  Most of the time, the competition environment is not similar to reality.  Data science competitions have clean and perfect source datasets.  For example, you may have a robot that trades stocks with a great algorithm, but it may never have experienced an extreme environment like the Great Depression.  Real-world projects involve many stakeholders, and people may be far more difficult than your statistical models.  However, some competitions allow individuals or teams to pick projects that solve real problems in the business world or society.  A competition focused on solving a real problem is the better choice, and the product built in it may be applied in the real world in the future.

 

To conclude, there are many more myths in the data science area.  I would like to point out that every environment and individual is unique, and other people’s success stories may not apply to your particular situation.  If you’re really interested in this area, you should read more books, attend courses and participate in real projects to build up your ability in data science.

 

Samuel Sum

Data Science Evangelist (CDS, SDi)

Vice President (AS)
