Demystifying Data Science: Conversation with David Tan, Heicoders Academy


Data Science and Data Analytics have undoubtedly become buzzwords in recent years. According to NUS, the data analytics industry reportedly contributes at least S$1 billion per year to Singapore’s economy. And with salaries ranging from S$50K to S$78K, many professionals and jobseekers alike are drawn to the prospect of upskilling and getting a head start in their careers.

In all honesty, to us laymen, Data Science projects look like really complicated mashups of math and programming code. So how hard is Data Science exactly, and how long does it take to learn the fundamentals of data?

To help us understand more about embarking on a Data Science journey, we managed to get hold of David, Co-founder of Heicoders Academy, a fast-growing tech-education academy that specialises in Data Science, to speak to us.

Hi David, thank you for taking the time to speak with us!

Q. Tell us a little bit about Heicoders and why the company chose to specialise in Data Science.

Heicoders Academy is a tech education provider started by a group of friends who studied at renowned STEM universities in the US and worked in Silicon Valley for a while.

We chose to focus on Data Science as we were especially impressed by the quality of Data Science education and working experience we were exposed to, and we wanted to bring that element back to Singapore. Eventually, we plan to also provide training in other tech verticals like Fintech and Full-Stack Development.

Screenshot of Heicoders Academy Website

Q. Help us understand a little about Data Science. Why does it matter?

Data Science is an interdisciplinary field that draws on a combination of scientific methods, algorithms and computational tools to extract insights from data.

Let me provide a more relatable example — imagine a supermarket hires a sales promoter to recommend products to customers to increase sales. After observing 100 customer purchases, this intelligent promoter learned that customers who bought eggs tend to also be willing to buy bacon upon some promotion from him.

Now in Data Science we do that too. Instead of having a human learn these patterns, we leverage computers and mathematical models.
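To make the eggs-and-bacon example concrete, here is a minimal Python sketch of how a computer might pick up the same pattern. The baskets are invented purely for illustration, and the function names (`support`, `confidence`, `lift`) are just the usual labels for these measures, not anything taken from Heicoders' course material.

```python
# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"eggs", "bacon", "milk"},
    {"eggs", "bacon"},
    {"eggs", "bread"},
    {"milk", "bread"},
    {"eggs", "bacon", "bread"},
]


def support(baskets, item):
    """Fraction of all baskets that contain `item`."""
    return sum(item in basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


def lift(baskets, antecedent, consequent):
    """How much more likely `consequent` is when `antecedent` is present (above 1 suggests a link)."""
    base_rate = support(baskets, consequent)
    if base_rate == 0:
        return 0.0
    return confidence(baskets, antecedent, consequent) / base_rate


eggs_to_bacon = confidence(baskets, "eggs", "bacon")
eggs_bacon_lift = lift(baskets, "eggs", "bacon")
print(f"confidence(eggs -> bacon) = {eggs_to_bacon:.2f}")
print(f"lift(eggs -> bacon) = {eggs_bacon_lift:.2f}")
```

With only five baskets the numbers mean little, but the same few lines would run unchanged over millions of transactions, which is the point David makes next.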

Data Science matters because there is a limit to how much data a human can process, and how many relationships he/she can uncover from the data. With Data Science, by relying on the power of computers and models that can be scaled up, we can go through far more data, and uncover far more insights and hidden patterns, than any person could!

I would liken the advent of data science to the industrial revolution.

In those days, companies that were quick to adopt industrial machines thrived, while others that relied solely on physical labor quickly got displaced. Similarly, Data Science is our generation’s “industrial revolution” and companies that don’t adopt it may risk getting displaced.

Q. We noticed that your courses have a heavy emphasis on Python programming. Why Python? And can someone with no formal IT / programming education learn Python?

We chose to emphasise Python for two reasons.

Firstly, the Python ecosystem has some of the most comprehensive and well-documented libraries for data science.

Libraries are collections of code written by other programmers, which we can use in our own code instead of having to write everything from scratch. Let me give a simple analogy so that you can appreciate the importance of this.

A world without libraries is like a world where skyscrapers are built brick by brick. Eventually they do get built, but builders endure a long and tiring process. Some bricks are badly laid now and then, which could render the building structurally unsafe. Libraries, however, can be likened to prefabricated parts of the skyscraper, which can be used to build safer structures more effortlessly and in a much shorter amount of time.

Python Programming Language

In Python, developers have created many wonderful visualisation and machine learning libraries which significantly shorten the time required to create useful products. This allows us to stand on the shoulders of giants.
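The difference a library makes shows up even in a very small task. The sketch below computes a standard deviation twice: once "brick by brick", and once with Python's built-in `statistics` module. The scores are hypothetical, and the example is only meant to illustrate the analogy, not any particular course exercise.

```python
import math
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [72, 85, 90, 66, 78]


def stdev_from_scratch(values):
    """Population standard deviation, written out step by step."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)


# "Brick by brick": we wrote and must maintain every step ourselves.
by_hand = stdev_from_scratch(scores)

# "Prefabricated": one call to a well-tested standard library function.
with_library = statistics.pstdev(scores)

print(f"from scratch: {by_hand:.4f}")
print(f"with library: {with_library:.4f}")
```

Both approaches give the same answer here, but the library version is shorter and has already been tested by many other users. The gap widens quickly for visualisation and machine learning tasks.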

The second reason is that Python is a highly readable programming language whose syntax reads somewhat like English. This makes Python easier to read and write, and less daunting for beginners. As such, even someone with no formal IT / programming education can learn Python, especially with the help of structured guidance in the form of courses.
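As a small illustration of that readability, the following sketch checks which items in a shopping basket are on promotion. The item names are made up; the point is only that keywords such as `for`, `if`, `in` and `else` read close to plain English.

```python
# Hypothetical data, invented for illustration only.
on_promotion = {"bacon", "cheese"}
basket = ["eggs", "bacon", "milk"]

# Reads almost like a sentence: "keep each item in the basket if the item is on promotion".
discounted = [item for item in basket if item in on_promotion]

for item in basket:
    if item in on_promotion:
        print(f"{item} is on promotion")
    else:
        print(f"{item} is full price")
```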

Q. What outcomes can one expect from attending Heicoders’ courses?

Students who have taken our AI100: Python Programming & Data Visualisation course will walk away with a strong foundation in Python Programming, and a good intuition for problem solving with computational thinking.

Here are examples of data visualisations and geospatial visualisations that students will be able to build by the end of the course.

Example Data Visualisations

AI200: Applied Machine Learning builds on the technical skills students acquired in AI100. In it, students apply more advanced data wrangling techniques and build interactive visualisations.

They will also be equipped with the fundamental intuition and practical skillsets to build machine learning models, with a focus on those widely used in the industry and on competitive data science platforms like Kaggle. The course culminates in a Kaggle competition where students compete to build the best predictive model, which will instill familiarity with the end-to-end process of training and evaluating machine learning models.
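For readers unfamiliar with what "training and evaluating" involves, here is a minimal sketch of the idea in plain Python. It is not the course's material or Kaggle's tooling: the data is synthetic, the model is a simple straight-line fit, and the helper names (`fit_line`, `mean_squared_error`) are hypothetical. What it does show is the key habit of holding back data the model never sees during training and scoring the model on that.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared gap between the model's predictions and the true values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data, invented for illustration: y is roughly 3x + 2 plus random noise.
rng = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

# 1. Hold out a test set that the model never sees during training.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:30], indices[30:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# 2. Train on the training set only.
model = fit_line(train_x, train_y)

# 3. Evaluate on the held-out test set.
test_mse = mean_squared_error(model, test_x, test_y)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test MSE={test_mse:.2f}")
```

In practice one would normally reach for an established machine learning library rather than hand-written fitting code, but the split, train, evaluate sequence stays the same.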

Our courses are all structured such that students not only take away strong technical skills, but also a portfolio of cool projects that will help to boost their employment opportunities.

Q. In your opinion, what are the possibilities of AI in the future?

We’re seeing mass adoption of data warehousing pipelines as companies seek to store data in larger volumes and in a more accessible manner. With this increase in quantity and quality of data sources, the possibilities of AI applications will only continue to expand.

We can expect to see data science and analytics departments expand significantly, and a lot more demand for data professionals in the coming years.

For AI adoption to be more widespread, there are hotly discussed issues to be addressed, including the challenges of data privacy and audit requirements. For instance, machine learning models trained on sensitive data could be exploited to extract private information such as Social Security Numbers.

Privacy issues like this are continually being researched by academics worldwide, which brings forth better solutions. As a baseline, all practitioners must be educated on best practices for masking any sensitive input data properly before use.
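A very simple form of masking can be sketched as follows. This is only an illustration under stated assumptions: the records are made up, the helper `mask_ssns` is hypothetical, and the pattern catches just one common way of writing a US Social Security Number (three digits, two digits, four digits, separated by hyphens). Real-world masking needs to handle many more formats and identifier types.

```python
import re

# Matches strings shaped like 123-45-6789; other formats are NOT covered by this sketch.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssns(text, placeholder="[REDACTED]"):
    """Replace anything shaped like an SSN with a placeholder before the text is used."""
    return SSN_PATTERN.sub(placeholder, text)


# Hypothetical free-text records, invented for illustration only.
records = [
    "Customer 17 gave SSN 123-45-6789 at signup.",
    "No identifiers in this note.",
]

masked = [mask_ssns(record) for record in records]
for line in masked:
    print(line)
```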

We must also consider that auditing and verifiability of AI-driven decisions will likely have to be made mandatory in the coming years. While there is a variety of machine learning models available, some are not as widely used due to limitations in interpretability.

For example, many corporations today opt for tree-based models, which are easier to interpret than more complex models like neural networks. As industry pioneers create better tools to interpret these powerful black-box models, many more applications for AI will open up, to be utilised not only by researchers and scientists, but also by the layperson.
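To see why tree-based models are considered easy to interpret, consider the smallest possible tree: a single split on one feature, sometimes called a decision stump. The sketch below is a simplified illustration with invented data and a hypothetical helper `fit_stump`; production tree models are built with established libraries and use more refined splitting criteria.

```python
def fit_stump(values, labels):
    """Find the threshold on one feature that misclassifies the fewest points.

    The stump predicts 1 when value > threshold, otherwise 0.
    Returns (threshold, errors); threshold is None if there are
    fewer than two distinct values to split between.
    """
    best_threshold, best_errors = None, len(labels) + 1
    distinct = sorted(set(values))
    # Candidate thresholds are the midpoints between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        errors = sum((v > threshold) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical data, invented for illustration: hours studied vs. passed (1) or failed (0).
hours_studied = [1, 2, 3, 6, 7, 8]
passed = [0, 0, 0, 1, 1, 1]

threshold, errors = fit_stump(hours_studied, passed)

# The whole model is one rule that a non-specialist can read and audit.
rule = f"IF hours_studied > {threshold} THEN predict pass ELSE predict fail"
print(rule)
print(f"training errors: {errors}")
```

A neural network trained on the same data would store its decision in numeric weights rather than in a rule like this, which is the interpretability gap the answer above refers to.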

Q. Are there any books or resources you would like to recommend for Data Science enthusiasts?

Here are 3 machine learning textbooks I recommend (in increasing order of difficulty):

  1. An Introduction to Statistical Learning (by Gareth James, Daniela Witten, Trevor Hastie & Robert Tibshirani)
  2. The Elements of Statistical Learning (by Trevor Hastie, Robert Tibshirani & Jerome Friedman)
  3. Pattern Recognition and Machine Learning (by Christopher M. Bishop)

Beginners should start with An Introduction to Statistical Learning before moving on to the other, more advanced material once they have acquired the college-level math prerequisites, such as linear algebra and multivariate calculus.

The first resource is a fantastic broad-based introduction to machine learning that provides helpful code examples for hands-on learners.

I also highly recommend that enthusiasts explore the mathematics behind Machine Learning model mechanics to gain a deeper appreciation of this discipline, after first understanding the intuition behind these models.

Q. Any advice for learners starting out on the Data Science Journey?

Firstly, regardless of whether you choose to self-learn or take up courses to learn Data Science, I would encourage you to find yourself a mentor who can point you in the right direction.

As with other technical fields, beginners often don’t know what they don’t know, so having a mentor around would significantly speed up your progress and smooth your learning curve.

Another piece of advice: when learning about machine learning models, don’t dive headfirst into the math.

Much of the math behind these models can get very complicated, and what I have observed is that people who start learning this way get bored or discouraged quickly, and give up over time.

A better approach is to first form a graphical intuition of how the models work, and then write code to build them. Through evaluating these models on a variety of datasets, one would be able to consolidate a deeper understanding and appreciation of the models.

Thank you David for answering our questions. We have certainly benefited from your detailed and insightful responses and guidance for aspiring Data Professionals!


We hope this article is as helpful to you as it was for us. If you are looking for structured Data Science courses, be sure to check out Heicoders Academy and our post on Best Data Analytics and Data Science Courses. Have an awesome learning journey!

About David Tan

David is the Co-founder of Heicoders Academy. Before he founded Heicoders Academy, he worked as a Machine Learning Engineer at Droice Labs, a New York-based AI company that provides solutions to the healthcare sector. Prior to that, he was a technical consultant for Louis Vuitton in the US. He has a Master of Management Science & Engineering from Columbia University. In his free time, he contributes articles to Medium on topics like workplace automation and data science.