Demystifying The Study of Maths For Data Science

Coopper Kingsley - January 23, 2022

"You need to learn more maths"

If you’re anything like me, you have been told this over and over and over and over again. ‘But what the heck does it mean?’ ‘Where do I begin?’ you may ask. I agree with you: maths is a massive and daunting domain to tackle.

I, like many of the readers of this blog, have recently embarked on the technological adventure of a lifetime. I started mine two years ago, coming from a very non-tech background, and I most certainly did not deal with maths past your classic times tables. So, you and I are probably in the same boat; if we’re not, keep reading, as you might find this perspective interesting.

That’s it, context is set. In this article I am going to discuss how you could go about learning more maths, and how it is actually applicable! Yeah, I know, crazy, right? Making something applicable; I wish school did more of that.

Anyway, back to it.

In the data science world, we can safely delineate three main areas of maths you should learn:

  1. Statistics
  2. Linear algebra
  3. Calculus

In my opinion, you should learn them in that order, but I am happy to be shown otherwise.

This is not to discount other areas, such as geometry and wider algebraic concepts, but these three will give you a very large step up in designing algorithms, building analytical reports, and proving your hypotheses.

I think the most important first step is to answer: when would each of these areas be used?

Statistics is broken into two primary components: descriptive and inferential. Stats as a whole is fantastic for summarising data into single, digestible figures which can tell you whether variables correlate, to what extent you can trust that conclusion, and so on. The descriptive component describes the data as you have it, for example the mean, median, mode and standard deviation. Inferential stats allow you to draw conclusions that go beyond the data in front of you, such as testing a hypothesis about why something is happening or estimating how likely an event is; this can be divided into frequentist and Bayesian statistics. Frequentist statistics estimates the probability of an event from the historic frequency with which that event has occurred. Bayesian statistics, on the other hand, begins with a subjective prior estimate of the probability and then updates that estimate as data is observed; there is a lot more to Bayesian statistics, but the intent is to give a more realistic and worldly assessment of probability than what frequentist methods have to offer.
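
To make the two branches a little more concrete, here is a minimal sketch in Python using only the standard library statistics module. The sample data and the coin-flip counts are made up for illustration, and the Bayesian half assumes a uniform Beta(1, 1) prior, which is just one common choice of starting belief, not the only one.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Descriptive statistics: summarise the data we actually have.
summary = {
    "mean": statistics.mean(data),      # arithmetic average
    "median": statistics.median(data),  # middle value of the sorted data
    "mode": statistics.mode(data),      # most frequent value
    "std": statistics.pstdev(data),     # population standard deviation
}

# Inferential statistics: estimate the probability that a coin lands heads.
# Hypothetical observations: 7 heads in 10 flips.
heads, flips = 7, 10

# Frequentist estimate: the observed relative frequency of heads.
frequentist_p = heads / flips

# Bayesian estimate: start from a prior belief and update it with the data.
# Assumption: a uniform Beta(1, 1) prior, so the posterior is
# Beta(1 + heads, 1 + tails) and its mean is (1 + heads) / (2 + flips).
prior_a, prior_b = 1, 1
post_a = prior_a + heads
post_b = prior_b + (flips - heads)
bayesian_p = post_a / (post_a + post_b)

print(summary)
print(f"frequentist: {frequentist_p:.3f}, bayesian: {bayesian_p:.3f}")
```

For this data the mean is 5, the median 4.5, the mode 4 and the standard deviation 2. The frequentist estimate is simply 7/10 = 0.7, while the Bayesian estimate is 8/12, roughly 0.667: the prior belief pulls the answer slightly towards 0.5, and its influence shrinks as more flips are observed.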

Linear algebra, on the other hand, is the study of the straight lines in the data, and is best described as focusing on operations on collections of variables and numbers, arranged as vectors and matrices. This paradigm allows us to run regressions on the data and understand it as vectors. A vector is simply an ordered list of numbers that obeys the rules of arithmetic: two vectors can be added together, and a vector can be scaled by a number. By understanding the data as vectors, we can create algorithms to model the data, which is useful for prediction and optimisation.
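
Here is a minimal sketch of that idea in plain Python, treating vectors as ordered lists of numbers. The helper names (add, scale, dot, fit_line) and the data are hypothetical, chosen for illustration. fit_line fits a straight line by solving the least-squares normal equations, which is one standard way linear algebra is used to regress on data, not the only one.

```python
# Vectors represented as plain Python lists of numbers (an illustrative
# choice; in practice a library such as NumPy would do this work).

def add(u, v):
    """Element-wise vector addition: u + v."""
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    """Multiply every element of v by the scalar c."""
    return [c * a for a in v]

def dot(u, v):
    """Dot product: the sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

def fit_line(x, y):
    """Fit y ~ w*x + b by least squares (hypothetical helper).

    Solves the 2x2 normal equations (X^T X) beta = X^T y, where X has
    the columns [x, 1], using Cramer's rule.
    """
    ones = [1] * len(x)
    sxx, sx, n = dot(x, x), dot(x, ones), dot(ones, ones)
    sxy, sy = dot(x, y), dot(y, ones)
    det = sxx * n - sx * sx
    w = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b

# Hypothetical data lying exactly on the line y = 2x + 1.
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
w, b = fit_line(x, y)
prediction = w * 4 + b  # predict y at x = 4

print(add([1, 2], [3, 4]), scale(2, [1, 2]), dot([1, 2], [3, 4]))
print(w, b, prediction)
```

For these points the fit recovers a slope of 2 and an intercept of 1, so the predicted value at x = 4 is 9. In practice you would hand this job to a library rather than write it by hand, but seeing the vectors and the dot products makes it clearer what the library is doing.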

Calculus, conversely, is the study of the curves in the data, and is best described as focusing on functions and their derivatives. While linear algebra aims to create a vector representation, calculus aims to identify the rate of change of a specific variable. This rate of change is very important in prediction and optimisation, as it tells us how one variable will be affected as another is changed. Calculus is often described as having two stages. The first is differentiation, sometimes called the cutting stage: it breaks the curve down into ever smaller pieces, each of which looks more and more like a straight line, so that we can measure its rate of change (its slope). The second stage is integration, or the rebuilding stage, which adds those small pieces back together to recreate the whole curve (or, for example, the area beneath it). This is important for data science, as it lets us model how a variable’s behaviour reacts at very specific moments.
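
A rough sketch of both stages in Python, assuming a simple function f(x) = x² whose exact derivative and integral we already know, so the approximations can be checked. The derivative uses a central difference and the integral uses the trapezoidal rule; both are common numerical methods picked here for illustration. The last few lines use the derivative for optimisation, via gradient descent on a hypothetical loss function.

```python
def derivative(f, x, h=1e-5):
    """Differentiation ("cutting"): the slope of a tiny, nearly straight
    segment of the curve around x, estimated with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def integral(f, a, b, n=1000):
    """Integration ("rebuilding"): add up n thin trapezoids to approximate
    the area under f between a and b."""
    width = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * width)
    return total * width

def f(x):
    return x ** 2

slope_at_3 = derivative(f, 3.0)      # exact answer: 2 * 3 = 6
area_0_to_3 = integral(f, 0.0, 3.0)  # exact answer: 3**3 / 3 = 9

# Optimisation: gradient descent uses the derivative to walk downhill.
def loss(w):
    return (w - 3) ** 2  # hypothetical loss, minimised at w = 3

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * derivative(loss, w)

print(round(slope_at_3, 4), round(area_0_to_3, 4), round(w, 4))
```

The exact answers are 6 for the slope at x = 3, 9 for the area from 0 to 3, and 3 for the minimum of the loss, and the approximations land very close to each. Using more, thinner trapezoids (a larger n) makes the rebuilding finer.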

Now, the second most important question is, how do we go about learning to use these?

A traditionalist will tell you to go by textbooks: learn the theory and do the exercises. A realist will tell you not to worry about theoretical maths and go straight to applied, meaning learn and use it as necessary. And then there are those who say you should learn maths first and then Python and whatnot, and others who say no, you should learn programming first and then go to the maths.

I disagree with all the above. My answer? Do it all at once 😉

Now, you might think I am joking, but I am not. Hear me out.

In the modern world we are extremely lucky to have a lot of technology that has already done the really technical, theoretical heavy lifting for us. The issue is, if we do not know what to apply, how could we apply it? Remember my last post, when I spoke about labelling a carrot a tomato? If you want to learn about carrots but you keep searching for the term tomato, you will end up learning an awful lot about tomatoes and nothing about carrots. Yeah, well, the same applies here.

So, this is what I recommend, and what I will be helping you with: learn the topics in the above areas well enough that you can say “yes, if I want to solve this issue, I am going to need differential calculus”. You do not necessarily need to understand how calculus solves the hypothetical problem; you only need to know the terms and roughly why they relate.

Now, if you couple this with learning a programming language like Python, you can start to look at libraries like NumPy, which let you call on a range of mathematical formulas as functions: put in the different variables, and the computer will do the rest. However, I will reiterate: if you do not know which maths concept to apply, you will not know which function to call upon in Python.
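
As a small illustration, assuming NumPy is installed, each of the three areas maps onto a single function call: np.mean, np.median and np.std for statistics, np.linalg.solve for linear algebra, and np.gradient for (numerical) calculus. The numbers are made up; the point is that you still have to know which concept you need before you know which function to reach for.

```python
import numpy as np  # assumes NumPy is installed (pip install numpy)

# Statistics: hypothetical sample data summarised in one call each.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
mean = np.mean(data)
median = np.median(data)
std = np.std(data)  # population standard deviation (ddof=0 by default)

# Linear algebra: solve the system 2a + b = 3, a + 3b = 5.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(A, rhs)  # array holding a and b

# Calculus: estimate dy/dx of y = x**2 from sampled points.
x = np.linspace(0.0, 4.0, 5)  # 0, 1, 2, 3, 4
y = x ** 2
slopes = np.gradient(y, x)  # central differences inside, one-sided at the ends

print(mean, median, std)
print(solution)
print(slopes)
```

Here the mean is 5, the median 4.5 and the standard deviation 2 (pass ddof=1 to np.std if you want the sample version instead). The system solves to a = 0.8 and b = 1.4, and the estimated slopes of y = x² at x = 0, 1, 2, 3, 4 are 1, 2, 4, 6 and 7: the interior points match the exact 2x, while the end points are rougher one-sided estimates.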

So, in my opinion, get the basics of Python sorted first; do not worry, it is really simple. Then start to read up on the above maths areas while you learn and practise in Python. After a couple of weeks you will be going great guns and ready to move across or down the T (if you do not know the reference, you really should read my last article).

Furthermore, Microsoft Azure has a fantastic service called Azure Machine Learning Studio. Similar to Python, this studio has a range of functions to select from, and even without understanding the maths or programming in depth, you can be running your own algorithms in just a few days. But again, you need to understand which maths concept you ought to be applying.

Over the next few weeks I will continue posting to demystify maths, so as to help you understand which specific maths concept you should be using for a specific problem.

Stay tuned. 

Written by

Coopper Kingsley

Associate Instructor

Hit me up at coopper.kingsley@withyouwithme.com


