In this blog series, I hope to demystify buzzwords like AI, machine learning, data science and deep learning, and get you started with coding deep neural networks.
Sound familiar? If you work in a techno-managerial position in the Indian IT industry, you might already have been bombarded with buzzwords like AI, machine learning, deep learning and data science. In the past, you have encountered hundreds of buzzwords like OOP, agile, full stack, cloud, DevOps, IoT, NoSQL, big data, virtualization, containers, CI/CD, UML, UI/UX and so on. If you are wondering why you should care about a few more, then this blog series is for you. If you are a fresher who has heard about deep learning, or that data science is a lucrative career option, but have no idea what it entails, then this blog series is for you. I hope that, by the end of the series, you will not only have a basic understanding, but also be able to train a deep neural network by yourself. Although the series has been designed with Indian IT professionals in mind, it might be useful for a wider audience. The question-answer format is inspired by Prof. Stephen Hawking’s last book.
Artificial intelligence (AI) is already a big thing in the developed world.
The difference in excitement levels in China vs India is illustrated nicely by Google Trends.
In case you want to dig deeper, do check this excellent review of AI in China by Siraj Raval.
The Indian IT industry has been lagging in the adoption of AI. However, the NITI Aayog (Planning Commission) of India has already taken cognizance of the recent developments in AI and come up with a National Strategy for Artificial Intelligence in June 2018. The list of countries that have come up with some vision document for AI in the past two years is long and growing fast. The Indian strategy document acknowledges the lag in adoption of AI technologies as compared to other nations.
From a technology perspective, the strategy is to maximise the late-mover’s advantage.
But the resolve to make amends is stated unambiguously.
The Indian IT industry is trying hard to catch up. At the time of this writing, Indian IT companies are already making attempts to use AI techniques in their solution offerings. So make sure that you are not left behind!
AI began with the idea of imitating human-like intelligence. There have been many ups and downs in interest levels over time. For significant milestones, refer to the timeline of AI Wikipedia page.
Public perception of AI, thanks to movies like Terminator and science fiction books, is of a “cyborg” — a machine with biomechatronic parts capable of thinking and acting like a human. Eminent personalities like Prof. Hawking and Elon Musk have expressed fears about a “technological singularity” — an event where a super-intelligent AI could turn hostile and wipe out human civilization.
TL;DR Summary: Always ask the practitioners, not the visionaries 😉
“Worrying about evil-killer AI today is like worrying about overpopulation on the planet Mars. Perhaps it’ll be a problem someday, but we haven’t even landed on the planet yet. This hype has been unnecessarily distracting everyone from the much bigger problem AI creates, which is job displacement.”
— Prof. Andrew Ng, former VP and chief scientist of Baidu; founder of the Google Brain team; co-chair and co-founder of Coursera; adjunct professor at Stanford University; founder and CEO of Landing.ai
The field of machine learning (ML) is a collection of problems like classification, regression, clustering, density estimation, dimensionality reduction, discriminant analysis, probabilistic graphical models, distribution learning, latent structure learning, feature learning and so on. These problems share a common characteristic: the goal is to learn patterns from data, rather than to follow explicitly hand-coded rules.
In recent years, algorithms from statistics and optimisation have become popular in machine learning.
In case you have a mathematical, and in particular statistical, background, you might find this similar to “statistical modeling”. The key difference is the focus: traditional statistical modeling tries to explain the relationship between dependent and independent variables, whereas machine learning is primarily concerned with making accurate predictions on unseen data. For further reading, I will refer you to this blog.
Example: Consider the problem of “churn prediction” — predicting which consumers are likely to unsubscribe from a service. Using historical data, we can collect “features” related to each user’s usage of the service, along with binary “labels” — churn / not churn. We then “learn” the parameters of a machine learning model (say, a neural network) and use it on new, unseen data to predict whether a customer will churn.
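As a rough sketch of this workflow in Python (with logistic regression from scikit-learn standing in for the neural network, and with entirely synthetic data in place of real usage features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical features per user: e.g. monthly usage, support tickets, tenure
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
# Synthetic binary labels: 1 = churn, 0 = not churn
# (in practice these come from historical records)
y = (X @ np.array([-1.5, 2.0, -0.5]) + rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)  # "learn" the parameters

print("accuracy on unseen data:", model.score(X_test, y_test))
print("churn probability for one new user:", model.predict_proba(X_test[:1])[0, 1])
```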
To understand “deep learning”, we need to be acquainted with some of the popular techniques of machine learning. As per Prof. Pedro Domingos, in his book “The Master Algorithm”, the machine learning research community is divided into five camps: the symbolists, connectionists, evolutionaries, Bayesians and analogizers. These groups were led by eccentric genius professors seeking a master algorithm to emulate how the brain works.
Although some people seek very deep philosophical meanings and try to relate these models to the working of the brain, at the end of the day, we only have some mathematical structures and techniques. We are nowhere close to that level of understanding of the brain. This point is belabored, on a lighter note, in “The Deep Learning Saga”, a cartoon clip about the legendary researcher Prof. Geoff Hinton, made by his own good friend and collaborator Prof. Yoshua Bengio.
The term “deep learning” refers to the resurgence of the connectionist approach of “neural networks”. Before getting into the meaning of the term “deep”, let’s try to understand neural networks. As you might have guessed (or recollected from what you studied in college), neural networks are mathematical models inspired by connections in the brain. But the inspiration goes only as far as aeroplanes being inspired by birds. Surely, aeroplanes do not flap their wings like birds. Similarly, neural networks don’t think like the brain. But they are extremely useful tools in many applications.
Let’s make things more concrete without getting into neurological inspirations. Let’s say we wish to predict churn / not churn y from all the features that we collected, arranged in a vector denoted by x. We start by choosing a loss function, like the familiar “mean squared error” between the observations and the predictions of our model (cross-entropy is often used in practice). Next, we need to choose a model. The simplest is to assume a linear relationship, as sketched below.
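For instance, in one common notation (the symbols $W$, $b$ and $N$ are illustrative choices here), the linear model and the mean-squared-error loss over $N$ training examples can be written as:

$$\hat{y} = W x + b, \qquad \mathcal{L}(W, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$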
The machine learning problem is to find the best parameters W that explain our data, the (x, y) pairs. But for complex problems, such a nice linear relationship might not exist. So we need to increase the complexity of the model. We can do so by adding non-linear “activation” functions (common choices are sigmoid, tanh and ReLU). The simplest such model is “logistic regression”, as it is known in statistics. But we can make the model complex quickly by composing non-linear functions — the output of one function becomes the input of the next. For example, a four-layer network could take the following form (a plausible sketch, with bias terms omitted for brevity):
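$$\hat{y} = \sigma\big( W_4 \, \sigma( W_3 \, \sigma( W_2 \, \sigma( W_1 x ) ) ) \big)$$

where $\sigma$ denotes the elementwise activation function and $W_1$ through $W_4$ are the parameter matrices of the four layers.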
We can “learn” these parameters W from the training data by an algorithm called backpropagation with stochastic gradient descent. [We will defer the discussion of the learning algorithms to a later post.] The compositional structure — a function of a function of a function ... — can be visualised as a graph, where the intermediate outputs are layers of nodes and the parameter matrices (tensors, in general) are the edges, as usually shown in textbooks, articles, blogs etc.
The “depth” of a neural network is the number of compositions of non-linear functions it uses.
For example, the above network has a depth of 4.
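To make the composition concrete, here is a minimal sketch of the forward pass of such a depth-4 network in plain numpy. The layer sizes, the sigmoid activation and the random (untrained) weights are illustrative choices only:

```python
import numpy as np

def sigmoid(z):
    # elementwise non-linear activation
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative layer sizes: 10 input features, three hidden layers, 1 output
rng = np.random.default_rng(0)
sizes = [10, 8, 6, 4, 1]
# One weight matrix per layer; biases omitted to mirror the equation above
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # Depth 4: four compositions of (linear map followed by non-linearity)
    h = x
    for W in Ws:
        h = sigmoid(W @ h)
    return h

x = rng.normal(size=10)  # a feature vector for one customer
print(forward(x))        # "churn probability" from the untrained network
```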
I am sure that this explanation must have sparked more questions in the neural networks of your brain — Why go deeper? Why now? What is so great about this? And so on. We will discuss these in the next blog.
As deep learning requires functions that are “differentiable” (i.e., have gradients, aka derivatives), a more suitable term, as coined by Prof. Yann LeCun, is “differentiable programming”.
For a more astute reader:
Deep learning pioneers Prof. Geoff Hinton, Prof. Yoshua Bengio and Prof. Yann LeCun have been awarded the 2018 ACM Turing Award, which is considered the equivalent of a Nobel Prize for computer science.
A funny take on the term “data scientist”
On a more serious note, it is beyond doubt that technology in the first two decades of this century has been fuelled by data. Many solutions are based upon insights drawn from “big data” or even “complex data” (for example, the content inside a video clip). Traditionally, engineers specialising in computer science were trained in algorithms, asymptotic analysis, “discrete” mathematics and so on, while other engineering branches like civil and mechanical needed to learn “continuous” mathematics. But as data takes centre stage, the mathematics skills required of computer scientists have widened. Linear algebra, optimisation, multivariable calculus, probability and statistics have become important skills for analysing and drawing insights from data. Those with a math background might also pick up computer science skills and get into this kind of job. A data scientist is one who specialises in drawing insights from data.
Whether you use AI/ML techniques or just need to manage transactions, one thing that you cannot escape is capturing, ingesting, storing, processing and retrieving large volumes of data. This has led to the development of so-called “big data” technologies. They involve “parallel and distributed” computing architectures, programming paradigms and databases to serve the high-volume data needs of businesses. A data engineer specialises in these parallel and distributed computing paradigms to manage huge volumes of data.
To gauge the importance of the engineering aspects involved, have a look at the figure from “Hidden Technical Debt in Machine Learning Systems”, a paper published at the NIPS conference by one of the world leaders in the field of AI/ML — Google.
Considering how data-driven products and solutions have revolutionised our lives, skills that are perceived as specialisations today might become basic requirements for computer scientists and engineers in the near future. So, these are skills that one must pick up beyond those taught in the slow-moving curriculums of engineering colleges in India. For managers and developers, it is important to stay updated, so that one can adapt to the changing business ecosystem.
SUMMARY
In this blog post, I tried to clear the haze surrounding the terms AI, ML, deep learning, data science and data engineering.
NEXT
In the next post, I will try to answer a few more questions addressing some misconceptions about deep learning techniques. Then, we will “deep” dive into “learning” [pun intended]! For now, I will leave you with a comical quote from an anonymous grad student:
Impressed with machine learning? I am learning more than the machine while training these networks, but no one seems to care 🙁