In this blog series, I hope to demystify buzzwords like AI, machine learning, data science and deep learning, and get you started with coding deep neural networks.

Sound familiar? If you work in a techno-managerial position in the Indian IT industry, you have probably already been bombarded with buzzwords like AI, machine learning, deep learning and data science. In the past, you have encountered hundreds of buzzwords like OOP, agile, full stack, cloud, DevOps, IoT, NoSQL, big data, virtualization, containers, CI/CD, UML, UI/UX and so on. If you are wondering why you should care about a few more, then this blog series is for you. If you are a fresher who has heard that deep learning or data science is a lucrative career option, but have no idea what it entails, then this blog series is also for you. I hope that, by the end of the series, you will not only have a basic understanding, but will also be able to train a deep neural network by yourself. Although the series has been designed with Indian IT professionals in mind, it might be useful for a wider audience. The question-answer format is inspired by Prof. Stephen Hawking’s last book.

Artificial intelligence (AI) is already a big thing in the developed world.

The difference in excitement levels in China vs India is illustrated nicely by Google Trends

In case you want to dig deeper, do check this excellent review of AI in China by Siraj Raval.

The Indian IT industry has been lagging behind in the adoption of AI. However, the NITI Aayog (Planning Commission) of India has taken cognizance of the recent developments in AI and came up with a National Strategy on Artificial Intelligence in June 2018. The list of countries that have come up with a vision document for AI in the past two years is long and growing fast. The Indian strategy document acknowledges the lag in adoption of AI technologies as compared to other nations.

From a technology perspective, the strategy is to maximise the late-movers’ advantage.

But, the resolve to make amends for the same is unambiguously stated.

The Indian IT industry is trying hard to catch up. At the time of this writing, Indian IT companies are already making attempts to use AI techniques in their solution offerings. So make sure that you are not left behind!

AI began with the idea of imitating human-like intelligence. There have been many ups and downs in interest levels over time. For significant milestones, refer to the timeline of AI on Wikipedia.

Public perception of AI, thanks to movies like Terminator and science fiction books, is of a “cyborg” — a machine with biomechatronic parts capable of thinking and acting like a human. Fears have been expressed by eminent personalities like Prof. Hawking and Elon Musk about a “technological singularity” — an event where a super-intelligent AI could turn hostile and wipe out human civilization.

- Despite all the valiant efforts at building super-intelligent machines and remarkable breakthroughs in machine learning, **we are nowhere close to building human or superhuman-like intelligent machines**.
- The term AI, as used in the **context of IT applications, does not refer to the original objective** of a human or superhuman level intelligent machine. A few people very carefully choose terms to make the distinction clear: artificial general intelligence (AGI) vs artificial intelligence (AI), strong AI vs weak AI. Most people, however, prefer to use the term AI because it sounds cool. A very insightful article clarifying this confusion is “Artificial Intelligence — the revolution has not happened yet” by Prof. Michael I. Jordan.
- The efforts in AI led to the development of many techniques like **machine learning (ML), pattern recognition and data mining**. These **techniques are referred to as AI** in the **context of IT applications**. Together with advances in parallel and distributed computing, and in the storage, processing and management of large amounts of data, these AI techniques led to what Prof. Andrew Ng refers to as the **“virtuous cycle of AI”**: software products and services based on data are used by a large number of users; the users in turn produce a large amount of new data; **applying AI techniques to this data helps discover new information and insights, which feed back into the product to improve the system.** For example, consider an app like Uber or Ola, which provides ride management services to drivers and passengers. The service is based upon data from two sources: GPS locations and Google Maps. Each ride produces a trajectory of GPS coordinates. With AI techniques, new insights can be drawn, like preferred routes, regions with greater volume of business, demand-supply dynamics and optimal routes. These insights enrich the service, making it better over time.

TL;DR Summary: Always ask the practitioners, not the visionaries 😉

“Worrying about evil-killer AI today is like worrying about overpopulation on the planet Mars. Perhaps it’ll be a problem someday, but we haven’t even landed on the planet yet. This hype has been unnecessarily distracting everyone from the much bigger problem AI creates, which is job displacement.”

— Prof Andrew Ng,

VP and chief scientist of Baidu; founder of Google Brain team, co-chair and co-founder of Coursera; adjunct professor at Stanford University; founder and CEO of Landing.ai

The machine learning (ML) field is a collection of problems like classification, regression, clustering, density estimation, dimensionality reduction, discriminant analysis, probabilistic graphical models, distribution learning, latent structure learning, feature learning and so on. These problems share some common characteristics:

- There is an underlying unknown pattern (function) we are trying to discover
- It is hard to express this pattern analytically
- We have a set of examples (data) where this pattern is exhibited
- We assume a “model” which explains the pattern
- Treating the dataset as known, we try to calculate the unknown model parameters. This process of fitting the model to data is referred to as “training”.
- We define a “loss function” which measures how well our parameterised model fits the data.
- Thus, learning is often formulated as an optimisation problem of finding the best parameters for our choice of model.
- Once we have a trained model, we can make predictions on unseen data. This is often referred to as “inference” in the ML literature.
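The steps above can be sketched end-to-end with a toy example: a minimal, purely illustrative plain-Python snippet where the unknown pattern is assumed to be y = 3x and the “model” is a single parameter w, fitted by gradient descent on mean squared error.

```python
# Toy illustration of "training": fit y ~ w * x by gradient descent.
# The (assumed) underlying pattern is y = 3x; the model is y_hat = w * x.
data = [(x, 3.0 * x) for x in range(1, 6)]  # examples exhibiting the pattern

w = 0.0    # unknown model parameter, to be learned
lr = 0.01  # learning rate

for epoch in range(200):
    # Gradient of the mean squared error loss with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # gradient descent step

# "Inference": predict on unseen input x = 10; w has converged close to 3.0
print(round(w * 10, 1))
```

The loop is the “optimisation problem” from the list above in its simplest possible form; real models just have many more parameters and a fancier optimiser.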

In recent years, algorithms from statistics and optimisation have become popular in machine learning.

In case you have a mathematical, and in particular statistical, background, you might find this similar to “statistical modeling”. The key difference is the focus: in traditional statistical modeling, we try to model and explain the relationship between dependent and independent variables, whereas in machine learning the emphasis is on predictive accuracy. For further reading, I will refer you to this blog

**Example:** Consider the problem of **“churn prediction”** — predicting which consumers are likely to unsubscribe from a service. We can make use of historical data to collect “features” related to usage of the service for each user, along with binary “labels” — churn / not churn. We “learn” the parameters of a machine learning model (say a neural network) and make predictions on new, unseen data to predict whether a customer will churn.
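As a concrete sketch, here is a tiny churn predictor written from scratch in plain Python. The features and labels are synthetic and purely illustrative, and logistic regression stands in for the neural network mentioned above.

```python
import math

# Synthetic, illustrative data. Each row: [usage (tens of hours/month), support tickets]
X = [[4.0, 0], [0.2, 5], [3.5, 1], [0.1, 4], [5.0, 0], [0.3, 6]]
y = [0, 1, 0, 1, 0, 1]  # label: 1 = churned, 0 = stayed

w = [0.0, 0.0]  # model parameters, "learned" from data
b = 0.0
lr = 0.1

def churn_probability(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Training: stochastic gradient descent on the cross-entropy loss
for epoch in range(1000):
    for xi, yi in zip(X, y):
        err = churn_probability(xi) - yi  # d(loss)/dz for cross-entropy
        w[0] -= lr * err * xi[0]
        w[1] -= lr * err * xi[1]
        b -= lr * err

# Prediction on a new, unseen customer: low usage, many tickets
print(churn_probability([0.3, 5]) > 0.5)  # True: predicted to churn
```

In practice one would use a library such as scikit-learn or a deep learning framework; this sketch only shows the train-then-predict flow on labelled features.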

To understand “deep learning”, we need to be acquainted with some of the popular techniques of machine learning. As per Prof. Pedro Domingos, in his book “The Master Algorithm”, the machine learning research community is divided into five camps, each led by eccentric genius professors seeking a master algorithm to emulate how the brain works:

- **Symbolists**, inspired by logic and philosophy, believe in inverse deduction
- **Connectionists**, inspired by neuroscience, tend to use graph data structures where nodes and edges correspond to neurons and the connections between them
- **Evolutionaries** place their bets on algorithms inspired by biological evolution
- **Bayesians** take the statistical approach of probabilistic inference
- **Analogizers**, inspired by psychology, tend to use notions of similarity and margins

Although some people seek very deep philosophical meanings and try to relate these techniques to the working of the brain, at the end of the day we only have some mathematical structures and techniques. We are nowhere close to that level of understanding of the brain. This point has been belaboured, in a lighter vein, in a cartoon clip about the legendary researcher Prof. Geoff Hinton, called “The Deep Learning Saga”, by his own good friend and collaborator Prof. Yoshua Bengio.

**The term “deep learning” refers to the resurgence of the connectionist approach of “neural networks”**. Before getting into the meaning of the term “deep”, let’s try to understand neural networks. As you might have guessed (or recollected from what you studied in college), neural networks are mathematical models inspired by connections in the brain. But the inspiration goes only as far as aeroplanes are inspired by birds. Surely, aeroplanes do not flap their wings like birds. Similarly, neural networks do not think like the brain. But they are an extremely useful tool in many applications.

Let’s make things more concrete without getting into neurological inspirations. Let’s say we wish to predict churn / not churn, y, from all the features that we collected, arranged in a **vector** denoted by **x**. We start by choosing a loss function, like the familiar “mean squared error” between the observations and the predictions of our model (cross-entropy is often used in practice). Next, we need to choose the model. The simplest choice is to assume a **linear** relationship.

The machine learning problem is to find the best parameters W that explain our (x, y) data pairs. But for complex problems such a nice linear relationship might not exist, so we need to increase the complexity of the model. We can do so by adding **non-linear “activation” functions** (common choices are sigmoid, tanh and ReLU). The simplest such model is “logistic regression”, as it is known in statistics. We can make the model complex quickly by composing nonlinear functions — the output of one function becomes the input of the next. For example, a four layer network could be:
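In code, such a four-layer composition might look like the following sketch. The weights here are random placeholders rather than trained values, and biases are omitted for brevity:

```python
import math
import random

random.seed(0)

def relu(v):
    return [max(0.0, x) for x in v]  # non-linear "activation" function

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(v, W):
    # one "layer": a matrix-vector product (bias omitted for brevity)
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Placeholder parameter matrices for a 4-layer network with 3 input features
W1, W2, W3, W4 = rand_matrix(8, 3), rand_matrix(8, 8), rand_matrix(4, 8), rand_matrix(1, 4)

def network(x):
    # f4(f3(f2(f1(x)))): the output of each function feeds the next
    h1 = relu(dense(x, W1))
    h2 = relu(dense(h1, W2))
    h3 = relu(dense(h2, W3))
    return sigmoid(dense(h3, W4)[0])  # a probability, e.g. of "churn"

print(0.0 < network([0.5, -1.2, 3.0]) < 1.0)  # True: sigmoid output lies in (0, 1)
```

The nesting of `dense` and activation calls is exactly the composition described above; “training” would replace the random matrices with learned ones.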

We can “learn” these parameters W from the training data by an algorithm called backpropagation with stochastic gradient descent. [We will defer the discussion of learning algorithms to a later post.] The compositional structure — a function of a function of a function … — can be visualised as a graph where intermediate outputs are layers of nodes and the parameter matrices (tensors in general) are the edges, as usually shown in textbooks, articles, blogs etc.

**The “depth” of the neural network is the number of compositions of non-linear functions used in the neural network.**

For example, the above network has a depth of 4.

I am sure this explanation must have sparked more questions in the neural networks of your brain: Why go deeper? Why now? What is so great about this? And so on. We will discuss these in the next blog.

As deep learning requires functions which are “differentiable” (have gradients, a.k.a. derivatives), a more suitable term, as coined by Prof. Yann LeCun, is “differentiable programming”.
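“Differentiable” simply means that every function in the composition has a well-defined gradient. A quick plain-Python sketch, illustrative only, checks the sigmoid’s analytic derivative against a finite-difference approximation — the kind of gradient backpropagation relies on:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def numeric_grad(f, z, eps=1e-6):
    # central finite-difference approximation of the derivative f'(z)
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# Analytic derivative of the sigmoid: s(z) * (1 - s(z))
z = 0.7
analytic = sigmoid(z) * (1 - sigmoid(z))

print(abs(numeric_grad(sigmoid, z) - analytic) < 1e-6)  # True: they agree
```

Deep learning frameworks compute such gradients automatically and exactly (automatic differentiation) rather than numerically, but the underlying requirement is the same: every building block must be differentiable.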

For a more astute reader:

- The functional composition need not be restricted to a chain structure, it could be a directed acyclic graph (DAG)
- If you are comparing a neural network with a probabilistic graphical model (PGM): in the latter the nodes are random variables and the edges represent dependencies, while in the former the structure is not stochastic.

Deep learning pioneers – Prof. Geoff Hinton, Prof. Yoshua Bengio and Prof. Yann LeCun have been awarded 2018 ACM Turing Award, which is considered equivalent to Nobel Prize for Computer Science.

A funny take on the term “data scientist”

On a more serious note, it is beyond doubt that technology in the first two decades of this century has been fuelled by data. Many solutions are based upon insights drawn from **“big data”** or even **“complex data”** (for example, the content inside a video clip). Traditionally, engineers specialising in computer science were trained in **algorithms, asymptotic analysis**, **“discrete” mathematics** etc., while other engineering branches like civil and mechanical needed to learn **“continuous” mathematics**. But as data takes centre stage, the mathematics skills required in computer science have widened. **Linear algebra, optimisation, multivariable calculus, probability and statistics** have become important skills for analysing and drawing insights from data. Those with a math background might also pick up computer science skills and get into this kind of job. A data scientist is one who specialises in drawing insights from data.

Whether you use AI/ML techniques or just need to manage transactions, one thing that you cannot escape is capturing, ingesting, storing, processing and retrieving large volumes of data. This has led to the development of so-called **“big data”** technologies. They involve **“parallel and distributed”** computing architectures, programming paradigms and databases to manage the high-volume data needs of businesses. A data engineer specialises in these parallel and distributed computing paradigms to manage huge volumes of data.

To gauge the importance of the engineering aspects involved, have a look at the figure from the paper published at the NIPS conference by Google, one of the world leaders in the field of AI/ML.

Considering how data-driven products and solutions have revolutionised our lives, skills that are perceived as specialisations today might become basic requirements for computer scientists and engineers in the near future. So these are skills one must pick up apart from those taught in the slow-moving curricula of engineering colleges in India. For managers and developers, it is important to stay updated, so that one can adapt to the changing business ecosystem.

**SUMMARY**

In this blog post I tried to clear the halo surrounding the terms AI, ML, deep learning, data science and data engineering.

**NEXT**

In the next post, I will try to answer a few more questions addressing some misconceptions about deep learning techniques. Then we will “deep” dive into “learning” [pun intended]! For now, I will leave you with a comical quote from an anonymous grad student:

Impressed with machine learning? I am learning more than the machine while training these networks, but no one seems to care 🙁