# Learning from Data

Before we go into the nitty-gritty of deep learning models and algorithms, we must first set up the learning problem. Setting up the learning problem and getting machines to learn from it has been a multi-decade journey.

Borrowing from the "Deep Learning Book" and quoting Tom Mitchell's [Machine Learning](https://www.cs.cmu.edu/~tom/files/MachineLearningTomMitchell.pdf):

> A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Even in 1997, we were using this broad definition. The experience, the task, and the performance measure can take many different forms. Perhaps unsurprisingly, we can understand the field of deep learning from the learning problem perspective. Instead of looking at the problem through the lens of types of models, we can look at deep learning as the types of problems being solved.

You have probably read about **supervised**, **unsupervised**, **semi-supervised**, and **self-supervised** learning.

The prediction problem is generally the most natural. Given some input information, **x**, we want to predict some value **y** associated with it. In other words, our **task** is to predict **y** given **x**. So, what is our **experience** and **performance measure**?

What if we had many examples of such tuples **(x, y)**? Here **x** is data drawn from an unknown data distribution p(**x**). We cannot write down an analytical definition of this distribution as it is too complicated, but we can collect millions of samples from it to create an empirical approximation. We can use this as the experience for our program. We can use techniques like maximum likelihood estimation (MLE) or maximum a posteriori (MAP) estimation to fit statistical models to our data, or techniques from curve fitting to find the curve that best fits our data. This is [**supervised learning**](/intro-to-deep-learning/learning-tasks/supervised-learning.md). If our dataset matches the real world, we can take this approach to learn predictors that can be used to predict data outside our dataset.
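As a minimal sketch of the curve-fitting view, the snippet below fits a straight line to noisy **(x, y)** pairs with closed-form least squares and then predicts on an input outside the dataset. The ground truth `y = 2x + 1`, the noise level, and the helper name `fit_line` are assumptions made up for illustration. Under an assumed Gaussian noise model, this least-squares fit coincides with the maximum likelihood estimate of the line.

```python
import random


def fit_line(xs, ys):
    """Closed-form least-squares fit of y ~ w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


rng = random.Random(0)

# Experience: labeled pairs (x, y) from a hypothetical process y = 2x + 1 + noise.
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

w, b = fit_line(xs, ys)

# Task: predict y for an input that was not part of the dataset.
x_new = 0.5
y_pred = w * x_new + b
print(f"w={w:.3f}, b={b:.3f}, prediction at x={x_new}: {y_pred:.3f}")
```

The fitted `w` and `b` should land close to the assumed values 2 and 1, which is what lets the predictor generalize to `x_new`.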

What if we didn't have labels in our dataset? Much of the world's data does not come with neat captions, classifications, or labels. Yet, there is still a lot of useful information that we could "learn". A possible solution is to fake supervision. A common scenario is when we only have a few labeled samples and many unlabeled samples. If we make certain assumptions about the distribution of the input **x**, we can learn models that use both unlabeled and labeled data for predictions. This is the premise for [semi-supervised learning](/intro-to-deep-learning/learning-tasks/semi-supervised-learning.md).

What if we had no labels at all and instead wanted to learn the structure of the data, for example by projecting it to a latent space that is significantly smaller than the ambient space? What if we wanted the model to be able to compare and contrast our inputs? [Self-supervised learning](/intro-to-deep-learning/learning-tasks/self-supervised-learning.md) poses the learning task as a challenge to learn without any labels, but still differentiate between inputs. This results in interesting performance measures, such as contrastive and denoising losses, that we will cover in the future.
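To give a feel for what a contrastive performance measure can look like, here is a small sketch of an InfoNCE-style loss for a single anchor. It is one common form of contrastive loss, not necessarily the one used later in this material. The function names, the toy 2-D embeddings, and the temperature value are all hypothetical choices for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when the anchor is closer to the positive
    than to every negative, high otherwise."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]

# The positive is a slightly perturbed "view" of the anchor: loss is small.
good = contrastive_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])

# The positive is unrelated while a negative resembles the anchor: loss is large.
bad = contrastive_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])

print(f"good pairing: {good:.4f}, bad pairing: {bad:.4f}")
```

No labels appear anywhere: the only supervision is which pairs are treated as views of the same input.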

What if our task were not to predict a value, but to obtain new samples from the data distribution p(**x**)? As you can probably guess, that is [unsupervised learning](/intro-to-deep-learning/learning-tasks/unsupervised-learning.md). This simple idea of generating from an underlying distribution underpins nearly all of our modern LLM advances. We will definitely dive deeper into
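The idea of sampling from a learned distribution can be sketched in its simplest form as follows: assume p(**x**) is a one-dimensional Gaussian, estimate its parameters from unlabeled data by maximum likelihood, then draw new samples from the fitted model. The Gaussian assumption and the "true" parameters (mean 5, standard deviation 2) are invented for illustration; real generative models replace the Gaussian with something far more expressive.

```python
import random
import statistics

rng = random.Random(0)

# Experience: unlabeled samples x from a distribution we pretend not to know.
data = [rng.gauss(5.0, 2.0) for _ in range(1000)]

# Maximum likelihood estimates for a Gaussian model of p(x).
mu_hat = statistics.fmean(data)
sigma_hat = statistics.pstdev(data)  # population (1/n) form is the MLE

# Task: obtain new samples from the learned approximation of p(x).
new_samples = [rng.gauss(mu_hat, sigma_hat) for _ in range(5)]

print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
print("new samples:", [round(s, 3) for s in new_samples])
```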

*How does deep learning fit in here?* There is a whole world of predictors that we can use. Supervised deep learning is the study of using neural networks for the supervised learning problem. We use squared error, absolute error, or cross-entropy as the performance measure and use gradient descent to learn from the experience.
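The sketch below puts experience, performance measure, and gradient descent together. To keep it short, a linear predictor `w * x + b` stands in for a neural network (an assumption for illustration only), the performance measure is mean squared error, and the tiny dataset follows a made-up rule `y = 2x + 1`. The names `mse` and `gradient_descent`, the learning rate, and the step count are hypothetical.

```python
def mse(w, b, xs, ys):
    """Performance measure P: mean squared error of the predictor w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.1, steps=500):
    """Repeatedly nudge (w, b) against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Experience E: a small labeled dataset following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

loss_before = mse(0.0, 0.0, xs, ys)
w, b = gradient_descent(xs, ys)
loss_after = mse(w, b, xs, ys)

print(f"w={w:.4f}, b={b:.4f}")
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.6f}")
```

In Mitchell's terms, the drop from `loss_before` to `loss_after` is the performance at the task improving with experience.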

**Task:**

1\) What are some assumptions on the data distribution p(**x**) that would make semi-supervised learning possible?

2\) Can you take this framework of experience, task, and performance and describe **unsupervised learning?**

3\) How does gradient descent fit into this way of describing learning?

4\) Can you suggest any other ways to search (hint) for supervised learning on data?

