# Getting Started with PyTorch

#### PyTorch is a Python (and C++) library for deep learning.

To be a good deep learning library, PyTorch accomplishes a few things.

1. A core data structure to represent the information **(tensors)**
2. Numerical Optimization \* **(autograd)**
3. Neural Network Library

(\*) In this case, optimization refers to optimizing the parameters of a learning problem. The term is used in the same sense as in convex optimization.

A few other things PyTorch is rather good at are neural network modelling, optimized linear algebra, and hardware acceleration.

***

Today, we'll spend almost the entire time talking about the core data structure and autograd functionality of PyTorch. PyTorch is also a fully featured neural network library, as any good deep learning library has to be. We will leave the specifics of neural networks for the future and only graze the `torch.nn` module.

### Tensors

***

Without falling too deep into the hole of statistical learning theory and machine learning theory, the task of supervised learning is generally the same for a machine learning algorithm or any statistical model. It can be described simply: for some input-output pair $$(x,y) \in D$$, where $$D$$ is some set of data, we want to approximate a function $$f(\cdot)$$ such that,

$$
y \approx f(x)
$$

Now keep in mind, $$x \in \mathbb{R}^N$$, $$y \in \mathbb{R}^C$$, and so $$f(\cdot)$$ is some arbitrary function $$\mathbb{R}^N \rightarrow \mathbb{R}^C$$.

Some terminology here: the input $$x$$ will often be called **feature set**, **examples,** and **inputs**. Each item in $$x$$ may be called a **feature**. The output $$y$$ will often be called **labels**, **predictions**, and **outputs**. Each of these names is often used interchangeably, causing unnecessary confusion.

***

Since deep learning is in the business of learning the aforementioned function, we notice that the inputs and outputs have a natural representation as vectors. Now, not all data is sequential; there may be spatial connections between features (like pixels in an image, which are arranged in a grid). As it turns out, machine learning, and deep learning in particular, requires **multidimensional arrays**, or **tensors**.

Now there may be some point of contention regarding calling N-dimensional arrays tensors in the physics community, but it's best not to dwell on it. The easiest way to imagine this data structure is:

1. A scalar is a single value of some data type, such as an integer, float, double, or complex number (0D)
2. A collection of scalars is a vector (1D)
3. A collection of vectors is a matrix (2D)
4. A collection of matrices is a tensor (3D)
5. A collection of tensors is a tensor (Yeah I know ... we ran out of words)
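The ladder above can be sketched directly in code. This is a minimal illustration, assuming only `torch.tensor` and `torch.stack`; the variable names are made up for the example.

```python
import torch

scalar = torch.tensor(3.14)                 # 0D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])      # 1D: a collection of scalars
matrix = torch.stack([vector, vector])      # 2D: a collection of vectors
tensor_3d = torch.stack([matrix, matrix])   # 3D: a collection of matrices
tensor_4d = torch.stack([tensor_3d] * 2)    # 4D: a collection of 3D tensors

# Print the number of dimensions, the shape, and the data type of each one.
for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("tensor_3d", tensor_3d), ("tensor_4d", tensor_4d)]:
    print(name, t.ndim, tuple(t.shape), t.dtype)
```

Each call to `torch.stack` adds one new leading dimension, which is why the dimension count climbs by one at every step.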

For PyTorch, the basic building block is a tensor of N dimensions. Data being used in PyTorch libraries must be captured and represented by tensors. Tensors have **shapes** depending on their dimensionality.

A color image is a 3D tensor with the shape (Channel, Height, Width).
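As a rough sketch of what that shape looks like in practice, the snippet below builds an all-black RGB image. The image size is a hypothetical choice, and the batch layout in the last step is a common convention rather than something stated above.

```python
import torch

channels, height, width = 3, 32, 48   # hypothetical image size
image = torch.zeros(channels, height, width)

print(image.shape)        # torch.Size([3, 32, 48])
print(image[0].shape)     # one channel: torch.Size([32, 48])

# A batch of images adds one leading dimension: (Batch, Channel, Height, Width).
batch = torch.stack([image, image])
print(batch.shape)        # torch.Size([2, 3, 32, 48])
```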

***

### PyTorch vs NumPy

#### Introduction to the computation graph

If you've used NumPy (a popular multidimensional array library), you've probably asked how PyTorch tensors are different from NumPy arrays. The answer is ... **they aren't!!** BUT they kind of are.

PyTorch and NumPy interoperate seamlessly. A PyTorch tensor on the CPU can be cast down to a NumPy array, and a NumPy array can be cast up to a Torch tensor (a tensor that lives on the GPU or tracks gradients needs a `.cpu()` or `.detach()` call first).
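One detail worth seeing in code is that the conversion shares memory instead of copying it. The snippet below is a small sketch of that behavior using `torch.from_numpy` and `Tensor.numpy`; the array values are arbitrary.

```python
import numpy as np
import torch

np_array = np.array([1.0, 2.0, 3.0])
tensor = torch.from_numpy(np_array)   # NumPy -> PyTorch, shares the same memory
back = tensor.numpy()                 # PyTorch -> NumPy (CPU tensors only)

np_array[0] = 100.0                   # modify the original array in place
print(tensor)                         # the tensor sees the change
print(back)                           # and so does the round-tripped array
```

If an independent copy is needed, `torch.tensor(np_array)` copies the data instead of sharing it.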

As you may have noticed, I somewhat implied that tensors have a higher "ranking" than NumPy arrays. By "ranking," I mean that PyTorch tensors have a superset of the capabilities of NumPy arrays. The reason is that they have one additional, all-important feature: **they keep track of the graph of computations that created them** (and they can also be sent to a GPU for operations).
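A quick way to peek at this bookkeeping is the `grad_fn` attribute, which points at the operation that produced a tensor. The sketch below assumes tracking is switched on with `requires_grad=True`; the variable names are illustrative.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)  # leaf tensor created by the user
b = a * 3                                   # produced by a multiplication
c = b + 1                                   # produced by an addition

print(a.grad_fn)   # None: nothing created `a`, it is a starting point
print(b.grad_fn)   # a node recording the multiplication
print(c.grad_fn)   # a node recording the addition

plain = torch.tensor(2.0)   # no requires_grad, so no history is kept
print((plain * 3).grad_fn)  # None
```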

### Let's take a quick look at tensors

First, import the torch module and check which version we are running.

```python
import torch
print(torch.__version__)
```

Next, let's turn a Python list into a tensor and convert to and from NumPy:

```python
import numpy as np

# Create a tensor from a Python list
torch_tensor = torch.tensor([1, 2, 3])

# Convert a NumPy array to a Torch tensor
np_array = np.array([4, 5, 6])
torch_tensor_from_numpy = torch.from_numpy(np_array)

# Convert a Torch tensor to a NumPy array
np_array_from_tensor = torch_tensor.numpy()

print(torch_tensor)
print(torch_tensor.shape)
```

### What can you do with tensors?

Tensor operations fall into the following groups:

1. Creation ops
2. Indexing + Slicing
3. Math ops
4. Random Sampling
5. Serialization
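To make the groups concrete, here is one small example per group. It is only a sampler under my own choice of functions, and it serializes to an in-memory buffer so that no file is written to disk.

```python
import io
import torch

# 1. Creation ops
zeros = torch.zeros(2, 3)
steps = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0,1,2],[3,4,5]]

# 2. Indexing + slicing
first_row = steps[0]        # tensor([0., 1., 2.])
last_column = steps[:, -1]  # tensor([2., 5.])

# 3. Math ops
total = steps.sum()         # tensor(15.)
product = steps @ steps.T   # (2, 3) @ (3, 2) -> (2, 2)

# 4. Random sampling
torch.manual_seed(0)        # fix the seed so runs are repeatable
noise = torch.rand(2, 3)    # uniform samples in [0, 1)

# 5. Serialization (to an in-memory buffer instead of a file on disk)
buffer = io.BytesIO()
torch.save(steps, buffer)
buffer.seek(0)              # rewind before reading the buffer back
restored = torch.load(buffer)
print(torch.equal(steps, restored))  # True
```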

[The full list and more details are here.](https://pytorch.org/docs/stable/tensors.html)

### AutoGrad

***

We move on to the next most important building block of PyTorch, `autograd`. If we remember, our main goal for machine learning is to model some function $$f(\cdot)$$ using some known information $$(x,y) \in D$$. We've learned how to represent $$x$$ and $$y$$ as tensors so that we can use the PyTorch library and perform mathematical operations. But how do we use that to **'learn'** the function?

Again, without going too much into statistics and machine learning theory, we cast it as a function fitting problem. **We transform the problem into a parameter estimation problem for a neural network that is parameterized by some weights `W`.** One important theoretical result, the universal approximation theorem, asserts that neural networks are universal approximators: sufficiently wide or deep feed-forward neural networks with a nonlinear activation function can approximate any continuous d-dimensional function on a bounded (compact) domain to arbitrary accuracy.

The process of learning, especially in deep learning, can be broken into 3 parts:

1. Parametrize your model (choose the number of neurons, hidden layers, architecture, etc.)
2. Compute the prediction based on the parameters
3. Using the true value, update the parameters of the model. RINSE. REPEAT.
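The three steps map onto a short training loop. The sketch below grazes `torch.nn` and `torch.optim` on a made-up toy problem (fitting `y = 3x + 1`); the model size, learning rate, and step count are arbitrary assumptions, not recommendations.

```python
import torch

torch.manual_seed(0)

# Toy data for a hypothetical target y = 3x + 1.
x = torch.linspace(-1, 1, 20).unsqueeze(1)   # shape (20, 1)
y = 3 * x + 1

# 1. Parametrize the model: one linear layer with a weight and a bias.
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

losses = []
for step in range(200):
    y_pred = model(x)              # 2. compute the prediction
    loss = loss_fn(y_pred, y)      # compare against the true values
    optimizer.zero_grad()          # clear gradients from the previous step
    loss.backward()                # compute gradients of the loss
    optimizer.step()               # 3. update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])       # the error should shrink over time
```

The `loss.backward()` call is where autograd does its work.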

***

#### A common line of questioning

**What do you mean by parameter estimation?**

We have a candidate function $$G_{W}(x) = y_{pred}$$, where $$W$$ is updated such that some error between the predicted value $$y_{pred}$$ and ground truth $$y$$ is minimized. The error is $$Err(y_{pred},y)$$.
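As a tiny illustration, assume the candidate function is a single weight that scales the input and the error is the mean squared error. Both choices, and the helper names `G` and `err`, are made up for this sketch.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])      # ground truth, generated by y = 2x

def G(W, x):
    """Candidate function G_W(x): a single weight scaling the input."""
    return W * x

def err(y_pred, y):
    """Mean squared error between prediction and ground truth."""
    return ((y_pred - y) ** 2).mean()

# Try a few candidate weights and see how the error responds.
for W in [0.0, 1.0, 2.0, 3.0]:
    print(W, err(G(W, x), y).item())
```

The error drops to zero at `W = 2.0` and grows on either side of it, which is exactly the landscape that parameter estimation has to navigate.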

**How do we minimize the error?**

One common way is stochastic gradient descent (SGD). More generally, first-order or second-order (derivative-based) iterative optimization methods are the go-to for problems that don't have analytic solutions. This is the numerical optimization mentioned before. Very few ML models (linear regression and ridge regression, for example) have analytic solutions. SGD iteratively updates the weights $$W$$ to find a minimum.
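For contrast, this is roughly what an analytic solution looks like for linear regression. The sketch assumes noise-free synthetic data and uses `torch.linalg.lstsq`; the weights in `w_true` are invented for the example.

```python
import torch

torch.manual_seed(0)
X = torch.rand(50, 3, dtype=torch.float64)   # 50 examples, 3 features
w_true = torch.tensor([[1.0], [-2.0], [0.5]], dtype=torch.float64)
y = X @ w_true                               # noise-free targets

# Closed-form least-squares solution, no iteration needed.
w_analytic = torch.linalg.lstsq(X, y).solution

print(w_analytic.flatten())   # should match w_true up to rounding
```

Most neural networks offer no such shortcut, which is why the iterative methods above matter.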

**How do we update the weights?**

Stochastic Gradient Descent gives us a formula,

$$
W^{t+1}_{i} = W^{t}_{i} - \eta \frac{\partial Err}{\partial W^{t}_{i}}
$$

$$\eta$$ is the **learning rate** or step size for the updates.
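Here is the update rule applied by hand to a one-weight toy problem. To keep it short, the sketch uses the full dataset at every step (plain gradient descent, not the stochastic variant) and a gradient derived on paper for the mean squared error; the data and the learning rate are assumptions.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])       # generated by y = 2x

W = torch.tensor(0.0)                   # initial guess for the single weight
eta = 0.05                              # learning rate

for t in range(100):
    y_pred = W * x
    # dErr/dW for Err = mean((W*x - y)^2), derived by hand
    grad = (2 * (y_pred - y) * x).mean()
    W = W - eta * grad                  # W^{t+1} = W^t - eta * dErr/dW

print(W.item())  # approaches 2.0
```

Deriving `grad` on paper is manageable for a single weight, but it does not scale.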

**Wait, neural networks can have billions of weights and complicated computations. How can we calculate the gradient?**

Automatically, of course! Using PyTorch's autograd functionality (the "auto" in autograd stands for automatic).

***

#### Enough theory. Let's do calculus homework!

Autograd works by keeping track of the history of each tensor. As we discussed before, in PyTorch land tensors are first-class citizens, the building blocks. Tensors can "remember" where they came from, or at least the sequence of operations that produced them from their parent tensors.

**Why is the history important?**

Well, if you know the history of the basic operations, and you know the derivatives of those operations, you can just use the chain rule to calculate your analytical derivative at that point!

Let's see some tensors with the track history feature turned on.

```python
t = torch.tensor(2.0, requires_grad=True) # A scalar with the history turned on
x = t + 2
print(x) 
```

Torch provides quite a few mathematical operations that are differentiable:

```python
t = torch.tensor(2.0, requires_grad=True) # A scalar with the history turned on
x = t*2
x.backward()
print(t.grad) # What do you expect?
```
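To see the chain rule at work, here is a small check that composes two operations and compares autograd's answer with the derivative worked out by hand. The function `sin(t^2)` is an arbitrary choice for illustration.

```python
import math
import torch

t = torch.tensor(2.0, requires_grad=True)
u = t ** 2            # inner operation: u = t^2, du/dt = 2t
z = torch.sin(u)      # outer operation: z = sin(u), dz/du = cos(u)
z.backward()          # autograd applies the chain rule through both steps

by_hand = 2 * 2.0 * math.cos(2.0 ** 2)   # dz/dt = 2t * cos(t^2) at t = 2
print(t.grad.item(), by_hand)             # the two values should agree
```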

Let's try something different. What if we have a function that is discontinuous and not differentiable at some points?

```python
def i_hate_threes(tensor):
  if tensor == 3.0:
    return torch.tensor(0.0, requires_grad=True)
  else:
    return tensor
```

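```python
t_1 = torch.tensor(2.0, requires_grad=True) # A scalar with the history turned on
x_1 = i_hate_threes(t_1)
x_1.backward()
print(t_1.grad) # What do you expect?
t_2 = torch.tensor(3.0, requires_grad=True) # The input our function "hates"
x_2 = i_hate_threes(t_2) # Returns a brand-new tensor with no link to t_2
x_2.backward()
print(t_2.grad) # None: no gradient ever reaches t_2
```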

Hmm, that doesn't always go as expected. When the input is exactly 3.0, the function returns a brand-new tensor with no link to the input's history, so the gradient never makes it back to the input. Well, fear not, we can have arbitrary definitions of derivatives!

In the following example, we create a custom function by subclassing `torch.autograd.Function` and overriding its `backward` method, which lets us define a custom gradient for the function. The `backward` method is called during the backpropagation step of the training process.

```python
class i_hate_threes_v2(torch.autograd.Function):
  @staticmethod
  def forward(ctx, tensor):
    # Save the input so that backward can check which branch was taken
    ctx.save_for_backward(tensor)
    if tensor == 3.0:
      return torch.zeros_like(tensor)
    return tensor

  @staticmethod
  def backward(ctx, grad_output):
    (tensor,) = ctx.saved_tensors
    if tensor == 3.0:
      # Our custom definition: the gradient is zero at the point we "hate"
      return torch.zeros_like(grad_output)
    # Everywhere else the function is the identity, so pass the gradient through
    return grad_output.clone()
```

Let's try the same example again, but this time with our custom function. Custom functions are invoked through `apply` on the class itself, not by creating an instance.

```python
t_1 = torch.tensor(2.0, requires_grad=True) # A scalar with the history turned on
x_1 = i_hate_threes_v2.apply(t_1)
x_1.backward()
print(t_1.grad) # What do you expect?

t_2 = torch.tensor(3.0, requires_grad=True) # A scalar with the history turned on
x_2 = i_hate_threes_v2.apply(t_2)
x_2.backward()
print(t_2.grad) # What do you expect?
```

