In my last blog post, we learned how to work with PyTorch tensors, the most important object in the PyTorch library. Tensors are the backbone of deep learning models, so naturally we can also use them to fit simpler machine learning models to our datasets.

Although PyTorch is known for its deep learning capabilities, we can also fit simple linear models using the framework, and this is actually one of the best ways to get familiar with the `torch` API!

In this blog post, we're going to continue the PyTorch introduction series by checking how we can develop a simple linear regression using the `torch` library. In the process, we'll learn about `torch` optimizers, weights, and other parameters of our learning model, something that will be extremely useful for more complex architectures.

Let’s start!

# Loading and Processing Data

For this blog post, we'll use the song popularity dataset, where we'll want to predict the popularity of a certain song based on some song features. Let's take a peek at the head of the dataset below:

```
import pandas as pd
songPopularity = pd.read_csv('./data/song_data.csv')
```

Some of the features of this dataset include interesting metrics about each song, for example:

- a level of song "energy"
- a label encoding of the song's key (for example, A, B, C, D, etc.)
- the song's loudness
- the song's tempo

Our goal is to use these features to predict the song popularity, an index ranging from 0 to 100. For the example rows shown above, these are the song popularity values we are aiming to predict:

Instead of using `sklearn`, we are going to use PyTorch modules to predict this continuous variable. The good part of learning how to fit linear regressions in `pytorch`? The knowledge we're going to gather can be applied to other, more complex models such as deep neural networks!

Let’s start by preparing our dataset and subset the features and target first:

```
features = ['song_duration_ms','acousticness', 'danceability', 'energy', 'instrumentalness','key', 'liveness', 'loudness', 'audio_mode', 'speechiness','tempo', 'time_signature', 'audio_valence']
target = 'song_popularity'
songPopularityFeatures = songPopularity[features]
songPopularityTarget = songPopularity[target]
```

We'll use `train_test_split` to divide the data into train and test sets. We'll perform this split before transforming our data into `tensors`, as `sklearn`'s method returns the data in `pandas` or `numpy` format:

```
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    songPopularityFeatures, songPopularityTarget, test_size=0.2
)
```

With `X_train`, `X_test`, `y_train` and `y_test` created, we can now transform our data into `torch.Tensor` format. Doing that is easy: we just pass our data through the `torch.tensor` function:

```
import torch

def dataframe_to_tensor(df):
    return torch.tensor(df.values, dtype=torch.float32)

X_train = dataframe_to_tensor(X_train)
X_test = dataframe_to_tensor(X_test)
y_train = dataframe_to_tensor(y_train)
y_test = dataframe_to_tensor(y_test)
```
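As a quick sanity check, we can confirm what the conversion does with a tiny, made-up frame (the column names below are hypothetical, not from the song dataset): the shape is preserved and the dtype becomes `float32`.

```python
import pandas as pd
import torch

def dataframe_to_tensor(df):
    return torch.tensor(df.values, dtype=torch.float32)

# Hypothetical toy frame standing in for the song data
df = pd.DataFrame({"energy": [0.5, 0.9], "tempo": [120.0, 98.0]})
t = dataframe_to_tensor(df)
print(t.shape)  # torch.Size([2, 2])
print(t.dtype)  # torch.float32
```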

Our objects are now in `torch.Tensor` format, the format expected by `nn.Module`. Let's see `X_train` below:

Cool! We now have our train and test data in `tensor` format, so we're ready to create our first `torch` model, something that we will do next!

# Building our Linear Model

We'll train our model using a `LinearRegressionModel` class that inherits from the `nn.Module` parent. The `nn.Module` class is PyTorch's base class for all neural networks.

```
import torch.nn as nn

class LinearRegressionModel(nn.Module):
    '''
    Torch Module class.
    Initializes weights randomly and gets trained via the train method.
    '''
    def __init__(self, optimizer):
        super().__init__()
        self.optimizer = optimizer
        # Initialize Weights and Bias
        self.weights = nn.Parameter(
            torch.randn(1, 13, dtype=torch.float),
            requires_grad=True
        )
        self.bias = nn.Parameter(
            torch.randn(1, dtype=torch.float),
            requires_grad=True
        )
```

In this class, we only need a single argument when creating an object: the `optimizer`. We're doing this because we'll want to test different optimizers during the training process. In the code above, let's zoom in on the weight initialization right after `# Initialize Weights and Bias`:

```
self.weights = nn.Parameter(
    torch.randn(1, 13, dtype=torch.float),
    requires_grad=True
)
self.bias = nn.Parameter(
    torch.randn(1, dtype=torch.float),
    requires_grad=True
)
```

A linear regression is a very simple function with the formula `y = b0 + b1*x1 + ... + bn*xn`, where:

- y is the target we want to predict.
- b0 is the bias term.
- b1, ..., bn are the weights of the model (how much each variable should weigh in the final decision, and whether it contributes negatively or positively).
- x1, ..., xn are the values of the features.
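To make the formula concrete, here is a tiny worked example with made-up numbers (two features, so n = 2):

```python
import torch

bias = torch.tensor(1.0)             # b0
weights = torch.tensor([2.0, -0.5])  # b1, b2
x = torch.tensor([3.0, 4.0])         # x1, x2

# y = b0 + b1*x1 + b2*x2 = 1.0 + 6.0 - 2.0 = 5.0
y = bias + (weights * x).sum()
print(y.item())  # 5.0
```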

The idea behind `nn.Parameter` is to initialize *b0* (the bias, initialized in `self.bias`) and *b1, ..., bn* (the weights, initialized in `self.weights`). We are initializing 13 weights, given that we have 13 features in our training dataset. As we're dealing with a linear regression, there's only a single value for the bias, so we just initialize a single random scalar (check my first post if this name sounds alien to you!). Also, notice that we're initializing these parameters randomly using `torch.randn`.

Now, our goal is to optimize these weights via backpropagation. For that, we need to set up our linear layer, consisting of the regression formula:

```
def forward(self, x: torch.Tensor) -> torch.Tensor:
    return (self.weights * x + self.bias).sum(axis=1)
```
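Since `self.weights` has shape `(1, 13)` and a batch `x` has shape `(n, 13)`, the multiplication broadcasts row by row and the sum over `axis=1` yields one prediction per song. A small sketch with random stand-in data shows this; note that because the bias is added before the sum, it is effectively counted once per feature, a constant factor the learnable parameter simply absorbs during training:

```python
import torch

torch.manual_seed(0)
weights = torch.randn(1, 13)  # stand-in for self.weights
bias = torch.randn(1)         # stand-in for self.bias
x = torch.randn(4, 13)        # hypothetical batch of 4 songs

manual = (weights * x + bias).sum(axis=1)
# Closed form: dot product per row, plus the bias repeated 13 times
matmul = x @ weights.squeeze() + 13 * bias
print(torch.allclose(manual, matmul, atol=1e-5))  # True
```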

The `trainModel` method will help us perform backpropagation and weight adjustment:

```
def trainModel(
    self,
    epochs: int,
    X_train: torch.Tensor,
    X_test: torch.Tensor,
    y_train: torch.Tensor,
    y_test: torch.Tensor,
    lr: float
):
    '''
    Trains linear model using PyTorch.
    Evaluates the model against the test set for every epoch.
    '''
    torch.manual_seed(42)
    # Create empty loss lists to track values
    self.train_loss_values = []
    self.test_loss_values = []
    loss_fn = nn.L1Loss()
    if self.optimizer == 'SGD':
        optimizer = torch.optim.SGD(
            params=self.parameters(),
            lr=lr
        )
    elif self.optimizer == 'Adam':
        optimizer = torch.optim.Adam(
            params=self.parameters(),
            lr=lr
        )
    for epoch in range(epochs):
        self.train()
        y_pred = self(X_train)
        loss = loss_fn(y_pred, y_train)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Set the model in evaluation mode
        self.eval()
        with torch.inference_mode():
            self.evaluate(X_test, y_test, epoch, loss_fn, loss)
```

In this method, we can choose between the Stochastic Gradient Descent (*SGD*) and Adaptive Moment Estimation (*Adam*) optimizers. More importantly, let's zoom in on what happens in each epoch (a pass over the entire dataset):

```
self.train()
y_pred = self(X_train)
loss = loss_fn(y_pred,y_train)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

This code is extremely important in the context of neural networks. It consists of the training loop of a typical `torch` model:

- We set the model in training mode using `self.train()`.
- Next, we pass the data through the model using `self(X_train)`; this will run the data through the `forward` method.
- `loss_fn` calculates the loss on the training data. Our loss is `nn.L1Loss`, which consists of the Mean Absolute Error.
- `optimizer.zero_grad()` sets the gradients to zero (they accumulate every epoch, so we want them to start clean on every pass).
- `loss.backward()` calculates the gradient of the loss function with respect to every weight. This is the step that tells us how the weights should change.
- Finally, we update the model's parameters using `optimizer.step()`.
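To see why `optimizer.zero_grad()` matters, here is a small sketch (independent of the song model) that calls `backward()` twice on the same parameter: without zeroing, the second gradient is added on top of the first.

```python
import torch

w = torch.ones(1, requires_grad=True)

loss = (2 * w).sum()
loss.backward()
print(w.grad)  # tensor([2.])

# Without zeroing, gradients accumulate across backward calls
loss = (2 * w).sum()
loss.backward()
print(w.grad)  # tensor([4.])

# Zeroing resets the accumulator before the next pass
w.grad.zero_()
loss = (2 * w).sum()
loss.backward()
print(w.grad)  # tensor([2.])
```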

The final touch is revealing how our model will be evaluated, using the `evaluate` method:

```
def evaluate(self, X_test, y_test, epoch_nb, loss_fn, train_loss):
    '''
    Evaluates current epoch performance on the test set.
    '''
    test_pred = self(X_test)
    test_loss = loss_fn(test_pred, y_test.type(torch.float))
    if epoch_nb % 10 == 0:
        self.train_loss_values.append(train_loss.detach().numpy())
        self.test_loss_values.append(test_loss.detach().numpy())
        print(f"Epoch: {epoch_nb} - MAE Train Loss: {train_loss} - MAE Test Loss: {test_loss}")
```

This code calculates the loss on the test set. We also use this method to print the train and test loss every 10 epochs.

Having our model ready, let’s train it on our data and visualize the learning curves for train and test!

# Fitting the Model

Let's use the code we've built to train our model and see the training process. First, we'll train a model using the `Adam` optimizer and a `0.001` learning rate for 200 epochs:

```
adam_model = LinearRegressionModel('Adam')
adam_model.trainModel(200, X_train, X_test, y_train, y_test, 0.001)
```

Here I'm training a model using the `Adam` optimizer for 200 epochs. The overview of the train and test loss is shown next:

We can also plot the training and test loss throughout the epochs:
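If you want to reproduce the learning-curve plot, here is a minimal sketch using `matplotlib` (assumed installed); the loss values below are made-up placeholders for the lists the model records every 10 epochs:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical loss values; in practice, use adam_model.train_loss_values
# and adam_model.test_loss_values
train_loss_values = [40.0, 30.0, 25.0, 22.0, 21.0]
test_loss_values = [41.0, 31.0, 26.0, 23.0, 21.5]
epochs_logged = [i * 10 for i in range(len(train_loss_values))]

plt.plot(epochs_logged, train_loss_values, label="Train MAE")
plt.plot(epochs_logged, test_loss_values, label="Test MAE")
plt.xlabel("Epoch")
plt.ylabel("MAE Loss")
plt.legend()
plt.savefig("loss_curves.png")
```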

Our loss is still a bit high (on the last epoch, an MAE of around 21) as a linear regression may not be able to solve this problem.

Finally, let's just fit a model with `SGD`:

```
sgd_model = LinearRegressionModel('SGD')
sgd_model.trainModel(500, X_train, X_test, y_train, y_test, 0.001)
```

Interesting: train and test loss do not improve! This happens because SGD is very sensitive to feature scaling and may have trouble calculating the gradients when features have very different scales. As a challenge, try to scale the features and check the results with `SGD`. After scaling, you will also notice a more stable behavior in the `Adam` optimizer model!
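As a starting point for that challenge, here is a minimal standardization sketch in pure `torch` (the tensors below are random stand-ins for the real `X_train` / `X_test`): compute the mean and standard deviation on the training set only, then apply those same statistics to both splits.

```python
import torch

torch.manual_seed(42)
# Hypothetical stand-ins for the real X_train / X_test feature tensors
X_train = torch.rand(100, 13) * 200.0
X_test = torch.rand(20, 13) * 200.0

# Compute statistics on the training set only, then reuse them on test
mean = X_train.mean(dim=0)
std = X_train.std(dim=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

Reusing the training statistics on the test set avoids leaking test information into the preprocessing step.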

# Conclusion

Thank you for taking the time to read this post! In this blog post, we've checked how we can use `torch` to train a simple linear regression model. While PyTorch is famous for its deep learning capabilities (more layers and complex functions), training simple models is a great way to play around with the framework. It's also an excellent use case to get familiar with the concepts of loss functions and gradients.

We've also seen how the `SGD` and `Adam` optimizers work, particularly how sensitive they can be to unscaled features.

Finally, I want you to retain the training process, which can be extended to other types of models, functions and processes:

- Use `train()` to set the model into training mode.
- Pass the data through the model by calling the model instance (the forward pass).
- Use `nn.L1Loss()` for regression problems; other loss functions are available.
- `optimizer.zero_grad()` sets the gradients to zero.
- `loss.backward()` calculates the gradient of the loss with respect to every weight.
- Use `optimizer.step()` to update the weights of the model.

See you on the next PyTorch post! I also recommend that you visit PyTorch Zero to Mastery Course, an amazing free resource that inspired the methodology behind this post.

If you would like to know more about how we use `torch` in our projects, or would like to train your team on deep learning, contact me at ivo@daredata.engineering - looking forward to speaking with you!