
PyTorch 101 - Part 1

Basics of PyTorch


Overview

Why PyTorch?

  • N-dimensional Tensor computation on GPUs
  • Automatic differentiation for training deep neural networks

Training & Testing

Tools

  • torch.utils.data.Dataset: stores data samples and expected values
  • torch.utils.data.DataLoader: groups data in batches

    Usage

  • dataset = MyDataset(file)
  • dataloader = DataLoader(dataset, batch_size, shuffle=True) # training: shuffle=True; testing: shuffle=False
  • MyDataset:

```python
class MyDataset(Dataset):
    def __init__(self, file):
        self.data = ...
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)
```

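As a sketch of how Dataset and DataLoader fit together (ToyDataset here is a hypothetical stand-in for MyDataset, with concrete data):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):          # hypothetical stand-in for MyDataset
    def __init__(self):
        self.data = torch.arange(10, dtype=torch.float32)
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)

dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batches = list(loader)              # DataLoader groups samples into batches
print(batches[0])                   # tensor([0., 1., 2., 3.])
print(len(batches))                 # 3 (batch sizes 4, 4, 2)
```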

Tensor

High-dimensional matrices (or arrays).

Shape

A tensor's shape lists the length of each dimension; it can be checked with x.shape.

Creation

  • Directly from data:

```python
x = torch.tensor([[1, -1], [-1, 1]])
x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))  # needs: import numpy as np
```
  • Tensor of constant zeros/ones:

```python
x = torch.zeros([2, 2])   # takes a shape; creates tensor([[0., 0.], [0., 0.]])
x = torch.ones([1, 2, 5]) # likewise, filled with ones
```

Operations

  • Addition

```python
z = x + y
```

  • Subtraction

```python
z = x - y
```

  • Power

```python
y = x.pow(2)
```

  • Summation

```python
y = x.sum()
```

  • Mean

```python
y = x.mean()
```

  • Transpose: transpose 2 specific dimensions

```python
x = x.transpose(0, 1) # select two dimensions and switch them
```

  • Squeeze: remove the specified dimension with length = 1

```python
x = torch.zeros([1, 2, 3])
x = x.squeeze(0)
x.shape # torch.Size([2, 3])
```
  • Unsqueeze: expand a new dimension
  • Cat: concatenate multiple tensors along an existing dimension

```python
x = torch.zeros([2, 1, 3])
y = torch.zeros([2, 3, 3])
z = torch.zeros([2, 2, 3])
w = torch.cat([x, y, z], dim=1)
w.shape # torch.Size([2, 6, 3])
```
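Unsqueeze, the inverse of squeeze above, inserts a new length-1 dimension at the given position; a minimal sketch:

```python
import torch

x = torch.zeros([2, 3])
y = x.unsqueeze(1)   # insert a new dimension of length 1 at position 1
print(y.shape)       # torch.Size([2, 1, 3])
```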

Gradient Calc


```python
x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
x.grad # tensor([[ 2.,  0.], [-2.,  2.]])
```
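Since z is the sum of the squared entries of x, the analytic gradient is dz/dx = 2x; a quick check:

```python
import torch

x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
# dz/dx = 2x element-wise, so x.grad should equal 2 * x
assert torch.equal(x.grad, 2 * x.detach())
```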

Neural Network

torch.nn - network layers

Example - Linear Layer

```python
# nn.Linear(in_features, out_features)
layer = torch.nn.Linear(32, 64)
layer.weight.shape # torch.Size([64, 32])
layer.bias.shape   # torch.Size([64])
```
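Applying the layer maps the last dimension from 32 input features to 64 outputs; a minimal sketch:

```python
import torch

layer = torch.nn.Linear(32, 64)
x = torch.randn(16, 32)   # a batch of 16 vectors with 32 features each
y = layer(x)              # computes x @ layer.weight.T + layer.bias
print(y.shape)            # torch.Size([16, 64])
```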


Build a NN

```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(), # or nn.ReLU()
            nn.Linear(32, 1)
        )

    def forward(self, x):
        return self.net(x)
```
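A quick shape check of this network; the same architecture is rebuilt inline with nn.Sequential so the snippet is self-contained:

```python
import torch
import torch.nn as nn

# same layers as MyModel above
net = nn.Sequential(nn.Linear(10, 32), nn.Sigmoid(), nn.Linear(32, 1))
x = torch.randn(4, 10)   # a batch of 4 samples with 10 features each
out = net(x)
print(out.shape)         # torch.Size([4, 1])
```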

Loss Function

  • Mean Squared Error (for regression tasks)

```python
criterion = nn.MSELoss()
```

  • Cross Entropy (for classification tasks)

```python
criterion = nn.CrossEntropyLoss()
```

  • loss = criterion(model_output, expected_value)
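For instance, MSE is the mean of the squared differences between prediction and target (values here are illustrative):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
pred = torch.tensor([1.0, 2.0, 3.0])      # illustrative model output
target = torch.tensor([1.0, 2.0, 5.0])    # illustrative expected values
loss = criterion(pred, target)            # (0 + 0 + 4) / 3
print(loss.item())                        # 1.3333...
```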

Optimizer

Gradient-based optimization algorithms that adjust network parameters to reduce error.

Implementation

  1. Call optimizer.zero_grad() to reset gradients of model parameters
  2. Call loss.backward() to backpropagate gradients of prediction loss
  3. Call optimizer.step() to adjust model parameters
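The three calls can be sketched on a single toy parameter (w and the learning rate are illustrative):

```python
import torch

w = torch.tensor([2.0], requires_grad=True)   # a single trainable parameter
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()    # toy loss; dL/dw = 2w = 4
optimizer.zero_grad()    # 1. reset gradients
loss.backward()          # 2. backpropagate
optimizer.step()         # 3. update: w <- w - lr * grad = 2 - 0.1 * 4
print(w.item())          # approximately 1.6
```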

Neural Network Training Steps

A simple training procedure for a neural network using torch.

Setting up

```python
dataset = MyDataset(file)   # read data via MyDataset as defined above
training_set = DataLoader(dataset, 16, shuffle=True)    # put the dataset into a DataLoader
model = MyModel().to(device)     # construct model and move to device (cpu/cuda)
criterion = nn.MSELoss()    # set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1)    # set optimizer
```

Training Loops

```python
for epoch in range(n_epochs):
    model.train()   # set model to training mode
    for x, y in training_set:    # iterate through the DataLoader
        optimizer.zero_grad()   # reset gradients
        x, y = x.to(device), y.to(device)
        pred = model(x)     # forward pass (compute output)
        loss = criterion(pred, y)   # compute loss
        loss.backward()     # backpropagate
        optimizer.step()    # update model with optimizer
```

Validation Loops

```python
model.eval()    # set model to evaluation mode
total_loss = 0
for x, y in dv_set:
    x, y = x.to(device), y.to(device)
    with torch.no_grad():   # disable gradient calculation
        pred = model(x)     # forward pass
        loss = criterion(pred, y)
    total_loss += loss.cpu().item() * len(x)    # accumulate loss over batches
avg_loss = total_loss / len(dv_set.dataset)     # average loss over the whole set
```

Testing Loops

```python
model.eval()
preds = []
for x in tt_set:
    x = x.to(device)
    with torch.no_grad():
        pred = model(x)
        preds.append(pred.cpu())    # collect predictions
```

Why model.eval()? It changes the behaviour of some layers, such as dropout and batch normalization, between training and evaluation.

Why with torch.no_grad()? It prevents calculations from being added to the gradient computation graph, and is usually used to avoid accidentally training on validation/testing data.
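The effect of model.eval() can be seen directly on a dropout layer; a minimal sketch:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                     # training mode: elements are zeroed at random
assert (drop(x) == 0).any()      # some activations are dropped

drop.eval()                      # evaluation mode: dropout is a no-op
assert torch.equal(drop(x), x)   # input passes through unchanged
```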

Save/Load a Trained Model

```python
# saving
torch.save(model.state_dict(), path)
# loading
ckpt = torch.load(path)
model.load_state_dict(ckpt)
```
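A round-trip sketch, using a temporary file as a stand-in for path and a bare linear layer as the model:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "model.ckpt")  # stand-in for a real path

torch.save(model.state_dict(), path)        # saving

restored = nn.Linear(4, 2)                  # must match the saved architecture
restored.load_state_dict(torch.load(path))  # loading
assert torch.equal(restored.weight, model.weight)
```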
This post is licensed under CC BY 4.0 by the author.