torch.Tensor
│
├── Members:
│   ├── data : underlying storage (values)
│   ├── dtype : data type (float32, int64, etc.)
│   ├── shape : size of each dimension
│   ├── device : cpu / cuda
│   └── requires_grad : flag for autograd
│
└── Methods:
    ├── backward() : compute gradients
    ├── detach() : return tensor without grad
    ├── to(device) : move to CPU/GPU
    ├── view(shape) : reshape
    ├── permute(dims) : reorder dimensions
    ├── item() : get single Python value
    └── clone() : deep copy
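
A quick sketch of these members and methods in use (shapes and values here are arbitrary):

import torch

x = torch.randn(2, 3, dtype=torch.float32, requires_grad=True)
print(x.dtype, x.shape, x.device, x.requires_grad)  # float32, (2, 3), cpu, True

d = x.detach()            # same data, no longer tracked by autograd
v = x.view(3, 2)          # reshape (needs contiguous memory)
p = x.permute(1, 0)       # reorder dimensions -> shape (3, 2)
s = x.sum().item()        # extract a single Python float
c = x.clone()             # copy of the data (still part of the autograd graph)
if torch.cuda.is_available():
    x = x.to("cuda")      # move storage to the GPU
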
nn.Module (Model, Layers, Activations)
│
├── Members:
│   ├── _modules : dict of child layers
│   ├── _parameters : dict of learnable tensors
│   ├── training : bool flag for train/eval mode
│   └── buffers : running stats (e.g., batchnorm)
│
└── Methods:
    ├── forward(x) : define computation
    ├── parameters() : iterate learnable params
    ├── to(device) : move model to CPU/GPU
    ├── train() : set training mode
    └── eval() : set eval mode
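
A minimal nn.Module subclass showing these pieces together (layer sizes are arbitrary):

import torch
from torch import nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)   # registered as a child module in _modules
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):            # define the computation
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet().to("cpu")          # or "cuda" if available
n_params = sum(p.numel() for p in model.parameters())
model.train()                        # model.training == True
model.eval()                         # model.training == False
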
Layer (Conceptual base structure)
│
├── Members:
│   ├── weights : learnable parameters (if applicable)
│   ├── bias : optional learnable bias
│   ├── hyperparams : e.g., kernel_size, stride, in/out features
│   └── buffers : e.g., running_mean in BatchNorm
│
└── Important Methods:
    ├── forward(x) : compute layer output
    ├── reset_parameters() : initialize weights
    └── __call__() : wrapper that runs hooks + forward
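
As a sketch of that base structure, a hand-rolled linear layer (the class name and init scheme are illustrative; nn.Linear itself follows the same pattern):

import math
import torch
from torch import nn

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features                 # hyperparameters
        self.out_features = out_features
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.empty(out_features))
        self.reset_parameters()

    def reset_parameters(self):                        # initialize weights
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        nn.init.zeros_(self.bias)

    def forward(self, x):                              # compute layer output
        return x @ self.weight.t() + self.bias

layer = MyLinear(4, 3)
out = layer(torch.randn(2, 4))   # __call__ runs hooks, then forward
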
nn.Linear
├── weight : (out_features, in_features)
└── bias : (out_features)

nn.Conv2d
├── weight : (out_channels, in_channels, kH, kW)
└── bias : (out_channels)
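
These shapes can be checked directly (feature/channel counts below are arbitrary):

from torch import nn

fc = nn.Linear(in_features=10, out_features=5)
print(fc.weight.shape, fc.bias.shape)      # torch.Size([5, 10]) torch.Size([5])

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(conv.weight.shape, conv.bias.shape)  # torch.Size([16, 3, 3, 3]) torch.Size([16])
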
Activation (Conceptual base structure)
│
├── Members:
│   └── inplace : whether to modify in-place (for some activations)
│
└── Methods:
    └── forward(x) : apply activation
nn.ReLU
├── inplace : bool
└── forward(x) : max(0, x)

nn.Sigmoid
└── forward(x) : 1 / (1 + exp(-x))

nn.Softmax
└── forward(x) : exp(x) / sum(exp(x)) along dim
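
Applied to the same small input (values chosen only to make the outputs easy to read):

import torch
from torch import nn

x = torch.tensor([-1.0, 0.0, 2.0])
print(nn.ReLU()(x))            # tensor([0., 0., 2.])
print(nn.Sigmoid()(x))         # elementwise 1 / (1 + exp(-x))
print(nn.Softmax(dim=0)(x))    # exp(x) / sum(exp(x)); sums to 1
nn.ReLU(inplace=True)(x)       # overwrites x with max(0, x)
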
Loss (Conceptual base structure)
│
├── Members:
│   ├── reduction : 'mean' | 'sum' | 'none'
│   └── weight : optional class/element weights
│
└── Methods:
    └── forward(pred, target) : return loss value
nn.CrossEntropyLoss
├── weight : class weights
└── reduction

nn.MSELoss
└── reduction
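
Both losses in use (batch size and class count are arbitrary):

import torch
from torch import nn

logits = torch.randn(4, 3)             # raw scores for 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1])   # class indices
ce = nn.CrossEntropyLoss(reduction="mean")
print(ce(logits, targets))             # scalar loss

pred, true = torch.randn(4, 1), torch.randn(4, 1)
mse = nn.MSELoss(reduction="sum")
print(mse(pred, true))                 # summed squared error
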
torch.optim.Optimizer (base class)
│
├── Members:
│   ├── param_groups : list of param sets + hyperparameters
│   ├── state : per-parameter state (e.g., moments in Adam)
│   └── defaults : default hyperparameters (lr, momentum, etc.)
│
└── Methods:
    ├── step() : apply gradient update
    ├── zero_grad() : clear accumulated gradients
    └── add_param_group() : add parameters post-init
optim.SGD
├── lr : learning rate
├── momentum : momentum factor
└── weight_decay : L2 penalty

optim.Adam
├── lr
├── betas : exponential decay rates for the moment estimates
├── eps : numerical stability term
└── weight_decay
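
A typical update step showing where these members and methods appear (the model and data are placeholders):

import torch
from torch import nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
# or: torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

x, y = torch.randn(8, 10), torch.randn(8, 2)
loss = nn.MSELoss()(model(x), y)
opt.zero_grad()                    # clear accumulated gradients
loss.backward()                    # populate .grad on every parameter
opt.step()                         # apply the gradient update
print(opt.param_groups[0]["lr"])   # hyperparameters live in param_groups
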
PyTorch builds the computation graph on the fly as operations execute, in contrast to older frameworks such as TensorFlow 1.x, which used static graphs.
This makes debugging and model development more intuitive and "Pythonic."
It also lets ordinary Python loops and conditionals be part of the model (see the sketch after the example below).
PyTorch's autograd system automatically computes the gradients needed for backpropagation:
import torch
x = torch.ones(3, requires_grad=True)  # leaf tensor tracked by autograd
y = x.sum()
y.backward()   # computes dy/dx
print(x.grad)  # tensor([1., 1., 1.])
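
Because the graph is rebuilt on every forward pass, ordinary Python control flow can depend on the data; the module below is purely illustrative:

import torch
from torch import nn

class LoopyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        # number of iterations depends on the input itself
        steps = int(x.abs().mean().item() * 3) + 1
        for _ in range(steps):
            x = torch.relu(self.fc(x))
        return x.sum()

out = LoopyNet()(torch.randn(2, 8))
out.backward()   # autograd differentiates whichever path actually ran
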
Commonly used ecosystem libraries:
TorchVision — Image tasks (datasets, transforms, pretrained models)
TorchText — NLP tasks
TorchAudio — Speech/audio processing
PyTorch Lightning — High-level training framework
Hugging Face Transformers — Large pre-trained language models
TorchServe — Serve models in production
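
As one ecosystem example, a minimal TorchVision inference sketch (assumes torchvision ≥ 0.13 for the weights API; the random tensor stands in for a real image):

import torch
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()            # matching resize/crop/normalize

img = torch.rand(3, 256, 256)                # placeholder for a real image
batch = preprocess(img).unsqueeze(0)
with torch.no_grad():
    probs = model(batch).softmax(dim=1)
print(probs.argmax(dim=1))                   # predicted ImageNet class index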