Posts tagged "ai":

Gradients of Convolution: Direct Computation and Linear Algebra Perspective

20 Feb 2025

Convolution operations are foundational in deep learning for extracting features in image tasks. Calculating their gradients is critical for training convolutional neural networks, since backpropagation relies on them to update parameters and minimize the loss. This post derives these gradients through two complementary approaches: direct differentiation of the convolution definition, and a linear algebra perspective, which naturally introduces the transposed convolution (also known as deconvolution).
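
As a quick illustration of the relationship the post derives, the following PyTorch sketch (toy shapes, stride 1, no padding; all values are made up) checks that the gradient of a convolution with respect to its input equals a transposed convolution applied to the upstream gradient:

    import torch
    import torch.nn.functional as F

    # A toy single-channel input and kernel (illustrative shapes).
    x = torch.randn(1, 1, 5, 5, requires_grad=True)
    w = torch.randn(1, 1, 3, 3)

    # Forward convolution, then backpropagate an upstream gradient dL/dy.
    y = F.conv2d(x, w)
    grad_y = torch.ones_like(y)
    y.backward(grad_y)

    # The input gradient is the transposed convolution of dL/dy with the same kernel.
    grad_x_ref = F.conv_transpose2d(grad_y, w)
    print(torch.allclose(x.grad, grad_x_ref))  # True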

...
Tags: ai

Practical Einops: Tensor Operations Based on Indices

28 Nov 2024

People are familiar with vector and matrix operations but are less familiar with tensor operations. In machine learning, tensors often refer to batched vectors or batched matrices and are represented by array-like objects with multiple indices. For this reason, tensor operations in most Python packages, including NumPy, PyTorch, and TensorFlow, are typically named after vector and matrix operations. However, tensors have a particularly useful operation of their own, called contraction, which uses index-based notation and covers most vector and matrix operations. This index-based notation describes the relationship between the components of the input and output tensors intuitively and explicitly. Today's topic, Python's einops package, extends this notation and provides an elegant API for flexible and powerful tensor operations.
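
For a taste of this notation, here is a minimal einops sketch (the array shapes are made up for illustration):

    import numpy as np
    from einops import rearrange, reduce

    # A hypothetical batch of 4 RGB 32x32 images in NHWC layout.
    images = np.random.rand(4, 32, 32, 3)

    # Name each axis and describe the output in terms of those names.
    nchw = rearrange(images, 'b h w c -> b c h w')                 # NHWC -> NCHW
    pooled = reduce(images, 'b (h 2) (w 2) c -> b h w c', 'mean')  # 2x2 average pooling
    flat = rearrange(images, 'b h w c -> b (h w c)')               # flatten each image

    print(nchw.shape, pooled.shape, flat.shape)  # (4, 3, 32, 32) (4, 16, 16, 3) (4, 3072)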

...
Tags: ai

What Happens in a Transformer Layer

30 Oct 2024

Transformers serve as the backbone of large language models. Just as convolutional networks revolutionized image processing, transformers have significantly advanced natural language processing since their introduction. Their efficient parallel computation and transfer learning capabilities have led to the rise of the pre-training paradigm: a large-scale transformer-based model, referred to as a foundation model, is trained on a vast volume of data and subsequently adapted to downstream tasks through some form of fine-tuning. Our familiar friend ChatGPT is one such example, where GPT stands for Generative Pre-trained Transformer. Meanwhile, transformer-based models achieve state-of-the-art performance across many different modalities, including text, image, video, point cloud, and audio data, and have been used for both discriminative and generative applications.
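
As a rough sketch of what one such layer computes, here is a minimal pre-LN transformer layer in PyTorch (the hyperparameters and the pre-LN arrangement are illustrative assumptions, not necessarily the formulation used in the post):

    import torch
    import torch.nn as nn

    class TransformerLayer(nn.Module):
        """Self-attention plus a feed-forward network, each wrapped in
        a residual connection with layer normalization."""
        def __init__(self, d_model=64, n_heads=4):
            super().__init__()
            self.norm1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(d_model)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model))

        def forward(self, x):
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]  # mix across tokens
            x = x + self.ffn(self.norm2(x))                    # transform each token
            return x

    x = torch.randn(2, 10, 64)          # (batch, sequence, features)
    print(TransformerLayer()(x).shape)  # torch.Size([2, 10, 64])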

...
Tags: ai

An In-Depth Introduction to Backpropagation and Automatic Differentiation

24 Sep 2024

Backpropagation and automatic differentiation (AD) are fundamental components of modern deep learning frameworks. However, many practitioners pay little attention to their implementation and may regard them as some sort of "black magic". It does look like magic that PyTorch can compute derivatives of virtually any user-defined function, even one containing control-flow elements like conditional execution, which are not differentiable in the classical mathematical sense. Although we understand that these systems primarily employ the chain rule, it remains unclear how they apply it efficiently to a function whose form is entirely unknown ahead of time and determined only by the user's code.
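
A small PyTorch sketch shows how AD accommodates control flow: each call records only the branch that actually executes, and the derivative is taken along that path (the function f below is a made-up example):

    import torch

    def f(x):
        if x.item() > 0:   # data-dependent branch
            return x ** 2
        return -3 * x

    x = torch.tensor(2.0, requires_grad=True)
    f(x).backward()
    print(x.grad)  # tensor(4.) -- derivative of x**2 at x = 2

    x = torch.tensor(-1.0, requires_grad=True)
    f(x).backward()
    print(x.grad)  # tensor(-3.) -- derivative of -3*x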

...
Tags: ai

MNIST: the Hello World Example in Image Recognition

19 Feb 2024

In this post we will train a simple CNN (Convolutional Neural Network) classifier in PyTorch to recognize handwritten digits from the MNIST dataset.
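
A minimal sketch of such a classifier (the layer sizes are illustrative and not necessarily those used in the post):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
        nn.ReLU(),
        nn.MaxPool2d(2),                              # -> 16x14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
        nn.ReLU(),
        nn.MaxPool2d(2),                              # -> 32x7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),                    # 10 digit classes
    )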

...
Tags: ai

Backpropagation Formula: An Optimal Control View

07 Nov 2021

Consider the following optimal control problem (or equivalently, a constrained optimization problem),

...
Tags: ai

Convolution in CNN

04 Dec 2020

This note explains torch.nn.Conv2d [official documentation]. The main goal is to figure out what happens in this layer: what are its inputs and outputs, and how does it get from input to output? At the end, it addresses a question that puzzled me for a long time: why is it called "convolution"?
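
A minimal shape-tracing sketch of the layer (all sizes here are illustrative):

    import torch
    import torch.nn as nn

    # Conv2d maps (N, C_in, H, W) to (N, C_out, H_out, W_out).
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
    x = torch.randn(4, 3, 28, 28)  # a batch of 4 RGB 28x28 images
    y = conv(x)
    print(y.shape)  # torch.Size([4, 8, 28, 28]); padding=1 with a 3x3 kernel
                    # and stride=1 preserves the spatial size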

...
Tags: ai