Recent Posts

Gradients of Convolution: Direct Computation and Linear Algebra Perspective

20 Feb 2025

Convolution operations are foundational in deep learning, where they extract features in image tasks. Computing their gradients is essential for training convolutional neural networks, since backpropagation needs them to update parameters and minimize the loss. This post derives these gradients through two complementary approaches: direct differentiation of the convolution definition, and a linear algebra perspective that naturally introduces the transposed convolution (also known as deconvolution).
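
To make the direct-computation view concrete, here is a minimal NumPy sketch of my own (not code from the post) for a 1D convolution as used in deep learning, i.e. cross-correlation: the gradient with respect to the kernel is a correlation of the input with the upstream gradient, and the gradient with respect to the input is a full convolution of the upstream gradient with the kernel, which is exactly the transposed-convolution view.

    import numpy as np

    def conv1d(x, w):
        # Deep-learning "convolution" (cross-correlation), valid padding:
        # y[i] = sum_k x[i + k] * w[k]
        return np.correlate(x, w, mode="valid")

    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)           # input signal
    w = rng.standard_normal(3)           # kernel
    y = conv1d(x, w)                     # shape (6,)
    g = rng.standard_normal(y.shape[0])  # upstream gradient dL/dy

    # dL/dw[k] = sum_i g[i] * x[i + k]: correlate the input with g.
    dw = np.correlate(x, g, mode="valid")

    # dL/dx[j] = sum_k g[j - k] * w[k]: full convolution of g with w,
    # i.e. the transposed convolution applied to the upstream gradient.
    dx = np.convolve(g, w, mode="full")  # shape (8,), same as x

    # Sanity check dw against central finite differences.
    eps = 1e-6
    num_dw = np.array([(conv1d(x, w + eps * e) - conv1d(x, w - eps * e)) @ g / (2 * eps)
                       for e in np.eye(w.size)])
    assert np.allclose(dw, num_dw, atol=1e-5)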

...
Tags: ai

Add A Comment Section to My Blog with Giscus

01 Dec 2024

Early this year, I created this blog site and shared some well-written notes on it. Since then, I have really enjoyed blogging and have continued to post interesting things I learn. Recently, I came across the giscus project, a comment system powered by GitHub Discussions, which lets me add a comment section to my posts. It worked well on my site when I tried it, so I believe it is a good opportunity to enhance the site further and encourage more exchange of ideas.

...
Tags: emacs

Practical Einops: Tensor Operations Based on Indices

28 Nov 2024

People are familiar with vector and matrix operations but are less familiar with tensor operations. In machine learning, tensors often refer to batched vectors or batched matrices and are represented by an array-like object with multiple indices. For this reason, tensor operations in most Python packages, including NumPy, PyTorch, and TensorFlow, are typically named after vector and matrix operations. However, tensors have a particularly useful operation of their own, called contraction, which uses index-based notation and can express most vector and matrix operations. This index-based notation describes the relationship between the components of the input and output tensors intuitively and explicitly. Today's topic, the Python einops package, extends this notation and provides an elegant API for flexible and powerful tensor operations.
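
As a quick taste of the notation, here is a small sketch of my own (the shapes and axis names are just for illustration):

    import numpy as np
    from einops import rearrange, reduce

    x = np.random.rand(2, 3, 4, 4)  # (batch, channel, height, width)

    # Flatten each image into a sequence of pixels: every axis gets a name,
    # and parentheses merge the grouped axes.
    seq = rearrange(x, "b c h w -> b (h w) c")    # shape (2, 16, 3)

    # Global average pooling written as an index-based reduction.
    pooled = reduce(x, "b c h w -> b c", "mean")  # shape (2, 3)

    # 2x2 max pooling: split h and w into block indices, reduce over them.
    down = reduce(x, "b c (h h2) (w w2) -> b c h w", "max", h2=2, w2=2)  # (2, 3, 2, 2)

Each pattern reads like the index equation it implements, which is what makes this style self-documenting.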

...
Tags: ai

What Happens in A Transformer Layer

30 Oct 2024

Transformers serve as the backbone of large language models. Just as convolutional networks revolutionized image processing, transformers have significantly advanced natural language processing since their introduction. The efficient parallel computation and transfer learning capabilities of transformers have led to the rise of the pre-training paradigm. In this approach, a large-scale transformer-based model, referred to as a foundation model, is trained on a massive volume of data and subsequently adapted to downstream tasks through some form of fine-tuning. Our familiar friend ChatGPT is one such example, where GPT stands for Generative Pre-trained Transformer. Meanwhile, transformer-based models achieve state-of-the-art performance across many modalities, including text, image, video, point cloud, and audio data, and have been used for both discriminative and generative applications.

...
Tags: ai

An In-Depth Introduction to Backpropagation and Automatic Differentiation

24 Sep 2024

Backpropagation and automatic differentiation (AD) are fundamental components of modern deep learning frameworks. However, many practitioners pay little attention to their implementations and may regard them as some sort of "black magic". It does look like magic that PyTorch can calculate derivatives of virtually any user-defined function, and even accommodate control-flow elements like conditional execution, which are not differentiable in the mathematical sense. Although we understand that these systems primarily rely on the chain rule, it remains unclear how they efficiently apply it to a function whose form is not known in advance and is determined only by the user's code.
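
As a minimal illustration of the idea (a toy of my own, not how PyTorch is actually implemented), reverse-mode AD only needs each operation to record its inputs and local derivatives as the user's code runs; control flow is harmless because only the branch that actually executes gets recorded:

    class Var:
        """A scalar value that records the operations applied to it."""

        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents  # pairs of (input Var, local derivative)
            self.grad = 0.0

        def __add__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def backward(self, seed=1.0):
            # The chain rule, applied backwards along the recorded graph.
            self.grad += seed
            for parent, local_deriv in self.parents:
                parent.backward(seed * local_deriv)

    def f(x):
        # Conditional execution is fine: only the taken branch is recorded.
        if x.value > 0:
            return x * x + x
        return x * 3.0

    x = Var(2.0)
    y = f(x)      # y = x^2 + x = 6.0
    y.backward()  # dy/dx = 2x + 1 = 5.0
    print(y.value, x.grad)

Real frameworks do the same bookkeeping on a tape of tensor operations and traverse it iteratively in topological order rather than recursively, but the principle is the same.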

...
Tags: ai