26 Apr 2025

Display Unicode Characters in Emacs

Motivation. I prefer Unicode symbols over LaTeX sequences. In this old post, I discussed how to display LaTeX commands as Unicode symbols in Emacs. However, after applying these settings, my Emacs renders both \mathcal{A} and \mathscr{A} identically as the U+1D4D0 character from the Libertinus Math font.

After some investigation, I figured out that this behavior comes from my ~/.emacs.d/pretty-symbols.csv, where both LaTeX commands map to the same Unicode character. I did this because Libertinus Math lacks dedicated glyphs for \mathcal characters. Indeed, the same issue can be reproduced with the following TeX file (compiled with XeLaTeX).

\documentclass{article}
\usepackage{unicode-math}
\begin{document}
\setmathfont{Libertinus Math}
Libertinus Math:
\verb|\mathcal{A}| produces $\mathcal{A}$;
\verb|\mathscr{A}| produces $\mathscr{A}$;
Unicode symbol \verb|U+1D4D0| produces $𝓐$;
Unicode symbol \verb|U+1D49C| produces $𝒜$.
\end{document}
1.png
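
The duplicate mapping in the CSV file can also be spotted mechanically. The following is a minimal sketch, assuming that each line of pretty-symbols.csv has the form "command, symbol" (as in \mathcal{A}, 𝒜). The function name find_shared_symbols and the sample lines are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def find_shared_symbols(lines):
    """Return {symbol: [commands]} for symbols that several commands map to."""
    commands_by_symbol = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma, since the command itself may contain commas.
        command, symbol = line.rsplit(",", 1)
        commands_by_symbol[symbol.strip()].append(command.strip())
    return {s: c for s, c in commands_by_symbol.items() if len(c) > 1}


# Hypothetical sample reproducing the situation described above.
sample = [
    "\\mathcal{A}, \U0001D4D0",
    "\\mathscr{A}, \U0001D4D0",
    "\\mathcal{B}, \u212C",
]
shared = find_shared_symbols(sample)
for symbol, commands in shared.items():
    print(f"U+{ord(symbol):05X} is shared by {', '.join(commands)}")
```

To check a real file, the lines could be read with open(path, encoding="utf-8") and passed to the same function.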

Background

  1. How does a font select the glyph for a given code point?

    Answer. We have to distinguish the concepts of characters, code points, glyphs and fonts. See the brief explanation in footnote 1; see also the related references in the section Useful Links.

    In the simplest case, a code point corresponds to a single glyph. For example, U+0041 is displayed as a glyph like A in most fonts. However, a font might also provide alternative glyphs for the same code point (e.g., stylistic alternates), or display multiple code points as a single glyph (e.g., ligatures, or a letter followed by a combining accent). See more discussion in Appendix: Use OpenType Font Features.

  2. How to customize the appearance of a Unicode symbol in Emacs?

    Answer. The simplest way is to set the font used for the Unicode symbol via set-fontset-font. For example, the following snippet tells Emacs to display U+1D4D0 with the font STIX Two Math (see footnote 2).

    (set-fontset-font "fontset-default" #x1D4D0 "STIX Two Math")
    

    A slightly more complicated yet more flexible way is to specify the glyph for displaying a Unicode character using display tables. For example, the following snippet tells Emacs to display U+1D4D1 with the glyph corresponding to U+1D4D0 in Libertinus Math. See the section Appendix: Notes on Display Tables for more explanations.

    (aset standard-display-table #x1D4D1
          (vector (make-glyph-code #x1D4D0 'libertinus-math-face)))
    
  3. How is a LaTeX command displayed as a Unicode symbol?

    Answer. When prettify-symbols-mode is enabled, each string that matches an entry of the variable prettify-symbols-alist is composed into the single character associated with that entry. The buffer text itself is not changed; only its display is. See also my post Display LaTeX Command with Unicode Characters in Emacs.
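
To make the distinction in question 1 more concrete, here is a small sketch using Python's standard unicodedata module. It only inspects code points and character names (glyphs and fonts are not visible at this level), and the choice of sample characters is mine.

```python
import unicodedata

# One code point identifies one abstract character: the name belongs to the
# character, not to any particular font or glyph.
for code_point in (0x0041, 0x1D49C, 0x1D4D0):
    char = chr(code_point)
    print(f"U+{code_point:04X}  {char}  {unicodedata.name(char)}")

# Two code points (base letter + combining acute accent) that a renderer
# normally shows as one glyph; NFC normalization yields the precomposed
# character U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

Which glyph is finally drawn for each of these characters is decided by the font, which is exactly what the Emacs settings in this post control.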

A Simple Solution

Perhaps the simplest solution is set-fontset-font.

First, I notice that the glyphs of Libertinus Math in the range (#x1D4D0 . #x1D503) are suitable for displaying \mathscr characters. So, I use the settings below.

;; for \mathscr letters
(set-fontset-font "fontset-default" '(#x1D4D0 . #x1D503) "Libertinus Math")

Then, I find that STIX Two Math contains glyphs for displaying \mathcal characters. To avoid overriding the \mathscr characters, this time the range is chosen as (#x1D49C . #x1D4CF).

;; for \mathcal letters
(set-fontset-font "fontset-default" '(#x1D49C . #x1D4CF) "STIX Two Math")

Finally, I generate lines for my ~/.emacs.d/pretty-symbols.csv file. Each line consists of a LaTeX command and its Unicode symbol, e.g., \mathcal{A}, 𝒜. See Appendix: Useful Scripts for the Python script that generates these lines. Note that some code points in the range (#x1D49C . #x1D4CF) are glyphless, because Unicode leaves them unassigned: the corresponding letters were already encoded in the Letterlike Symbols block. These code points are manually replaced by their Letterlike Symbols counterparts. To ensure consistency, the font for the replacement code points has to be set by set-fontset-font as well, e.g.,

(set-fontset-font "fontset-default" #x212C "STIX Two Math") ;; mathcal B
(set-fontset-font "fontset-default" #x2130 "STIX Two Math") ;; mathcal E
(set-fontset-font "fontset-default" #x2131 "STIX Two Math") ;; mathcal F
(set-fontset-font "fontset-default" #x210B "STIX Two Math") ;; mathcal H
(set-fontset-font "fontset-default" #x2110 "STIX Two Math") ;; mathcal I
(set-fontset-font "fontset-default" #x2112 "STIX Two Math") ;; mathcal L
(set-fontset-font "fontset-default" #x2133 "STIX Two Math") ;; mathcal M
(set-fontset-font "fontset-default" #x211B "STIX Two Math") ;; mathcal R
(set-fontset-font "fontset-default" #x212F "STIX Two Math") ;; mathcal e
(set-fontset-font "fontset-default" #x210A "STIX Two Math") ;; mathcal g
(set-fontset-font "fontset-default" #x2134 "STIX Two Math") ;; mathcal o
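
As a cross-check, the replacement code points can be derived from Unicode character names instead of being typed by hand. The sketch below assumes the naming pattern "MATHEMATICAL SCRIPT CAPITAL/SMALL X" for the main range and "SCRIPT CAPITAL/SMALL X" for the Letterlike Symbols counterparts. The helper script_code_point is hypothetical, and the output is only meant to be compared against the hand-written lines.

```python
import string
import unicodedata


def script_code_point(letter):
    """Return the code point of the (non-bold) script form of LETTER."""
    case = "CAPITAL" if letter.isupper() else "SMALL"
    upper = letter.upper()
    try:
        # Most letters live in the Mathematical Alphanumeric Symbols block.
        char = unicodedata.lookup(f"MATHEMATICAL SCRIPT {case} {upper}")
    except KeyError:
        # The rest were encoded earlier in the Letterlike Symbols block.
        char = unicodedata.lookup(f"SCRIPT {case} {upper}")
    return ord(char)


lines = []
for letter in string.ascii_uppercase + string.ascii_lowercase:
    code = script_code_point(letter)
    if code < 0x1D49C:  # substituted from Letterlike Symbols
        lines.append(
            f'(set-fontset-font "fontset-default" #x{code:04X} '
            f'"STIX Two Math") ;; mathcal {letter}'
        )

print("\n".join(lines))
```

Whether STIX Two Math actually covers every one of these code points still has to be checked in Emacs, e.g., with describe-char.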

The final result is

before.png
after.png

Appendix: Notes on Display Tables

Sections 42.23.2 Display Tables and 42.23.3 Active Display Table of the manual cover the basics of display tables. Here I add a few additional notes.

First, the function make-glyph-code does not seem to accept anonymous faces as the value of its FACE argument, though section 42.12 Faces mentions that

One way to represent a face is as a property list of attributes, like (:foreground "red" :weight bold). Such a list is called an anonymous face.

Therefore, I have to explicitly define named faces by defface.

(defface libertinus-math-face
  '((t :family "Libertinus Math"))
  "Face for Libertinus Math glyphs.")

(defface stix-math-face
  '((t :family "STIX Two Math"))
  "Face for STIX Two Math glyphs.")

Second, even if the currently active display table maps U+1D4D1 to the glyph described by code U+1D4D0 along with a customized face, the font family specified in the customized face might be overridden by fontset specifications. For example, if the following configurations are both applied, then U+1D4D1 will be displayed as U+1D4D0 in STIX Two Math instead of Libertinus Math.

(aset standard-display-table #x1D4D1
      (vector (make-glyph-code #x1D4D0 'libertinus-math-face)))
(set-fontset-font "fontset-default" #x1D4D0 "STIX Two Math")

You may verify this with describe-char, which will show something like the following. That is, even though the character is displayed with the customized face, the font family specified by the face has been overridden by set-fontset-font.
desc-char.png

Third, only the active display table takes effect and other display tables are completely ignored. As described in the manual, the priority is window display table > buffer display table > standard display table. In my settings, buffers in Org mode will automatically register their buffer display tables to display invisible lines as , while buffers in other modes seem to use the standard display table.
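
The priority rule can be pictured with the following simplified model. It is not how Emacs is implemented; the tables are hypothetical Python dicts from code points to replacement code points, and the function name active_display_table is mine. The point is that the first table that exists is used exclusively, without merging entries from the others.

```python
def active_display_table(window_table, buffer_table, standard_table):
    """Return the first table that is set, in Emacs's priority order."""
    for table in (window_table, buffer_table, standard_table):
        if table is not None:
            return table
    return None


# Hypothetical tables for illustration.
standard = {0x1D4D1: 0x1D4D0}    # remaps U+1D4D1 to U+1D4D0
buffer_local = {0x2026: 0x2026}  # a buffer table with unrelated entries

table = active_display_table(None, buffer_local, standard)
# The buffer table wins, so the standard table's remapping is not consulted.
print(0x1D4D1 in table)
```

This is why a remapping stored only in the standard display table has no effect in a buffer that installs its own display table.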

Appendix: Useful Scripts

A Python script to write Unicode characters and their code points.

with open("/tmp/unicode_chars.txt", "w", encoding="utf-8") as file:
    for code_point in range(0x1D49C, 0x1D4CF + 1):
        file.write(f"{chr(code_point)}, U+{code_point:05X}\n")
    for code_point in range(0x1D4D0, 0x1D503 + 1):
        file.write(f"{chr(code_point)}, U+{code_point:05X}\n")

A Python script to generate lines for ~/.emacs.d/pretty-symbols.csv. Note that some code points in the range (#x1D49C . #x1D4CF) are glyphless. For example, the code point U+1D49D after U+1D49C 𝒜 is glyphless in most fonts, as the right code point for ℬ is U+212C.

with open("/tmp/script_chars.csv", "w", encoding="utf-8") as file:
    glyphless_char_map = {
        # Uppercase substitutions
        0x1D49D: 0x212C,  # B
        0x1D4A0: 0x2130,  # E
        0x1D4A1: 0x2131,  # F
        0x1D4A3: 0x210B,  # H
        0x1D4A4: 0x2110,  # I
        0x1D4A7: 0x2112,  # L
        0x1D4A8: 0x2133,  # M
        0x1D4AD: 0x211B,  # R

        # Lowercase substitutions
        0x1D4BA: 0x212F,  # e
        0x1D4BC: 0x210A,  # g
        0x1D4C4: 0x2134,  # o (U+2113 is script small l, not o)
    }

    # generate \mathcal mappings (uppercase)
    start = 0x1D49C
    for i in range(26):
        letter = i + ord("A")
        code = i + start
        if code in glyphless_char_map:
            code = glyphless_char_map[code]
        file.write(f"\\mathcal{{{chr(letter)}}}, {chr(code)}\n")

    # generate \mathcal mappings (lowercase)
    start = 0x1D4B6
    for i in range(26):
        letter = i + ord("a")
        code = i + start
        if code in glyphless_char_map:
            code = glyphless_char_map[code]
        file.write(f"\\mathcal{{{chr(letter)}}}, {chr(code)}\n")

    # generate \mathscr mappings (uppercase)
    start = 0x1D4D0
    for i in range(26):
        letter = i + ord("A")
        file.write(f"\\mathscr{{{chr(letter)}}}, {chr(i + start)}\n")

    # generate \mathscr mappings (lowercase)
    start = 0x1D4EA
    for i in range(26):
        letter = i + ord("a")
        file.write(f"\\mathscr{{{chr(letter)}}}, {chr(i + start)}\n")

A Python script to write all LaTeX commands for verification.

with open("/tmp/check_results.org", "w") as fpr:
    for i in range(26):
        fpr.write(f"\\mathcal{{{chr(i + ord('A'))}}} ")
    fpr.write("\n")

    for i in range(26):
        fpr.write(f"\\mathcal{{{chr(i + ord('a'))}}} ")
    fpr.write("\n")

    for i in range(26):
        fpr.write(f"\\mathscr{{{chr(i + ord('A'))}}} ")
    fpr.write("\n")

    for i in range(26):
        fpr.write(f"\\mathscr{{{chr(i + ord('a'))}}} ")

A Lisp function that edits a given display table: it maps (#x1D4D0 . #x1D503) to the corresponding glyphs in Libertinus Math, and maps (#x1D49C . #x1D4CF) to the glyphs of (#x1D4D0 . #x1D503) in STIX Two Math. It relies on the faces libertinus-math-face and stix-math-face defined in Appendix: Notes on Display Tables. Sadly, it seems that the characters composed by prettify-symbols-mode do not respect display tables and use the default fontset anyway.

(defun setup-math-display-table (current-display-table)
  "Configure CURRENT-DISPLAY-TABLE for math script characters.
If CURRENT-DISPLAY-TABLE is nil, create a new display table.
Return the configured display table."
  (unless current-display-table
    (setq current-display-table (make-display-table)))

  ;; Map \mathscr (U+1D4D0 to U+1D503) via Libertinus
  (dotimes (i (- #x1D504 #x1D4D0))
    (let ((code (+ #x1D4D0 i)))
      (aset current-display-table code
            (vector (make-glyph-code code 'libertinus-math-face)))))

  ;; Map \mathcal (U+1D49C to U+1D4CF) via STIX (offset mapping)
  (dotimes (i (- #x1D4D0 #x1D49C))
    (let* ((src-code (+ #x1D49C i))
           (stix-code (+ #x1D4D0 i)))  ; Offset mapping
      (aset current-display-table src-code
            (vector (make-glyph-code stix-code 'stix-math-face)))))

  ;; Return the table so that a freshly created one is not lost.
  current-display-table)
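
The offset mapping relies on the script range and the bold script range being laid out in the same order. The following sketch checks this assumption with Python's unicodedata module by comparing character names; reserved positions in the script range have no name and are skipped. The variable names are mine.

```python
import unicodedata

OFFSET = 0x1D4D0 - 0x1D49C  # 52: 26 uppercase + 26 lowercase letters

mismatches = []
checked = 0
for src in range(0x1D49C, 0x1D4D0):
    name = unicodedata.name(chr(src), None)
    if name is None:
        # Reserved position in the script range; its letter lives elsewhere.
        continue
    dst_name = unicodedata.name(chr(src + OFFSET))
    expected = name.replace("SCRIPT", "BOLD SCRIPT")
    checked += 1
    if dst_name != expected:
        mismatches.append((src, name, dst_name))

print(f"offset = {OFFSET}, checked = {checked}, mismatches = {len(mismatches)}")
```

An empty mismatch list means that adding the offset to a script letter always lands on the bold script form of the same letter, which is what the second dotimes loop assumes.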

Appendix: Use OpenType Font Features

To support ligatures, a font might provide a dedicated glyph for the ligature fi in addition to the glyphs for a standalone f and i. Besides ligatures, many modern OpenType fonts include optional glyph substitutions controlled by stylistic sets. For example, by default STIX Two Math displays script characters in the \mathcal style and provides the ss01 stylistic set to display them in the \mathscr style. The code charts for STIX Two Math contain detailed descriptions of the OpenType features supported by this font.

In XeLaTeX, we can choose the stylistic set by \setmathfont.

\documentclass{article}
\usepackage{unicode-math}
\begin{document}
\setmathfont{STIX Two Math}
STIX Two Math:
\verb|\mathcal{A}| produces $\mathcal{A}$;
\verb|\mathscr{A}| produces $\mathscr{A}$;
Unicode symbol \verb|U+1D4D0| produces $𝓐$;
Unicode symbol \verb|U+1D49C| produces $𝒜$.

\setmathfont[StylisticSet=1]{STIX Two Math}
STIX Two Math (\textit{ss01}):
\verb|\mathcal{A}| produces $\mathcal{A}$;
\verb|\mathscr{A}| produces $\mathscr{A}$;
Unicode symbol \verb|U+1D4D0| produces $𝓐$;
Unicode symbol \verb|U+1D49C| produces $𝒜$.

\setmathfont{STIX Two Math}
\setmathfont[StylisticSet=1,range=scr]{STIX Two Math}
STIX Two Math (apply \textit{ss01} variant for scr characters):
\verb|\mathcal{A}| produces $\mathcal{A}$;
\verb|\mathscr{A}| produces $\mathscr{A}$;
Unicode symbol \verb|U+1D4D0| produces $𝓐$;
Unicode symbol \verb|U+1D49C| produces $𝒜$.
\end{document}
3.png

Unfortunately, it seems that we cannot toggle OpenType features in Emacs in this way. A workaround is to use OpenType Feature Freezer to modify a font manually so that the selected features are on by default.

Useful Links

A discussion on choosing a particular glyph
Aad, E. (2015). How to set the glyph of a unicode character? Emacs Stack Exchange. https://emacs.stackexchange.com/questions/6052/how-to-set-the-glyph-of-a-unicode-character
Emacs Lisp manual covering glyphs and display tables
GNU. (2025). GNU Emacs Lisp Reference Manual (Emacs version 30.1). https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Display.html
A post about font basics in Emacs
Idiocy. (2019). Emacs, fonts and fontsets. https://idiocy.org/emacs-fonts-and-fontsets.html
A Stack Overflow discussion on code points and glyphs
Amery, M. (2014). What's the difference between a character, a code point, a glyph and a grapheme? Stack Overflow. https://stackoverflow.com/questions/27331819/whats-the-difference-between-a-character-a-code-point-a-glyph-and-a-grapheme
A brief post on distinguishing code points and glyphs
Dkf. (2019). Characters, glyphs, code-points, and byte-sequences. https://wiki.tcl-lang.org/page/Characters%2C+glyphs%2C+code-points%2C+and+byte-sequences
A post with in-depth introduction on what are code points and how they are rendered
Litherum. (2017). Relationship Between Glyphs and Code Points. https://litherum.blogspot.com/2017/05/relationship-between-glyphs-and-code.html
A post about using OpenType feature freezer
Catalfamo, D. (2022). Exploring OpenType Font Features. https://blog.lambda.cx/posts/opentype-font-exploration/

Footnotes:

1

The following explanations are summarized by AI.

  1. Code Point. A numerical identifier in the Unicode standard (e.g., U+1D4D0 = 𝓐).
  2. Character. An abstract concept, like "MATHEMATICAL BOLD SCRIPT CAPITAL A".
  3. Glyph. The visual representation (shape) of a character on a screen.
  4. Font. A collection of glyphs and metadata for rendering text.
2

To list all font families in Emacs, go to a buffer and enable lisp-interaction-mode. Then, move point to the end of the following form and press C-j.

(mapconcat 'identity (font-family-list) "\n")

Once the font family name is obtained, use (describe-font name) to inspect more info about the font, like the file path.

Tags: emacs