24 Feb 2024

Display LaTeX Command with Unicode Characters in Emacs

If the minor mode prettify-symbols-mode is enabled, Emacs displays certain strings as more attractive characters according to prettify-symbols-alist. For example, $\mathbb{P}(\Omega) \leq 1$ might be displayed as $ℙ(Ω) ≤ 1$. This happens without modifying the buffer content, and it can be disabled by turning off prettify-symbols-mode if necessary. This feature is very useful when writing LaTeX formulae.

This is the effect after incorporating the settings introduced in this post.

Before prettify before-prettify.png

After prettify after-prettify.png

Create the list of pretty symbols

Each element of prettify-symbols-alist looks like (SYMBOL . CHARACTER), where text matching SYMBOL (a string, not a regexp) is shown as CHARACTER instead.
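
To make the shape of the alist concrete, here is a minimal hand-written sketch. The names my/example-prettify-alist and my/try-example-prettify-alist are hypothetical and not part of my configuration; the three entries are only illustrative.

```elisp
;; A hand-written alist with the (SYMBOL . CHARACTER) shape described above.
;; Hypothetical example data, not the full list used in this post.
(defvar my/example-prettify-alist
  '(("\\mathbb{P}" . ?ℙ)
    ("\\leq" . ?≤)
    ("\\geq" . ?≥))
  "Hypothetical example value for `prettify-symbols-alist'.")

(defun my/try-example-prettify-alist ()
  "Apply `my/example-prettify-alist' to the current buffer (a sketch)."
  (interactive)
  (setq-local prettify-symbols-alist my/example-prettify-alist)
  ;; Toggle the mode off and on so that the new alist takes effect.
  (prettify-symbols-mode -1)
  (prettify-symbols-mode 1))
```

Note that the backslash is doubled only because it sits inside a Lisp string literal; the text matched in the buffer is \leq.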

I created a CSV file to store the symbols and their associated pretty characters. Below is a Lisp function which parses such a CSV file and returns a list suitable for prettify-symbols-alist.

(defun dms/load-prettify-symbols (file)
  "Load a CSV file and return a list suitable for `prettify-symbols-alist'.

The CSV file should be separated by \", \", where the space after
the comma is mandatory.  In each line, the string before the comma
will be displayed as the pretty symbol after the comma."
  (with-temp-buffer
    (insert-file-contents file)
    ;; Bind locally with `let' rather than `setq', which would create
    ;; global variables as a side effect.
    (let ((contents (split-string (buffer-string) "\n" t))
          (loaded-prettify-symbols-alist '()))
      (dolist (line contents loaded-prettify-symbols-alist)
        (let* ((pair (split-string line ", " t))
               (original-string (car pair))
               (pretty-symbol (cadr pair)))
          ;; Skip malformed lines that lack a pretty symbol.
          (when pretty-symbol
            ;; Convert the string to a char, as `prettify-symbols-alist'
            ;; uses chars, not strings.
            (push (cons original-string (string-to-char pretty-symbol))
                  loaded-prettify-symbols-alist)))))))

The CSV file looks like

\mathbb{P}, ℙ
\leq, ≤
\geq, ≥
...
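
To illustrate why the space after the comma matters, here is a small sketch of the per-line parsing step in isolation. The helper my/parse-symbol-line is a hypothetical name introduced only for this illustration; it mirrors the split on ", " used by the loader.

```elisp
(defun my/parse-symbol-line (line)
  "Parse LINE of the form \"STRING, CHAR\" into (STRING . CHAR), or nil.
Hypothetical helper that mirrors the split on \", \" used by the loader."
  (let ((pair (split-string line ", " t)))
    ;; Only accept lines that split into exactly two fields.
    (when (= (length pair) 2)
      (cons (car pair) (string-to-char (cadr pair))))))

(my/parse-symbol-line "\\leq, ≤")        ; a cons of "\\leq" and the char ≤
(my/parse-symbol-line "\\mathbb{P},ℙ")   ; nil: no space after the comma
```

In the CSV file itself the backslash is written once; it is doubled here only because of Lisp string syntax.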

Decide whether to compose

The variable prettify-symbols-compose-predicate holds a predicate that decides whether the currently matched symbol is to be composed. By default, not every occurrence is prettified. For example, \mathbb{P} is not prettified when it is immediately followed by a comma. However, this situation occurs frequently when writing formulae, so I override the rule to allow these cases.

(defun dms/lax-prettify-symbols-compose-p (start end match)
  "Return non-nil if the symbol between START and END should be composed.

This is a more lax compose predicate that also allows composing
when MATCH is followed by a digit, parenthesis, punctuation, or
whitespace character."
  (let ((next-char (char-after end)))
    (or
     (and next-char (string-match-p "[[:digit:]()[:punct:][:space:]]" (char-to-string next-char)))
     (prettify-symbols-default-compose-p start end match))))
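
To see which following characters the lax rule accepts, the character class can be tried on its own. This is a minimal sketch; my/lax-follower-p is a hypothetical helper that only reuses the same bracket expression as the predicate above.

```elisp
(defun my/lax-follower-p (char)
  "Return t if CHAR belongs to the class used by the lax predicate.
Hypothetical helper for experimenting; CHAR may be nil (end of buffer)."
  (and char
       (string-match-p "[[:digit:]()[:punct:][:space:]]"
                       (char-to-string char))
       t))

(my/lax-follower-p ?,)   ; non-nil: a comma may follow a composed symbol
(my/lax-follower-p ?a)   ; nil: a letter falls through to the default rule
```

When the helper returns nil, the real predicate still defers to prettify-symbols-default-compose-p, so a command such as \leqslant is not partially composed as \leq.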

Apply the settings in a mode hook

As both prettify-symbols-alist and prettify-symbols-compose-predicate are buffer-local variables, it is recommended to set them in a mode hook. Below I set them in the org-mode hook. They can also be set in the latex-mode hook if necessary.

(defun dms/tweak-prettify-symbols-mode ()
  "Set `prettify-symbols-alist' and `prettify-symbols-compose-predicate'."
  (setq prettify-symbols-alist
        (dms/load-prettify-symbols "~/.emacs.d/pretty-symbols.csv"))
  (setq prettify-symbols-compose-predicate
        #'dms/lax-prettify-symbols-compose-p))

(add-hook 'org-mode-hook #'dms/tweak-prettify-symbols-mode)
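
The hook above only sets the variables; prettify-symbols-mode itself still has to be turned on. One possible arrangement, as a sketch: a single setup function (my/setup-prettify-symbols, a hypothetical name) that applies the tweaks first and then enables the mode, so the order does not depend on how the hook functions happen to be sorted. That AUCTeX users would need LaTeX-mode-hook instead of latex-mode-hook is noted as an assumption about the reader's setup.

```elisp
(defun my/setup-prettify-symbols ()
  "Apply the tweaks from this post, then enable `prettify-symbols-mode'.
Hypothetical wrapper; a sketch rather than a definitive setup."
  ;; `dms/tweak-prettify-symbols-mode' is the function defined in this post;
  ;; the guard keeps this snippet harmless if it is not loaded yet.
  (when (fboundp 'dms/tweak-prettify-symbols-mode)
    (dms/tweak-prettify-symbols-mode))
  (prettify-symbols-mode 1))

(add-hook 'org-mode-hook #'my/setup-prettify-symbols)
;; The built-in tex-mode runs `latex-mode-hook'.  With AUCTeX installed,
;; `LaTeX-mode-hook' would be the hook to use instead (assumption about
;; the reader's setup).
(add-hook 'latex-mode-hook #'my/setup-prettify-symbols)
```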

Choose appropriate fonts

Many mathematical symbols are not included in the main font. Fortunately, Emacs has the ability to display selected characters with specific fonts, effectively combining fonts; see the fontset concept in the documentation.

Below I patch the default fontset in order to

  1. display Unicode characters in the range U+2100 to U+214F with the font DejaVu Sans;
  2. display Unicode characters in the range U+1D7D8 to U+1D7E1 with the font DejaVu Sans;
  3. display Unicode characters in the range U+1D538 to U+1D56B with the font DejaVu Sans;
  4. display Unicode characters in the range U+1D4D0 to U+1D4E9 with the font Libertinus Math.

;; Refer to https://dejavu.sourceforge.net/samples/DejaVuSans.pdf
;; Unicode Letterlike Symbols block (includes ℂ, ℕ, ℙ, ℚ, ℝ, ℤ)
(set-fontset-font "fontset-default" '(#x2100 . #x214F) "DejaVu Sans")
;; Blackboard (double-struck) digits 0 to 9
(set-fontset-font "fontset-default" '(#x1D7D8 . #x1D7E1) "DejaVu Sans")
;; Blackboard (double-struck) letters A to Z and a to z
(set-fontset-font "fontset-default" '(#x1D538 . #x1D56B) "DejaVu Sans")

;; Bold script capital letters A to Z
(set-fontset-font "fontset-default" '(#x1D4D0 . #x1D4E9) "Libertinus Math")
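
If the configuration is shared between machines, the fonts may not be installed everywhere. A minimal sketch of a guarded variant follows; my/set-fontset-font-if-available is a hypothetical helper, and it assumes Emacs is started as a regular graphical session (under a daemon started without a frame, the check would skip the mapping).

```elisp
(defun my/set-fontset-font-if-available (range family)
  "Map RANGE in the default fontset to FAMILY if that font is installed.
Return t if the mapping was applied, nil otherwise.  Hypothetical helper."
  (when (and (display-graphic-p)
             (find-font (font-spec :family family)))
    (set-fontset-font "fontset-default" range family)
    t))

;; Bold script capital letters, only if Libertinus Math is available.
(my/set-fontset-font-if-available '(#x1D4D0 . #x1D4E9) "Libertinus Math")
```

To check which font actually renders a character, place point on it and run C-u C-x = (describe-char).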

Add more symbols

In general, the following things are needed to display unicode characters by composition:

  1. the string to be replaced, like \mathscr{L};
  2. the symbol to be rendered, like 𝓛;
  3. (optional) an appropriate font which can display the symbol.

There is a convenient way to find the Unicode symbol, i.e., the second item. Emacs has a built-in key binding, C-x 8 RET, which inserts a Unicode character by its code point or its name. For example, C-x 8 RET MATHEMATICAL BOLD SCRIPT CAPITAL L inserts the script letter 𝓛 (actually the bold version here, as the normal version is too thin). In fact, if you type C-x 8 RET MATHEMATICAL TAB, Emacs pops up a list of mathematical symbols to choose from.

script-letters.png
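
The same lookup can be done from Lisp, which is handy for finding the code point to put into a fontset range. A minimal sketch, assuming Emacs 26 or later (where char-from-name is available); my/describe-unicode-name is a hypothetical helper.

```elisp
(defun my/describe-unicode-name (name)
  "Return a string with the code point and glyph of the character NAME.
Return nil if NAME is not a known Unicode character name.
Hypothetical helper built on `char-from-name'."
  (let ((char (char-from-name name)))
    (when char
      (format "U+%04X %c" char char))))

;; The bold script capital L used in this post is U+1D4DB.
(my/describe-unicode-name "MATHEMATICAL BOLD SCRIPT CAPITAL L")
```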

Alternative implementation: hard replacing

The advantage of prettify-symbols-mode is that it only affects rendering: the file content is not changed when the minor mode is toggled. The disadvantage is that it works on the whole buffer and, to the best of my knowledge, cannot be restricted to a region.

If necessary, one can choose another implementation that translates these LaTeX commands into their Unicode counterparts by simple find and replace. One can implement a function, say toggle-unicode-representation, which replaces commands with Unicode characters in a region, or vice versa.
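
A rough sketch of this idea follows. It is not an implementation I use; all names (my/latex-unicode-pairs, my/replace-pairs-in-region, my/latex-to-unicode-region, my/unicode-to-latex-region) are hypothetical, the pair list is illustrative, and the toggle is split into two separate commands for simplicity. It uses plain literal search, so it knows nothing about LaTeX syntax beyond one check on the following letter.

```elisp
(defvar my/latex-unicode-pairs
  '(("\\mathbb{P}" . "ℙ")
    ("\\leq" . "≤")
    ("\\geq" . "≥"))
  "Hypothetical (COMMAND . UNICODE-STRING) pairs used for hard replacing.")

(defun my/replace-pairs-in-region (beg end pairs)
  "Replace each car of PAIRS with its cdr between BEG and END."
  (let ((case-fold-search nil)
        (end-marker (copy-marker end)))
    (save-excursion
      (dolist (pair pairs)
        (goto-char beg)
        (while (search-forward (car pair) end-marker t)
          (let ((letter-follows (looking-at-p "[a-zA-Z]")))
            ;; Skip "\leq" inside "\leqslant": that is a different command.
            (unless (and letter-follows
                         (string-match-p "[a-zA-Z]\\'" (car pair)))
              (replace-match (cdr pair) t t)
              ;; "≤b" must become "\leq b", not "\leqb".
              (when (and letter-follows
                         (string-match-p "[a-zA-Z]\\'" (cdr pair)))
                (insert " ")))))))
    (set-marker end-marker nil)))

(defun my/latex-to-unicode-region (beg end)
  "Replace LaTeX commands with Unicode characters between BEG and END."
  (interactive "r")
  (my/replace-pairs-in-region beg end my/latex-unicode-pairs))

(defun my/unicode-to-latex-region (beg end)
  "Replace Unicode characters with LaTeX commands between BEG and END."
  (interactive "r")
  (my/replace-pairs-in-region
   beg end
   (mapcar (lambda (pair) (cons (cdr pair) (car pair)))
           my/latex-unicode-pairs)))
```

Unlike prettify-symbols-mode, this changes the buffer text, so a round trip is not guaranteed to restore the original spacing exactly.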


Tags: emacs