Refactoring “Beginning Emacs Lisp”: I: Adding Tests

On Friday I sat down with Sacha Chua for some emacs coaching. We talked about org-mode and about the emacs lisp I’d written for reformatting citations. In this entry I’ll talk about refactoring that emacs lisp code.

Ah, Refactoring. We’ll Need Some Tests…

Refactoring, by definition, is improving the internal structure of code without altering the external behaviour. Equally by definition, before you start you need thorough automated tests, because that’s how you tell that you haven’t altered the external behaviour.

I wrote the reformatting citations emacs lisp as an exploratory spike, looking up commands as I went, and manually testing the results. Time to get more rigorous. What is emacs lisp’s equivalent of JUnit, or MiniTest or RSpec?

ERT: Emacs Lisp Regression Testing

In JUnit you annotate test methods with “@Test”:

public void formatingRemoved {

In MiniTest, you start the method definitions with “test_”:

def test_formatting_removed

In ERT, where you would define a normal lisp function, with “defun”:

(defun remove-formatting ()

you can define a lisp test with “ert-deftest”:

(ert-deftest remove-formatting ()

Inside the test, you can call the function under test and compare the actual and expected results with the “should” macro. For instance, given a function that takes a string and removes some formatting (using replace-regexp-in-string):

(defun remove-formatting (string)
(replace-regexp-in-string "^> " ""
(replace-regexp-in-string "\s*<br/?>" "" string)))

We could write a test that checks that it does what it says:

(ert-deftest remove-formatting ()
(should (string= (remove-formatting "> Elþeodigra eard<br/>")
"Elþeodigra eard")))

To run all the tests, we can type:

M-x ert RET RET

The second RET accepts the default, t, and runs all tests. In this case there’s only one: if we have a large suite and we only want to run a subset, say those with “formatting” in the test name, we would type instead:

M-x ert RET "*formatting*" RET

In either case, another buffer is opened up with the results, for instance

Selector: t
Passed:  1
Failed:  0
Skipped: 0
Total:   1/1

Started at:   2015-02-02 16:03:15-0500
Finished at:  2015-02-02 16:03:15-0500


And yes, as you would expect from other languages, that’s a dot per passing test, and an F for any failing test, so with twelve tests and two failures you might instead see:


In the ERT results buffer, with the cursor on any . or F test result you have several options, including:

. ;; jump to that test’s source code
l ;; list the assertions run by the test
h ;; see the description string for the test, if any
b ;; view backtrace
r ;; re-run this test

As Nic Ferrier notes, the runner doesn’t automatically recognize when you delete a test, and you need to delete it in the ERT results buffer. I had trouble with it recognizing added tests as well, and ended up closing and restarting emacs to make sure it picked up the latest list. This is obviously not scalable. There’s a separate tool called ert-runner which fixes this, as I discover here.

Additional Complications

If we were testing code that took a string and reformatted it, like the example above, we could just write (should (string= examples for all the edge cases we could think of using what we already know, and we’d be set. Unfortunately, the code to be put under test also makes changes to the buffer, varies its behaviour depending on whether a region is selected when it is called, and modifies the cursor position. How do we handle that?

The with-temp-buffer macro saves the current buffer, creates an (empty) temporary buffer, marks it current, uses it inside the body of the macro, and on exit switches back to the previous current buffer. You can return the contents of the temporary buffer by using (buffer-string) as the last form, and the position of the cursor within the temporary buffer by using (point) as the last form. This lets us write tests for (begin-end-quote) when a region is not selected like so:

(ert-deftest test-begin-end-quote-new-content ()
"Tests begin-end-quote without preselected text string"
(should (string= (with-temp-buffer

(ert-deftest test-begin-end-quote-new-point ()
"Tests begin-end-quote without preselected text cursor position"
(should (equal (with-temp-buffer
(length "#+begin_quote\n\n"))))

There’s a further complication for the case where a region is selected before calling it. We can include text in the temporary buffer before calling begin-end-quote by using insert, and then (set-mark .N.) to set the mark at the nth character, and then either goto-char .N. to the select the region from the first mark up to character n, or just do end-of-buffer to select the region to the end of the buffer. So to insert the text “> Dear Sir, your astonishment’s odd;\n” into the temporary buffer and select the whole region, we could do the following:

(insert "> Dear Sir, your astonishment’s odd;\n")
(goto-char (point-min))
(set-mark-command nil)
(goto-char (point-max))
(transient-mark-mode 1)

With that extra information, the tests of behaviour with a selected region become simple too:

(ert-deftest test-begin-end-quote-region ()
"Tests begin-end-quote with selected region"
(should (string= (with-temp-buffer
(insert "> Dear Sir, your astonishment’s odd;\n")
(goto-char (point-min))
(set-mark-command nil)
(goto-char (point-max))
(transient-mark-mode 1)
"#+begin_quote\n Dear Sir, your astonishment’s odd;\n#+end_quote\n")))

The commit with the full set of twelve tests is here.

As well as adding tests this commit makes a code change, because in the three weeks since writing it I have discovered that the archive files don’t always have exactly two spaces between the end of the text and the “<br/>”, so I wrote tests to expose that (so that two of them were failing, as in the example above), and then changed the regexp so that they passed.

Now that we’ve got full automated tests, we can start refactoring the code.