The Cryptic Crossword Corpus Project: first steps in establishing a markup vocabulary

Bethan S. T. Tovey


In a quick crossword, the relationship between clue and answer is usually purely definitional: the clue offers a definition, a description, or a synonym of the answer. In a cryptic crossword, the relationship is considerably more complex and may take many forms. Anagrams, puns, hidden words, deletions, reversals, and other techniques are used, often in combination. Every word in a clue does one of three jobs: a) defining the answer (as a synonym or definition of the answer word(s)); b) supplying the material for part or all of the answer word(s); or c) indicating how the words in b) are to be manipulated. It follows that each part of an answer relates back to one or more category b) words in the clue, while the whole answer relates to the word(s) in category a).

This paper describes early work on a markup scheme that relates each part of a cryptic crossword clue to the relevant part of its answer, and that records full linguistic detail for each word. The aim of the project is to create a corpus of cryptic clues and answers which can be explored to answer linguistic questions.
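
As a rough illustration of the three word roles described above, the following Python sketch tags each word of a clue with its job and checks the link between the category b) word and the answer. It is not the project's markup vocabulary: the names `Role`, `ClueWord`, `words_with_role`, and `is_anagram` are hypothetical, and the clue "Quiet, listen carelessly (6)" is an invented example chosen only because it is a simple anagram.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The three jobs a clue word can do (labels follow a), b), c) above)."""
    DEFINITION = "a"   # defines the whole answer
    FODDER = "b"       # supplies letters for part or all of the answer
    INDICATOR = "c"    # says how the fodder is to be manipulated


@dataclass(frozen=True)
class ClueWord:
    text: str
    role: Role


def words_with_role(clue, role):
    """Return the text of every clue word doing the given job."""
    return [word.text for word in clue if word.role is role]


def is_anagram(fodder, answer):
    """True if the two strings use exactly the same letters, ignoring case."""
    return sorted(fodder.lower()) == sorted(answer.lower())


# Invented example clue: "Quiet, listen carelessly (6)", answer SILENT.
clue = [
    ClueWord("Quiet", Role.DEFINITION),
    ClueWord("listen", Role.FODDER),
    ClueWord("carelessly", Role.INDICATOR),
]
answer = "SILENT"

fodder = "".join(words_with_role(clue, Role.FODDER))
print(words_with_role(clue, Role.DEFINITION), is_anagram(fodder, answer))
```

In a fuller scheme each fodder word would also need to point at the specific span of the answer it produces, since many clues build the answer from several parts; this sketch covers only the single-anagram case.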

Table of Contents

A brief explanation of cryptic crosswords
Developing the markup vocabulary for CCCP