The markup vocabulary I have sketched out above is by no means fixed at this stage. As clues from more sources are added to the corpus, changes may be needed to accommodate the quirks of a particular author. There are also some issues for which I am not sure that satisfactory solutions have yet been found.

Perhaps the largest of these issues is that of reference. Some clues may refer to other clues within the same puzzle, using the answer to another clue to construct their own definitions or subsidiary indications. There clearly needs to be a way to indicate such cross-reference within a puzzle. But what about exophoric reference (reference to things outside the text)? If we want to understand how linguistic differences affect the difficulty level of various puzzles, we might take a discourse-analytic perspective and consider how the clues reference culture outside the bounds of the puzzle itself. The "LONDON" clues discussed above, for instance, refer to two cultural figures: Don DeLillo and Julie London. These figures evidently come from different domains: literature and popular music, respectively. Is this information useful? If so, how can it be captured? Depth of information is also a question: is it enough to specify the domain from which these figures come, or would it also be useful to know the period in which they were active? Should their status as real-life figures (as opposed to fictional characters) also be captured?
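To make these questions a little more concrete, here is a minimal sketch in Python of how the two kinds of reference might be kept apart, and how the "depth" question becomes a matter of which optional fields are filled in. Every class and field name here (`ClueRef`, `ExternalRef`, `domain`, `period`, `fictional`) and the clue identifier format are hypothetical; none of them is part of the markup vocabulary described above, and this is one possible shape rather than a proposed solution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClueRef:
    """Reference to another clue in the same puzzle (hypothetical model)."""
    target_clue: str  # assumed identifier format, e.g. "12-across"


@dataclass(frozen=True)
class ExternalRef:
    """Exophoric reference: something outside the puzzle (hypothetical model)."""
    name: str
    domain: str                        # e.g. "literature", "popular music"
    period: Optional[str] = None       # None means "not recorded"
    fictional: Optional[bool] = None   # None means "not recorded"


# The two figures from the "LONDON" clues, recorded at the shallowest depth:
# only the domain is filled in, since that is all the discussion above states.
refs = [
    ExternalRef(name="Don DeLillo", domain="literature"),
    ExternalRef(name="Julie London", domain="popular music"),
]

# A simple query that only needs the domain field.
domains = sorted({r.domain for r in refs})
```

Leaving `period` and `fictional` optional means the decision about how much to record can be postponed: clues marked up early remain valid if later experience suggests the extra detail is worth capturing.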

Answering these questions, and others that will undoubtedly arise, will take a certain amount of trial and error; the answers should emerge in the process of marking up more and more clues. Evidently, the initial stages outlined here are just the beginning of what promises to be a complex but rewarding project.