CSS3 selectors are complex. For example, `blockquote > div p`^{[31]}, `div.stub *:not(:lang(fr))`^{[32]}, `*|*[a|foo~="bar"], *|*[|class~="bar"]`^{[33]}, and `stub ~ [|attribute^=start]:not([|attribute~=mid])[|attribute*=dle][|attribute$=end] ~ t`^{[34]} are all valid CSS3 selectors. And while these are probably more complicated than most real-life applications require, they are simple compared to what a CSS3 selector *could* be.

So how does one write a regular expression for something
this complex? The answer, of course, is that rather than trying to
write the regular expression directly, you write a program to
generate it. I have used this approach in the
past, and have found that it is generally not too difficult to manually
convert a small EBNF grammar, or other set of formal rules, into a
small program that generates a corresponding regular
expression.^{[35]} Typically each
non-terminal becomes a variable, defined in terms of constants
(for the terminals) and the variables that have already been
defined (for the non-terminals).

As a trivial example, Example 2, “A Perl program that generates a regular expression” is a
small Perl program that generates a POSIX extended regular
expression that matches an integer, as defined by the EBNF
provided in the Wikipedia page on
EBNF.^{[36]}

**Example 2. A Perl program that generates a regular expression**

    #!/usr/bin/env perl
    #
    # Copyleft 2019 Syd Bauman and Northeastern University Digital
    # Scholarship Group.
    #
    # No parameters; reads no input. Writes out a regular expression
    # that matches an integer, where integer is defined by the EBNF
    # in https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form:
    # | digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
    # | digit = "0" | digit excluding zero ;
    # | natural number = digit excluding zero, { digit } ;
    # | integer = "0" | [ "-" ], natural number ;
    # The resulting regexp is intended to be a POSIX ERE, but would
    # also work as a PCRE or a W3C regular expression, and probably
    # lots of others. (But not a POSIX BRE or an Emacs LISP regexp.)

    $digit_sans_zero = "(1|2|3|4|5|6|7|8|9)"; # could be just "[1-9]" :-)
    $digit = "(0|$digit_sans_zero)";
    $natural_number = "($digit_sans_zero($digit)*)";
    $integer = "(0|(-?$natural_number))";
    print STDOUT "$integer\n";
    exit 0;
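One way to gain confidence in a generated expression is to run it against a few strings that should and should not match. The following is a minimal sketch of such a check, written in Python rather than Perl purely for illustration. It rebuilds the same expression as Example 2, one variable per non-terminal, and applies it with Python's `re.fullmatch`. The function name `is_integer` and the sample strings are assumptions introduced here, not part of the original program. Python's `re` dialect is not POSIX ERE, but the expression uses only grouping, alternation, `?`, and `*`, which behave the same way in both.

```python
import re

# Each non-terminal of the EBNF becomes a variable, built from string
# constants (terminals) and variables already defined (non-terminals),
# mirroring the Perl program in Example 2.
digit_sans_zero = "(1|2|3|4|5|6|7|8|9)"
digit = f"(0|{digit_sans_zero})"
natural_number = f"({digit_sans_zero}({digit})*)"
integer = f"(0|(-?{natural_number}))"


def is_integer(text: str) -> bool:
    """Return True if the whole of `text` matches the generated expression.

    `is_integer` is a hypothetical helper name used only for this sketch.
    """
    return re.fullmatch(integer, text) is not None


if __name__ == "__main__":
    # Illustrative samples: some conform to the EBNF, some do not.
    for sample in ["0", "7", "-42", "1000", "007", "-0", "", "4.2"]:
        print(f"{sample!r}: {is_integer(sample)}")
```

Using `re.fullmatch` plays the role of anchoring the expression at both ends, so a string such as `007` is rejected rather than matched on a substring.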

While I am sure much has been
written on this general approach,^{[37]} I was not looking for a general-purpose
conversion from (regular) grammars to regular expressions; I was just looking to
convert one particular grammar to a regular expression.

^{[35] }Of course, since an EBNF grammar can
represent any context-free language (Chomsky Type 2), some
EBNF grammars describe languages that are not regular
(Chomsky Type 3) and so cannot be expressed as a regular
expression in the strict sense, although some regular expression languages
(e.g., PCRE) have extensions that allow them to represent any
context-free grammar.

^{[36] }Readers who are well versed in PCRE will know that the EBNF
can be represented directly in the regular expression, e.g.:

    (?(DEFINE)
      (?<digit_sans_zero> (1|2|3|4|5|6|7|8|9) )
      (?<digit> (0|(?&digit_sans_zero)) )
      (?<natural_number> (?&digit_sans_zero)(?&digit)* )
      (?<integer> (0|(-?(?&natural_number))) )
    )^(?&integer)$

While this is impressive, and very useful in its own right, it is not helpful to me here as I am interested in generating a W3C regular expression, not in using PCRE.