
Predefined Character Sets

HTML is the language used to write web pages. HTML tags start with < and end with >. In between the tags there is text. Here, you can test whether lines contain invalid HTML tags.

[^<>]* matches normal text: any run of characters, possibly empty, that contains neither < nor >.

I used ^[^<>]*(</?\w+[^>]*>[^<>]*)*$. It reads as: normal text, followed by any number of tags, each followed by more normal text.
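To try the pattern outside this page, here is a minimal Python sketch. The helper name is_valid_line and the sample lines are made up for illustration; only the pattern itself comes from the text above.

```python
import re

# Pattern from the text: normal text, then any number of (tag, normal text) pairs.
TAG_LINE = re.compile(r'^[^<>]*(</?\w+[^>]*>[^<>]*)*$')

def is_valid_line(line: str) -> bool:
    """Return True if the whole line matches the pattern (hypothetical helper)."""
    return TAG_LINE.match(line) is not None

# Illustrative sample lines, not taken from the lesson.
samples = ["plain text", "<b>bold</b> text", "a < b", "<b>broken</b", "<>"]
for sample in samples:
    print(repr(sample), is_valid_line(sample))
```

The first two samples match and the last three do not. Because [^>]* inside a tag also allows a <, a line such as <a <b> is still accepted, so the pattern is a rough check rather than a strict one.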

Noam Chomsky created a hierarchy of grammars. At the bottom are the regular languages, which are exactly the languages that regular expressions can describe. HTML as a whole cannot be described by a regular expression, because tags can be nested to arbitrary depth. But parts of the language, like single tags, can be described by a regular expression.
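The following Python sketch illustrates the difference. The line pattern from this lesson only checks that each tag is well formed, so it accepts wrongly nested tags. Checking the nesting needs a stack, which is more than a regular expression offers. The function is_balanced is a hypothetical helper, and it makes the simplifying assumption that every opening tag needs a closing tag (so it would reject tags like <br>).

```python
import re

# Line pattern from the lesson: checks tag syntax only.
TAG_LINE = re.compile(r'^[^<>]*(</?\w+[^>]*>[^<>]*)*$')
# A single tag: optional slash, then the tag name.
TAG = re.compile(r'<(/?)(\w+)[^>]*>')

def is_balanced(line: str) -> bool:
    """Check nesting with a stack (assumes every tag must be closed)."""
    stack = []
    for closing, name in TAG.findall(line):
        if not closing:
            stack.append(name)          # opening tag: remember it
        elif not stack or stack.pop() != name:
            return False                # closing tag does not match the last opened tag
    return not stack                    # nothing may be left open

good = "<b><i>text</i></b>"
bad = "<b><i>text</b></i>"
print(bool(TAG_LINE.match(bad)), is_balanced(bad))    # the regex accepts, the stack check rejects
print(bool(TAG_LINE.match(good)), is_balanced(good))  # both accept
```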
