A simple XML lexer that handles most common cases. It is not a validating lexer: it tokenizes invalid XML without raising an error.

Methods
Public Instance methods
setup()

Initialize the lexer. Scanning starts outside of any tag.

    # File lib/syntax/lang/xml.rb, line 11
    def setup
      @in_tag = false
    end

step()

Step through a single iteration of the tokenization process. A single step may yield many tokens, or none at all.

    # File lib/syntax/lang/xml.rb, line 17
    def step
      start_group :normal, matched if scan( /\s+/ )
      if @in_tag
        # Inside a tag: attributes, numbers, punctuation and quoted strings.
        case
          when scan( /([-\w]+):([-\w]+)/ )
            start_group :namespace, subgroup(1)
            start_group :punct, ":"
            start_group :attribute, subgroup(2)
          when scan( /\d+/ )
            start_group :number, matched
          when scan( /[-\w]+/ )
            start_group :attribute, matched
          when scan( %r{[/?]?>} )
            # ">", "/>" or "?>" closes the tag.
            @in_tag = false
            start_group :punct, matched
          when scan( /=/ )
            start_group :punct, matched
          when scan( /["']/ )
            scan_string matched
          else
            append getch
        end
      elsif ( text = scan_until( /(?=[<&])/ ) )
        # Outside a tag: everything up to the next "<" or "&" is plain text.
        start_group :normal, text unless text.empty?
        if scan(/<!--.*?(-->|\Z)/m)
          # An unterminated comment runs to the end of the input.
          start_group :comment, matched
        else
          case peek(1)
            when "<"
              start_group :punct, getch
              case peek(1)
                when "?"
                  append getch
                when "/"
                  append getch
                when "!"
                  append getch
              end
              start_group :normal, matched if scan( /\s+/ )
              if scan( /([-\w]+):([-\w]+)/ )
                start_group :namespace, subgroup(1)
                start_group :punct, ":"
                start_group :tag, subgroup(2)
              elsif scan( /[-\w]+/ )
                start_group :tag, matched
              end
              @in_tag = true
            when "&"
              # A bare "&" that does not form an entity is emitted as normal text.
              if scan( /&\S{1,10};/ )
                start_group :entity, matched
              else
                start_group :normal, scan( /&/ )
              end
          end
        end
      else
        # No more markup: consume the rest of the input.
        append scan_until( /\Z/ )
      end
    end
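
The two-state scan in step() (inside a tag versus outside one) can be illustrated with a minimal standalone sketch. It uses only StringScanner from the Ruby standard library and is not the Syntax library's implementation: `sketch_xml_tokens` is a hypothetical helper, it returns plain `[group, text]` pairs instead of calling start_group, it emits `</` as a single punct token, and it approximates scan_string (whose source is not shown here) with one regular expression.

```ruby
require "strscan"

# Hypothetical helper, not part of the Syntax library.
# Returns an array of [group, text] pairs for the given source string.
def sketch_xml_tokens(source)
  scanner = StringScanner.new(source)
  tokens  = []
  in_tag  = false

  until scanner.eos?
    if in_tag
      if scanner.scan(/\s+/)
        tokens << [:normal, scanner.matched]
      elsif scanner.scan(/([-\w]+):([-\w]+)/)
        tokens << [:namespace, scanner[1]] << [:punct, ":"] << [:attribute, scanner[2]]
      elsif scanner.scan(/\d+/)
        tokens << [:number, scanner.matched]
      elsif scanner.scan(/[-\w]+/)
        tokens << [:attribute, scanner.matched]
      elsif scanner.scan(%r{[/?]?>})
        in_tag = false                      # ">", "/>" or "?>" closes the tag
        tokens << [:punct, scanner.matched]
      elsif scanner.scan(/=/)
        tokens << [:punct, scanner.matched]
      elsif scanner.scan(/(["']).*?(\1|\z)/m)
        # Assumption: a quoted string runs to the matching quote or end of input.
        tokens << [:string, scanner.matched]
      else
        tokens << [:normal, scanner.getch]  # anything unexpected passes through
      end
    elsif scanner.scan(/<!--.*?(-->|\z)/m)
      tokens << [:comment, scanner.matched]
    elsif scanner.scan(%r{<[?/!]?})
      tokens << [:punct, scanner.matched]
      if scanner.scan(/([-\w]+):([-\w]+)/)
        tokens << [:namespace, scanner[1]] << [:punct, ":"] << [:tag, scanner[2]]
      elsif scanner.scan(/[-\w]+/)
        tokens << [:tag, scanner.matched]
      end
      in_tag = true
    elsif scanner.scan(/&\S{1,10};/)
      tokens << [:entity, scanner.matched]
    elsif scanner.scan(/&/)
      tokens << [:normal, scanner.matched]  # bare "&" is not an error
    else
      # Neither "<" nor "&" is next, so this always consumes at least one character.
      tokens << [:normal, scanner.scan(/[^<&]+/)]
    end
  end

  tokens
end
```

A possible way to inspect the result, including for malformed input, which is tokenized without raising:

```ruby
sketch_xml_tokens('a & b <unclosed').each do |group, text|
  puts "#{group}: #{text.inspect}"
end
```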