A simple XML lexer that handles most common cases. It is non-validating, so it tokenizes invalid XML without reporting errors.
Methods
Public Instance methods
Initialize the lexer.
    # File lib/syntax/lang/xml.rb, line 11
    def setup
      @in_tag = false
    end
Perform a single iteration of the tokenization process. One iteration may yield many tokens, or none at all.
    # File lib/syntax/lang/xml.rb, line 17
    def step
      start_group :normal, matched if scan( /\s+/ )
      if @in_tag
        case
          when scan( /([-\w]+):([-\w]+)/ )
            start_group :namespace, subgroup(1)
            start_group :punct, ":"
            start_group :attribute, subgroup(2)
          when scan( /\d+/ )
            start_group :number, matched
          when scan( /[-\w]+/ )
            start_group :attribute, matched
          when scan( %r{[/?]?>} )
            @in_tag = false
            start_group :punct, matched
          when scan( /=/ )
            start_group :punct, matched
          when scan( /["']/ )
            scan_string matched
          else
            append getch
        end
      elsif ( text = scan_until( /(?=[<&])/ ) )
        start_group :normal, text unless text.empty?
        if scan(/<!--.*?(-->|\Z)/m)
          start_group :comment, matched
        else
          case peek(1)
            when "<"
              start_group :punct, getch
              case peek(1)
                when "?"
                  append getch
                when "/"
                  append getch
                when "!"
                  append getch
              end
              start_group :normal, matched if scan( /\s+/ )
              if scan( /([-\w]+):([-\w]+)/ )
                start_group :namespace, subgroup(1)
                start_group :punct, ":"
                start_group :tag, subgroup(2)
              elsif scan( /[-\w]+/ )
                start_group :tag, matched
              end
              @in_tag = true
            when "&"
              if scan( /&\S{1,10};/ )
                start_group :entity, matched
              else
                start_group :normal, scan( /&/ )
              end
          end
        end
      else
        append scan_until( /\Z/ )
      end
    end
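The step method switches between two modes: plain text, and "in tag" mode, which begins after a tag name and ends at the closing ">". The sketch below illustrates that idea on its own, using only StringScanner from Ruby's standard library. It is not the library's implementation. The helper name tiny_xml_tokens, the [group, text] return shape, and the :string group for quoted values are all assumptions made for illustration, and namespaces, comments, numbers, and the "?" and "!" tag prefixes are left out.

```ruby
require 'strscan'

# Hypothetical helper, not part of the Syntax library: a stripped-down
# two-state tokenizer that returns an array of [group, text] pairs.
def tiny_xml_tokens(text)
  scanner = StringScanner.new(text)
  tokens  = []
  in_tag  = false

  until scanner.eos?
    if in_tag
      if scanner.scan(/\s+/)
        tokens << [:normal, scanner.matched]
      elsif scanner.scan(/[-\w]+/)
        tokens << [:attribute, scanner.matched]
      elsif scanner.scan(%r{[/?]?>})
        # The closing ">" (optionally "/>" or "?>") leaves tag mode.
        in_tag = false
        tokens << [:punct, scanner.matched]
      elsif scanner.scan(/=/)
        tokens << [:punct, scanner.matched]
      elsif scanner.scan(/(["']).*?(\1|\z)/m)
        # Assumed group name; an unterminated string runs to end of input.
        tokens << [:string, scanner.matched]
      else
        # Non-validating: unexpected characters pass through one at a time.
        tokens << [:normal, scanner.getch]
      end
    elsif scanner.scan(/[^<&]+/)
      tokens << [:normal, scanner.matched]
    elsif scanner.scan(%r{</?})
      tokens << [:punct, scanner.matched]
      tokens << [:tag, scanner.matched] if scanner.scan(/[-\w]+/)
      in_tag = true
    elsif scanner.scan(/&\S{1,10};/)
      tokens << [:entity, scanner.matched]
    else
      # A bare "&" that does not form an entity is emitted as normal text.
      tokens << [:normal, scanner.getch]
    end
  end

  tokens
end
```

Calling tiny_xml_tokens('<a href="x">hi &amp; bye</a>') returns pairs such as [:tag, "a"], [:attribute, "href"], and [:entity, "&amp;"]. Because every branch consumes at least one character and nothing is discarded, joining the token texts reproduces the input exactly, even when the input is malformed.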