The base class of all tokenizers. It sets up the scanner and drives the tokenization loop until all tokens have been extracted. It also provides convenience methods that ensure adjacent tokens of the same group are returned as a single token.
EOL | = | /(?=\r\n?|\n|$)/ |
[R] | chunk | The current chunk of text being accumulated |
[R] | group | The current group being processed by the tokenizer |
Finish tokenizing. This flushes the buffer, yielding any remaining text to the client.
    # File lib/syntax/common.rb, line 57
    def finish
      start_group nil
      teardown
    end
Get the value of the specified option.
    # File lib/syntax/common.rb, line 89
    def option(opt)
      @options ? @options[opt] : nil
    end
Specify a set of tokenizer-specific options. A tokenizer may or may not publish options; if it does, they can be used to request optional behavior.
    # File lib/syntax/common.rb, line 84
    def set( opts={} )
      ( @options ||= Hash.new ).update opts
    end
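How set and option interact can be sketched in isolation. The class name `OptionDemo` and the option keys `:expand_tabs` and `:tab_width` are hypothetical and chosen only for illustration; the two method bodies are taken from the listings for set and option.

```ruby
# Stand-in class for illustration only (not part of the library).
# The two methods mirror the source listings for set and option.
class OptionDemo
  def set(opts = {})
    (@options ||= Hash.new).update(opts)
  end

  def option(opt)
    @options ? @options[opt] : nil
  end
end

demo = OptionDemo.new
demo.option(:expand_tabs)      # nil, because no options have been set yet
demo.set(expand_tabs: true)    # hypothetical option name
demo.set(tab_width: 4)         # merged into the existing options hash
demo.option(:expand_tabs)      # true
demo.option(:tab_width)        # 4
demo.option(:unknown)          # nil for an option that was never set
```

Repeated calls to set accumulate, since each call updates the same hash, and option returns nil both before any call to set and for keys that were never supplied.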
Subclasses may override this method to provide implementation-specific setup logic.
    # File lib/syntax/common.rb, line 52
    def setup
    end
Start tokenizing. This sets up the state in preparation for tokenization, such as creating a new scanner for the text and saving the callback block. The block will be invoked for each token extracted.
    # File lib/syntax/common.rb, line 42
    def start( text, &block )
      @chunk = ""
      @group = :normal
      @callback = block
      @text = StringScanner.new( text )
      setup
    end
Subclasses must implement this method, which is called for each iteration of the tokenization process. This method may extract multiple tokens.
    # File lib/syntax/common.rb, line 69
    def step
      raise NotImplementedError, "subclasses must implement #step"
    end
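To illustrate what a step implementation might look like, here is a minimal sketch. It does not load the library: `SketchTokenizer` is a stand-in base class whose start, finish, step and tokenize follow the listings on this page, while its `start_group` helper is a simplified assumption that yields `[group, text]` pairs rather than the library's token objects. `NumberTokenizer` and the `:number` group are hypothetical.

```ruby
require "strscan"

# Stand-in base class for illustration only.
class SketchTokenizer
  def start(text, &block)
    @chunk = String.new
    @group = :normal
    @callback = block
    @text = StringScanner.new(text)
    setup
  end

  def setup; end
  def teardown; end

  def finish
    start_group nil   # flushes whatever is still buffered
    teardown
  end

  def step
    raise NotImplementedError, "subclasses must implement #step"
  end

  def tokenize(text, &block)
    start text, &block
    step until @text.eos?
    finish
  end

  private

  # Assumed helper: when the group changes, yield the buffered text,
  # then start accumulating under the new group.
  def start_group(group, data = nil)
    if group != @group && !@chunk.empty?
      @callback.call([@group, @chunk])
      @chunk = String.new
    end
    @group = group
    @chunk << data if data
  end
end

# Hypothetical subclass: runs of digits become :number, the rest :normal.
class NumberTokenizer < SketchTokenizer
  def step
    if (digits = @text.scan(/\d+/))
      start_group :number, digits
    else
      start_group :normal, @text.getch
    end
  end
end

tokens = []
NumberTokenizer.new.tokenize("ab12c") { |tok| tokens << tok }
# tokens == [[:normal, "ab"], [:number, "12"], [:normal, "c"]]
```

Although step consumes "a" and "b" in separate iterations, they are reported as one token because both belong to the :normal group. The final "c" is only delivered when finish flushes the buffer.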
Subclasses may override this method to provide implementation-specific teardown logic.
    # File lib/syntax/common.rb, line 64
    def teardown
    end
Begins tokenizing the given text, calling step until the text has been exhausted.
    # File lib/syntax/common.rb, line 75
    def tokenize( text, &block )
      start text, &block
      step until @text.eos?
      finish
    end
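The order in which tokenize invokes the hooks can be traced with a small stand-in class. `TracingTokenizer` and its `events` reader are hypothetical and exist only for this sketch; the class skips the chunk buffering entirely (its finish only calls teardown) so that the lifecycle is easy to see.

```ruby
require "strscan"

# Stand-in class for illustration only; it records each lifecycle call.
class TracingTokenizer
  attr_reader :events

  def tokenize(text, &block)
    start text, &block
    step until @text.eos?
    finish
  end

  def start(text, &block)
    @events = []
    @callback = block
    @text = StringScanner.new(text)
    setup
  end

  # Simplified: no buffer to flush in this sketch.
  def finish
    teardown
  end

  def setup
    @events << :setup
  end

  def teardown
    @events << :teardown
  end

  # Each step consumes one run of non-whitespace or one run of whitespace.
  def step
    @events << :step
    @callback.call(@text.scan(/\S+|\s+/))
  end
end

tracer = TracingTokenizer.new
pieces = []
tracer.tokenize("a bc") { |piece| pieces << piece }
tracer.events   # [:setup, :step, :step, :step, :teardown]
pieces          # ["a", " ", "bc"]
```

For empty input the loop body never runs, so only setup and teardown are recorded.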