Class | CodeRay::Scanners::Scanner |
In: | lib/coderay/scanner.rb |
Parent: | StringScanner |
The base class for all Scanners.
It is a subclass of Ruby's great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
  require 'coderay'

  c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

  for text, kind in c_scanner
    puts text if kind == :operator
  end

  # prints: (*==)++;

OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
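The other Enumerable methods can be tried out the same way. The sketch below uses a hand-written Array of [text, kind] pairs as a stand-in for a scanner, so it runs without CodeRay installed. The pairs and kinds in TOKEN_PAIRS are illustrative assumptions, not the actual output of the C scanner.

```ruby
# Hypothetical stand-in for the pairs a scanner yields: [text, kind].
# These values are made up for illustration, not real CodeRay output.
TOKEN_PAIRS = [
  ['if',   :reserved],
  ['(',    :operator],
  ['p',    :ident],
  ['==',   :operator],
  ['nest', :ident],
  ['++',   :operator],
]

# map: collect the text of every operator token.
operators = TOKEN_PAIRS.map { |text, kind| text if kind == :operator }.compact

# any?: is there at least one identifier?
has_ident = TOKEN_PAIRS.any? { |_text, kind| kind == :ident }

# find: the first identifier pair.
first_ident = TOKEN_PAIRS.find { |_text, kind| kind == :ident }

# sort_by: order the pairs by the length of their text.
by_length = TOKEN_PAIRS.sort_by { |text, _kind| text.size }

puts operators.join      # (==++
puts has_ident           # true
puts first_ident.inspect # ["p", :ident]
puts by_length.last.first # nest
```

With a real scanner the same blocks should apply, since each element it yields is a text/kind pair as in the for loop above.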
ScanError | = | Class.new(Exception) | Raised if a Scanner fails while scanning. |
DEFAULT_OPTIONS | = | { :stream => false } | The default options for all scanner classes. Define @default_options for subclasses. |
KINDS_NOT_LOC | = | [:comment, :doctype] | |
string | -> | code | More mnemonic accessor name for the input string. |
  # File lib/coderay/scanner.rb, line 86
  def file_extension extension = nil
    if extension
      @file_extension = extension.to_s
    else
      @file_extension ||= plugin_id.to_s
    end
  end
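The listing above combines a getter and a setter in one method: called with an argument it stores the extension, called without one it returns the stored value and falls back to the plugin id. A minimal sketch of that pattern follows, assuming the method is called on the scanner class itself. SketchScanner, Yaml, Css and the plugin_id stand-in are hypothetical names for this example only.

```ruby
# Hypothetical base class; only the accessor pattern mirrors the listing.
class SketchScanner
  class << self
    # Stand-in for the real plugin id (an assumption for this sketch):
    # the lowercased class name.
    def plugin_id
      name.downcase.to_sym
    end

    # With an argument: store it as a String.
    # Without: return the stored value, defaulting to the plugin id.
    def file_extension(extension = nil)
      if extension
        @file_extension = extension.to_s
      else
        @file_extension ||= plugin_id.to_s
      end
    end
  end
end

# A subclass that overrides the default extension.
class Yaml < SketchScanner
  file_extension 'yml'
end

# A subclass that keeps the default.
class Css < SketchScanner
end

puts Yaml.file_extension  # yml
puts Css.file_extension   # css
```

Because @file_extension is an instance variable of each class object, every subclass keeps its own value.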
If you set :stream to true in the options, the Scanner uses a TokenStream with the block as callback to handle the tokens.
Otherwise, a Tokens object is used.
  # File lib/coderay/scanner.rb, line 120
  def initialize code='', options = {}, &block
    raise "I am only the basic Scanner class. I can't scan "\
      "anything. :( Use my subclasses." if self.class == Scanner

    @options = self.class::DEFAULT_OPTIONS.merge options

    super Scanner.normify(code)

    @tokens = options[:tokens]
    if @options[:stream]
      warn "warning in CodeRay::Scanner.new: :stream is set, "\
        "but no block was given" unless block_given?
      raise NotStreamableError, self unless kind_of? Streamable
      @tokens ||= TokenStream.new(&block)
    else
      warn "warning in CodeRay::Scanner.new: Block given, "\
        "but :stream is #{@options[:stream]}" if block_given?
      @tokens ||= Tokens.new
    end
    @tokens.scanner = self

    setup
  end
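The constructor first merges the caller's options over DEFAULT_OPTIONS and then picks the token container based on :stream. The sketch below isolates just that decision. pick_container is a hypothetical helper, and the symbols :token_stream and :tokens stand in for the TokenStream and Tokens classes so that the snippet runs without CodeRay.

```ruby
# Same value as the DEFAULT_OPTIONS constant documented for this class.
DEFAULT_OPTIONS = { :stream => false }

# Hypothetical helper: returns the merged options and a symbol naming
# the container the constructor would choose.
def pick_container(options = {})
  merged = DEFAULT_OPTIONS.merge(options)
  container = merged[:stream] ? :token_stream : :tokens
  [merged, container]
end

p pick_container                    # [{:stream=>false}, :tokens] on older Rubies
p pick_container(:stream => true)   # second element is :token_stream
```

Options given by the caller win over the defaults, and keys that have no default are simply carried along in the merged Hash.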
  # File lib/coderay/scanner.rb, line 69
  def normify code
    code = code.to_s
    if code.respond_to?(:encoding) && (code.encoding.name != 'UTF-8' || !code.valid_encoding?)
      code = code.dup
      original_encoding = code.encoding
      code.force_encoding 'Windows-1252'
      unless code.valid_encoding?
        code.force_encoding original_encoding
        if code.encoding.name == 'UTF-8'
          code.encode! 'UTF-16BE', :invalid => :replace, :undef => :replace, :replace => '?'
        end
        code.encode! 'UTF-8', :invalid => :replace, :undef => :replace, :replace => '?'
      end
    end
    code.to_unix
  end
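The building blocks used in the listing above (force_encoding, valid_encoding? and encode with replacement options) can be tried in isolation. The sketch below is a simplified variant, not a reimplementation of normify: to_utf8_unix is a hypothetical helper that relabels the bytes as Windows-1252, transcodes them to UTF-8, and converts line endings. The gsub is an assumption about what to_unix does.

```ruby
# A byte string that spells "cafe" with an accented e in Windows-1252.
# The same bytes are not valid UTF-8.
RAW = "caf\xE9".b

# Hypothetical helper, simpler than normify: relabel, transcode,
# then normalize line endings to "\n".
def to_utf8_unix(bytes)
  # force_encoding changes only the label, not the bytes.
  text = bytes.dup.force_encoding('Windows-1252')
  # encode actually converts the bytes; unknown characters become '?'.
  text = text.encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?')
  # Assumed equivalent of to_unix: CRLF and lone CR become LF.
  text.gsub(/\r\n?/, "\n")
end

puts RAW.dup.force_encoding('UTF-8').valid_encoding?  # false
puts to_utf8_unix(RAW).encoding                        # UTF-8
puts to_utf8_unix(RAW).valid_encoding?                 # true
```

The distinction between relabeling (force_encoding) and converting (encode) is the part worth keeping in mind when reading the listing.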
  # File lib/coderay/scanner.rb, line 208
  def column pos = self.pos
    return 0 if pos <= 0
    string = string()
    if string.respond_to?(:bytesize) && (defined?(@bin_string) || string.bytesize != string.size)
      @bin_string ||= string.dup.force_encoding('binary')
      string = @bin_string
    end
    pos - (string.rindex(?\n, pos) || 0)
  end
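The column is computed as the distance from pos back to the nearest preceding newline. Since a StringScanner position is a byte offset, the listing switches to a binary copy of the string when multibyte characters are present. The sketch below applies the same arithmetic to a plain String. column_at is a hypothetical helper, and it skips the caching of the binary copy.

```ruby
# Same arithmetic as the listing, for a plain String and a byte offset.
def column_at(string, pos)
  return 0 if pos <= 0
  # Byte offsets only line up with a byte-indexed string, so use a
  # binary copy when the string contains multibyte characters.
  string = string.dup.force_encoding('binary') if string.bytesize != string.size
  pos - (string.rindex("\n", pos) || 0)
end

puts column_at("ab\ncd", 0)      # 0
puts column_at("ab\ncd", 1)      # 1
puts column_at("ab\ncd", 4)      # 2
puts column_at("\u00E9\nx", 3)   # 1
```

In the last call the accented character occupies two bytes, so the newline sits at byte 2 and byte offset 3 is one step past it.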
  # File lib/coderay/scanner.rb, line 222
  def marshal_load options
    @options = options
  end
Whether the scanner is in streaming mode.
  # File lib/coderay/scanner.rb, line 188
  def streaming?
    !!@options[:stream]
  end

  # File lib/coderay/scanner.rb, line 149
  def string= code
    code = Scanner.normify(code)
    if defined?(RUBY_DESCRIPTION) && RUBY_DESCRIPTION['rubinius 1.0.1']
      reset_state
      @string = code
    else
      super code
    end
    reset_instance
  end
Scans the code and returns all tokens in a Tokens object.
  # File lib/coderay/scanner.rb, line 170
  def tokenize new_string=nil, options = {}
    options = @options.merge(options)
    self.string = new_string if new_string
    @cached_tokens =
      if @options[:stream]  # :stream must have been set already
        reset unless new_string
        scan_tokens @tokens, options
        @tokens
      else
        scan_tokens @tokens, options
      end
  end
Scanner error with additional status information
  # File lib/coderay/scanner.rb, line 253
  def raise_inspect msg, tokens, state = 'No state given!', ambit = 30
    raise ScanError, "\n\n***ERROR in %s: %s (after %d tokens)\n\ntokens:\n%s\n\ncurrent line: %d column: %d pos: %d\nmatched: %p state: %p\nbol? = %p, eos? = %p\n\nsurrounding code:\n%p ~~ %p\n\n\n***ERROR***\n\n" % [
      File.basename(caller[0]),
      msg,
      tokens.size,
      tokens.last(10).map { |t| t.inspect }.join("\n"),
      line, column, pos,
      matched, state, bol?, eos?,
      string[pos - ambit, ambit],
      string[pos, ambit],
    ]
  end

  # File lib/coderay/scanner.rb, line 246
  def reset_instance
    @tokens.clear unless @options[:keep_tokens]
    @cached_tokens = nil
    @bin_string = nil if defined? @bin_string
  end
Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.
  # File lib/coderay/scanner.rb, line 287
  def scan_rest
    rest = self.rest
    terminate
    rest
  end
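The effect of scan_rest can be reproduced with a plain StringScanner from Ruby's standard library: take whatever is left, move the scan pointer to the end, and return the remainder. In the sketch below, scan_rest_of is a hypothetical free-standing helper with the same body as the listing.

```ruby
require 'strscan'

# Same steps as scan_rest, written against any StringScanner.
def scan_rest_of(scanner)
  rest = scanner.rest   # everything after the scan pointer
  scanner.terminate     # move the scan pointer to the end of the string
  rest
end

scanner = StringScanner.new('key = value')
scanner.scan(/\w+\s*=\s*/)     # consume "key = "
puts scan_rest_of(scanner)      # value
puts scanner.eos?               # true
```

After the call the scanner is at the end of its input, just as it would be after scan_until(/\z/).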
This is the central method, and commonly the only one a subclass implements.
Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!
  # File lib/coderay/scanner.rb, line 241
  def scan_tokens tokens, options
    raise NotImplementedError,
      "#{self.class}#scan_tokens not implemented."
  end
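To illustrate the contract described above (a subclass overrides scan_tokens, appends tokens with << only, and returns them), here is a minimal sketch built directly on StringScanner. SketchBase and WordScanner are hypothetical classes, a plain Array stands in for the Tokens object, and the token kinds are chosen for illustration. None of this is CodeRay's own code.

```ruby
require 'strscan'

# Hypothetical base class standing in for Scanner.
class SketchBase < StringScanner
  # Runs the subclass's scan_tokens with a fresh container.
  # A plain Array stands in for Tokens here.
  def tokenize
    scan_tokens([])
  end

  # Subclasses are expected to override this.
  def scan_tokens(tokens)
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
end

# Toy subclass: splits input into spaces, integers, identifiers
# and single-character operators.
class WordScanner < SketchBase
  def scan_tokens(tokens)
    until eos?
      if (text = scan(/\s+/))
        tokens << [text, :space]
      elsif (text = scan(/\d+/))
        tokens << [text, :integer]
      elsif (text = scan(/[A-Za-z_]\w*/))
        tokens << [text, :ident]
      else
        # Anything else: consume exactly one character.
        tokens << [getch, :operator]
      end
    end
    tokens
  end
end

WordScanner.new('x = 42').tokenize.each do |text, kind|
  puts "#{kind}: #{text.inspect}"
end
```

The fallback branch always consumes one character, so the loop is guaranteed to reach the end of the input.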