Initially extracted from conversation with [@Anniepoo](https://github.com/Anniepoo) and [@nicoabie](https://github.com/nicoabie) in ##prolog on freenode.
The library started as a very simple and lightweight set of predicates for a common, but very limited, form of lexing. As we extend it, we aim to maintain a modest scope in order to achieve a sweet spot between ease of use and powerful flexibility.
`tokenize` does not aspire to become an industrial-strength lexer generator. We aim to cover most users' needs for an intermediate representation between raw input and a structured form ready for parsing by a DCG.
If a user is parsing a language with keywords such as `class`, `module`, etc., and wants to distinguish these from variable names, `tokenize` isn't going to give them this out of the box. But it should provide an easy means of achieving this result through a subsequent lexing pass, as in the sketch below.
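A minimal sketch of such a second pass, assuming `tokenize/2` yields `word/1` tokens; the `keyword/1` table, the `kw/1` wrapper, and `mark_keywords/2` are hypothetical names used for illustration only:

```prolog
:- use_module(library(tokenize)).

% Hypothetical keyword table for the language being parsed.
keyword(class).
keyword(module).

% mark_keywords(+Tokens, -Marked)
% Second lexing pass over tokenize/2 output: rewraps any word/1 token
% whose content is a keyword as kw/1, and passes other tokens through.
mark_keywords([], []).
mark_keywords([word(W)|Ts0], [kw(W)|Ts]) :-
    keyword(W),
    !,
    mark_keywords(Ts0, Ts).
mark_keywords([T|Ts0], [T|Ts]) :-
    mark_keywords(Ts0, Ts).
```

For example, `tokenize("module foo", Tokens), mark_keywords(Tokens, Marked)` would pass `word(foo)` through unchanged but rewrap `word(module)` as `kw(module)`.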
Tokens are represented as compound terms wrapping their content, e.g. `space(' ')` or `escape(...)`; all variants of tokenization need to return tokens represented with the same arity.

The order in which options are given matters. Tokenizing `"-12.3"` with `numbers, punctuation` should yield `[pnct('-'), number(12), pnct('.'), number(3)]`, while `punctuation, numbers` should yield `[number(-12.3)]`.
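At the query level, the intent might look as follows. This is a sketch of the proposed behaviour rather than the current API; in particular, the bare-atom option forms `numbers` and `punctuation` and their order sensitivity are the point under discussion:

```prolog
% Proposed: option order decides which recognizer wins on "-12.3".
?- tokenize("-12.3", Tokens, [numbers, punctuation]).
Tokens = [pnct('-'), number(12), pnct('.'), number(3)].

?- tokenize("-12.3", Tokens, [punctuation, numbers]).
Tokens = [number(-12.3)].
```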