True when Tokens is unified with a list of tokens representing the text from
Text, according to the options specified in Options.
Each token in Tokens will be one of:
- word(W)
- Where W consists of contiguous alphanumeric characters.
- punct(P)
- Where char_type(P, punct).
- cntrl(C)
- Where char_type(C, cntrl).
- space(S)
- Where S == ' '.
- number(N)
- Where number(N).
- string(S)
- Where S is a sequence of bytes enclosed by double quotation marks.
Note that the above describes the default behavior, in which the text of each token is
represented as an atom. This representation can be changed with the to/1 option
described below.
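As a minimal sketch of the default behavior described above, the query below tokenizes a short string with an empty option list. It assumes that the predicate documented here is tokenize/3 exported by library(tokenize) and that Text may be given as a string; neither is stated in this section. The helper name demo_default/1 and the sample text are hypothetical.

```prolog
% Assumption: the predicate documented here is tokenize/3 from
% library(tokenize); adjust the import if it lives elsewhere.
:- use_module(library(tokenize)).

% demo_default(-Tokens): hypothetical helper that tokenizes a sample
% string with an empty option list, so every default applies.
demo_default(Tokens) :-
    tokenize("Hello, World 42", Tokens, []).

% To inspect the kind of each token at the toplevel:
% ?- demo_default(Ts),
%    forall(member(T, Ts), (functor(T, Kind, _), writeln(Kind-T))).
```

Going by the list of token types, the result should consist of word/1, punct/1, space/1 and number/1 terms, with words lowercased because cased(false) is the default.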
Valid Options are:
- cased(+boolean)
- Determines whether tokens preserve the case of the source text. Defaults to cased(false).
- spaces(+boolean)
- Determines whether spaces are represented as tokens or discarded. Defaults to spaces(true).
- cntrl(+boolean)
- Determines whether control characters are represented as tokens or discarded. Defaults to cntrl(true).
- punct(+boolean)
- Determines whether punctuation characters are represented as tokens or discarded. Defaults to punct(true).
- numbers(+boolean)
- Determines whether the tokenizer represents and tags numbers. Defaults to numbers(true).
- strings(+boolean)
- Determines whether the tokenizer represents and tags strings. Defaults to strings(true).
- pack(+boolean)
- Determines whether tokens are packed or repeated. Defaults to pack(false).
- to(+one_of([strings, atoms, chars, codes]))
- Determines the representation format used for the tokens. Defaults to to(atoms).
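The options above can be combined in a single list. The following is a sketch only, under the same assumption that the documented predicate is tokenize/3 from library(tokenize); demo_options/1 and the sample text are hypothetical names chosen for illustration.

```prolog
% Assumption: the predicate documented here is tokenize/3 from
% library(tokenize); adjust the import if it lives elsewhere.
:- use_module(library(tokenize)).

% demo_options(-Tokens): hypothetical helper that overrides three defaults.
demo_options(Tokens) :-
    tokenize("Hello,  World!", Tokens,
             [ cased(true),     % keep the original letter case
               spaces(false),   % discard space tokens
               to(strings)      % represent token text as strings
             ]).

% ?- demo_options(Ts), print(Ts), nl.
```

With spaces(false) the result should contain no space/1 tokens, and with to(strings) the token text should be strings rather than atoms. pack(true) is left out here because the shape of packed tokens is not described in this section.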