Pack tokenize -- prolog/tokenize.pl

tokenize(+Text:text, -Tokens:list(term)) is semidet

See also: tokenize/3, which this predicate calls with an empty list of options, i.e. with all defaults.

tokenize(+Text:text, -Tokens:list(term), +Options:list(term)) is semidet

True when Tokens is unified with a list of tokens representing the text from Text, according to the options specified in Options.

Each token in Tokens will be one of:

word(W): Where W is composed of contiguous alphanumeric chars.
punct(P): Where char_type(P, punct).
cntrl(C): Where char_type(C, cntrl).
space(S): Where S == ' '.
number(N): Where number(N).
string(S): Where S is a sequence of bytes enclosed by double quotation marks in the source text.

Note that the above describes the default behavior, in which the content of each token is represented as an atom. This representation can be changed with the to option described below.
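As a minimal sketch of the default behavior, assuming the pack is installed and can be loaded as library(tokenize), the following wraps tokenize/2 in a helper. The name demo_default/1 is hypothetical and not part of the pack.

```prolog
:- use_module(library(tokenize)).

% demo_default(-Tokens): hypothetical helper, not part of the pack.
% Tokenizes a short text with tokenize/2, i.e. with all default options.
demo_default(Tokens) :-
    tokenize("Hello, World 42", Tokens).
```

Calling demo_default(Tokens) should bind Tokens to a list whose elements each take one of the forms listed above, such as word/1, punct/1, space/1 or number/1.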

Valid Options are:

cased(+boolean): Determines whether tokens preserve the case of the source text. Defaults to cased(false).
spaces(+boolean): Determines whether spaces are represented as tokens or discarded. Defaults to spaces(true).
cntrl(+boolean): Determines whether control characters are represented as tokens or discarded. Defaults to cntrl(true).
punct(+boolean): Determines whether punctuation characters are represented as tokens or discarded. Defaults to punct(true).
numbers(+boolean): Determines whether the tokenizer represents and tags numbers. Defaults to numbers(true).
strings(+boolean): Determines whether the tokenizer represents and tags strings. Defaults to strings(true).
pack(+boolean): Determines whether runs of consecutive identical tokens are packed into a single token with a repeat count, or repeated individually. Defaults to pack(false).
to(+one_of([strings, atoms, chars, codes])): Determines the representation format used for the tokens. Defaults to to(atoms).
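The options above can be combined in a single list. The sketch below is illustrative only: it assumes the pack is loadable as library(tokenize), and demo_options/1 is a hypothetical helper name.

```prolog
:- use_module(library(tokenize)).

% demo_options(-Tokens): hypothetical helper, not part of the pack.
% Keeps the original case, discards space and control-character tokens,
% and asks for token content to be represented as strings instead of atoms.
demo_options(Tokens) :-
    Options = [cased(true), spaces(false), cntrl(false), to(strings)],
    tokenize("Hello, World!", Tokens, Options).
```

Options that are left out of the list, such as punct and numbers here, keep their documented defaults.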

tokenize_file(+File:atom, -Tokens:list(term)) is semidet

See also: tokenize_file/3, which this predicate calls with an empty list of options, i.e. with all defaults.

tokenize_file(+File:atom, -Tokens:list(term), +Options:list(term)) is semidet

True when Tokens is unified with a list of tokens representing the text of File, according to the options specified in Options.

See also: tokenize/3, which has the same available options and behavior.
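A minimal sketch of file tokenization, assuming the pack is loadable as library(tokenize) and that File names a readable text file. The helper demo_file_words/2 is hypothetical and only shows one way to post-process the token list.

```prolog
:- use_module(library(tokenize)).

% demo_file_words(+File, -Words): hypothetical helper, not part of the pack.
% Tokenizes File without space or control-character tokens,
% then collects the content of every word/1 token, in order.
demo_file_words(File, Words) :-
    tokenize_file(File, Tokens, [spaces(false), cntrl(false)]),
    findall(W, member(word(W), Tokens), Words).
```

Because tokenize_file/3 accepts the same options as tokenize/3, the option list here can be swapped for any combination described above.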

untokenize(+Tokens:list(term), -Untokens:list(codes)) is semidet

True when Untokens is unified with a code list representation of each token in Tokens.
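As an illustration, the following sketch tokenizes a text, converts the tokens back to a code list with untokenize/2, and prints the result. It assumes the pack is loadable as library(tokenize); demo_roundtrip/1 is a hypothetical helper name, and the sketch does not claim that the round trip reproduces the input exactly for every token type.

```prolog
:- use_module(library(tokenize)).

% demo_roundtrip(+Text): hypothetical helper, not part of the pack.
% Tokenizes Text while preserving case, turns the tokens back into
% a code list, and prints that code list as text.
demo_roundtrip(Text) :-
    tokenize(Text, Tokens, [cased(true)]),
    untokenize(Tokens, Codes),
    format("~s~n", [Codes]).
```

The cased(true) option is used here because, with the default cased(false), the tokens no longer preserve the letter case of the source text.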