The string type and its double quoted syntax
As of SWI-Prolog version 7, text enclosed in double quotes
(e.g., "Hello world") is read as objects of the type string.
Strings are distinct from lists, which makes it possible to recognize
them at runtime and print them using the string syntax:
?- write("Hello world!").
Hello world!

?- writeq("Hello world!").
"Hello world!"
A string is a compact representation of a character sequence that lives on the global (term) stack. Strings are represented by sequences of Unicode character codes including the character code 0 (zero). The length of strings is limited by the available space on the global (term) stack (see set_prolog_stack/2). Section 5.2.3 motivates the introduction of strings and mapping double quoted text to this type.
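Because strings are a distinct type, a program can tell them apart from atoms and lists at runtime. This is a minimal sketch, assuming SWI-Prolog 7 or later with the default double_quotes flag. The predicate text_kind/2 is a hypothetical name used only for illustration:

```prolog
% text_kind/2 is a hypothetical helper, not a library predicate.
% It classifies a term using the built-in type checks.
text_kind(Text, Kind) :-
    (   string(Text)  -> Kind = string
    ;   atom(Text)    -> Kind = atom
    ;   is_list(Text) -> Kind = list
    ;   Kind = other
    ).
```

With this definition, "hi" is classified as string, while the code list [104,105] is classified as list.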
Whereas in version 7 double-quoted text is mapped to strings,
back-quoted text (as in
`text`) is mapped to a list of
character codes, i.e. integers that are Unicode code points. In
a traditional setting, back-quoted text would be mapped to a list of
characters (also known as chars), which are atoms of length one.
The settings for the flags that control how double- and back-quoted text is read are summarised in table 8. Programs that aim for compatibility should realise that the ISO standard defines back-quoted text, but does not define the back_quotes Prolog flag and does not define the term that is produced by back-quoted text.
| Mode | double_quotes | back_quotes |
|---|---|---|
| Version 7 default | string | codes |
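Under the version 7 defaults shown in the table, the same characters produce different terms depending on the quotes used. This small sketch assumes the flags have not been changed, for example by --traditional. The name quoted_forms/2 is hypothetical:

```prolog
% quoted_forms/2 is hypothetical. Under the version 7 defaults,
% double quotes yield a string and back quotes a list of codes.
quoted_forms(Double, Back) :-
    Double = "abc",
    Back = `abc`.
```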
With the introduction of strings as a Prolog data type, there are three main ways to represent text: using strings, using atoms and using lists of character codes. As a fourth way, one may also use lists of chars. This section explains what to choose for what purpose. Both strings and atoms are atomic objects: you can only look inside them using dedicated predicates. Lists of character codes or chars, by contrast, are compound data structures whose layout must follow a convention (a proper list of codes or of chars).
Back-quoted text (e.g., `hello`) can be used to easily specify a list of character codes. The
0'c notation can be used to specify a single character code.
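To compare the representations discussed above, the sketch below converts one atom into the other forms using the standard conversion predicates. It assumes default flags. The name representations/4 is hypothetical:

```prolog
% representations/4 is a hypothetical helper that shows the same
% text as a string, a list of codes and a list of chars.
representations(Atom, String, Codes, Chars) :-
    atom_string(Atom, String),
    atom_codes(Atom, Codes),
    atom_chars(Atom, Chars).
```

For the atom hi, this gives the string "hi", the codes [104,105] and the chars [h,i]. The first code can also be written 0'h.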
Atoms are used for identifiers (e.g., 'Boeing 747'), but also for individual words in a natural language processing system. They are also used where other languages would use enumerated types, such as the names of days in the week. Unlike enumerated types, Prolog atoms do not form a fixed set and the same atom can represent different things in different contexts.
Strings are manipulated using a set of predicates that mirrors the set of predicates used for manipulating atoms. In addition to the list below, string/1 performs the type check for this type and is described in section 4.5.
SWI-Prolog's string primitives are being synchronized with ECLiPSe. We expect the set of predicates documented in this section to be stable, although it might be expanded. In general, SWI-Prolog's text manipulation predicates accept any form of text as input argument; they accept anytext input. anytext comprises atoms, strings, lists of character codes, lists of characters and numbers.
The predicates produce the type indicated by the predicate name as output. This policy simplifies migration and writing programs that can run unmodified or with minor modifications on systems that do not support strings. Code should avoid relying on this feature as much as possible for clarity as well as to facilitate a more strict mode and/or type checking in future releases.
atom_string("x",'x').
atom_string('x',"x").
atom_string(3.1415,3.1415).
atom_string('3r2',3r2).
atom_string(3r2,'3r2').
atom_string(6r4,3r2).
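The policy of accepting anytext input while producing the type named by the predicate can be seen in a few calls. This sketch assumes SWI-Prolog 7 defaults. The name mixed_inputs/3 is hypothetical. As noted above, real code should not rely on this leniency more than necessary.

```prolog
% mixed_inputs/3 is hypothetical.
mixed_inputs(S, Len, A) :-
    string_concat(abc, 42, S),   % atom and number in, string out
    string_length([h,e,y], Len), % list of chars accepted as text
    atom_string(A, "xyz").       % string in, atom out
```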
and the number.
Note that "1e10" is a valid number.
Unlike other predicates of this family, if instantiated, String cannot be an atom.
The corresponding 'atom-handling' predicate is atom_number/2, with reversed argument order.
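The following brief sketch shows number_string/2 next to atom_number/2. It assumes default flags; an ISO-strict setting might reject "1e10". The name number_demo/3 is hypothetical:

```prolog
% number_demo/3 is hypothetical.
number_demo(N1, N2, N3) :-
    number_string(N1, "42"),     % Number first, String second
    number_string(N2, "1e10"),   % accepted as a float by default
    atom_number('3.5', N3).      % Atom first, Number second
```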
If String is unbound, Term is written using the option quoted(true) and the result is converted to String.
?- term_string(Term, 'a(A)', [variable_names(VNames)]).
Term = a(_9674),
VNames = ['A'=_9674].
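Going in the other direction, term_string/2 writes a term with quoted(true), so atoms that need quotes keep them. Reading back produces fresh variables. The name term_text_demo/2 in this sketch is hypothetical:

```prolog
% term_text_demo/2 is hypothetical.
term_text_demo(S, T) :-
    term_string(f(a, 'B'), S),     % writing: 'B' stays quoted
    term_string(T, "g(X, Y, X)").  % reading: X is shared, Y is distinct
```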
See also: atom_chars/2.
Encoding names the encoding to use, e.g., utf8. All valid stream encodings except for
wchar_t are supported. See section 2.19.1. Note that this translation is only provided for strings. Creating an atom from bytes requires atom_string/2. (Footnote 170: Strings are an efficient intermediate and this conversion is needed only in some uncommon scenarios.)
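The following sketch performs a byte-level round trip. It assumes string_bytes/3 as described here (SWI-Prolog 7 or later) and uses \u00e9 escapes for é to keep the source ASCII. The name utf8_roundtrip/2 is hypothetical. As the text notes, the atom is obtained via atom_string/2:

```prolog
% utf8_roundtrip/2 is hypothetical.
utf8_roundtrip(Bytes, Atom) :-
    string_bytes("\u00e9t\u00e9", Bytes, utf8), % string -> UTF-8 bytes
    string_bytes(String, Bytes, utf8),          % bytes -> string
    atom_string(Atom, String).                  % string -> atom
```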
When running in --traditional mode, '[]' is ambiguous and interpreted as an empty string.
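text_to_string/2 normalises the common text representations to a string. In this sketch, the name all_to_string/2 is hypothetical; it simply maps text_to_string/2 over a list:

```prolog
% all_to_string/2 is hypothetical.
all_to_string(Texts, Strings) :-
    maplist(text_to_string, Texts, Strings).
```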
The mode string_code(-,+,+) is deterministic if the searched-for Code appears only once in String. See also sub_string/5.
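When Index is unbound, string_code/3 enumerates indices, so it can locate every occurrence of a code. The name code_positions/3 in this sketch is hypothetical:

```prolog
% code_positions/3 is hypothetical. Indices are 1-based.
code_positions(Code, String, Positions) :-
    findall(I, string_code(I, String, Code), Positions).
```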
A simple split wherever there is a '.':

?- split_string("a.b.c.d", ".", "", L).
L = ["a", "b", "c", "d"].

Consider sequences of separators as a single one:

?- split_string("/home//jan///nice/path", "/", "/", L).
L = ["home", "jan", "nice", "path"].

Split and remove white space:

?- split_string("SWI-Prolog, 7.0", ",", " ", L).
L = ["SWI-Prolog", "7.0"].

Only remove leading and trailing white space (trim the string):

?- split_string(" SWI-Prolog ", "", "\s\t\n", L).
L = ["SWI-Prolog"].
In typical use cases, SepChars either does not overlap PadChars or is equal to it, in order to treat multiple adjacent separators (often white space) as a single one. The behaviour with partially overlapping sets of padding and separators should be considered undefined. See also read_string/5.
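Two small applications of split_string/4 follow. They assume inputs without leading or trailing separators, so that only the behaviour shown in the examples above is relied on. The names csv_numbers/2 and words/2 are hypothetical:

```prolog
% csv_numbers/2 is hypothetical: split on commas, strip blanks,
% then convert each field to a number.
csv_numbers(Line, Numbers) :-
    split_string(Line, ",", " ", Fields),
    maplist(number_string, Numbers, Fields).

% words/2 is hypothetical: separators equal to the padding, so runs
% of blanks between words act as a single separator.
words(Text, Words) :-
    split_string(Text, " \t\n", " \t\n", Words).
```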
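name_value(String, Name, Value) :-
    % Locate the first "="; Before and After count the characters around it
    sub_string(String, Before, _, After, "="),
    !,
    % The text before "=" becomes an atom
    sub_atom(String, 0, Before, _, Name),
    % The text after "=" stays a string
    sub_string(String, _, After, 0, Value).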
?- atomics_to_string([gnu, "gnat", 1], ', ', A). A = "gnu, gnat, 1"
The predicate read_string/5 called repeatedly on an input until Sep is -1 (end of file) is equivalent to reading the entire file into a string and calling split_string/4, provided that SepChars and PadChars are not partially overlapping. (Footnote 171: Behaviour that is fully compatible would require unlimited look-ahead.) Below are some examples:
Read a line:
read_string(Input, "\n", "\r", Sep, String)
Read a line, stripping leading and trailing white space:
read_string(Input, "\n", "\r\t ", Sep, String)
Read up to ',' or ')', unifying Sep with 0', i.e. Unicode 44, or 0'), i.e. Unicode 41:
read_string(Input, ",)", "\t ", Sep, String)
reposition property (see stream_property/2). Note that the internal encoding of the data is either ISO Latin 1 or UTF-8.
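The read_string/5 examples combine naturally with open_string/2 to process text line by line. This sketch assumes the end-of-file convention described above (Sep = -1). The names lines_of_string/2 and read_lines/2 are hypothetical:

```prolog
% lines_of_string/2 and read_lines/2 are hypothetical helpers.
lines_of_string(Text, Lines) :-
    setup_call_cleanup(open_string(Text, In),
                       read_lines(In, Lines),
                       close(In)).

% Read lines until end of file; "\r" is stripped as padding so
% CRLF line endings are also handled.
read_lines(In, Lines) :-
    read_string(In, "\n", "\r", Sep, Line),
    (   Sep == -1
    ->  (   Line == ""
        ->  Lines = []          % nothing after the last newline
        ;   Lines = [Line]      % final line without a newline
        )
    ;   Lines = [Line|Rest],
        read_lines(In, Rest)
    ).
```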
Prolog defines two forms of quoted text. Traditionally, single quoted text is mapped to atoms while double quoted text is mapped to a list of character codes (integers) or characters (atoms of length 1). Representing text using atoms is often considered inadequate for several reasons:
Representing text as lists, be it of character codes or characters, also comes at a price:
A wrapper such as s("hello world") could be used to indicate that we are dealing with a string.
Lacking runtime information, debuggers and the toplevel can only use heuristics to decide whether to print a list of integers as such or as a string (see portray_text/1).
While experienced Prolog programmers have learned to cope with this, we still consider this an unfortunate situation.
We observe that in many programs, most strings are only handled as a single unit during their lifetime. Examining real code tells us that double quoted strings typically appear in one of the following roles:
[X] = "a" is a commonly used template for getting the character code of the letter 'a'. ISO Prolog defines the syntax
0'a for this purpose. Code using this must be modified. The modified code will run on any ISO-compliant Prolog processor.
Matching a fixed prefix, as in append("name:", Rest, Codes). Such code needs to be modified. In this particular example, the following is a good portable alternative:
phrase("name:", Codes, Rest)
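Once text is a string, the prefix test can also be done directly on the string, and a string literal works as a terminal in a grammar rule. This sketch assumes SWI-Prolog 7, where string literals in DCG bodies denote code lists. The names strip_name/2, strip_name_codes/2 and name_prefix//0 are hypothetical:

```prolog
% strip_name/2 is hypothetical: string-based prefix removal.
strip_name(Text, Rest) :-
    string_concat("name:", Rest, Text).

% name_prefix//0 is hypothetical: a DCG rule whose body is a string literal.
name_prefix --> "name:".

% strip_name_codes/2 is hypothetical: the same on a list of codes.
strip_name_codes(Codes, Rest) :-
    phrase(name_prefix, Codes, Rest).
```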
Checking whether a character belongs to a set, as in memberchk(C, "~!@#$"). This is a rather inefficient check in a traditional Prolog system because it pushes a list of character codes cell-by-cell onto the Prolog stack and then traverses this list cell-by-cell to see whether one of the cells unifies with C. If the test is successful, the string will eventually be subject to garbage collection. The best code for this is to write a predicate as below, which pushes nothing on the stack and performs an indexed lookup to see whether the character code is in 'my_class'.
my_class(0'~).
my_class(0'!).
...
An alternative to reach the same effect is to use term expansion to create the clauses:
% Expand the placeholder clause my_class(_) into one fact per character
term_expansion(my_class(_), Clauses) :-
    findall(my_class(C), string_code(_, "~!@#$", C), Clauses).

my_class(_).
Finally, the predicate string_code/3 can be exploited directly as a replacement for the memberchk/2 on a list of codes. Although the string is still pushed onto the stack, it is more compact and only a single entity.
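The string_code/3 replacement for memberchk/2 can be packaged as a predicate. In this sketch, the name special_char/1 is hypothetical. It is intended to be called with C bound; with C unbound it enumerates the codes:

```prolog
% special_char/1 is hypothetical: true if code C occurs in "~!@#$".
special_char(C) :-
    string_code(_, "~!@#$", C).
```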
The predicates in this section can help adapt your program to the new convention for handling double quoted strings. We have adapted a huge code base with which we were not familiar in about half a day.
For example, rewrite [X] = "a" into
X = 0'a.
:- set_prolog_flag(double_quotes, codes).
Use back-quoted text (as in `text`). Note that this will not make your code run regardless of the --traditional command line option and code exploiting this mapping is also not portable to ISO compliant systems.
:- multifile check:string_predicate/1.
check:string_predicate(user:help_info/2).
:- multifile check:valid_string_goal/1.
check:valid_string_goal(system:format(_,S,_)) :-
    string(S).