Related forum post:
Preserving CR with LF when converting text files to character codes (ref)
string_codes/2 with escape sequences \xXX for CR with LF (ref)
Character Escape Syntax
Within quoted atoms (using single quotes: '<atom>') special characters are represented using escape sequences. An escape sequence is led in by the backslash (\) character. The list of escape sequences is compatible with the ISO standard but contains some extensions, and the interpretation of numerically specified characters is slightly more flexible to improve compatibility. Undefined escape characters raise a syntax_error exception. (Footnote 29: Up to SWI-Prolog 6.1.9, undefined escape characters were copied verbatim, i.e., removing the backslash.)
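\a
  Alert character. Normally the ASCII character 7 (beep).
\b
  Backspace character.
\c
  No output. All input characters up to but not including the first non-layout character are skipped. This allows for the specification of pretty-looking long lines. Not supported by ISO. Example:

    format('This is a long line that looks better if it was \c
           split across multiple physical lines in the input')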
\<NEWLINE>
  When in ISO mode (see the Prolog flag iso), only skip this sequence. In native mode, white space that follows the newline is skipped as well and a warning is printed, indicating that this construct is deprecated and advising to use \c. We advise using \c or putting the layout before the \, as shown below. Using \c is supported by various other Prolog implementations and will remain supported by SWI-Prolog. The style shown below is the most compatible solution. (Footnote 30: Future versions will interpret \<return> according to ISO.)

    format('This is a long line that looks better if it was \
    split across multiple physical lines in the input')

  instead of

    format('This is a long line that looks better if it was\
     split across multiple physical lines in the input')
Note that SWI-Prolog also allows unescaped newlines to appear in quoted material. This is not allowed by the ISO standard, but used to be common practice before.
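As a rough illustration of the \c continuation described above, the following sketch checks that an atom split with \c is identical to the same atom written on a single line. It assumes SWI-Prolog 7 or later and that the code is loaded from a file; the predicate names long_line/1, one_line/1 and same_text/0 are made up for this example.

```prolog
% The space before \c is kept; the newline and the indentation
% that follow \c are skipped by the reader.
long_line('This is a long line that looks better if it was \c
           split across multiple physical lines in the input').

one_line('This is a long line that looks better if it was split across multiple physical lines in the input').

% Succeeds if both spellings denote the same atom.
same_text :-
    long_line(A),
    one_line(B),
    A == B.
```

Calling same_text should succeed, because the continuation is resolved when the clause is read and never reaches format/1 or any other predicate.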
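\e
  Escape character (ASCII 27). Not ISO, but widely supported.
\f
  Form-feed character.
\n
  Next-line character.
\r
  Carriage-return only (i.e., go back to the start of the line).
\s
  Space character. Intended to allow writing 0'\s to get the character code of the space character. Not ISO.
\t
  Horizontal tab.
\v
  Vertical tab.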
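\xXX..\
  Hexadecimal specification of a character. The closing \ is obligatory according to the ISO standard, but optional in SWI-Prolog to enhance compatibility with the older Edinburgh standard. The code \xa\3 emits the character 10 (hexadecimal 'a') followed by '3'. Characters specified this way are interpreted as Unicode characters. See also \u.
\uXXXX
  Unicode character specification where the character is specified using exactly 4 hexadecimal digits. This is an extension to the ISO standard, fixing two problems. First, where \x defines a numeric character code, it doesn't specify the character set in which the character should be interpreted. Second, the idiosyncratic closing \ of the ISO Prolog syntax is not needed.
\UXXXXXXXX
  Same as \uXXXX, but using 8 digits to cover the whole Unicode set.
\40
  Octal character specification. The rules and remarks for hexadecimal specifications apply to octal specifications as well.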
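The numeric escapes can be compared side by side with a small sketch. It assumes SWI-Prolog 7 or later (double-quoted text read as strings) and uses only the built-ins atom_codes/2 and string_codes/2; the predicate name numeric_escapes_ok/0 is hypothetical. The last two goals relate to the CR/LF question raised in the forum posts mentioned on this page.

```prolog
% numeric_escapes_ok/0 is a hypothetical name used only for this sketch.
numeric_escapes_ok :-
    atom_codes('\x41\', Hex),       Hex == [65],        % hex 41 with the ISO closing backslash
    atom_codes('\101\', Oct),       Oct == [65],        % octal 101 is decimal 65
    atom_codes('\u0041', U4),       U4  == [65],        % exactly 4 hex digits, no closing backslash
    atom_codes('\U00000041', U8),   U8  == [65],        % exactly 8 hex digits
    atom_codes('\xa\3', Mixed),     Mixed == [10, 51],  % character 10 followed by the digit 3
    string_codes("\r\n", CrLf),     CrLf == [13, 10],   % CR and LF both survive
    string_codes("\xD\\xA\", Same), Same == CrLf.       % the same text spelled with \xXX..\
```

In the last string the backslash after D closes the first escape and the next backslash starts the second one, so it reads as \xD\ followed by \xA\.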
\\
  Escapes the backslash itself. Thus, '\\' is an atom consisting of a single \.
\'
  Single quote. Note that '\'' and '''' both describe the atom with a single ', i.e., '\'' == '''' is true.
\"
  Double quote.
\`
  Back quote.
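The quote and backslash escapes can be checked with a short sketch, again assuming a recent SWI-Prolog with default flags. The predicate name quote_escapes_ok/0 is hypothetical; the character codes are plain ASCII values.

```prolog
% quote_escapes_ok/0 is a hypothetical name used only for this sketch.
quote_escapes_ok :-
    '\'' == '''',                      % two spellings of the same one-character atom
    atom_codes('\'', Q),  Q == [39],   % single quote
    atom_codes('\\', B),  B == [92],   % one backslash, not two
    atom_codes('\"', D),  D == [34],   % double quote
    atom_codes('\`', K),  K == [96].   % back quote
```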
Character escaping is only available if current_prolog_flag(character_escapes, true) is active (default). See current_prolog_flag/2.

Character escapes conflict with writef/2 in two ways: \40 is interpreted as decimal 40 by writef/2, but as octal 40 (decimal 32) by read. Also, the writef/2 sequence \l is illegal. It is advised to use the more widely supported format/[2,3] predicate instead. If you insist upon using writef/2, either switch character_escapes to false, or use double \\, as in writef('\\l').
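A minimal sketch of the writef/2 workaround, assuming default flags and that writef/1 treats \l as a line separator, as the writef documentation states. The predicate name writef_vs_format/0 is hypothetical; with_output_to/2 is used only to capture the output for comparison.

```prolog
% writef_vs_format/0 is a hypothetical name used only for this sketch.
writef_vs_format :-
    current_prolog_flag(character_escapes, true),   % the default setting
    atom_length('\\l', 2),                          % read turns \\ into one \, leaving \ and l
    with_output_to(string(W), writef('a\\lb')),     % writef/1 itself interprets the \l
    with_output_to(string(F), format("a~nb")),      % format/1 spells the newline as ~n
    W == F.
```

The format/1 version needs no doubled backslash because ~n is not an escape sequence of the reader.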
This section applies to Strings, too
    ?- format(">\c
    |          hello<").
    >hello<
    true.
Jan Wielemaker writes: "Although incomplete in the implementation, the overall idea is that predicates that require text input accept all text representations and produce the documented type as output."
The \uXXXX encoding (really, NAMING of a glyph in the Unicode BMP) is called "UCS-2"
https://en.wikipedia.org/wiki/Universal_Coded_Character_Set
More on Unicode here:
https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology
Note that the unicode escapes are meant to be used in the context of strings, atoms or chars (atoms of length 1).
Observe:
    ?- [user].
    |: özi(11,1).
    |: % user://1 compiled 0.02 sec, 1 clauses
    true.
BUT
    ?- \u00F6zi(11,1).
    ERROR: Unknown procedure: (\)/1 (DWIM could not correct goal)
The \u00F6 is not translated "early" by the tokenizer, it can only appear inside a string or atom.
This is unlike Java, where a \u00F6 appearing in the source is interpreted by the tokenizer as the character "ö", which is why you can write:
void \u00F6zi(int x, int y) { }
See https://www.sitepoint.com/java-unicode-mysterious-compile-error/
OTOH, some documentation talks about what write/2 or format/2 will do when they encounter escape sequences in their argument strings. Well, those predicates will generally not see any escape sequences because these are translated into their corresponding characters in the same way as "\uXXXX".
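The two observations above can be tried out with a small sketch, assuming SWI-Prolog 7 or later. It writes the non-ASCII character only as an escape so that the file encoding does not matter; the predicate name escapes_resolved_at_read_time/0 is hypothetical.

```prolog
% Inside quotes the escape is accepted, so this defines the same
% predicate as the unquoted name with U+00F6 typed directly.
'\u00F6zi'(11, 1).

% escapes_resolved_at_read_time/0 is a hypothetical name used only for this sketch.
escapes_resolved_at_read_time :-
    '\u00F6zi'(X, Y), X == 11, Y == 1,            % the quoted form can be called
    atom_length('\u00F6zi', 3),                   % three characters, not eight
    atom_codes('\u00F6zi', Cs),
    Cs == [246, 122, 105],                        % 0xF6, z, i
    string_length("a\nb", 3),                     % the string already holds a real newline
    with_output_to(string(S), format("a\nb")),    % format/1 never sees a backslash
    S == "a\nb".
```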
Note: \c does not exclude comments
When using \c, e.g.
Input = "\c Line 1\n\c Line 2\n\c "
comments are included in the string, e.g.
Input = "\c Line 1\n\c % Comment Line 2\n\c "