SWI-Prolog -- Character Escape Syntax

2.15.1.3 Character Escape Syntax

Within quoted atoms (using single quotes: '<atom>'), special characters are represented using escape sequences. An escape sequence is introduced by the backslash (\) character. The list of escape sequences is compatible with the ISO standard but contains some extensions, and the interpretation of numerically specified characters is slightly more flexible to improve compatibility. Undefined escape characters raise a syntax_error exception. (Up to SWI-Prolog 6.1.9, undefined escape characters were copied verbatim, i.e., the backslash was simply removed.)
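The difference between a defined and an undefined escape can be seen by parsing text with term_string/2. This is a toplevel sketch assuming a recent SWI-Prolog with default flags; only the error class syntax_error/1 is relied upon, not the exact message term inside it.

```prolog
% A defined escape (\t) parses into an atom containing a tab.
% The double-quoted source text holds '\\t', which the reader turns
% into \t before term_string/2 parses it.
?- term_string(T, "'a\\tb'"), atom_length(T, Len).
% T = 'a\tb', Len = 3.

% An undefined escape (\q) raises a syntax error instead of
% silently dropping the backslash.
?- catch(term_string(_, "'a\\qb'"), error(syntax_error(Msg), _), true),
   nonvar(Msg).
% true (Msg is bound to a term describing the bad escape).
```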

\a

Alert character. Normally the ASCII character 7 (beep).

\b

Backspace character.

\c

No output. All input characters up to, but not including, the first non-layout character are skipped. This allows a long line to be written across several physical lines in the source. Not supported by ISO. Example:

format('This is a long line that looks better if it was \c
       split across multiple physical lines in the input')
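A small check of the \c behaviour described above, typed at the SWI-Prolog toplevel (a sketch; the atom text is arbitrary). Note that the space before \c is kept, while the newline and indentation after it are dropped.

```prolog
% \c skips the newline and all indentation up to the next
% non-layout character, so nothing is inserted at the join.
?- A = 'one \c
        two',
   A == 'one two'.
% true (A = 'one two').
```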

\<NEWLINE>

When in ISO mode (see the Prolog flag iso), only the backslash and the newline themselves are skipped. In native mode, white space that follows the newline is skipped as well, and a warning is printed indicating that this construct is deprecated and advising to use \c. We advise using \c or putting the layout before the \, as shown below. Using \c is supported by various other Prolog implementations and will remain supported by SWI-Prolog. The style shown below is the most compatible solution. (Future versions will interpret \<NEWLINE> according to ISO.)

format('This is a long line that looks better if it was \
split across multiple physical lines in the input')

instead of

format('This is a long line that looks better if it was\
 split across multiple physical lines in the input')

Note that SWI-Prolog also allows unescaped newlines to appear in quoted material. This is not allowed by the ISO standard, but used to be common practice.
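The compatible \<NEWLINE> style recommended above can be checked like this, assuming default SWI-Prolog flags. Because the continuation line starts in the first column, there is no leading white space for native mode to skip, so ISO mode and native mode should read the same atom.

```prolog
% The space is placed before the backslash; the continuation line
% starts in column 1.
?- A = 'first half \
second half',
   A == 'first half second half'.
% true.
```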

\e

Escape character (ASCII 27). Not ISO, but widely supported.

\f

Form-feed character.

\n

Newline (line feed) character (ASCII 10).

\r

Carriage-return only (i.e., go back to the start of the line).

\s

Space character. Intended to allow writing 0'\s to get the character code of the space character. Not ISO.

\t

Horizontal tab character.

\v

Vertical tab character (ASCII 11).
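The character codes produced by the single-letter escapes above can be listed with atom_codes/2. This is a toplevel sketch for SWI-Prolog with character_escapes enabled (the default); the codes are the standard ASCII values.

```prolog
% Character codes of the single-letter escapes listed above.
?- atom_codes('\a\b\t\n\v\f\r\e\s', Codes).
% Codes = [7, 8, 9, 10, 11, 12, 13, 27, 32].

% 0'\s is the character code of the space character.
?- X = 0'\s.
% X = 32.
```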

\xXX..\

Hexadecimal specification of a character. The closing \ is obligatory according to the ISO standard, but optional in SWI-Prolog to enhance compatibility with the older Edinburgh standard. The code \xa\3 denotes the character with code 10 (hexadecimal 'a') followed by '3'. Characters specified this way are interpreted as Unicode characters. See also \u.
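The \xa\3 example from the text, plus two hexadecimal escapes written back to back, as SWI-Prolog toplevel queries (a sketch using only atom_codes/2 and ==/2):

```prolog
% \xa\ is code 10 (newline); the 3 after the closing \ is a plain digit.
?- atom_codes('\xa\3', Codes).
% Codes = [10, 51].

% Two hexadecimal escapes in a row, each with its closing backslash.
?- A = '\x41\\x42\', A == 'AB'.
% true.
```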

\uXXXX

Unicode character specification where the character is specified using exactly 4 hexadecimal digits. This is an extension to the ISO standard that fixes two problems. First, while \x defines a numeric character code, it does not specify the character set in which the character should be interpreted. Second, it does not require the idiosyncratic closing \ of ISO Prolog syntax.

\UXXXXXXXX

Same as \uXXXX, but using 8 digits to cover the whole Unicode set.
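A toplevel sketch of the fixed-width rule, assuming a recent SWI-Prolog: \u consumes exactly four hexadecimal digits, so a following digit stays literal, and \U consumes exactly eight, which reaches code points outside the Basic Multilingual Plane.

```prolog
% \u00F6 is 'ö' (code 246).
?- atom_codes('\u00F6', Codes).
% Codes = [246].

% Only four digits are consumed: \u0041 is 'A', then a literal '1'.
?- atom_codes('\u00411', Codes).
% Codes = [65, 49].

% \U takes eight digits; U+1F600 is decimal 128512.
?- atom_codes('\U0001F600', [C]).
% C = 128512.
```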

\40

Octal character specification. The rules and remarks for hexadecimal specifications apply to octal specifications as well.
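Octal escapes behave like the hexadecimal ones, including the closing backslash. A short SWI-Prolog toplevel sketch:

```prolog
% Octal 40 is decimal 32 (a space).
?- atom_codes('\40\', Codes).
% Codes = [32].

% Octal 101 is decimal 65, the letter 'A'.
?- '\101\' == 'A'.
% true.
```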

\\

Escapes the backslash itself. Thus, '\\' is an atom consisting of a single \.

\'

Single quote. Note that '\'' and '''' both describe the atom consisting of a single ', i.e., '\'' == '''' is true.

\"

Double quote.

\`

Back quote.
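The quoting escapes can be checked at the SWI-Prolog toplevel. This sketch uses only standard built-ins; 34 and 96 are the ASCII codes of the double quote and the back quote.

```prolog
% '\\' is a single backslash.
?- atom_length('\\', Len).
% Len = 1.

% Both spellings denote the same one-character atom.
?- '\'' == ''''.
% true.

% \" and \` inside a single-quoted atom.
?- atom_codes('\"\`', Codes).
% Codes = [34, 96].
```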

Character escaping is only available if current_prolog_flag(character_escapes, true) is active (default). See current_prolog_flag/2. Character escapes conflict with writef/2 in two ways: \40 is interpreted as decimal 40 by writef/2, but as octal 40 (decimal 32) by read. Also, the writef/2 sequence \l is illegal. It is advised to use the more widely supported format/[2,3] predicate instead. If you insist upon using writef/2, either switch character_escapes to false, or use double \\, as in writef('\\l').
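A sketch of the doubled-backslash workaround next to the format alternative, assuming SWI-Prolog's autoloaded writef/1 and the built-in format/1. The reader turns '\\l' into the two characters \ and l, which writef/1 then expands to a newline.

```prolog
% writef/1 receives the characters \ and l and prints a newline.
?- writef('Hello\\l').
% prints: Hello (followed by a newline)

% The format equivalent uses ~n and needs no backslash at all.
?- format('Hello~n').
% prints: Hello (followed by a newline)
```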

EricGT said (2020-07-03T12:48:25):

Related forum posts:

Preserving CR with LF when converting text files to character codes

string_codes/2 with escape sequences \xXX for CR with LF

LogicalCaptain said (2020-01-20T23:23:36):

This section applies to Strings, too

?- format(">\c
|    hello<").
>hello<
true.
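The same escapes work in double-quoted strings, assuming the SWI-Prolog 7+ default where "..." denotes a string. A toplevel sketch:

```prolog
% \u, \t and \x...\ inside a string.
?- string_codes("\u00F6\t\x41\", Codes).
% Codes = [246, 9, 65].

% \c skips the newline and indentation inside a string as well.
?- string_length("a\c
                  b", Len).
% Len = 2.
```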

Jan Wielemaker writes: "Although incomplete in the implementation, the overall idea is that predicates that require text input accept all text representations and produce the documented type as output."

The \uXXXX escape (really, the NAMING of a glyph in the Unicode Basic Multilingual Plane, BMP) covers the repertoire of what is called "UCS-2":

https://en.wikipedia.org/wiki/Universal_Coded_Character_Set

More on Unicode here:

https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology

Note that the Unicode escapes are meant to be used inside strings, atoms, or chars (atoms of length 1).

Observe:

?- [user].
|: özi(11,1).
|: % user://1 compiled 0.02 sec, 1 clauses
true.

BUT

?- \u00F6zi(11,1).
ERROR: Unknown procedure: (\)/1 (DWIM could not correct goal)

The \u00F6 is not translated "early" by the tokenizer; it can only appear inside quoted text, i.e., a string or a quoted atom.
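Since a quoted atom followed directly by an opening parenthesis is read as a functor, the escape can still name such a predicate, as long as it is quoted. A sketch for a fresh SWI-Prolog session, reusing özi/2 from the example above but defining it with assertz/1 so the queries are self-contained:

```prolog
% '\u00F6zi'(11, 1) is read as the compound özi(11, 1).
?- assertz('\u00F6zi'(11, 1)).
% true.

% Calling through the quoted, escaped name finds the same predicate.
?- '\u00F6zi'(X, Y).
% X = 11, Y = 1.
```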

This is unlike Java, where \u00F6 appearing anywhere in the source is translated into the character "ö" before tokenization, which is why you can write:

void \u00F6zi(int x, int y) {
}

See https://www.sitepoint.com/java-unicode-mysterious-compile-error/

OTOH, some documentation talks about what write/2 or format/2 will do when they encounter escape sequences in their argument strings. In practice, those predicates will generally not see any escape sequences: the reader has already translated them into the corresponding characters while reading the source, in the same way as "\uXXXX".
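A quick way to see that the reader, not the output predicate, handles the escape: by the time a built-in receives the atom, \n is already a single character. A SWI-Prolog toplevel sketch:

```prolog
% \n has become one character (code 10), so the length is 3, not 4.
?- atom_length('a\nb', Len).
% Len = 3.

?- atom_codes('a\nb', Codes).
% Codes = [97, 10, 98].
```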

EricGT said (2019-05-18T13:15:33):

Note: \c does not exclude comments

When using \c, e.g.

Input = "\c
  Line 1\n\c
  Line 2\n\c
"

comment text placed after \c is included in the string, e.g.

Input = "\c
  Line 1\n\c   % Comment
  Line 2\n\c
"