SWI-Prolog -- Rationale for changes from version 1

1.8 Rationale for changes from version 1

1.8.1 Implicit constructors and conversion operators

The original version of the C++ interface heavily used implicit constructors and conversion operators. This allowed, for example:

```
PREDICATE(hello, 1)
{ cout << "Hello " << (char *)A1 << endl; // Deprecated
  return true;
}

PREDICATE(add, 3)
{ return A3 = (long)A1 + (long)A2; // Deprecated
}
```

Version 2 is a bit more verbose:

```
PREDICATE(hello, 1)
{ cout << "Hello " << A1.as_string() << endl;
  return true;
}

PREDICATE(add, 3)
{ return A3.unify_int(A1.as_long() + A2.as_long());
}
```

There are a few reasons for this:

The implicit constructors and conversion operators, combined with the C++ conversion rules for integers and floats, could sometimes lead to subtle bugs that were difficult to find; in one case, a typo resulted in terms being unified with floating-point values when the code intended them to be atoms. This was mainly because the underlying C types for terms, atoms, etc. (such as term_t and atom_t) are unsigned integers, which invites confusion between numeric values and Prolog terms and atoms.
The overloaded assignment operator for unification changed the usual C++ semantics of assignment, returning a bool instead of a reference to the left-hand side. In addition, the result of unification should always be checked (e.g., an “always succeed” unification could fail due to an out-of-memory error); the unify_XXX() methods return a bool, and a call can be wrapped in PlCheckFail() to raise an exception if unification fails.
C-style casts are discouraged in modern C++, so the expression (char*)A1 would become static_cast<std::string>(A1), which is longer than A1.as_string(). Also, a cast cannot take an encoding argument, whereas a method such as as_string() can.
The implicit constructors and conversion operators were attractive because they allowed directly calling the foreign language interface functions, for example:
```
PlTerm t;
PL_put_atom_chars(t, "someName");
```
whereas this is now required:
```
PlTerm t;
PL_put_atom_chars(t.as_term_t(), "someName");
```
However, this is mostly avoided by methods and constructors that wrap the foreign language functions:
```
PlTerm_atom t("someName");
```
or
```
auto t = PlTerm_atom("someName");
```
Additionally, there are now wrappers for most of the PL_*() functions that check the error return and throw a C++ exception as appropriate.
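
The first two reasons listed above can be reproduced without SWI-Prolog. The following is a minimal, self-contained sketch under stated assumptions: `handle_t`, `LegacyTerm`, `Term`, `unify_int()`, `check_fail()`, `legacy_handle()` and `accidental_handle()` are hypothetical stand-ins, not declarations from SWI-cpp2.h, and `handle_t` merely mimics the unsigned-integer handles behind term_t and atom_t. An implicit constructor lets a double silently become a handle, while an `explicit` constructor rejects it at compile time; a bool-returning `unify_int()` marked `[[nodiscard]]` discourages ignoring the result, and `check_fail()` turns failure into an exception in the spirit of PlCheckFail().

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for an opaque C handle such as term_t or atom_t,
// which the C interface represents as an unsigned integer.
using handle_t = std::uintptr_t;

// Version-1 style: implicit constructor from the raw handle type.
struct LegacyTerm {
  handle_t h;
  LegacyTerm(handle_t handle) : h(handle) {}  // implicit: arithmetic values convert
};

// Version-2 style: the constructor must be called explicitly.
struct Term {
  handle_t h;
  explicit Term(handle_t handle) : h(handle) {}

  // Hypothetical "unification" that succeeds only if the stored value matches.
  // [[nodiscard]] lets the compiler warn if the result is silently dropped.
  [[nodiscard]] bool unify_int(long value) const {
    return static_cast<long>(h) == value;
  }
};

// Hypothetical analogue of PlCheckFail(): turns a false result into an exception.
inline void check_fail(bool ok) {
  if (!ok) throw std::runtime_error("unification failed");
}

// Accepts a LegacyTerm; a double argument still compiles because it is
// converted double -> handle_t -> LegacyTerm.
inline handle_t legacy_handle(LegacyTerm t) { return t.h; }

// A double converts implicitly to LegacyTerm but not to Term.
static_assert(std::is_convertible<double, LegacyTerm>::value,
              "implicit constructor accepts a double");
static_assert(!std::is_convertible<double, Term>::value,
              "explicit constructor rejects a double");

// Returns the handle produced when a floating-point value is passed by mistake.
inline handle_t accidental_handle(double value) {
  assert(value >= 0.0);  // stay where double -> unsigned conversion is well-defined
  return legacy_handle(value);  // silently truncated, e.g. 3.7 becomes 3
}
```

With `LegacyTerm`, the mistake only shows up at run time as a wrong value; with `Term`, the same call fails to compile, which is the trade-off behind the more verbose version 2 style.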

Over time, some of these restrictions are expected to be eased, allowing the more compact coding style that the original API intended. However, heavy use of overloaded methods and constructors and of implicit conversions can make code difficult to understand, so a balance needs to be struck between compactness and understandability.

For backwards compatibility, much of the version 1 interface is still available (except for the implicit constructors and operators), but marked as “deprecated”; code that depends on the parts that have been removed can easily be changed to use the new interface.

1.8.2 Strings

The version 1 API often used char* for both setting and getting string values. This is not a problem for setting (although encodings can be an issue), but for getting it can introduce subtle bugs with pointer lifetimes if the buffer stack isn't used properly. PlStringBuffers makes the buffer stack easier to use, but it is preferable to avoid it altogether. C++, unlike C, has a standard string type that makes it easy to keep a copy rather than holding a pointer that might become invalid. (Also, C++ strings can contain embedded null characters.)
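
The lifetime problem can be illustrated without SWI-Prolog. In this sketch, `scratch_text()` is a hypothetical function (not part of the SWI-Prolog API) that, loosely like text returned in a reused buffer, writes into a single static buffer and returns a `const char*` into it; a second call overwrites the text the first pointer refers to, whereas a `std::string` copy taken immediately is unaffected. The last function shows a `std::string` holding an embedded null character, which C string functions such as `std::strlen()` cannot see past.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for an API that returns text in a reused buffer:
// every call overwrites what the previous call returned.
inline const char* scratch_text(const char* src) {
  static char buffer[64];
  assert(std::strlen(src) < sizeof buffer);  // leave room for the '\0'
  std::strcpy(buffer, src);
  return buffer;
}

// Keeps the raw pointer from the first call, then makes a second call.
inline std::string stale_pointer_demo() {
  const char* first = scratch_text("first");
  scratch_text("second");     // reuses the same buffer
  return std::string(first);  // reads "second", not "first"
}

// Copies the first result into a std::string before the second call.
inline std::string copied_string_demo() {
  std::string first = scratch_text("first");  // owns an independent copy
  scratch_text("second");
  return first;  // still "first"
}

// std::string stores its length explicitly, so it can hold '\0' bytes.
inline std::string with_embedded_null() {
  return std::string("a\0b", 3);  // three characters; the middle one is '\0'
}
```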

std::string has an implicit converting constructor from char*, so some of the API supports only std::string, even though this can cause a small inefficiency. If this proves to be a problem, additional overloaded functions and methods can be provided in the future (note that some standard library implementations have optimizations, such as the small-string optimization, that reduce the overhead of using std::string); for performance-critical code, the C functions can still be used.
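
As a small illustration of why a single std::string overload is enough: because the std::string constructor from a C string is not `explicit`, a `const std::string&` parameter also accepts string literals and `char*` values, at the cost of constructing a temporary std::string on each such call. `count_chars()` and `count_from_c_string()` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical API function that takes only std::string.
inline std::size_t count_chars(const std::string& text) {
  return text.size();
}

// A caller holding a char* needs no extra overload: the argument is converted
// to a temporary std::string (this copy is the "small inefficiency").
inline std::size_t count_from_c_string(const char* text) {
  assert(text != nullptr);  // constructing std::string from a null pointer is undefined
  return count_chars(text);
}
```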

Unicode and encodings are handled as follows. std::wstring and wchar_t* carry full Unicode and need no encoding. For interfaces that use std::string or char*, the byte encoding is given by an optional PlEncoding argument:

```
typedef enum class PlEncoding
{ Latin1 = REP_ISO_LATIN_1,
  UTF8   = REP_UTF8,
  Locale = REP_MB
} PlEncoding;
static constexpr PlEncoding ENC_INPUT  = PlEncoding::Latin1;
static constexpr PlEncoding ENC_OUTPUT = PlEncoding::Locale;
```

Methods that construct Prolog text from std::string or char* (the PlAtom, PlTerm_atom, PlTerm_string and PlCompound constructors and the PlTerm::unify_*() methods) default to ENC_INPUT (PlEncoding::Latin1), which is byte-compatible with the pre-encoding API. Methods that return text (PlTerm::as_string(), PlAtom::as_string()) default to ENC_OUTPUT (PlEncoding::Locale). Pass PlEncoding::UTF8 explicitly when the char* or std::string holds UTF-8.
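
To see why the input default matters, consider the two bytes 0xC3 0xA9, which are the UTF-8 encoding of "é". Read as Latin-1 (the default), they are two characters ("Ã©"); read as UTF-8, they are one. The sketch below mirrors the trailing, defaulted encoding parameter described above, but `Encoding`, `kEncInput` and `char_count()` are hypothetical names rather than the SWI-cpp2.h declarations, and only Latin-1 and UTF-8 are modelled (locale-dependent decoding is omitted).

```cpp
#include <cstddef>
#include <string>

// Hypothetical mirror of the PlEncoding choices (Locale omitted).
enum class Encoding { Latin1, UTF8 };

// Default for text passed into Prolog, mirroring ENC_INPUT.
constexpr Encoding kEncInput = Encoding::Latin1;

// Number of characters the bytes in `s` represent under encoding `enc`.
// The encoding is a trailing parameter with a default, like the PlEncoding argument.
inline std::size_t char_count(const std::string& s, Encoding enc = kEncInput) {
  if (enc == Encoding::Latin1) {
    return s.size();  // Latin-1: every byte is one character
  }
  // UTF-8: count every byte that is not a continuation byte (10xxxxxx).
  std::size_t count = 0;
  for (unsigned char byte : s) {
    if ((byte & 0xC0u) != 0x80u) ++count;
  }
  return count;
}
```

For example, `char_count("\xC3\xA9")` counts two characters, while `char_count("\xC3\xA9", Encoding::UTF8)` counts one; this is why UTF-8 text should be passed with an explicit encoding rather than relying on the Latin-1 default.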

Identifiers that are normally program constants rather than data derived from arbitrarily encoded external input (the names passed to PlModule and PlPredicate) are hard-wired to UTF-8, matching the PL_predicate() C API, and therefore take no PlEncoding argument.

The earlier argument orders PlAtom(PlEncoding, len, s) and PlAtom(PlEncoding, std::string&) are [[deprecated]] in favour of the trailing-PlEncoding forms.
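
A common C++ way to retire an argument order without breaking existing callers is to keep the old overload, mark it `[[deprecated]]`, and have it forward to the new one. The sketch below applies that pattern to a hypothetical `Atom` class with its own `Encoding` enum; it is not the actual PlAtom declaration, and the length-taking constructor forms are not modelled.

```cpp
#include <string>
#include <utility>

// Hypothetical encoding enum for this sketch.
enum class Encoding { Latin1, UTF8, Locale };

// Hypothetical atom-like class showing the trailing-encoding constructor.
class Atom {
 public:
  // Preferred form: text first, encoding last (defaulted).
  explicit Atom(std::string text, Encoding enc = Encoding::Latin1)
      : text_(std::move(text)), enc_(enc) {}

  // Old argument order kept for source compatibility: callers get a
  // compile-time deprecation warning, and the call forwards to the new form.
  [[deprecated("use Atom(text, enc)")]]
  Atom(Encoding enc, const std::string& text) : Atom(text, enc) {}

  const std::string& text() const { return text_; }
  Encoding encoding() const { return enc_; }

 private:
  std::string text_;
  Encoding enc_;
};
```

Because `Encoding` is a scoped enum, a string argument can never convert to it, so the two constructor signatures cannot be confused during overload resolution.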