SWI-Prolog's memory management is based on the C runtime malloc() function and
related functions. The characteristics of the malloc()
implementation may affect performance and overall memory usage of the
system. For most Prolog programs the performance impact of the allocator
is small. (Multi-threaded applications may suffer from allocators that do
not effectively avoid false sharing, which affects CPU cache behaviour, or
that use a single lock to provide thread safety. Such allocators should be
rare in modern OSes.) The impact on total memory usage can be significant
though, in particular for multi-threaded applications. This is due to
two aspects of SWI-Prolog memory management:
- The Prolog stacks are allocated using malloc(). The stacks
can be extremely large. SWI-Prolog assumes malloc() will use a
mechanism that allows returning this memory to the OS. Most of today's
allocators satisfy this requirement.
- Atoms and clauses are allocated by the thread that requires them,
but this memory is freed by the thread running the atom or clause
garbage collector (see garbage_collect_atoms/0 and
garbage_collect_clauses/0). Normally these run in the thread gc, which
means that all deallocation happens in this thread. Notably, the ptmalloc
implementation used by the GNU C library (glibc) seems to handle this
poorly.
Starting with version 8.1.27, SWI-Prolog by default links against tcmalloc
when available. Note that changing the allocator can only be done by
linking the main executable (swipl) to an alternative library. When
embedded (see section 12.4.25), the main program that embeds libswipl must
be linked with tcmalloc. On ELF-based systems (Linux), this effect can also
be achieved using the environment variable LD_PRELOAD:
% LD_PRELOAD=/path/to/libtcmalloc.so swipl ...
SWI-Prolog attempts to detect the currently active allocator and sets
the Prolog flag malloc if the detection succeeds. Regardless of the malloc
implementation, trim_heap/0 is provided.
- [det]trim_heap
- This predicate attempts to return heap memory to the operating system.
There is no portable way of doing so. If the system detects tcmalloc it
calls MallocExtension_ReleaseFreeMemory(). If the system detects
ptmalloc as provided by the GNU runtime library it calls malloc_trim().
In other cases this predicate simply succeeds. See also trim_stacks/0.
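As a minimal sketch of how the malloc flag and trim_heap/0 can be combined, the helper below reports the detected allocator and then asks it to release free memory. The name report_allocator/0 is hypothetical and only serves this illustration; it assumes a SWI-Prolog version that provides trim_heap/0.

```prolog
% report_allocator/0 is a hypothetical helper, not part of SWI-Prolog.
% It prints the allocator detected by the system (if any) and then
% asks the allocator to return free heap memory to the OS.
report_allocator :-
    (   current_prolog_flag(malloc, Allocator)
    ->  format("Detected allocator: ~w~n", [Allocator])
    ;   format("Allocator not detected~n")
    ),
    trim_heap.
```

Because the flag is only set when detection succeeds, the sketch treats its absence as a normal case rather than an error.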
If the SWI-Prolog core detects that tcmalloc is the current allocator, it
provides the following additional predicates.
- [nondet]malloc_property(?Property)
- True when Property is a property of the current allocator.
The properties are defined by the allocator. The properties of tcmalloc
are defined in gperftools/malloc_extension.h (documentation copied from
the header):
- 'generic.current_allocated_bytes'(-Int)
- Number of bytes currently allocated by application.
- 'generic.heap_size'(-Int)
- Number of bytes in the heap (= current_allocated_bytes + fragmentation +
freed memory regions).
- 'tcmalloc.max_total_thread_cache_bytes'(-Int)
- Upper limit on total number of bytes stored across all thread caches.
- 'tcmalloc.current_total_thread_cache_bytes'(-Int)
- Number of bytes used across all thread caches.
- 'tcmalloc.central_cache_free_bytes'(-Int)
- Number of free bytes in the central cache that have been assigned to
size classes. They always count towards virtual memory usage, and unless
the underlying memory is swapped out by the OS, they also count towards
physical memory usage.
- 'tcmalloc.transfer_cache_free_bytes'(-Int)
- Number of free bytes that are waiting to be transferred between the
central cache and a thread cache. They always count towards virtual
memory usage, and unless the underlying memory is swapped out by the OS,
they also count towards physical memory usage.
- 'tcmalloc.thread_cache_free_bytes'(-Int)
- Number of free bytes in thread caches. They always count towards virtual
memory usage, and unless the underlying memory is swapped out by the OS,
they also count towards physical memory usage.
- 'tcmalloc.pageheap_free_bytes'(-Int)
- Number of bytes in free, mapped pages in page heap. These bytes can be
used to fulfill allocation requests. They always count towards virtual
memory usage, and unless the underlying memory is swapped out by the OS,
they also count towards physical memory usage. This property is not
writable.
- 'tcmalloc.pageheap_unmapped_bytes'(-Int)
- Number of bytes in free, unmapped pages in page heap. These are bytes
that have been released back to the OS, possibly by one of the
MallocExtension "Release" calls. They can be used to fulfill allocation
requests, but typically incur a page fault. They always count towards
virtual memory usage, and depending on the OS, typically do not count
towards physical memory usage.
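The properties above can be enumerated on backtracking. The following is a sketch only: print_malloc_properties/0 is a hypothetical helper, and it assumes that the malloc flag has the value tcmalloc when tcmalloc is detected and that each property is a compound term with a single argument, as the signatures above suggest.

```prolog
% print_malloc_properties/0 is a hypothetical helper for illustration.
% It prints each allocator property as "Name = Value". The guard on
% the malloc flag (value tcmalloc is an assumption) avoids calling
% malloc_property/1 when it is not provided; in that case the helper
% prints nothing and succeeds.
print_malloc_properties :-
    (   current_prolog_flag(malloc, tcmalloc)
    ->  forall(malloc_property(Property),
               (   Property =.. [Name, Value],
                   format("~w = ~w~n", [Name, Value])
               ))
    ;   true
    ).
```

To query a single value instead, one could call, for example, malloc_property('generic.heap_size'(Bytes)) under the same guard.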
- [det]set_malloc(+Property)
- Set properties described in malloc_property/1. Currently the only
writable property is tcmalloc.max_total_thread_cache_bytes. Setting an
unknown property raises a domain_error and setting a read-only property
raises a permission_error exception.
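A hedged sketch of setting the writable property follows. The helper limit_thread_cache/1 is hypothetical, and the argument form (the same Name(Value) term used by malloc_property/1) is an assumption based on the description above. Errors, including the existence error raised when set_malloc/1 is not provided because tcmalloc was not detected, are reported as warnings instead of being propagated.

```prolog
% limit_thread_cache/1 is a hypothetical helper for illustration.
% It tries to cap the total size of all thread caches at Bytes.
% Any error (unknown property, read-only property, or set_malloc/1
% not being defined) is caught and printed as a warning.
limit_thread_cache(Bytes) :-
    catch(set_malloc('tcmalloc.max_total_thread_cache_bytes'(Bytes)),
          error(Formal, _),
          print_message(warning,
                        format("set_malloc/1 failed: ~p", [Formal]))).
```

For instance, limit_thread_cache(67108864) would request a 64 MiB limit. Whether a lower limit reduces memory usage in practice depends on the workload.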
- [semidet]thread_idle(:Goal, +Duration)
- Indicates to the system that the calling thread will idle for some time
while calling Goal as once/1. This call releases resources to the OS to
minimise the footprint of the calling thread while it waits. Despite the
name, this predicate is always provided, even if the system is not
configured with tcmalloc or is single-threaded.
Duration is one of:
- short
- Calls trim_stacks/0 and, if tcmalloc is used, calls
MallocExtension_MarkThreadTemporarilyIdle(), which empties the
thread's malloc cache but preserves the cache itself.
- long
- Calls garbage_collect/0 and trim_stacks/0 and, if tcmalloc is used,
calls MallocExtension_MarkThreadIdle(), which releases all
thread-specific allocation data structures.
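A typical use of thread_idle/2 is a worker thread that blocks on its message queue between jobs. The sketch below assumes a multi-threaded build; wait_for_job/1 is a hypothetical helper name, and the choice of long is only an illustration for waits that are expected to last a while.

```prolog
% wait_for_job/1 is a hypothetical helper for illustration.
% The calling thread waits for the next message on its own queue.
% Wrapping the wait in thread_idle/2 lets the system release
% resources before the thread blocks.
wait_for_job(Job) :-
    thread_idle(thread_get_message(Job), long).
```

For waits that are expected to be brief, short is likely the better choice, since it avoids a full garbage collection and keeps the thread's cache structures in place.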