This library defines http_open/3, which opens a URL as a Prolog stream.
The functionality of the library can be extended by loading
additional modules that act as plugins:
- Loading library(http/http_ssl_plugin) causes http_open/3 to handle HTTPS connections.
Relevant options for SSL certificate handling are handed to
ssl_context/3. This plugin is loaded automatically if the scheme
https is requested using a default SSL context. See the plugin for
additional information regarding security.
- Loading this library supports the gzip transfer encoding. This
plugin is lazily loaded if a connection is opened that claims this
encoding.
- Loading this library adds tracking cookies to http_open/3. Returned
cookies are collected in the Prolog database and supplied for
subsequent requests.
- This library adds support for chunked encoding. It is lazily
loaded if the server sends a Transfer-encoding: chunked header.
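The plugins can also be loaded explicitly instead of lazily. The following is a minimal sketch. Apart from library(http/http_ssl_plugin), the library names are not given in this text; they are the names used by current SWI-Prolog distributions and should be treated as an assumption here.

```prolog
:- use_module(library(http/http_open)).

% Optional plugins. Loading them up front avoids lazy loading when
% the first connection is opened. The last three names are an
% assumption (see the note above).
:- use_module(library(http/http_ssl_plugin)).   % HTTPS support
:- use_module(library(zlib)).                   % gzip transfer encoding
:- use_module(library(http/http_cookie)).       % cookie tracking
:- use_module(library(http/http_stream)).       % chunked encoding
```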
Here is a simple example to fetch a web-page:
?- http_open('http://www.google.com/search?q=prolog', In, []),
   copy_stream_data(In, user_output),
   close(In).
<!doctype html><head><title>prolog - Google Search</title><script>
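In application code the stream should be closed even when reading fails. A minimal sketch, assuming a hypothetical helper name print_page/1 that is not part of the library:

```prolog
:- use_module(library(http/http_open)).

%!  print_page(+URL) is det.
%
%   Copy the document at URL to user_output. setup_call_cleanup/3
%   closes the stream even if copying raises an exception.
%   print_page/1 is a hypothetical helper, not a library predicate.
print_page(URL) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        copy_stream_data(In, user_output),
        close(In)).
```

It could be called as `?- print_page('http://www.swi-prolog.org/').`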
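The example below fetches the modification time of a web-page. Note that
Modified is '' (the empty atom) if the web-server does not provide a
time-stamp for the resource. See also parse_time/2.
modified(URL, Stamp) :-
    http_open(URL, In, [method(head), header(last_modified, Modified)]),
    close(In), Modified \== '',
    parse_time(Modified, Stamp).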
The next example uses Google search. It exploits library(uri) to manage
URIs, library(sgml) to load an HTML document and library(xpath) to
navigate the parsed HTML. Note that you may need to adjust the XPath
queries if the data returned by Google changes (this example indeed
no longer works and currently fails at the first xpath/3 call).
google(For, Title, HREF) :-
    uri_encoded(query_value, For, Encoded),
    atom_concat('http://www.google.com/search?q=', Encoded, URL),
    http_open(URL, In, []),
    call_cleanup(
        load_html(In, DOM, []),
        close(In)),
    xpath(DOM, //h3(@class=r), Result),
    xpath(Result, //a(@href=HREF0, text), Title),
    uri_components(HREF0, Components),
    uri_data(search, Components, Query),
    uri_query_components(Query, Parts),
    memberchk(q=HREF, Parts).
An example query is below:
?- google(prolog, Title, HREF).
Title = 'SWI-Prolog',
HREF = 'http://www.swi-prolog.org/' ;
Title = 'Prolog - Wikipedia',
HREF = 'https://nl.wikipedia.org/wiki/Prolog' ;
Title = 'Prolog - Wikipedia, the free encyclopedia',
HREF = 'https://en.wikipedia.org/wiki/Prolog' ;
Title = 'Pro-Log is logistiek dienstverlener m.b.t. vervoer over water.',
HREF = 'http://www.pro-log.nl/' ;
Title = 'Learn Prolog Now!',
HREF = 'http://www.learnprolognow.org/' ;
Title = 'Free Online Version - Learn Prolog
- See also
- - load_html/3 and xpath/3 can be used to parse and navigate HTML.
- - http_get/3 and http_post/4 provide an alternative interface that
converts the reply depending on the
- user_agent(-Agent) is det[private]
- Default value for the User-Agent header. It can be overruled using the
user_agent(Agent) option of http_open/3.
- http_open(+URL, -Stream, +Options) is det
- Open the data at the HTTP server as a Prolog stream. URL is
either an atom specifying a URL or a list representing a
broken-down URL as specified below. After this predicate
succeeds the data can be read from Stream. After completion this
stream must be closed using the built-in Prolog predicate
close/1. Options provides additional options:
- If false (default true), do not try to automatically
authenticate the client if a 401 (Unauthorized) status code
is received.
- Send authorization. See also http_set_authorization/2. Supported
schemes are:
- basic(+User, +Password)
- HTTP Basic authentication.
- HTTP Bearer authentication.
- digest(+User, +Password)
- HTTP Digest authentication. This option is only provided
if the plugin library(http/http_digest) is also loaded.
- Connect to the given Unix domain socket. In this scenario
the host name and port are ignored. If the server replies
with a redirect message and the host differs from the
original host, a normal TCP connection is used to handle
the redirect. This option is inspired by
- Specify the Connection header. Default is close. The alternative is
Keep-alive. This maintains a pool of
available connections as determined by keep_connection/1.
Keep-alive connections can be closed explicitly using
http_close_keep_alive/1. Keep-alive connections may
significantly improve repetitive requests on the same server,
especially if the IP route is long, HTTPS is used or the
connection uses a proxy.
- final_url(-FinalURL)
- Unify FinalURL with the final destination. This differs from
the original URL if the returned head of the original
indicates an HTTP redirect (codes 301, 302 or 303). Without a
redirect, FinalURL is the same as URL if URL is an atom, or a
URL constructed from the parts.
- header(Name, -AtomValue)
- If provided, AtomValue is unified with the value of the
indicated field in the reply header. Name is matched
case-insensitively and the underscore (_) matches the hyphen
(-). Several of these options may be provided to extract
multiple header fields. If the header is not available,
AtomValue is unified with the empty atom ('').
- headers(-List)
- If provided, List is unified with a list of Name(Value) pairs
corresponding to fields in the reply header. Name and Value
follow the same conventions used by the header(Name, Value)
option. A pseudo header status_code(Code) is added to provide
the HTTP status as an integer. See also
which provides the entire HTTP reply header in unparsed
- One of
head message can be
used in combination with the
header(Name, Value) option to
access information on the resource without actually fetching
the resource itself. The returned stream must be closed
post(Data) is provided, the default is
- Size is unified with the integer value of Content-Length
in the reply header.
- Version is a pair Major-Minor, where Major and Minor
are integers representing the HTTP version in the reply header.
- Ask for partial content. Range is a term Unit(From,To),
where From is an integer and To is either an integer or
end. HTTP 1.1 only supports Unit = bytes. For example,
to ask for bytes 1000-1999, use the option
range(bytes(1000,1999)).
- raw_encoding(+Encoding)
- Do not install a decoding filter for Encoding. For example, with
raw_encoding('application/gzip') the system will not
decompress the stream if it is compressed using gzip.
- Unify Lines with a list of strings that represents the complete
reply header returned by the server. See also
- If false (default true), do not automatically redirect
if a 3XX code is received. Must be combined with
status_code(Code) and one of the header options to read the
redirect reply. In particular, without status_code(Code) a
redirect is mapped to an exception.
- status_code(-Code)
- If this option is present and Code unifies with the HTTP
status code, do not translate errors (4xx, 5xx) into an
exception. Instead, http_open/3 behaves as if 2xx (success) is
returned, allowing the application to read the error document
from the returned stream.
- Unify the output stream with Out and do not close it. This can
be used to upgrade a connection.
- If provided, set a timeout on the stream using set_stream/2.
With this option, if no new data arrives within Timeout seconds
the stream raises an exception. Default is to wait forever.
- post(+Data)
- Issue a POST request on the HTTP server. Data is
handed to http_post_data/3.
- Use an HTTP proxy to connect to the outside world. See also
socket:proxy_for_url/3. This option overrules the proxy
specification defined by socket:proxy_for_url/3.
- proxy(+Host, +Port)
- Synonym for
- Send authorization to the proxy. Otherwise the same as the
true, bypass proxy hooks. Default is
- request_header(Name=Value)
- Additional name-value parts are added in the order of
appearance to the HTTP request header. No interpretation is done.
- Sets the maximum length of a redirection chain. This is needed
for some IRIs that redirect indefinitely to other IRIs without
looping (e.g., redirecting to IRIs with a random element in them).
Max must be either a non-negative integer or the atom
The default value is 10.
- user_agent(+Agent)
- Defines the value of the User-Agent field of the HTTP
header. Default is
The hook http:open_options/2 can be used to provide default
options based on the broken-down URL. The option
status_code(-Code) is particularly useful to query REST
interfaces that commonly return status codes other than 200
that need to be processed by the client code.
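The use of status_code(-Code) with a REST interface can be sketched as follows. The helper name rest_get/3 is hypothetical. With this option a 4xx or 5xx reply does not raise an exception, so the caller inspects Code and reads the reply body, which may be an error document.

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(readutil)).

%!  rest_get(+URL, -Code, -Body) is det.
%
%   Fetch URL. Code is the numeric HTTP status and Body is the
%   reply body as a string. rest_get/3 is a hypothetical helper,
%   not a library predicate.
rest_get(URL, Code, Body) :-
    setup_call_cleanup(
        http_open(URL, In, [status_code(Code)]),
        read_stream_to_codes(In, Codes),
        close(In)),
    string_codes(Body, Codes).
```

The caller can then branch on Code, for example treating 404 as "no such resource" rather than as an error.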
|URL||- is either an atom or string (url) or a list of parts.
When provided, this list may contain the fields
query_string (whose argument is an atom)
search (whose argument is a list of
Name=Value compound terms).
host is mandatory. The example below opens the
Note that values must not be quoted because the
library inserts the required quotes.
search([ q='Hello world',
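A complete call using a broken-down URL might look like the sketch below. The helper name open_search/3 and the path '/search' are hypothetical placeholders. The query value is passed unquoted, because the library inserts the required quoting.

```prolog
:- use_module(library(http/http_open)).

%!  open_search(+Host, +Query, -In) is det.
%
%   Open a search URL on Host built from parts. open_search/3 and
%   the path are hypothetical and for illustration only.
open_search(Host, Query, In) :-
    http_open([ host(Host),
                path('/search'),
                search([q=Query])
              ],
              In, []).
```

A call could be `?- open_search('www.example.com', 'Hello world', In), close(In).`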
error(existence_error(url, Id), Context) is raised if the
HTTP result code is not in the range 200..299. Context has the
form context(Message, status(Code, TextCode)), where Code is the
numeric HTTP code and TextCode is the textual description thereof
provided by the server. Message may provide additional details or
may be unbound.
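The error term described above can be caught and inspected as in this minimal sketch. The helper name try_open/2 and the result terms ok/1 and http_error/2 are hypothetical. Only the existence error with a status context is handled; any other exception, such as a network error, propagates unchanged.

```prolog
:- use_module(library(http/http_open)).

%!  try_open(+URL, -Result) is det.
%
%   Result is ok(In) if the stream was opened, or
%   http_error(Code, Text) if the server replied with an error
%   status. try_open/2 is a hypothetical helper.
try_open(URL, Result) :-
    catch(( http_open(URL, In, []),
            Result = ok(In)
          ),
          error(existence_error(url, _),
                context(_, status(Code, Text))),
          Result = http_error(Code, Text)).
```

If Result is ok(In), the caller remains responsible for closing In.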
- See also
- - ssl_context/3 for SSL related options if
library(http/http_ssl_plugin) is loaded.
- hooked_options(+Parts, -Options) is nondet[private]
- Calls http:open_options/2 and if necessary upgrades an old SSL
cacerts_file(File) option to a cacerts(List) option to ensure proper
merging of options.
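A clause for the http:open_options/2 hook could look like the sketch below. It assumes that the broken-down URL passed as Parts contains a host(Host) term and that a timeout option named timeout(Seconds) is accepted. The host name is a hypothetical placeholder.

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(lists)).

:- multifile http:open_options/2.

% Provide a default timeout for requests to one specific host.
% The host name is hypothetical; timeout(10) is an assumed
% option term (see the note above).
http:open_options(Parts, Options) :-
    memberchk(host('api.example.org'), Parts),
    Options = [timeout(10)].
```

Because the hook is multifile, other libraries may contribute clauses of their own; each clause should therefore succeed only for the URLs it is meant for.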
- autoload_https(+Parts) is det[private]
- If the requested scheme is https or wss, load the HTTPS plugin.
- send_rec_header(+StreamPair, -Stream, +Host, +RequestURI, +Parts, +Options) is det[private]
- Send header to Out and process reply. If there is an error or
failure, close In and Out and return the error or failure.
- http_version(-Version:atom) is det[private]
- HTTP version we publish. We can only use 1.1 if we support
- map_method(+MethodID, -Method)[multifile]
- Support additional METHOD keywords. The defaults are the official
HTTP methods as defined by the various RFCs.
- x_headers(+Options, +URI, +Out) is det[private]
- Emit extra headers from request_header(Name=Value) options in
Options.
- To be done
- - Use user/password fields
- auth_header(+AuthOption, +Options, +HeaderName, +Out)[private]
- do_open(+HTTPVersion, +HTTPStatusCode, +HTTPStatusComment, +Header, +Options, +Parts, +Host, +In, -FinalIn) is det[private]
- Handle the HTTP status once available. If 200-299, we are ok. If a
redirect, redo the open, returning a new stream. Else issue an error.
- redirect_limit_exceeded(+Options:list(compound), -Max:nonneg) is semidet[private]
- True if we have exceeded the maximum redirection length (default 10).
- redirect_loop(+Parts, +Options) is semidet[private]
- True if we are in a redirection loop. Note that some sites
redirect once to the same place using cookies or similar, so we
allow for two tries. In fact, we should probably test whether
authorization or cookie headers have changed.
- redirect_options(+Parts, +RedirectedParts, +Options0, -Options) is det[private]
- A redirect from a POST should do a GET on the returned URI. This
means we must remove the
post(Data) options from
the original option-list.
If we are connecting over a Unix domain socket we drop this option
if the redirect host does not match the initial host.
- map_error_code(+HTTPCode, -PrologError) is semidet[private]
- Map HTTP error codes to Prolog errors.
- To be done
- - Many more maps. Unfortunately many have no sensible Prolog
- open_socket(+Address, -StreamPair, +Options) is det[private]
- Create and connect a client socket to Address. Options
- Sets timeout on the stream, after connecting the
- To be done
- - Make timeout also work on tcp_connect/4.
- - This is the same as do_connect/4 in http_client.pl
- parse_headers(+Lines, -Headers:list(compound)) is det[private]
- Parse the header lines for the
headers(-List) option. Invalid
header lines are skipped, printing a warning using
- return_final_url(+Options) is semidet[private]
- If Options contains
final_url(URL), unify URL with the final
URL after redirections.
- transfer_encoding_filter(+Lines, +In0, -In, +Options) is det[private]
- Install filters depending on the transfer encoding. If In0 is a
stream-pair, we close the output side. If transfer-encoding is
not specified, the content-encoding is interpreted as a synonym
for transfer-encoding, because many servers incorrectly depend
on this. Exceptions to this are content-types for which
- http:disable_encoding_filter(+ContentType) is semidet[multifile]
- Do not use the
encoding for specific values of ContentType. This predicate is
multifile and can thus be extended by the user.
- transfer_encoding(+Lines, -Encoding) is semidet[private]
- True if Encoding is the value of the Transfer-encoding header.
- content_encoding(+Lines, -Encoding) is semidet[private]
- True if Encoding is the value of the Content-encoding header.
- read_header(+In:stream, +Parts, -Version, -Code:int, -Comment:atom, -Lines:list) is det[private]
- Read the HTTP reply-header. If the reply is completely empty
an existence error is thrown. If the replied header is
otherwise invalid a 500 HTTP error is simulated, having the
comment "Invalid reply header".
|Parts||- A list of compound terms that describe the
parsed request URI.|
|Version||- HTTP reply version as Major-Minor pair|
|Code||- Numeric HTTP reply-code|
|Comment||- Comment of reply-code as atom|
|Lines||- Remaining header lines as code-lists.|
- content_length(+Header, -Length:int) is semidet[private]
- Find the Content-Length in an HTTP reply-header.
- Read 1 or more digits and return as integer.
- Get rest of input as an atom.
- reply_header(+Lines, +Options) is det[private]
- Return the entire reply header as a list of strings to the option
- http_set_authorization(+URL, +Authorization) is det
- Set user/password to supply with URLs that have URL as prefix.
If Authorization is the atom -, any previously defined
authorization is cleared. For example:
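A minimal sketch of setting and clearing credentials. The URL prefix, user name and password are hypothetical placeholders, and the two wrapper predicates are for illustration only.

```prolog
:- use_module(library(http/http_open)).

% Register Basic credentials for every URL that starts with the
% given prefix. All values are hypothetical placeholders.
set_demo_credentials :-
    http_set_authorization('http://www.example.com/private/',
                           basic('John', 'Secret')).

% Clear the credentials again by passing the atom -.
clear_demo_credentials :-
    http_set_authorization('http://www.example.com/private/', -).
```

After set_demo_credentials, calls to http_open/3 on URLs below the prefix supply the registered authorization.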
- To be done
- - Move to a separate module, so http_get/3, etc. can use this
- authorization(+URL, -Authorization) is semidet[private]
- True if Authorization must be supplied for URL.
- To be done
- - Cleanup cache if it gets too big.
- parse_url_ex(+URL, -Parts)[private]
- Parts: Scheme, Host, Port, User:Password, RequestURI (no
- parts_scheme(+Parts, -Scheme) is det[private]
- parts_uri(+Parts, -URI) is det[private]
- parts_request_uri(+Parts, -RequestURI) is det[private]
- parts_search(+Parts, -Search) is det[private]
- parts_authority(+Parts, -Authority) is semidet[private]
- iostream:open_hook(+Spec, +Mode, -Stream, -Close, +Options0, -Options) is semidet[multifile]
- Hook implementation that makes open_any/5 support
https URLs for
Mode == read.
- consider_keep_alive(+HeaderLines, +Parts, +Host, +Stream0, -Stream, +Options) is det[private]
- read_incomplete(+In, +Left) is semidet[private]
- If we have not all input from a Keep-alive connection, read the
remainder if it is short. Else, we fail and close the stream.
- keep_connection(+Address) is semidet[private]
- Succeeds if we want to keep the connection open. We currently
keep a maximum of 10 connections waiting and a maximum of 2
waiting for the same address. Connections older than 2 seconds
are closed.
- http_close_keep_alive(+Address) is det
- Close all keep-alive connections matching Address. Address is of
the form Host:Port. In particular, http_close_keep_alive(_)
closes all currently known keep-alive connections.
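The sketch below shows how a keep-alive request and an explicit cleanup might be combined. It assumes the Connection header is selected with an option term connection('Keep-alive'). Both helper names are hypothetical.

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(readutil)).

%!  fetch_keep_alive(+URL, -Codes) is det.
%
%   Read the document at URL, asking for the connection to be
%   kept for reuse. The connection/1 option term is an assumption
%   (see the note above).
fetch_keep_alive(URL, Codes) :-
    setup_call_cleanup(
        http_open(URL, In, [connection('Keep-alive')]),
        read_stream_to_codes(In, Codes),
        close(In)).

%!  drop_idle_connections is det.
%
%   Close every pooled keep-alive connection: an unbound Address
%   matches all of them.
drop_idle_connections :-
    http_close_keep_alive(_).
```

Calling drop_idle_connections before program exit, or after a batch of requests to one server, releases the pooled sockets promptly.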
- keep_alive_error(+Error, +StreamPair)[private]
- Deal with an error from reusing a keep-alive connection. If the
error is due to an I/O error or end-of-file, fail to backtrack over
get_from_pool/2. Otherwise it is a real error and we thus re-raise
it. In all cases we close StreamPair rather than returning it to the
pool as we may have done a partial read and thus be out of sync with respect to
the HTTP protocol.