XML documents
The parser can operate in two modes: sgml mode and xml mode, as defined by the dialect(Dialect) option. Regardless of this option, if the first line of the document reads as below, the parser is switched automatically into XML mode.
<?xml ... ?>
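As a minimal sketch of selecting the mode, the helper below parses a string with an explicit dialect. The name parse_string/3 and the sample documents in the comments are assumptions made for illustration; the only library calls used are open_string/2 and load_structure/3.

```prolog
:- use_module(library(sgml)).

% parse_string(+Dialect, +String, -DOM)
% Hypothetical helper: parse String using the given dialect (sgml or xml).
parse_string(Dialect, String, DOM) :-
    setup_call_cleanup(
        open_string(String, In),
        load_structure(stream(In), DOM, [dialect(Dialect)]),
        close(In)).

% Explicit XML mode:
%   ?- parse_string(xml, "<doc><br/></doc>", DOM).
% SGML requested, but the leading declaration should switch the parser to XML mode:
%   ?- parse_string(sgml, "<?xml version='1.0'?><doc><br/></doc>", DOM).
```

In both queries br is expected to be read as an empty element, because the parser ends up in XML mode either way.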
Currently, switching to XML mode implies:
The construct <element [attribute...] /> is recognised as an empty element.
The entities lt (<), gt (>), amp (&), apos (') and quot (") are predefined.
Names are treated as case-sensitive, except for the DTD reserved names (ELEMENT, etc.).
Underscore (_) and colon (:) are allowed in names.
The white-space mode is set to preserve. In addition to setting white-space handling at the toplevel, the XML reserved attribute xml:space is honoured. It may appear both in the document and the DTD. The remove extension is honoured as an xml:space value. For example, the DTD statement below ensures that the pre element preserves space, regardless of the default processing mode.
<!ATTLIST pre xml:space nmtoken #fixed preserve>
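The attribute can also be given in the document itself. The sketch below assumes a toplevel mode of space(remove) and a made-up document with a pre and a p element; the names parse_trimmed/2 and space_demo/1 are hypothetical.

```prolog
:- use_module(library(sgml)).

% Hypothetical helper: parse String in XML mode, removing leading and
% trailing white space from text unless an element overrides this.
parse_trimmed(String, DOM) :-
    setup_call_cleanup(
        open_string(String, In),
        load_structure(stream(In), DOM, [dialect(xml), space(remove)]),
        close(In)).

% The pre element carries xml:space="preserve", the p element does not.
space_demo(DOM) :-
    parse_trimmed("<doc><pre xml:space=\"preserve\">  keep  </pre><p>  trim  </p></doc>",
                  DOM).
```

Under these assumptions the text inside pre should keep its surrounding blanks, while the text inside p should be trimmed by the toplevel setting.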
Using the dialect xmlns, the parser will interpret XML namespaces. In this case, the names of elements are returned as a term of the format
URL:LocalName
If an identifier has no namespace and there is no default namespace it is returned as a simple atom. If an identifier has a namespace but this namespace is undeclared, the namespace name rather than the related URL is returned.
Attributes declaring namespaces (xmlns:<ns>=<url>) are reported as if xmlns were not a defined resource.
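The naming rules above can be tried with a small sketch. The namespace URL http://example.com/ns, the document and the names parse_ns/2 and ns_demo/1 are assumptions chosen for illustration.

```prolog
:- use_module(library(sgml)).

% Hypothetical helper: parse String with namespace processing enabled.
parse_ns(String, DOM) :-
    setup_call_cleanup(
        open_string(String, In),
        load_structure(stream(In), DOM, [dialect(xmlns)]),
        close(In)).

% x:a and x:b use a declared prefix; c has no prefix and there is no
% default namespace, so it should come back as the plain atom c.
ns_demo(DOM) :-
    parse_ns("<x:a xmlns:x=\"http://example.com/ns\"><x:b/><c/></x:a>", DOM).
```

Here the elements a and b are expected to appear as 'http://example.com/ns':a and 'http://example.com/ns':b, which also shows why such terms are awkward to match by hand.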
In many cases, getting attribute names as url:name is not desirable. Such terms are hard to unify, and sometimes multiple URLs may be mapped to the same identifier. This may happen due to poor version management, poor standardisation or because the application doesn't care too much about versions. This package defines two call-backs that can be set using set_sgml_parser/2 to deal with this problem.
The call-back xmlns is called as XML namespaces are noticed. It can be used to extend a canonical mapping for later use by the urlns call-back. The following illustrates this behaviour. Any namespace containing rdf-syntax in its URL or that is used as rdf namespace is canonicalised to rdf. This implies that any attribute and element name from the RDF namespace appears as rdf:<name>.
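:- use_module(library(sgml)).

:- dynamic xmlns/3.

% xmlns call-back: record a canonical name for each RDF namespace URL.
on_xmlns(rdf, URL, _Parser) :- !,
    asserta(xmlns(URL, rdf, _)).
on_xmlns(_, URL, _Parser) :-
    sub_atom(URL, _, _, _, 'rdf-syntax'), !,
    asserta(xmlns(URL, rdf, _)).

% The recorded xmlns/3 facts serve as the urlns call-back.
load_rdf_xml(File, Term) :-
    load_structure(File, Term,
                   [ dialect(xmlns),
                     call(xmlns, on_xmlns),
                     call(urlns, xmlns)
                   ]).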
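When the mapping is known in advance, the urlns call-back can presumably be used on its own. The following is a sketch under that assumption: canonical_ns/3 is a hypothetical static table and parse_with_canonical_ns/2 a hypothetical helper; the argument order of the call-back (URL, canonical name, parser) is taken from the way xmlns/3 is used above.

```prolog
:- use_module(library(sgml)).

% Hypothetical static table: namespace URL -> canonical short name.
% The third argument is the parser handle, which is not needed here.
canonical_ns('http://www.w3.org/1999/02/22-rdf-syntax-ns#', rdf, _Parser).

% Parse String with namespace processing, canonicalising URLs via the table.
parse_with_canonical_ns(String, DOM) :-
    setup_call_cleanup(
        open_string(String, In),
        load_structure(stream(In), DOM,
                       [ dialect(xmlns),
                         call(urlns, canonical_ns)
                       ]),
        close(In)).

% ?- parse_with_canonical_ns(
%        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>",
%        DOM).
```

If the call-back behaves as described, the root element should be reported as rdf:'RDF' rather than with the full URL. A dynamic table, as in the example above, is the better fit when the namespaces are not known beforehand.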
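The library provides iri_xml_namespace/3 to break down an IRI into its namespace and localname:
iri_xml_namespace(+IRI, -Namespace, -Localname)
The Localname is the longest trailing part of the IRI that is a valid XML name, so the split typically falls after # or /. Note however that this can produce unexpected results. E.g., in the example below, one might expect the namespace to be http://example.com/images#, but an XML name cannot start with a digit.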
?- iri_xml_namespace('http://example.com/images#12345', NS, L).
NS = 'http://example.com/images#12345',
L = ''.
As we see from the example above, the Localname can be the empty atom. Similarly, Namespace can be the empty atom if IRI is an XML name. Applications will often have to check for either or both of these conditions. We decided against failing in these conditions because the application typically wants to know which of the two conditions (empty namespace or empty localname) holds. This predicate is often used for generating RDF/XML from an RDF graph.
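One way an application might check for the two conditions is sketched below. The predicate classify_iri/2 and its result terms are hypothetical names introduced for this illustration.

```prolog
:- use_module(library(sgml)).

% classify_iri(+IRI, -Class)
% Hypothetical helper: report which of the special cases applies.
%   no_localname(NS)     the IRI has no usable XML local name
%   no_namespace(Local)  the whole IRI is an XML name
%   split(NS, Local)     both parts are non-empty
classify_iri(IRI, Class) :-
    iri_xml_namespace(IRI, NS, Local),
    (   Local == ''
    ->  Class = no_localname(NS)
    ;   NS == ''
    ->  Class = no_namespace(Local)
    ;   Class = split(NS, Local)
    ).

% ?- classify_iri('http://example.com/images#12345', Class).
% ?- classify_iri('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', Class).
```

A generator for RDF/XML could, for instance, fall back to a different serialisation of the resource when it sees no_localname/1.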