This module provides the ability to formulate queries to the
Sindice semantic web search engine, and to analyse the results
obtained. It is based on an original module by Yves Raimond,
but mostly rewritten by Samer Abdallah.
Sindice queries have several components:
- A keyword based query, which may use + and - operators
to mark terms that are required or must be excluded.
It may also use boolean operators AND, OR and NOT, though
note that NOT has the semantics of set difference, not
the set complement. NOT is a binary operator in Sindice
queries.
- A triple-based query, which is built from RDF triples using Boolean
operators. In these queries, a '*' denotes an
unconstrained URI or literal.
- One or more filters, which specify simple tests
to be applied to the returned objects.
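As an illustration of the + and - operators in a keyword-based query, the sketch below assembles a query atom from lists of required and excluded terms. The predicate name keyword_query/3 is hypothetical and not part of this module, and the space-separated, prefix-operator layout is an assumption about how Sindice expects such terms to be written.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

%% keyword_query(+Required:list(atom), +Excluded:list(atom), -Query:atom) is det.
%  Hypothetical helper (not part of this module). Prefixes each required
%  term with '+' and each excluded term with '-', then joins them with
%  spaces. The exact surface syntax is an assumption.
keyword_query(Required, Excluded, Query) :-
    maplist(atom_concat('+'), Required, Plus),
    maplist(atom_concat('-'), Excluded, Minus),
    append(Plus, Minus, Terms),
    atomic_list_concat(Terms, ' ', Query).
```

The resulting atom could then be wrapped as keyword(Query) to form a request.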
Other parameters determine what and how much information is returned:
- The page parameter determines which page of a multipage query is returned.
- The sortbydate parameter affects the order of results (the default
is to sort by relevance).
- The field parameter determines what information is returned about
each object.
Results
Results are retrieved as a named RDF graph.
To interpret this, it is necessary to understand the Sindice
ontology. The results consist of a set of resources of the class
sindice:Result. Each item has the following properties:
- dc:title :: literal
- dc:created
- sindice:cache
- sindice:link :: url
- sindice:rank :: xsd:integer
- sindice:explicit_content_length
- sindice:explicit_content_size
- si_field:format
- si_field:class
- si_field:ontology
- si_field:property
As well as information about each item, the results also contain
data about the search itself, which is represented as a resource of
class sindice:Query, and data about the returned page, represented as
a resource of class sindice:Page.
The sindice:Query has the following properties:
- sindice:totalResults :: xsd:integer
- dc:title
- dc:creator
- dc:date
- sindice:searchTerms :: literal
- sindice:itemsPerPage :: literal(integer)
- sindice:first :: sindice:Page
- sindice:last :: sindice:Page
- result :: sindice:Result [nondet]
The sindice:Page has the following properties:
- dc:title :: literal
- sindice:next :: sindice:Page
- sindice:previous :: sindice:Page
- sindice:startIndex :: literal
Running queries
The core predicate for running a Sindice query is si_with_graph/4, which formulates
a query from a term of type si_request and a list of options, and then temporarily
loads into the RDF store a named graph containing the results.
The last argument to si_with_graph/4 is a goal which is called with the results
graph in context. The graph is only available to this goal, and is unloaded after
si_with_graph/4 finishes. You may use any RDF-related predicates to interrogate
the graph.
On top of this is built a higher-level abstraction, si_with_result/5, which hides the
details of large, multi-page result sets and calls a supplied goal once (disjunctively)
for each result, automatically issuing multiple Sindice requests to iterate through
multiple pages. You may interrogate the properties of each result only within
the supplied goal. For convenience, si_facet/2 allows a number of properties
to be extracted from the RDF graph, with type conversions from
RDF literals to Prolog values where appropriate.
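To make the idea of interrogating the temporary graph concrete, here is a minimal sketch of a predicate that could be called from the goal passed to si_with_graph/4. The name graph_title/3 is hypothetical, and the sketch assumes that the dc prefix used by Sindice is the one predefined in library(semweb/rdf_db) and that titles are stored as plain literals.

```prolog
:- use_module(library(semweb/rdf_db)).

%% graph_title(+Graph:atom, -R:resource, -Title:atom) is nondet.
%  Hypothetical helper (not part of this module). Enumerates resources
%  in Graph that carry a plain-literal dc:title. Language-tagged and
%  typed literals are skipped by the atom/1 check.
graph_title(Graph, R, Title) :-
    rdf(R, dc:title, literal(Title), Graph),
    atom(Title).

% Possible use with this module's API, following the documented
% argument order of si_with_graph/4 (not run here):
%
%   ?- si_with_graph(keyword(jazz), [], G,
%                    forall(graph_title(G, R, T),
%                           format("~w: ~w~n", [R, T]))).
```

Because the graph is unloaded when si_with_graph/4 returns, any values needed afterwards have to be printed or copied out inside the goal, as forall/2 does here.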
Building queries
The three main parts of a Sindice query are represented by a term of
type si_request, which has several forms. Currently, these are:
si_request ---> keyword(atom)
; keywords(list(atom))
; uri(resource).
A resource can be an atomic URI or a Prefix:Suffix term as understood
by rdf_global_id/2. Eventually, Sindice's full query syntax, including
ntriple queries and Boolean operators, will be implemented.
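The si_request type can be read as a simple structural check. The following sketch spells that check out; is_si_request/1 and is_resource/1 are hypothetical names that do not belong to this module, and the test for Prefix:Suffix only checks the shape of the term, not whether the prefix is registered.

```prolog
:- use_module(library(apply)).

%% is_si_request(@Term) is semidet.
%  Hypothetical checker (not part of this module). True if Term has one
%  of the three currently supported si_request forms.
is_si_request(keyword(K))   :- atom(K).
is_si_request(keywords(Ks)) :- is_list(Ks), maplist(atom, Ks).
is_si_request(uri(R))       :- is_resource(R).

%% is_resource(@Term) is semidet.
%  An atomic URI, or a Prefix:Suffix pair of atoms of the kind accepted
%  by rdf_global_id/2.
is_resource(URI)           :- atom(URI).
is_resource(Prefix:Suffix) :- atom(Prefix), atom(Suffix).
```

For example, keyword(jazz), keywords([miles, davis]) and uri(foaf:'Person') all pass this check, while an ntriple query term does not, since that form is not yet implemented.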
@seealso
http://sindice.com/
http://sindice.com/developers/queryLanguage#QueryLanguage
Samer Abdallah, UCL, University of London;
Yves Raimond, C4DM, Queen Mary, University of London
- sindice_url(+Req:si_request, +Opts:options, -URL:atom) is det
- Formulates a Sindice query URL from a request and options.
Recognised options:
- sort_by_date(B:boolean)
- If true, then results are sorted by date rather than relevance
- fields(F:list(atom))
- Specify which fields are returned for each result.
- count(P:nonneg)
- Number of results per page. (Incompatible with from option.)
- page(P:nonneg)
- Request a given page number. (Incompatible with from option.)
- from(Offset:nonneg, Count:nonneg)
- Starts from result number Offset+1, with Count results per page.
Incompatible with count and page options.
The resulting URL can be loaded with rdf_load/2.
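The incompatibility between from/2 and the count/1 and page/1 options can be expressed as a small check on the option list. This is only a sketch of that rule: paging_options_ok/1 is a hypothetical name, and how sindice_url/3 itself reacts to conflicting options is not specified here.

```prolog
:- use_module(library(lists)).

%% paging_options_ok(+Opts:list) is semidet.
%  Hypothetical helper (not part of this module). Succeeds unless
%  from/2 appears together with count/1 or page/1.
paging_options_ok(Opts) :-
    (   memberchk(from(_, _), Opts)
    ->  \+ memberchk(count(_), Opts),
        \+ memberchk(page(_), Opts)
    ;   true
    ).
```

A caller might run this check before passing the option list to sindice_url/3 or si_with_graph/4.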
- si_with_graph(+Req:si_request, +Opts:options, -Graph:atom, +Goal:callable) is det
- Formulates a Sindice query and temporarily loads the resulting RDF graph.
Graph must be a variable; it is unified with the name of the loaded graph
and then Goal is called. The graph is not available outside Goal.
- si_with_result(+Req:si_request, +Opts:options, -Prog:progress, -R:resource, +Goal:callable) is nondet
- For each result produced by the query, R is unified with the URI of the sindice:Result
and Goal is called. Multi-page result sets are traversed automatically and on demand.
The graph containing the query results is not available outside Goal and is unloaded
when si_with_result/5 is finished. Prog is a term of the form Current/Total, where Total
is the total number of results and Current is the index of the result currently bound to R.
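Since the progress argument is an ordinary Current/Total term, it can be turned into a readable status line with standard predicates. The helper below is a hypothetical sketch, not part of this module; it assumes both components are integers and that Total is positive.

```prolog
%% progress_message(+Prog, -Msg:atom) is semidet.
%  Hypothetical helper (not part of this module). Formats a
%  Current/Total progress term as text with a percentage.
%  Fails if Total is not positive.
progress_message(Current/Total, Msg) :-
    Total > 0,
    Pct is 100.0 * Current / Total,
    format(atom(Msg), "~d of ~d (~1f%)", [Current, Total, Pct]).

% Possible use with this module's API (not run here):
%
%   ?- si_with_result(keyword(jazz), [], Prog, R, true),
%      progress_message(Prog, Msg).
```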
- si_facet(-R:resource, -F:si_facet) is nondet
- True when search result R has facet F.
Current facets are:
si_facet ---> link(url)
; cache(url)
; rank(nonneg)
; title(atom)
; class(resource)
; predicate(resource)
; formats(list(atom))
; explicit_content_size(nonneg)
; explicit_content_length(nonneg)
.
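Several facets of one result can be gathered into a single term by calling the facet predicate once per facet. The sketch below takes the facet predicate as a parameter so that it can be tried without a live query; result_summary/3 and the summary/3 term are hypothetical, and the only assumption about si_facet/2 is the calling pattern documented above.

```prolog
%% result_summary(+Facet:callable, +R, -Summary) is nondet.
%  Hypothetical helper (not part of this module). Facet is a binary
%  predicate with the same calling pattern as si_facet/2. Summary is
%  summary(Rank, Title, Link) for result R.
result_summary(Facet, R, summary(Rank, Title, Link)) :-
    call(Facet, R, rank(Rank)),
    call(Facet, R, title(Title)),
    call(Facet, R, link(Link)).

% Possible use inside the goal given to si_with_result/5 (not run here):
%
%   ?- si_with_result(keyword(jazz), [], _Prog, R,
%                     ( result_summary(si_facet, R, S),
%                       print(S), nl )).
```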