A simple library for communicating with publication information servers: PubMed and Semantic Scholar.
Currently allows (a) searching on conjunctions and disjunctions, (b) fetching the details of a paper,
(c) fetching the publications citing a paper, (d) fetching the publications cited by a paper, (e) simple reporting of fetched information and (f) storing fetched information to local databases.

Since version 0.1 the library supports caching of the paper information in Prolog term or csv data files
and in odbc connected or sqlite databases. Also as of 0.1 pub_graph is debug/1 aware. To see information regarding
the progress of execution, use

    ?- debug(pub_graph).

The pack requires the curl executable to be in the path. Only tested on Linux.
It is being developed on SWI-Prolog 6.1.8 and it should also work on Yap Prolog.

To install under SWI simply do

    ?- pack_install(pub_graph).
    % and load with
    ?- use_module(library(pub_graph)).

Storing of paper and citation information depends on db_facts and, for sqlite connectivity, on proSQLite (both available as SWI packs and from http://stoics.org.uk/~nicos/sware/).

- Nicos Angelopoulos
- 0.1.0 2014/7/22 (was pubmed)
- 1.0 2018/9/22
- 1.1 2018/9/23, wrap/hide caching libs errors
See also
- http://stoics.org.uk/~nicos/sware/pub_graph
- http://www.ncbi.nlm.nih.gov/books/NBK25500/
- http://api.semanticscholar.org
- files in examples/ directory
- sources at http://stoics.org.uk/~nicos/sware/pub_graph/
To be done
- currently the info tables are wasteful in the interest of simplicity. E.g. they are of the form info(ID,Key,O,Value), but Key is really a type of information, so we could split this into a number of tables (info:)key(Id,O,Value). Alternatively we could make Key an enumerated type, which would save a lot of space.
 pub_graph_id(+Id, -IdType)
True if Id has the shape of a paper identifier for the server denoted by IdType.
Currently ncbi (https://www.ncbi.nlm.nih.gov/pubmed/) and semscholar (http://semanticscholar.org/) are the known IdTypes.

The predicate does not connect to the server; it only checks the shape of Id.
If Id is an integer or an atom that can be converted to an integer, then IdType is instantiated to ncbi.
There are three forms of Id for semscholar:

  • a Semantic Scholar hex id, such as cbd251a03b1a29a94f7348f4f5c2f830ab80a909
  • a DOI, presented as doi:'10.1109/TITB.2002.1006298' (doi is stripped before the request is posted)
  • an arXiv id, presented as arXiv:1705.10311 (arXiv forms part of the semanticscholar.org request)

The following two ids correspond to the same paper.

    pub_graph_id( 12075665, Type ).

Type = ncbi.

    pub_graph_id( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Type ).

Type = semscholar.
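
The shape check described above can be sketched in a few clauses. This is only an illustration under stated assumptions, not the library's implementation: id_shape/2 is a hypothetical name, and the sketch assumes that the hex form is always a 40 character hexadecimal atom, as in the example id.

```prolog
:- use_module(library(lists)).

% id_shape(+Id, -IdType): hypothetical helper, NOT part of pub_graph.
% It only inspects the shape of Id; no server is contacted.
id_shape( Id, ncbi ) :-
    integer( Id ), !.
id_shape( Id, ncbi ) :-
    atom( Id ),
    % atoms that do not read as numbers simply fail here
    catch( atom_number(Id, Num), _, fail ),
    integer( Num ), !.
id_shape( doi:_, semscholar ) :- !.
id_shape( arXiv:_, semscholar ) :- !.
id_shape( Id, semscholar ) :-
    atom( Id ),
    % assumption: hex ids are exactly 40 hexadecimal characters
    atom_length( Id, 40 ),
    atom_codes( Id, Codes ),
    forall( member(Code, Codes), code_type(Code, xdigit(_)) ).
```

For example, id_shape( doi:'10.1109/TITB.2002.1006298', Type ) gives Type = semscholar, while an atom that fits none of the shapes makes the call fail.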
- nicos angelopoulos
- 0.1 2018/9/11
See also
- https://www.ncbi.nlm.nih.gov/pubmed/
- http://semanticscholar.org
 pub_graph_version(-Version, -Date)
Get version information and date of publication.

    ?- pub_graph_version( V, D ).

V = 1:1:0,
D = date(2018, 9, 23).
 pub_graph_search(+STerm, -Ids)
 pub_graph_search(+STerm, -Ids, +Options)
This is currently only implemented for ncbi ids, as there is no means of searching in the Semantic Scholar API.

Search in pub_graph for the terms in the search term STerm. In STerm, conjunction is marked by , (comma) and disjunction by ; (semicolon). '-' pair terms are considered as Key-Value and interpreted as Value[Key] in the query. Lists are thought to be flat conjoint search terms with no pair values in them, which are interpreted by pub_graph also as OR operations. (See example below.) Known keys are: journal, pdat, au, All Fields. The predicate constructs a query that is posted via the http API provided by NCBI (http://www.ncbi.nlm.nih.gov/books/NBK25500/).
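
To make the mapping from search terms to query text concrete, here is a rough sketch of such a translation. It is not the code pub_graph uses: term_query/2 is a hypothetical name, the AND/OR keywords and the parentheses are assumptions about the target syntax, lists are deliberately left out, and both Key-Value and the Key=Value form seen in the examples are accepted.

```prolog
% term_query(+STerm, -Query): hypothetical helper, NOT part of pub_graph.
% Conjunctions become AND, disjunctions become OR, and
% key/value pairs are written as Value[Key].
term_query( (A,B), Query ) :- !,
    term_query( A, QA ),
    term_query( B, QB ),
    format( atom(Query), '(~w AND ~w)', [QA,QB] ).
term_query( (A;B), Query ) :- !,
    term_query( A, QA ),
    term_query( B, QB ),
    format( atom(Query), '(~w OR ~w)', [QA,QB] ).
term_query( Key=Value, Query ) :- !,
    format( atom(Query), '~w[~w]', [Value,Key] ).
term_query( Key-Value, Query ) :- !,
    format( atom(Query), '~w[~w]', [Value,Key] ).
term_query( Atomic, Atomic ) :-
    % plain search words are passed through unchanged
    atomic( Atomic ).
```

With these clauses, term_query( (journal=science,pdat=2008), Q ) binds Q to '(science[journal] AND 2008[pdat])'. The query actually run by the server can differ considerably, as the QTrans value in the example below shows.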

Options should be a term or list of terms from:

  • retmax(RetMax) the maximum number of records that will be returned (default: 100)
  • verbose(Verbose) if Verbose == true then the predicate is verbose about its progress; for instance, the query is printed on the current output stream
  • file to use, or when Tmp is a variable, the file that was used to receive the results from pub_graph
  • keep the file with the xml result iff Keep == true
  • qtranslation(QTrans) return in QTrans the actual query run on the pub_graph server
  • reldate(N) when reldate is set to an integer N, ELink returns only those items that have a date specified by datetype within the last N days
  • mindate(Min) date range used to limit a link operation by the date specified by datetype. These two parameters (mindate, maxdate) must be used together to specify an arbitrary date range. The general date format is YYYY/MM/DD, and these variants are also allowed: YYYY, YYYY/MM
  • maxdate(Max) see mindate

For instance, taking an example from the url, we show how to find all breast cancer articles that were published in Science in 2008.

    St = (journal=science,[breast,cancer],pdat=2008),
    pub_graph_search( St, Ids, [verbose(true),qtranslation(QTrans)] ),
    length( Ids, Len ), write( number_of:Len ), nl,
    pub_graph_summary_display( Ids, _, display(all) ).

        Author=[Varambally S,Cao Q,Mani RS,Shankar S,Wang X,Ateeq B,Laxman B,Cao X,Jing X,Ramnarayanan K,Brenner JC,Yu J,Kim JH,Han B,Tan P,Kumar-Sinha C,Lonigro RJ,Palanisamy N,Maher CA,Chinnaiyan AM]
        Title=Genomic loss of microRNA-101 leads to overexpression of histone methyltransferase EZH2 in cancer.
        PubDate=2008 Dec 12
        PubType=Journal Article
        FullJournalName=Science (New York, N.Y.)
        Author=Couzin J
        Title=Genetics. DNA test for breast cancer risk draws criticism.
        Author=[Silva JM,Marran K,Parker JS,Silva J,Golding M,Schlabach MR,Elledge SJ,Hannon GJ,Chang K]
        Title=Profiling essential genes in human mammary cells by multiplex RNAi screening.
        PubDate=2008 Feb 1
        PubType=Journal Article
        FullJournalName=Science (New York, N.Y.)
St =  (journal=science, [breast, cancer], pdat=2008),
Ids = ['19008416', '18927361', '18787170', '18487186', '18239126', '18239125'],
QTrans = ['("Science"[Journal] OR "Science (80- )"[Journal] OR "J Zhejiang Univ Sci"[Journal]) AND ("breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]) AND 2008[pdat]'],
Len = 6.

     St = (author='Borst Piet'),
     date(Date), pub_graph_search( St, Ids, verbose(true) ),
     length( Ids, Len ), write( number_of:Len ), nl.

Date = date(2018, 9, 22),
St =  (author='Borst Piet'),
Ids = ['29894693', '29256493', '28821557', '27021571', '26774285', '26530471', '26515061', '25799992', '25662217'|...],
Len = 83.

    date(Date), pub_graph_search( prolog, Ids ),
    length( Ids, Len ), write( number_of:Len ), nl.

Date = date(2018, 9, 22),
Ids = ['30089663', '28647861', '28486579', '27684214', '27142769', '25509153', '24995073', '22586414', '22462194'|...],
Len = 100.

    date(Date), pub_graph_search( prolog, Ids, retmax(200) ),
    length( Ids, Len ), write( number_of:Len ), nl.

Date = date(2018, 9, 22),
Ids = ['30089663', '28647861', '28486579', '27684214', '27142769', '25509153', '24995073', '22586414', '22462194'|...],
Len = 127.

   St = ('breast','cancer','Publication Type'='Review'),
   date(Date), pub_graph_search( St, Ids, reldate(30) ),
   length( Ids, Len ).

Date = date(2018, 9, 22),
Ids = ['30240898', '30240537', '30240152', '30238542', '30238005', '30237735', '30236642', '30236594', '30234119'|...],
Len = 100.

    pub_graph_summary_display( 30243159, _, true ).
        Author=[Wang K,Yee C,Tam S,Drost L,Chan S,Zaki P,Rico V,Ariello K,Dasios M,Lam H,DeAngelis C,Chow E]
        Title=Prevalence of pain in patients with breast cancer post-treatment: A systematic review.
- nicos angelopoulos
- 0:1 2012/07/15
- 0:2 2018/09/22, small update on \ escape on eutils, ncbi, queries
 pub_graph_summary_display(+Ids)
Short for pub_graph_summary_display( Ids, _Summary, [] ).
 pub_graph_summary_display(+Ids, -Summary)
Short for pub_graph_summary_display( Ids, Summary, [] ).
 pub_graph_summary_display(+IdS, -Summaries, +Opts)
A wrapper around pub_graph_summary_info/3. It calls that predicate with the same arguments before displaying the Summary information. Opts can be a single option term or a list of such terms. In addition to the pub_graph_summary_info/3 options, this wrapper also recognises the term:

  • display(Disp) a list of article information keys that will be displayed, one per line, for each Id in Ids. Disp values of var(Disp), '*' and 'all' list all available values.

    date(Date), pub_graph_search((programming,'Prolog'), Ids),
    length( Ids, Len),
    Ids = [A,B,C|_], pub_graph_summary_display( [A,B,C] ).

    Author=[Holmes IH,Mungall CJ]
    Title=BioMake: a GNU make-compatible utility for declarative workflow management.
    Author=[Melioli G,Spenser C,Reggiardo G,Passalacqua G,Compalati E,Rogkakou A,Riccio AM,Di Leo E,Nettis E,Canonica GW]
    Title=Allergenius, an expert system for the interpretation of allergen microarray results.
    Author=[Mørk S,Holmes I]
    Title=Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.
Date = date(2018, 9, 22),
Ids = ['28486579', '24995073', '22215819', '21980276', '15360781', '11809317', '9783213', '9293715', '9390313'|...],
Len = 43.
A = '28486579',
B = '24995073',
C = '22215819'.
    pub_graph_summary_display( 30235570, _, display(*) ).

    Author=[Morgan CC,Huyck S,Jenkins M,Chen L,Bedding A,Coffey CS,Gaydos B,Wathen JK]
    Title=Adaptive Design: Results of 2012 Survey on Perception and Use.
    Source=Ther Innov Regul Sci
    PubDate=2014 Jul
    PubType=Journal Article
    FullJournalName=Therapeutic innovation & regulatory science
     pub_graph_cited_by( 20195494, These ),
     pub_graph_summary_display( These, _, [display(['Title','Author','PubDate'])] ).

    Author=[Tang K,Boudreau CG,Brown CM,Khadra A]
    Title=Paxillin phosphorylation at serine 273 and its effects on Rac, Rho and adhesion dynamics.
    PubDate=2018 Jul
    Author=[McKenzie M,Ha SM,Rammohan A,Radhakrishnan R,Ramakrishnan N]
    Title=Multivalent Binding of a Ligand-Coated Particle: Role of Shape, Size, and Ligand Heterogeneity.
    PubDate=2018 Apr 24
    Author=[Padmanabhan P,Goodhill GJ]
    Title=Axon growth regulation by a bistable molecular switch.
    PubDate=2018 Apr 25
    Author=[Welf ES,Haugh JM]
    Title=Stochastic Dynamics of Membrane Protrusion Mediated by the DOCK180/Rac Pathway in Migrating Cells.
    PubDate=2010 Mar 1
These = [29975690, 29694862, 29669897, 28752950, 27939309, 27588610, 27276271, 25969948, 25904526|...].

    pub_graph_summary_display( 20195494, _Res, true ).

    Author=[Cirit M,Krajcovic M,Choi CK,Welf ES,Horwitz AF,Haugh JM]
    Title=Stochastic model of integrin-mediated signaling and adhesion dynamics at the leading edges of migrating cells.

    pub_graph_summary_display( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, _, display(all) ).

        authors=[Graham J. L. Kemp,Nicos Angelopoulos,Peter M. D. Gray]
        title=Architecture of a mediator for a bioinformatics database federation
        venue=IEEE Transactions on Information Technology in Biomedicine
 pub_graph_summary_display_info(+Summaries, +Entries)
Display the Entries information for Summaries, which should be a list of summaries. If Entries is a variable all info will be printed.
 pub_graph_cited_by(+Id, -Ids)
 pub_graph_cited_by(+Id, -Ids, +Options)
Ids is the list of pub_graph ids that cite Id.

Options is an option term or list of terms from the following:

  • verbose(Verbose) be verbose
  • cache(Type, Handle, Date, Update) use cache with Handle and Type, cutting off cached items that are (strictly) older than Date. For Update = true, update the cache if you do an explicit retrieval.

     date(D), pub_graph_cited_by( 12075665, By ), length( By, Len ).

D = date(2018, 9, 22),
By = [25825659, 19497389, 19458771],
Len = 3.

    date(D), pub_graph_cited_by( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, By ), length( By, Len ).

D = date(2018, 9, 22),
By = ['2e1f686c2357cead711c8db034ff9aa2b7509621', '6f125881788967e1eec87e78b3d2db61d1a8d0ac'|...],
Len = 12.
 pub_graph_cites(+Id, -Ids)
 pub_graph_cites(+Id, -Ids, +Options)
Ids is the list of pub_graph Ids that are cited by Id.

Options is an option term or list of terms from the following:

  • verbose(Verbose) be verbose

    date(D), pub_graph_cites( 20195494, Ids ),
    length( Ids, Len ), write( D:Len ), nl.

D = date(2018, 9, 22),
Ids = ['19160484', '19118212', '18955554', '18800171', '18586481'|...],
Len = 38.

% pubmed does not have references cited by the following paper:

    date(D), pub_graph_cites( 12075665, Ids ),
    length( Ids, Len ), write( D:Len ), nl.


% whereas, semanticscholar.org finds 17 (non '') of the 21:

    date(D), pub_graph_cites( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Ids ),
    length( Ids, Len ), write( D:Len ), nl.

D = date(2018, 9, 22),
Ids = ['6477792829dd059c7d318927858d307347c54c2e', '1448901572d1afd0019c86c42288108a94f1fb25'|...],
Len = 17.

    pub_graph_summary_display( 12075665, Results, true ).

    Author=[Kemp GJ,Angelopoulos N,Gray PM]
    Title=Architecture of a mediator for a bioinformatics database federation.
Results = [12075665-['Author'-['Kemp GJ', 'Angelopoulos N', 'Gray PM'], ... - ...|...]].
 pub_graph_table(+Ids, -Rows, +Opts)
fixme Assumes jif predicate.


  • include_if(IF=false) whether to include Impact Factor (IF) column (if true requires jif/6).
  • missing_if(MIF=throw) what to do when a journal has no impact factor: [throw,has(Val),quite(Val)].
  • output(Type=html) type of output, if file is expected (see stem), in [csv,?pdf?,html]
  • search(Search='No search term available'), search term corresponding to the Ids
  • spy(Spy=[]) A number of ids to spy (should be atomic).
  • stem(Stem) when present a file <Stem>.<Type> is created.

Output rows should contain #citing, [IF ,] Date, Journal, Title, Author, (Title urled to pubmed/$id)

 pub_graph_summary_info(+IdS, -Summaries, +Opts)
Summaries is the summary information for pub_graph id(s) IdS.
The form of the results depends on whether IdS is a single PubMed Id,
in which case Summaries is a list of Name-Value pairs,
or a list, in which case Summaries is a list of Id-Info pairs, where Info
is a Name-Value list. The predicate fetches the information with curl
via the http interface. Summaries are deposited in local temporary files which are subsequently parsed.

Options is a single term, or list of the following terms:

  • names(Names) list of info slot names to be found in the xml file
  • retmax(RetMax) the maximum number of records that will be returned
  • temporary file to be used for saving xml files. If Tmp is a variable, or the option is missing, a temporary file is created with tmp_file_stream/3
  • if true, keep the temporary xml file; otherwise, and by default, delete it
  • verbose(Verbose) when true, be verbose
  • cache(Type, Handle, Update) use a cache with Type and Handle. Update should be boolean; set it to false if you don't want the cache to be updated with newly downloaded information

  date(Date),
  Opts = names(['Author','PmcRefCount','Title']),
  pub_graph_summary_info( 12075665, Results, Opts ),
  write( date:Date ), nl,
  member( R, Results ), write( R ), nl,
  fail.

Author-[Kemp GJ,Angelopoulos N,Gray PM]
Title-Architecture of a mediator for a bioinformatics database federation.

    pub_graph_summary_info( 12075665, Res, true ),
    member( R, Res ), write( R ), nl,
    fail.

Author-[Kemp GJ,Angelopoulos N,Gray PM]
Title-Architecture of a mediator for a bioinformatics database federation.
Source-IEEE Trans Inf Technol Biomed
PubDate-2002 Jun
PubType-Journal Article
FullJournalName-IEEE transactions on information technology in biomedicine : a publication of the IEEE Engineering in Medicine and Biology Society

    pub_graph_summary_info( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Res, true ),
    member( R, Res ), write( R ), nl,
    fail.

authors-[Graham J. L. Kemp,Nicos Angelopoulos,Peter M. D. Gray]
title-Architecture of a mediator for a bioinformatics database federation
venue-IEEE Transactions on Information Technology in Biomedicine
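
Since the two result shapes differ, code that consumes Summaries usually needs one accessor per shape. The following is a small sketch of that idea; summary_value/3 and summaries_titles/2 are hypothetical helpers, not predicates exported by pub_graph, and the 'Title' key is taken from the PubMed examples above (Semantic Scholar results use lower case keys such as title).

```prolog
:- use_module(library(lists)).

% summary_value(+Summary, +Name, -Value): hypothetical helper.
% Summary is a Name-Value list, the shape returned for a single id.
summary_value( Summary, Name, Value ) :-
    memberchk( Name-Value, Summary ).

% summaries_titles(+Summaries, -IdTitles): hypothetical helper.
% Summaries is a list of Id-Info pairs, the shape returned for a list of ids.
% Entries without a 'Title' slot are skipped.
summaries_titles( Summaries, IdTitles ) :-
    findall( Id-Title,
             ( member(Id-Info, Summaries),
               memberchk('Title'-Title, Info) ),
             IdTitles ).
```

For instance, with Summary bound to the Name-Value list of paper 12075665, summary_value( Summary, 'Author', As ) would bind As to the list of author names.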
 pub_graph_abstracts(+IdS, -IdsAbs)
For a list of IdS get all their respective IdsAbs (Id-Abstract) pairs. If IdS is a single PubMed Id then IdsAbs is simply the abstract (not a pair). Abstracts are returned as lists of atoms, representing lines in the original reply.
  ?- pub_graph_abstracts( 24939894, Abs ).
  Abs = ['Lemur tyrosine kinase 3 (LMTK3) is associated with cell proliferation and',...].
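
Because an abstract comes back as a list of line atoms, callers that want a single piece of text have to join the lines themselves. A minimal sketch follows; abstract_text/2 is a hypothetical helper and joining with a single space is an assumption about what the caller wants.

```prolog
% abstract_text(+Lines, -Text): hypothetical helper, NOT part of pub_graph.
% Joins the line atoms of one abstract into a single atom,
% separating consecutive lines with one space.
abstract_text( Lines, Text ) :-
    atomic_list_concat( Lines, ' ', Text ).
```

It could be used as pub_graph_abstracts( 24939894, Abs ), abstract_text( Abs, Text ).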
To be done
- add option for returning the full response of the query (includes sections for Citation, Title, Authors, Affiliation and PMCID if one exists; the last is in the PMID section).
 pub_graph_cited_by_graph(+Ids, -Graph, +Opts)
Graph of all ancestors reaching Ids within Depth moves. The graph grows upwards from the roots (Ids) to find the papers that cite the growing bag of papers recursively.

Options is a single term, or list of the following terms:

  • cache(Type) use cache of Type. Type == false or absent turns caching off
  • if using cache, which location (Object) should be used
  • if caching is used, the date at which the cache expires. Default: 1 month ago
  • maximum depth to chase (Depth)
  • ext(Ext) supersede the extension on Object
  • Boolean value. For csv caches, whether cited_by entries should be one per line or of the form Id1;Id2;
  • whether the input cache should be imported as flat (def. = Flat)
  • verbose(Verbose) prints progress messages if true

Type is one of csv,prolog,sqlite and odbc. In the first 3 cases, Object should be a filename and for odbc it should be a DSN token. In the case of filenames, the default value for Object is formed as, <type>_<id1>{_<id2>}.<type_ext>. <type_ext> is either set to Ext or if this is missing it is deduced from Type. It can be set to '' if you want no extension added.

Graph is compatible with the graph representation of Prolog unweighted graphs. That is, all vertices should appear in a keysorted list as V-Ns pairs, where V is the vertex and Ns is the sorted list of all its neighbours. Ns is the empty list if V has no neighbours, although this should only be the case here, if one of the input Ids has no citing papers or for the nodes at the edge of Depth.

     pub_graph_cited_by_graph( 12075665, G, cache(sqlite) ).
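
The V-Ns representation described above appears to be the one handled by library(ugraphs) in SWI-Prolog, so its predicates can be applied to Graph. The sketch below builds a depth one graph by hand from the citation data of the pub_graph_cited_by/2 example, rather than calling the server; example_graph/1 and citing_papers/3 are hypothetical names, and the assumption is that the neighbours of a vertex are the papers citing it.

```prolog
:- use_module(library(ugraphs)).
:- use_module(library(lists)).

% example_graph(-Graph): hand-built graph, NOT fetched by pub_graph.
% Paper 12075665 is cited by three papers (see pub_graph_cited_by/2 example).
example_graph( Graph ) :-
    Root = 12075665,
    Citing = [25825659, 19497389, 19458771],
    % one Root-CitingPaper edge per citation
    findall( Root-C, member(C, Citing), Edges ),
    vertices_edges_to_ugraph( [Root], Edges, Graph ).

% citing_papers(+Id, +Graph, -Citing): the neighbours of Id in Graph.
citing_papers( Id, Graph, Citing ) :-
    neighbours( Id, Graph, Citing ).
```

Here example_graph( G ) binds G to [12075665-[19458771,19497389,25825659], 19458771-[], 19497389-[], 25825659-[]]: the vertices are key sorted, the neighbour list is sorted, and the papers at the edge of the depth limit have empty neighbour lists.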
 pub_graph_cited_by_treadmill(+Ids, -Graph, +Opts)
Use iterative increase of the depth limit on pub_graph_cited_by_graph/3 until the overall Depth is reached. Results are saved to a cache file before proceeding to rerun the whole thing with a unit increase on the depth limit. Previous results will be fished out from the cache files.

Options is a single term or list of the following:

  • file(File) file to use for storage
  • single_file(Single) boolean value, def. is true.
    if false separate (aggregating) files are created at each iteration
  • depth(D) the overall depth limit
To be done
- use ODBC
 pub_graph_cache_open(+Type, +File, +Which, -Handle, Opts)
Open a pub_graph File of a given Type. A Handle is returned if appropriate. Currently csv, prolog, odbc and sqlite files are recognised. The former two are consulted into module pub_graph_cache, and Handle is therefore not used. For odbc/sqlite files the lookups and database access are via the odbc and prosqlite libraries respectively. Handle can be set to an alias of choice, otherwise an opaque atom is returned with which the db is accessed. Which should be either cited_by or info.

Options is a term or list of terms from:

  • ext(Ext) extension to try on the file. Use the empty atom if you do not want the library to use the default extension for the type of cache used.

Options are also passed to the underlying open operations for the type chosen. So, for instance, you can provide the username and password for the odbc connection with user(U) and password(P).

 pub_graph_cache_save(+Type, +FileinORHandle, What, Opts)
Close or save a cache to a file. Currently Types csv, prolog, odbc and sqlite are recognised. In the case of prolog, the list of predicates What is dumped to the prolog file Filein. Likewise for csv, but as data rows. The predicates are looked for in module pub_graph_cache. Once the predicates are saved, they are retracted from memory.

Opts is a term or list of terms from:

  • should csv and prolog rows be compressed by third argument?
