Getting more out of Freebase with XQuery
Ihe Onwuka
2014-03-31 01:02:00 UTC
This is a follow-on from

http://en.wikibooks.org/wiki/XQuery/Freebase

which originated from a problem Michael Westbay assisted me with.

Again it illustrates how to obtain information from Freebase via its
MQL language (which predates SPARQL).

The previous query was taken from

https://developers.google.com/freebase/v1/mql-overview

and it limits the data that results from the call to the Freebase API.
You can see a limit parameter being set to 3 below in the API call.

https://www.googleapis.com/freebase/v1/mqlread?query=[{"type":"/music/album","name":null,"artist":{"id":"/en/bob_dylan"},"limit":3}]&cursor

If you do not specify a limit with your API call, Freebase will impose
a limit of 100 records on your query. This message addresses the
question of how to get everything.

The key to doing this is dangling at the end of the above API call:
it's the cursor parameter, and its usage is discussed with an example
here

https://developers.google.com/freebase/v1/mql-overview#querying-with-cursor-paging-results

To summarise: you ask for a cursor to be returned with your query
results (see the example API call above for the form of the initial
request). The cursor acts as a link to the next set of query results.
You obtain that next set by supplying the value of the cursor returned
from the previous invocation. Along with that next set you get another
cursor that points to the set after that. When the final set of
results is retrieved, the cursor comes back as false (the Freebase
overview has this in upper case but my code compared against lower
case 'false' and that works).
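
The paging scheme summarised above can be sketched independently of XQuery. This is a minimal illustration in Python, not the Freebase client code: fetch_page is a hypothetical callable standing in for the real HTTP call, and fake_fetch is made-up data so the loop can be run offline.

```python
from typing import Callable, Iterator, List, Tuple, Union

# Hypothetical shape of one API response: (results, next_cursor).
Page = Tuple[List[str], Union[str, bool]]


def iterate_pages(fetch_page: Callable[[str], Page], cursor: str = "") -> Iterator[List[str]]:
    """Yield one page of results at a time until the cursor comes back false."""
    while True:
        results, next_cursor = fetch_page(cursor)
        yield results
        # The last page carries a false cursor; accept the boolean and
        # either spelling of the string form.
        if next_cursor is False or str(next_cursor).lower() == "false":
            break
        cursor = str(next_cursor)


def fake_fetch(cursor: str) -> Page:
    """Made-up stand-in for the API: three pages chained by cursors."""
    pages = {
        "": (["a", "b"], "c1"),
        "c1": (["c", "d"], "c2"),
        "c2": (["e"], False),
    }
    return pages[cursor]


all_results = [item for page in iterate_pages(fake_fetch) for item in page]
```

Starting with an empty cursor walks every page; starting with a previously returned cursor (for example "c1" here) picks up from that point.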

The overview has sample Python code which I have not tried or parsed
in anger but which I believe invokes libraries that take care of all
the cursor handling for you.

https://developers.google.com/freebase/v1/mql-overview#looping-through-cursor-results

However the same thing can easily be achieved from XQuery with a
little bit of tail recursion.

We will use as an example an MQL query that returns all films with
their netflix_ids.

[{
"type": "/film/film",
"name": null,
"netflix_id": []
}]

A few brief comments about MQL. You ask for something by giving the
field name and a value of null. Null gets replaced by the actual value.
However, if the field can have multiple values, MQL will return an array
and cause your null query to error. This may happen even when you are
expecting a single value, so you can avoid the problem by using the
symbol for an empty array instead of null, as in the query above.

You can paste the query above into

http://www.freebase.com/query

to see the results (we will take care of the cursor in the code example).
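
As a small illustration of the null versus empty-array convention described above, here is a sketch in Python that assembles the same query text. The helper name build_mql_query and its parameters are invented for this example; only the resulting JSON matches the query in this post.

```python
import json
from typing import List


def build_mql_query(type_id: str, single_valued: List[str], multi_valued: List[str]) -> str:
    """Build an MQL query string: null for single values, [] for multi-valued fields."""
    query = {"type": type_id}
    for name in single_valued:
        query[name] = None   # serialised as JSON null
    for name in multi_valued:
        query[name] = []     # serialised as an empty JSON array
    # MQL read queries that expect several matches are wrapped in an array.
    return json.dumps([query])


film_query = build_mql_query("/film/film", ["name"], ["netflix_id"])
```

If in doubt whether a field is single-valued, listing it under multi_valued is the safer choice, for the reason given above.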

Now to the code, which assumes XQuery 3.0

xquery version "3.0";

import module namespace xqjson="http://xqilla.sourceforge.net/lib/xqjson";

Freebase returns JSON but we want to store this in an XML db, so we use
the above package for JSON to XML conversion. In eXist you can
install the package by clicking on it in the eXist Package Manager,
which you can get to from the eXist Dashboard.

We declare a variable for our query.

declare variable $mqlQuery := '[{
"type": "/film/film",
"name": null,
"netflix_id": []
}]';

declare variable $freebase := 'https://www.googleapis.com/freebase/v1/mqlread';
declare variable $key := 'obtain an API key from Freebase and put its value here';

Here is a link to a blog that describes the process of obtaining a
Freebase API key
http://anchetawern.github.io/blog/2013/02/11/getting-started-with-freebase-api/

Since we are going to be doing tail recursion we need to put the API
call in a function. Let's start with the function signature.

declare function local:freebaseCall($cursor as xs:string,$i as xs:integer)

It takes two parameters: the first is the cursor and the second is an
integer which I use to provide an auto-incremented unique file name and
to tell me how many records were loaded at the end. Since there are 100
records per API call, the total is (n - 1) * 100 + the number of records
returned by the final call, where n is the number of pages stored. This
function will make the API call and store the results in the db.
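
The record arithmetic above, written out as a tiny Python sketch. The function name and the page_size default of 100 (the limit Freebase imposes when none is given, as mentioned earlier) are just for illustration.

```python
def records_loaded(pages_stored: int, last_page_count: int, page_size: int = 100) -> int:
    """Total records when every page but the last is full."""
    if pages_stored < 1:
        return 0
    return (pages_stored - 1) * page_size + last_page_count


# Three stored pages, the last one holding 42 records.
total = records_loaded(3, 42)
```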

{
if ($cursor eq 'false')

termination condition

then ($i - 1) || ' pages loaded' else

let $params := ('query=' || encode-for-uri($mqlQuery), 'key=' ||
$key, 'cursor=' || encode-for-uri($cursor))

The above URI-encodes the parameters to the API call. We have three:
the MQL query, the API key and the cursor.

let $href := $freebase || '?' || string-join($params, '&')

This constructs the API call - again thanks to Michael Westbay for
showing the correct way to do this by string joining the parameters
with a separator of &
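
For comparison, the same URL construction can be sketched in Python. This mirrors the encode-then-join approach above rather than replacing it; build_mqlread_url is a name made up for this example and YOUR_API_KEY is a placeholder.

```python
from urllib.parse import parse_qs, quote, urlsplit

FREEBASE = "https://www.googleapis.com/freebase/v1/mqlread"


def build_mqlread_url(mql_query: str, key: str, cursor: str = "") -> str:
    """Percent-encode each parameter value, then join the pairs with '&'."""
    params = [
        "query=" + quote(mql_query, safe=""),
        "key=" + key,
        "cursor=" + quote(cursor, safe=""),
    ]
    return FREEBASE + "?" + "&".join(params)


mql = '[{"type":"/film/film","name":null,"netflix_id":[]}]'
url = build_mqlread_url(mql, "YOUR_API_KEY")

# Decoding the query string again recovers the original values.
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

Passing safe="" makes quote encode the slashes in the type id as well, so the brackets, quotes and slashes in the MQL query cannot be mistaken for URL syntax.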

let $responses :=
http:send-request(<http:request href="{$href}" method="get"/>)

Make the API call.

return if ($responses[1]/@status ne '200')
then <failure
href="{xmldb:decode-uri(xs:anyURI($href))}">{$responses[1]}</failure>
else let $jsonResponse:= util:base64-decode($responses[2])

Standard EXPATH http error checking - don't forget to base64 decode
the body of the response.

let $freebaseXML:= xqjson:parse-json($jsonResponse)

Convert the returned JSON to XML because we are going to construct an
http PUT to store it in our xml db.

let $movieData := http:send-request(<http:request
href="{concat('path to store the data in your repository',$i,'.xml')}"
username="username"
password="password"
auth-method="basic"
send-authorization="true"
method="put">
<http:body
media-type="application/xml"/>
</http:request>,
(),
<batch
cursor="{$cursor}">{transform:transform($freebaseXML,doc('identity.xsl'),())}</batch>)

Standard EXPATH PUT request. On the last line we are wrapping the
returned XML with an element that carries the value of the cursor that
was used to obtain the page. Identity.xsl is of course the standard
XSLT identity transform, you can use it as a placeholder for the
insertion of your own custom transform.

return
local:freebaseCall($freebaseXML//data(pair[@name="cursor"]), $i + 1)

Finally the tail recursive call. We extract the cursor from the
returned JSON for parameter 1 and increment $i to give us a unique
document name for the next page to store.

};

Don't forget your closing curly brace and the attendant semicolon.

One last thing: to kick it all off, pass the empty string as the initial
cursor value and initialise your counter.

local:freebaseCall('',1)

One last thing. I experienced repeated eXist crashes when running
this. Note that you can prime the function call with a cursor to pick
up from where you left off. That way you'll get to the end.
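
A hedged sketch of how the restart point might be recovered after a crash. Each stored batch element carries the cursor that was used to fetch that page, while the cursor for the next page sits inside the converted JSON as the pair named "cursor". The Python below reads that value from the last stored document. The sample document is an assumption about what the stored XML looks like (only the batch wrapper and the pair name come from the code above), and the cursor value in it is invented.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed shape of a stored page; the cursor text is a made-up value.
SAMPLE_BATCH = """<batch cursor="">
  <json type="object">
    <pair name="cursor" type="string">eNp9hypotheticalCursor</pair>
    <pair name="result" type="array"/>
  </json>
</batch>"""


def next_cursor_from_batch(batch_xml: str) -> Optional[str]:
    """Return the cursor for the page after this one, or None if absent."""
    root = ET.fromstring(batch_xml)
    pair = root.find(".//pair[@name='cursor']")
    if pair is None:
        return None
    return pair.text or ""


resume_cursor = next_cursor_from_batch(SAMPLE_BATCH)
```

With the last stored document numbered n, the run would then be restarted with this cursor and a counter of n + 1, so that earlier files are not overwritten.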




Joe Wicentowski
2014-03-31 20:26:44 UTC
Hi Ihe,
Post by Ihe Onwuka
http:send-request(<http:request href="{$href}" method="get"/>)
...
Post by Ihe Onwuka
One last thing. I experienced repeated eXist crashes when running
this.
Check to make sure you're closing your HTTP connection. See
http://markmail.org/message/3opzcgyzv6auevkt for the solution; the
whole thread is edifying though.

Joe
Ihe Onwuka
2014-04-01 06:44:37 UTC
Post by Joe Wicentowski
Hi Ihe,
Post by Ihe Onwuka
http:send-request(<http:request href="{$href}" method="get"/>)
...
Post by Ihe Onwuka
One last thing. I experienced repeated eXist crashes when running
this.
Check to make sure you're closing your HTTP connection. See
http://markmail.org/message/3opzcgyzv6auevkt for the solution; the
whole thread is edifying though.
Will investigate - thanks.

A couple of dilemmas both related.

The return type of the function freebaseCall is string. Would it
be helpful to specify that, given that it has a side-effect (HTTP PUT)?

One for the RESTful purists who might object to a GET that has
side-effects: what HTTP method should be used to invoke this XQuery?

On reflection the function could probably use a better name.
Ihe Onwuka
2014-04-13 05:53:43 UTC
The wiki has been updated. The sample code now closes the HTTP
connection following a PUT (thanks to Joe Wiz for pointing this out)
but it does not stop the crashes - this may be an issue specific to
eXist.

Thanks to Dan M for organising the layout of the update. He had final
editorial say-so, so if there are any mistakes please blame him.

http://en.wikibooks.org/w/index.php?title=XQuery/Freebase&stable=0#Using_Cursors_to_Get_Additional_Data
The purpose of PUT is to update a resource. You may not expect GET to have
side effects; if it does, that is irrelevant to the client. If the server
state varies, use POST.
Like how you say REST purists... :-)
_______________________________________________
Exist-open mailing list
https://lists.sourceforge.net/lists/listinfo/exist-open