Matrix Multiplication (JSONiq)

Discussion:

Hermann Stamm-Wilbrandt

2014-02-01 23:33:21 UTC

Last Sylvester there was a thread on Matrix Multiplication in XQuery:
http://markmail.org/message/7q7qnbbnjo7cljzv?q=list:com.x-query.talk+matrix
+multiplication

The biggest problem identified was that XQuery does not allow for efficient
representation of 2+dimensional arrays.

JSON does provide 2+dimensional arrays for free.

I did not measure performance yet, but this JSONiq script looks very
similar to what would be done in C:
http://try.zorba.io/queries/xquery/jFd3Q8f82HuZGzcYDzQpdN4SdfY=

declare variable $A := [
[1,2],
[3,4],
[5,6],
[7,8]
];
declare variable $B := [
[1,2,3],
[4,5,6]
];

[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($B(1))) return
fn:sum(
for $j in 1 to count(jn:members($B)) return
$A($i)($j) * $B($j)($k)
)
]
]

And much simpler than in XSLT:
http://rosettacode.org/wiki/Matrix_multiplication#XSLT_1.0:

Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/
http://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

David Carlisle

2014-02-02 00:44:35 UTC

Permalink

Post by Hermann Stamm-Wilbrandt
Last Sylvester there was a thread on Matrix Multiplication in
http://markmail.org/message/7q7qnbbnjo7cljzv?q=list:com.x-query.talk+matrix+multiplication
The biggest problem identified was that XQuery does not allow for
efficient representation of 2+dimensional arrays.

well it does, but as in other languages that only really have 1-D arrays
(or languages such as C or Fortran where 2D arrays are a thin veneer
over 1D arrays), you need to store a 2 D array as a 1D array with an
additional integer giving the stride or leading dimension (in row or
column order). To keep the arrays self contained I stored this as the
first item in each sequence in the example below.

Post by Hermann Stamm-Wilbrandt
JSON does provide 2+dimensional arrays for free.
I did not measure performance yet, but this JSONiq script looks very
http://try.zorba.io/queries/xquery/jFd3Q8f82HuZGzcYDzQpdN4SdfY=
declare variable $A := [ [1,2], [3,4], [5,6], [7,8] ]; declare
variable $B := [ [1,2,3], [4,5,6] ];
[ for $i in 1 to count(jn:members($A)) return [ for $k in 1 to
count(jn:members($B(1))) return fn:sum( for $j in 1 to
count(jn:members($B)) return $A($i)($j) * $B($j)($k) ) ] ]

In Xquery 1 you could do

let $a:=(2, (:2 columns :)
1,2,
3,4,
5,6,
7,8),

$b:=(3, (:3 columns :)
1,2,3,
4,5,6)

return

(
$b[1],
for $i in 1 to xs:int((count($a) -1) div $a[1]),
$j in 1 to xs:int($b[1])
return
sum(
for $k in 1 to xs:int($a[1]) return
($a[($i -1)*$a[1]+$k+1] * $b[($k -1)*$b[1]+$j+1])
)
)

which produces

3 (:3 columns :)
9 12 15
19 26 33
29 40 51
39 54 69
As the above is in fact only Xpath 2, you could do the identical
expression in XSLT 2

David

Hermann Stamm-Wilbrandt

2014-02-03 10:56:08 UTC

Permalink

Thanks for your XQuery 1-dimensional sample.

There are 4 XQuery with JSONiq implemenations named on jsoniq.org:
28.io, Zorba, IBM Websphere DataPower Integration Appliance and Pascal
XQuery engine

It seems not to be that easy to measure runtime.
Since http://try.zorba.io allows to share and run code I used that.
The method I found was to place Zorba's datetime:current-time() in result
sequence as first and last elements.
And the matrix multiplications need to be executed often to result in
measurable times (I did use 10.000.000).

These are JSONiq (1) and your XQuery (2) implemenations:
http://try.zorba.io/queries/xquery/vq+kL9tWK+jmntDZz0oxDcyrypA=
http://try.zorba.io/queries/xquery/NIlfOIBmkdvt8+2zNmvM8Hf1+bo=

The times reported are quite different although run on same processor:
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)

Yes, that is only result for one processor. But I would assume even (much)
bigger differences in case the matrix dimensions become bigger and not toy
like as in the examples.

(1)
import module namespace datetime =
"http://www.zorba-xquery.com/modules/datetime";

declare variable $A := [
[1,2],
[3,4],
[5,6],
[7,8]
];
declare variable $B := [
[1,2,3],
[4,5,6]
];
declare variable $N := 10000000;

let $R := ( datetime:current-time(),
for $h in 1 to $N return
[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($B(1))) return
fn:sum(
for $j in 1 to count(jn:members($B)) return
$A($i)($j) * $B($j)($k)
)
]
]
, datetime:current-time() )

return $R[count($R)] - $R[1]

(2)
import module namespace datetime =
"http://www.zorba-xquery.com/modules/datetime";

declare variable $a:=(2, (:2 columns :) 1,2, 3,4, 5,6, 7,8);
declare variable $b:=(3, (:3 columns :) 1,2,3, 4,5,6);
declare variable $N := 10000000;

let $R := ( datetime:current-time(),
for $h in 1 to $N return
( $b[1], for $i in 1 to xs:int((count($a) -1) div $a[1]), $j in 1 to
xs:int($b[1]) return
sum( for $k in 1 to xs:int($a[1]) return ($a[($i -1)*$a[1]+$k+1] * $b
[($k -1)*$b[1]+$j+1]) ) )
, datetime:current-time() )

return $R[count($R)] - $R[1]

Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/
http://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

From: David Carlisle <davidc-ueUsWbEuBr1aa/***@public.gmane.org>

To: Hermann Stamm-Wilbrandt/Germany/***@IBMDE, talk-***@public.gmane.org,

Date: 02/02/2014 01:44 AM

Subject: Re: [xquery-talk] Matrix Multiplication (JSONiq)

Post by Hermann Stamm-Wilbrandt
Last Sylvester there was a thread on Matrix Multiplication in
XQuery
http://markmail.org/message/7q7qnbbnjo7cljzv?q=list:com.x-query.talk

+matrix+multiplication

Post by Hermann Stamm-Wilbrandt
The biggest problem identified was that XQuery does not allow for
efficient representation of 2+dimensional arrays.

David Carlisle

2014-02-03 11:13:52 UTC

Permalink

Post by Hermann Stamm-Wilbrandt
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)

ooh interesting , I wonder where the bottleneck in the xquery is.
Probably as Michael commented at some point earlier in the thread, the
time to access the ith element of a sequence $a[$i].

But the language doesn't _need_ to change, just if more people did it
the xquery compilers would perhaps look out for sequences that are
exclusively accessed via numeric filters and implement them in a way
that gives constant time access. Having a separate array type does give
them a big hint though:-)

David

________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.

This e-mail has been scanned for all viruses by Star. The service is
powered by MessageLabs.
________________________________________________________________________

jean-marc Mercier

2014-02-03 11:30:45 UTC

Permalink

Hi all,

I've tried the following JSON query with zorba, mimicking a NxN, with
N=200, matrix multiplications. Time is 10 sec on http://try.zorba.io/,
behaving with a cubic N^3 complexitity.
Do you really want to know what are the performances of standard linear
algebra library for such matrix multiplications ?

import module namespace datetime = "
http://www.zorba-xquery.com/modules/datetime";

declare variable $size := 200;

declare variable $A := [ for $i in 1 to $size return
[
for $j in 1 to $size return $i*$size+$j
]
];

let $R := ( datetime:current-time(),
[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($A)) return
fn:sum(
for $j in 1 to count(jn:members($A)) return
$A($i)($j) * $A($j)($k)
)
]
]
, datetime:current-time() )

return $R[count($R)] - $R[1]

Post by David Carlisle

Post by Hermann Stamm-Wilbrandt
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)

ooh interesting , I wonder where the bottleneck in the xquery is.
Probably as Michael commented at some point earlier in the thread, the
time to access the ith element of a sequence $a[$i].
But the language doesn't _need_ to change, just if more people did it
the xquery compilers would perhaps look out for sequences that are
exclusively accessed via numeric filters and implement them in a way
that gives constant time access. Having a separate array type does give
them a big hint though:-)
David
________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.
This e-mail has been scanned for all viruses by Star. The service is
powered by MessageLabs. ______________________________
__________________________________________
_______________________________________________
http://x-query.com/mailman/listinfo/talk

Ghislain Fourny

2014-02-03 12:55:58 UTC

Permalink

Hi,

With a naive parallelism approach (on $i and $j), I think it's possible to bring it down to O(N) with a constant cost (in dollars), assuming "full cloud elasticity" (i.e., the number of instances that can be triggered up is not the bottleneck).

The 28.io platform should supports this (disclaimer: it's my employer).

Kind regards,
Ghislain

Post by jean-marc Mercier
Hi all,
I've tried the following JSON query with zorba, mimicking a NxN, with N=200, matrix multiplications. Time is 10 sec on http://try.zorba.io/, behaving with a cubic N^3 complexitity.
Do you really want to know what are the performances of standard linear algebra library for such matrix multiplications ?
import module namespace datetime = "http://www.zorba-xquery.com/modules/datetime";
declare variable $size := 200;
declare variable $A := [ for $i in 1 to $size return
[
for $j in 1 to $size return $i*$size+$j
]
];
let $R := ( datetime:current-time(),
[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($A)) return
fn:sum(
for $j in 1 to count(jn:members($A)) return
$A($i)($j) * $A($j)($k)
)
]
]
, datetime:current-time() )
return $R[count($R)] - $R[1]
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)
ooh interesting , I wonder where the bottleneck in the xquery is.
Probably as Michael commented at some point earlier in the thread, the
time to access the ith element of a sequence $a[$i].
But the language doesn't _need_ to change, just if more people did it
the xquery compilers would perhaps look out for sequences that are
exclusively accessed via numeric filters and implement them in a way
that gives constant time access. Having a separate array type does give
them a big hint though:-)
David
________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.
This e-mail has been scanned for all viruses by Star. The service is
powered by MessageLabs. ________________________________________________________________________
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk

jean-marc Mercier

2014-02-03 13:09:54 UTC

Permalink

Well, I am sure that these algorithms can be parallelized, using 28.io, or
other vendor solutions.
However, honestly, today, you would have to spend tons of processors,
watts, intelligence, to perform a matrix multiplication over a 4000x4000
matrix with JSONiq or XQUERY 3.0.
I can compute such a multiplication within a second using BLAS over a
single threaded process.

Post by Ghislain Fourny
Hi,
With a naive parallelism approach (on $i and $j), I think it's possible to
bring it down to O(N) with a constant cost (in dollars), assuming "full
cloud elasticity" (i.e., the number of instances that can be triggered up
is not the bottleneck).
The 28.io platform should supports this (disclaimer: it's my employer).
Kind regards,
Ghislain

Post by jean-marc Mercier
Hi all,
I've tried the following JSON query with zorba, mimicking a NxN, with

N=200, matrix multiplications. Time is 10 sec on http://try.zorba.io/,
behaving with a cubic N^3 complexitity.

Post by jean-marc Mercier
Do you really want to know what are the performances of standard linear

algebra library for such matrix multiplications ?

Post by jean-marc Mercier
import module namespace datetime = "

http://www.zorba-xquery.com/modules/datetime";

Post by jean-marc Mercier
declare variable $size := 200;
declare variable $A := [ for $i in 1 to $size return
[
for $j in 1 to $size return $i*$size+$j
]
];
let $R := ( datetime:current-time(),
[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($A)) return
fn:sum(
for $j in 1 to count(jn:members($A)) return
$A($i)($j) * $A($j)($k)
)
]
]
, datetime:current-time() )
return $R[count($R)] - $R[1]
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)
ooh interesting , I wonder where the bottleneck in the xquery is.
Probably as Michael commented at some point earlier in the thread, the
time to access the ith element of a sequence $a[$i].
But the language doesn't _need_ to change, just if more people did it
the xquery compilers would perhaps look out for sequences that are
exclusively accessed via numeric filters and implement them in a way
that gives constant time access. Having a separate array type does give
them a big hint though:-)
David
________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.
This e-mail has been scanned for all viruses by Star. The service is
powered by MessageLabs.

________________________________________________________________________

Post by jean-marc Mercier
_______________________________________________
http://x-query.com/mailman/listinfo/talk
_______________________________________________
http://x-query.com/mailman/listinfo/talk

Michael Kay

2014-02-03 14:06:11 UTC

Permalink

Well, I am sure that these algorithms can be parallelized, using 28.io, or other vendor solutions.
However, honestly, today, you would have to spend tons of processors, watts, intelligence, to perform a matrix multiplication over a 4000x4000 matrix with JSONiq or XQUERY 3.0.
I can compute such a multiplication within a second using BLAS over a single threaded process.

I don't think anyone(*) is suggesting that XQuery or JSONiq should be anyone's number 1 choice for doing a matrix multiplication. The key thing is to make the language powerful enough so that if you have an application doing 300 tasks, the fact that one of them involves matrix mutliplication shouldn't stop you.

Just like XSLT isn't one's obvious choice of graphics programming language, but it's sure handy that you can use it to generate an SVG histogram or pie chart.

(*) well, I exclude the XQuery fanatics from this....

jean-marc Mercier

2014-02-03 14:22:01 UTC

Permalink

Michael, I agree with you. From your side, you already know exactly what I
am promoting : XQUERY could be a fabulous tool for scientific computing.
Very flexible, easy template programming, handy and efficient storage
provided by XML database vendors. It could be a graal for mathematicians. I
wish I had weight enough in the XML industry to convince W3C to promote
this point of view to XML Database vendors :)

Post by jean-marc Mercier
Well, I am sure that these algorithms can be parallelized, using 28.io,
or other vendor solutions.
However, honestly, today, you would have to spend tons of processors,
watts, intelligence, to perform a matrix multiplication over a 4000x4000
matrix with JSONiq or XQUERY 3.0.
I can compute such a multiplication within a second using BLAS over a
single threaded process.
I don't think anyone(*) is suggesting that XQuery or JSONiq should be
anyone's number 1 choice for doing a matrix multiplication. The key thing
is to make the language powerful enough so that if you have an application
doing 300 tasks, the fact that one of them involves matrix mutliplication
shouldn't stop you.
Just like XSLT isn't one's obvious choice of graphics programming
language, but it's sure handy that you can use it to generate an SVG
histogram or pie chart.
(*) well, I exclude the XQuery fanatics from this....

Hermann Stamm-Wilbrandt

2014-02-03 15:24:41 UTC

Permalink

Hi,

as other I do see neither JSONiq nor XQuery being able to compete with eg.
native C implemenation of matrix multiplication.

Your 200x200 example is interesting as it shows that JSONiq(JSON,
PT10.838593S) is now slower than XQuery(XML, PT9.03348S):
http://try.zorba.io/queries/xquery/elMs5bmRLr%2FZ0IHY5mr6YNWqgjI%3D
http://try.zorba.io/queries/xquery/FhH%2BPs3wjNB2xwTw%2BchwwMes2dw%3D

Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/
http://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

From: jean-marc Mercier <jeanmarc.mercier-***@public.gmane.org>

To: David Carlisle <davidc-ueUsWbEuBr1aa/***@public.gmane.org>,

Cc: Hermann Stamm-Wilbrandt/Germany/***@IBMDE, "talk-***@public.gmane.org Talk" <talk-***@public.gmane.org>

Date: 02/03/2014 12:30 PM

Subject: Re: [xquery-talk] Matrix Multiplication (JSONiq)

Hi all,

I've tried the following JSON query with zorba, mimicking a NxN, with
N=200, matrix multiplications. Time is 10 sec on http://try.zorba.io/,
behaving with a cubic N^3 complexitity.
Do you really want to know what are the performances of standard linear
algebra library for such matrix multiplications ?

import module namespace datetime = "
http://www.zorba-xquery.com/modules/datetime";

declare variable $size := 200;

declare variable $A := [ for $i in 1 to $size return
[
for $j in 1 to $size return $i*$size+$j
]
];

let $R := ( datetime:current-time(),
[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($A)) return
fn:sum(
for $j in 1 to count(jn:members($A)) return
$A($i)($j) * $A($j)($k)
)
]
]
, datetime:current-time() )

return $R[count($R)] - $R[1]

2014-02-03 David Carlisle <davidc-ueUsWbEuBr1aa/***@public.gmane.org>:
On 03/02/2014 10:56, Hermann Stamm-Wilbrandt wrote:
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)

ooh interesting , I wonder where the bottleneck in the xquery is.
Probably as Michael commented at some point earlier in the thread, the
time to access the ith element of a sequence $a[$i].

But the language doesn't _need_ to change, just if more people did it
the xquery compilers would perhaps look out for sequences that are
exclusively accessed via numeric filters and implement them in a way
that gives constant time access. Having a separate array type does give
them a big hint though:-)

David

________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.

This e-mail has been scanned for all viruses by Star. The service is
powered by MessageLabs.
________________________________________________________________________
_______________________________________________
talk-***@public.gmane.org
http://x-query.com/mailman/listinfo/talk

Christian Grün

2014-02-03 14:35:55 UTC

Permalink

Hi Hermann,

thanks for your interesting comparison. I'd like to point out that the
loop around the calculation (for $h in 1 to $N) might be optimized
away by a query compiler. If this happens, the only thing that would
be measured in the query would be the concatenation of 10 million
results.

But I completely agree that it's difficult to formulate an example
that is easy to compare if as long as we have no dynamic input.

Christian
______________________________________________

On Mon, Feb 3, 2014 at 11:56 AM, Hermann Stamm-Wilbrandt

Post by Hermann Stamm-Wilbrandt
Thanks for your XQuery 1-dimensional sample.
28.io, Zorba, IBM Websphere DataPower Integration Appliance and Pascal
XQuery engine
It seems not to be that easy to measure runtime.
Since http://try.zorba.io allows to share and run code I used that.
The method I found was to place Zorba's datetime:current-time() in result
sequence as first and last elements.
And the matrix multiplications need to be executed often to result in
measurable times (I did use 10.000.000).
http://try.zorba.io/queries/xquery/vq+kL9tWK+jmntDZz0oxDcyrypA=
http://try.zorba.io/queries/xquery/NIlfOIBmkdvt8+2zNmvM8Hf1+bo=
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)
Yes, that is only result for one processor. But I would assume even (much)
bigger differences in case the matrix dimensions become bigger and not toy
like as in the examples.
(1)
import module namespace datetime =
"http://www.zorba-xquery.com/modules/datetime";
declare variable $A := [
[1,2],
[3,4],
[5,6],
[7,8]
];
declare variable $B := [
[1,2,3],
[4,5,6]
];
declare variable $N := 10000000;
let $R := ( datetime:current-time(),
for $h in 1 to $N return
[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($B(1))) return
fn:sum(
for $j in 1 to count(jn:members($B)) return
$A($i)($j) * $B($j)($k)
)
]
]
, datetime:current-time() )
return $R[count($R)] - $R[1]
(2)
import module namespace datetime =
"http://www.zorba-xquery.com/modules/datetime";
declare variable $a:=(2, (:2 columns :) 1,2, 3,4, 5,6, 7,8);
declare variable $b:=(3, (:3 columns :) 1,2,3, 4,5,6);
declare variable $N := 10000000;
let $R := ( datetime:current-time(),
for $h in 1 to $N return
( $b[1], for $i in 1 to xs:int((count($a) -1) div $a[1]), $j in 1 to
xs:int($b[1]) return
sum( for $k in 1 to xs:int($a[1]) return ($a[($i -1)*$a[1]+$k+1] * $b
[($k -1)*$b[1]+$j+1]) ) )
, datetime:current-time() )
return $R[count($R)] - $R[1]
Mit besten Gruessen / Best wishes,
Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/
http://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Date: 02/02/2014 01:44 AM
Subject: Re: [xquery-talk] Matrix Multiplication (JSONiq)

Post by Hermann Stamm-Wilbrandt
Last Sylvester there was a thread on Matrix Multiplication in
XQuery
http://markmail.org/message/7q7qnbbnjo7cljzv?q=list:com.x-query.talk

+matrix+multiplication

Post by Hermann Stamm-Wilbrandt
The biggest problem identified was that XQuery does not allow for
efficient representation of 2+dimensional arrays.

In Xquery 1 you could do
let $a:=(2, (:2 columns :)
1,2,
3,4,
5,6,
7,8),
$b:=(3, (:3 columns :)
1,2,3,
4,5,6)
return
(
$b[1],
for $i in 1 to xs:int((count($a) -1) div $a[1]),
$j in 1 to xs:int($b[1])
return
sum(
for $k in 1 to xs:int($a[1]) return
($a[($i -1)*$a[1]+$k+1] * $b[($k -1)*$b[1]+$j+1])
)
)
which produces
3 (:3 columns :)
9 12 15
19 26 33
29 40 51
39 54 69
As the above is in fact only Xpath 2, you could do the identical
expression in XSLT 2
David
_______________________________________________
http://x-query.com/mailman/listinfo/talk

Hermann Stamm-Wilbrandt

2014-02-03 17:26:05 UTC

Permalink

Hi Christian,

I'd like to point out that the loop around the calculation (for $h in 1

to $N) might be optimized away by a query compiler.
agreed, but I did test with different values of $N and the reported times
increased.

The easiest way of disallowing the optimizer to kick in would be to compute
the matrix A^{2^N} by N repeated matrix square operations, in that case
nothing can be optimized away.

Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/
http://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

From: Christian Grün <christian.gruen-***@public.gmane.org>

To: Hermann Stamm-Wilbrandt/Germany/***@IBMDE,

Cc: David Carlisle <davidc-ueUsWbEuBr1aa/***@public.gmane.org>, "talk-***@public.gmane.org" <***@x-query.com>

Date: 02/03/2014 03:36 PM

Subject: Re: [xquery-talk] Matrix Multiplication (JSONiq)

Hi Hermann,

thanks for your interesting comparison. I'd like to point out that the
loop around the calculation (for $h in 1 to $N) might be optimized
away by a query compiler. If this happens, the only thing that would
be measured in the query would be the concatenation of 10 million
results.

But I completely agree that it's difficult to formulate an example
that is easy to compare if as long as we have no dynamic input.

Christian
______________________________________________

On Mon, Feb 3, 2014 at 11:56 AM, Hermann Stamm-Wilbrandt

Thanks for your XQuery 1-dimensional sample.
28.io, Zorba, IBM Websphere DataPower Integration Appliance and Pascal
XQuery engine
It seems not to be that easy to measure runtime.
Since http://try.zorba.io allows to share and run code I used that.
The method I found was to place Zorba's datetime:current-time() in result
sequence as first and last elements.
And the matrix multiplications need to be executed often to result in
measurable times (I did use 10.000.000).
http://try.zorba.io/queries/xquery/vq+kL9tWK+jmntDZz0oxDcyrypA=
http://try.zorba.io/queries/xquery/NIlfOIBmkdvt8+2zNmvM8Hf1+bo=
PT1.713634S (JSONiq) versus PT9.77805S (XQuery)
Yes, that is only result for one processor. But I would assume even (much)
bigger differences in case the matrix dimensions become bigger and not toy
like as in the examples.
(1)
import module namespace datetime =
"http://www.zorba-xquery.com/modules/datetime";
declare variable $A := [
[1,2],
[3,4],
[5,6],
[7,8]
];
declare variable $B := [
[1,2,3],
[4,5,6]
];
declare variable $N := 10000000;
let $R := ( datetime:current-time(),
for $h in 1 to $N return
[
for $i in 1 to count(jn:members($A)) return
[
for $k in 1 to count(jn:members($B(1))) return
fn:sum(
for $j in 1 to count(jn:members($B)) return
$A($i)($j) * $B($j)($k)
)
]
]
, datetime:current-time() )
return $R[count($R)] - $R[1]
(2)
import module namespace datetime =
"http://www.zorba-xquery.com/modules/datetime";
declare variable $a:=(2, (:2 columns :) 1,2, 3,4, 5,6, 7,8);
declare variable $b:=(3, (:3 columns :) 1,2,3, 4,5,6);
declare variable $N := 10000000;
let $R := ( datetime:current-time(),
for $h in 1 to $N return
( $b[1], for $i in 1 to xs:int((count($a) -1) div $a[1]), $j in 1 to
xs:int($b[1]) return
sum( for $k in 1 to xs:int($a[1]) return ($a[($i -1)*$a[1]+$k+1] * $b
[($k -1)*$b[1]+$j+1]) ) )
, datetime:current-time() )
return $R[count($R)] - $R[1]
Mit besten Gruessen / Best wishes,
Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/HermannSW/
http://stamm-wilbrandt.de/GraphvizFiddle/
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Date: 02/02/2014 01:44 AM
Subject: Re: [xquery-talk] Matrix Multiplication (JSONiq)

Post by Hermann Stamm-Wilbrandt
Last Sylvester there was a thread on Matrix Multiplication in
XQuery
http://markmail.org/message/7q7qnbbnjo7cljzv?q=list:com.x-query.talk

+matrix+multiplication

Post by Hermann Stamm-Wilbrandt
The biggest problem identified was that XQuery does not allow for
efficient representation of 2+dimensional arrays.

In Xquery 1 you could do
let $a:=(2, (:2 columns :)
1,2,
3,4,
5,6,
7,8),
$b:=(3, (:3 columns :)
1,2,3,
4,5,6)
return
(
$b[1],
for $i in 1 to xs:int((count($a) -1) div $a[1]),
$j in 1 to xs:int($b[1])
return
sum(
for $k in 1 to xs:int($a[1]) return
($a[($i -1)*$a[1]+$k+1] * $b[($k -1)*$b[1]+$j+1])
)
)
which produces
3 (:3 columns :)
9 12 15
19 26 33
29 40 51
39 54 69
As the above is in fact only Xpath 2, you could do the identical
expression in XSLT 2
David
_______________________________________________
http://x-query.com/mailman/listinfo/talk

Ihe Onwuka

2014-02-02 03:50:44 UTC

Permalink

On Sat, Feb 1, 2014 at 11:33 PM, Hermann Stamm-Wilbrandt

Post by Hermann Stamm-Wilbrandt
http://markmail.org/message/7q7qnbbnjo7cljzv?q=list:com.x-query.talk+matrix
+multiplication
The biggest problem identified was that XQuery does not allow for efficient
representation of 2+dimensional arrays.
JSON does provide 2+dimensional arrays for free.

I gave an algorithm for representing n-dimensional arrays with a 1
dimensional array earlier in the thread.

See my post of 31 Dec 2013 here.

http://markmail.org/message/dkhw7ryirwyviqm3#query:+page:1+mid:ccipumgobpaljjao+state:results

jean-marc Mercier

2014-02-02 20:33:03 UTC

Permalink

N-dimensional representation of arrays are quite straightforward with XML
too. Is there any incentive to expect better performances with a JSON
matrix representation rather than an XML one ?

@Ihe Storing N-dim arrays with 1-D ones, as you quoted, is used since
decades in others languages. It is very well adapted to some particular
scientific computing. But for matrix operations, it might not be so well
adapted (I don't know an example of a linear algebra library using such
structures). Indeed, for "complex" linear algebra operations, one needs
very often to take matrix "slices" in every dimensions, an operation for
which these kind of Multi-D structures might not be very efficient at.

Hope this helps

Post by Ihe Onwuka
On Sat, Feb 1, 2014 at 11:33 PM, Hermann Stamm-Wilbrandt
http://markmail.org/message/7q7qnbbnjo7cljzv?q=list:com.x-query.talk+matrix

Post by Hermann Stamm-Wilbrandt
+multiplication
The biggest problem identified was that XQuery does not allow for

efficient

Post by Hermann Stamm-Wilbrandt
representation of 2+dimensional arrays.
JSON does provide 2+dimensional arrays for free.

I gave an algorithm for representing n-dimensional arrays with a 1
dimensional array earlier in the thread.
See my post of 31 Dec 2013 here.
http://markmail.org/message/dkhw7ryirwyviqm3#query:+page:1+mid:ccipumgobpaljjao+state:results
_______________________________________________
http://x-query.com/mailman/listinfo/talk

Michael Kay

2014-02-02 21:13:46 UTC

Permalink

N-dimensional representation of arrays are quite straightforward with XML too. Is there any incentive to expect better performances with a JSON matrix representation rather than an XML one ?

I think that if you had an XML schema for an XML representation of N-dimensional arrays, and if the XPath processor recognized that schema and used a custom tree representation for its instances, then arrays could be represented using XML just as efficiently as using JSONiq arrays. But if you use a general tree representation that allow any element names, namespaces, base URIs, mixed content, attributes, and all the other paraphernalia of XML, then it is likely to be significantly less efficient.

For example:

* XML is text-oriented, and using XML for numeric values invariably involves string-to-number conversion, which is expensive

* Numeric subscripts when addressing XML (as in para[3]) are likely to have O(n) performance rather than constant performance, because the tree structure is likely to be optimized for scanning all the children rather than locating an individual child by its index.

Michael Kay
Saxonica

jean-marc Mercier

2014-02-03 10:16:48 UTC

Permalink

Michael, you're right, and I do now understand that an efficient
serialization / deserialization of a matrix structure could be more
straightforward using JSON than XML.
However, without considering this potential overhead due to serializing /
deserializing procedure, that should be or order "epsilon" comparing to
linear algebra algorithms complexity, once loaded into memory, I don't
think that a XQUERY coded matrix algorithm will be more efficient using XML
or JSON as serialization choice.

Am I right ?

Post by jean-marc Mercier
N-dimensional representation of arrays are quite straightforward with

XML too. Is there any incentive to expect better performances with a JSON
matrix representation rather than an XML one ?
I think that if you had an XML schema for an XML representation of
N-dimensional arrays, and if the XPath processor recognized that schema and
used a custom tree representation for its instances, then arrays could be
represented using XML just as efficiently as using JSONiq arrays. But if
you use a general tree representation that allow any element names,
namespaces, base URIs, mixed content, attributes, and all the other
paraphernalia of XML, then it is likely to be significantly less efficient.
* XML is text-oriented, and using XML for numeric values invariably
involves string-to-number conversion, which is expensive
* Numeric subscripts when addressing XML (as in para[3]) are likely to
have O(n) performance rather than constant performance, because the tree
structure is likely to be optimized for scanning all the children rather
than locating an individual child by its index.
Michael Kay
Saxonica

Michael Kay

2014-02-03 11:03:08 UTC

Permalink

It's not just a serialization issue. The data model is different; a tree consisting of a root element owning lots of row elements owning lots of cell elements each of which contains a text node which is the string representation of a number is not the same as a two-dimensional array of numbers.

Michael Kay
Saxonica

Michael, you're right, and I do now understand that an efficient serialization / deserialization of a matrix structure could be more straightforward using JSON than XML.
However, without considering this potential overhead due to serializing / deserializing procedure, that should be or order "epsilon" comparing to linear algebra algorithms complexity, once loaded into memory, I don't think that a XQUERY coded matrix algorithm will be more efficient using XML or JSON as serialization choice.
Am I right ?

N-dimensional representation of arrays are quite straightforward with XML too. Is there any incentive to expect better performances with a JSON matrix representation rather than an XML one ?

I think that if you had an XML schema for an XML representation of N-dimensional arrays, and if the XPath processor recognized that schema and used a custom tree representation for its instances, then arrays could be represented using XML just as efficiently as using JSONiq arrays. But if you use a general tree representation that allow any element names, namespaces, base URIs, mixed content, attributes, and all the other paraphernalia of XML, then it is likely to be significantly less efficient.
* XML is text-oriented, and using XML for numeric values invariably involves string-to-number conversion, which is expensive
* Numeric subscripts when addressing XML (as in para[3]) are likely to have O(n) performance rather than constant performance, because the tree structure is likely to be optimized for scanning all the children rather than locating an individual child by its index.
Michael Kay
Saxonica