So after a bit more coffee and a bit of research it seems to me that
the only way you are going to get this to be fast would be if you used
a hash based looked for one of your sequences. Something like a
HashMap or BloomFilter would do the job, see:
I have made some experimentations with the above in Scala, a HashMap
works nicely here in-terms of performance because it is simple and
available, and in your case with Integers you do not need to worry
about duplicates because the value of a number is it's identity.
So, if you don't want to implement a HashMap or BloomFilter in XQuery,
what is one to do? Well XQuery 3.1 will introduce the Map data-type
and some implementations already have done this. If you understand the
implementation (or even hazard a guess), there is a good chance that
that XQuery map may in-fact under the covers be a HashMap.
I won't say too much more about this as I have been discussing it with
Wolfgang this morning, and I think he will shortly post you a very
fast example when using XQuery 3.1 Maps in eXist...
Post by Adam Retter Post by Adam Retter
I think that question is very implementation specific. If all of your
data is in RAM, as your dataset is relatively small and these are just
numbers, I would expect performance to be excellent. I am not sure
that sorting will make much of a difference, but it depends on the
implementation and how it initiates the search for a false comparison
in a large sequence.
I retract any previous statements that may have alluded to good
performance. A quick test on Saxon and eXist shows that this is a very
slow problem. I am trying to see if there is not a short-cut that
could be taken, I will come back to you...