Something like the following, maybe?
function(doc) {
//as before, emit every tag
}
function(keys, values, rereduce) {
if(!rereduce) {
//return the following, in an object or list to unpack:
// 1) the first and last tag in keys (or the same tag twice, if
all are equal or length == 1)
// 2) a count of unique tags (not including and not equal to the
first and/or last)
// This assumes keys is sorted (which I think is safe), but if not
you can always sort them yourself
} else {
var total = //sum of all inner unique counts from !rereduce step
var merged = //merge all the [first, last] lists
if(merged.length != 2) {
//Run the !rereduce clause (maybe put this in a function) on merged
//Add the resulting inner count to total, set merged = [first,
last] result.
}
//Return the new inner total and new [first, last]
}
}
I'm pretty sure I've messed things up horribly somewhere in there, but
the basic idea feels right. Explore it.
A word of caution though: this algorithm assumes that there is no
overlap in key range for each independent initial reduction. I believe
this is a safe assumption when running on a single CouchDB node.
However, this will not be the case if something like lounge or
bigcouch is rereducing results from many shards with overlapping tag
ranges.
If anyone can think of a better, more general algorithm that avoids
building a huge dictionary in the reduce phase, I think it'd be
interesting to see, and probably useful to more people than just
Anand. :)
-Randall
Post by Anand ChitipothuPost by Anand ChitipothuIs it possible to count the number of unique values by writing a couchdb view?
Consider the following 2 docs.
{
"_id": "posts/1",
"title": "post 1",
"tags": ["foo", "bar"]
}
{
"_id": "posts/2",
"title": "post 2",
"tags": ["foo", "foobar"]
}
Is it possible to find that there are 3 tags?
Yes. Just write a map function that emits all tags it finds (not checked and
function(doc) {
for(tag in doc.tags) {
emit(tag, null);
}
}
In the reduce-function, just use _count
(http://wiki.apache.org/couchdb/Built-In_Reduce_Functions).
That gives counts of each tag, not the total number of unique tags.
{"rows":[
{"key":"bar","value":1},
{"key":"foo","value":2},
{"key":"foobar","value":1}
]}
And without any group_level it will return 4, which is the total
number of tags occurrences.
{"rows":[
{"key":"null","value":4}
]}
Is there any way to find the number of *unique* tags?
Anand