Now that we have a decent FeatureCollection interface (well, not indecent: you could take it home to meet mom), and more importantly a FeatureCollection interface that is actually used, it's time to have some fun.
There are two ways to go about hacking at features these days: through a FeatureReader (or Iterator), or as an entire FeatureCollection (formerly FeatureResults).
What is the Point?
I would like to set up a FeatureVisitor extension for making summary calculations. I am going to have to define the API for uDig developers anyway; they already want to know the number of "distinct" values for an attribute, min/max information, and so on.
This is needed to make SLD documents (I mean Symbology Encoding documents), and that involves giving users a bit of a summary of what is going on: a histogram, unique values, ranges, a statistical breakdown, or something along those lines.
Note: James already has the problem licked for GeoVista
The bad old days
In the bad old days we would need to write a for loop to do something like "count" the number of features.
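A minimal sketch of that loop, assuming a hypothetical, stripped-down `Feature` class and a plain `Iterable` standing in for the real FeatureCollection (the actual GeoTools types differ):

```java
import java.util.Iterator;

// Hypothetical, stripped-down stand-in for a real feature.
class Feature {
    final String id;

    Feature(String id) {
        this.id = id;
    }
}

class BadOldDays {
    // The caller drives the traversal: it fixes the order (sequential, one
    // feature at a time) and gives the collection no chance to answer the
    // question some cheaper way, such as with SELECT COUNT(*).
    static int count(Iterable<Feature> collection) {
        int count = 0;
        for (Iterator<Feature> it = collection.iterator(); it.hasNext(); ) {
            it.next(); // we only care that there is another feature
            count++;
        }
        return count;
    }
}
```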
There are a couple of things wrong with this picture:
- it prescribes an order (sequential or otherwise) through the data
- it gives us no chance to turn that lovely "count" into naked SQL for speed and profit
In a sense the loop depends on the internal structure of the FeatureCollection; by showing us too much of the guts, it takes away the FeatureCollection implementor's freedom to do magic stuff.
Ah Magic Stuff
What kind of magic stuff?
- like splitting the collection in two and processing half on each processor
- picking up the visitor and processing over on the server side
- abusing the internal structure of the collection (perhaps there is an index?)
The amount of magic you can do really depends on the abilities of your FeatureCollection: different magic is available for an indexed shapefile than for a local HSQL database. You may need to balance the amount of data collected against any distribution that may be in play (consider a geometric "buffer" operation performed on a remote PostGIS database).
So yeah Visitor Pattern
Normally when you want to hide internal structure you break out the Visitor pattern (see the GoF Design Patterns book).
Yes, that really is just the inside of the for loop.
- Note: To try this out we will need FeatureCollections.visit( FeatureCollection, FeatureVisitor )
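Here is a minimal sketch of the pattern, assuming a hypothetical one-method `FeatureVisitor` interface, a `Count` visitor, and a `FeatureCollections.visit` helper shaped like the one named in the note. A plain `Iterable` again stands in for FeatureCollection; this is an illustration, not the GeoTools API:

```java
// Hypothetical stand-in for a real feature.
class Feature {
    final String id;

    Feature(String id) {
        this.id = id;
    }
}

// Hypothetical visitor: visit() is just the body of the old for loop.
interface FeatureVisitor {
    void visit(Feature feature);
}

// Counts every feature it is shown.
class Count implements FeatureVisitor {
    private int count = 0;

    @Override
    public void visit(Feature feature) {
        count++;
    }

    int getCount() {
        return count;
    }
}

// Hypothetical helper shaped like FeatureCollections.visit(FeatureCollection, FeatureVisitor);
// a plain Iterable stands in for FeatureCollection.
final class FeatureCollections {
    private FeatureCollections() {
    }

    static void visit(Iterable<Feature> collection, FeatureVisitor visitor) {
        // The traversal lives here, in one place, instead of in every caller.
        for (Feature feature : collection) {
            visitor.visit(feature);
        }
    }
}
```

Because callers only hand over a visitor, a smarter collection can replace this naive traversal without any caller changing.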
Now there is a small wrinkle: when breaking work apart and merging it back together again, you should use a "Collector". The one most people are familiar with is the TestResult from the JUnit infrastructure. If you want an example in GeoTools, you can already see this idea in ValidationResults.
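As a sketch of the Collector idea (every name here is hypothetical), a `Count` whose partial results can be merged lets us split the features in two, count each half on its own task, and combine the answers:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a real feature.
class Feature {
    final String id;

    Feature(String id) {
        this.id = id;
    }
}

interface FeatureVisitor {
    void visit(Feature feature);
}

// A Count whose partial results can be merged, much like a JUnit TestResult
// collects outcomes from many tests.
class Count implements FeatureVisitor {
    private int count = 0;

    @Override
    public void visit(Feature feature) {
        count++;
    }

    int getCount() {
        return count;
    }

    // Combine with a partial result computed over another part of the data.
    Count merge(Count other) {
        Count merged = new Count();
        merged.count = this.count + other.count;
        return merged;
    }
}

class SplitCount {
    // Split the features in two, count each half on its own task, then merge.
    static int count(List<Feature> features) {
        int mid = features.size() / 2;
        CompletableFuture<Count> left =
                CompletableFuture.supplyAsync(() -> countPart(features.subList(0, mid)));
        CompletableFuture<Count> right =
                CompletableFuture.supplyAsync(() -> countPart(features.subList(mid, features.size())));
        return left.join().merge(right.join()).getCount();
    }

    // Each task gets its own Count, so no visitor state is shared between threads.
    private static Count countPart(List<Feature> part) {
        Count count = new Count();
        for (Feature feature : part) {
            count.visit(feature);
        }
        return count;
    }
}
```

The merge step is what makes the split safe: each half reports into its own result, and only the finished results are combined.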
Isolating the traversal of the data structure from the reporting of results buys us a lot of freedom. So much freedom that we can just pretend to visit: as long as we produce the right results, nobody will know we generated some SQL on the fly.
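A sketch of pretending to visit, assuming a hypothetical `SmartCollection` that already tracks its size (much as an index or file header might). It answers a `Count` directly and only walks the features for visitors it does not recognize; a database-backed collection could issue `SELECT COUNT(*)` at the same point. The `addCount` method and the `featuresTouched` counter are illustrative additions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a real feature.
class Feature {
    final String id;

    Feature(String id) {
        this.id = id;
    }
}

interface FeatureVisitor {
    void visit(Feature feature);
}

class Count implements FeatureVisitor {
    private int count = 0;

    @Override
    public void visit(Feature feature) {
        count++;
    }

    int getCount() {
        return count;
    }

    // Illustrative addition: lets a collection report many features at once.
    void addCount(int n) {
        count += n;
    }
}

// Hypothetical collection that "pretends" to visit when it knows a shortcut.
class SmartCollection {
    private final List<Feature> features = new ArrayList<>();
    // For demonstration only: how many features were actually walked.
    int featuresTouched = 0;

    void add(Feature feature) {
        features.add(feature);
    }

    void visit(FeatureVisitor visitor) {
        if (visitor instanceof Count) {
            // Shortcut: the size is already known, so skip the traversal.
            // A database-backed collection might run SELECT COUNT(*) here.
            ((Count) visitor).addCount(features.size());
            return;
        }
        // Fallback for visitors we do not recognize: walk every feature.
        for (Feature feature : features) {
            featuresTouched++;
            visitor.visit(feature);
        }
    }
}
```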
Where is this Going?
or even better:
and we can do more fun stuff too:
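For instance, the summary calculations uDig developers asked for (distinct values, min/max) fit the same shape. This is a sketch with hypothetical `UniqueVisitor` and `MinMaxVisitor` classes over a map-backed stand-in `Feature`; the hypothetical `Summaries.visitAll` shows several visitors sharing a single pass over the data:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical map-backed stand-in for a feature with named attributes.
class Feature {
    private final Map<String, Object> attributes = new HashMap<>();

    Feature with(String name, Object value) {
        attributes.put(name, value);
        return this;
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }
}

interface FeatureVisitor {
    void visit(Feature feature);
}

// Collects the distinct values of one attribute (e.g. for a "unique values" style).
// A missing attribute shows up as a null value.
class UniqueVisitor implements FeatureVisitor {
    private final String attribute;
    private final Set<Object> values = new HashSet<>();

    UniqueVisitor(String attribute) {
        this.attribute = attribute;
    }

    @Override
    public void visit(Feature feature) {
        values.add(feature.getAttribute(attribute));
    }

    Set<Object> getValues() {
        return values;
    }
}

// Tracks the range of one numeric attribute (e.g. for ranges or histogram bins).
class MinMaxVisitor implements FeatureVisitor {
    private final String attribute;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    MinMaxVisitor(String attribute) {
        this.attribute = attribute;
    }

    @Override
    public void visit(Feature feature) {
        Object value = feature.getAttribute(attribute);
        if (value instanceof Number) { // non-numeric and missing values are skipped
            double d = ((Number) value).doubleValue();
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
    }

    double getMin() { return min; }

    double getMax() { return max; }
}

class Summaries {
    // One pass over the data feeds every visitor.
    static void visitAll(List<Feature> features, FeatureVisitor... visitors) {
        for (Feature feature : features) {
            for (FeatureVisitor visitor : visitors) {
                visitor.visit(feature);
            }
        }
    }
}
```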
Other stuff that Jody wrote that hasn't been implemented yet:
- replaces FeatureResults getCount() with aFeatureSource.features( filter ).visit( new Count() )
- getBounds() can be replaced the same way
- a Visitor is also easier to optimize if you have multiple processors
- hard to optimize into SQL
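A sketch of a bounds visitor that could stand in for getBounds(), assuming hypothetical point features with plain `x`/`y` coordinates. Real features carry geometries, and a GeoTools version would presumably expand a JTS envelope rather than four doubles:

```java
// Hypothetical point feature; real features carry full geometries.
class Feature {
    final double x;
    final double y;

    Feature(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

interface FeatureVisitor {
    void visit(Feature feature);
}

// Grows a bounding box to include each feature it visits.
class BoundsVisitor implements FeatureVisitor {
    private double minX = Double.POSITIVE_INFINITY;
    private double minY = Double.POSITIVE_INFINITY;
    private double maxX = Double.NEGATIVE_INFINITY;
    private double maxY = Double.NEGATIVE_INFINITY;

    @Override
    public void visit(Feature feature) {
        minX = Math.min(minX, feature.x);
        minY = Math.min(minY, feature.y);
        maxX = Math.max(maxX, feature.x);
        maxY = Math.max(maxY, feature.y);
    }

    // True until at least one feature has been visited.
    boolean isEmpty() {
        return minX > maxX;
    }

    // Returns {minX, minY, maxX, maxY}.
    double[] getBounds() {
        return new double[] {minX, minY, maxX, maxY};
    }
}
```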