A proposal for bringing full FM branch functionality to trunk
Proposal to bring community schema support into Geotools Trunk
I really like the idea of GeoTools being able to work with both feature models in parallel, the current GeoTools legacy one and the new GeoAPI (and thus standard) one, with minimal impact on existing code, both internal and our users' code.
Now that people are having a lot of fun with dependency injection, I think we could draw up a plan for bringing FM to trunk as a parallel type hierarchy. Forking DataStore (instead of extending it indefinitely) would be cleaner, and we would provide the needed factory implementations that work with both models too (Filter, etc.).
In other words, the current infrastructure is widely used as is, and users will be hesitant if their only way to keep using the library is to adapt their applications to the new GeoAPI Feature Model in a short period of time or be left in limbo. We could instead provide both worlds using just OO techniques, and this seems like a good time to do that.
I see no restriction (other than horsepower) on bringing FM into trunk as a model parallel to the current one and, even more exciting, on getting the core layers (data, rendering, and styling) to play nicely with both models.
This proposal introduces the motivation for bringing into trunk a mixture of the functional power of the GeoTools complex-features branch and the revised GeoAPI Feature Model, reviews the history of past attempts, and explains why the current state of the mainstream version of the library seems like the right opportunity to finally succeed.
The proposal itself explains the strategy for doing that, what the implementation should look like, the risks involved, and the need for the GeoTools community to embrace this idea and participate.
Contact: Gabriel Roldán
Assigned to release
An information community consists of a group of people who share information. A specific information community may be characterized by an information model and the catalogue of feature types that are of interest to its members (see ISO 19110, 19126). Agreement on the definitions of the community's set of feature types will facilitate information sharing within the community. 
Yes, this was always the motivation behind the now three failed attempts to bring complex feature support into GeoTools trunk, and it still is. The difference now is that trunk is in better shape to allow functional extensions without necessarily bloating the API.
We want to bring into trunk the functional power of the complex-features branch and its ComplexDataStore, along with the neat Feature Model in the FM branch, without forcing a GeoTools API replacement. Everything that has been working for years should keep working and remain maintainable, with no regressions, and we do not want to impose an "upgrade or die" evolution strategy on the vast amount of code out there that relies on the library.
Also, we really think that the immense effort put into the development of the two branches, by the commercial contributors and, more significantly, by the volunteers from the community who embraced the previous efforts and donated their time and talent, deserves a happy ending.
Finally, we're convinced that what we're proposing is perfectly possible and practicable, and a very smart way of achieving the common goal of enabling information communities to interoperate.
Over almost the last four years of GeoTools history there have been various attempts to make the library's feature model rich enough to represent complex feature models. Though they achieved varying degrees of functional success, they all failed to become part of the core library for almost the same reasons. We learned a lot in the process and, as you'll see in the paragraphs below, we're closer than ever to final success.
Common reason of failure: trying to do too much.
The current mainstream version of GeoTools targets 2.4, which aims to pass QA with an improved factory system and a Filter implementation that is aligned with GeoAPI and able to work over more than just org.geotools.feature.Feature.
This way, the Filter API, and especially Expression, serves as the abstraction layer that allows different data models to interoperate transparently: POJOs, the current GeoTools Feature API, the new GeoAPI Feature API, and anything else you want to write Expression implementations for.
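The idea of Expression as an abstraction layer can be illustrated with a minimal sketch. Everything in it is hypothetical and simplified for illustration: `Expression`, `PropertyName`, `LegacyFeature` and `Road` are stand-ins defined here, not the actual GeoTools or GeoAPI types, and the bean-getter fallback is just one assumed way of reading a POJO.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified types; NOT the GeoTools or GeoAPI classes.
class ExpressionSketch {

    /** Stand-in for an expression: it can be evaluated against any object. */
    interface Expression {
        Object evaluate(Object target);
    }

    /** Stand-in for a legacy feature: attributes are looked up by name. */
    static class LegacyFeature {
        private final Map<String, Object> attributes = new HashMap<>();

        LegacyFeature set(String name, Object value) {
            attributes.put(name, value);
            return this;
        }

        Object getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** A plain Java bean exposing its data through a getter. */
    public static class Road {
        private final String name;

        Road(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    /** Property lookup that knows how to read from each data model. */
    static class PropertyName implements Expression {
        private final String property;

        PropertyName(String property) {
            this.property = property;
        }

        @Override
        public Object evaluate(Object target) {
            if (target instanceof LegacyFeature) {
                return ((LegacyFeature) target).getAttribute(property);
            }
            // Fall back to a bean-style getter, e.g. "name" -> getName()
            String getter = "get" + Character.toUpperCase(property.charAt(0))
                    + property.substring(1);
            try {
                Method method = target.getClass().getMethod(getter);
                return method.invoke(target);
            } catch (ReflectiveOperationException e) {
                return null; // property not available on this object
            }
        }
    }

    public static void main(String[] args) {
        Expression name = new PropertyName("name");
        // The same expression reads from two unrelated data models
        System.out.println(name.evaluate(new LegacyFeature().set("name", "Main St")));
        System.out.println(name.evaluate(new Road("High St")));
    }
}
```

Client code that only holds an `Expression` never needs to know which model it is reading from, which is the property the proposal relies on.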
On the other hand, we're starting to take advantage of the dependency injection pattern, which truly allows different implementations to be used in the same execution environment.
Finally, we need to ensure that regardless of the internals of the library it remains easy to use, and that the new functionality is easily embeddable into client code. This is the reason the uDig Catalog has been backported to GeoTools. The Catalog and Resolve interfaces allow implementors to provide adaptors (actually resolvers) to different interfaces. We'll see how this pattern will allow us to easily embed the new Feature Model.
The proposal is to create an unsupported GeoTools module that, on one side, brings the implementation of the new GeoAPI Feature Model living in the FM branch into trunk and, on the other side, adapts the ComplexDataStore living in the complex-features branch to the new Feature Model.
This way, these two nice pieces of software could finally come home in a non-disruptive way. As a proof of concept, I went ahead and created a module holding only the org.opengis.feature implementation from the FM branch in my checked-out copy of trunk. Surprisingly, it took only an hour and a half of work to get both feature models in the same running VM, with all the unit tests still passing.
As for data handling, we have to tackle both data access and rendering. For data access we'll keep the current GeoTools DataStore API untouched and revise the GeoAPI one as stated above, so any data access implementor can both keep the current working code and provide adaptors to GeoAPI.
Rendering is actually going to be tackled by community members (thanks Jody). Since there isn't a true hard dependency between rendering code and the data model, it is possible to make the dependency between rendering and the GeoTools feature model even looser, as long as Filter/Expression provides the abstraction needed to access data in a generic way. Think of a renderer fetching geometries and properties from a (Feature)Collection using Expressions instead of calling Feature.getAttribute directly.
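A rough sketch of what such a loosely coupled renderer could look like follows. It is an illustration under stated assumptions, not the GeoTools renderer: the `Renderer` class is hypothetical, expressions are modelled as plain `java.util.function.Function` objects, features are plain maps, and "drawing" is reduced to producing a line of text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; NOT the GeoTools rendering API.
class RendererSketch {

    /** The renderer only sees expressions, never a concrete feature class. */
    static class Renderer {
        private final Function<Object, Object> geometry;
        private final Function<Object, Object> label;

        Renderer(Function<Object, Object> geometry, Function<Object, Object> label) {
            this.geometry = geometry;
            this.label = label;
        }

        /** Produces one textual "draw command" per item in the collection. */
        List<String> render(Iterable<?> features) {
            List<String> commands = new ArrayList<>();
            for (Object feature : features) {
                commands.add("draw " + geometry.apply(feature)
                        + " labelled " + label.apply(feature));
            }
            return commands;
        }
    }

    /** Assumed data model for this demo: a feature is just a map. */
    static Map<String, Object> feature(String wkt, String name) {
        Map<String, Object> f = new HashMap<>();
        f.put("the_geom", wkt);
        f.put("name", name);
        return f;
    }

    public static void main(String[] args) {
        // Only these two expressions know how the data model is laid out
        Function<Object, Object> geom = f -> ((Map<?, ?>) f).get("the_geom");
        Function<Object, Object> name = f -> ((Map<?, ?>) f).get("name");

        Renderer renderer = new Renderer(geom, name);
        for (String command : renderer.render(Arrays.asList(feature("POINT (1 2)", "Well")))) {
            System.out.println(command);
        }
    }
}
```

Switching to another feature model would then mean supplying different expressions, leaving the renderer itself unchanged.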
Finally, the existence of both Feature and Data Access type hierarchies must not make the library hard to use. To deal with this, we have the IAdaptable/IResolve pattern mentioned earlier. User code can ask a "resource" to resolve either to an org.geotools.data.DataStore or to an org.opengis.feature.FeatureStore. Perhaps more importantly, it would be easy to provide GeoAPI FeatureStore adaptors for the current DataStore implementations, as well as GeoAPI Feature adaptors for the current GeoTools Feature implementations.
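The resolve-and-adapt idea could be sketched roughly as follows. All names are hypothetical and heavily simplified: `LegacyDataStore`, `NewFeatureStore`, `LegacyToNewAdapter` and `Resource` are defined here for illustration only and do not reproduce the real DataStore, FeatureStore, or IResolve signatures.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the resolve/adapter idea; NOT the real interfaces.
class ResolveSketch {

    /** Stand-in for the existing data access interface. */
    interface LegacyDataStore {
        List<String> getTypeNames();
    }

    /** Stand-in for the new, parallel data access interface. */
    interface NewFeatureStore {
        List<String> getNames();
    }

    /** Adaptor exposing an existing legacy store through the new interface. */
    static class LegacyToNewAdapter implements NewFeatureStore {
        private final LegacyDataStore delegate;

        LegacyToNewAdapter(LegacyDataStore delegate) {
            this.delegate = delegate;
        }

        @Override
        public List<String> getNames() {
            return delegate.getTypeNames();
        }
    }

    /** A "resource" that user code can ask for whichever view it needs. */
    static class Resource {
        private final LegacyDataStore store;

        Resource(LegacyDataStore store) {
            this.store = store;
        }

        boolean canResolve(Class<?> type) {
            return type == LegacyDataStore.class || type == NewFeatureStore.class;
        }

        /** Returns the requested view, or null when it is not supported. */
        <T> T resolve(Class<T> type) {
            if (type == LegacyDataStore.class) {
                return type.cast(store);
            }
            if (type == NewFeatureStore.class) {
                return type.cast(new LegacyToNewAdapter(store));
            }
            return null;
        }
    }

    public static void main(String[] args) {
        LegacyDataStore legacy = () -> Arrays.asList("roads", "rivers");
        Resource resource = new Resource(legacy);

        // Existing applications keep asking for the legacy view...
        System.out.println(resource.resolve(LegacyDataStore.class).getTypeNames());
        // ...while new code asks the same resource for the new view
        System.out.println(resource.resolve(NewFeatureStore.class).getNames());
    }
}
```

In this arrangement the legacy implementation is untouched; only the adaptor and the resolver know about both hierarchies.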
The proposed situation will lead to three user profiles:
- Those who have long-running GeoTools-based applications.
- Those who have GeoTools-based applications and wish to upgrade to the new model.
- Those who are going to build new GeoTools-based applications.
The first group will be happy not to have a huge refactoring imposed on them.
The second group will be able to embrace the benefits of the new model smoothly.
The third group is sure to include a lot of new users, since people have long been coming to the mailing list asking for this functionality.
This approach is to do the transition up front. The first step will be to make the elements of the old feature model (Feature, FeatureType, AttributeType) extensions of interfaces in the new model (SimpleFeature, SimpleFeatureType, AttributeType).
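As a minimal sketch of this first step, the old interface can be re-declared as an extension of the new one so that existing implementations automatically satisfy both. The interfaces below are hypothetical, single-method stand-ins that only borrow the names `Feature` and `SimpleFeature`; the `getValue` method is an assumption made for the example, and the Java 8 default method is used purely for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-method stand-ins; NOT the real GeoTools/GeoAPI interfaces.
class TransitionSketch {

    /** New-model interface (assumed accessor name). */
    interface SimpleFeature {
        Object getValue(String name);
    }

    /** Old-model interface, now declared as an extension of the new one. */
    interface Feature extends SimpleFeature {
        Object getAttribute(String name);

        /** Bridges the new accessor to the old one so legacy semantics are kept. */
        @Override
        default Object getValue(String name) {
            return getAttribute(name);
        }
    }

    /** An existing implementation, written only against the old interface. */
    static class DefaultFeature implements Feature {
        private final Map<String, Object> attributes = new HashMap<>();

        DefaultFeature set(String name, Object value) {
            attributes.put(name, value);
            return this;
        }

        @Override
        public Object getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** New code, written only against the new interface. */
    static Object read(SimpleFeature feature, String name) {
        return feature.getValue(name);
    }

    public static void main(String[] args) {
        Feature legacy = new DefaultFeature().set("name", "Main St");
        // The unchanged legacy implementation is accepted by new-model code
        System.out.println(read(legacy, "name"));
    }
}
```

Where the old and new interfaces declare the same method with incompatible signatures, no such bridge is possible, which is where the API breakage mentioned in this section would come from.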
This approach has a very clear goal: to make the new feature model implementation "turn over" on trunk. And nothing beyond that. Being core to the entire code base, a new feature model has an impact on the data access / data store api. Tackling those issues will not be a goal this time around.
Where possible, this approach will strive to maintain backwards compatibility. In some cases, however, there will be conflicts, so some breakage will be required. Beyond syntax, this approach must also strive to maintain the semantics of the old model. The old model comes with many assumptions about the behaviour and structure of features, and these must be preserved.
- A single feature model, with remnants of the old model existing as an extension of the new model.
- Transition occurs up front, no need to rally the troops to switch over after the fact
- Some api breakage will occur
The following is a check list of the needed work identified to implement this proposal:
- Bring the Feature implementation in FM into an unsupported GeoTools module
- Adapt ComplexDataStore from the unsupported feature model it lives with in the complex-features branch to the new Feature Model
- Make ComplexDataStore use Eclipse's EMF library to load community schemas from XSD files instead of the weak legacy parser it currently uses
- Review the GeoAPI data access interfaces to serve as the parallel interface hierarchy to the GeoTools DataStore, and ensure they capture the semantics of the new Feature Model (i.e., proper use of namespaces; query by "Feature realization" name, not by FeatureType name, i.e. Roads, not RoadsType; etc.)
- Ensure Filter works well over both Feature Models (as a side note, making it work over POJOs is currently in progress)
- Make ComplexDataStore able to wrap both complex and simple feature providers
- Find a better name for ComplexDataStore
Backwards compatibility issues
This proposal is not expected to introduce any backward compatibility issues. Moreover, we argue that the approach taken favours backwards compatibility and ease of maintenance, and provides a smooth evolution path.
While this approach is 100% technically sound, I have some concerns, which I will categorize below.
Effects of Branching
Many efforts in GeoTools have failed due to branching. The reason is that developers on a branch (and I am probably the worst offender) are not constrained in any way. They are essentially let loose into the wild and not subject to the normal checks and balances that people developing on trunk are subject to. The result is usually software that is very nice and performs some very useful function, but is useless to the rest of the codebase because it breaks some rule, or changes some semantic in a way that has a profound impact on the rest of the codebase.
Just recently, the notion of "unsupported" modules has been added to the GeoTools codebase. I believe the proposed usage of an unsupported module is a misuse of the concept. In my mind an unsupported module should be one that is poorly maintained, lacks documentation and test coverage, and is buggy. I don't think it should be a playground for new R&D, especially not R&D that is ultimately supposed to replace a very core part of the GeoTools code base.
Lack of a good Transition Strategy
Any branched development that expects to come home must have a transition strategy of some sort. In my opinion providing a means to transition after the fact is not good enough for a number of reasons.
At the end of an effort, funding is usually dried up
In our world, R&D efforts usually have some sort of target, which usually boils down to a customer wanting some bit of functionality in uDig or GeoServer. And so, as happy developers, we start on phase one, which is the GeoTools portion of the project. We jump through the hoops of creating a branch, or a slice of svn somewhere, and happily develop away. Next comes phase two, the modifications to the app. After that, the customer has what they want and everyone is happy. And then your boss tells you that time and funding are up and it is time for you to start working on something else. So you document your GeoTools work, and provide a means for other developers to transition to it.
Technical superiority is not motivation enough
Part of documenting your work is providing the means of transition. And of course, because what you have done is technically superior to what is on trunk, people will be champing at the bit to switch over, right? Wrong. Porting something to a new bit of functionality takes time and effort. These types of efforts are usually not factored into project plans, so it's hard for developers to justify spending time on them. Doing so requires a round of integration testing and user acceptance testing. Usually the ends don't justify the means and the port does not happen.
Fear that GeoAPI Feature Model is not Stable
Between the first and second attempts at creating a new GeoTools feature model, the GeoAPI model for it changed drastically. The major motivation for basing the feature model on GeoAPI is that it will provide us with useful feedback from people who are interested in a general feature model. However, I have not really seen this feedback, and the GeoAPI feature model has not yet been declared stable.
My fear is that if work is started, people will actually start to review it, and find fundamental flaws, which of course were not initially accounted for. This happening leaves geotools in a sticky place, being better off with the old model.
Posted by jdeolive at Dec 10, 2006 15:36
Oops; even if we can get GeoTools to run both models at the same time, the resolution idea will not work for the poor GeoServer code base... Let me try a quick outline, but real planning will need to be done in consultation with others.
GeoTools 2.4: Start early
GeoTools 2.5: Complex-DataStore Support
GeoServer 1.6 (ie trunk on trunk):
Both of these will stress the geoserver design in different directions
Posted by jive at Dec 14, 2006 23:46