GeoTools : ApplicationMetadataAndComplexTypes

Support of Complex Schemas with Application MetaData (AMD)


The Application Metadata framework developed at SIS allows us to describe simple FeatureTypes using the following metamodel (the names in brackets are the classes we use to store metadata on the FeatureType model):

  • A (Simple)FeatureType (Meta_Type) is composed of an ordered list of AttributeTypes and Associations
  • An AttributeType (Meta_TypeAttr) is basically an Atomic type (String, Date, Timestamp, Float, Integer, Boolean, Geometry) to which we can associate some restrictions (for example using a validity check boolean Expression).
  • An Association or Relationship (Meta_TypeRel) is a link directed from the FeatureType in which it is declared towards the associated FeatureType. The way the Association is implemented is determined by the application.
  • A FeatureType may extend (or inherit from) another FeatureType; the way this extension works is determined by the application.
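
The metamodel above can be pictured with a few small data classes. This is only a sketch: the class names loosely follow Meta_Type, Meta_TypeAttr and Meta_TypeRel, but every field name (members, validity_check, target, extends) and the RoadElement sample are assumptions made for illustration, not the actual AMD classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Atomic types listed in the metamodel description.
ATOMIC_TYPES = {"String", "Date", "Timestamp", "Float", "Integer", "Boolean", "Geometry"}


@dataclass
class MetaTypeAttr:
    """An atomic attribute with an optional restriction (hypothetical fields)."""
    name: str
    atomic_type: str
    validity_check: Optional[str] = None  # boolean expression, kept as plain text here

    def __post_init__(self):
        if self.atomic_type not in ATOMIC_TYPES:
            raise ValueError(f"not an atomic type: {self.atomic_type}")


@dataclass
class MetaTypeRel:
    """A directed association towards another FeatureType (hypothetical fields)."""
    name: str
    target: str  # name of the associated FeatureType


@dataclass
class MetaType:
    """A simple FeatureType: an ordered list of attributes and associations."""
    name: str
    members: List[Union[MetaTypeAttr, MetaTypeRel]] = field(default_factory=list)
    extends: Optional[str] = None  # name of the extended FeatureType, if any


# Illustrative model only; the attribute names are made up.
road = MetaType("RoadElement", [
    MetaTypeAttr("name", "String"),
    MetaTypeAttr("length", "Float", validity_check="length > 0"),
    MetaTypeRel("startJunction", target="Junction"),
])
```

How an association or an extension is actually implemented is left out on purpose, since the text leaves that to the application.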

A model defined using this metamodel is then mapped by the application into FeatureTypes; it can be used to execute createSchema() methods or to associate Validations with FeatureTypes. A more detailed description of the metamodel and of how the model is mapped into FeatureTypes can be found at Meta Information Infrastructure. Please note that the model could also be translated into SQL CREATE TABLE instructions: it is general enough to be translated into different logical models.

At this point, the metamodel is basically an Entity/Relationship model; it was designed to best fit the needs of dealing with RDBMS data, without bothering about nested Features and the other possibilities offered by the GML specification.

As the Feature Model is moving towards supporting ComplexTypes, we're trying to see whether this metadata framework fits it or needs refactoring.


The base for this discussion lies in the following documents:


  • See if our metadata fits with the new FM
  • See if metadata can describe mappings between FeatureTypes (private to public schemas, joins, complex to and from simple types, views...)
  • See if metadata can describe how to build FIDs (declarative approach to FID)
  • Are we interested in building (Complex)Types on the fly for querying purposes?
  • ...

The relationship between Complex and Simple Schemas

It's important to establish a clear relationship between a simple and a complex schema, which probably means deciding how to describe one in terms of the other, or vice versa.

If I understood correctly, in the Feature Model Proposal the simple schema is seen as a simplified view over a complex one, i.e. the SimpleFeature API is an alternative way to access complex content.

With our current implementation of metadata the view is upside down: we should be able to describe complex types using simple ones.

Please note that currently in AMD there's no separation between schema and type. This separation could be the key to achieve Schema mapping. Could we think of applying different Schema Descriptors to the same ComplexType in order to "reshape" it?

Review of Complex schemas - Business driver examples: comparisons with AMD

Let's go through all the examples given in Complex schemas - Business driver examples and see, case by case, whether the AMD model fits or not.
Remember that AMD currently gives a flat view of features.

A feature may have a multi-valued property

Multi-valued properties can be implemented as Features related to the owner Feature (I call them WeakFeatures, following the concept of Weak Entity in ER models): a WeakFeature is always related to one and only one Feature (the owner), and cannot exist without an owner.
The wq_plus type has a multi-valued measurement attribute, which means that measurement is a flat feature related to wq_plus.
How can we render these two related features as a complex one? Either by stating that measurement is a weak entity (an instance cannot exist unless it is related to wq_plus), or by stating that the association between measurement and wq_plus has to be rendered as a multi-valued property.
Currently in AMD we depict a WeakFeature as a Feature having one Association flagged as primary (this means that the attributes used for rendering the association will be part of the primary key or FID).

Please note that multiple instances of a feature attribute don't make sense if they aren't somehow identified in a traceable way: time, geography, meaning, modality... We can always expect a multi-valued property to be made of the property value itself plus one or more attributes that give meaning to that value. In the case of measurement, we have the Date value associated with the measurement itself.
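
As a rough illustration of rendering a WeakFeature as a multi-valued property, the sketch below nests flat measurement rows under their wq_plus owner. The dictionaries stand in for flat Features; the helper name nest, the key names (fid, owner, station, date, value) and the sample values are all assumptions made for this example.

```python
from collections import defaultdict

# Flat owner features and flat weak features pointing at their owner (sample data).
wq_plus = [
    {"fid": "wq_plus.1", "station": "A"},
    {"fid": "wq_plus.2", "station": "B"},
]
measurement = [
    {"owner": "wq_plus.1", "date": "2005-01-10", "value": 7.1},
    {"owner": "wq_plus.1", "date": "2005-02-10", "value": 6.9},
    {"owner": "wq_plus.2", "date": "2005-01-11", "value": 7.4},
]


def nest(owners, weak, owner_key, prop_name):
    """Render weak features as a multi-valued property of their owner."""
    grouped = defaultdict(list)
    for row in weak:
        # The owner reference is implied by nesting, so drop it from the value.
        value = {k: v for k, v in row.items() if k != owner_key}
        grouped[row[owner_key]].append(value)
    # Copy each owner and attach its (possibly empty) list of weak features.
    return [dict(o, **{prop_name: grouped.get(o["fid"], [])}) for o in owners]


complex_features = nest(wq_plus, measurement, "owner", "measurement")
```

Each nested value keeps its date, which is the identifying attribute that gives meaning to the repeated measurement value.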

A feature may have various multi-valued properties

The same as the example above: each multi-valued attribute is a WeakFeature related to the Feature that owns the attribute.

A feature may have multiple geometries

The same as above: one of the attributes of the WeakFeature is a Geometry.
Currently in the GIS-DB world single instances of Multi* geometries are overused, for example to depict the sampling locations as a single instance of a MultiPoint geometry. Real-world cases in which I don't need to differentiate between the individual sample points are quite rare, so using a related WeakFeature sample with a point geometry and other attributes usually seems to be the best solution.

A feature may be defined to include properties from many namespaces.

That's OK. Shall we provide a namespace property for Meta_TypeAttr, too? AMD attributes are always atomic, so they were not considered to be extensible or reusable. We achieved reusability of atomic attributes using TemplateFeatures.

A set of features may be inter-related (bidirectional association)

Currently in AMD the associations are directed, but keep in mind that directionality only refers to association navigation: all associations are in fact bidirectional. In AMD a reverse_name is provided for the association, so that the application is free to render it as a bidirectional association.
Example: a RoadElement starts and ends at a Junction; this is described via the startJunction and endJunction directed associations from RoadElement to Junction.
The application could render these associations as bidirectional, making them accessible from Junction: for example, as enteringRoadElements and exitingRoadElements.
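
A minimal sketch of how an application might derive the reverse side of these associations follows. The mapping from each directed association to its reverse_name is an assumption (a road that starts at a junction is taken to exit it, one that ends there to enter it), and the helper reverse_index and the sample FIDs are hypothetical.

```python
# Flat RoadElement features with their two directed associations (sample data).
road_elements = [
    {"fid": "road.1", "startJunction": "junction.1", "endJunction": "junction.2"},
    {"fid": "road.2", "startJunction": "junction.2", "endJunction": "junction.1"},
    {"fid": "road.3", "startJunction": "junction.2", "endJunction": "junction.3"},
]

# Assumed reverse_name for each association, as seen from Junction.
REVERSE_NAMES = {
    "startJunction": "exitingRoadElements",
    "endJunction": "enteringRoadElements",
}


def reverse_index(features, reverse_names):
    """Build, for every association target, the lists reachable by reverse navigation."""
    index = {}
    for f in features:
        for assoc, reverse_name in reverse_names.items():
            target = f[assoc]
            # Every target gets all reverse properties, even if some stay empty.
            index.setdefault(target, {name: [] for name in reverse_names.values()})
            index[target][reverse_name].append(f["fid"])
    return index


junctions = reverse_index(road_elements, REVERSE_NAMES)
```

Nothing new is stored here: the reverse properties are computed from the directed associations alone, which matches the point that directionality only concerns navigation.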

Many features may share the same feature as a property

This is fully expressed via AMD's associations (relationships). Again, the related Feature can, if desired, be seen as a property of the owner Feature.

Features may not exist except as a result of queries

DataStores are already capable of exposing RDBMS views as if they were "normal" FeatureTypes. The problem is: can we create views within GeoTools using AMD, so that we can have a join between a shapefile and an Oracle table? I suppose that the ability to define Complex types out of simple ones with AMD could also help us in this direction.
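
To make the idea of a view built outside the database more concrete, here is a rough sketch of a join between two feature collections coming from different stores. Plain lists of dictionaries stand in for the shapefile and the Oracle table; the helper join, the attribute names and the values are assumptions, and no GeoTools API is involved.

```python
# Stand-ins for features read from two different stores (sample data).
shapefile_features = [
    {"fid": "parcels.1", "parcel_no": 10, "geom": "POINT (1 2)"},
    {"fid": "parcels.2", "parcel_no": 11, "geom": "POINT (3 4)"},
]
oracle_rows = [
    {"parcel_no": 10, "owner": "Rossi"},
    {"parcel_no": 10, "owner": "Bianchi"},
]


def join(left, right, left_key, right_key, prefix):
    """Inner join of two feature lists on a key, using a hash index on the right side."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[left_key], []):
            row = dict(l)
            # Prefix right-hand attributes to avoid name clashes in the derived type.
            row.update({f"{prefix}{k}": v for k, v in r.items() if k != right_key})
            joined.append(row)
    return joined


view = join(shapefile_features, oracle_rows, "parcel_no", "parcel_no", "ora_")
```

The result is a derived, flat FeatureType; a piece of AMD would still have to declare which key joins the two sources and how the derived attributes are named.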

Multiple columns could be mapped to a multi-value property

This use case is the most complicated. Usually we have to deal with the opposite case, which is simply the creation of a Pivot table.
Moreover, it introduces the need for applying join, projection, union operations.

Let's see how to map one schema into the other one, and vice versa.

flat instance:
<sample gml:id="watersample.1">

complex instance:
<sample gml:id="watersample.1">

Flat to Complex:

In SQL, I would create a measurement view as a UNION query like:

create view measurement as
select watersampleid, 'ph' as parameter, ph as value from watersample UNION
select watersampleid, 'temp' as parameter, temp as value from watersample UNION
select watersampleid, 'turbidity' as parameter, turbidity as value from watersample;

provided that ph, temp and turbidity are all expressed with the same atomic data type (Float, for example).

We should find a way to render this in AMD so that we can build a virtual measurement FeatureType, establish a relation between it and watersample, and finally create a ComplexType rendering watersample with the list of measures as a complex attribute.
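
The flat-to-complex direction can be tried out end to end with an in-memory SQLite database. This is only a sketch under a few assumptions: the sample values are invented, the column temp is quoted because TEMP is an SQLite keyword, and the function complex_sample with its dictionary layout is a hypothetical stand-in for a ComplexType.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat schema: one column per measured parameter ("temp" is quoted: TEMP is a keyword).
create table watersample (
    watersampleid text primary key, ph real, "temp" real, turbidity real);
insert into watersample values ('watersample.1', 7.2, 18.5, 3.0);

-- Virtual measurement type: one row per (sample, parameter) pair.
create view measurement as
select watersampleid, 'ph' as parameter, ph as value from watersample union
select watersampleid, 'temp' as parameter, "temp" as value from watersample union
select watersampleid, 'turbidity' as parameter, turbidity as value from watersample;
""")


def complex_sample(conn, sample_id):
    """Render one watersample with its measures as a multi-valued attribute."""
    rows = conn.execute(
        "select parameter, value from measurement "
        "where watersampleid = ? order by parameter",
        (sample_id,),
    ).fetchall()
    return {
        "id": sample_id,
        "measurement": [{"parameter": p, "value": v} for p, v in rows],
    }


sample = complex_sample(conn, "watersample.1")
```

The view plays the role of the virtual measurement FeatureType, and the function plays the role of the relation between it and watersample.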

Complex to Flat:

Pivoting can be done if, given a table, we can get pairs of values: one for building field names, the other for building field values. The number of resulting fields may not be known until the query is executed, which doesn't fit with the idea of having unmodifiable FeatureTypes.

If the number of possible "field names" (i.e. parameters) is known at design time, we could do the pivoting manually:

create view ph_meas as select watersampleid, value as ph from measurement where parameter = 'ph';
create view temp_meas as select watersampleid, value as temp from measurement where parameter = 'temp';
create view turbidity_meas as select watersampleid, value as turbidity from measurement where parameter = 'turbidity';

create view flat_samples as
select watersample.watersampleid, ph, temp, turbidity
from watersample, ph_meas, temp_meas, turbidity_meas
where watersample.watersampleid = ph_meas.watersampleid
  and watersample.watersampleid = temp_meas.watersampleid
  and ...

In other words, we can perform a simple join of the given views with the watersample table on watersampleid and obtain a flat view.
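
The manual pivot can be checked with an in-memory SQLite database. The sketch below makes several assumptions: the sample rows are invented, each per-parameter view filters on parameter (which seems to be the intent), the elided join condition is taken to be the analogous one on turbidity_meas, and temp is quoted because TEMP is an SQLite keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Complex-style storage: samples plus one measurement row per parameter.
create table watersample (watersampleid text primary key);
create table measurement (watersampleid text, parameter text, value real);
insert into watersample values ('watersample.1');
insert into measurement values
    ('watersample.1', 'ph', 7.2),
    ('watersample.1', 'temp', 18.5),
    ('watersample.1', 'turbidity', 3.0);

-- One view per parameter known at design time.
create view ph_meas as
    select watersampleid, value as ph from measurement where parameter = 'ph';
create view temp_meas as
    select watersampleid, value as "temp" from measurement where parameter = 'temp';
create view turbidity_meas as
    select watersampleid, value as turbidity from measurement where parameter = 'turbidity';

-- Join the per-parameter views back onto the sample to get one flat row.
create view flat_samples as
select watersample.watersampleid, ph, "temp", turbidity
from watersample, ph_meas, temp_meas, turbidity_meas
where watersample.watersampleid = ph_meas.watersampleid
  and watersample.watersampleid = temp_meas.watersampleid
  and watersample.watersampleid = turbidity_meas.watersampleid;
""")

flat = conn.execute("select * from flat_samples").fetchall()
```

Because these are inner joins, a sample that lacks any one of the three parameters drops out of flat_samples; outer joins would be needed to keep it with null values.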

Basically, for mapping one schema into another one, we need the following operations:

  • Filtering
  • Sorting
  • Renaming, or hiding an attribute
  • Adding calculated attributes (expressions) - see the following section
  • Joining Features
  • Grouping and aggregation

I know, I'm trying to reinvent SQL... (smile)

A feature property may be mapped to part of or more than one storage schema

We should add support for attributes which are calculated on the fly using Expressions.

For now, AMD gives us the ability to assign calculation objects to Attributes, so that during editing the result of the calculation, rather than a value entered by the user, is stored in the Feature. This implies that the underlying attribute exists in storage.
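
The difference between a stored calculation result and an attribute calculated on the fly can be sketched as a thin read-only view over a stored feature. The class CalculatedView, the attribute names and the values are assumptions made for illustration; Python callables stand in for Expressions.

```python
class CalculatedView:
    """Read-only view over a stored feature that adds attributes computed on access."""

    def __init__(self, stored, expressions):
        self._stored = stored            # dict standing in for the stored Feature
        self._expressions = expressions  # attribute name -> callable over the stored dict

    def __getitem__(self, name):
        # Calculated attributes take precedence and are evaluated at read time.
        if name in self._expressions:
            return self._expressions[name](self._stored)
        return self._stored[name]

    def keys(self):
        # Stored attributes first, then calculated ones that have no stored counterpart.
        return list(self._stored) + [n for n in self._expressions if n not in self._stored]


road = {"fid": "road.1", "length_m": 1500.0}
view = CalculatedView(road, {"length_km": lambda f: f["length_m"] / 1000})
```

Here length_km never exists in storage: it follows any later change to length_m, whereas a calculation object applied at editing time would need an underlying attribute to hold its result.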

Conclusion and open problems

Although AMD already seems to support many aspects of creating complex features, a lot of work remains to be done:

  • Complex Features could be described in terms of simple Features, i.e. keeping a "FlatFeature AMD model";
  • We need to separate Feature storage issues from Feature Model in AMD;
  • We should provide an intermediate layer - composed of both MetaData and code - which can give different storage solutions to the same logic data model (ComplexFeatures saved in GML versus flat features saved in RDBMS), or, on the other hand, give different Feature views to the same storage.

Basically, FeatureTypes / Features are something returned by a DataStore. An RDBMS-based DataStore is expected to always return flat Features (but Oracle supports nested recordsets...), while a ComplexDataStore may be able to obtain Features from one or more underlying DataStores, combine them using joins, filter them (as an optimization, filtering can be delegated to the underlying DataStore when feasible), add calculated attributes, modify the attribute layout (derived Feature schema), possibly perform groupings and aggregations, and so on.

This means that we should always expect to deal with ComplexFeatures, while SimpleFeatures are just a particular case.
We could describe complex features as aggregates of simple features.

Do we need this operation to be feasible "on the fly" (like performing an arbitrary SQL select on a DB)? If so, WFS should support it; otherwise, the derived FeatureType must be described by some pieces of AMD.

Are we going to need a clear separation between "base FeatureTypes" (i.e. FT physically stored somewhere) and "derived FeatureTypes" (i.e. views, which can be based on base and derived FT)? To me it seems that this is just a DataStore configuration issue.

I'm going to think about this some more, and I hope that the topic is interesting and will stimulate a good discussion among us.