We had a request for a “smaller” demo sometimes around December: the topic was to enable users to select attribute and values to initiate queries against a data set and visualize them in GoogleMaps. This visualized objects where houses in Germany.
To strip the demo down we produced static data files on a server and the user was only able to select one attribute with specific values and then we showed the result. The GUI was done by using Sencha Architect which is based on Sencha ExtJS JS library.
Demo published – product bought. That was worth it.
Now in January we had to rebuild the demo. The dataset grew much more (estimated to cover whole Germany in the autumn of this year). The original plans where to use a relational database (and NOT Gemstone/S – a company decision) (and the relational database was the source of the data) and write the server part in PHP or whatever.
But as a new project it was another good testing candidate for PUM and our whole software infrastructure and I sat down, defined the model, the API and rebuild the system in a few days including the whole stuff like uploading, downloading, user management, project management (access restriction) and the whole GUI.
The system had been finished 14 days before needed.
The interesting part of the project was, that now parts of Gemstone/S were used which I never really needed before: searching and indexing facilities of Gemstone/S were needed here and the project showed the drawbacks of the Gemstone/S system in this area.
After the system had been finished, I did a closer look toward searching and indexing and compared the result of Gemstone/S against SQLite3 – perhaps an unfair comparision, because SQLite3 is a file based system.
The system under SQLite3 had only ONE table – more was not needed. The table itselfs contains the data of the houses and and also shows up an internal hierarchical structure of the objects (4 layers).
On the top layer (such as Germany) Gemstone/S had to handle the same amount of items as SQLite3. Doing queries on this layer the only difference is the raw spped of the system – here SQLite3 was two times faster than the Gemstone/S working on a multi million items containing IdentitySet.
By adding indexes on the table (actually for each attribute – so many indexes were added), SQLite3 performed up to 10 times faster than the raw (not indexing) Gemstone/S. These numbers are only valid for a warmed-up Gemstone/S database.
The situation changed, when one is leaving the top most level. SQLite3 still has to query the whole table (with additional where clauses to find the sub-layers) again and can not improve very much here. In this example the second layer has only 1/128 members of the top layer and now Gemstone/S (unindexed) is good in time with SQLite3 (indexed). Of course SQLite3 users could introduce 128 additional tables (and copy data from the large table to the new table) and woul0d win again – but I assume, that this is not the way, relational users are thinking about solutions.
Playing with indexes showed quite some problems with Gemstone/S. As long as you have only one index you are on the good path. Adding a second (and more) index(es): times got worse – simply because Gemstone/S chooses the wrong index.
Another problem was shown by using the streaming facility: the today implementation does assume, that an index MUST be available on that set, which is queried. But the programming guide also said, that on not so large sets one should not create indexes. So in my hierarchy I have nodes, where the sets are very large and some nodes, where the sets only contain 100 items. The streaming approach can not be used in general on all nodes – thats pretty bad. I would liek to see a more general interface for queries.
Another problem I found was, that the non-streaming approach make an assumption, that the result of a query should only be a small set – and behaves very badly (in milliseconds), when the result set is around 1/3 of the total set. It executes the query and creates a temporary set to hold the items found – adding millions of items takes quite some time here.
So today (with 3.3.3) I am only considering to use one index to use and the streaming approach whenever possible.
The rest of the system is the same (as in all other examples I posted): Gemstone/S 3.3.3, Linux, Apache frontend http, load balancing. Modelling in PUM, Source code gcreation for Topaz (Server) and Sencha ExtJS (client).
On a positive side: Dale got an example application from me,showing all these problems and he is using it in improving the index system in Gemstone/S 3.4 and the first numbers he gave to me looked promising.
After all: the systems for two customers went online yesterday.