In our product REWOO Scope we use a PostgreSQL database as the underlying data store. While PostgreSQL delivers good overall performance, some of our data structures are really graphs. A standard RDBMS is not well suited to them, even with recursive SQL queries in PostgreSQL: our current implementation needs over 2 1/2 minutes to collect the more than 70000 vertices of one particular subgraph. So we invested some time in evaluating OrientDB, an awesome open source graph-document database.
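To make the task concrete, here is a minimal sketch of what "collecting the vertices of a subgraph" means: a breadth-first traversal that follows outgoing edges from a start vertex. The class `SubGraphCollector`, the adjacency map, and the vertex ids are hypothetical and purely illustrative; this is neither our PostgreSQL implementation nor OrientDB code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a directed graph; not REWOO Scope or OrientDB code.
class SubGraphCollector {

    // Returns every vertex reachable from `start` via outgoing edges, including `start` itself.
    static Set<Long> collect(Map<Long, List<Long>> outgoing, long start) {
        Set<Long> visited = new LinkedHashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            long current = queue.poll();
            for (long next : outgoing.getOrDefault(current, List.of())) {
                // add() returns false for vertices seen before, so each vertex is queued once
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> outgoing = new HashMap<>();
        outgoing.put(1L, List.of(2L, 3L));
        outgoing.put(2L, List.of(4L));
        outgoing.put(3L, List.of(4L));
        outgoing.put(5L, List.of(1L)); // vertex 5 points into the subgraph but is not reachable from 1
        System.out.println(collect(outgoing, 1L)); // prints [1, 2, 3, 4]
    }
}
```

A relational database has to emulate this loop with repeated joins, while a graph database can follow the edges directly.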
One of our problems was loading our graph structure, with over 1.3M vertices and almost 2.5M edges, into OrientDB. Inserting the vertices went well, but inserting the edges was a pain: we got an OConcurrentModificationException here or an OutOfMemoryError (perm space) there. After a few implementation iterations we found an approach that works well:
- Create indexes before inserting vertices or edges
- Insert the vertices with OIntentMassiveInsert()
- Insert the edges with disabled Level 1 cache
Our test setup:
- Directed graph with 1.3M vertices and 2.5M edges
- Each vertex has 3 properties (1x String, 2x Long). Edges do not have properties.
- Test machine: i7 8×3.4 GHz, 8 GB RAM, SSD, Ubuntu 13.04, 64-bit
- OrientDB 1.5.0, Java 1.7.0_09 with -Xmx6g
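The load order (index first, then vertices, then edges) can be illustrated with a small in-memory stand-in. It deliberately makes no OrientDB calls, so the massive insert intent and the cache setting are not modelled. The class `BulkLoadSketch` and its field names are hypothetical, and the idea that each edge insert has to look up both endpoints by key is an assumption about why the index should exist up front.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in that mirrors the load order only; it makes no OrientDB calls.
class BulkLoadSketch {

    // One String and two Long properties per vertex; the field names are hypothetical.
    static final class Vertex {
        final String name;
        final long key;
        final long weight;

        Vertex(String name, long key, long weight) {
            this.name = name;
            this.key = key;
            this.weight = weight;
        }
    }

    // Edges carry no properties, only their two endpoints.
    static final class Edge {
        final Vertex from;
        final Vertex to;

        Edge(Vertex from, Vertex to) {
            this.from = from;
            this.to = to;
        }
    }

    // Step 1: the key index exists before any vertex or edge is inserted.
    private final Map<Long, Vertex> indexByKey = new HashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    // Step 2: insert a vertex; the index is updated as part of the insert.
    void insertVertex(String name, long key, long weight) {
        indexByKey.put(key, new Vertex(name, key, weight));
    }

    // Step 3: insert an edge by resolving both endpoints through the index.
    void insertEdge(long fromKey, long toKey) {
        Vertex from = indexByKey.get(fromKey);
        Vertex to = indexByKey.get(toKey);
        if (from == null || to == null) {
            throw new IllegalArgumentException("unknown endpoint: " + fromKey + " -> " + toKey);
        }
        edges.add(new Edge(from, to));
    }

    int vertexCount() { return indexByKey.size(); }

    int edgeCount() { return edges.size(); }

    public static void main(String[] args) {
        BulkLoadSketch graph = new BulkLoadSketch();
        graph.insertVertex("a", 1L, 10L);
        graph.insertVertex("b", 2L, 20L);
        graph.insertVertex("c", 3L, 30L);
        graph.insertEdge(1L, 2L);
        graph.insertEdge(2L, 3L);
        System.out.println(graph.vertexCount() + " vertices, " + graph.edgeCount() + " edges");
    }
}
```

Without the index, every edge insert would need a scan over all vertices to find its endpoints, which is the kind of cost that grows painful at 2.5M edges.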
With this approach it takes about 100 seconds to load the 1.3M vertices (0.07 ms/vertex) and about 370 seconds to load the roughly 2.5M edges (0.15 ms/edge). The graph database needs about 450 MB of disk space. BTW: the final graph traversal of 70000 vertices took about 4 seconds, which is a very good result compared with our current implementation.
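The per-item figures follow directly from the totals. A small sketch of the arithmetic, using the rounded totals quoted above (the class and method names are hypothetical, and 150 seconds is only the lower bound implied by "over 2 1/2 minutes"):

```java
import java.util.Locale;

// Recomputes the per-item timings from the rounded totals quoted in the text.
class LoadTimings {

    // Average time per item in milliseconds.
    static double millisPerItem(double totalSeconds, double items) {
        return totalSeconds * 1000.0 / items;
    }

    // How many times faster the new duration is compared with the old one.
    static double speedup(double oldSeconds, double newSeconds) {
        return oldSeconds / newSeconds;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.3f ms/vertex", millisPerItem(100, 1_300_000)));
        System.out.println(String.format(Locale.ROOT, "%.3f ms/edge", millisPerItem(370, 2_500_000)));
        // 150 s is a lower bound for the old traversal, so this is a minimum speedup.
        System.out.println(String.format(Locale.ROOT, "at least %.1fx faster traversal", speedup(150, 4)));
    }
}
```

With the rounded totals this gives roughly 0.077 ms per vertex and 0.148 ms per edge, in line with the quoted figures, and a traversal that is at least about 37 times faster than our current implementation.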
- PerformanceTuning – Use_the_Massive_Insert_intent
- Bulk insert of a massive graph
For dummy source code read the full blog entry.