Massive Graph Insert with OrientDB

In our product REWOO Scope we use a PostgreSQL database as the underlying data store. While PostgreSQL delivers good overall performance, some of our data structures are more graph-like. A standard RDBMS is not a good fit here, not even with recursive SQL queries in PostgreSQL. Our current implementation needs over 2 1/2 minutes to collect over 70,000 vertices of one particular subgraph. So we invested some time in evaluating OrientDB, an awesome open source graph-document database.
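
To make the workload concrete: collecting a subgraph means following the outgoing edges from a start vertex until no new vertices show up. The following is only a minimal in-memory sketch of that idea in plain Java, not our production code. The class name `SubGraphCollector`, the adjacency map and the toy graph are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SubGraphCollector {

    /** Returns all vertices reachable from start (including start) in visiting order. */
    static Set<Long> collect(Map<Long, List<Long>> outgoing, Long start) {
        Set<Long> visited = new LinkedHashSet<Long>();
        Deque<Long> queue = new ArrayDeque<Long>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            Long current = queue.poll();
            List<Long> targets = outgoing.get(current);
            if (targets == null) {
                continue; // vertex without outgoing edges
            }
            for (Long target : targets) {
                if (visited.add(target)) { // add() returns false for vertices seen before
                    queue.add(target);
                }
            }
        }
        return visited;
    }

    /** Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 4, 4 -> 1 (cycle), 5 -> 1. */
    static Map<Long, List<Long>> sampleGraph() {
        Map<Long, List<Long>> outgoing = new HashMap<Long, List<Long>>();
        outgoing.put(1L, Arrays.asList(2L, 3L));
        outgoing.put(2L, Arrays.asList(4L));
        outgoing.put(4L, Arrays.asList(1L));
        outgoing.put(5L, Arrays.asList(1L));
        return outgoing;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4]
        System.out.println(collect(sampleGraph(), 1L));
    }
}
```

The visited set is what keeps cycles from looping forever. Vertex 5 is not part of the result because it only points into the subgraph and is never reached from vertex 1.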

One of our problems was loading our graph structure with over 1.3M vertices and almost 2.5M edges into OrientDB. Inserting the vertices went well, but inserting the edges was a pain. On top of that, we ran into OConcurrentModificationException in some runs and OutOfMemoryError (perm space) in others. After several implementation iterations we found an approach that works well:

  • Create indexes before inserting vertices or edges
  • Insert the vertices with OIntentMassiveInsert()
  • Insert the edges with the level 1 cache disabled

Our setting

  • Directed graph with 1.3M vertices and 2.5M edges
  • Each vertex has 3 properties (1x String, 2x Long). Edges do not have properties.
  • Test machine: i7 8×3.4 GHz, 8 GB RAM, SSD, Ubuntu 13.04, 64-bit
  • OrientDB 1.5.0, Java 1.7.0_09 with -Xmx6g

Now it takes about 100 seconds to insert the 1.3M vertices (0.07 ms/vertex) and about 370 seconds to insert the 2.5M edges (0.15 ms/edge). The graph database needs about 450 MB of disk space. BTW: the final graph traversal of 70,000 vertices took about 4 seconds, which is a very good result compared with the more than 2 1/2 minutes of our current implementation.
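
The per-item figures are simply the total time divided by the number of items. A small sketch of that arithmetic, using the rounded totals from above (the class and method names are made up for illustration):

```java
import java.util.Locale;

class InsertTimings {

    /** Average time per item in milliseconds. */
    static double millisPerItem(double totalSeconds, long items) {
        return totalSeconds * 1000.0 / items;
    }

    public static void main(String[] args) {
        // Totals are the rounded values quoted in the text, so the results are approximate.
        System.out.printf(Locale.ROOT, "vertices: %.3f ms each%n", millisPerItem(100, 1300000L));
        System.out.printf(Locale.ROOT, "edges:    %.3f ms each%n", millisPerItem(370, 2500000L));
    }
}
```

With the rounded totals this gives roughly 0.077 ms per vertex and 0.148 ms per edge, so an edge costs about twice as much as a vertex.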

Dummy source code:

package com.rewoo.graph;

import com.orientechnologies.orient.core.config.OGlobalConfiguration;
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.id.ORID;
import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;
import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.metadata.schema.OType;
import com.orientechnologies.orient.core.record.impl.ODocument;

import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface IVertex {
    public String getProperty1();
    public Long getProperty2();
}

interface IEdge {
    public IVertex getSource();
    public IVertex getTarget();
}

interface IGraph {
    public List<IVertex> getVertexList();
    public List<IEdge> getEdgeList();
}

public class OrientGraphLoader {
    public static final String PROPERTY1 = "property1";
    public static final String PROPERTY2 = "property2";

    OGraphDatabase orientGraph;

    public static void main(String[] args) {
        String location = "/tmp/orientdb";

        OrientGraphLoader loader = new OrientGraphLoader();
        loader.open(location);
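        // Dummy placeholder: read or build your own IGraph implementation here,
        // otherwise load() fails with a NullPointerException.
        IGraph graph = null;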
        loader.load(graph);
        loader.close();
    }

    public void open(String location) {
        close();
        orientGraph = new OGraphDatabase("local:" + location);
        if (new File(location).exists()) {
            orientGraph.open("admin", "admin");
        } else {
            orientGraph.create();
        }
    }

    public void close() {
        if (orientGraph != null && !orientGraph.isClosed()) {
            orientGraph.close();
            orientGraph = null;
        }
    }

    public void load(IGraph g) {
        createIndex();
        Map<IVertex, ORID> vertexMap = insertVertices(g.getVertexList());
        insertEdges(g.getEdgeList(), vertexMap);
    }

    protected void createIndex() {
        OClass v = orientGraph.getVertexType("V");
        v.createProperty(PROPERTY1, OType.STRING);
        v.createProperty(PROPERTY2, OType.LONG);
        v.createIndex(PROPERTY1 + "_idx", OClass.INDEX_TYPE.NOTUNIQUE, PROPERTY1);
        v.createIndex(PROPERTY2 + "_idx", OClass.INDEX_TYPE.NOTUNIQUE, PROPERTY2);
    }

    /**
     * Insert vertices in batch mode
     *
     * @link http://code.google.com/p/orient/wiki/PerformanceTuningGraph
     * @link http://code.google.com/p/orient/wiki/PerformanceTuning#Massive_Insertion
     */
    private Map<IVertex, ORID> insertVertices(List<IVertex> vertices) {
        Map<IVertex, ORID> vertexMap = new HashMap<IVertex, ORID>();

        orientGraph.declareIntent(new OIntentMassiveInsert());

        ODocument doc = orientGraph.createVertex();
        String vertexClass = doc.getClassName();

        for (IVertex vertex : vertices) {
            doc.reset();
            doc.setClassName(vertexClass);
            doc.field(PROPERTY1, vertex.getProperty1());
            doc.field(PROPERTY2, vertex.getProperty2());
            doc.save();
            vertexMap.put(vertex, doc.getIdentity().copy());
        }

        orientGraph.declareIntent(null);

        return vertexMap;
    }

    /**
     * Insert edges with the level 1 cache disabled
     */
    private void insertEdges(List<IEdge> edges, Map<IVertex, ORID> vertexMap) {
        OGlobalConfiguration.CACHE_LEVEL1_ENABLED.setValue(false);

        try {
            for (IEdge edge : edges) {
                // Look up both endpoints by the record ids stored during vertex insertion
                ODocument source = orientGraph.load(vertexMap.get(edge.getSource()));
                ODocument target = orientGraph.load(vertexMap.get(edge.getTarget()));

                orientGraph.createEdge(source, target).save().unpin();
            }
        } finally {
            // Re-enable the cache even if an insert fails
            OGlobalConfiguration.CACHE_LEVEL1_ENABLED.setValue(true);
        }
    }

}
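
The loader only sees the three interfaces, so you have to bring your own implementations. A minimal in-memory sketch could look like the following. The classes `SimpleVertex`, `SimpleEdge` and `SimpleGraph` are hypothetical and not part of our code base. The interfaces are repeated here, with the list element types spelled out, so that the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Interfaces repeated from the listing so this block compiles on its own.
interface IVertex { String getProperty1(); Long getProperty2(); }
interface IEdge { IVertex getSource(); IVertex getTarget(); }
interface IGraph { List<IVertex> getVertexList(); List<IEdge> getEdgeList(); }

/** Hypothetical vertex. equals/hashCode matter because the loader uses vertices as HashMap keys. */
final class SimpleVertex implements IVertex {
    private final String property1;
    private final Long property2;

    SimpleVertex(String property1, Long property2) {
        this.property1 = property1;
        this.property2 = property2;
    }

    public String getProperty1() { return property1; }
    public Long getProperty2() { return property2; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimpleVertex)) return false;
        SimpleVertex other = (SimpleVertex) o;
        return property1.equals(other.property1) && property2.equals(other.property2);
    }

    @Override
    public int hashCode() {
        return 31 * property1.hashCode() + property2.hashCode();
    }
}

/** Hypothetical directed edge without properties. */
final class SimpleEdge implements IEdge {
    private final IVertex source;
    private final IVertex target;

    SimpleEdge(IVertex source, IVertex target) {
        this.source = source;
        this.target = target;
    }

    public IVertex getSource() { return source; }
    public IVertex getTarget() { return target; }
}

/** Hypothetical graph that keeps vertices and edges in plain lists. */
final class SimpleGraph implements IGraph {
    private final List<IVertex> vertices = new ArrayList<IVertex>();
    private final List<IEdge> edges = new ArrayList<IEdge>();

    IVertex addVertex(String property1, long property2) {
        IVertex v = new SimpleVertex(property1, property2);
        vertices.add(v);
        return v;
    }

    void addEdge(IVertex source, IVertex target) {
        edges.add(new SimpleEdge(source, target));
    }

    public List<IVertex> getVertexList() { return vertices; }
    public List<IEdge> getEdgeList() { return edges; }

    /** Toy graph a -> b -> c. */
    static SimpleGraph sample() {
        SimpleGraph g = new SimpleGraph();
        IVertex a = g.addVertex("a", 1L);
        IVertex b = g.addVertex("b", 2L);
        IVertex c = g.addVertex("c", 3L);
        g.addEdge(a, b);
        g.addEdge(b, c);
        return g;
    }
}
```

In `main` you would then pass something like `SimpleGraph.sample()` to `loader.load(...)` instead of `null`. Because `insertEdges` looks up record ids via `vertexMap.get(edge.getSource())`, the source and target of an edge must be equal (in the `equals`/`hashCode` sense) to the vertices that were inserted before.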
2 thoughts on “Massive Graph Insert with OrientDB”

    • On Unix systems you will find the orientdb folder in the /tmp directory. Adjust the location variable accordingly on Windows systems.

      Do not forget to adapt and implement the interfaces of IVertex, IEdge, and IGraph.
