What is OLAP traversal in a graph database?
OLAP stands for Online Analytical Processing. In a graph database, it is a way to traverse the graph in parallel as a batch operation.
JanusGraph OLAP traversal uses distributed graph processing by leveraging the Gremlin plugins for Apache Hadoop and Apache Spark.
For more information on this topic, please refer to the link below:
JanusGraph with TinkerPop’s Hadoop-Gremlin - JanusGraph
The Problem
We had a working JanusGraph 0.5.2 setup in which we could insert and query data (OLTP) as needed. We were exploring JanusGraph OLAP traversal for some reporting and analytical requirements. However, when we followed the instructions in the JanusGraph documentation, we were not able to connect to Cassandra with SSL enabled when traversing the graph in OLAP mode through Gremlin queries. Cassandra was set up for SSL connections, and clients were expected to present a truststore with their connection requests. OLTP queries, the regular way of working with the graph, worked fine and in line with the official documentation.
Below is the working OLTP configuration (janusgraph-cql-oltp.properties):
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=cassandra.cassandra.svc.cluster.local
storage.username=cassandra
storage.password=cassandra123
storage.cql.keyspace=janusgraph
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.truststore.password=secretpasswd
When we loaded this file in the Gremlin console and ran a simple traversal, we got the expected results.
Below is the OLAP configuration, which fails to connect to Cassandra with SSL enabled:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
# JanusGraph Cassandra InputFormat configuration
# These properties define the connection settings that were used when writing data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=cassandra.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassandra123
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.lock.wait-time = 60000
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123
janusgraphmr.ioformat.conf.storage.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.ssl.truststore.location=/etc/config/tls/truststore
janusgraphmr.ioformat.conf.storage.ssl.truststore.password=cassandra123
janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE
storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.client-authentication-enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.truststore.password=cassandra123
janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true
# SparkGraphComputer configuration
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
When we loaded the graph object in the Gremlin console, we could see that the properties were loaded correctly. But when we traversed the graph as described in the documentation, we got a Cassandra connection error related to the SSL configuration.
gremlin> graph=HadoopGraph.open('/janusgraph-full-0.5.2/conf/olap.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> graph.configuration()
// all the properties from the file are shown as loaded here
gremlin> g.V().limit(1)
07:34:44 WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.cassandra.svc.cluster.local/10.0.165.158:9042 (com.datastax.driver.core.exceptions.TransportException: [cassandra.cassandra.svc.cluster.local/10.0.165.158:9042] Connection has been closed))
Type ':help' or ':h' for help.
We could verify from the Cassandra logs that a connection was attempted but the request was rejected for SSL reasons. Below are the logs from the Cassandra instance:
INFO [epollEventLoopGroup-2-4] 2023-05-02 07:34:58,809 Message.java:826 - Unexpected exception during request; channel = [id: 0xeb0e017f, L:/10.12.0.224:9042 ! R:/10.12.0.135:60316]
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0400000001000000500003000b43514c5f56455253494f4e0005332e302e30000e4452495645525f56455253494f4e0005332e392e30000b4452495645525f4e414d4500144461746153746178204a61766120447269766572
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1057) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411) [netty-all-4.0.44.Final.jar:4.0.44.Final]
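The hex payload in the NotSslRecordException is itself a useful clue. The following is a minimal Python sketch that decodes it, assuming the bytes follow the CQL native protocol v4 frame layout (a 9-byte header followed by a string map). The function name parse_cql_startup is our own and is not part of any driver.

```python
import struct

# Hex payload copied from the NotSslRecordException message in the Cassandra log.
PAYLOAD_HEX = (
    "0400000001000000500003000b43514c5f56455253494f4e0005332e302e30"
    "000e4452495645525f56455253494f4e0005332e392e30"
    "000b4452495645525f4e414d4500144461746153746178204a61766120447269766572"
)


def parse_cql_startup(frame: bytes):
    """Decode a CQL v4 frame header and a string-map body (assumed layout)."""
    # Header: version (1 byte), flags (1), stream id (2), opcode (1), body length (4).
    version, _flags, _stream, opcode, length = struct.unpack(">BBhBI", frame[:9])
    body = frame[9:9 + length]

    # Body: 2-byte entry count, then length-prefixed key and value strings.
    (count,) = struct.unpack(">H", body[:2])
    pos = 2
    options = {}

    def read_string():
        nonlocal pos
        (size,) = struct.unpack(">H", body[pos:pos + 2])
        text = body[pos + 2:pos + 2 + size].decode("utf-8")
        pos += 2 + size
        return text

    for _ in range(count):
        key = read_string()
        options[key] = read_string()
    return version, opcode, options


version, opcode, options = parse_cql_startup(bytes.fromhex(PAYLOAD_HEX))
print(version, opcode, options)
```

The payload decodes to protocol version 4, opcode 1 (STARTUP in the CQL protocol), and the options CQL_VERSION=3.0.0, DRIVER_VERSION=3.9.0 and DRIVER_NAME=DataStax Java Driver. In other words, the client opened the connection with a plaintext CQL handshake, which suggests that the SSL settings in the properties file never reached the connection created for the OLAP read.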
Finally found the missing piece
After trying several combinations to pass the SSL information in the connection configuration, we were still not able to establish a connection with Cassandra and successfully execute an OLAP query.
We posted this as a question on Stack Overflow, the Discord channel, and Google Groups, hoping to receive some help from the community. We finally got a response from a member of the Discord community, and it worked. The Discord channel for JanusGraph and Gremlin users is quite active. The configuration parameters needed for the SSL connection were not mentioned in the documentation. They are present in the code, and the reference is below. These parameters work with newer versions of JanusGraph, however; we verified them with versions 0.6.0 and 1.0.0-rc2.
The OLAP connection configuration was updated with the entries below:
cassandra.input.native.ssl.trust.store.password=cassandra123
The final, updated OLAP traversal configuration looks like this:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=cassandra-headless.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassa@2@2!
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/tmp/security/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123
storage.cql.read-consistency-level=ONE
janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.native.keep.alive=true
cassandra.input.native.ssl.trust.store.path=/tmp/security/truststore
cassandra.input.native.ssl.trust.store.password=cassa@2@2!
storage.cql.protocol-version=V4
spark.master=local[*]
spark.executor.memory=3g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
spark.cassandra.input.fetch.size_in_rows=500
With the above configuration, we were able to traverse the graph in OLAP mode and achieve our objective.
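To guard against the same omission in future, a small check can be run over an OLAP properties file before opening it in the Gremlin console. The following is a minimal Python sketch and not part of JanusGraph: the function names load_properties and missing_ssl_keys are hypothetical, and the parser only handles simple key=value lines. The property names are the ones used in the configurations in this post.

```python
# Property names taken from the configurations shown in this post.
SSL_FLAG = "janusgraphmr.ioformat.conf.storage.cql.ssl.enabled"
REQUIRED_WHEN_SSL = (
    "cassandra.input.native.ssl.trust.store.path",
    "cassandra.input.native.ssl.trust.store.password",
)


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_ssl_keys(props):
    """Return the cassandra.input.native.ssl.* keys that are absent while SSL is enabled."""
    if props.get(SSL_FLAG, "false").lower() != "true":
        return []
    return [key for key in REQUIRED_WHEN_SSL if key not in props]


# A fragment shaped like the first (failing) OLAP configuration.
sample = """
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/truststore
"""
print(missing_ssl_keys(load_properties(sample)))
```

For the sample fragment, the check reports both cassandra.input.native.ssl.trust.store.* keys as missing. Once they are added, as in the final configuration, it returns an empty list.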