How to connect Tableau to Cassandra using Spark SQL

Tableau can connect to Cassandra directly through an ODBC driver, but that route is not efficient. For better performance, connect through the Spark SQL Thrift server (SST), a server that exposes JDBC and ODBC interfaces to client tools such as Tableau.

First, install a Cassandra cluster and a Spark cluster, and connect them with the DataStax Spark Cassandra Connector:

Cassandra connector

To connect Spark to a Cassandra cluster, the Cassandra connector needs to be added to Spark. DataStax publishes its Spark Cassandra Connector on GitHub, and that is the one we will use:

  1. Clone the Spark Cassandra Connector repository: https://github.com/datastax/spark-cassandra-connector (make sure the version you build is compatible with your Spark version).
  2. cd into "spark-cassandra-connector".
  3. Build the Spark Cassandra Connector by running the command "./sbt/sbt assembly".
  4. The build should write the compiled jar files under the directory named "target". There will be two jar files, one for Scala and one for Java.
  5. The jar we are interested in is the Scala one: "spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar".
  6. Move the jar file into an easy-to-find directory. I put mine into /data/spark/, the path used in the configuration below.
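
The build steps above can be sketched as a small script. This is only an illustration under a few assumptions: the `run` helper, the `DRY_RUN` switch and the `JAR_DIR` variable are names made up for this sketch, and the exact jar location under "target" may differ between connector versions, so the script searches for it instead of hard-coding a path. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the build steps. By default the commands are only printed
# (DRY_RUN=1); run with DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
JAR_DIR="${JAR_DIR:-/data/spark}"   # assumed destination for the jar

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run git clone https://github.com/datastax/spark-cassandra-connector || exit 1
run cd spark-cassandra-connector || exit 1
# Check out the branch or tag that matches your Spark version here.
run ./sbt/sbt assembly || exit 1
run mkdir -p "$JAR_DIR"
# Copy every assembly jar found under a "target" directory.
run find . -path '*/target/*' -name '*assembly*.jar' -exec cp {} "$JAR_DIR/" ';'
```

Running it with `DRY_RUN=0 JAR_DIR=/data/spark sh build-connector.sh` (the script name is arbitrary) would perform the clone, build and copy for real.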

Spark configuration

Add the following lines to spark-defaults.conf:
spark.driver.extraClassPath /data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar
spark.executor.extraClassPath /data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar
spark.cassandra.connection.host cassandra_host1, cassandra_host2
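
If you prefer to script this step, a helper along the following lines could append the three settings. It is a sketch, not part of Spark: the function name `append_connector_conf` and its arguments are hypothetical, and it assumes you pass the path of your own spark-defaults.conf. It skips the append when a spark.cassandra.connection.host line is already present, so running it twice does not duplicate the entries.

```shell
#!/bin/sh
# append_connector_conf CONF_FILE JAR_PATH HOSTS
# Appends the connector settings to CONF_FILE unless a
# spark.cassandra.connection.host line is already there.
append_connector_conf() {
  conf_file="$1"
  jar_path="$2"
  hosts="$3"
  if grep -q '^spark\.cassandra\.connection\.host' "$conf_file" 2>/dev/null; then
    echo "connector settings already present in $conf_file"
    return 0
  fi
  cat >> "$conf_file" <<EOF
spark.driver.extraClassPath $jar_path
spark.executor.extraClassPath $jar_path
spark.cassandra.connection.host $hosts
EOF
}

# Example call (paths are assumptions, adjust to your installation):
# append_connector_conf "$SPARK_HOME/conf/spark-defaults.conf" \
#   /data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar \
#   cassandra_host1,cassandra_host2
```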

Start Spark and the Spark Thrift server

./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.bind.host=master_host \
  --hiveconf hive.server2.thrift.port=10000 \
  --jars /data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar \
  --driver-class-path /data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar
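
Once the server is up, a quick way to check that it accepts connections is to run a single statement through Beeline from the shell. The sketch below assumes the host and port from the command above and that `beeline` is on your PATH; the `THRIFT_HOST`, `THRIFT_PORT` and `JDBC_URL` variables are names chosen for this example.

```shell
#!/bin/sh
# Assumed values, matching the start command above.
THRIFT_HOST="${THRIFT_HOST:-master_host}"
THRIFT_PORT="${THRIFT_PORT:-10000}"
JDBC_URL="jdbc:hive2://${THRIFT_HOST}:${THRIFT_PORT}"

if command -v beeline >/dev/null 2>&1; then
  # -u gives the JDBC URL, -e runs one statement and exits.
  beeline -u "$JDBC_URL" -e "SHOW TABLES;"
else
  echo "beeline not found on PATH (Spark ships it as bin/beeline); URL would be $JDBC_URL"
fi
```

If the statement returns a result instead of a connection error, the Thrift server is listening on the expected host and port.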

Start Cassandra

Connect to the Spark Thrift server with Beeline

beeline
!connect jdbc:hive2://master_host:10000
CREATE TABLE users_table USING org.apache.spark.sql.cassandra OPTIONS (cluster 'ctest', keyspace 'ktest', table 'users');
cache table users_table;

Create a permanent table:

create table users_per as select * from users_table;
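
The same session can also be run non-interactively, which is handy if the tables have to be registered again after a restart. The sketch below writes the statements to a file and passes it to Beeline with `-f`; the file name `register_users.sql` and the `SQL_FILE` and `JDBC_URL` variables are assumptions made for this example, and the cluster, keyspace and table names are the ones used above.

```shell
#!/bin/sh
SQL_FILE="${SQL_FILE:-register_users.sql}"                # assumed file name
JDBC_URL="${JDBC_URL:-jdbc:hive2://master_host:10000}"    # assumed host and port

# Write the statements from the interactive session to a file.
cat > "$SQL_FILE" <<'EOF'
CREATE TABLE users_table USING org.apache.spark.sql.cassandra OPTIONS (cluster 'ctest', keyspace 'ktest', table 'users');
CACHE TABLE users_table;
CREATE TABLE users_per AS SELECT * FROM users_table;
EOF

if command -v beeline >/dev/null 2>&1; then
  # -f runs every statement in the file, then exits.
  beeline -u "$JDBC_URL" -f "$SQL_FILE"
else
  echo "beeline not found on PATH; statements written to $SQL_FILE"
fi
```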

Start Tableau and select Spark SQL as the connection type

Query users_per using Tableau

Useful links

http://www.tableau.com/fr-fr/about/blog/2014/10/tableau-spark-sql-big-data-just-got-even-more-supercharged-33799
http://www.planetcassandra.org/blog/kindling-an-introduction-to-spark-with-cassandra/
https://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/spark/sparkJdbcBeeline.html

Mohammed LABDOUI

Comments

  1. Hello, very interesting post. What do you mean by efficiency? Are you referring to response time? What do you think could be the reason for that inefficiency?

    Thanks for your response
