How to connect Tableau to Cassandra using Spark SQL?

Tableau can connect to Cassandra directly through an ODBC driver, but that is not efficient. For better performance, connect it through the Spark SQL Thrift server (SST), a JDBC/ODBC server that exposes JDBC and ODBC interfaces to clients such as Tableau.

First, you need to install a Cassandra cluster and a Spark cluster, connected through the DataStax Spark Cassandra Connector:

Cassandra connector

To connect Spark to a Cassandra cluster, the Cassandra Connector needs to be added to the Spark project. DataStax provides its own Cassandra Connector on GitHub, and we will use that:

  1. Clone the Spark Cassandra Connector repository (make sure the version you check out is compatible with your Spark version).
  2. cd into “spark-cassandra-connector”.
  3. Build the Spark Cassandra Connector by executing the command “./sbt/sbt assembly”.
  4. This should output compiled jar files to the directory named “target”. There will be two jar files, one for Scala and one for Java.
  5. The jar we are interested in is the one for Scala: “spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar”.
  6. Move the jar file into an easy-to-find directory: I put mine into ~/data/spark/
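
The steps above can be sketched as a small shell script. This is only a sketch under assumptions: the repository URL is the public DataStax one on GitHub, `/data/spark` is taken from the configuration lines later in this post, and the `run` helper and `DRY_RUN` variable are hypothetical names introduced here so that commands are printed rather than executed by default.

```shell
#!/bin/sh
# Dry-run sketch of the build steps. Commands are only printed unless DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"
REPO_URL="https://github.com/datastax/spark-cassandra-connector.git"
JAR_DIR="${JAR_DIR:-/data/spark}"   # assumption: same directory as in spark-defaults.conf

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_connector() {
  run git clone "$REPO_URL"
  run cd spark-cassandra-connector
  # Check out a tag or branch matching your Spark version here (not shown).
  run ./sbt/sbt assembly
  run mkdir -p "$JAR_DIR"
  # The exact subdirectory under target/ varies, so search for the assembly jar.
  run find . -name 'spark-cassandra-connector-assembly-*.jar' -exec cp {} "$JAR_DIR/" ';'
}

build_connector
```

Run it once as is to review the printed commands, then again with `DRY_RUN=0` to execute them.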

Spark configuration

Add the following lines to spark-defaults.conf, adjusting the path to wherever you placed the jar:
spark.driver.extraClassPath /data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar
spark.executor.extraClassPath /data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar
spark.cassandra.connection.host cassandra_host1,cassandra_host2

Start Spark and the Spark Thrift server

./sbin/ --hiveconf master_host --hiveconf
hive.server2.thrift.port 10000 --jars
/data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar --driver-class-path
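
The command above appears to have lost some pieces (the script name, a property name, and the last argument). A minimal sketch of what a complete invocation might look like follows. It assumes the script is Spark's `sbin/start-thriftserver.sh`, that `master_host` is meant as the value of `hive.server2.thrift.bind.host`, and that `--driver-class-path` points at the same connector jar; none of these is stated in the original. `SPARK_HOME=/opt/spark`, `run`, and `DRY_RUN` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the Thrift server start command unless DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"
SPARK_HOME="${SPARK_HOME:-/opt/spark}"       # hypothetical install location
MASTER_HOST="${MASTER_HOST:-master_host}"    # placeholder host name from the post
CONNECTOR_JAR="${CONNECTOR_JAR:-/data/spark/spark-cassandra-connector-assembly-1.6.0-M2-11-g088f482.jar}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

start_thrift_server() {
  # Assumption: bind host and port are passed as key=value Hive properties.
  run "$SPARK_HOME/sbin/start-thriftserver.sh" \
    --hiveconf "hive.server2.thrift.bind.host=$MASTER_HOST" \
    --hiveconf "hive.server2.thrift.port=10000" \
    --jars "$CONNECTOR_JAR" \
    --driver-class-path "$CONNECTOR_JAR"
}

start_thrift_server
```

Review the printed command first, then set `DRY_RUN=0` to launch the server.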

Start Cassandra

Connect to the Spark Thrift server with Beeline

!connect jdbc:hive2://master_host:10000
CREATE TABLE users_table USING org.apache.spark.sql.cassandra OPTIONS (cluster 'ctest', keyspace 'ktest', table 'users');
cache table users_table;
Create a permanent table:
create table users_per as select * from users_table;
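
If you prefer not to type the statements interactively, the same session can be scripted. The sketch below writes the statements to a file and prints a Beeline command that uses its `-u` (JDBC URL) and `-f` (script file) options. The file name `cassandra_tables.sql` is a hypothetical choice, and `master_host`, `ctest`, `ktest`, and `users` are the placeholders used in this post.

```shell
#!/bin/sh
# Sketch: write the Beeline statements to a file and show how to run them.
set -eu

MASTER_HOST="${MASTER_HOST:-master_host}"          # placeholder host name from the post
JDBC_URL="jdbc:hive2://${MASTER_HOST}:10000"
SQL_FILE="${SQL_FILE:-./cassandra_tables.sql}"     # hypothetical file name

# Same statements as in the interactive session, with plain ASCII quotes.
cat > "$SQL_FILE" <<'EOF'
CREATE TABLE users_table USING org.apache.spark.sql.cassandra OPTIONS (cluster 'ctest', keyspace 'ktest', table 'users');
CACHE TABLE users_table;
CREATE TABLE users_per AS SELECT * FROM users_table;
EOF

# Print the command rather than running it, since it needs a live Thrift server.
echo "beeline -u $JDBC_URL -f $SQL_FILE"
```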

Start Tableau and select Spark SQL as the connection type


Query users_per using Tableau

Mohammed LABDOUI



  1. Hello, very interesting post. What do you mean by efficiency? Are you referring to response time? What do you think could be the reason for that inefficiency?

    Thanks for your response

  2. Thanks, really nice explanation. We also tried to connect Cassandra with the Spark Thrift server in a three-node Spark cluster, but it did not work for us. Could you provide your Spark and Scala versions?

  3. My config details are
    Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
    spark: version 2.2.0
    Cassandra 3.11.0
    Three-node Spark cluster. When connecting through Beeline to a single-node Spark, however, we are able to reach Cassandra.

    Please suggest.
