Getting Started with Apache Zeppelin and Elassandra with Instaclustr

Elassandra (Elasticsearch + Cassandra) is a fork of Elasticsearch modified to run on top of Apache Cassandra to provide advanced search features on Cassandra tables. This tutorial walks you through setting up an Instaclustr Elassandra cluster with Zeppelin on Amazon Web Services (AWS), then querying and visualizing Elassandra indexes using Zeppelin's Elasticsearch interpreter. The high-level steps are:

  1. Provision a cluster with Elassandra and Zeppelin
  2. Create a Zeppelin notebook based on the Elasticsearch interpreter
  3. Add data to Elassandra using the Zeppelin Elasticsearch interpreter
  4. Query and search data via the Zeppelin notebook

 

1. Provision a cluster with Elassandra and Zeppelin

a) If you haven’t already signed up for an Instaclustr account, refer to our support article to sign up and create an account.

b) Once you have signed up for Instaclustr and verified your email, log in to the Instaclustr console and click the Create Cassandra Cluster button.

creating_cluster_01_final.png

c) On the Create Cassandra Cluster page, enter an appropriate name and network address block for your cluster. Under the Applications section, select:

  • Elassandra 2.4.2.13 (Cassandra 3.0.10) (preview)
  • Apache Zeppelin as an Add-on

application_section.png

d) Under the Data Centre section, select:

  • Amazon Web Services as the Infrastructure Provider
  • A minimum node size of t2.medium

datacentre_section.png

e) Leave the other options at their defaults. Accept the terms and conditions and click the Create Cluster button.

create_cluster_button.png

The cluster will be provisioned automatically and will be available for use once all nodes are in the running state.

 

2. Create a notebook based on the Elasticsearch interpreter

a) Once all nodes in the cluster are in the running state, click the Zeppelin tab to get to its dashboard.

zepellin_tab_pix.png

b) You will be asked to provide Zeppelin account credentials, which can be found on the Connection Info page.

zeppelin_credentials_pix.png

c) On the Zeppelin dashboard, click Create new note. In the Create New Note dialog box, enter a name for the notebook, select elasticsearch as the Default Interpreter and click the Create Note button.

create_new_notebook.png

creating_cluster_01_final_1.png

d) The notebook is preconfigured to use the Elasticsearch interpreter. Click the gear button at the top right of the notebook to see the enabled interpreters and confirm that Elasticsearch is among them.

click_gear.png

e) Make sure the Elasticsearch interpreter is at the top of the list and the Cassandra interpreter is enabled. Click the Save button to save the settings.

elasticsearch_to_top.png

 

3. Add data to Elassandra using the Zeppelin Elasticsearch interpreter

To start off, let's index some data into Elassandra by running the commands below, one per paragraph.
Note: if Elasticsearch is not your default interpreter, add %elasticsearch at the top of each paragraph so that it runs with the Elasticsearch interpreter.

index twitter/user/kimchy  { "name" : "Shay Banon" }

run_paragraph.png

run_paragraph_result.png

Index some more data by running the following commands in the notebook, again one per paragraph:

index twitter/tweet/1 {
    "postDate": "2009-11-15T13:12:00",
    "message": "Trying out Zeppelin Elasticsearch interpreter, so far so good?"}
index twitter/tweet/2 {
    "postDate": "2009-11-15T14:12:12",
    "message": "Another tweet, will it be indexed?"}
index twitter/tweet/3 {
    "postDate": "2009-11-15T15:12:12",
    "message": "Give me my index and no query gets hurt!"}
index twitter/tweet/4 {
    "postDate": "2009-11-16T15:12:12",
    "message": "Index it before search it!"}

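When there are more than a handful of documents, typing each command by hand gets tedious. The following is a minimal sketch, not part of Zeppelin or Instaclustr tooling, that generates index commands in the form used above from Python dictionaries. The helper name build_index_command and its with_prefix flag are hypothetical; each returned string is meant to be pasted into its own paragraph.

```python
import json


def build_index_command(index, doc_type, doc_id, document, with_prefix=False):
    """Return the text of one Zeppelin paragraph that indexes `document`.

    Hypothetical helper: it only formats text in the shape
    `index <index>/<type>/<id> <json>` used in this tutorial.
    """
    command = "index {}/{}/{} {}".format(
        index, doc_type, doc_id, json.dumps(document)
    )
    if with_prefix:
        # Only needed when Elasticsearch is not the note's default interpreter.
        command = "%elasticsearch\n" + command
    return command


# Two of the sample tweets from this tutorial, keyed by document id.
tweets = {
    2: {"postDate": "2009-11-15T14:12:12",
        "message": "Another tweet, will it be indexed?"},
    3: {"postDate": "2009-11-15T15:12:12",
        "message": "Give me my index and no query gets hurt!"},
}

for tweet_id, tweet in tweets.items():
    print(build_index_command("twitter", "tweet", tweet_id, tweet))
```

Passing with_prefix=True adds the %elasticsearch line for notes where another interpreter is the default.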
 

4. Query Elassandra data

Once the data is in Elassandra, we can search using Zeppelin, for example:

get twitter/user/kimchy

get_user.png

 

count twitter/tweet

count_tweet.png

 

search twitter/tweet

search_twitter.png

The result of a search query can also be viewed graphically (histograms, pie charts, etc.) or downloaded as a CSV (comma-separated values) or TSV (tab-separated values) file by clicking the buttons marked with the blue box in the screenshot above.
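
A downloaded file can then be processed outside Zeppelin. The sketch below parses TSV text with Python's standard csv module. The column names (id, message) and the inline sample are assumptions made for illustration, not the guaranteed layout of a Zeppelin export, so check the header row of your own file.

```python
import csv
import io

# Stand-in for the contents of a downloaded TSV file (assumed column layout).
sample_tsv = (
    "id\tmessage\n"
    "2\tAnother tweet, will it be indexed?\n"
    "4\tIndex it before search it!\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_tsv(sample_tsv)
for row in rows:
    print(row["id"], row["message"])
```

For a CSV download, the same function works with the delimiter argument left at its default of a comma.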

We can also search for specific words or strings:

search twitter/tweet { "query": { "query_string": { "query": "good" } } }
search twitter/tweet { "query": { "query_string": { "query": "it" } } }
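
To get a feel for which of the sample tweets each query is likely to match, here is a rough local approximation in Python. It assumes that text is lowercased and split into whole words, which is a simplification of Elasticsearch's default analysis; it does not contact the cluster, and the function names are hypothetical.

```python
import re

# Messages of the four sample tweets indexed earlier, keyed by document id.
TWEETS = {
    1: "Trying out Zeppelin Elasticsearch interpreter, so far so good?",
    2: "Another tweet, will it be indexed?",
    3: "Give me my index and no query gets hurt!",
    4: "Index it before search it!",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matching_ids(term, tweets=TWEETS):
    """Return the ids of tweets whose message contains `term` as a whole word."""
    term = term.lower()
    return sorted(
        tweet_id for tweet_id, message in tweets.items()
        if term in tokenize(message)
    )


print(matching_ids("good"))  # [1]
print(matching_ids("it"))    # [2, 4]
```

Under this approximation, the first query would be expected to return tweet 1 and the second tweets 2 and 4; compare this with what the notebook actually shows.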

Finally, to get the list of available commands, run:

help

help.png
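
For readers who already know the Elasticsearch REST API, the commands used in this tutorial correspond roughly to standard Elasticsearch 2.x endpoints. The Python sketch below builds the HTTP method and path for each command. The function name rest_equivalent is hypothetical, and the mapping describes the REST API only: it is not necessarily how the Zeppelin interpreter communicates with the cluster, and whether your cluster exposes a REST port needs to be checked separately.

```python
def rest_equivalent(command, resource):
    """Map an interpreter command and its resource to (HTTP method, path).

    Hypothetical helper covering only the four commands used in this tutorial.
    """
    path = "/" + resource.strip("/")
    if command == "index":
        # index twitter/tweet/1 {...}  ->  PUT /twitter/tweet/1 with a JSON body
        return ("PUT", path)
    if command == "get":
        return ("GET", path)
    if command == "count":
        return ("GET", path + "/_count")
    if command == "search":
        # POST lets the query be sent as a JSON body.
        return ("POST", path + "/_search")
    raise ValueError("unsupported command: " + command)


for command, resource in [
    ("index", "twitter/tweet/1"),
    ("get", "twitter/user/kimchy"),
    ("count", "twitter/tweet"),
    ("search", "twitter/tweet"),
]:
    print(command, resource, "->", *rest_equivalent(command, resource))
```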

 

5. Conclusion
In this tutorial you have learned how to:

  • Provision a cluster with Elassandra and Zeppelin
  • Create a Zeppelin notebook based on the Elasticsearch interpreter
  • Add, query and search data via the Zeppelin notebook

For more information, refer to the following resources:
