DEV Community

Sergei Semenkov
Sergei Semenkov

Posted on • Edited on

How to explore ClickHouse data via Excel PivotTable using Mondrian

ClickHouse is a very powerful OLAP engine. Microsoft Excel is one of the world's most popular and powerful business programs. There are several ways to bring ClickHouse data to Excel spreadsheets.

One way to do this is using ClickHouse ODBC driver. In order to do this you have to install the ODBC driver and create ClickHouse data source in Excel. Then you can explore data as tables or you can run a query on ClickHouse and browse results.

The other way to explore ClickHouse data in Excel is using Mondrian. This approach allows you to describe ClickHouse data as a multidimensional model and give access to this model through XMLA standart. It also allows client applications to run MDX queries on ClickHouse data. You represent ClickHouse data as cubes and dimensions and then browse those cubes in Excel PivotTable.

This article describes how to install Mondrian, create a Mondrian datasource for ClickHouse database and connect Excel PivotTable to this datasource.

We will use the eMondrian fork of the Mondrian ROLAP engine. The main feature of this version is support of ADOMD.NET and Excel clients. eMondrian already includes ClickHouse driver.

Mondrian can be hosted on different java servers (Apache Tomcat, Jetty, Tiny Java Web Server). This example describes the case with Ubuntu operating system and Apache Tomcat java server.

In this example Tomcat is installed in /opt/tomcat/latest/ directory.

You have to change the following command to your Tomcat path.



export TOMCATDIR="/opt/tomcat/latest"


Enter fullscreen mode Exit fullscreen mode

Next commands download the Mondrian archive, copy it to Tomcat directory and restart Tomcat java server.



sudo wget https://github.com/SergeiSemenkov/eMondrian/releases/latest/download/emondrian.war
sudo cp emondrian.war $TOMCATDIR/webapps
sudo systemctl restart tomcat


Enter fullscreen mode Exit fullscreen mode

Now, let's connect Excel to the Mondrian demo Foodmart database.

In order to get address to your Mondrian XMLA service you have to insert your Ubuntu server ip in following template
http://<your_server_address>:8080/emondrian/xmla
To get your server ip, run command



hostname -I


Enter fullscreen mode Exit fullscreen mode

In this example Mondrian server address is
http://192.168.100.4:8080/emondrian/xmla

Open Excel, select the Data tab, then Get Data -> From Database -> From Analysis Services.

Paste address of Mondrian XMLA service as Server name and click Next.

Then choose a cube and click Finish. The PivotTable with cube data will appear.

Next, we will create a Mondrian schema for ClickHouse example dataset OnTime. The schema defines two cubes. Cube OnTime is based on data from ontime table. Cube ViewOnTime is an example of how to create a cube based on a query instead of a table.

Following command will create new schema file OnTime.xml.



sudo nano $TOMCATDIR/webapps/emondrian/WEB-INF/schema/OnTime.xml


Enter fullscreen mode Exit fullscreen mode

Copy xml to this file and press Ctrl-S (Save) and Ctrl-X(Exit).



<?xml version="1.0"?>
<Schema name="OnTime">
<Cube name="OnTime">
  <Table name="ontime"/>
  <Dimension name="Date">
    <Hierarchy hasAll="true" allMemberName="All Dates">
      <Level name="Year" column="Year" uniqueMembers="true" type="Numeric"/>
      <Level name="Quarter" column="Quarter" uniqueMembers="false" type="Numeric">
        <CaptionExpression>
          <SQL dialect="generic">concat(toString(Quarter), ' ', toString(Year))</SQL>
        </CaptionExpression>
      </Level>
      <Level name="Month" column="Month" uniqueMembers="false" type="Numeric">
        <CaptionExpression>
          <SQL dialect="generic">
            concat(          
            CASE toString(Month) WHEN '1' THEN 'January'
            WHEN '2' THEN 'February'
            WHEN '3' THEN 'March'
            WHEN '4' THEN 'April'
            WHEN '5' THEN 'May'
            WHEN '6' THEN 'June'
            WHEN '7' THEN 'July'
            WHEN '8' THEN 'August'
            WHEN '9' THEN 'September'
            WHEN '10' THEN 'October'
            WHEN '11' THEN 'November'
            WHEN '12' THEN 'December'
            ELSE 'Unknown' END,
            ' ',
            toString(Year))
          </SQL>
        </CaptionExpression> 
      </Level>
      <Level name="Day" column="FlightDate" uniqueMembers="true" type="Date">
        <CaptionExpression>
          <SQL dialect="generic">concat(toString(Year), '-', toString(Month), '-', toString(DayofMonth))</SQL>
        </CaptionExpression>
      </Level>
    </Hierarchy>
  </Dimension>
  <Dimension name="Unique Carrier">
    <Hierarchy hasAll="true" allMemberName="All Unique Carriers">
      <Level name="Unique Carrier" uniqueMembers="true">
        <KeyExpression>
          <SQL dialect="generic">toStringCutToZero(UniqueCarrier)</SQL>
        </KeyExpression>
      </Level>
    </Hierarchy>
  </Dimension>
  <Dimension name="Destination">
    <Hierarchy hasAll="true" allMemberName="All Destinations">
      <Level name="Destination State Name" column="DestStateName" uniqueMembers="true"/>
      <Level name="Destination City" column="DestCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Origin">
    <Hierarchy hasAll="true" allMemberName="All Origins">
      <Level name="Origin State Name" column="OriginStateName" uniqueMembers="true"/>
      <Level name="Origin City" column="OriginCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Measure name="Flights" column="Flights" aggregator="sum" formatString="#,###"/>
  <Measure name="Air Time" column="AirTime" aggregator="sum" formatString="#,###"/>
  <Measure name="Distance" column="Distance" aggregator="sum" formatString="#,###"/>
  <Measure name="Dep Delay" column="DepDelay" aggregator="sum" formatString="#,###"/>
  <Measure name="Carrier Delay" column="CarrierDelay" aggregator="sum" formatString="#,###"/>
  <Measure name="Weather Delay" column="WeatherDelay" aggregator="sum" formatString="#,###"/>
</Cube>
<Cube name="ViewOnTime">
<View alias="ViewOnTime">
<SQL dialect="generic">
<![CDATA[select * from ontime]]>
</SQL>
</View>
  <Dimension name="Destination City Name" foreignKey="DestCityName">
    <Hierarchy hasAll="true" allMemberName="All Destinations">
      <Level name="Destination City Name" column="DestCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Origin City Name" foreignKey="OriginCityName">
    <Hierarchy hasAll="true" allMemberName="All Origin Cities">
      <Level name="Origin City Name" column="OriginCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Measure name="Dep Delay" column="DepDelay" aggregator="sum" formatString="#,###"/>
</Cube>
</Schema>


Enter fullscreen mode Exit fullscreen mode

You have to give rights to this file to the user under which Tomcat server is running. In the following command change the user tomcat:tomcat name to the name under which your Tomcat server works and run it.



sudo chown -RH tomcat:tomcat $TOMCATDIR/webapps/emondrian/WEB-INF/schema/OnTime.xml


Enter fullscreen mode Exit fullscreen mode

Now, we will edit the Mondrian datasource configuration file. Run following command.



sudo nano $TOMCATDIR/webapps/emondrian/WEB-INF/datasources.xml


Enter fullscreen mode Exit fullscreen mode

Replace the DataSources tag with following xml.



<DataSources>
  <DataSource>
    <DataSourceName>OnTime</DataSourceName>
    <DataSourceDescription>ClickHouse Sample Data</DataSourceDescription>
    <URL>http://localhost:8080/emondrian/xmla</URL>
    <DataSourceInfo>Provider=mondrian;Jdbc=jdbc:clickhouse://192.168.100.8:8123/datasets;JdbcDrivers=ru.yandex.clickhouse.ClickHouseDriver</DataSourceInfo>
    <ProviderName>Mondrian</ProviderName>
    <ProviderType>MDP</ProviderType>
    <AuthenticationMode>Unauthenticated</AuthenticationMode>
    <Catalogs>
        <Catalog name="OnTime">
            <Definition>/WEB-INF/schema/OnTime.xml</Definition>
        </Catalog>
    </Catalogs>
  </DataSource>
</DataSources>


Enter fullscreen mode Exit fullscreen mode

Edit connection string to your ClickHouse dataset in DataSourceInfo tag using following syntax:
jdbc:clickhouse://<host>:<port>[/<database>][?user=<my_user>&password=<my_password>]

Save (Ctrl-S) and close (Ctrl-X) datasources file.
Restart tomcat server.



sudo systemctl restart tomcat


Enter fullscreen mode Exit fullscreen mode

Create a new connection to Mondrian server in Excel to see cubes from your new schema. Select one to browse its data in PivotTable.

Try and create your one Mondrian schema for ClickHouse datasets (documentation).

If you have any questions or suggestions, feel free to contact me at siarhei.semiankou@gmail.com.

Top comments (5)

Collapse
 
vbabin profile image
Владислав Бабин

Hi,
Thanks for sharing the solution.
I tried to reproduce your test case and found that Mondrian + ClickHouse are failing when it comes to dimension hierarchies. If you take look at the screenshot below you will see that January-December total for 2017 is exactly the same as the grand total for all years. There must be an error in how Mondrian queries ClickHouse. Did you encounter such an issue?

Screenshot

Collapse
 
vbabin profile image
Владислав Бабин

Well... answering my own question: the problem is with schema file. Everything works fine if I change Date dimension definition to:

   <Dimension name="Date" type='TimeDimension'>
     <Hierarchy hasAll="true" allMemberName="All Dates" primaryKey="FlightDate">
       <Level name="Year" column="Year" uniqueMembers="true" levelType="TimeYears" type="Numeric" />
       <Level name="Month" column="Month" uniqueMembers="false" ordinalColumn="Month" levelType="TimeMonths" type="Numeric">
         <CaptionExpression>
           <SQL dialect="generic">
             CASE Month WHEN 1 THEN 'January'
             WHEN 2 THEN 'February'
             WHEN 3 THEN 'March'
             WHEN 4 THEN 'April'
             WHEN 5 THEN 'May'
             WHEN 6 THEN 'June'
             WHEN 7 THEN 'July'
             WHEN 8 THEN 'August'
             WHEN 9 THEN 'September'
             WHEN 10 THEN 'October'
             WHEN 11 THEN 'November'
             WHEN 12 THEN 'December'
             ELSE 'Unknown' END
           </SQL>
         </CaptionExpression>
       </Level>
       <Level name="Day" column="DayofMonth" uniqueMembers="false" ordinalColumn="DayofMonth" nameColumn="FlightDate" levelType="TimeDays" type="Numeric"/>
    </Hierarchy>
  </Dimension>
Enter fullscreen mode Exit fullscreen mode

Sergei, thanks again for sharing your project. Very useful indeed

Collapse
 
sergeisemenkov profile image
Sergei Semenkov

Thank you Vlad! Really, there were issues in Date dimension. I've fixed them in article. You can check it out.

Collapse
 
porechajp profile image
Jatan Porecha

I am running ClickHouse and eMondrian on Linux VM, and it is working. However ClickHouse is slow and could see that it's using around 600 MB memory. Is it possible to let ClickHouse use more memory and speed up the search.

Collapse
 
sergeisemenkov profile image
Sergei Semenkov

There are too many different causes that can slow queries. To get an answer we have to look into your case particularly. If you wish you can contact me directly at (semenkovsergei@mail.ru).