Abstract
The Ibis Project provides the power of Python Analytics with SQL. In this short article, we'll see how to use Ibis to connect to SingleStoreDB Cloud and run a few commands to demonstrate the integration.
Introduction
In previous articles, we've discussed various options to connect to SingleStoreDB. These have included using, for example:
Another connection option is the Ibis backend being developed by SingleStore Labs. In this short article, we'll test this implementation with some commands.
Create a SingleStoreDB Cloud account
A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Ibis Demo Group as our Workspace Group Name and ibis-demo as our Workspace Name. We'll make a note of our password and host name.
We'll use the SQL Editor to create a new database, as follows:
CREATE DATABASE IF NOT EXISTS iris_db;
Create a Deepnote account
We'll create a free account on the Deepnote website. Once logged in, we'll create a new Deepnote project to give us a new notebook. We'll also create a new directory called data
. We'll use the Iris flower data set and upload the CSV file into the data
directory.
Deepnote notebook
First, we'll install the package:
!pip install ibis-singlestoredb
Deepnote will prompt us to add this to the requirements.txt
file.
Next, we'll create some imports:
import ibis
import pandas as pd
ibis.options.interactive = True
We'll now read the iris.csv
file and look at the data:
iris_df = pd.read_csv("data/iris.csv")
iris_df.head(5)
The result should be similar to the following:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
Next, let's check the Pandas Dataframe:
iris_df.info()
The result should be as follows:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal_length 150 non-null float64
1 sepal_width 150 non-null float64
2 petal_length 150 non-null float64
3 petal_width 150 non-null float64
4 species 150 non-null object
dtypes: float64(4), object(1)
We'll now create a connection to SingleStoreDB:
conn = ibis.singlestoredb.connect(
"admin:<password>@<host>:3306/iris_db"
)
We'll replace the <password>
and <host>
with the values from our SingleStoreDB Cloud account.
Now we'll create a new table in SingleStoreDB using the Pandas Dataframe and then look at the data:
iris_tbl = conn.create_table("iris", iris_df, force = True)
iris_tbl.head(5)
The result should be similar to the following:
sepal_length sepal_width petal_length petal_width species
0 4.6 3.1 1.5 0.2 Iris-setosa
1 5.4 3.9 1.3 0.4 Iris-setosa
2 5.1 3.5 1.4 0.3 Iris-setosa
3 4.9 3.1 1.5 0.1 Iris-setosa
4 4.5 2.3 1.3 0.3 Iris-setosa
We can get some further details:
iris_tbl.info()
The result should be as follows:
Summary of iris
150 rows
+--------------+------------------------+---------+---------+
| Name | Type | # Nulls | % Nulls |
+--------------+------------------------+---------+---------+
| sepal_length | Float64(nullable=True) | 0 | 0.00 |
| sepal_width | Float64(nullable=True) | 0 | 0.00 |
| petal_length | Float64(nullable=True) | 0 | 0.00 |
| petal_width | Float64(nullable=True) | 0 | 0.00 |
| species | String(nullable=True) | 0 | 0.00 |
+--------------+------------------------+---------+---------+
Basic schema information can also be viewed:
iris_tbl.schema()
The result should be as follows:
ibis.Schema {
sepal_length float64
sepal_width float64
petal_length float64
petal_width float64
species string
}
We can also find the details of the CREATE TABLE
statement:
conn.show.create_table("iris")
The result should be as follows:
| Name | CreateTable |
---------------------------------------------------------------------------------------------------------------------------------------
| iris | CREATE TABLE `iris` (
`sepal_length` float DEFAULT NULL,
`sepal_width` float DEFAULT NULL,
`petal_length` float DEFAULT NULL,
`petal_width` float DEFAULT NULL,
`species` text CHARACTER SET utf8 COLLATE utf8_general_ci,
KEY `__UNORDERED` () USING CLUSTERED COLUMNSTORE,
SHARD KEY ()
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES' |
We can count the number of different species:
iris_tbl.species.value_counts()
The result should be as follows:
species count
0 Iris-virginica 50
1 Iris-setosa 50
2 Iris-versicolor 50
Filtering is also possible. Here, for example, we want to limit the results by Iris-versicolor
where the petal_length
is greater than 4.5:
res = iris_tbl[iris_tbl.species.like("Iris-versicolor")][iris_tbl.petal_length > 4.5]
Using the filter we can get a count:
res.count()
The result should be as follows:
14
Sorting is also possible. This is in ascending order for petal_length
:
res.sort_by("petal_length")
The result should be as follows:
sepal_length sepal_width petal_length petal_width species
0 6.6 2.9 4.6 1.3 Iris-versicolor
1 6.1 3.0 4.6 1.4 Iris-versicolor
2 6.5 2.8 4.6 1.5 Iris-versicolor
3 6.1 2.8 4.7 1.2 Iris-versicolor
4 6.7 3.1 4.7 1.5 Iris-versicolor
5 7.0 3.2 4.7 1.4 Iris-versicolor
6 6.3 3.3 4.7 1.6 Iris-versicolor
7 6.1 2.9 4.7 1.4 Iris-versicolor
8 5.9 3.2 4.8 1.8 Iris-versicolor
9 6.8 2.8 4.8 1.4 Iris-versicolor
10 6.9 3.1 4.9 1.5 Iris-versicolor
11 6.3 2.5 4.9 1.5 Iris-versicolor
12 6.7 3.0 5.0 1.7 Iris-versicolor
13 6.0 2.7 5.1 1.6 Iris-versicolor
Finally, we can convert the results back into a Pandas Dataframe, for additional processing, as follows:
another_iris_df = res.execute()
Summary
The Ibis Project provides another way to work with SingleStoreDB using Python and Pandas. The SingleStore Labs GitHub repo contains example notebooks that use other popular datasets. Check it out.
Top comments (0)