Abstract
The Ibis Project provides the power of Python Analytics with SQL. In this short article, we'll see how to use Ibis to connect to SingleStoreDB Cloud and run a few commands to demonstrate the integration.
The notebook file used in this article is available on GitHub.
Introduction
In previous articles, we've discussed various options to connect to SingleStoreDB. These have included using, for example:
Another connection option is the Ibis backend being developed by SingleStore Labs. In this short article, we'll test this implementation with some commands.
Create a SingleStoreDB Cloud account
A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use the Free Shared Tier and take the default names for the Workspace and Database.
Notebook
First, we'll create some imports:
import ibis
import pandas as pd
ibis.options.interactive = True
We'll now read the iris.csv
file and look at the data:
url = "https://gist.githubusercontent.com/VeryFatBoy/9af771d443f5ec4dd6eec8d69a062638/raw/c03ef25a97f23a48ee408ac02114195b663a2364/iris.csv"
iris_df = pd.read_csv(url)
iris_df.head(5)
The result should be similar to the following:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
Next, let's check the Pandas Dataframe:
iris_df.info()
The result should be as follows:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal_length 150 non-null float64
1 sepal_width 150 non-null float64
2 petal_length 150 non-null float64
3 petal_width 150 non-null float64
4 species 150 non-null object
dtypes: float64(4), object(1)
We'll now create a connection to SingleStoreDB:
conn = ibis.singlestoredb.connect(connection_url)
Now we'll create a new table in SingleStoreDB using the Pandas Dataframe and then look at the data:
iris_tbl = conn.create_table("iris", iris_df, force = True)
iris_tbl.head(5)
The result should be similar to the following:
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ sepal_length ┃ sepal_width ┃ petal_length ┃ petal_width ┃ species ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │ string │
├──────────────┼─────────────┼──────────────┼─────────────┼─────────────────┤
│ 4.9 │ 3.0 │ 1.4 │ 0.2 │ Iris-setosa │
│ 5.0 │ 3.4 │ 1.5 │ 0.2 │ Iris-setosa │
│ 4.9 │ 3.1 │ 1.5 │ 0.1 │ Iris-setosa │
│ 4.6 │ 3.6 │ 1.0 │ 0.2 │ Iris-setosa │
│ 7.0 │ 3.2 │ 4.7 │ 1.4 │ Iris-versicolor │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────────────┘
We can get some further details:
iris_tbl.info()
The result should be as follows:
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┓
┃ name ┃ type ┃ nullable ┃ nulls ┃ non_nulls ┃ null_frac ┃ pos ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━┩
│ string │ string │ boolean │ int64 │ int64 │ float64 │ int8 │
├──────────────┼─────────┼──────────┼───────┼───────────┼───────────┼──────┤
│ sepal_length │ float64 │ True │ 0 │ 150 │ 0.0 │ 0 │
│ sepal_width │ float64 │ True │ 0 │ 150 │ 0.0 │ 1 │
│ petal_length │ float64 │ True │ 0 │ 150 │ 0.0 │ 2 │
│ petal_width │ float64 │ True │ 0 │ 150 │ 0.0 │ 3 │
│ species │ string │ True │ 0 │ 150 │ 0.0 │ 4 │
└──────────────┴─────────┴──────────┴───────┴───────────┴───────────┴──────┘
Basic schema information can also be viewed:
iris_tbl.schema()
The result should be as follows:
ibis.Schema {
sepal_length float64
sepal_width float64
petal_length float64
petal_width float64
species string
}
We can also find the details of the CREATE TABLE
statement:
conn.show.create_table("iris")
The result should be as follows:
| Name | CreateTable |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| iris | CREATE TABLE `iris` ( `sepal_length` double DEFAULT NULL, `sepal_width` double DEFAULT NULL, `petal_length` double DEFAULT NULL, `petal_width` double DEFAULT NULL, `species` text CHARACTER SET utf8 COLLATE utf8_general_ci, SORT KEY `__UNORDERED` () , SHARD KEY () ) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES' |
We can count the number of different species:
iris_tbl.species.value_counts()
The result should be as follows:
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string │ int64 │
├─────────────────┼───────────────┤
│ Iris-virginica │ 50 │
│ Iris-setosa │ 50 │
│ Iris-versicolor │ 50 │
└─────────────────┴───────────────┘
Filtering is also possible. Here, for example, we want to limit the results by Iris-versicolor
where the petal_length
is greater than 4.5:
res = iris_tbl[iris_tbl.species.like("Iris-versicolor")][iris_tbl.petal_length > 4.5]
Using the filter we can get a count:
res.count()
The result should be as follows:
14
Sorting is also possible. This is in ascending order for petal_length
:
res.order_by("petal_length")
The result should be as follows:
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ sepal_length ┃ sepal_width ┃ petal_length ┃ petal_width ┃ species ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │ string │
├──────────────┼─────────────┼──────────────┼─────────────┼─────────────────┤
│ 6.6 │ 2.9 │ 4.6 │ 1.3 │ Iris-versicolor │
│ 6.5 │ 2.8 │ 4.6 │ 1.5 │ Iris-versicolor │
│ 6.1 │ 3.0 │ 4.6 │ 1.4 │ Iris-versicolor │
│ 6.3 │ 3.3 │ 4.7 │ 1.6 │ Iris-versicolor │
│ 6.1 │ 2.8 │ 4.7 │ 1.2 │ Iris-versicolor │
│ 7.0 │ 3.2 │ 4.7 │ 1.4 │ Iris-versicolor │
│ 6.1 │ 2.9 │ 4.7 │ 1.4 │ Iris-versicolor │
│ 6.7 │ 3.1 │ 4.7 │ 1.5 │ Iris-versicolor │
│ 5.9 │ 3.2 │ 4.8 │ 1.8 │ Iris-versicolor │
│ 6.8 │ 2.8 │ 4.8 │ 1.4 │ Iris-versicolor │
│ … │ … │ … │ … │ … │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────────────┘
Finally, we can convert the results back into a Pandas Dataframe, for additional processing, as follows:
another_iris_df = res.execute()
Summary
The Ibis Project provides another way to work with SingleStoreDB using Python and Pandas. The SingleStore Labs GitHub repo contains example notebooks that use other popular datasets. Check it out.
Top comments (0)