DEV Community

CheriseFoster
CheriseFoster

Posted on

Creating a database and seeding data with SQLAlchemy

Databases are a critical component for storing and managing data in modern software development. A popular Python library called SQLAlchemy provides an Object-Relational Mapping (ORM) system that simplifies database interactions. In this blog post, we'll explore a code example I created that demonstrates how to create a database using SQLAlchemy, define tables and relationships, and seed it with sample data.

But first, what does it mean to "seed" data?

Seeding data simply means creating sample instances of data. It involves inserting predefined data into the database tables created.


Creating the database schema:

To begin, I drew an example on a piece of paper the tables I wanted to create and the columns I wanted to include, as well as defined what type of relationships I wanted my tables to have with each other. It's important to stay organized and have a plan before writing the code. I created a models.py file to build my tables, and a flights.db file so that I can view my tables as actual tables. From there, start by importing necessary modules and defining the database schema using SQLAlchemy's declarative_base() function.

My schema includes 3 tables: Flight, Passenger, and Reservations. Each table is its own entity, and relationships are established between tables with foreign keys.

In the code below, I established a Flight class and created a couple of columns I wanted to be on my table with the corresponding data type:

class Flight(Base):
    __tablename__ = 'flights'

    id = Column(Integer(), primary_key=True)
    airline = Column(String())
    flight_number = Column(Integer())
    origin = Column(String())
    destination = Column(String())
Enter fullscreen mode Exit fullscreen mode

Next, I used Python's repr method on classes which defines how an object should be represented when its string representation is requested (such as when the object is printed).

def __repr__(self):
        return (
            f"ID: {self.id}, " \
           + f"Airline: {self.airline}, " \
           + f"Flight Number: {self.flight_number}, " \
           + f"Origin: {self.origin}, " \
           + f"Destination: {self.destination}, " \
Enter fullscreen mode Exit fullscreen mode

After creating my Flight table, I created a Passenger and Reservation table using the same format and used foreign keys to create a one-to-many relationships between my desired tables. Make sure to import relationship and backref from sqlalchemy.orm to define table relations.


Seeding the data:

Now that the tables have been created, I created a separate file called seed.py to seed my database with sample data. Import create_engine and sessionmaker, as well as your tables from your models file. Import faker from Faker so that we can generate fake sample data. It doesn't hurt to import random too, just in case you need to generate some random numbers (for a flight number, per say). We are going to write some code to check if the current file is being executed as the main script:

if __name__ == "__main__":
    engine = create_engine("sqlite:///flights.db")
    Session = sessionmaker(bind=engine)
    session = Session()

    fake = Faker()
Enter fullscreen mode Exit fullscreen mode

Let's break this down:

  • if __name__ == "__main__": --> __name__ holds the name of the current module. The line checks if the current module is the main module being executed.
  • engine = create_engine("sqlite:///flights.db"): --> This line of code is what creates an SQLAlchemy engine object that will connect to our flight.db database.
  • Session = sessionmaker(bind=engine) --> Here, a session object is created by invoking sessionmaker. This is how we interact with the database.
  • session = Session() --> We invoke sessionmaker by setting it to an object called session. Using session, we can execute queries and perform database operations. Similarly, fake = Faker() is invoked and set to an object to be used in our application as well.

We want to make sure that we write a session query to clear the database before each seeding, otherwise the database will keep the previous fake data we generated. This should be done for each table that was created:
session.query(Flight).delete()

For this specific table, I created a list of airlines that contains different airline names as strings, since there is not currently any way to generate fake airline names.

Create an empty list to append the information we are going to generate, I called mine flights. I want to be able to loop through all of my data, so let's use a for loop:

for _ in range(75):
        flight = Flight(
            airline = random.choice(list_of_airlines),
            flight_number = random.randint(100, 9999),
            origin = f"{fake.city()}",
            destination = f"{fake.city()}"
Enter fullscreen mode Exit fullscreen mode

Since Flight was imported from models.py, I can create a new instance of the Flight class and set it to the "flight" variable. This will create a list of 75 instances of flight information, where it loops through the list of airlines I created, generate a random flight number between the set parameters (told you to import random just in case!), and, using Faker, create fake cities for the origin and destination.

To finish this code off, we need to push these changes to our database and create instances of data.

session.add(flight)
session.commit()
flights.append(flight)
Enter fullscreen mode Exit fullscreen mode

The first line adds the "flight" object to the session, which is an instance of the "Flight" class, and stages the object for insertion into the database. The commit() method persists the changes made within the session. Finally, append the "flight" object to the "flights" list we created.


Navigate into the flights.db file, and you should be able to see your flights table with data inserted. I viewed my table using the SQLite Viewer extension on VS code:

Image description

Top comments (0)