Understanding SQL and Relational Databases — Part 2
An introduction on how to connect to and query SQL Databases
Introduction
The first step of using SQL will be querying data stored inside databases. To do this, you connect to the database with some programming tool, such as Python, which will be used in this blog post. Once you’re connected with the database, you will then be able to read and choose specific information, or even compose new information back into the database.
Connecting with a database
To start with, we will connect with our database by importing sqlite3. You will require a cursor object (cur) to get results. Cursor objects allow you to monitor which result set is which for a specific query.
import sqlite3
conn = sqlite3.connect('pubs_data.db')
cur = conn.cursor()
For this post, we will use the same “database” diagram from Part 1 of my SQL blog posts. The database is displayed below:
*Note* The actual database from the diagram is not in use on my machine, so there will be no outputs after the lines of code. This is an image from Google.
Querying
Since you’re now connected with the database, we will investigate how you can query the data inside.
With your cursor object, you can execute queries. However, the execute command itself just returns the cursor object. To see the outcomes, you must use the fetchall() method.
cur.execute("""SELECT * FROM titles LIMIT 10;""")cur.fetchall()
It’s also possible to combine the past two cells into one line:
cur.execute("""SELECT * FROM titles LIMIT 10;""").fetchall()
*Note* You’re probably wondering why the triple quotes are being used. Triple quotes are useful because they can utilize multiple lines inside the same string. Some SQL queries can be much longer than others, and in these cases it’s useful to use new lines for easier reading. Here’s an example:
cur.execute("""SELECT *
FROM titles
LIMIT 10;""").fetchall()
Creating a DataFrame from your results
The best thing to do after executing a query will be to turn the output(s) into pandas DataFrames. To do this, you wrap the c.fetchall() output with a pandas DataFrame constructor:
import pandas as pd
cur.execute("""SELECT * FROM titles LIMIT 10;""")
df = pd.DataFrame(cur.fetchall())
df.head()
*Tip* Performing this step will result in your column names being 0, 1, 2, etc. We can get the column names from the query by using list comprehension and calling cur.description:
cur.execute("""SELECT * FROM titles LIMIT 10;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df.head()
And that’s it! If you’ve followed these steps, then you have followed the guidelines on how to successfully connect to and query data from a SQL Relational Database. In the third and final part of this series of blog posts on SQL, I will be explaining how to filter, order, and group data. Thank you for reading!