Digital History
Databases
By Mark Ciotola
First published on September 23, 2019. Last updated on June 12, 2024.
Database Concepts
Databases are computerized systems that contain data. Database systems comprise the database application (a program) and data storage.
There are several important database concepts.
- A field is a container for an item of information. It is analogous to a variable in a program in that its contents can change.
- A record is collection of fields that comprise a set of data associated.
- A record may contain one or more fields, each called a key, that can be used to identify the record. At least one key should be unique for that record, which is then called the primary key.
In traditional databases, data is organized into tables. A database table has a special structure. Typically a database will comprise rows and columns. A row comprises a record. Each column represents a field. All of the records in a table will have all of the same fields, although the contents of a field may vary among records. Below is a table with three records and four fields.
University Name | City | State | Country |
Oregon State University | Bend | Oregon | USA |
San Francisco State University | San Francisco | California | USA |
Warwick University | Warwick | UK |
Spreadsheets resemble database tables. If used where one row strictly corresponds to one record and if columns are used consistently, then a spreadsheet table can be used as a database and it can sometimes be directly imported into a database system. However, many people do not use spreadsheets so strictly, so that importing them into a database often would cause havoc.
Relationships
It is possible to relate tables with each other. For example, a table of students and a table of course enrollments might be related by student ID #. This provides the ability to pull information from one table into another table to provide a richer set of information. Some database systems might have many related tables involving multiple relationships. For example, in the table below, the Recipe Steps table is related to the Products, Process and Resources tables to bring together a lot of different information involved in determining the energy or greenhouse gasses involved in manufacturing a product.
Reasons to Use Databases
Databases have several capabilities that add value to historical work. Databases are searchable. Unlike a document, it is possible to search one field at a time in a database. Database records can be sorted, usually by any of the fields. So if you have a contacts database, you could sort it by date entered, surname, or state. You can also perform multi-field searches, and sort the results first by one field and then another.
It is possible to relate one database table to another table. In that case the database is called a relational database.
Although a comma-separated value (CSV) field is not itself a database, it can contain the data from a database, where each field is separated by a comma and each record by a line return (line feed).
The ability to search databases is really important. After all, what’s the point of storing data if you can’t find and retrieve the data you require? Not much! There is a special language for searching databases called Structured Query Language, or SQL for short (pronounced “sequel” or S-Q-L).
Database Platforms
Endnote and BibTeX are essentially database applications to manage bibliographies, citations and references.
SQLite is a relatively simple desktop database system. It can store data in proper tables and can execute SQL queries.
SQLite is a desktop database that can create databases and perform SQL queries. It is relatively simple to set up, even though its interface can be confusing for novices.
MySQL is a popular open source SQL database. It typically requires a command line, a program or web application to access.
Other professional databases include Oracle, Filemaker and Access. Oracle is the leading “corporate” database platform. It is expensive! Airtable is a cloud-based database platform.
Custom-Developed Databases
Many companies have changed from their own custom databases to specific web applications that offer database features for a set of specific uses such as monitoring customer relations (Siebel, Salesforce), project management, running manufacturing (SAP) and tracking human resources (Peoplesoft). These platform typically claim they can handle most many types of operations for many different types of companies, but usually they are better for some uses and companies than others.
There has also been a movement from companies hosting applications that they purchase to simply renting applications over the web that are hosted by an outside service provider. This is called Software-As-A-Service (SaaS). It can lower costs in the short run and make set-up and maintenance much easier, but it means you are locked into paying if you want to keep using the system (versus a one-time payment for traditional applications) and also the vendor will have possession of your data.
Analysis: Databases Versus Parsing Programs
Queries upon data can be performed using both databases and parsing programs (such as those written in Perl). So what are why use one over the other? A database structures information better, such as into fields, each with their own data types. So when you do a search on a numerical field, you generally know that you are dealing with numerical data. You can also do field-specific searches.
However, databases often require more upfront work to accomplish that structuring. In addition to setting up the database itself, one must accurately enter the data into the database, field by field. Can’t the data entry process be automated to allow for bulk entry of documents and records and auto-population of specific fields? Yes, if the data has a consistent, preexisting structure. Often business and trade records will be recorded using a regular structure, so that a database script will know to look for a date in the first several characters of each line, a financial amount at the end of each line, and some descriptive information in between. However, a lot of data will not possess such preexisting structure and consistency.
A parsing program can search an entire document for something that looks like a year or financial amount and look for variations of words (assume that any four plus letter word that ends with “son” or “sen” is a surname). Also, parsing programs often run more efficiently than databases (which have a lot of “overhead”), so may be able to perform faster searches on large quantities of data.
Further Reading
- w3schools SQL Tutorial
« Level Up: Perl and Workflows | COURSE | Database Activities »