+++
type = "howto"
date = "2023-01-20T00:00:00-05:00"
tags = [ "databases", "howto" ]
category = ["howto"]
draft = false
title = "Introduction to Databases"
author = "RC Staff"
+++

<p>
There are two main families of databases: relational and NoSQL.
</p>
<ul>
<li><b>Relational databases</b> store information in an orderly schema of tables, rows, and columns. They “relate” the tables to one another to present different views of the data.</li>
<li><b>NoSQL databases</b> are much less structured. They can store differently shaped data alongside each other, which makes data easier to store but harder to query across.</li>
</ul>
<p>
There are additional types of databases, such as <b>ledger</b>, <b>time-series</b>, and others. Those are beyond the scope of this introduction.
</p>

# Relational Databases (RDBMS)

Most users have at least heard of relational databases such as:

* MySQL / MariaDB
* PostgreSQL
* Microsoft SQL Server
* Oracle

Relational databases operate on the concepts of tables, relations, schemas, data types, indices, SQL, joins, and the basic "CRUD" operations:

* C = Create (`INSERT`)
* R = Read (`SELECT`)
* U = Update (`UPDATE`)
* D = Delete (`DELETE`)

Take the example of an online store, where data revolves around items, orders, and customers. When a customer makes a purchase in our store, the data from the transaction is broken apart into tables of related data. Here’s one way of seeing that process:

![](/notes/databases-intro/reldb.png)

The database for such an online store might have a handful of related tables:

1. **Orders**
2. **Customers**
3. **Credit Cards**
4. **Items**

Relational database tables use unique keys to relate one table to another, so the "orders" table
might simply aggregate keys drawn from the other tables for each order. This gives each table a clear definition
of which data fields, and which data types, are expected with every transaction. Incoming data must be broken apart
to conform to this structure, and outgoing data must be drawn back together from across the tables.
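
To make the key-based relationships concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The module is our choice for illustration; the same ideas apply to MySQL or PostgreSQL. The table and column names are hypothetical. To keep the sketch short, each order holds a single item, whereas a real store would likely use a separate table of order line items. The join at the end draws the purchase back together from across the tables.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE credit_cards (
    card_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    last_four   TEXT NOT NULL
);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    price   REAL NOT NULL
);
-- Each order only stores keys that point into the other tables.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    card_id     INTEGER NOT NULL REFERENCES credit_cards(card_id),
    item_id     INTEGER NOT NULL REFERENCES items(item_id)
);
""")

# Break one purchase apart into rows of related data.
with conn:
    conn.execute("INSERT INTO customers VALUES (1, 'Jane Doe')")
    conn.execute("INSERT INTO credit_cards VALUES (10, 1, '4242')")
    conn.execute("INSERT INTO items VALUES (100, 'Lab notebook', 12.50)")
    conn.execute("INSERT INTO orders VALUES (1000, 1, 10, 100)")

# Draw the purchase back together with a join across all four tables.
row = conn.execute("""
    SELECT o.order_id, c.name, cc.last_four, i.title, i.price
    FROM orders o
    JOIN customers    c  ON c.customer_id = o.customer_id
    JOIN credit_cards cc ON cc.card_id    = o.card_id
    JOIN items        i  ON i.item_id     = o.item_id
""").fetchone()
print(row)  # (1000, 'Jane Doe', '4242', 'Lab notebook', 12.5)
```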

This breaking-apart step is computationally intensive and takes time. Data sent to
any particular table must be validated by data type (strings, integers, dates, decimals, binary, etc.), length,
and whether NULL values are allowed before it can be inserted. Because a single transaction writes to several tables at
once, the database ensures that either the entire transaction completes successfully or all of it is rolled back.
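
The all-or-nothing behavior can be sketched with Python's `sqlite3` module. In this hypothetical two-table example, the third insert breaks a `NOT NULL` rule. As a result, the whole transaction is rolled back, including the two rows that had already been written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
CREATE TABLE order_items (order_id INTEGER NOT NULL, title TEXT NOT NULL);
""")

try:
    # Using the connection as a context manager commits on success
    # and rolls back the whole transaction if an exception escapes.
    with conn:
        conn.execute("INSERT INTO orders VALUES (1, 'Jane Doe')")
        conn.execute("INSERT INTO order_items VALUES (1, 'Lab notebook')")
        # This row violates the NOT NULL constraint on title.
        conn.execute("INSERT INTO order_items VALUES (1, NULL)")
except sqlite3.IntegrityError as err:
    print("rolled back:", err)

# Neither table kept any rows from the failed transaction.
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
items_left = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
print(orders_left, items_left)  # 0 0
```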

<b>Impedance Mismatch</b>: the set of conceptual and technical difficulties that arise when data shaped one way in a program (for example, nested objects or lists) must be mapped onto the flat tables, rows, and columns of a relational database management system.
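
One way to see the mismatch is to push a nested record through a relational schema. This sketch again uses Python's `sqlite3` with a hypothetical order. On the way in, it flattens the list of items into one row per item. On the way out, it rebuilds the nested structure. That translation code is the extra work the mismatch creates.

```python
import sqlite3

# A nested record, the way a program might naturally hold an order.
order = {
    "order_id": 1000,
    "customer": "Jane Doe",
    "items": [
        {"title": "Lab notebook", "price": 12.50},
        {"title": "Pipette tips", "price": 30.00},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
CREATE TABLE order_items (order_id INTEGER NOT NULL REFERENCES orders(order_id),
                          title TEXT NOT NULL, price REAL NOT NULL);
""")

# Going in: the nested list must be flattened into one row per item.
with conn:
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order["order_id"], order["customer"]))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order["order_id"], it["title"], it["price"]) for it in order["items"]],
    )

# Coming out: the rows must be reassembled into the nested shape.
(customer,) = conn.execute(
    "SELECT customer FROM orders WHERE order_id = ?", (1000,)).fetchone()
rows = conn.execute(
    "SELECT title, price FROM order_items WHERE order_id = ? ORDER BY rowid", (1000,)).fetchall()
rebuilt = {
    "order_id": 1000,
    "customer": customer,
    "items": [{"title": t, "price": p} for t, p in rows],
}
print(rebuilt == order)  # True
```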

SQL ("Structured Query Language") is the language spoken by most relational databases. While there are slight variations
in SQL syntax between RDBMS platforms (a semicolon here, a percent sign there), queries generally read the same to
anyone familiar with SQL.

Create a table:
```sql
-- AUTO_INCREMENT is MySQL/MariaDB syntax; other platforms spell this differently
-- (for example, SERIAL or IDENTITY columns in PostgreSQL).
create table researchers
(researcherID int NOT NULL AUTO_INCREMENT,
first varchar(15),
last varchar(20),
email varchar(30),
age int,
PRIMARY KEY (researcherID)
);
```

Insert an item into a table:
```sql
insert into researchers
(first, last, email, age)
values ('Jane', 'Doe', '[email protected]', 34);
```

Select (read) all items from a table:
```sql
select * from researchers;
```

Select (read) specific items from a table:
```sql
select * from researchers where researcherID = 147;
select * from researchers where first = 'Jane';
select first, last from researchers where age = 34;
```
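
The statements above can be run end to end, including the Update and Delete steps of CRUD, with a sketch in Python's `sqlite3`. This version makes a few assumptions:

* SQLite assigns an `INTEGER PRIMARY KEY` automatically, so the MySQL-specific `AUTO_INCREMENT` is dropped.
* The column names are spelled out slightly longer.
* The email address is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite spelling of the researchers table: an INTEGER PRIMARY KEY column is
# filled in automatically, so MySQL's AUTO_INCREMENT keyword is not needed.
conn.execute("""
    CREATE TABLE researchers (
        researcherID INTEGER PRIMARY KEY,
        first_name   TEXT,
        last_name    TEXT,
        email        TEXT,
        age          INTEGER
    )
""")

# C: create (insert). The ? placeholders let the driver quote values safely.
cur = conn.execute(
    "INSERT INTO researchers (first_name, last_name, email, age) VALUES (?, ?, ?, ?)",
    ("Jane", "Doe", "jane.doe@example.org", 34),  # placeholder email address
)
jane_id = cur.lastrowid

# R: read (select).
readers = conn.execute(
    "SELECT first_name, last_name FROM researchers WHERE age = ?", (34,)).fetchall()
print(readers)  # [('Jane', 'Doe')]

# U: update. Jane has a birthday.
conn.execute("UPDATE researchers SET age = age + 1 WHERE researcherID = ?", (jane_id,))
new_age = conn.execute(
    "SELECT age FROM researchers WHERE researcherID = ?", (jane_id,)).fetchone()[0]
print(new_age)  # 35

# D: delete.
conn.execute("DELETE FROM researchers WHERE researcherID = ?", (jane_id,))
remaining = conn.execute("SELECT COUNT(*) FROM researchers").fetchone()[0]
print(remaining)  # 0

conn.commit()  # make the changes permanent (matters for a file-backed database)
```

The `?` placeholders are the `sqlite3` module's parameter style. Other Python database drivers may use a different style, such as `%s`.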

---

# NoSQL Databases

NoSQL databases come in at least two main groupings: **aggregate-oriented** and **node-arc/graph**.

## 1. Aggregate-Oriented Databases

* Key-Value - Redis, Memcached
* Document - DynamoDB, MongoDB
* Column-Family - Cassandra, BigTable

NoSQL databases share very few common characteristics; perhaps the only one is that they are **schema-less**. Typical aggregate-oriented NoSQL databases store an aggregate in the form of a string or an entire document, usually as plain text, often in a specific format or notation such as JSON or XML.

Here are some sample entries from a simple key-value datastore:

<div>
<table class="table">
<thead>
<tr>
<th>Key</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>access_key</td>
<td>ABCDEfghijklmnop123456789xyzabc</td>
</tr>
<tr>
<td>secret_key</td>
<td>23481283852384128328a</td>
</tr>
<tr>
<td>current_count</td>
<td>472</td>
</tr>
<tr>
<td>jobs_remaining</td>
<td>13</td>
</tr>
<tr>
<td>last-winner</td>
<td>Darla Johnson</td>
</tr>
<tr>
<td>last-winner-date</td>
<td>08/17/2014 08:42:13.015 UTC</td>
</tr>
</tbody>
</table>
</div>
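
A key-value store answers one kind of question very quickly: "what is the value for this key?" The hypothetical `KeyValueStore` class below is a toy stand-in, not a real client library. Its `get`, `set`, and `incr` methods loosely mirror commands found in stores such as Redis. It reuses a few entries from the table above.

```python
from typing import List, Optional, Tuple


class KeyValueStore:
    """A toy in-memory key-value store (hypothetical; not a real client library)."""

    def __init__(self) -> None:
        self._data = {}  # key -> value, both stored as strings

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Lookups go straight to the key; the store never inspects values.
        return self._data.get(key)

    def incr(self, key: str) -> int:
        # Values are strings, so the number is parsed, bumped, and written back.
        new_value = int(self._data.get(key, "0")) + 1
        self._data[key] = str(new_value)
        return new_value

    def items(self) -> List[Tuple[str, str]]:
        return list(self._data.items())


store = KeyValueStore()
store.set("current_count", "472")
store.set("jobs_remaining", "13")
store.set("last-winner", "Darla Johnson")

print(store.incr("current_count"))  # 473
print(store.get("last-winner"))     # Darla Johnson

# Lookups by key are direct, but "which keys hold a number?" requires
# scanning every entry, because the store knows nothing about its values.
numeric_keys = sorted(k for k, v in store.items() if v.isdigit())
print(numeric_keys)  # ['current_count', 'jobs_remaining']
```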

<br>
In the case of document NoSQL databases, the “value” portion of the entry can get much larger.

Here is an example of an entry in JSON. Note that the entire entry (or “document”) breaks down into a hierarchy of data: fields and their values, lists of values, and nested dictionaries of multiple values:

```json
{
  "success": {
    "total": 1
  },
  "contents": {
    "quotes": [
      {
        "quote": "Remove the temptation to settle for anything short of what you deserve.",
        "length": "71",
        "author": "Lorii Myers",
        "tags": [
          "expectation",
          "inspire",
          "perfection"
        ],
        "category": "inspire",
        "date": "2017-09-08",
        "permalink": "https://theysaidso.com/quote/ZWrV624xU_q6_KYYlrQpYgeF/lorii-myers-remove-the-temptation-to-settle-for-anything-short-of-what-you-deser",
        "title": "Inspiring Quote of the day",
        "background": "https://theysaidso.com/img/bgs/man_on_the_mountain.jpg",
        "id": "ZWrV624xU_q6_KYYlrQpYgeF"
      }
    ],
    "copyright": "2017-19 theysaidso.com"
  }
}
```

Also consider that later entries in this datastore may or may not contain a background image, the same number of tags, or the precise data structure of this
entry. NoSQL evolved out of the need to collect varied data quickly and at very high rates, so it does not suffer from impedance mismatch. Its weakness instead
is that data stored this way is difficult to aggregate or join.
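
The trade-off can be sketched with Python's `json` module. The three documents here are hypothetical and loosely modeled on the quote above. Storing them requires no schema. However, any aggregation, such as counting tags or filtering on a field, has to handle missing fields explicitly.

```python
import json
from collections import Counter

# Hypothetical documents loosely modeled on the quote entry above;
# note that their fields differ from one document to the next.
raw_documents = [
    '{"id": "a1", "author": "Lorii Myers", "tags": ["expectation", "inspire", "perfection"], "background": "mountain.jpg"}',
    '{"id": "b2", "author": "Anonymous", "tags": ["inspire"]}',
    '{"id": "c3", "author": "Lorii Myers"}',
]
documents = [json.loads(text) for text in raw_documents]

# Storing these is easy: each document is kept whole, whatever its shape.
# Aggregating across them is harder: the code must cope with missing fields itself.
tag_counts = Counter(tag for doc in documents for tag in doc.get("tags", []))
with_background = [doc["id"] for doc in documents if "background" in doc]

print(tag_counts.most_common(1))  # [('inspire', 2)]
print(with_background)            # ['a1']
```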

## 2. Node-Arc / Graph Databases

Graph (node-arc) databases are entirely different: they store and represent the nodes in a constellation along with the relationships that connect them. A “query” of a graph database might tell you about the network of other nodes related to the node you are interested in, and the types and strengths of those relationships, among other uses. Some examples of graph databases are:

* Neo4j
* TinkerPop
* Infinite
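
Here is a rough illustration of the kind of question a graph database answers. It is a plain-Python sketch with no graph database involved. It stores hypothetical nodes with typed, weighted relationships and uses breadth-first search to find everything within two hops of a starting node. Real graph databases provide query languages for this, such as Cypher in Neo4j or Gremlin in TinkerPop.

```python
from collections import deque

# Hypothetical nodes and typed, weighted relationships (arcs).
# Each entry: node -> list of (neighbor, relationship type, strength).
graph = {
    "Alice":       [("Bob", "COAUTHOR", 0.9), ("GenomicsLab", "MEMBER_OF", 1.0)],
    "Bob":         [("Alice", "COAUTHOR", 0.9), ("Carol", "ADVISES", 0.7)],
    "Carol":       [("Bob", "ADVISES", 0.7), ("Dan", "COAUTHOR", 0.4)],
    "Dan":         [("Carol", "COAUTHOR", 0.4)],
    "GenomicsLab": [("Alice", "MEMBER_OF", 1.0)],
}


def neighborhood(graph, start, max_hops):
    """Breadth-first search: return {node: hops} for nodes within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor, _rel_type, _strength in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


print(neighborhood(graph, "Alice", 2))  # {'Bob': 1, 'GenomicsLab': 1, 'Carol': 2}

# Relationship details travel with each arc, so they are cheap to inspect.
strong = [(n, rel) for n, rel, s in graph["Bob"] if s >= 0.8]
print(strong)  # [('Alice', 'COAUTHOR')]
```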

![](/notes/databases-intro/graphdb-property.png)

---

# Using Databases in Your Research

Researchers frequently ask us how to incorporate databases into their work. Here are four suggestions:

<div class="card-deck">
<div class="card">
<div class="card-block">
<h4 class="card-title">&raquo; Track Results</h4>
<p class="card-text">
Track the status of your work by adding a record to a table as each task completes. This lets you
see what work remains open, along with information about how each task was processed.
</p>
</div>
</div>
<div class="card">
<div class="card-block">
<h4 class="card-title">&raquo; Queue Up Your Work</h4>
<p class="card-text">
Collect and store data about future work you need to complete, the steps required, and the expected lifecycle
of each step. While this might be easy to do in Excel, you could grow it into a database that orchestrates
some of these steps for you.
</p>
</div>
</div>
</div>
<div class="card-deck" style="margin-top:2rem;">
<div class="card">
<div class="card-block">
<h4 class="card-title">&raquo; Index Everything</h4>
<p class="card-text">
Maintain a searchable history of source data, result sets, and the code used to process them.
This could include links to related data, published articles, GitHub code repositories, and more.
</p>
</div>
</div>
<div class="card">
<div class="card-block">
<h4 class="card-title">&raquo; Automate</h4>
<p class="card-text">
If you are awash in source data or have a backlog of files to process, consider automating the work with a database.
Instead of handling a single file at a time, your code could read each row of a table and process the file that row indexes.
A single HPC job could process thousands of files!
</p>
</div>
</div>
</div>
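
Tracking, queuing, and automation can share a single table. The sketch below assumes Python's `sqlite3`, hypothetical file names, and a placeholder `process` function. It registers files as pending, works through them, and records completion as each one finishes.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # use a file path (e.g. "jobs.db") to persist the queue
conn.execute("""
    CREATE TABLE work_queue (
        path        TEXT PRIMARY KEY,
        status      TEXT NOT NULL DEFAULT 'pending',
        finished_at TEXT,
        result      TEXT
    )
""")

# Queue up work: hypothetical input files, registered as pending.
with conn:
    conn.executemany("INSERT INTO work_queue (path) VALUES (?)",
                     [("sample_001.csv",), ("sample_002.csv",), ("sample_003.csv",)])


def process(path):
    """Placeholder for real analysis code (hypothetical)."""
    return f"processed {path}"


# Automate: read each pending row and process the file it indexes.
pending = [row[0] for row in conn.execute(
    "SELECT path FROM work_queue WHERE status = 'pending' ORDER BY path")]
for path in pending:
    result = process(path)
    # Track results: record completion as each file finishes.
    with conn:
        conn.execute(
            "UPDATE work_queue SET status = 'done', finished_at = ?, result = ? WHERE path = ?",
            (datetime.now(timezone.utc).isoformat(), result, path),
        )

still_pending = conn.execute(
    "SELECT COUNT(*) FROM work_queue WHERE status = 'pending'").fetchone()[0]
print(still_pending)  # 0
```

SQLite keeps the whole database in one file, which suits a single job. Its documentation cautions against sharing a database file over a network filesystem between many simultaneous writers. For many parallel HPC jobs, a database server such as MySQL or PostgreSQL may be a better fit.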

<br>

**Note:** Research Computing may be able to provide support for your database needs. Please request a consultation by filling out [this form](https://www.rc.virginia.edu/form/support-request/?category=Consultation) on our website.


# Other Resources

Here is a great overview of databases and their histories:

{{< youtube qI_g07C_Q5I >}}

Martin Fowler - NoSQL - YouTube

---
title: Introduction to Databases
summary: "This tutorial is an introduction to relational and NoSQL databases."

# Schedule page publish date (NOT talk date).
publishDate: "2024-04-01T00:00:00Z"

categories: ["databases"]
tags: [rivanna]

notes: databases-intro

weight: 200

---

0 commit comments

Comments
 (0)