Syllabus
CS 221: Data Base
Management Systems Fall 2025
Margaret
Menzin Office: E-425-D
Phone:
X2704
Email:
menzin@simmons.edu Home Phone: 781-862-5107
My zoom office hours https://simmons.zoom.us/j/4672991761
There
is no class on Monday 9/22/25. This
class will be made up.
Student Hours: MWF 7:15-8:00 a.m ; MWF
12-2
(except for
lunch days)
other days and times by appointment
–
I am almost always around/on email,
but give me a heads-up the day before.
Note:
The Computer, Data and Mathematics Sciences (CDMS)
Department "eats" at the Fens on many Wednesdays 12-1.
Whenever feasible we will be outside in front of the Fens.
Common
Syllabus Statements about Accessibility, Academic integrity and Sexual
Harassment Policies will be found here
.
Text: Kroenke,
Database Processing 16th edition (Pearson Prentice Hall) .
Required: Student
Membership in the ACM for on-line reading from their site, including the
books on Mongo and Neo4j.
Student Membership in
the ACM costs $19 for the year and gives you access to a huge
number of professional
books. This also saves you a lot of
money, as there are no books
required for MongoDB and some of
the other NoSQL topics.
Software: MariaDB,
MongoDB, and neo4j are all open source &
either
on canary.simmons.edu or
you will install them ( on the CS VM or your own
computer)
Expectations and How
to Succeed in This Course
Very Broad Overview of the Content
My favorite
definition of a database management system is that it is ‘an organized, shared, collection of data (the database) together
with the systems and tools for managing that data.’ (Our book adds the condition that the data be
‘self-describing’.)
In other words, we
are going to be looking at systems for organizing the data – designing how to
break it up into tables, graphs, or other units – ensuring that our data
doesn’t get corrupted, modifying our
data, handling multiple users, making sure that data is accessed only by
appropriate users, etc.
In the first part of
the course (about 2/3s of the course) we will look at Relational Data Bases or RDBMS.
These databases are the workhorses of databases where it is important
that the data be accurate and consistent.
These databases all use SQL or Sequential Query Language. We will
be working with the MariaDB platform.
MariaDB is a fork of the very popular MySQL. (They were created by the same person and are
identical for all practical purposes.)
In these databases
the data is organized in tables or relations, and we will examine how we design
(decide what data to put in what tables, etc.) such a database, how constraints
on appropriate values for our data are enforced, how the data is CRUDed (CRUD = Create,
Read, Update, Delete), how consistency and accuracy of data is maintained,
how concurrency (multiple users trying to read or write data at the same time)
is handled, and how access privileges (the right to CRUD various parts of the
data) are handled. You will learn how to
write SQL queries to create databases and tables and how to CRUD the data.
In the second part of
the course we will look at non-relational databases, also called NoSQL
(NoSQL = Not Only SQL) databases. These
newer databases are used when absolute data consistency is no longer needed –
often when we are working with big data.
We will look at two rather different examples: MongoDB represents its data in JSON (this is just
like a dictionary in python) and is used for many web applications. Mongo is
very popular and we will work with it extensively. Neo4j
represents its data as a graph, with nodes and edges and is appropriate for
some interesting applications. In this
part of the course we will look at some simple examples and how to use
them. As with the relational model,
there is always the fun of designing your data model. We will also discuss the more general issues
surrounding concurrency and distributed databases.
It should be noted that
increasingly some companies use both SQL and NoSQL databases, (referred to as a
‘hybrid model’) and need to get them to ‘talk to each other’.At last tally,
about a quarter of all database installations included both a SQL database and
MongoDB.
Finally we will
discuss some issues which pertain to all kinds of databases: First, how do you decide what data model
(RDBMS? MongoDB/document? Neo4j/graphical? etc.) to use? Second, what do you do when your application
is too big for one server? How do distributed databases work, stay consistent
and current? How do they handle failures of one component and upgrades? What
are the pros and cons of various architectures?
Third, what is your Data Life Cycle?
What steps do you follow, starting with acquiring the data and ending
with its deletion? Are there ethical
issues related to your project and what trade-offs will you make?
An aside: In the RDBMS part of
this course we will be using three examples extensively:
Student Learning
Outcomes
Students will be able
to describe major uses of database systems, distinguishing between those which
are still actively evolving and data warehouses, and describing the major
functions of a DBMS ( including those related
to access control, security, backup and recovery)
Students will query
and CRUD a database using SQL, including the use of aggregate functions,
subqueries, and 'exists'.
Students will be able
to assess if a database is in 3NF or BCNF and if not put it into 3NF or BCNF.
Students will be able
to design a database using an E-R design and then translate that into SQL to
create the tables and views, constraints and triggers. This includes evaluating different designs
(e.g. for subtypes of entities ) and understanding the tradeoffs in creating
additional indices.
Students will create
a web page to query a SQL database, using PHP, and display the results on a web
page.
Students will be able
to describe the issues surrounding concurrency and the solutions to those
problems, including how those solutions also provide back-up and recovery and
the importance of ACID transactions.
Students will install
MongoDB on their own machines, create a MongoDB database from csv files and
write code using the aggregation
pipeline and callback functions in JavaScript to denormalize the data and to
CRUD the database.
Students will
understand the implications of having a distributed database, why Mongo is
BASE, and how eventual consistency is typically achieved.
Students will examine
neo4j and apply their knowledge of neo4j and cypher to understand a major use
case of neo4j, including describing the nodes and links and their attributes,
lecturing on typical code, and evaluating the fitness of the graphical model.
Students will be able
to describe the advantages and disadvantages of various database models.
Students will be able
to read documentation, including following a sequence of references to
supporting defintions, and apply the
documentation to writing correct code.
Students will create
database designs and queries by working together effectively on team projects.
Overview of content:
Unit
1: Introduction to Database Systems
Ch. 1 Kroenke and other readings
What is a database?
What kinds of operations does a database support (CRUD)? What kinds of
databases are there and how have they evolved? What kinds of issues do we
consider in database design (e.g. access, integrity, consistency, recovery from
errors etc.)
Unit 2: Introduction to SQL
Ch. 2 Kroenke and other readings from the ACM and other sites
SQL, or Structured
Query Language, is the lingua franca for querying databases. Our book looks
first at Access (part of Microsoft Office) and then using MariaDB, an
open source database product. NOTE: For purposes of this course, MySQL and
MariaDB are identical.
Ww may look very
briefly at Access with its graphical interface which implements QBE (Query By
Example). Access is really a single-user database, and so not of major interest
to us. QBE, once touted as the wave of the
future, has never become widely implemented and used – perhaps because SQL is
so easy to learn. (Access is available on all the Simmons computers. )
But our major work with Relational Databases and SQL will be on the MariDB
platform. (You will have access to the MyWebSQL
and Workbench to use MariaDB.)
Unit 3:
Database Design for RDBMS, Including Normalization
and Entity-Relationship Diagrams
Ch. 3-7 Kroenke and readings from the ACM and other sites
How do we decide if
our database should have many small relations or one large one? How do we
design a database so that it will not be vulnerable to various kinds of
anomalies?
This Unit will
include more advanced use of SQL and also designing web interfaces for querying
databases and presenting the results of these database queries (thru php or
other scripting language.)
There is a major E-R Design project, done in teams.
Unit
4: General Issues in RDBMS
Ch. 9 of Kroenke and other reading
We revisit the
general issues of Unit 1 and discuss them from a more informed point of view.
We examine issues surrounding concurrency and the methods for handling them,
and how these tie into back-up and recovery.
We also examine issues surrounding security – especially preventing) SQL
injection and granting and revoking privileges (to CRUD columns or tables in a
database.
Unit 5: NoSQL and MongoDB
ACM Book site.
As mentioned in the
Overview, MongoDB (also called Mongo)
stores its data in JSON – which looks just like dictionaries in
python. Mongo is a document database, as
each JSON object (dictionary) may be thought of us a document. You will use either your own computer or the
CS VM to work with Mongo.
We will especially be interested in the advantages and disadvantages of using
MongoDB as opposed to a SQL platform.
Seeing how MariaDB and Mongo differ helps to clarify the importance of
each one's strengths.
There is a major team
project.
Unit 6: NoSQL and neo4j
Neo4j book on Online Campus and ACM Book site.
Neo4j is, as
noted in the Overview, based on a graph model.
Both the nodes and edges can have attributes (properties).
One attractive
feature of neo4j is that insertions, etc. are constant time even as the
database gets large.
As with the other databases we will look at some design issues and work with an
implementation of a neo4j database.
As time allows we
will take a short look at Blockchain, the platform which underlies Bitcoin.
Unit 7: Distributed
databases, Data Life Cycle and Wrap-Up
In this unit we
examine some of the important architectures for distributed databases, and the
implications of those designs for concurrency and availability. We discuss the CAP Theorem. Finally we step back to ask about the major
strengths and weaknesses of various kinds of databases and how one begins to
choose a database for a particular application.
Whatever model of
database you use, there is a life cycle to your data, starting with its
acquisition, then processing and
storing, sharing, and deletion. We will
examine these steps and also see some examples of where ethical issues arise at
various points. We will discuss the
kinds of trade-offs these issues engender.
Grading:
There
will be three tests , two group projects
and one short paired presentation/report in the course. Points are allocated as follows:
In class quizzes (part way thru
Assignment 2B) 10%
Test at end of Unit 2 – part 1
- 15%
Test at end of Unit 2– part 2 - 10%
E-R Design project in Unit 3 15%
Mongo test 20%
Mongo project 20%
Neo4j report 10%
While there is no
explicit point allocation for participation, that feeds into the evaluations of
the group projects. Additionally,
since attendance is required at all meetings, I reserve the right to deduct up
to 15% for students with more than 5 unexcused (i.e. not for a religious
holiday and not accompanied by a relevant doctor’s note) absences. Labs are
counted in the attendance tally.
Time schedule: A tentative time schedule is posted in the
course site.
Take home tests are always due a week after they are released except as noted
on the schedule.
Projects done in small groups typically take longer.
Work is due on the date of the assignment (with a grace period of the next
class). Any extension beyond that must be arranged before hand, except for
dire emergencies etc.
Last thoughts:
Database systems is a fun topic. The
concepts are reasonable, we have a great book, and design (of a database) is
always fun. We should have a great time
together in this course, so let’s rolling!