Syllabus                   CS 221: Data Base Management Systems                              Fall 2025

Margaret Menzin    Office: E-425-D                                                                    Phone: X2704                
                                    Email: menzin@simmons.edu                Home Phone: 781-862-5107           
                                    My  zoom office hours      https://simmons.zoom.us/j/4672991761

                                     There is no class on Monday 9/22/25.  This class will be made up.

                                      Student Hours: MWF  7:15-8:00 a.m ; MWF 12-2
                                                              
(except for  lunch days)
                                                                other days and times by
appointment –
                                                                           I am almost always  ar
ound/on email,
                                                                           but give me a heads-up the day before.

                                        Note:  The Computer, Data and Mathematics Sciences (CDMS)
                                                    Department "eats" at the Fens on many Wednesdays 12-1.                     
                                                    Whenever feasible we will be outside in front of the Fens.

Common Syllabus Statements about Accessibility, Academic integrity and Sexual Harassment Policies will be found here . 

Text: Kroenke, Database Processing 16th edition (Pearson Prentice Hall)          .

Required: Student Membership in the ACM for on-line reading from their site, including the
         books on Mongo and Neo4j. 
        
Student Membership in the ACM costs $19 for the year and gives you access to a huge
        number of professional books.  This also saves you a lot of money, as there are no books
        required for MongoDB and some of the other NoSQL topics.

Software: MariaDB, MongoDB, and neo4j are all open source &  either
               on canary.simmons.edu or you will install them ( on the CS VM or your own
               computer)

Expectations and How to Succeed in This Course

Very Broad Overview of the Content

My favorite definition of a database management system is that it is an organized, shared, collection of data (the database) together with the  systems and tools for managing that data.’  (Our book adds the condition that the data be ‘self-describing’.)

In other words, we are going to be looking at systems for organizing the data – designing how to break it up into tables, graphs, or other units – ensuring that our data doesn’t get corrupted,  modifying our data, handling multiple users, making sure that data is accessed only by appropriate users, etc.

In the first part of the course (about 2/3s of the course) we will look at Relational Data Bases or RDBMS.  These databases are the workhorses of databases where it is important that the data be accurate and consistent.  These databases all use SQL or Sequential Query Language. We will be working with the MariaDB platform.  MariaDB is a fork of the very popular MySQL.  (They were created by the same person and are identical for all practical purposes.)

In these databases the data is organized in tables or relations, and we will examine how we design (decide what data to put in what tables, etc.) such a database, how constraints on appropriate values for our data are enforced,  how the data is CRUDed (CRUD = Create, Read, Update, Delete), how consistency and accuracy of data is maintained, how concurrency (multiple users trying to read or write data at the same time) is handled, and how access privileges (the right to CRUD various parts of the data) are handled.  You will learn how to write SQL queries to create databases and tables and how to CRUD the data.

In the second part of the course we will look at non-relational databases, also called NoSQL (NoSQL = Not Only SQL) databases.  These newer databases are used when absolute data consistency is no longer needed – often when we are working with big data.  We will look at two rather different examples: MongoDB represents its data in JSON (this is just like a dictionary in python) and is used for many web applications. Mongo is very popular and we will work with it extensively.   Neo4j represents its data as a graph, with nodes and edges and is appropriate for some interesting applications.   In this part of the course we will look at some simple examples and how to use them.  As with the relational model, there is always the fun of designing your data model.  We will also discuss the more general issues surrounding concurrency and distributed databases.

It should be noted that increasingly some companies use both SQL and NoSQL databases, (referred to as a ‘hybrid model’) and need to get them to ‘talk to each other’.At last tally, about a quarter of all database installations included both a SQL database and MongoDB.

Finally we will discuss some issues which pertain to all kinds of databases:  First, how do you decide what data model (RDBMS? MongoDB/document? Neo4j/graphical? etc.) to use?  Second, what do you do when your application is too big for one server? How do distributed databases work, stay consistent and current? How do they handle failures of one component and upgrades? What are the pros and cons of various architectures?  Third, what is your Data Life Cycle?  What steps do you follow, starting with acquiring the data and ending with its deletion?  Are there ethical issues related to your project and what trade-offs will you make?

An aside: In the RDBMS part of this course we will be using three examples extensively:

Student Learning Outcomes

Students will be able to describe major uses of database systems, distinguishing between those which are still actively evolving and data warehouses, and describing the major functions of a DBMS  ( including those related to access control, security, backup and recovery)

Students will query and CRUD a database using SQL, including the use of aggregate functions, subqueries, and 'exists'.

Students will be able to assess if a database is in 3NF or BCNF and if not put it into 3NF or BCNF.

Students will be able to design a database using an E-R design and then translate that into SQL to create the tables and views, constraints and triggers.  This includes evaluating different designs (e.g. for subtypes of entities ) and understanding the tradeoffs in creating additional indices.  

Students will create a web page to query a SQL database, using PHP, and display the results on a web page.  

Students will be able to describe the issues surrounding concurrency and the solutions to those problems, including how those solutions also provide back-up and recovery and the importance of ACID transactions.

Students will install MongoDB on their own machines, create a MongoDB database from csv files and write code using  the aggregation pipeline and callback functions in JavaScript to denormalize the data and to CRUD the database.

Students will understand the implications of having a distributed database, why Mongo is BASE, and how eventual consistency is typically achieved.

Students will examine neo4j and apply their knowledge of neo4j and cypher to understand a major use case of neo4j, including describing the nodes and links and their attributes, lecturing on typical code, and evaluating the fitness of the graphical model.

Students will be able to describe the advantages and disadvantages of various database models.

Students will be able to read documentation, including following a sequence of references to supporting defintions,  and apply the documentation  to writing correct code.

Students will create database designs and queries by working together effectively on team projects.

 

Overview of content:

Unit 1: Introduction to Database Systems
Ch. 1 Kroenke and other readings

What is a database? What kinds of operations does a database support (CRUD)? What kinds of databases are there and how have they evolved? What kinds of issues do we consider in database design (e.g. access, integrity, consistency, recovery from errors etc.) 

Unit 2: Introduction to SQL
Ch. 2 Kroenke and other readings from the ACM and other sites

SQL, or Structured Query Language, is the lingua franca for querying databases. Our book  looks  first at Access (part of Microsoft Office) and then using MariaDB, an open source database product. NOTE: For purposes of this course, MySQL and MariaDB are identical.

Ww may look very briefly at Access with its graphical interface which implements QBE (Query By Example). Access is really a single-user database, and so not of major interest to us.  QBE, once touted as the wave of the future, has never become widely implemented and used – perhaps because SQL is so easy to learn. (Access is available on all the Simmons computers. )

But our major work with Relational Databases and SQL will be on the MariDB platform. (You will have access to the MyWebSQL  and Workbench to use MariaDB.)

 

 Unit 3: Database Design for RDBMS, Including  Normalization and Entity-Relationship Diagrams
Ch. 3-7 Kroenke and readings from the ACM and other sites

How do we decide if our database should have many small relations or one large one? How do we design a database so that it will not be vulnerable to various kinds of anomalies?

This Unit will include more advanced use of SQL and also designing web interfaces for querying databases and presenting the results of these database queries (thru php or other scripting language.) 

There is a major E-R Design project, done in teams.

Unit 4: General Issues in  RDBMS
Ch. 9  of Kroenke and other reading

We revisit the general issues of Unit 1 and discuss them from a more informed point of view. We examine issues surrounding concurrency and the methods for handling them, and how these tie into back-up and recovery.  We also examine issues surrounding security – especially preventing) SQL injection and granting and revoking privileges (to CRUD columns or tables in a database.


 

Unit 5: NoSQL and MongoDB
ACM Book site.

As mentioned in the Overview, MongoDB (also called Mongo)  stores its data in JSON – which looks just like dictionaries in python.  Mongo is a document database, as each JSON object (dictionary) may be thought of us a document.  You will use either your own computer or the CS VM to work with Mongo.

We will especially be interested in the advantages and disadvantages of using MongoDB as opposed to a SQL platform.  Seeing how MariaDB and Mongo differ helps to clarify the importance of each one's strengths.

There is a major team project.

Unit 6: NoSQL and neo4j
Neo4j book on Online Campus and ACM Book site.

 Neo4j is, as noted in the Overview, based on a graph model.  Both the nodes and edges can have attributes (properties). 

One attractive feature of neo4j is that insertions, etc. are constant time even as the database gets large.

As with the other databases we will look at some design issues and work with an implementation of a neo4j database.

As time allows we will take a short look at Blockchain, the platform which underlies Bitcoin.

Unit 7: Distributed databases, Data Life Cycle and Wrap-Up

In this unit we examine some of the important architectures for distributed databases, and the implications of those designs for concurrency and availability.  We discuss the CAP Theorem.  Finally we step back to ask about the major strengths and weaknesses of various kinds of databases and how one begins to choose a database for a particular application.

Whatever model of database you use, there is a life cycle to your data, starting with its acquisition, then  processing and storing, sharing, and deletion.  We will examine these steps and also see some examples of where ethical issues arise at various points.  We will discuss the kinds of trade-offs these issues engender.

Grading: 
There will be three  tests , two group projects and one short paired presentation/report in the course.  Points are allocated as follows:
     In class quizzes (part way thru Assignment 2B)   10%
     Test at end of Unit 2  – part 1  -                             15%
     Test at end of Unit 2– part 2  -                               10%
     E-R Design project  in Unit 3                                   15%
     Mongo test                                                                20%
     Mongo  project                                                         20%
     Neo4j report                                                              10%

While there is no explicit point allocation for participation, that feeds into the evaluations of the group projects.  Additionally, since attendance is required at all meetings, I reserve the right to deduct up to 15% for students with more than 5 unexcused (i.e. not for a religious holiday and not accompanied by a relevant doctor’s note) absences. Labs are counted in the attendance tally.

Time schedule:  A tentative time schedule is posted in the course site.
Take home tests are always due a week after they are released except as noted on the schedule. 
Projects done in small groups typically take longer. 
Work is due on the date of the assignment (with a grace period of the next class). Any extension beyond that must be arranged before hand, except for dire emergencies etc.

 

 

Last thoughts:
Database systems is a fun topic.  The concepts are reasonable, we have a great book, and design (of a database) is always fun.  We should have a great time together in this course, so let’s rolling!