Syllabus CS 333/533: Data Base Management Systems Spring 2008

Syllabus CS 221: Data Base Management Systems Fall 2025

Margaret Menzin    Office: E-425-D                                                                    Phone: X2704
                                    Email: menzin@simmons.edu                Home Phone: 781-862-5107
                                    My zoom office hours      https://simmons.zoom.us/j/4672991761

There is no class on Monday 9/22/25. This class will be made up.

                                      Student Hours: MWF 7:15-8:00 a.m ; MWF 12-2
                                                               (except for lunch days)
                                                               other days and times by appointment –
                                                                           I am almost always around/on email,
                                                                           but give me a heads-up the day before.

                                        Note: The Computer, Data and Mathematics Sciences (CDMS)
                                                    Department "eats" at the Fens on many Wednesdays 12-1.
                                                    Whenever feasible we will be outside in front of the Fens.

Common Syllabus Statements about Accessibility, Academic integrity and Sexual Harassment Policies will be found here .

Text: Kroenke, Database Processing 16^th edition (Pearson Prentice Hall) .

Required: Student Membership in the ACM for on-line reading from their site, including the
         books on Mongo and Neo4j.
         Student Membership in the ACM costs $19 for the year and gives you access to a huge
        number of professional books. This also saves you a lot of money, as there are no books
        required for MongoDB and some of the other NoSQL topics.

Software: MariaDB, MongoDB, and neo4j are all open source & either
on canary.simmons.edu or you will install them ( on the CS VM or your own
computer)

Expectations and How to Succeed in This Course

This is an intermediate level Computer Science course, and so I expect you to be able to follow the readings, ask questions, and stay up to date throughout the semester.
I love getting questions!
You are always welcome to email me the night before class if there is material that is unclear – and the more specific you can be, the better. If something is unclear to one student, then it will also be unclear to other students – so you are also helping your classmates when you ask for clarification.

In addition to asking questions by email, you can ask questions in class or on each VoiceThread. (Each VoiceThread has a slide near the end for you to enter questions.)
Attendance is required for all meetings, including the labs (baring health issues verified by a doctor’s note and religious obligations.) I take attendance and more than 5 unexcused absenses (including labs) may result in a loss of up to 15% in your course grade.
I would point out to you that you can’t tell at the start of the semester what will be a problem in the last week of class, so you should use the alloted absences judiciously.
There is a pattern to the classes:
There is almost always reading followed by a VoiceThread (video) – which need to be done before class.

The VoiceThread has some embeddded questions for you to answer – and this is a mandatory part of the HW assignment. Typically there will also be some straighforward questions (from the book) to answer. In addition to our text book, there may be additional readings posted in the course site or from the ACM online book site or the Simmons LIbrary.
The nature of this course is that there are a lot of files - so for each assignment I have described the "Workflow" (what you need to do , in what order for the assignment .) In addition, for each assignment there are almost always two folders – Files for HW; Files for Classwork. (The classwork is the class after the HW is done.) I have tried very hard to put all the files for a given assignment in these folders for all our work in most of the course. In order to reduce load of the moodle server please download all these files before coming to class. (Obviously you need the HW files in order to do the HW before class.) I have also tried to be sure that every folder, whether for a HW assignment or a class, has all the files you will need. That means that some files are repeated in multiple folders.
I strongly suggest that you make a folder for this course and use nested folders to imitate the structure of the course on moodle. I especially recommend keeping a copy of the Course Schedule on your Desktop.
The course will include both individual and group assignments.

Very Broad Overview of the Content

My favorite definition of a database management system is that it is ‘an organized, shared, collection of data (the database) together with the systems and tools for managing that data.’ (Our book adds the condition that the data be ‘self-describing’.)

In other words, we are going to be looking at systems for organizing the data – designing how to break it up into tables, graphs, or other units – ensuring that our data doesn’t get corrupted, modifying our data, handling multiple users, making sure that data is accessed only by appropriate users, etc.

In the first part of the course (about 2/3s of the course) we will look at Relational Data Bases or RDBMS. These databases are the workhorses of databases where it is important that the data be accurate and consistent. These databases all use SQL or Sequential Query Language. We will be working with the MariaDB platform. MariaDB is a fork of the very popular MySQL. (They were created by the same person and are identical for all practical purposes.)

In these databases the data is organized in tables or relations, and we will examine how we design (decide what data to put in what tables, etc.) such a database, how constraints on appropriate values for our data are enforced, how the data is CRUDed (CRUD = Create, Read, Update, Delete), how consistency and accuracy of data is maintained, how concurrency (multiple users trying to read or write data at the same time) is handled, and how access privileges (the right to CRUD various parts of the data) are handled. You will learn how to write SQL queries to create databases and tables and how to CRUD the data.

In the second part of the course we will look at non-relational databases, also called NoSQL (NoSQL = Not Only SQL) databases. These newer databases are used when absolute data consistency is no longer needed – often when we are working with big data. We will look at two rather different examples: MongoDB represents its data in JSON (this is just like a dictionary in python) and is used for many web applications. Mongo is very popular and we will work with it extensively. Neo4j represents its data as a graph, with nodes and edges and is appropriate for some interesting applications. In this part of the course we will look at some simple examples and how to use them. As with the relational model, there is always the fun of designing your data model. We will also discuss the more general issues surrounding concurrency and distributed databases.

It should be noted that increasingly some companies use both SQL and NoSQL databases, (referred to as a ‘hybrid model’) and need to get them to ‘talk to each other’.At last tally, about a quarter of all database installations included both a SQL database and MongoDB.

Finally we will discuss some issues which pertain to all kinds of databases: First, how do you decide what data model (RDBMS? MongoDB/document? Neo4j/graphical? etc.) to use? Second, what do you do when your application is too big for one server? How do distributed databases work, stay consistent and current? How do they handle failures of one component and upgrades? What are the pros and cons of various architectures? Third, what is your Data Life Cycle? What steps do you follow, starting with acquiring the data and ending with its deletion? Are there ethical issues related to your project and what trade-offs will you make?

An aside: In the RDBMS part of this course we will be using three examples extensively:

The College2015 example mimics part of the Registrar's Office at Simmons – it has students, courses, faculty, etc. This is a fairly complex system, but it is one we are all familiar with. Our familiarity with the system will let us think through the kinds of relationships between students and courses and faculty.
The CapeCodd example is about a simplified sporting goods store. Our text makes extensive use of this example and you will find that it has already been loaded into MariaDB. Cape Codd has products which it sells, sales people, buyers, suppliers, warehouses, and purchases by customers.
The QACS or Queen Anne Curiosity Shop example is a database which you will gradually construct together. As you learn more about working with a database you will apply that knowledge to build the QACS database. As you will see, QACS sells antique and reproduction furniture, and related items.

Student Learning Outcomes

Students will be able to describe major uses of database systems, distinguishing between those which are still actively evolving and data warehouses, and describing the major functions of a DBMS ( including those related to access control, security, backup and recovery)

Students will query and CRUD a database using SQL, including the use of aggregate functions, subqueries, and 'exists'.

Students will be able to assess if a database is in 3NF or BCNF and if not put it into 3NF or BCNF.

Students will be able to design a database using an E-R design and then translate that into SQL to create the tables and views, constraints and triggers. This includes evaluating different designs (e.g. for subtypes of entities ) and understanding the tradeoffs in creating additional indices.

Students will create a web page to query a SQL database, using PHP, and display the results on a web page.

Students will be able to describe the issues surrounding concurrency and the solutions to those problems, including how those solutions also provide back-up and recovery and the importance of ACID transactions.

Students will install MongoDB on their own machines, create a MongoDB database from csv files and write code using the aggregation pipeline and callback functions in JavaScript to denormalize the data and to CRUD the database.

Students will understand the implications of having a distributed database, why Mongo is BASE, and how eventual consistency is typically achieved.

Students will examine neo4j and apply their knowledge of neo4j and cypher to understand a major use case of neo4j, including describing the nodes and links and their attributes, lecturing on typical code, and evaluating the fitness of the graphical model.

Students will be able to describe the advantages and disadvantages of various database models.

Students will be able to read documentation, including following a sequence of references to supporting defintions, and apply the documentation to writing correct code.

Students will create database designs and queries by working together effectively on team projects.

Overview of content:

Unit 1: Introduction to Database Systems
Ch. 1 Kroenke and other readings

What is a database? What kinds of operations does a database support (CRUD)? What kinds of databases are there and how have they evolved? What kinds of issues do we consider in database design (e.g. access, integrity, consistency, recovery from errors etc.)

Unit 2: Introduction to SQL
Ch. 2 Kroenke and other readings from the ACM and other sites

SQL, or Structured Query Language, is the lingua franca for querying databases. Our book looks first at Access (part of Microsoft Office) and then using MariaDB, an open source database product. NOTE: For purposes of this course, MySQL and MariaDB are identical.

Ww may look very briefly at Access with its graphical interface which implements QBE (Query By Example). Access is really a single-user database, and so not of major interest to us. QBE, once touted as the wave of the future, has never become widely implemented and used – perhaps because SQL is so easy to learn. (Access is available on all the Simmons computers. )

But our major work with Relational Databases and SQL will be on the MariDB platform. (You will have access to the MyWebSQL and Workbench to use MariaDB.)

Unit 3: Database Design for RDBMS, Including Normalization and Entity-Relationship Diagrams
Ch. 3-7 Kroenke and readings from the ACM and other sites

How do we decide if our database should have many small relations or one large one? How do we design a database so that it will not be vulnerable to various kinds of anomalies?

This Unit will include more advanced use of SQL and also designing web interfaces for querying databases and presenting the results of these database queries (thru php or other scripting language.)

There is a major E-R Design project, done in teams.

Unit 4: General Issues in RDBMS
Ch. 9 of Kroenke and other reading

We revisit the general issues of Unit 1 and discuss them from a more informed point of view. We examine issues surrounding concurrency and the methods for handling them, and how these tie into back-up and recovery. We also examine issues surrounding security – especially preventing) SQL injection and granting and revoking privileges (to CRUD columns or tables in a database.

Unit 5: NoSQL and MongoDB
ACM Book site.

As mentioned in the Overview, MongoDB (also called Mongo) stores its data in JSON – which looks just like dictionaries in python. Mongo is a document database, as each JSON object (dictionary) may be thought of us a document. You will use either your own computer or the CS VM to work with Mongo.

We will especially be interested in the advantages and disadvantages of using MongoDB as opposed to a SQL platform. Seeing how MariaDB and Mongo differ helps to clarify the importance of each one's strengths.

There is a major team project.

Unit 6: NoSQL and neo4j
Neo4j book on Online Campus and ACM Book site.

Neo4j is, as noted in the Overview, based on a graph model. Both the nodes and edges can have attributes (properties).

One attractive feature of neo4j is that insertions, etc. are constant time even as the database gets large.

As with the other databases we will look at some design issues and work with an implementation of a neo4j database.

As time allows we will take a short look at Blockchain, the platform which underlies Bitcoin.

Unit 7: Distributed databases, Data Life Cycle and Wrap-Up

In this unit we examine some of the important architectures for distributed databases, and the implications of those designs for concurrency and availability. We discuss the CAP Theorem. Finally we step back to ask about the major strengths and weaknesses of various kinds of databases and how one begins to choose a database for a particular application.

Whatever model of database you use, there is a life cycle to your data, starting with its acquisition, then processing and storing, sharing, and deletion. We will examine these steps and also see some examples of where ethical issues arise at various points. We will discuss the kinds of trade-offs these issues engender.

Grading:
There will be three tests , two group projects and one short paired presentation/report in the course. Points are allocated as follows:
     In class quizzes (part way thru Assignment 2B)   10%
     Test at end of Unit 2 – part 1 -                             15%
     Test at end of Unit 2– part 2 -                               10%
     E-R Design project in Unit 3                                   15%
     Mongo test                                                                20%
     Mongo project                                                         20%
     Neo4j report                                                              10%

While there is no explicit point allocation for participation, that feeds into the evaluations of the group projects. Additionally, since attendance is required at all meetings, I reserve the right to deduct up to 15% for students with more than 5 unexcused (i.e. not for a religious holiday and not accompanied by a relevant doctor’s note) absences. Labs are counted in the attendance tally.

Time schedule: A tentative time schedule is posted in the course site.
Take home tests are always due a week after they are released except as noted on the schedule.
Projects done in small groups typically take longer.
Work is due on the date of the assignment (with a grace period of the next class). Any extension beyond that must be arranged before hand, except for dire emergencies etc.

Last thoughts:
Database systems is a fun topic. The concepts are reasonable, we have a great book, and design (of a database) is always fun. We should have a great time together in this course, so let’s rolling!