Fundamentals of Software Architecture for Big Data

This course is intended for individuals who want to understand the basics of software engineering as they relate to building large software systems that leverage big data. You will be introduced to the software engineering concepts necessary to build and scale large, data-intensive, distributed systems. Starting with software engineering best practices and loosely coupled, highly cohesive data microservices, the course takes you through the evolution of a distributed system over time.

Download the codebase

The milk problem

An example architecture for managing product inventory that highlights the use of database transactions.

History

The milk problem first surfaced on a project with a well-known grocery store that needed to track product inventory in real time. The choice of database was largely driven by a non-trivial performance requirement. The initial solution used an eventually consistent database, which was available and partition tolerant. Read about the CAP theorem to learn more about the relationship between consistency, availability, and partition tolerance.

The challenge is that high availability comes at the cost of consistency. Highly available databases are eventually consistent and are therefore prone to dirty reads, in which uncommitted changes from one transaction affect a read in another transaction. As a result, the grocery chain was unable to produce an accurate count of milk on the shelves.
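To make the dirty read concrete, here is a minimal sketch that simulates one with a toy in-memory store. It is an illustration only and not the workshop's code: the `ToyStore` class, the `restock` transaction name, and the quantities are all hypothetical, and a real database manages isolation in far more sophisticated ways.

```python
class ToyStore:
    """In-memory key/value store with explicit transactions (illustration only)."""

    def __init__(self, data):
        self.committed = dict(data)
        self.pending = {}  # transaction id -> {key: uncommitted value}

    def begin(self, tx):
        self.pending[tx] = {}

    def write(self, tx, key, value):
        # Writes stay in the transaction's private buffer until commit.
        self.pending[tx][key] = value

    def commit(self, tx):
        self.committed.update(self.pending.pop(tx))

    def rollback(self, tx):
        # Discard the buffered writes; committed data is untouched.
        self.pending.pop(tx)

    def read(self, key, allow_dirty):
        if allow_dirty:
            # A dirty read sees the uncommitted write of any open transaction.
            for writes in self.pending.values():
                if key in writes:
                    return writes[key]
        return self.committed[key]


store = ToyStore({"milk": 10})
store.begin("restock")
store.write("restock", "milk", 25)  # delivery recorded but not yet committed

dirty_count = store.read("milk", allow_dirty=True)   # sees the uncommitted 25
clean_count = store.read("milk", allow_dirty=False)  # sees the committed 10

store.rollback("restock")  # the delivery is rejected
final_count = store.read("milk", allow_dirty=True)   # back to the committed 10
```

The reader that allowed dirty reads reported 25 cartons of milk that never reached the shelf, which is the kind of inaccurate count described above.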

The exercise below introduces transactions and highlights the challenges of dirty reads.
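As a warm-up, here is a rough sketch of what a transaction buys you: a purchase either fully applies or is fully rolled back. It is not the exercise solution. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without any setup, and the `products` table, its columns, and the helper names are assumptions made for illustration.

```python
import sqlite3


def make_inventory(quantity):
    """Create an in-memory database holding one hypothetical product row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER)"
    )
    conn.execute("INSERT INTO products VALUES (1, 'milk', ?)", (quantity,))
    conn.commit()
    return conn


def quantity_of(conn, product_id):
    """Return the committed quantity for a product."""
    row = conn.execute(
        "SELECT quantity FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return row[0]


def purchase(conn, product_id, amount):
    """Decrement inventory in one transaction; roll back if it would go negative."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE products SET quantity = quantity - ? WHERE id = ?",
                (amount, product_id),
            )
            if quantity_of(conn, product_id) < 0:
                raise ValueError("insufficient inventory")
        return True
    except ValueError:
        return False
```

For example, with `conn = make_inventory(10)`, calling `purchase(conn, 1, 4)` succeeds and leaves 6 in stock, while a following `purchase(conn, 1, 7)` fails and leaves the quantity at 6 because the update is rolled back.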

The exercise

Get the tests to pass!

Look for todo items in the codebase to get started.

Quick start

The steps below walk through the environment setup necessary to run the application in both test and development environments.

Install dependencies

  1. Install PostgreSQL.

    brew install postgresql
    brew services run postgresql
    
  2. Install Flyway.

    brew install flyway
    
  3. Create a default PostgreSQL database. With no arguments, createdb names the database after the current user.

    createdb
    

Set up the test environment

  1. Create the milk_test database and the milk user.

    psql -c "create database milk_test;"
    psql -c "create user milk with password 'milk';"
    
  2. Migrate the database with Flyway.

    FLYWAY_CLEAN_DISABLED=false flyway -user=milk -password=milk -url="jdbc:postgresql://localhost:5432/milk_test" -locations=filesystem:databases/milk clean migrate
    

Run tests

Use Gradle to run tests. You'll see a few failures at first.

    ./gradlew build

Set up the development environment

  1. Create the milk_development database.

    psql -c "create database milk_development;"
    
  2. Migrate the database with Flyway.

    FLYWAY_CLEAN_DISABLED=false flyway -user=milk -password=milk -url="jdbc:postgresql://localhost:5432/milk_development" -locations=filesystem:databases/milk clean migrate
    
  3. Source the .env file for local development.

    source .env
    
  4. Populate development data with a product scenario.

    psql -f applications/products-server/src/test/resources/scenarios/products.sql milk_development
    

Run apps

  1. Use Gradle to run the products server.

    ./gradlew applications:products-server:run
    
  2. Use Gradle to run the simple client.

    ./gradlew applications:simple-client:run
    

Hope you enjoy the exercise!

Thanks,

The IC Team

© 2023 by Initial Capacity, Inc. All rights reserved.

A workshop by

Initial Capacity