
Fundamentals of Software Architecture for Big Data

This course is intended for individuals looking to understand the basics of software engineering as they relate to building large software systems that leverage big data. You will be introduced to the software engineering concepts necessary to build and scale large, data-intensive, distributed systems. Starting with software engineering best practices and loosely coupled, highly cohesive data microservices, the course takes you through the evolution of a distributed system over time.


A common architecture used for data collection.

Provenance collects and stores articles and follows a Netflix Conductor-esque architecture.


In 2016 we began a project to address some of the concerns around fake news. While most efforts were analyzing articles, we decided to take another approach: showing consumers where their content comes from. Our goal was to bring source [journalist] context to the foreground.

The exercise

Get the tests to pass!

Look for todo items in the codebase for where to get started.

Quick start

Download the codebase.

Create a jar file without running tests.

./gradlew assemble


Run the articles component tests to see what's failing.

./gradlew :components:articles:test

Review the todo comments in the ArticlesController class and get the tests to pass. Along the way it will be helpful to use the writeJsonBody method to convert articles to JSON.

writeJsonBody(servletResponse, articles);


Run the endpoints component tests to see what's failing.

./gradlew :components:endpoints:test  

Review the todo comments in the EndpointWorker class and get the tests to pass. Along the way it will be helpful to use XmlMapper to convert RSS feeds to Java objects.

RSS rss = new XmlMapper().readValue(response, RSS.class);
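
To make the feed-to-object step concrete, here is a minimal sketch of turning an RSS document into Java objects. It is not the workshop's implementation: it swaps XmlMapper for the JDK's built-in DOM parser so that it runs without any dependencies, and the names RssSketch, FeedItem, and parseItems are hypothetical. The RSS class in the codebase may be shaped differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class RssSketch {
    // Hypothetical value type standing in for the workshop's RSS classes.
    record FeedItem(String title, String link) {}

    // Parses an RSS 2.0 document and returns one FeedItem per <item> element.
    static List<FeedItem> parseItems(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since feeds come from untrusted endpoints.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document document = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList nodes = document.getElementsByTagName("item");
            List<FeedItem> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element item = (Element) nodes.item(i);
                items.add(new FeedItem(text(item, "title"), text(item, "link")));
            }
            return items;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unable to parse feed", e);
        }
    }

    // Returns the trimmed text of the first matching descendant, or "" when absent.
    private static String text(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\"><channel><title>Example</title>"
                + "<item><title>First</title><link>http://example.com/1</link></item>"
                + "<item><title>Second</title><link>http://example.com/2</link></item>"
                + "</channel></rss>";
        for (FeedItem item : parseItems(xml)) {
            System.out.println(item.title() + " -> " + item.link());
        }
    }
}
```

XmlMapper does the same job declaratively by binding elements to fields, which is why the exercise recommends it over hand-written traversal like this.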

Test suite

Ensure all the tests pass.

./gradlew build

Schedule work

Review todo comments in the App class within the provenance-server component. Create and start a WorkScheduler.

WorkScheduler<EndpointTask> scheduler = new WorkScheduler<>(finder, workers, 300);

Pro tip: review the testScheduler test in the WorkSchedulerTest class.
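
To illustrate the roles of the finder, the workers, and the numeric argument, here is a minimal sketch of a scheduler in this style built on the JDK's ScheduledExecutorService. Everything in it is an assumption made for illustration: the SketchScheduler class, the WorkFinder and Worker interfaces and their method names, and the reading of the third argument as a delay in milliseconds. The workshop's WorkScheduler may differ on all of these, so treat the testScheduler test as the authority.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SketchScheduler<T> {
    // Hypothetical roles: a finder discovers pending tasks, a worker executes one.
    interface WorkFinder<T> { List<T> findRequested(String workerName); }
    interface Worker<T> { String name(); void execute(T task) throws Exception; }

    private final WorkFinder<T> finder;
    private final List<Worker<T>> workers;
    private final long delayMillis; // assumed unit; the real scheduler may use another
    private final ScheduledExecutorService pool;

    SketchScheduler(WorkFinder<T> finder, List<Worker<T>> workers, long delayMillis) {
        this.finder = finder;
        this.workers = workers;
        this.delayMillis = delayMillis;
        this.pool = Executors.newScheduledThreadPool(Math.max(1, workers.size()));
    }

    // Gives each worker its own polling loop: find tasks, run them, wait, repeat.
    void start() {
        for (Worker<T> worker : workers) {
            pool.scheduleWithFixedDelay(() -> poll(worker), 0, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }

    private void poll(Worker<T> worker) {
        for (T task : finder.findRequested(worker.name())) {
            try {
                worker.execute(task);
            } catch (Exception e) {
                // An exception escaping this method would cancel the schedule,
                // so a failed task is reported and polling continues.
                System.err.println(worker.name() + " failed: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> collected = new CopyOnWriteArrayList<>();
        Worker<String> worker = new Worker<>() {
            public String name() { return "endpoint-worker"; }
            public void execute(String url) { collected.add(url); }
        };
        WorkFinder<String> finder = name -> List.of("http://example.com/feed");

        SketchScheduler<String> scheduler = new SketchScheduler<>(finder, List.of(worker), 50);
        scheduler.start();
        Thread.sleep(200);
        scheduler.shutdown();
        System.out.println("collected " + collected.size() + " task(s)");
    }
}
```

The sketch uses a fixed delay rather than a fixed rate, so a slow collection run pushes the next poll back instead of letting runs pile up.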

Run locally

Build the application again, then run it locally to ensure that the endpoint worker is collecting articles.

./gradlew build
java -jar applications/provenance-server/build/libs/provenance-server-1.0-SNAPSHOT.jar 

Make a request for all articles in another terminal window.

curl -H "Accept: application/json" http://localhost:8881/articles
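
If you would rather check the endpoint from code, the same request as the curl command above can be sketched with the JDK's java.net.http client (Java 11 or later). The class and method names here are hypothetical, and running main assumes the server is already listening on port 8881.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ArticlesRequestSketch {
    // Builds a GET request for /articles that asks for a JSON response.
    static HttpRequest articlesRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/articles"))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                articlesRequest("http://localhost:8881"),
                HttpResponse.BodyHandlers.ofString());
        // Print the status code and the raw body returned by the server.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```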

Run with Docker

  1. Build with Docker.

     docker build -t provenance-server . --platform linux/amd64

  2. Run with Docker.

     docker run -p 8881:8881 provenance-server

Hope you enjoy the exercise!


The IC Team

© 2022 by Initial Capacity, Inc. All rights reserved.

A workshop by

Initial Capacity