
Intro to the "Cloud data pipeline" concept

Tango Brett - February 9, 2022
Oil pipeline to Data pipeline analogy
Let's get the ball rolling right off the bat with an example!

The Azure cloud can be used by companies that need to process data to help run their business.

Example: “Jenny’s Used Books” buys and sells used books in Chicago, IL, USA.

  1. Business is going well, and Jenny now has 3 total business locations.
  2. The general business model is to buy books at 5-15% of the cover price, depending on condition and assumed resale demand. It takes a store employee with a few years' experience to assess the books and offer a price to the customer selling them.
  3. Then the books are marked with a sticker for 50% of their original cover price.
Now we get to the data problem:

Based on past sales figures, how many books can Jenny’s stores buy for resale in a month?

For now, let us simplify a couple of things:

  1. All books will be bought at 10% of the cover price
  2. We will not break the books into categories (Westerns, Science Fiction, Romance, Science and Nature, etc.) for now, just to make the exercise less challenging.
  3. Let us start with January 2020 for our first month
  4. We will convert any store credits to dollars and “pretend” we did not give store credits. It’s cash only for this exercise.
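To make the simplified pricing concrete, here is a minimal Python sketch. The 10% buy rate and 50% sticker rate come from the example above; the function names, the $20 cover price, and the $1000 buying budget are made-up values used only for illustration.

```python
from decimal import Decimal

BUY_RATE = Decimal("0.10")   # simplification 1: every book is bought at 10% of cover price
SELL_RATE = Decimal("0.50")  # books are stickered at 50% of cover price


def buy_price(cover_price: Decimal) -> Decimal:
    """What the store pays a customer for one book."""
    return cover_price * BUY_RATE


def sell_price(cover_price: Decimal) -> Decimal:
    """What the store charges for the same book."""
    return cover_price * SELL_RATE


def books_affordable(budget: Decimal, average_cover_price: Decimal) -> int:
    """Whole number of books a buying budget covers at the flat buy rate."""
    return int(budget // buy_price(average_cover_price))


# Hypothetical inputs: a $20 cover price and a $1000 buying budget.
cover = Decimal("20")
print(buy_price(cover))    # 2.00
print(sell_price(cover))   # 10.00
print(books_affordable(Decimal("1000"), cover))  # 500
```

The real answer to Jenny's question depends on her actual sales figures, which is exactly what the pipeline is meant to collect.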

Check out the video to really get an understanding of how the Cloud data pipeline works!

Used bookstore transactions - (Hey this isn't an art class)
Now let us do a verbal flowchart.

Gather the same data at all 3 stores:

  1. Number of used books purchased per store.
  2. Total amount paid for the books in #1 above.
  3. Total number of books sold per store.
  4. Sum of the dollar amounts received for sales.
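One way to keep the data identical across stores is to agree on a single record shape. The sketch below is only an assumption about how that could look in Python; the class name, field names, and the sample numbers are hypothetical and not Jenny's real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreMonthlyRecord:
    """One store's numbers for one month (field names are illustrative)."""
    store_code: str        # e.g. a short code per location
    month: str             # "YYYY-MM"
    books_purchased: int   # item 1: number of used books bought
    amount_paid: float     # item 2: total dollars paid for those books
    books_sold: int        # item 3: number of books sold
    sales_revenue: float   # item 4: total dollars received for sales


# Placeholder values, made up purely to show the shape of a record.
example = StoreMonthlyRecord(
    store_code="DEMO",
    month="2020-01",
    books_purchased=100,
    amount_paid=200.0,
    books_sold=80,
    sales_revenue=800.0,
)
print(example)
```

Because every store fills in the same four fields, the later pipeline steps can treat all locations the same way.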

With the data gathered at each of the 3 individual stores, let us now send the data from the 3 stores to a temporary storage location “in the cloud”.

So, this new cloud account will now “Ingest” the data:

"Ingesting" data from the 3 bookstores. (Email spreadsheets to Jenny, or upload to the cloud)
Let us clarify that “Ingest” is the first step of a “Data Pipeline.”

Here is the data for January for the 3 Chicago locations:

  1. “JNC” Jenny’s North Chicago location: $2200 bought, $8000 sold
  2. “JCC” Jenny’s Central Chicago location: $2500 bought, $7500 sold
  3. “JSC” Jenny’s South Chicago location: $1500 bought, $6000 sold

Let's send the data from the 3 locations to “the cloud” and call it Data Ingestion.
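Here is a rough sketch of what the ingestion step could look like in plain Python, with an in-memory list standing in for the cloud storage location. The dollar figures are the January numbers above; the CSV layout, column names, and the `ingest` function are assumptions made for this example, not a real Azure API.

```python
import csv
import io

# Each store's upload as CSV text (layout and column names are assumed).
UPLOADS = {
    "JNC": "store,month,bought_usd,sold_usd\nJNC,2020-01,2200,8000\n",
    "JCC": "store,month,bought_usd,sold_usd\nJCC,2020-01,2500,7500\n",
    "JSC": "store,month,bought_usd,sold_usd\nJSC,2020-01,1500,6000\n",
}


def ingest(uploads):
    """Parse every store's CSV and collect the rows in one staging list."""
    staging = []
    for text in uploads.values():
        for row in csv.DictReader(io.StringIO(text)):
            staging.append({
                "store": row["store"],
                "month": row["month"],
                "bought_usd": int(row["bought_usd"]),
                "sold_usd": int(row["sold_usd"]),
            })
    return staging


staged = ingest(UPLOADS)
for record in staged:
    print(record)
```

In a real setup the staging list would be replaced by a cloud storage service, but the idea is the same: three separate sources land in one place.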

Now that we have some data coming in, it's time to expand our pipeline.

So, what else needs to be done?

Well, we need a place to store the data. So we need to:

  1. “Process” the data, for instance sum up the total cost of books purchased by Jenny’s North Chicago, Jenny’s Central Chicago, and Jenny’s South Chicago (as well as the total dollar amount of books sold).
  2. Print out a report for Jenny or her management.
    1. (If we are asked to, we can give an opinion on what she should do.)
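The process and report steps can be sketched in a few lines of Python using the January figures above. The function names and the report layout are hypothetical, and the "difference" line (sold minus bought) is an extra figure added here for illustration.

```python
# January figures from the three locations: (store, dollars bought, dollars sold)
JANUARY = [
    ("JNC", 2200, 8000),
    ("JCC", 2500, 7500),
    ("JSC", 1500, 6000),
]


def process(records):
    """Sum purchases and sales across all stores."""
    total_bought = sum(bought for _, bought, _ in records)
    total_sold = sum(sold for _, _, sold in records)
    return {
        "total_bought": total_bought,
        "total_sold": total_sold,
        "difference": total_sold - total_bought,  # illustrative extra figure
    }


def format_report(summary):
    """Turn the summary into a plain-text report for Jenny."""
    return "\n".join([
        "Jenny's Used Books, January",
        f"Total paid for books bought: ${summary['total_bought']}",
        f"Total received for books sold: ${summary['total_sold']}",
        f"Sold minus bought: ${summary['difference']}",
    ])


summary = process(JANUARY)
print(format_report(summary))
# Totals: $6200 bought, $21500 sold, $15300 difference
```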

Let's use a flowchart:

Very simple used bookstore data pipeline

Well, this doesn't get any simpler. (Or less useful to Jenny)

Obviously, this process needs a lot of improvement to be of value to the management decision process.

So, what have we learned?

Actually, we did hit our goal of visualizing an extremely simple (if not useful) cloud data pipeline! (Yay!)

Questions? Comments? Contact me (or your instructor, or your fellow students).

Email: Brett.Long@withyouwithme.com

In the meantime, join your peers and instructors on the Cyber Discord community!

