Let's get the ball rolling right off the bat with an example!
The Azure cloud can be used by companies that need to process data to help run their business.
Example: “Jenny’s Used Books” buys and sells used books in Chicago, IL, USA.
Business is going well, and Jenny now has 3 total business locations.
The general business model is to buy books at 5-15% of the cover price, depending on condition and expected resale demand. It takes a store employee with a few years' experience to assess each book and make an offer to the customer selling it.
Then the books are marked with a sticker for 50% of their original cover price.
Now we get to the data problem:
Based on past sales figures, how many books can Jenny’s stores buy for resale in a month?
For now, let us simplify a few things:
All books will be bought at 10% of the cover price.
We will not break the books into categories (Westerns, Science Fiction, Romance, Science and Nature, etc.) for now, just to keep the exercise simple.
We will start with January 2020 as our first month.
We will convert any store credits to dollars and “pretend” we did not give store credits. It’s cash only for this exercise.
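To make the pricing rules concrete, here is a minimal sketch of the arithmetic. The 10% and 50% rates come from the text above; the function names and the $20.00 cover price are hypothetical, chosen only for illustration.

```python
# Rates from the text: books are bought at 10% of the cover price
# (our simplification) and stickered at 50% of the cover price.
BUY_RATE = 0.10
STICKER_RATE = 0.50


def buy_price(cover_price: float) -> float:
    """Amount paid to the customer selling a used book."""
    return round(cover_price * BUY_RATE, 2)


def sticker_price(cover_price: float) -> float:
    """Resale price marked on the book's sticker."""
    return round(cover_price * STICKER_RATE, 2)


# Hypothetical $20.00 cover price, used only as an example.
cover = 20.00
print(f"Buy for ${buy_price(cover):.2f}, sticker at ${sticker_price(cover):.2f}")
# prints: Buy for $2.00, sticker at $10.00
```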
Check out the video to get a better understanding of how the cloud data pipeline works!
Now let us do a verbal flowchart:
Gather the same data at all 3 stores:
Number of used books purchased per store.
Total amount paid for those purchased books.
Total number of books sold per store.
Sum of the dollar amounts received for sales.
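Since every store has to report the same data, it can help to pin down the shape of one store's monthly record. The sketch below is one possible layout, not a required one: the class name and field names are hypothetical, and only the four measurements themselves come from the list above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StoreMonthlyRecord:
    """One store's figures for one month (hypothetical layout)."""
    store_code: str        # short code identifying the store location
    month: str             # reporting month, for example "2020-01"
    books_purchased: int   # number of used books purchased
    amount_paid: float     # total dollars paid for those books
    books_sold: int        # number of books sold
    sales_revenue: float   # total dollars received for sales


# Every store sends exactly these fields, so the records line up later.
print([f.name for f in fields(StoreMonthlyRecord)])
```

Using one fixed record shape for all 3 stores means the later steps never have to guess which numbers a store did or did not send.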
With the data gathered at each of the 3 individual stores, let us now send the data from the 3 stores to a temporary storage location “in the cloud”.
So, this new cloud account will now “Ingest” the data:
Let us clarify that “Ingest” is the first step of a “Data Pipeline.”
Here is the data for January for the 3 Chicago locations:
“JNC” Jenny’s North Chicago location: $2200 bought, $8000 sold
“JCC” Jenny’s Central Chicago location: $2500 bought, $7500 sold
“JSC” Jenny’s South Chicago location: $1500 bought, $6000 sold
Let's send the data from the 3 locations to “the cloud” and call it Data Ingestion.
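The ingestion step can be sketched in a few lines. This is only an illustration: a local temporary folder stands in for the cloud storage location, so no Azure account or Azure library is involved, and the `ingest` function, the file naming, and the dictionary keys are all assumptions. The dollar figures are the January numbers given above.

```python
import json
import tempfile
from pathlib import Path

# January 2020 figures from the text, in dollars.
JANUARY_DATA = [
    {"store": "JNC", "month": "2020-01", "bought": 2200, "sold": 8000},
    {"store": "JCC", "month": "2020-01", "bought": 2500, "sold": 7500},
    {"store": "JSC", "month": "2020-01", "bought": 1500, "sold": 6000},
]


def ingest(records, landing_dir: Path) -> list[Path]:
    """Write one JSON file per store into the landing area; return the paths."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        target = landing_dir / f"{record['store']}_{record['month']}.json"
        target.write_text(json.dumps(record))
        written.append(target)
    return written


# A local temporary folder stands in for the temporary storage "in the cloud".
landing = Path(tempfile.mkdtemp()) / "landing"
files = ingest(JANUARY_DATA, landing)
print(sorted(p.name for p in files))
# prints: ['JCC_2020-01.json', 'JNC_2020-01.json', 'JSC_2020-01.json']
```

In a real pipeline the write would go to a cloud storage service instead of a local folder, but the idea is the same: each store's data lands, unchanged, in one shared place.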
Now that we have some data coming in, it's time to expand our pipeline.
So, what else needs to be done?
Well, we need a place to store the data. So we need to:
“Process” the data: for instance, sum up the total cost of books purchased at Jenny’s North Chicago, Jenny’s Central Chicago, and Jenny’s South Chicago (as well as the total dollar amount of books sold).
Print out a report for Jenny or her management.
(If we are asked to, we can give an opinion on what she should do.)
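The process and report steps can be sketched as follows, using the January figures for the 3 stores. The function names and the report layout are hypothetical; a real pipeline would read the ingested data from cloud storage rather than from a dictionary in the script.

```python
# January 2020 figures from the text, in dollars.
JANUARY_DATA = {
    "JNC": {"bought": 2200, "sold": 8000},
    "JCC": {"bought": 2500, "sold": 7500},
    "JSC": {"bought": 1500, "sold": 6000},
}


def summarize(data):
    """Process step: add up purchases and sales across all stores."""
    total_bought = sum(store["bought"] for store in data.values())
    total_sold = sum(store["sold"] for store in data.values())
    return {"total_bought": total_bought, "total_sold": total_sold}


def format_report(data):
    """Report step: one line per store, then the combined totals."""
    lines = ["Jenny's Used Books: January 2020"]
    for code, figures in data.items():
        lines.append(f"{code}: ${figures['bought']:,} bought, ${figures['sold']:,} sold")
    totals = summarize(data)
    lines.append(f"Total: ${totals['total_bought']:,} bought, ${totals['total_sold']:,} sold")
    return "\n".join(lines)


print(format_report(JANUARY_DATA))
# The last line printed is: Total: $6,200 bought, $21,500 sold
```

Note that these totals only describe what happened in January. Turning them into a buying recommendation for Jenny would need more data and more processing than this simple exercise covers.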
Let's use a flowchart:
Well, it doesn't get any simpler than this (or any less useful to Jenny).
Obviously, this process needs a lot of improvement to be of value to the management decision process.
So, what have we learned?
Actually, we did hit our goal of visualizing an extremely simple (if not useful) cloud data pipeline! (Yay!)
Questions? Comments? Contact me (or your instructor, or your fellow students).