Veteran's groups that can analyse the facts - Can help other veterans!

As Guy Parent, the Canadian Veterans Ombudsman in 2015 would say,

"There is nothing more powerful than freeing the facts. Facts arrived at by rigorous research and evidence-based analysis generate and focus debate. They empower citizens and enhance citizen engagement with government. This combined effect cannot easily be ignored, and it creates the conditions needed to shape public policy."

We need to think about the data we need to know, prioritize the questions, collect, clean, analyse and present so that leaders, public servants and public. To help the veteran and citizen

Data Challenge #2 - Maccas

Challenge your Data Analytics skills!

How it works:

Each fortnight we will present a data challenge to be solved. These challenges will be a mix of Excel, Power BI, Tableau and SQL and are designed to increase your proficiency in skills across the data pathway. We encourage everyone on the Data Pathway to take a crack and challenge yourself!

The winners will be announced on the SitRep, fortnightly on Fridays at 4 pm AEST.
The stream can be found here: https://www.twitch.tv/withyouwithstream
and winners will receive a badge to proudly display on the WYMW Data Discord!

Challenge #2 - Maccas

Download the Dataset here

Scenario

You are working with a team of Data Analysts currently doing a health survey of fast-food chains in the USA. You have been given Maccas (McDonalds) to analyse.
Your task is to determine the correlation between various metrics such as protein, sugar, fat etc - and effectively plot the relationship.

Challenge

Using Excel, Power BI or Tableau, complete the following:

1. Create a Correlation Plot to answer the following questions:

Q1 - Correlation visualisation of Calories, Calories from fat, Carbs, Cholesterol, Dietary fibre, Protein, Saturated fat, Sodium, Sugars, Total fat and trans fat
Q2 - Identify the correlation between sugars and carbs - Is it correlated? If so, is it positively or negatively, strong or weak correlation? 
Q3 - Create a slicer showing the Correl between Protein and total fat within the Beef and Pork Category
Q4 - Using the slicer show the Correl between calories and carbs within the breakfast menu category
Q5 - Using the slicer find the ratio of protein to sugars between within the Iced mocha menu item 
Q6 - Create a dashboard with a Card, slicer and visualization demonstrating the correlation between 2 different measurements within 2 different menu categories 

2. Create a Scatterplot to answer the following questions:

Q1 - Create a scatter plot identifying protein and total fat adding a trend line, is there a correlation?
Q2 - What is the median of the data presented and what is the average of the data presented?
Q3- Add 95 percentile lines on the scatter plot, which items are within this range?
Q4 - Which item is within the 99th percentile range?
Q5 - Create a dashboard with visualisation of scatter plot, correlation card and slicer (for menu category and item category)

Lastly, to practice your skills, you can complete the following questions:
(note these last questions will not be assessed for the challenge, but are good practice!)

Q1- AVG total calories of each ‘category’ menu item -- (which has the highest avg)
Q2 - AVG total calories of Fat in each ‘Category’ menu item --(which has the highest avg)
Q3 - Total fat in one Specific Menu category -- (specific which menu item this is)
Q4 - Percentage difference between highest total fat in breakfast item compared to lowest total fat in breakfast item)
Q5 - Count of each ‘category’ of breakfast item -- ( which item has the highest count of menu items/ which has the lowest)

Submit your answers

Submissions close 04/12/2020 at 12:00pm AEST.
Submit your answers directly to Jimmy via Discord.
Good luck!

If you’re having any difficulty, jump into our Data Analyst Discord Server and join the discussion!

Not yet on the Data Analytics pathway? Click the button below to explore our courses:

Tip for Tracking Cell Refences in Excel

If cell referencing in Excel sends you cross eyed then this little tip will get you out of trouble. So good even the little birdie wouldn't stop singing about it!

Data Noob - Data Warehouse Vs Data Lake

When I  stepped in to the Data Analytics world I encountered a mind boggling array of confusing, apparently ambiguous and difficult to decipher terminology. All sorts of cute acronyms, dazzling visualisations, pretty dashboards and "snake oil" solutions to business pain points.

Be gentle with me - I'm a Noob in this space but I'll welcome conversations in the interest of learning more.

What on earth are Data Lakes, Data Warehouses, Data Clouds, Data Deserts, Data Silos, Data Swamps? Data is the new oil? Yes, Data Ecosystems are a thing. I haven't found evidence of a Data Porcupine yet - but if one exists it will probably be in the Cyber Security world.

Let's start with a Data Warehouse. A country woman, I've found a surprising real world equivalent to a Data Warehouse. I've included in this article a picture of a fairly complex Bulk Grain Silo Installation which works fairly well as a rough visualisation of how a Data Warehouse operates.

Grain handled in bulk is stored in silos which generally only contain one type of grain - for example, wheat, corn, or barley. Types of grain for storage are designated to silos before loading. Grain is loaded and unloaded in a highly organised, linear process. One truck lines up behind another to load to, or out of the designated silo. Grain is moved around the installation using either augurs or conveyor belts. Grains can only move in pre-determined directions - up, down, in, or out. Grain has to move in pre determined paths before it can be mixed or processed. Grains may be mixed or processed in some way to produce a designated value added product. At some point a human must be involved to ensure that the correct raw materials are mixed together to produce a saleable, quality assured, digestible product. When bulk grain moves it has viscosity - it flows and it gets blocked. Movement, age and use degrade quality. Grain has to be cleaned and preserved against invaders.

If you call a grain a piece of data you can get an idea of how a data warehouse stores and processes data. Designate the wheat silo "Sales Data". Designate the corn silo as "Costs". Call a conveyor a "Search" and call a feed mixer a "Report, or an Insight". Call the retail packaging a "Visualisation".

Can you see the similarity?

By contrast a Data Lake is a much more fluid thing. First of all you can store a lot more in it and the potential for scaleability is a lot higher. They can exist anywhere the "landscape" allows for them. You can draw a cup of water, or if you have the right tool you can empty the whole lake and wait for it to refill. You can draw water from any location in the lake. The water will have trickled or flowed into the lake from any number of sources - eg clouds, rain, runoff, soakage from underground, illegal industrial dumping, rivers, or some kid emptied their lunch water bottle in to it. The water may have been in the lake for a hundred years, or it may have filled yesterday from flooding. Yes - people do talk about data swamps - where the water is muddy, sludgy and slow moving.

Call a water molecule a piece of data. Clearly in a Data Lake you need special tools to refine, package and move the water. The data does not become actionable information/insight until it is packaged in a digestible fashion. On the upside you get access to datasets only limited in size by the amount of storage you have available.

Do you see the connection?

Data warehouses are structured and organized but have limitations in terms of size and scaleability. In a data warehouse the interrelationships between data types must be pre determined - not so in a Data Lake. Data lakes are huge, flexible, and scaleable. You get access to a huge dataset but it takes more effort to structure the data in order to produce insight and understanding. In some ways a Data Warehouse is often needed within a Data Lake to structure information so it can be interpreted by a human being. Humans beings generally struggle to reduce unstructured data to a digestible form by ourselves - there are simply too many things to look at. Data Lake analytic tools basically take unstructured data and turn it into structured data so human beings can interpret it.

Of course this is a very over simplified version of the technicalities of how Data Warehouses and Data Lakes operate. We all need start somewhere right?

At least that's how this Data Noob sees things.

Fortnightly Data Challenge #1 - Museums

Challenge your Data Analytics skills!

How it works:

Each fortnight we will present a data challenge to be solved. These challenges will be a mix of Excel, Power BI, Tableau and SQL and are designed to increase your proficiency in skills across the data pathway. We encourage everyone on the Data Pathway to take a crack and challenge yourself!

While you will be able to compete in the challenges with just the skills learnt in the course, to really excel and come out on top we encourage you to do some research of your own! After the challenge is complete, I (Jimmy) will go step-by-step through the challenge with a video walkthrough. Submit your answers through Discord - and follow along for all updates and discussion!

The winners will be announced on the SitRep, fortnightly on Fridays at 4 pm AEST.
The stream can be found here: https://www.twitch.tv/withyouwithstream
and winners will receive a badge to proudly display on the WYMW Data Discord!

Challenge #1 - Museums

Download the Workbook here

The workbook contains the following sheets:
1. museums_USA
Your dataset for the challenge. Contains name, type, location, income and revenue for museums in the USA.

2. challenge questions
This sheet has 10 questions to be answered using COUNT, SUM and AVERAGE formulas, including IF and IFS.

3. Selector Challenge
This sheet contains the scenario, the challenge and points score.

Scoring

You will be scored on both the Challenge Questions (score out of 10) and the Selector Challenge (score out of 5) for a total of 15 points.

Selector Challenge

Scenario

You are a Data Analyst at Museum HQ™ in the USA.

You keep recieving requests from managers of states asking: 1. How many museums are in their state, 2. the sum and average revenue per museum, and how does this differ by museum type.

Rather than querying the data each time you receive a request, you have decided to build an interactive worksheet for users to make a selection and receive the metrics they need. 

Challenge

Build an interactive worksheet that shows the Count of Museums, Sum of Revenue and Average of revenue for the selected Museum Type and State. You can use the cells to the left or create a new worksheet. Points are awarded based on the levels of interactivity for users as per the below table. 

Points Score: Levels of Interactivity

Users can type or copy/paste directly into cells1 point
Users can select from a dropdown2 points
Users can type & the dropdown list filters3 points
Allow the use of wildcards+1 point
Users can select any number of selectors+1 point

Submit your answers

Submissions close 30/10/2020 at 12:00pm AEST.
Submit your answers directly to Jimmy via the Data Discord.
Good luck!

Not yet on the Data Analytics pathway? Click the button below to explore our courses:

What would an ex military cowgirl do in a Hackathon team?

I first saw the word "Hackathon" and thought it might have something to do with clearing scrub. You know - a bunch of people get together to hack away at weed eradication. I saw the term "Pen Testing" and thought "don't they test the Pens before they leave the factory?"

When one of the jet techie types I work with suggested I join a Hackathon team (after gently explaining what it all meant with a straight face) my first thought was:

"What could a middle aged, ex Army, office lady/cowgirl with almost no understanding of cyber security possibly contribute to a gun Hackathon Team of Cyber Security Jets?" So I rounded up my friendly neighbourhood Cyber Security Jet and asked the question.

Turns out the answer is: "quite a bit".

So what is a Hackathon?

Basically, a mission/objective is set and teams are put together with the desired skill sets to achieve it. Teams may require a  whole range of skill sets including Data Analysts, Penetration Testers, Cyber Security Specialists, Database Operators, Baristas (coffee is a must), IT Technicians, Project Managers, Team Captains, Subject Matter Experts - any skill set relevant to the objective.  You may be unable to operate a computer and still be a key member of a Hackathon Team. The sneaky mindset of the successful wargamer or tactician is highly valued. If you're not so great on a computer or new to the scene you still bring a very valuable objective perspective to the table and add to the diversity of the team.

No different really to the way military teams are organised and configured really. If you need a driver, a demolitions expert and an interpreter then that is how you configure your team.

Mission types vary. Cyber Security and Penetration Testing are perhaps the best known. Hackathons are also used for Intelligence Gathering. Sometimes they have Programming/Coding Competitions - sometimes it's about developing new applications or showcasing new ideas. Hackathons are fantastic networking opportunities for industry specialists and a great way to practice/display your skills - especially if you win! You can use them to gain experience for your Resume and make business contacts.

Teams are usually put together well before the event to allow for troops to task allocation, setup and rehearsal. Most of the time a reasonably decent laptop with webcam/mike and standard internet connection are all you need. Most of the popular operating systems are more than up to the task. My Cyber Security Jet says that YouTube would be your best friend when it comes to learning the software. You should expect to have to download software.

So what can someone like me offer the team? Analytical mindset, curiosity, communication skills, organisational skills, leadership skills, life experience, strategic and tactical problem solving, "think outside the box" creativity, agile mindset, diversity, range, motivation, drive and tenacity. As a recent major Hackathon Winner said to me - one of the advantages Veterans have is that we just keep trying  until we find a solution.

You better believe that a veteran can contribute to a Hackathon - regardless of the level or nature of your computer skills. I'm just about to enter my first Hackathon - I'll let you know how it goes.

'Bad data in - Bad data out'

For those veterans that are still wondering what data analysis is and what role they can play! Here is a quick story to get you read in.

Veterans out there probably look at the chart below and wonder...what are these people talking about?! Every time I talk to someone new to the space they say - "Data Analyst - that is like Artificial Intelligence right." The answer is - 'not really - the machine learning and AI algorithms are only a small percentage of data analysis.' You can't enable the top of the pyramid if the foundation and the lower rungs are not constructed properly. I will explain....

A very complicated picture.

In the Army, you need to organize how you bring in data to the decision making process. If the section commander needs to make a decision, he needs to collect it - shots fired, 'what was that', ask questions and demands 'send situation report' - move/store it - write it on their map, send by radio - transform it - prepare a contact report and/or medical evacuation request - aggregate it - make a decision, call for help - learn from it - After Action Review - optimize it - run the scenario in a simulation 1,000 times and see if there were better options - this is also done through repetitive, challenging and interesting training.

So what? If you are collecting the wrong/inaccurate information, not storing it properly, preparing incomplete or faulty reports, that lead to bad decisions that support zero learning - good luck optimizing. 'Bad data in - Bad data out.' You need to properly collect, store, transform and aggregate to make good decisions, learn and optimize.

Computer Errors: What can you do?
After pushing the optimize button on Artificial Intelligence using 'bad data in - bad data out'.

There is a risk when we all move toward the shiny AI and machine learning solutions and think that it will solve our problems. Small Wars Journal points out that, "The real concern, though, is that military leaders may not comprehend significant risks associated with blindly using such tools." Getting a computer programmer to feed a bunch of data into a machine learning tool - while asking 'what's a battle?' is not going to optimize the problem. That is because, they don't understand the problem. This is where you - the veteran - is critical in government, business or military to solve problems.

Further, as Deloitte points out, there is a concern of data overload. Sending the section commander 10-figure grids by voice, the color of the sky and everyone's favorite sports team during a fire fight is not going to help. As Deloitte points out, Commanders want to know “How does any of this information help me understand . . . what [operational] decisions are needed? Most of this is just information without analysis.” They will need your help. They need an analytics translator.

You are able to use the four step process below to understand the data. You define "I want to know where is the enemy hits the section?" - transforming the data by getting grids, contact reports, interviews with the section to understand their perspective - you analyse what you have and communicate your findings. As of today - a computer can't do this. They need real world problem solvers to investigate and ask the right questions. That is where you come in. As Deloitte points out in the section "Anticipating enemy fire: Mission analytics for threat assessment," if you do data analytics properly and define the right questions, you can start to know where the hostile actions will occur before they happen. That is what you would provide the company/government/military when trained. You are working out what information is important to the organization to make the right decisions - preferably before the firefight.

WYWM Basic Excel and Analysis Skills - Common Errors - Rallypoint

What is your role in Data Analytics? It is asking the right questions, seeing what data is out there and then asking more questions....WYWM will give you the tools to help transform, analyse and communicate - but the tools are useless if you don't define the problem. If the section members are feeding the Section Commander 'bad data in,' then he will be forced to give 'bad data out.'

Working in Kandahar Afghanistan - helping everyone do data analysis...

Caleb

Day in the life of Data

What is a day in the life of Data Analytics like?

Glad you asked! Our very own David Lotzof is here to pull back the curtain and give you insight into the career field.

(Apologies in advance for my lack of video/audio editing skills)

Be sure to comment with what you think!

Data and Ethics

Week 6 Basic Excel and Analysis - How Ethics and Data go together

I realise this isn’t really excel or data analysis. I do think it is particularly important with in the data industry to know why and how Data and Ethics should come together. While at the core of our company is to be transparent – it would seem, this is not always the case.

What is Ethics?

Ethics - moral principles that govern a person's behaviour or the conducting of an activity.

Here’s why I think Ethics should be talked about a whole lot more.

A recent survey has suggested that on average people delete or ‘unsubscribe’ from up to 39 emails per month they are sure they have not signed up to. I know for sure I would have a similar number… there are hundreds of unread emails sitting in my junk mailbox. This suggests to me that many companies we interact with in our daily lives are quite happy selling our personal data to anyone who will pay for it.

What does this do to your company’s reputation and the loyalty of your customers? Right at this moment - not a lot. Most companies will not openly admit or state their intentions on the ‘subscribe’ page before you click the button. It is written in the fine print – but who has time to read all of that? They bank on this. There still seems to be a lot of ignorance within the general population that suggests they have never even given the subject much thought.

I had a recent conversation with a cyber security specialist, he suggests we need to protect ourselves – essentially because no one else will. His biggest concern is how our data is being stored and what has been implemented to keep it safe. I totally agree with everything he was saying, but I am also a bit of an optimist. Surely laws to protect everyone would be just another level of protection to prevent the misuse of people’s personal data.

The law hasn’t quite caught up, and I guess will always be on the back foot when it comes to technology. The tech industry moves at such a rate it would be impossible to have laws already in place before for all eventualities. I feel as a company it is incredibly important to implement high level of ethical standards regarding data from the very beginning.

The whole point of a company is to make money providing a service or product. It’s that simple. Everyone wants to make money the lines get blurred easily if this is a company’s sole target.  

Here are a few stats that I found interesting.

Only 20% of people surveyed were slightly concerned about their data being collected by companies but a whopping 70% said they were concerned about the government collecting their data. However, when asked what type of organisation they trusted the most with their data Government Bodies and Non-Profits scored the best with 40% each.

What does this mean? Possibly that we may not want the government learning our little secrets but trust that once they have it, we trust them not to use it. Kind of like a child and parent - you don’t really want your folks finding out what you've been up too, but you know they will love you anyway.

If you would like to read more about data and ethics there are a lot of interesting reads across the internet. I have included some good references, pod casts and books to get you started.

Another one of our Associate Instructors introduced to me incredibly interesting podcast called Blind Insights by David Olney. I advise checking both Mel and David out if you are in any way interested in a career with Data.

Mel O’Sullivan Author Page – Find all of Mel’s articles HERE.

If you have a few minutes to spare, please complete the following survey.

Create your own user feedback survey

For more information about how Personal Data is being stored and how you can protect yourself go to the following sites.

Australian Privacy Foundation – Personal data https://privacy.org.au/resources/fundamentals/data-collection/#:~:text=Data%20Collection%20Privacy%20laws%20generally%20include%20a%20requirement,contain%20requests%20for%20personal%20data%20that%20is%20irrelevant.

Accenture
https://www.accenture.com/au-en/insight-data-ethics

Links to previous videos:

Week 1 – Cleaning Data

Week 2 – Merging Data and Pivot Tables

Week 3 – Averages and Percentages

Week 4 – Building a Report

Week 5 – Common Errors

Thanks for watching this 6-week series on Basic Excel and Analysis skills. If you have any questions, please get in touch. You can Book a Call with me or drop me an email.

Christine
Associate Instructor -Data
With You With Me

WYWM Basic Excel and Analysis Skills - Common Errors

Common Errors in Excel

This week we will be looking at a few of the common errors and problems that we may have encountered while merging data from different spreadsheets and performing some of the formulas. Excel is a fantastic but doesn’t always work the way we expect. The newest versions are fantastic and sometimes will not allow you to complete a formula if it can identify any errors.

When copying formulas from another spread sheet sometimes when you close the spreadsheet you copied from the values can appear as '#REF!'. To fix this you need to copy the formula as values. To do this, perform the look up or formula in you first column. Insert a second column, copy and paste in the new column as values.

Common Excel Errors

  1. ######## - Column isn’t wide enough
  2. #NAME? – There is text in a formula that excel cannot recognise.  This could be a spelling error
  3. #VALUE? – There is an error within the cells being used to calculate the formula
  4. #NULL? - There is a missing command within your formula
  5. #REF! – Missing or deleted parts of a formula or from another workbook that is no longer open.

Clearing the formatting in a column or cell is sometimes the easiest way to correct formatting problems. These errors mostly occur when multiple people are using the same spreadsheet. Or when Copying and pasting from several different sheets. To clear the formatting from a cell or group of cells follow these steps. Select the group of cells, go to the home tab, go to the editing group section, click clear, select formatting.

Excel is always improving its functionality and usability. Most errors are simple typos or errors that take not much time to fix.

I hope this video and article has helped you understand why a few of the most common errors occur and more importantly shown you where to look so you can easily correct them.

Week 1 – Basic Excel and Data Analysis – Cleaning Data

Week 2 – Basic Excel and Data Analysis – Merging Data and Pivot Tables

Week 3 – Basic Excel and Data Analysis – Averages and Percentages

Week 4 – Basic Excel and Data Analysis – Building a Report

Author Page

If you would like to learn more about excel or talk to me about all things Data or what WYWM courses could do for you – Book a call with me.

See you all next week,
Christine
Associate Instructor - Data