#______________________________Install Packages
install.packages("tidyverse")
install.packages("skimr")
install.packages("janitor")
install.packages("ggplot2")
install.packages("gt")
install.packages("webshot2")
#______________________________Load Packages
library(tidyverse)
library(skimr)
library(janitor)
library(ggplot2)
library(lubridate)
library(stringr)
library(gt)
library(webshot2)
#______________________IMPORT DATA FROM MONTHS 3 TO 4 OF 2016
getwd()
dailyactivity_df_3_4 <- read_csv("~file.csv")
glimpse(dailyactivity_df_3_4)
OUTPUT:
Rows: 457
Columns: 15
$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366
$ ActivityDate <chr> "3/25/2016", "3/26/2016", "3/27/2016", "3/28/2016"
$ TotalSteps <dbl> 11004, 17609, 12736, 13231, 12041, 10970, 12256,
$ TotalDistance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.87,
$ TrackerDistance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.87,
$ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
$ VeryActiveDistance <dbl> 2.57, 6.92, 4.66, 3.19, 2.16, 2.36, 2.29, 3.32,
$ ModeratelyActiveDistance <dbl> 0.46, 0.73, 0.16, 0.79, 1.09, 0.51, 0.49, 0.83,
$ LightActiveDistance <dbl> 4.07, 3.91, 3.71, 4.95, 4.61, 4.29, 5.04, 3.64,
$ SedentaryActiveDistance <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
$ VeryActiveMinutes <dbl> 33, 89, 56, 39, 28, 30, 33, 47, 40, 15, 43, 36,
$ FairlyActiveMinutes <dbl> 12, 17, 5, 20, 28, 13, 12, 21, 11, 30, 18, 18,
$ LightlyActiveMinutes <dbl> 205, 274, 268, 224, 243, 223, 239, 200, 244, 314,
$ SedentaryMinutes <dbl> 804, 588, 605, 1080, 763, 1174, 820, 866, 636,
$ Calories <dbl> 1819, 2154, 1944, 1932, 1886, 1820, 1889, 1868,
WellaBeat
This project was also done in SQL; that version can be found at WellaBeat - SQL.
Business Case
Founded in 2014, Wellabeat developed one of the first wearables designed specifically for women and has since built a portfolio of digital products for tracking and improving women’s health.
Wellabeat focuses on innovative health and wellness products for women. Its mission is to empower women to take control of their health through technology-driven solutions that blend design and function, giving them the tools to reach their fullest potential through personalized wellness solutions aligned with their cycles. Here are the products that WellaBeat offers:
Wellabeat app: The Wellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Wellabeat app connects to their line of smart wellness products.
Leaf: Wellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Wellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Wellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Wellabeat app to track your hydration levels.
Wellabeat membership: Wellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
Here are some of the metrics that the app tracks based on information found on their site:
- meditation rate
- starts tracking at “first step”
- starts tracking at “first shut eye”
- ovulation cycle, temperature, pregnancy term, ...
- cardiac coherence (?)
- stress tolerance, which once again appears to be based on heart rate and temperature levels
- activity level
Purpose
- The goal of this project is to analyze Wellabeat’s provided data to reveal more opportunities for growth.
- Focus on one Wellabeat product and analyze smart device usage data in order to gain insight into how people are already using their non-Wellabeat smart devices.
- Then, using this information, make high-level recommendations for how these trends can inform Wellabeat marketing strategy.
- My report will be directed to the executive team:
Urška Sršen: Wellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Wellabeat’s cofounder, key member of the Wellabeat executive team
Deliverables
I will produce a report with the following deliverables:
- A clear summary of the business task
- A description of all data sources used
- Documentation of any cleaning or manipulation of data
- A summary of my analysis
- Supporting visualizations and key findings
- Top high-level content recommendations based on the analysis
Business task
The executive team believes that analyzing smart device fitness data from other products on the market could help unlock new growth opportunities for the company. I have been asked to focus on one of Wellabeat’s products and analyze external smart device data to gain insight into how consumers are using their smart devices. The insights I discover will then help guide marketing strategy for the company.
Scope/major activity
Data Collection
FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
The provided dataset has limitations: it is small, dates from 2016, and includes no demographic information about the users.
Identify One Product
After reviewing WellaBeat’s site I realized that the product information provided by the company differs from what is actually displayed on the current site. I’ll use one of the following products to focus the analysis. All of the listed products provide the same data to the user; they differ only in the type of device:
Ivy+ – wrist wearable tracker
Leaf Urban – can be worn as a necklace or in a wristband
Leaf Chakra – can be pinned to a collar or blouse, or placed in a wristband
Create Recommendations
- What are some trends in smart device usage?
- How could these trends apply to Wellabeat customers?
- How could these trends help influence Wellabeat marketing strategy?
Ask
I have many questions that went unanswered:
- Why are we using data from 2016?
- The data provided appears to target the fitness/workout community (steps per day, workout type, workout duration, ...), while WellaBeat appears to focus more on mental wellness for pregnant women.
- Is the company thinking about branching into that highly competitive domain?
- Why doesn’t WellaBeat provide its own data for analysis?
- Does WellaBeat collect data?
- Does WellaBeat know how to track the metric most used or viewed by its users?
- Does WellaBeat know how to track the pages in the app that are most viewed by its users?
- Why isn’t the science behind the claims described on the site more transparent? Being open about the science behind the metrics might create a loyal customer base!
- WellaBeat claims it targets women; does WellaBeat know exactly how many of its users are women?
- Does WellaBeat know the age range, or the most common age, of its users?
- The site makes it obvious that the target market is pregnant women; does WellaBeat have any idea how many of its users are, were, or will be pregnant?
Prepare
Data Background
- Data is old: 3/2016 to 5/2016
- Covers 30 users of FitBit Fitness tracker
- Period of 62 days
- After reviewing Wellabeat’s site, it appears that the data provided is geared more towards fitness than towards what Wellabeat monitors
- The workouts metric used by WellaBeat is also present in the data provided
- Activity level and calories burned are additional common metrics
- One very important metric in the data is the manual weigh-in, which WellaBeat does not track
- The data provided shows activity levels, distance walked, calories burned, and so on. Are those manually triggered by the user, or does the product collect data continuously? I’d like to know how invested the users are. If the app tracks all of this automatically, how do we know whether users pay attention to the data, review it, or make life decisions based on it? In other words, is the data provided for analysis helping any of the users, or is it just metrics they never look at? If users do review it, how many of them do, and how helpful are these metrics to them? Are we solving a problem? I have a Nike app on my phone that I used 2-3 times to track my running path, length, and distance. I haven’t used it in years, yet it still tracks my everyday movements. That is data it collects even though I never look at it or open the app.
- The data doesn’t mention the gender, race, or age of the 30 users
- How is the activity level measured in the app actually calculated? Is the app linked to an exercise device?
- Does the app track the distance traveled by the phone and, if the movement is faster than a specific speed, assume it’s a high-intensity workout?
- With so many open questions, the data is very likely biased
Process
Activity Data
Import Data
Filter Data
#_______________________Filter the dailyactivity to a few columns
activity_3_4 <- dailyactivity_df_3_4 %>%
  select(Id, ActivityDate, TotalSteps, TotalDistance, Calories)
Convert Data Type
#______________convert ActivityDate from <char> to <date> and group by Id
edited_activity_3_4 <- activity_3_4 %>%
  mutate(mDate = mdy(ActivityDate)) %>%
  group_by(Id) %>%
  arrange(mDate, .by_group = TRUE)
glimpse(edited_activity_3_4)
OUTPUT:
Rows: 457
Columns: 6
Groups: Id [35]
$ Id <dbl> 1503960366, 1503960366, 1503960366,
$ ActivityDate <chr> "3/25/2016", "3/26/2016", "3/27/2016",
$ TotalSteps <dbl> 11004, 17609, 12736, 13231, 12041,
$ TotalDistance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16,
$ Calories <dbl> 1819, 2154, 1944, 1932, 1886, 1820,
$ mDate <date> 2016-03-25, 2016-03-26, 2016-03-27,
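The data background mentions 30 users while the grouped output reports 35 Ids, so a quick sanity check after the date conversion seems worthwhile. The sketch below is only an illustration: it uses a small made-up table (sample_activity and its values are hypothetical, not taken from the dataset) to show how one might count dates that failed to parse, count distinct users, and read off the date span.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample in the same shape as activity_3_4 (values are made up)
sample_activity <- tibble(
  Id = c(1, 1, 2, 2),
  ActivityDate = c("3/25/2016", "3/26/2016", "4/1/2016", "4/2/2016")
)

checked <- sample_activity %>%
  mutate(mDate = mdy(ActivityDate))

# mdy() returns NA for strings it cannot parse, so count those rows
n_failed <- sum(is.na(checked$mDate))

# Number of distinct users and the first/last date covered
n_users <- n_distinct(checked$Id)
date_span <- range(checked$mDate)
```

Running the same three checks on edited_activity_3_4 would show whether the user count and period match the dataset description.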
Observations
The obvious path would be to visualize the correlations between all the metrics collected in the data, such as calories burned vs. workout type, workout length, workout load type, and so on. In my opinion that would be a waste of time and irrelevant to our business task. The relationship between working out, workout load, time under stress, and calories burned has long been well understood.
Each variable contributes directly to calories burned. We are not doctors, nor would 30 users monitored for 62 days change the science, so I will not spend time on the majority of the data provided.
My focus will be on UX
- Does the product provide a solution to the users?
- Is there a benefit to the user for using the product?
- One metric is very important because it shows active interaction by the users: the manual weigh-in
- It is significant that a user takes the time to get weighed and then manually enters the value in the app. It’s also possible that the app used to collect the data is linked to a Bluetooth- or Wi-Fi-enabled scale that automatically logs the weight once a value is read.
- That would be a great service/device that could be a focus for WellaBeat.
Weight Data
Import Data
weightloginfor_df_3_4 <- read_csv("~/weighin.csv")
Convert Data Type
- We’ll extract the needed columns
- Convert the Date column with as_date
- Save the new date data in mDate
- Group by Id
- Sort by date within each Id
edited_weighin_3_4 <- weightloginfor_df_3_4 %>%
  select(Id, Date, WeightPounds, BMI) %>%
  mutate(mDate = as_date(mdy_hms(Date))) %>%
  group_by(Id) %>%
  arrange(mDate, .by_group = TRUE)
Observations
- What’s interesting is that 33 out of 30 users logged in to the manual weigh-in page for March. Do we assume some are duplicates?
- Maybe they created multiple accounts?
- Is it a requirement for all users to log in to the weigh-in page before the app starts tracking and making calculations?
- Are the calories burned calculations based on the weight of the user?
- Is it possible that once the user installs the app and creates an account, it automatically shows that they logged in to the weigh-in page?
- I need to dive into this data more and see how many of the users may have entered their weight and how many just logged in and never entered their weight.
- Is this data from newly enrolled users? or
- Was the data extracted from a random timeframe?
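One way to start on the duplicates question is to count distinct Ids and look for repeated Id/date pairs. This is a minimal sketch on made-up rows (sample_weighin and its values are hypothetical); it cannot tell whether two different Ids belong to the same person, only whether the same Id appears more than once on a given day.

```r
library(dplyr)

# Hypothetical sample in the same shape as edited_weighin_3_4 (values are made up)
sample_weighin <- tibble(
  Id = c(10, 10, 10, 20),
  mDate = as.Date(c("2016-04-01", "2016-04-01", "2016-04-02", "2016-04-01")),
  WeightPounds = c(150, 150, 151, 180)
)

# How many distinct users appear in the weigh-in table
n_users <- n_distinct(sample_weighin$Id)

# Id/date pairs that occur more than once (candidate duplicate entries)
dupes <- sample_weighin %>%
  ungroup() %>%
  count(Id, mDate) %>%
  filter(n > 1)
```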
Merge Datasets
- Let’s merge both sets using the common Id and mDate
# _____________ MERGE BOTH DATASETS
merged_3_4 <- left_join(
  edited_activity_3_4,
  edited_weighin_3_4,
  by = c("Id", "mDate"))
# ____________ VERIFY DIMENSION
dim(merged_3_4)
[1] 457 9
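A left join keeps every row of the left table, so the merged row count should equal the activity row count as long as there is at most one weigh-in per user per day. The sketch below illustrates that check on made-up tables (activity, weighin, and their values are hypothetical) and also lists weigh-ins that found no matching activity day.

```r
library(dplyr)

# Hypothetical activity and weigh-in tables (values are made up)
activity <- tibble(
  Id = c(1, 1, 2),
  mDate = as.Date(c("2016-04-01", "2016-04-02", "2016-04-01")),
  TotalSteps = c(1000, 2000, 3000)
)
weighin <- tibble(
  Id = 1,
  mDate = as.Date("2016-04-02"),
  WeightPounds = 150
)

merged <- left_join(activity, weighin, by = c("Id", "mDate"))

# TRUE when the join neither dropped nor duplicated activity rows
same_rows <- nrow(merged) == nrow(activity)

# Days without a weigh-in get NA in WeightPounds
n_missing_weight <- sum(is.na(merged$WeightPounds))

# Weigh-ins with no matching activity row would be lost by the left join
unmatched <- anti_join(weighin, activity, by = c("Id", "mDate"))
```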
Next Month Data
- Now we’ll repeat the same steps for data covering April to May timeframe
- I’ll omit the intermediate explanations and show the code at once
#_____________________READ WEIGHT LOGIN FROM 4 TO 5 OF 2016_________
weightloginfor_df_4_5 <- read_csv("~/weightLogInfo_merged_4_5.csv")
#_____________________FILTER DATASET AND CONVERT COLUMN
edited_weighin_4_5 <- weightloginfor_df_4_5 %>%
  select(Id, Date, WeightPounds, BMI) %>%
  mutate(mDate = as_date(mdy_hms(Date))) %>%
  group_by(Id) %>%
  arrange(mDate, .by_group = TRUE)
dim(edited_weighin_4_5)
OUTPUT
[1] 67 5
#______________________MERGE BOTH 4 TO 5 DATASETS
# edited_activity_4_5 is assumed to be built from the April-May daily activity
# file with the same filter and date-conversion steps as edited_activity_3_4
merged_4_5 <- left_join(
  edited_activity_4_5,
  edited_weighin_4_5,
  by = c("Id", "mDate"))
dim(merged_4_5)
OUTPUT
[1] 940 9
#______________________MERGE THE TWO MERGED TO YIELD FILTERED ENTIRE DATASETS
merged_data_3_5 <- rbind(merged_3_4, merged_4_5)
#let's look at all columns
str(merged_data_3_5)
# ____ Take out ActivityDate and Date columns
data_3_5 <- merged_data_3_5 %>%
  select(Id, mDate, TotalSteps, TotalDistance, Calories, WeightPounds, BMI)
#_____________________SAVE NEW DATASET _____________________
file.create("~/bellabeat_cleaned_filtered_3_5_V2.csv")
write_csv(data_3_5, "~/bellabeat_cleaned_filtered_3_5_V2.csv")
Analysis
Logins
- Let’s see how many users actually logged in to input information
#__________________LET'S SEE HOW MANY USERS LOGGED IN TO THE WEIGHIN APP
# n() counts each user's rows in the merged data
logins <- data_3_5 %>%
  group_by(Id) %>%
  summarize(number_of_logins = n()) %>%
  arrange(desc(Id))
# keep logins as a data frame so it can be joined later; gt() is only used for the image
gtsave(gt(logins), "logins.png")
Weighed in
- Let’s see how many users took the time to weigh in and input their values in the application
#__________________LET'S SEE HOW MANY OF THE USERS ACTUALLY ENTERED A WEIGHT (WEIGHINS)
weighins <- data_3_5 %>%
  ungroup() %>%
  filter(!is.na(WeightPounds)) %>%
  group_by(Id) %>%
  summarize(number_of_weighins = n())
gtsave(gt(weighins), "weighins.png")
- As you can see, 13 out of the 35 users entered their weights in the application
Logged and Weighed in
#________________LET'S JOIN THE TWO TOGETHER TO SEE HOW MANY LOGGED-IN AND WEIGHED-IN
login_weighin <- left_join(logins, weighins, by = "Id") %>%
  arrange(desc(number_of_weighins))
gtsave(gt(login_weighin), "logged_weighed.png")
Observations
- Out of the 30 users that were tracked (the data shows 35 users, which might be due to duplicate accounts), 13 logged in to the weigh-in app on a regular basis, as displayed in the number of logins column
- Over a 60-day period, column 2 shows that these users logged in on an extremely consistent basis: 30, 40, and over 50 times
- 13 users entered their weight in the app. Once again, I have no way of knowing whether the weigh-in side of the app is any different from the main app, but I have to assume that it is, because the login information is included in the weigh-in data and not in the daily activity data.
- 6 out of the 13 entered their weight once over the 60-day period. I need to follow up on these users to inspect their activity levels over the time tracked. Maybe their lone entry could be explained by their activity level.
- 5 out of the 13 entered their weight multiple times. Again, over a 60-day period you don’t expect much change in weight unless you are actively trying to gain or lose it. So we need to investigate their activity levels to see if there is a correlation between multiple entries and activity; maybe they started or finished a program?
- 1 user had 33 weigh-ins and another had 44. That seems extreme and suggests someone who is on a hardcore weight-loss program, pregnant, or possibly post-pregnancy. Once again, activity level needs to be investigated.
- Overall, 13 out of 30 users is a great percentage. The weigh-in appears to be a function that provides a great UX and needs to be added to WellaBeat.
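The share behind that last point depends on which user count is taken as the denominator. A minimal sketch of the arithmetic, using only the counts quoted above (13 users with a weigh-in, 30 users in the dataset description, 35 distinct Ids in the data); the variable names are illustrative:

```r
# Counts quoted in the observations above
users_with_weighin <- 13
users_described <- 30   # users stated in the dataset description
ids_in_data <- 35       # distinct Ids reported by the grouped data

# Percentage of users who entered at least one weight, for each denominator
share_of_described <- round(100 * users_with_weighin / users_described, 1)
share_of_ids <- round(100 * users_with_weighin / ids_in_data, 1)
```

With these counts the share is about 43.3% of the 30 described users, or about 37.1% if all 35 Ids are treated as separate users.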