It’s been a bit more than a year since I got a Fitbit, and I have been pretty excited about tracking my activity and heart rate. I have to say I’m quite surprised by the sleep data. Tracking sleep has, in fact, been the most exciting feature, and I now strive to get at least 7 hours of sleep per night.

Let’s first take a glimpse at the data to see what type of data we are dealing with.

# A tibble: 5 x 8
  date_time           dateTime   dataset_time variable value total_value
  <dttm>              <date>     <time>       <chr>    <dbl>       <dbl>
1 2018-06-10 00:00:00 2018-06-10 00'00"       steps        0        7256
2 2018-06-10 00:01:00 2018-06-10 01'00"       steps        8        7256
3 2018-06-10 00:02:00 2018-06-10 02'00"       steps        0        7256
4 2018-06-10 00:03:00 2018-06-10 03'00"       steps        0        7256
5 2018-06-10 00:04:00 2018-06-10 04'00"       steps        0        7256
# … with 2 more variables: date <lgl>, time <lgl>

Density plots

Let’s now inspect the overall distributions of heart rate and step values.

When do I move?

I will start by focusing on the data for steps.

I’m curious to see what times of the day have the most activity. Because I have a fairly large number of data points (~751K), I will use geom_hex() to do the counting for me and simplify rendering.

Well, I should have remembered that for the vast majority of minutes (regardless of the hour of the day), the count is exactly zero. Let’s only look at the positive counts:

We now see some patches of high activity (> 100 steps per minute), particularly around 9:00, 12:00 and 18:00. These mostly correspond to “going to work”, “activity around lunch time (?)”, and “going home / physical activity”. The rest of the time, it looks like I move around 10-20 steps per minute, regardless of the minute within the hour.
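The post’s plots were made in R and that code isn’t shown. As a rough stand-in, here is a minimal Python sketch of the counting geom_hex() does for the positive-count plot, assuming the export is a list of (timestamp, steps) pairs at one-minute resolution and that the axes are minute within the hour and steps per minute. Square cells replace the hexagons; the function name, the 10-step bin width and the records are illustrative, not from the post.

```python
from collections import Counter
from datetime import datetime


def bin_positive_minutes(records, step_width=10):
    """Count minutes with steps > 0 in square (minute, steps) cells.

    `records` holds (timestamp, steps) pairs at one-minute resolution.
    Each cell is (minute within the hour, lower edge of the step bin).
    """
    counts = Counter()
    for ts, steps in records:
        if steps <= 0:
            continue  # zero-step minutes dominate; leave them out
        step_bin = (steps // step_width) * step_width
        counts[(ts.minute, step_bin)] += 1
    return counts


# Made-up records, only to show the shape of the output
toy = [
    (datetime(2018, 6, 10, 9, 5), 110),
    (datetime(2018, 6, 11, 9, 5), 115),
    (datetime(2018, 6, 10, 9, 6), 0),
    (datetime(2018, 6, 10, 18, 30), 12),
]
print(bin_positive_minutes(toy))
```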

Last 10 minutes

The result above is interesting because I usually have to be reminded by Fitbit to “move up to 250 steps in the hour”. I receive this notification in the last 10 minutes of the hour, so I would have expected to take more steps during those 10 minutes than during the first 50. The data show I’m wrong:

That being said, I want to keep my reminder on. I feel like having it turned on adds ~1,000-2,000 steps per day.
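One detail matters when comparing the last 10 minutes with the first 50: the windows have different lengths, so mean steps per minute is a fairer comparison than total steps. A small Python sketch, assuming (timestamp, steps) pairs at one-minute resolution; `steps_per_minute_split` and the toy hour are hypothetical.

```python
from datetime import datetime


def mean(values):
    return sum(values) / len(values) if values else float("nan")


def steps_per_minute_split(records, cutoff=50):
    """Mean steps per minute before and after `cutoff` within each hour.

    Per-minute means, not sums: the first window is five times longer,
    so its total would be larger almost by default.
    """
    early = [steps for ts, steps in records if ts.minute < cutoff]
    late = [steps for ts, steps in records if ts.minute >= cutoff]
    return mean(early), mean(late)


# Made-up hour: 10 steps/min for minutes 0-49, 5 steps/min for 50-59
toy = [(datetime(2018, 6, 10, 14, m), 10 if m < 50 else 5) for m in range(60)]
print(steps_per_minute_split(toy))  # (10.0, 5.0)
```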

Daily average

Let’s go one level up and treat each day as a unit. This plot shows a nice trend, with the months from May to August showing an increase in the number of steps. Keep in mind that November shows a low average because the data for that month are incomplete (the last day in the database is 2019-11-13).
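Going from minutes to days and then to months is two group-and-aggregate steps. The Python sketch below assumes (timestamp, steps) records and keeps the number of days next to each monthly mean, which makes months with few or partial days easy to spot. Function names and data are illustrative only.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(records):
    """Sum minute-level (timestamp, steps) records into {date: steps}."""
    totals = defaultdict(int)
    for ts, steps in records:
        totals[ts.date()] += steps
    return dict(totals)


def monthly_mean_of_daily(totals):
    """Return {(year, month): (mean daily steps, number of days)}."""
    by_month = defaultdict(list)
    for day, steps in totals.items():
        by_month[(day.year, day.month)].append(steps)
    return {m: (sum(v) / len(v), len(v)) for m, v in by_month.items()}


toy = [
    (datetime(2019, 10, 1, 12, 0), 9000),
    (datetime(2019, 11, 12, 10, 0), 3000),
    (datetime(2019, 11, 12, 18, 0), 5000),
    (datetime(2019, 11, 13, 9, 0), 2000),  # pretend this last day is cut short
]
print(monthly_mean_of_daily(daily_totals(toy)))
```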

At this point in the analysis, I should mention that I was on vacation from 2019-06-27 to 2019-07-11. We will use this information later to make some things clear.
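Flagging the vacation is a simple date-range check. A minimal Python sketch, assuming daily totals stored as {date: steps}; treating 2019-07-11 as the last vacation day (inclusive) is my assumption, and the helper names are hypothetical.

```python
from datetime import date

# Dates from the post; an inclusive end date is an assumption
VACATION_START = date(2019, 6, 27)
VACATION_END = date(2019, 7, 11)


def is_vacation(day):
    return VACATION_START <= day <= VACATION_END


def split_by_vacation(totals):
    """Split {date: steps} into (vacation days, all other days)."""
    vacation = {d: s for d, s in totals.items() if is_vacation(d)}
    regular = {d: s for d, s in totals.items() if not is_vacation(d)}
    return vacation, regular


vac, reg = split_by_vacation({date(2019, 7, 1): 25000, date(2019, 8, 1): 9000})
print(vac, reg)
```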

Distribution

We looked at the average daily steps for each month; what about the distribution of daily steps? We see that most days I come quite close to the default target of 10K steps. There are some days with very few steps (see below) and, naturally, some days with an extreme number of steps.
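The distribution of daily totals is essentially a histogram. A Python sketch with a hypothetical 2,500-step bin width and made-up daily totals; each bin is keyed by its lower edge.

```python
from collections import Counter
from statistics import median


def daily_histogram(daily_steps, width=2500):
    """Count days per bin of `width` steps, keyed by the bin's lower edge."""
    return Counter((s // width) * width for s in daily_steps)


toy = [1200, 9800, 10400, 11000, 23000]  # made-up daily totals
print(sorted(daily_histogram(toy).items()))
print(median(toy))  # 10400
```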

Extreme events

Using the boxplot below, we can define extreme events as days on which I walked more than 20K steps. I chose to plot this by day of the week to get some insight into the periodicity of these events.
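Pulling out the extreme days and grouping them by weekday could look like the following Python sketch, assuming {date: steps} daily totals. The 20K threshold comes from the text; the names and data are illustrative. Weekday labels are hard-coded to avoid locale-dependent date formatting.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def extreme_days_by_weekday(totals, threshold=20_000):
    """Group days with more than `threshold` steps by day of the week."""
    out = defaultdict(list)
    for day, steps in sorted(totals.items()):
        if steps > threshold:
            out[WEEKDAYS[day.weekday()]].append(day)  # weekday(): Monday == 0
    return dict(out)


toy = {
    date(2019, 6, 29): 24000,  # Saturday
    date(2019, 6, 30): 21500,  # Sunday
    date(2019, 7, 3): 8000,    # Wednesday
    date(2019, 7, 6): 19999,   # Saturday, just under the threshold
}
print(extreme_days_by_weekday(toy))
```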

Because I walked a lot during the vacations, I highlighted the vacation days on top of the previous boxplot. Most of the extreme events clearly fall within the vacations. Moreover, on none of those days did I walk fewer than 10K steps, pretty amazing!

There are also some extreme low events; these are most likely days I didn’t wear the Fitbit (or forgot to wear it for most of the day). Just because I can sort the data and make another plot, I went ahead and did it!
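Sorting the daily totals is enough to surface the suspicious low days. In this Python sketch the 1,000-step cut-off for “probably not worn” is a hypothetical value, not something derived from the data, and the toy totals are made up.

```python
from datetime import date


def lowest_days(totals, n=5):
    """Return the n (day, steps) pairs with the fewest steps, lowest first."""
    return sorted(totals.items(), key=lambda kv: kv[1])[:n]


def likely_not_worn(totals, min_steps=1000):
    """Days below a hypothetical `min_steps` cut-off, lowest first."""
    return [day for day, steps in lowest_days(totals, len(totals)) if steps < min_steps]


toy = {
    date(2019, 5, 20): 150,
    date(2019, 5, 21): 9000,
    date(2019, 5, 22): 40,
    date(2019, 5, 23): 12000,
}
print(lowest_days(toy, 2))
print(likely_not_worn(toy))
```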

We usually go for walks on Saturdays and/or Sundays. Knowing this, it’s not surprising to see Saturdays as the days with the most steps (and hence the highest success rate on the 10K target).
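The success rate per weekday is the share of days reaching 10,000 steps, grouped by day of the week. A Python sketch assuming {date: steps} daily totals, with hypothetical names and made-up data:

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def success_rate_by_weekday(totals, target=10_000):
    """Share of days reaching `target` steps, per day of the week."""
    hits = defaultdict(int)
    days = defaultdict(int)
    for day, steps in totals.items():
        name = WEEKDAYS[day.weekday()]  # date.weekday(): Monday == 0
        days[name] += 1
        hits[name] += int(steps >= target)
    return {name: hits[name] / days[name] for name in days}


toy = {
    date(2019, 6, 29): 24000,  # Saturday
    date(2019, 7, 6): 15000,   # Saturday
    date(2019, 7, 3): 8000,    # Wednesday
    date(2019, 7, 10): 11000,  # Wednesday
}
print(success_rate_by_weekday(toy))
```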

Season

I want to turn the focus now to the seasonality of the data. I will use a helper function, getSeason(), that I took from Stack Overflow.
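The getSeason() helper itself isn’t reproduced here, so its exact season boundaries are unknown. As a hypothetical stand-in, this Python sketch uses meteorological seasons for the Northern Hemisphere (whole months: December to February is winter, and so on) and then averages daily steps per season from {date: steps} totals.

```python
from collections import defaultdict
from datetime import date


def get_season(day):
    """Hypothetical stand-in: meteorological seasons, Northern Hemisphere."""
    if day.month in (12, 1, 2):
        return "Winter"
    if day.month in (3, 4, 5):
        return "Spring"
    if day.month in (6, 7, 8):
        return "Summer"
    return "Fall"


def mean_by_season(totals):
    """Average daily steps per season from {date: steps}."""
    groups = defaultdict(list)
    for day, steps in totals.items():
        groups[get_season(day)].append(steps)
    return {season: sum(v) / len(v) for season, v in groups.items()}


toy = {date(2019, 1, 10): 6000, date(2019, 2, 3): 8000, date(2019, 7, 1): 14000}
print(mean_by_season(toy))
```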

We can inspect the effect of season on my walking.

The plot above is not good; it fails to communicate. I think this is a better way to show the data.

A year’s heart rate in one plot

I’m borrowing heavily from Nick here, but I thought it was a brilliant plot, so I took it for a ride with my data. I did change a few things: I kept the native sampling rate and used geom_line() instead of down-sampling and using geom_tile(). The overall trend is clear: movement in the morning and the afternoon correlates well with going to and coming back from work. Around July 2019 you can see the shift caused by the timezone change during my vacation. There are a couple of days in late May with continuously high or missing values; I take these to be days I forgot the Fitbit at home, so those measurements are likely spurious.
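Drawing one line per day over a shared time-of-day axis mainly needs each reading split into its date and its offset from midnight. A Python sketch, assuming heart-rate readings as (timestamp, bpm) pairs in local device time; using seconds since midnight keeps the native sampling rate instead of down-sampling. The function name and readings are illustrative.

```python
from collections import defaultdict
from datetime import datetime


def series_per_day(readings):
    """Group (timestamp, bpm) readings into one time-ordered series per day.

    Each point is (seconds since midnight, bpm), so all days share one
    0-86399 x-axis and can be drawn as separate lines.
    """
    days = defaultdict(list)
    for ts, bpm in sorted(readings):
        offset = ts.hour * 3600 + ts.minute * 60 + ts.second
        days[ts.date()].append((offset, bpm))
    return dict(days)


toy = [
    (datetime(2019, 5, 20, 8, 0, 5), 95),
    (datetime(2019, 5, 20, 8, 0, 0), 90),
    (datetime(2019, 5, 21, 0, 0, 10), 60),
]
print(series_per_day(toy))
```

Because the timestamps are local time, a timezone change while traveling would shift each day’s curve along this axis rather than change its shape.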

Code

The code for this post is quite long and I thought it would get in the way. I’m happy to share it upon request; hit me up on Twitter.