Saturday, January 18, 2014

Insight from Fitbit?

So, I decided that I was going to do something with my Fitbit data. I had to learn something from looking at close to a year's worth of data, right? I hope. My first thought came from my scale: it seemed that my body fat routinely increased when my weight was going down. This didn't make intuitive sense, and it might not actually be true. Time to test it with data!

My first attempt to do something useful was in Excel. I quickly grew tired of the stupidity of scatter plots in Excel. This is the plot that did it:


Given that I had some issues with my labeling, I'll be clear: the x-axis is weight (in pounds) and the y-axis is body fat (in %). The black lines connect the data points in time order. You will notice some clusters where the points are just a little too nicely spaced from one to the next, almost forming a solid line of light blue boxes. This is Fitbit cheating: it fills in missing weigh-ins with interpolated values.

My biggest complaint so far is that Fitbit is inconsistent about representing missing data points. For blood pressure, they show up as "0/0". For weight, they come as interpolated data points. I basically have to filter them out algorithmically, or I'm not going to get a very believable relationship between weight and fat. I gave up after writing a few conditional statements and decided that R would be a better tool.
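Here is one way the filtering could be done, sketched in Python rather than Excel formulas. It assumes (my guess, not something Fitbit documents) that the filler weights are linear interpolations. In that case each filled-in value sits exactly on the straight line between its neighbours, so its step from the previous day equals its step to the next day. The function names and the tolerance `tol` are hypothetical. A genuine run of identical weigh-ins would also be flagged, so treat the output as a list of candidates.

```python
def flag_interpolated(weights, tol=0.01):
    """Flag readings that lie (within tol) on the straight line between their
    neighbours, the signature of linearly interpolated filler values.
    Assumes one reading per day, in date order."""
    flags = [False] * len(weights)
    for i in range(1, len(weights) - 1):
        step_in = weights[i] - weights[i - 1]
        step_out = weights[i + 1] - weights[i]
        flags[i] = abs(step_in - step_out) <= tol
    return flags


def parse_blood_pressure(text):
    """Turn '120/80' into (120.0, 80.0); treat Fitbit's '0/0' as missing (None)."""
    systolic, diastolic = (float(part) for part in text.split("/"))
    if systolic == 0 and diastolic == 0:
        return None
    return systolic, diastolic


# Made-up weights: 170 and 166 are real weigh-ins, 169/168/167 are filler.
weights = [170.0, 169.0, 168.0, 167.0, 166.0, 167.5]
print(flag_interpolated(weights))        # [False, True, True, True, False, False]
print(parse_blood_pressure("0/0"))       # None
print(parse_blood_pressure("118/79.5"))  # (118.0, 79.5)
```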

To use R, I had to export the data I wanted to a CSV. I looked for an online R platform and found one that supports my favorite plotting package (ggplot2). I'm a big fan of Rserve, which I use at work all the time, but I didn't find any public servers that let people experiment with it.

Unfortunately, it didn't like the forward slashes in the dates of Excel's CSV export. I foolishly decided to use TextWrangler's grep capability to find and replace the dates. I say foolishly because I thought I remembered how to use it, but didn't really, and had to refer to some find-and-replace strings I had pulled together for work. Ultimately, the date find took the form of:
(?P<month>\d)/(?P<day>\d)/(?P<year>\d\d),
and the date replace looked like:
20\P<year>-0\P<month>-0\P<day>,
I got there, anyway. After that, it was quick to replace the slash in the blood pressure column: I wanted "0/0" to become "0,0" (and likewise for the real readings), and I replaced the "Blood Pressure" header with "Systolic" and "Diastolic". This find string was simpler, though perhaps I made it too complicated:
,(?P<systolic>\d+)/(?P<diastolic>\d+),
and the replacement:
,\P<systolic>,\P<diastolic>,
The only problem was that I had 352 rows of data and it made only 351 replacements. I hate searching out the one problem child. One trick is to throw the CSV into Excel and look for column shifting: this replacement adds a column, so the row that didn't match will be one column short. It should be pretty quick, since Excel will generally recognize the comma as the column separator automatically, and if it doesn't, "Text to Columns . . ." works in a jiffy. It turns out that one row had a non-integer diastolic reading, so it didn't match my grep. Rather than fix the pattern, I made an ad hoc change to the CSV.
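For comparison, here is the same cleanup sketched with Python's `re` module. It uses the same `(?P<name>...)` group syntax; in the replacement string, Python writes `\g<name>` where TextWrangler writes `\P<name>`. Two changes from the patterns above are my own. First, the date is rewritten by a function that zero-pads the month and day, so both 1/5/13 and 10/2/13 come out right. Second, the blood pressure pattern accepts decimals, so the one non-integer diastolic reading would have matched. The column order in the sample lines is hypothetical. `unmatched_rows` finds any row that still fails, so you can locate the problem child without going through Excel.

```python
import re

# Excel's CSV export writes dates as month/day/two-digit-year, e.g. 1/5/13.
DATE_RE = re.compile(r"\b(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{2}),")

# Blood pressure as systolic/diastolic between commas; decimals allowed (118/79.5).
BP_RE = re.compile(r",(?P<systolic>\d+(?:\.\d+)?)/(?P<diastolic>\d+(?:\.\d+)?),")


def to_iso_date(match):
    """Rewrite m/d/yy as 20yy-mm-dd, zero-padding month and day."""
    return "20{}-{:02d}-{:02d},".format(
        match.group("year"), int(match.group("month")), int(match.group("day"))
    )


def clean_line(line):
    """Return the cleaned line and how many blood-pressure fields were split."""
    line = DATE_RE.sub(to_iso_date, line)
    line, n_split = BP_RE.subn(r",\g<systolic>,\g<diastolic>,", line)
    return line, n_split


def unmatched_rows(lines):
    """Line numbers (header is line 1) whose blood pressure was not split."""
    return [n for n, line in enumerate(lines[1:], start=2) if clean_line(line)[1] != 1]


print(clean_line("1/5/13,120/80,170.2"))     # ('2013-01-05,120,80,170.2', 1)
print(clean_line("10/2/13,118/79.5,168.4"))  # ('2013-10-02,118,79.5,168.4', 1)
```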

And . . . it wouldn't upload. So, back to the desktop version of R. Here is my best effort so far (using ggplot2):
The plot also includes a linear regression fit and the standard error band around the estimate. Despite my initial theory, it looks like my body fat readings are reasonably well correlated with my weight, but only reasonably well: the adjusted R² is 20%, though the p-value for the model is 4.1e-8, so the relationship is very unlikely to be pure chance.

The code to generate this is right here:
codehere::codehere
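The adjusted R² and p-value quoted for the body fat plot come from an ordinary least-squares fit. To show what those numbers measure, here is a plain-Python sketch (not the R used for the plots) of a one-predictor fit that reports R², adjusted R² and the F-statistic. The function name and the sample numbers are made up for illustration. With one predictor and n points, adjusted R² = 1 - (1 - R²)(n - 1)/(n - 2). The p-value is the upper tail of an F distribution with 1 and n - 2 degrees of freedom, evaluated at the F-statistic. Computing that tail needs a statistics library (for example `scipy.stats.f.sf`), so the sketch leaves it out.

```python
def simple_regression(x, y):
    """Least-squares fit of y = a + b*x with one predictor.

    Returns (a, b, r2, adj_r2, f_stat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    k = 1  # number of predictors (weight only)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return a, b, r2, adj_r2, f_stat


# Made-up weights (lb) and body-fat readings (%), for illustration only.
weights = [170.0, 168.5, 167.0, 166.0, 164.5, 163.5]
body_fat = [21.0, 20.2, 20.9, 19.8, 19.5, 19.9]
a, b, r2, adj_r2, f_stat = simple_regression(weights, body_fat)
print(f"slope {b:.3f} %/lb, R^2 {r2:.2f}, adjusted R^2 {adj_r2:.2f}, F {f_stat:.1f}")
```

With many points the adjusted and plain R² are nearly identical. The correction matters more when n is small.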
It occurred to me that I should really have started with how my weight evolved over time. This is what fitbit.com provides:
I think it hides too much of the variability.  Here's my plot:

The code to generate this plot is simpler:
ss
Clearly, my goal weight is 160 pounds. I'm a bit closer now than the last data point shows: down to 163.5 pounds. But what is driving my weight gain and loss? Could it be related to my activity level? Let's see.

Let's at least start with what my calorie burn looks like: first day by day (are there any trends?), and then by day of the week. Here we go:
And by day of the week (violin plots, with the mean shown as a green dot):
By far, Saturday looks like my most active day. But Saturday's distribution overlaps the other days of the week, from lazy to super active; I guess I have my slow weekend days as well. Another view, very derivative and slightly easier to create, is a monthly one:
I really don't know why January looks so high (this is only the tail end of January 2013); perhaps it's just the small number of data points. It does look like I slowed down a bit after summer, picked it up through November, and got lazier again in December.

And the code to generate the above:
codehere::codehere
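The plotting was done in R with ggplot2, but the grouping behind the day-of-week and monthly views is simple enough to sketch in plain Python. The `(date, calories)` record layout and the function names are hypothetical. This computes only the mean per group (the green dots), not the full distributions drawn by the violin plots.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def weekday_name(day):
    return WEEKDAYS[day.weekday()]  # date.weekday(): Monday == 0


def month_label(day):
    return f"{day.year}-{day.month:02d}"


def mean_by(records, key):
    """Average the calories in (date, calories) records per group,
    where key(date) names the group (day of week, month, ...)."""
    groups = defaultdict(list)
    for day, calories in records:
        groups[key(day)].append(calories)
    return {label: mean(values) for label, values in groups.items()}


# Made-up calorie totals, for illustration only.
records = [
    (date(2014, 1, 11), 3000),  # a Saturday
    (date(2014, 1, 13), 2500),  # a Monday
    (date(2014, 1, 18), 3400),  # a Saturday
]
print(mean_by(records, weekday_name))
print(mean_by(records, month_label))
```

A month with only a few recorded days (like the tail end of January 2013) gets a mean from very few values, which is one reason such a month can look out of line.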

Not surprisingly, there is a strong relationship between steps and calories burned. I'd expect some dispersion due to other activities that I log (e.g., lifting, biking). The outlier day with fewer than 5,000 steps appears to be one where I rode 50 miles on my bike, so I guess it could make sense.

This simple relationship has an adjusted R² of 49%, meaning that roughly half of the variability in calorie burn is explained by steps. What Fitbit actually does is a bit more complicated. I believe its calculations incorporate the amount of time you spend in various activity states (sedentary, lightly active, and so on). A better model would look like the following:

This model (its construction is shown in the code below) has an adjusted R² of 90% and an F-statistic of over 825, which indicates that the model as a whole is highly significant. Interestingly, time sedentary is not a (very) significant variable, but the model thinks the intercept is. That probably makes sense: it indicates a baseline number of calories that somebody of my (roughly constant) weight burns regardless of activity. The true model Fitbit uses depends on both body mass and activity. Fitbit also reports Activity Calories, and the same model fit against that has an R² of 98%, so I think that indicates a match.

codehere::codehere
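Here is a minimal plain-Python sketch of that kind of model: calories regressed on the minutes spent in each activity state plus an intercept, solved through the normal equations. The choice of columns and the sample numbers are hypothetical. This is ordinary least squares, not Fitbit's actual calorie formula. One caution if you try this on a full export: if the activity-state minutes always add up to the same daily total, they are perfectly collinear with the intercept, and one of them has to be dropped. R's `lm` reports such a coefficient as NA, while this sketch would fail with a division by zero.

```python
def ols(X, y):
    """Least squares with an intercept: y ~ b0 + b1*x1 + ... + bk*xk.

    X is a list of rows (one per day) of k predictor values.
    Returns (coefficients, adjusted R^2, F-statistic)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    n, p = len(rows), len(rows[0])      # p = k + 1 parameters
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * c for v, c in zip(A[r], A[col])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    k = p - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    f_stat = (r2 / k) / ((1 - r2) / (n - p))
    return coef, adj_r2, f_stat


# Made-up daily values, for illustration only:
# (minutes sedentary, lightly active, very active) -> total calories burned.
minutes = [(780, 200, 20), (820, 180, 5), (700, 260, 45),
           (900, 150, 0), (650, 240, 60), (760, 210, 30)]
calories = [2600, 2450, 2900, 2300, 3050, 2700]
coef, adj_r2, f_stat = ols(minutes, calories)
print("intercept and per-minute coefficients:", [round(c, 2) for c in coef])
print(f"adjusted R^2 {adj_r2:.2f}, F {f_stat:.1f}")
```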

Now let's get to a more interesting question/hypothesis. My guess would be that the more active I am over a period of time, the more likely I am to lose weight. Seem reasonable? Let's check it out. First we need to think about the variables available to us and whether, as they stand, they'd reasonably be expected to show a relationship. I think the answer is no, because every observation covers a single day (there's no trending). Keep in mind that the best we can do is capture the outflow, i.e., the calories burned: I haven't tracked my food intake over any meaningful period of time, and Fitbit doesn't provide it in the dataset anyway.

So I'll have to do some transformations first, but I'll save that code until after the graph. Let's look at a weekly period: average activity calories over a week versus the weight change over that same week, defined as weight(date) - weight(date + 7 days), so a positive value means I lost weight. With that defined, here are the visuals:

At first glance, this doesn't look very promising. It's mostly a cloud; don't let the fit line and the standard error band trick you. If there is something there, it is in the direction I expected: as my activity increases, my weight decreases. But the model's adjusted R² is only 3%. There could be a different time window we should be looking at, but if we went searching for one, I would be concerned about finding a spurious relationship. Even if we did find something, I think it would be worth doing an in-sample/out-of-sample test to check that it holds up.

Before I forget, here is the code:
codehere::codehere
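The weekly transformation can be sketched in plain Python. Several assumptions here are mine, not the post's. Both series arrive as dicts keyed by `datetime.date`, and interpolated weigh-ins have already been dropped. The "week" is the seven days starting on a weigh-in date, and only weeks with complete activity data are kept. `out_of_sample_r2` is one simple version of the in-sample/out-of-sample check mentioned above: fit on the earliest weeks, then score R² on the later ones. A value near zero or below means the fit has no real predictive power. Consecutive weekly windows overlap, so a chronological split is safer than a random one, which would put nearly identical weeks in both halves.

```python
from datetime import date, timedelta


def weekly_pairs(weight, activity):
    """Build (start date, avg activity calories over the week, weight lost).

    weight, activity: dicts mapping datetime.date -> value, missing days absent.
    Weight lost is weight(d) - weight(d + 7 days), so positive means a loss;
    the activity week is d .. d + 6."""
    pairs = []
    for d in sorted(weight):
        end = d + timedelta(days=7)
        week = [d + timedelta(days=i) for i in range(7)]
        if end in weight and all(day in activity for day in week):
            avg_activity = sum(activity[day] for day in week) / 7
            pairs.append((d, avg_activity, weight[d] - weight[end]))
    return pairs


def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def out_of_sample_r2(pairs, train_fraction=0.7):
    """Fit on the earliest weeks, score R^2 on the later ones."""
    pairs = sorted(pairs)  # chronological, by start date
    cut = int(len(pairs) * train_fraction)
    train, test = pairs[:cut], pairs[cut:]
    a, b = fit_line([p[1] for p in train], [p[2] for p in train])
    ys = [p[2] for p in test]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((p[2] - (a + b * p[1])) ** 2 for p in test)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```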

So I'm out of reasonable questions to answer with this data.  How about some random questions:

  • Do steps correlate with floors climbed? (i.e., when I'm active, am I active in both ways?)
  • How long do I sleep when I track my sleep? I have been super inconsistent in tracking it, even though I use my Fitbit as an alarm clock on weekday mornings. How does the Fitbit data compare to the AskMeEvery data that I've been collecting for the last two weeks?
Let's tackle these.  For the first, here is the graphic:
With an R² of 35%, I think it's safe to say there is some correlation, but it's really not definitive. See the data point at about 17,000 steps with only one floor climbed that day. It can happen, I guess. At the other extreme, 150 floors seems like a lot, but I also took ~17,000 steps that day. So there is something there.
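For a straight-line fit with one predictor, the (unadjusted) R² is the square of Pearson's correlation coefficient r. An R² of 35% therefore corresponds to r = √0.35 ≈ 0.59. Here is a small sketch of that calculation; the steps and floors values are made up. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up daily steps and floors, for illustration only.
steps = [8000, 12000, 5000, 17000, 10000]
floors = [10, 25, 4, 30, 12]
r = pearson_r(steps, floors)
print(f"r = {r:.2f}, R^2 = {r * r:.2f}")
```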

And now for sleep.  Here's what I've gotten from AskMeEvery:
So I sleep a lot on weekends. I try to get 7.5 hours during the week; it happens, though 7 is much more likely. How does the Fitbit data compare?

Given the limited data from AskMeEvery, I think these are essentially equivalent. What it does indicate, I think, is that on a night when I think I get 7.5 hours, I'm really getting ~6. The rest of the time goes to falling asleep and restlessness during the night.

Finally, the code for the last two Fitbit graphs:
codehere::codehere
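Comparing the two sleep sources comes down to pairing nights that appear in both and averaging each. The gap described above is simple arithmetic: 7.5 hours minus about 6 hours asleep leaves roughly 90 minutes of falling asleep and restlessness. The sketch below uses hypothetical dict inputs: minutes asleep from the Fitbit export and hours from AskMeEvery. It assumes both sources label a night with the same date. That is worth checking, since one may date a night by the evening and the other by the morning.

```python
from datetime import date
from statistics import mean


def compare_sleep(fitbit_minutes_asleep, self_reported_hours):
    """Pair nights present in both sources and summarise them.

    fitbit_minutes_asleep: dict date -> minutes asleep (Fitbit export)
    self_reported_hours:   dict date -> hours reported in AskMeEvery
    Returns (number of paired nights, mean Fitbit hours, mean reported hours)."""
    nights = sorted(set(fitbit_minutes_asleep) & set(self_reported_hours))
    fitbit_hours = [fitbit_minutes_asleep[d] / 60 for d in nights]
    reported = [self_reported_hours[d] for d in nights]
    return len(nights), mean(fitbit_hours), mean(reported)


# Made-up values, for illustration only.
fitbit = {date(2014, 1, 6): 361, date(2014, 1, 7): 355, date(2014, 1, 8): 402}
asked = {date(2014, 1, 6): 7.5, date(2014, 1, 7): 7.0, date(2014, 1, 9): 8.0}
nights, fitbit_avg, reported_avg = compare_sleep(fitbit, asked)
print(f"{nights} nights: Fitbit {fitbit_avg:.2f} h vs reported {reported_avg:.2f} h")
```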

Reflection on the data and Fitbit

I have looked around the web to see what people think of, or have learned from, their Fitbits. It can be summed up as:

  • I didn't realize how sedentary I am/was
  • I walk more because I'm wearing a Fitbit
  • I like getting badges
Aside from behavior changing at the margin, I'm unimpressed with what I see out there. I think my original review holds up fairly well, and that (at a minimum) the following two changes should be made:
  • Add an alarm alerting you that you've been sedentary too long. Let the user choose the interval, but provide links or other guidance on the website about what might be a useful one. I addressed this in more detail in my initial review.
  • Add an option that goes beyond their "Step Goal Milestones," which alert you when you've hit "75%, 100% or 125% of your daily goal." These are fine notifications, but what if you only get to 70% of your goal? You never really know that. I'd prefer time-based notifications that put your day into perspective, letting users choose the time(s) of day they'd like to receive them (for me, 8am and noon would be most useful). I want to be motivated by more than a fixed goal, which I'm sure most users have never changed since the day they set up their Fitbit. The message I want is something like the following:
    • You've taken X steps so far today.  This is at the Yth percentile of the last week and Zth percentile of the last month.  Only A steps until your goal!
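The proposed message needs, for each past day, the step count at the same time of day. The daily export does not contain that, so the `last_week` and `last_month` inputs below are hypothetical. Percentile rank is taken here as the share of past days with strictly fewer steps, which is one of several common conventions. The "You've hit your goal!" branch is my own addition for days when the goal is already met. A sketch in Python:

```python
def percentile_rank(value, history):
    """Percentage (0-100) of past values strictly below `value`.
    Assumes `history` is non-empty."""
    below = sum(1 for h in history if h < value)
    return 100 * below / len(history)


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 57 -> '57th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def progress_message(steps_now, last_week, last_month, goal):
    """last_week / last_month: step counts at this time of day on past days."""
    week = ordinal(round(percentile_rank(steps_now, last_week)))
    month = ordinal(round(percentile_rank(steps_now, last_month)))
    text = (f"You've taken {steps_now:,} steps so far today. "
            f"This is at the {week} percentile of the last week and "
            f"{month} percentile of the last month. ")
    remaining = goal - steps_now
    if remaining > 0:
        return text + f"Only {remaining:,} steps until your goal!"
    return text + "You've hit your goal!"


# Made-up history, for illustration only.
week_so_far = [3000, 4000, 5000, 6000, 7000, 8000, 9000]
month_so_far = [1000 * i for i in range(1, 11)]
print(progress_message(6500, week_so_far, month_so_far, 10000))
```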
To keep its users engaged, I think Fitbit really needs to do more. I'm not super convinced the solution is my first anniversary email:
Maybe it'd be more interesting to tell me where I fit in the distribution of all Fitbit wearers. It could answer questions like the following:
  • I've worn my Fitbit consistently over the last year. Have I worn it more than 90% of their customer group? Shouldn't that make me feel good?
  • For people my age (weight/sex/zip code), where do I stand in terms of activity over the last 12 months? Weight gain/loss? Body fat gain/loss?
  • Talk some about the Fitbit community: in aggregate, how many pounds lost, miles travelled, steps taken?
  • [I will add more as I think of them]
Other thoughts out there?
