Supported by a 2013 NIH Director’s New Innovator Award, Dr. Onnela and his team crystallized the concept of digital phenotyping, constructed the Beiwe research platform, and developed the statistical methods and tools that turn smartphone data into biomedical and clinical insights. You can read more about our multifaceted approach and research platform here. Let’s start with the name.

“Beiwe” is a transliteration of the name of the Nordic goddess of sunlight and mental health. We pronounce it bee-we.

We have defined digital phenotyping as the “moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices,” in particular smartphones.

Digital Phenotyping

We coined the phrase digital phenotyping to describe the “moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices, in particular smartphones.” Although definitions evolve, in this era of precision health and medicine it is imperative to provide precise definitions for the terms we use. So let’s take a closer look…

“moment-by-moment quantification”
Emphasizes the collection and analysis of data that is generated continuously rather than obtained or sampled at certain points in time, often in “waves”

In longitudinal studies, the goal is to gather data on each individual on an identical set of occasions, for example, before and after an intervention or annually over a period of time. If data is collected at n points, then n observations over time are considered in the analysis stage. If data is collected continuously, each individual may have tens or hundreds of thousands of data points per day. This temporally dense data must be aggregated or summarized before it can be used for analysis.
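The aggregation step can be sketched as follows. This is a minimal illustration, not the Beiwe pipeline itself: the function name daily_summaries, the (timestamp, value) sample format, the choice of UTC day boundaries, and the particular summaries (count, mean, maximum) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def daily_summaries(samples):
    """Collapse (unix_timestamp, value) samples into one summary per UTC day."""
    by_day = defaultdict(list)
    for ts, value in samples:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        by_day[day].append(value)
    return {
        day: {"n": len(vals), "mean": mean(vals), "max": max(vals)}
        for day, vals in sorted(by_day.items())
    }


# Hypothetical sensor readings: three on 2020-01-01 and two on 2020-01-02 (UTC).
samples = [
    (1577836800, 1.0),  # 2020-01-01 00:00:00 UTC
    (1577840400, 3.0),  # 2020-01-01 01:00:00 UTC
    (1577880000, 2.0),  # 2020-01-01 12:00:00 UTC
    (1577923200, 4.0),  # 2020-01-02 00:00:00 UTC
    (1577926800, 6.0),  # 2020-01-02 01:00:00 UTC
]
summaries = daily_summaries(samples)
print(summaries["2020-01-01"])  # {'n': 3, 'mean': 2.0, 'max': 3.0}
```

In a real study the same idea applies to tens or hundreds of thousands of samples per day; only the choice of summaries changes.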

While “continuous” and “real-time” are often used interchangeably, these terms refer to two different concepts. Continuous refers to data collection that happens (essentially) constantly rather than at discrete points in time. Real-time refers to data processing and analysis, specifically, the computational ability to generate output shortly after having received input. Real-time processing requires continuous data collection, but continuous data collection does not necessarily imply real-time processing.

“individual-level human phenotype”
Emphasizes that data is collected at the level of the individual rather than the group, and that data analyses focus on within-person changes over time rather than comparisons across individuals

Smartphone usage differs between groups (men and women, teenagers and older adults), and information on covariates (age, sex, etc.) may not be available to adjust for potential confounders. The way people use their smartphones is also highly idiosyncratic, which makes comparisons across individuals difficult. Within-person comparisons, where every individual is their own control, remain valid. And while group-level data can be useful, it can’t be used for making inferences about individuals without committing an ecological fallacy, i.e., making inferences about individuals from inferences about the group to which those individuals belong.
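One simple way to make each individual their own control is to standardize a person's daily values against that same person's mean and standard deviation. The sketch below is only an illustration of the idea: the function name within_person_z and the screen-time numbers are hypothetical, and this is not presented as the method used in our analyses.

```python
from statistics import mean, stdev


def within_person_z(values):
    """Standardize one person's daily values against their own mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical daily screen-time minutes for two people with very different habits.
person_a = [100, 110, 90, 100, 200]
person_b = [400, 410, 390, 400, 500]

z_a = within_person_z(person_a)
z_b = within_person_z(person_b)

# The raw levels differ by 300 minutes, yet the within-person pattern is the same:
# both people show an unusual final day relative to their own baseline.
print(z_a == z_b)
```

Comparing raw minutes across these two people would mostly reflect their idiosyncratic baselines; the within-person scores isolate the change over time.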

“in situ”
Highlights that data collection is intended to occur passively, in naturalistic or free-living settings, where people live and experience their lives

Passive data is collected from smartphone sensors and logs without an individual knowing it’s happening. This is important for capturing “real-world” behavior. Active data collection, for example, asking individuals to take surveys, is not naturalistic—the only reason they’re doing these activities is because they’re participants in a study. Passive data collection makes it possible to study patient symptoms and quality of life over long periods of time, which is especially important for studying chronic conditions. In a study at McLean Hospital, we’ve collected data from a cohort of bipolar patients continuously for 4.5 years.

A good illustration of the importance of passive data collection is an asthma study conducted by Apple and its academic collaborators, using Apple’s ResearchKit software development kit (SDK). In the study, participants were asked to complete intake surveys, daily surveys on asthma and medication adherence, and weekly surveys on healthcare utilization and quality of life. From a total of 40,683 downloads of the study application in the U.S., 7,593 participants were enrolled after confirming eligibility, obtaining electronic consent, and verifying their email. There were 6,470 “Baseline Users,” participants who responded to at least one survey; there were 2,317 “Robust Users” who completed 5 daily or weekly surveys; and there were 175 “Milestone Users” who completed the 6-month survey. This attrition, from 7,593 to 175 participants, translates into a loss of 97.7% of the cohort over a 6-month period. The authors of the study are to be complimented for openly discussing the successes and the important lessons of the study, and they concluded that the “ResearchKit methodology” is good for studies with a “hypothesis that can be answered in a short time period (1–3 weeks).”
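The attrition figure can be checked directly from the counts reported above. The helper name percent_lost is ours, introduced only for this check.

```python
def percent_lost(start, remaining):
    """Percentage of the starting cohort that is no longer present."""
    return round(100 * (start - remaining) / start, 1)


enrolled = 7593
funnel = {
    "Baseline Users": 6470,
    "Robust Users": 2317,
    "Milestone Users": 175,
}

# Loss relative to the enrolled cohort at each stage of the funnel.
losses = {stage: percent_lost(enrolled, count) for stage, count in funnel.items()}
print(losses["Milestone Users"])  # 97.7
```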

“using data from personal digital devices”
Highlights the importance of using devices people already own and use rather than introducing additional instrumentation

Just as asking people to use their own devices in unnatural ways, such as taking frequent surveys, is likely to lead to attrition, the same is true for introducing an additional device, whether it be a loaner phone or a wearable device or something else. “Digital device” refers to any digital consumer electronics device that can be used to collect data, is programmable, and permits the running of third-party applications.

Individuals do not appear to use loaner phones the same way they use their own phones, and the use of loaner devices creates logistical challenges and may lead to a loss of devices. For example, a recent study implemented a 30-day loaner iPhone and smartwatch recirculation program as part of an mHealth intervention to improve recovery and access to therapy following acute myocardial infarction. Of the participants enrolled with a loaner phone, 72% (66/92) returned the phone and 70% (64/92) returned the watch, which included 1 damaged phone and 1 damaged watch. The study reported a 61% cost savings by using loaner devices compared to purchasing an iPhone for each participant who did not already own one. While optimizing loaner returns could lead to further cost savings, the use of loaner devices appears to be costly especially given the relatively short time window of the study.

“in particular smartphones”
Focuses on smartphones as part of the solution to the phenotyping problem

To be a solution, smartphones need to be broadly available. Although a smartphone digital divide exists, evidence suggests that it is rapidly narrowing. In 2018, 80% of U.S. adults owned a smartphone; globally, 6.3 billion smartphone subscriptions are expected by 2022. Though smartphone ownership is higher in the general population than in people with serious illness, this is also beginning to change. With ever increasing smartphone ownership, persistent disparities in recruitment and retention of underrepresented minorities in research will hopefully be mitigated.

Smartphone-based digital phenotyping requires the user to install an application and to consent to data collection. Because the collected data is personal, participants need to understand what data is being collected and for what purpose.

Digital phenotyping vs. remote patient monitoring and mobile phone sensing

The terms “digital phenotyping,” “remote patient monitoring,” and “mobile phone sensing” overlap, although their goals and methods vary.

Remote patient monitoring (RPM)
Uses non-invasive, wearable devices that automatically transmit data to a back-end system or smartphone application for patient self-monitoring and/or health provider assessment and clinical decision-making.

Most RPM studies introduce additional devices and require patients to use them on a regular basis. Not surprisingly, these studies found that most devices result in only short-term changes in behavior and motivation, lasting approximately 3 to 6 months. RPM appears to be most useful in settings where there are opportunities to use the data to change clinical care for a few months, for example, as part of rehabilitation.

Mobile phone sensing
Use of various types of sensor data to enable social networking, augmented gaming, virtual reality, and smart transportation systems, among other applications

Because most mobile phone sensing systems are deployed in urban areas, it’s also referred to as “urban sensing.” In participatory sensing, the participant is directly involved in the sensing action (e.g., tagging locations); in opportunistic sensing, the participant is not aware of sensing or involved in decision making. Mobile phone sensing is increasingly used to study health and wellness, for example, as fall detection systems or for cardiovascular disease management, primarily in non-clinical cohorts.

Well-known platforms include EmotionSense, an Android application used in a research project for sensing emotions and activities of individuals, and Darwin, developed to reason about human behavior and context.

Digital phenotyping
Digital phenotyping has a singular mission: to reveal more granular and precise phenotypes through the collection, storage, and analysis of data from personal digital devices. The database of Genotypes and Phenotypes (dbGaP), developed by the National Center for Biotechnology Information (NCBI) branch of the National Institutes of Health (NIH), archives and distributes data and results from studies investigating the interaction of genotype and phenotype in humans. As more digital phenotyping data is collected and archived in these types of repositories, more precise phenotype-genotype analyses will be possible.

Frequently Asked Questions

What is the difference between mobile health (mHealth or m-health) and digital phenotyping? Mobile health is a broad category and can be defined in different ways, but it usually refers to the “delivery of healthcare services via mobile communication devices.” Digital phenotyping, by definition, refers to the collection and analysis of moment-by-moment individual-level human phenotype data in situ, in the wild, using data from personal digital devices, in particular smartphones. The main goal of digital phenotyping is to advance evidence-based research in the biomedical sciences. It can be seen as part of deep phenotyping, extending other approaches to phenotyping and naturally complementing genotyping and genome sequencing efforts.

How much does digital phenotyping cost? How does it compare with the cost of clinical trials or genome sequencing? The digital phenotyping approach is cost-effective and scalable. The total cost is a combination of fixed costs (such as ongoing platform maintenance) and variable costs (such as server cluster uptime). A small pilot study with some tens of subjects and a couple of months of data collection using Beiwe through our not-for-profit Beiwe Service Center might cost around $80 per subject-month (total cost). A large study with a thousand or so subjects and a year of data collection brings the cost down to about $3.50 per subject-month (total cost). A recent study in JAMA Internal Medicine estimated that clinical trials cost a median of $41,117 per patient and $3,562 per patient visit, and approximately two-thirds of the studied trials had a duration of 6 months or less. A typical primary end point in an antidepressant trial in patients with major depressive disorder is the change in clinician-administered MADRS total score (range 0–60) from baseline (week 0) to the end of follow-up (week 6). In a typical stroke clinical trial, the primary outcome measure is the change in clinician-administered mRS total score (range 0–6) from baseline to the end of follow-up (typically day 90). Adding smartphone-based digital phenotyping data collection to trials like these as an exploratory end point to quantify lived experiences might, depending on sample size, add a cost of $50 per patient to the trial. In the future, we anticipate that digital phenotyping will be even more cost-effective. Phenotyping is often contrasted with genotyping. The first sequencing of the whole human genome cost roughly $2.7 billion in 2003, whereas in 2020, research-grade whole genome sequencing costs around $600.
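To make the per-subject-month figures concrete, the sketch below multiplies them out for two illustrative study sizes. The rates ($80 and $3.50 per subject-month) come from the text above; the specific study sizes (30 subjects for 2 months, 1,000 subjects for 12 months) are our assumptions, chosen only to match the rough descriptions of a pilot and a large study, and the function name study_cost is hypothetical.

```python
def study_cost(subjects, months, rate_per_subject_month):
    """Total cost in dollars for a study billed per subject-month."""
    return subjects * months * rate_per_subject_month


# Assumed sizes; only the per-subject-month rates are taken from the text.
pilot_total = study_cost(subjects=30, months=2, rate_per_subject_month=80.00)
large_total = study_cost(subjects=1000, months=12, rate_per_subject_month=3.50)

print(pilot_total)         # 4800.0
print(large_total)         # 42000.0
print(pilot_total / 30)    # 160.0 per subject over the pilot
print(large_total / 1000)  # 42.0 per subject over the year
```

Under these assumptions, a full year of data collection per subject in the large study costs less than two months per subject in the pilot, which is the economy of scale the rates imply.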

Why does Beiwe collect raw data? In short, research requires research-grade raw data. Software development kits for Android (ResearchStack, etc.) and Apple iOS (ResearchKit, HealthKit, CareKit, etc.) collect processed data summaries rather than raw sensor and phone-usage data. This introduces an opaque layer between the data generating process and data analysis, making it difficult to compare data across devices or pool data across studies because the data summaries are likely different. The use of predefined data summaries results in a loss of information, narrowing down potential use cases of data to those conceived at the time of data collection (e.g., number of steps taken), and as such diminishes the value of data biobanking. Collection and storage of raw data make it possible to compute any summaries of interest at a future date, thus enhancing study replicability and facilitating reanalyses of data. There are downsides to collecting raw data, notably the large volume of data and the difficulty of keeping phone sensors awake, but both of these challenges are manageable. Raw data enable investigators to ask and answer questions they care about and ensure the transparency of data collection and data analysis.

What is the distinction between a smartphone application (app) and a research platform? A smartphone app is simply a software application that runs on a smartphone. It is important to note that the Beiwe app is just one of the three components of the Beiwe platform. The other components are the Beiwe back end and the Beiwe data analysis pipeline. The Beiwe back end makes use of Amazon Web Services (AWS) cloud computing infrastructure and is used to manage studies (e.g., study creation, addition of users, regeneration of passwords) and collect data. For the latter, it uses AWS Elastic Beanstalk, which automatically handles the details of capacity provisioning and load balancing, making it, in essence, highly scalable. The data analysis pipeline performs data preprocessing, checks data quality, transforms data, carries out imputation, and computes summary statistics of interest. The input to the pipeline is the raw data collected by Beiwe, and the output is a p × T matrix, one per subject, where the p rows correspond to different daily summary statistics (e.g., total distance traveled obtained from GPS data and total call duration obtained from communication logs) and the T columns correspond to days. In supervised learning, the goal is to find associations between passively collected data and any other type of data (e.g., surveys or clinical data), and in this setting the obtained matrices can be fed into different longitudinal statistical models, such as generalized estimating equations (GEE) or generalized linear mixed models (GLMM), depending on the goals of the analysis. In unsupervised learning, the goal might be to find anomalies in behavioral data or to perform clustering using a range of possible methods.
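The shape of the pipeline output can be illustrated with a small sketch. This is not the Beiwe pipeline's code: the function name build_matrix, the feature names, and the daily values are hypothetical, and None simply marks a missing day that an imputation step would later need to handle.

```python
def build_matrix(daily, features, days):
    """Return a p x T list of lists: one row per feature, one column per day.

    Missing feature-day combinations are filled with None.
    """
    return [[daily.get(day, {}).get(feat) for day in days] for feat in features]


# Hypothetical daily summaries for one subject.
features = ["distance_km", "call_minutes"]           # p = 2 rows
days = ["2020-01-01", "2020-01-02", "2020-01-03"]    # T = 3 columns
daily = {
    "2020-01-01": {"distance_km": 5.2, "call_minutes": 12.0},
    "2020-01-02": {"distance_km": 3.1},              # no call data this day
    "2020-01-03": {"distance_km": 7.4, "call_minutes": 30.0},
}

matrix = build_matrix(daily, features, days)
for feat, row in zip(features, matrix):
    print(feat, row)
```

One such matrix per subject is the kind of input that a longitudinal model such as GEE or GLMM would take, once it has been reshaped into the format a given statistics package expects.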

Is the Beiwe app available on Android and iOS? Is the Beiwe app a native app or a web app? What is the difference? Our lab has developed both Android and iOS versions of the Beiwe app that connect seamlessly to the same research platform, enabling researchers to recruit individuals with phones using either operating system. Because Beiwe relies on sensor data, both Android and iOS versions of the Beiwe app are native applications rather than web applications. A web app requires only a browser and an internet connection (either a cell signal or Wi-Fi) and is easy and inexpensive to develop and maintain, but it typically cannot access phone sensor data. A native app works independently of the web and can access phone sensor data, but it is in general much more difficult and expensive to develop and maintain. Further, because native apps collect raw phone sensor and usage data, they cannot rely on software development kits but instead rely on a codebase that has been developed from scratch for this specific purpose.

Who has developed Beiwe? The Beiwe research platform has been developed by the Onnela Lab at the Harvard T.H. Chan School of Public Health with funding from the National Institutes of Health (NIH). Specifically, the large majority of the development work has been enabled by a 2013 NIH Director’s New Innovator Award to Dr. Onnela. The lab has worked with two software development groups to create the front-end smartphone applications for Android and iOS devices, and also on the development of the back-end data collection system. The lab generated the data analysis methods and data analysis pipeline internally.

Why can different studies use the same platform? Each study within Beiwe is independent of any other study; each study has its own subjects, its own study coordinators and investigators, and its own active and passive data collection features and sampling schedules. The subjects within each study are identified by their Beiwe user IDs (e.g., yixg8437) and temporary passwords. Once subjects have downloaded the app and entered their user IDs and passwords, the system automatically connects them with the right study, which among other things means that the subjects receive the surveys configured for that study and passive data are collected according to the specifications of the study. This includes what data streams are collected, how they are sampled, how frequently they are uploaded, and whether Wi-Fi or a cellular network is used for upload. The flow of information in the studies making use of Beiwe is from the user to the system, which is what is to be expected from a phenotyping platform. It is possible to make use of the Beiwe back end to design a sister component of the Beiwe app for delivering interventions. This is, however, likely to be study-specific and falls outside of our main research area. In contrast, phone sensors are what they are, and therefore the most one can do is to collect as much of the available data as possible and try to make the most sense of the collected data.
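The idea that a user ID alone is enough to route a subject to the right study settings can be sketched with two lookup tables. Everything here is hypothetical and for illustration only: the study names, setting keys, second user ID, and function name settings_for are our inventions and do not describe the actual Beiwe back end or its data model.

```python
# Hypothetical per-study settings; each study is independent of the others.
STUDIES = {
    "study_a": {
        "surveys": ["daily_mood"],
        "streams": ["gps", "accelerometer"],
        "upload_over_cellular": False,
    },
    "study_b": {
        "surveys": ["weekly_pain"],
        "streams": ["accelerometer"],
        "upload_over_cellular": True,
    },
}

# Hypothetical enrollment table mapping user IDs to studies.
ENROLLMENT = {"yixg8437": "study_a", "abcd1234": "study_b"}


def settings_for(user_id):
    """Look up the study a user belongs to and return that study's settings."""
    return STUDIES[ENROLLMENT[user_id]]


print(settings_for("yixg8437")["streams"])  # ['gps', 'accelerometer']
```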

How about reproducibility and replicability of Beiwe studies? Only 6% of biomedical studies have been found to be completely reproducible (Prinz et al., 2011). From this point of view, we do not need more studies; rather, we need more studies that are reproducible. To achieve reproducibility, it is key to focus on both data collection and data analysis. With the Beiwe platform, we attempt to address both of these stages. We started by building a platform that collects research-grade data. The old adage about data analysis captures the sentiment perfectly: garbage in, garbage out. Therefore, our first step was to improve the quality of measurements. Many researchers have advocated the role of better measurement in studies that involve any type of quantification of human behavior, a point that has been made repeatedly and vigorously by Andrew Gelman, among others. Beiwe captures all study settings in human-readable, JSON-formatted configuration files, and the platform enables an investigator to export and import these files with a single click. Therefore, an investigator wishing to replicate a previous Beiwe study needs only this one file to collect identical data in an identical manner. Data analysis can be replicated by studying the scripts that are used to analyze the output matrices of the Beiwe platform.
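The export and import round trip can be sketched with Python's standard json module. The setting names and values below are hypothetical placeholders, not the actual schema of a Beiwe configuration file; the point is only that a human-readable JSON file can reproduce a study's settings exactly.

```python
import json

# Hypothetical study settings; the real configuration schema may differ.
settings = {
    "study_name": "example_study",
    "streams": {"gps": True, "accelerometer": True, "call_log": False},
    "gps_on_seconds": 60,
    "gps_off_seconds": 600,
    "upload_over_cellular": False,
    "surveys": [{"id": "daily_mood", "schedule": "daily", "hour": 18}],
}

# "Export": serialize to human-readable JSON text (this is what would be saved to a file).
exported = json.dumps(settings, indent=2, sort_keys=True)

# "Import": a replicating investigator parses the same text back into settings.
imported = json.loads(exported)

print(imported == settings)  # True
```

Sorting keys and indenting the output keeps the file stable and easy to compare between the original study and its replication.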