Our research focuses on digital phenotyping—both the development of mathematical and statistical methods for analyzing intensive high-dimensional data, and its application in research studies in medicine and public health. A full list of publications can be found on the Onnela Lab website and on Google Scholar.

While large-scale phenotyping can accelerate advances in the biomedical sciences, we are held back by the inability to precisely specify phenotypes (the observed manifestations of genomes within lived environments) at the individual level. Precision medicine and precision public health will require a better understanding of phenotypes and their links to genotypes.

Social, behavioral, and cognitive phenotypes are particularly challenging to study because of their temporal nature, context dependence, and a lack of tools for measuring them objectively in naturalistic settings. Surveys are still widely used but suffer from well-documented biases, including the tendency of individuals to reconstruct, rather than recall, their past. The phenotyping problem is especially severe in psychiatry and neurology research, where precise markers are desperately needed, and individuals may not be able to provide accurate self-reports.

The ubiquity of smartphones presents an opportunity to capture social and behavioral markers in free-living settings, offering a scalable solution to the phenotyping problem. Our data collection platform, Beiwe, is designed to collect data from smartphones, while our data analysis platform, Forest, makes sense of the millions of data points we collect.

Social and behavioral phenotypes remain poorly characterized
The concept of the “phenome”—the entire set of phenotypes in an organism—and the field of phenomics are relatively new. Many have advocated a stronger role for phenomics in the biomedical sciences for understanding the pathways between genotypes and phenotypes and their relationship to human disease. This task is complicated by a genetic system that is pleiotropic (each gene influences many phenotypic traits) and polygenic (many genes influence each phenotypic trait).

Compared to the human genome, the human phenome is vast and its dimensionality is unknown; phenotyping is now a key rate-limiting and cost-limiting factor in our understanding of disease. Two possible strategies for genome-wide phenotyping are psychometric theory, including item response theory, and web-based ascertainment and phenotyping. Conventional laboratory-based methods for studying behavioral phenotypes are expensive and do not scale well. Smartphones offer a promising solution to the phenotyping challenge, one that enables the capture of social and behavioral phenotypes in their real-world contexts.

Digital phenotyping is a scalable solution to the phenotyping problem
We introduced the concept of digital phenotyping as the “moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices.” While this definition applies to a range of personal devices, we focus on smartphone-based digital phenotyping.

Smartphones are ideally suited to digital phenotyping given their widespread adoption, the extent to which users engage with the devices, and the richness of data they generate. Smartphone ownership has rapidly increased over the past few years, and the personal nature of these devices creates unique opportunities for quantifying human behavioral phenotypes in naturalistic settings, leveraging the lived experiences of study participants. Smartphones move measurement outside the clinic or lab and eliminate the need for specialized equipment.

Smartphones can collect a range of social and behavioral data: spatial trajectories (via GPS), physical mobility patterns (via accelerometer), social networks and communication dynamics (via call and text logs), and voice samples (via microphone). Data are either active (e.g., surveys and voice samples), requiring input from the individual, or passive (e.g., GPS traces), collected without any participation from the individual. These data, when coupled with appropriate statistical methods, can shed light on many scientific and clinical questions.

  • GPS data can be used to learn how an individual divides their time between distinct locations, such as home and work, and how the number of locations and the time spent at them change over time.
  • Phone communication logs can convey information about the size and reciprocity of a person’s social networks, and could be used to indicate cycling between depression and mania for patients with bipolar disorder.
  • Speech samples recorded using the phone’s microphone could be used to detect vocal markers of mood, and might have prognostic value for neurological disorders characterized by speech impairment.
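As a rough illustration of the first two examples, the sketch below computes the share of time spent at each location and the size and reciprocity of a contact network. It is a minimal sketch under stated assumptions, not the method used in Beiwe or Forest: the function names and input layouts are hypothetical, the GPS trace is assumed to have already been reduced to labeled visits with start and end times, and the communication log is assumed to be a list of contact and direction pairs.

```python
from collections import defaultdict


def time_fraction_by_place(visits):
    """Share of observed time spent at each place.

    `visits` is a hypothetical, pre-processed input: an iterable of
    (place_label, start_seconds, end_seconds) tuples. Turning raw GPS
    samples into such visits is a separate step not shown here.
    """
    totals = defaultdict(float)
    for place, start, end in visits:
        totals[place] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {place: t / grand_total for place, t in totals.items()}


def network_size_and_reciprocity(log):
    """Number of distinct contacts and the fraction that are reciprocal.

    `log` is a hypothetical input: an iterable of (contact_id, direction)
    pairs, where direction is "in" or "out". A contact counts as
    reciprocal if the log contains both directions for that contact.
    """
    directions = defaultdict(set)
    for contact, direction in log:
        directions[contact].add(direction)
    size = len(directions)
    mutual = sum(1 for d in directions.values() if d == {"in", "out"})
    return size, (mutual / size if size else 0.0)


# Toy data for one day (times in seconds since midnight); purely illustrative.
visits = [("home", 0, 28800), ("work", 32400, 61200), ("home", 64800, 86400)]
log = [("a", "out"), ("a", "in"), ("b", "out"), ("c", "in"), ("c", "out")]

fractions = time_fraction_by_place(visits)
size, reciprocity = network_size_and_reciprocity(log)
```

Recomputing these summaries per day or per week would give the kind of time series in which changes, such as a shrinking number of visited locations or contacts, could be examined.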


Digital phenotyping and precision medicine
While digital phenotyping is part of deep phenotyping, it is also closely aligned with the goals of precision medicine, an approach to disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle.

The Precision Medicine Initiative (PMI) Working Group Report highlights the role of more precise measurement of behavioral factors that contribute to disease onset and progression, treatment response, and health outcomes. The PMI Cohort Program aims to build a cohort of one million Americans, made possible through “advances in genomic technologies, data collection and storage, computational analysis, and mobile health applications,” where data from sensors and software applications can give researchers a clear view into “factors that have previously been difficult to capture with accuracy.” Mobile health technologies are presenting new opportunities to correlate body measurements and environmental exposures with health outcomes.