Using Beiwe and Forest

Collaboration

Our digital phenotyping research is focused on the development of mathematical and statistical methods for analyzing intensive high-dimensional data, but we also participate in applied digital phenotyping studies in medicine and public health. Contact the Principal Investigator, Dr. JP Onnela, by email to explore collaborative options. See the Onnela Lab web page for more information.

Beiwe Service Center

Overview
The Beiwe Service Center (BSC) is a formal service core of Harvard University that makes the Beiwe platform available for academic and commercial entities under the software-as-a-service model. Studies in the United States are created on our deployment of the Beiwe platform that runs on our main production server cluster on the BSC Amazon Web Services (AWS) account. Studies in the European Union use a different server as detailed below. This model makes it possible for investigators to use Beiwe without having to deploy and maintain their own version of the open-source code base. Service fees are used to cover operational costs, fix software bugs, and monitor and maintain the platform. The cost model for using Beiwe through the BSC is explained below. Harvard University also requires a copy of your study IRB; once we receive it, we can usually create your study within 48 hours.

If you’re interested in using Beiwe as a service through BSC, please fill out this form, and we’ll be in touch shortly.

Costing model
Pricing of any BSC service contract depends on three parameters: (i) length of study in months; (ii) length of data collection per participant in months; and (iii) number of study participants. Length of study refers to the time period between study creation and completion; length of data collection per participant refers to the average length of data collection for each individual during the study; and number of study participants refers to the total number of individuals from whom data will be collected during the contract period. The cost covers a fixed number of hours of monthly support, with additional support for the first month, including a Research Assistant or Project Manager to provide guidance on how to use the platform and how to set up surveys and passive data collection.

Beiwe
We run a U.S. production deployment of Beiwe (accessible through studies.beiwe.org) and a U.S. test deployment of Beiwe (accessible through staging.beiwe.org) in the us-east-1 region (Virginia) on AWS. The production server is mainly intended for investigators in the United States, whereas the staging server is reserved for testing features currently under development.

To support studies within the European Union (E.U.), we deployed Beiwe in the eu-north-1 region (Stockholm) on AWS (accessible through eu.beiwe.org). As of February 2021, AWS offers five regions in the E.U.: Stockholm, Ireland, Frankfurt, Milan, and Paris. Of these, Stockholm is the least expensive, but it is approximately 6% more expensive than us-east-1 (Virginia) for EC2 and RDS instances; S3 is the same price in Stockholm as it is in Virginia. These small differences in AWS costs mean that the same study implemented through BSC is somewhat more expensive in the E.U. than the U.S. We anticipate that locating the server in Stockholm instead of, say, Ireland should add about 40 milliseconds of latency for participants and investigators in Ireland, which is small enough to be likely unnoticeable on Beiwe.
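The regional price difference described above can be sketched as simple arithmetic. This is a minimal illustration, assuming the approximate 6% premium applies uniformly to EC2 and RDS and that S3 pricing is identical; the function name and the dollar amounts are hypothetical and are not BSC or AWS figures.

```python
# Relative price premium for EC2 and RDS in eu-north-1 versus us-east-1,
# taken from the text above (approximately 6%).
EU_COMPUTE_PREMIUM = 0.06


def eu_monthly_cost(ec2_usd, rds_usd, s3_usd):
    """Estimate the eu-north-1 monthly bill from us-east-1 line items."""
    # EC2 and RDS carry the premium; S3 is priced the same in both regions.
    return (ec2_usd + rds_usd) * (1 + EU_COMPUTE_PREMIUM) + s3_usd


# Hypothetical us-east-1 line items in USD per month (illustration only).
ec2, rds, s3 = 400.0, 250.0, 100.0
us_total = ec2 + rds + s3
eu_total = eu_monthly_cost(ec2, rds, s3)
increase_pct = 100 * (eu_total / us_total - 1)
print(f"US: ${us_total:.2f}  EU: ${eu_total:.2f}  increase: {increase_pct:.1f}%")
```

Because S3 carries no premium, the overall increase is always somewhat below 6%, and it shrinks further as storage makes up a larger share of the bill.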

Forest
Forest can be run locally, but it has also been integrated into the Beiwe back end on AWS. This is the preferred big-data computing paradigm: one moves computation to data rather than vice versa. When integrated with Beiwe, Forest can be used to generate on-demand analytics, most importantly daily summary statistics of collected data. These summary statistics are stored in a PostgreSQL database on AWS.

The system implements an API for Tableau, which supports the creation of customizable workbooks and dashboards. An investigator running a local copy of Tableau will be able to securely connect to the back-end database and view data summaries of collected data on a daily basis and will be able to troubleshoot any potential issues with data collection. Although Tableau is commercial software, the company has free viewer licenses available, and academic users may be able to get the software for free for the first year; please consult their web page for more information.

Starting in 2021, after its public release, Forest will be incorporated in the Beiwe Service Center offering.

Costing Model Examples:

Example 1: A university-based investigator wants to collect data from 50 participants, where length of data collection per participant is 3 months, and study duration is 12 months. The total cost of the contract would be $12,146.

Example 2: A university-based investigator wants to collect data from 25 participants, where length of data collection per participant is 12 months and study duration is 12 months. The total cost of the contract would be $15,333.

Example 3: An investigator based at a for-profit entity wants to collect data from 160 participants, where length of data collection per participant is 3 months, and study duration is 12 months. The total cost of the contract would be $23,584.

These numbers are based on the 2020 costing model. For more information, visit the Beiwe Service Center website.
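One way to compare the three examples is to convert each to participant-months (participants multiplied by months of data collection per participant) and divide the quoted total by that figure. The sketch below does only this arithmetic on the numbers quoted above. Participant-months is a derived quantity introduced here for comparison; it is not the BSC pricing formula, which is not given on this page, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    label: str
    participants: int
    months_per_participant: int
    total_cost_usd: int  # total quoted in the examples above

    @property
    def participant_months(self):
        # Total months of data collection summed over all participants.
        return self.participants * self.months_per_participant

    @property
    def cost_per_participant_month(self):
        return self.total_cost_usd / self.participant_months


# All three examples have a study duration of 12 months.
EXAMPLES = [
    Contract("Example 1 (university)", 50, 3, 12_146),
    Contract("Example 2 (university)", 25, 12, 15_333),
    Contract("Example 3 (for-profit)", 160, 3, 23_584),
]

for c in EXAMPLES:
    print(f"{c.label}: {c.participant_months} participant-months, "
          f"${c.cost_per_participant_month:.2f} per participant-month")
```

The per-participant-month figure falls as the amount of data collection grows, which suggests that part of the contract cost does not scale with participant count. This is an inference from the three examples, not a statement of the costing model.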

Open-source Deployment of Beiwe

The wiki pages associated with the Beiwe back-end repository provide deployment instructions for both a single AWS server and an AWS server cluster. The single-server deployment is not actively maintained, so the server cluster deployment is preferred.

Open Source

Overview
Both Beiwe and Forest are available as open-source software under the BSD-3 license to improve transparency and reproducibility of research, as well as to encourage community input and participation in their development. We’d love for you to contribute! If you’re interested, please get in touch.

Though Beiwe and Forest are free, the code is provided as is; we do not provide support in installation or use. Investigators are responsible for all costs related to using the software, including deployment, monitoring, and maintenance costs, as well as AWS costs.

Beiwe
The Beiwe codebase (excluding Forest or any other data analysis tools) consists of approximately 150,000 lines of code. The three repositories are:


Beiwe is a fairly complex piece of software. Typical studies collect about 600 MB of data per person per month, which needs to be encrypted on the device, transmitted to the back end, and then decrypted and re-encrypted using very strong encryption methods. A person with professional-level expertise in software engineering and cloud computing, in particular AWS, would be expected to deploy the Beiwe back end in 10 to 20 hours; less experienced individuals might require a few days. Most investigators will not be able to deploy Beiwe on their own. The system requires daily monitoring, as well as updates for patches and front-end applications. Unless this expertise is available in-house, the most economical way to use Beiwe is therefore through the Beiwe Service Center, where these costs are shared among a number of investigators and system monitoring is done by the individuals who created the system.
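For capacity planning, the typical data volume quoted above can be turned into a rough estimate of total study size. This is a minimal sketch that assumes every participant produces the typical 600 MB per month; actual volume depends on which data streams a study enables. The function name is hypothetical, and the result is in decimal gigabytes (1 GB = 1000 MB).

```python
# Typical raw data volume per participant per month, as quoted in the text.
MB_PER_PARTICIPANT_MONTH = 600


def estimated_volume_gb(participants, months_per_participant):
    """Rough total data volume for a study, in decimal gigabytes."""
    total_mb = participants * months_per_participant * MB_PER_PARTICIPANT_MONTH
    return total_mb / 1000


# The study sizes from the costing examples, reused here for illustration.
for participants, months in [(50, 3), (25, 12), (160, 3)]:
    size_gb = estimated_volume_gb(participants, months)
    print(f"{participants} participants x {months} months: about {size_gb:.0f} GB")
```

Under these assumptions, a study of 50 participants followed for 3 months each would produce about 90 GB, all of which must be encrypted, transmitted, and re-encrypted as described above.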

Forest
The Forest data analysis library will ultimately implement all of our methods for analyzing data collected using Beiwe as a Python 3.6 package, released under the BSD-3 open-source license. Forest is currently under closed development, and its first public release is anticipated for spring 2021. The Forest repository is located at https://github.com/onnela-lab/forest.