Architecture

Team members

Faculty
Mani Srivastava (Thrust 3 Lead, UCLA)
Timothy Hnat (Chief Software Architect, Memphis)
Tyson Condie (UCLA)
Simona Carini (UCSF)
Santosh Kumar (Memphis)
Syed Monowar Hossain (Memphis)
Nasir Ali (Memphis)
Nusrat Nasrin (Memphis)

Students, Post Docs
Bo-Jhang Ho (UCLA)
Matteo Interlandi (UCLA)
Addison Mayberry (UMass)

Mobile Sensor Big Data Architecture

Development and validation of any new mHealth biomarker requires conducting research studies in lab and field settings to collect raw sensor data with appropriate labels (e.g., self-reports). A general-purpose software platform that can enable such data collection consists of software on sensors, mobile phones, and the cloud, all of which need to work together. Each of these software components must be modular so as to enable seamless mix-and-match customization for various study needs. The software architecture for such a platform needs several attributes.

First, it must support concurrent connections to a wide variety of high-rate wearable sensors with an ability to plug-in new sensors.
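A plug-in mechanism along these lines could be sketched as follows. This is a minimal illustration, not mCerebrum's actual API; the class and method names (SensorDriver, SensorRegistry, read_all) are assumptions made for the example.

```python
# Illustrative sensor plug-in registry: new sensors are added by implementing
# a small driver interface and registering it, with no changes to core code.
from abc import ABC, abstractmethod

class SensorDriver(ABC):
    """Base class each pluggable sensor driver implements."""
    name: str = "unknown"

    @abstractmethod
    def read(self) -> dict:
        """Return one timestamped sample from the device."""

class SensorRegistry:
    """Keeps drivers by name so sensors can be mixed and matched per study."""
    def __init__(self):
        self._drivers = {}

    def register(self, driver: SensorDriver) -> None:
        self._drivers[driver.name] = driver

    def read_all(self) -> dict:
        """Poll every registered sensor once."""
        return {name: d.read() for name, d in self._drivers.items()}

class FakeEcg(SensorDriver):
    """Stand-in driver; a real one would talk to the wearable over Bluetooth."""
    name = "ecg"
    def read(self) -> dict:
        return {"ts": 0.0, "mv": 0.42}

registry = SensorRegistry()
registry.register(FakeEcg())
samples = registry.read_all()
```

A study configuration would then register only the drivers that study needs.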

Second, all three platforms must ingest the large volume of rapidly arriving data without falling behind and losing samples; native support for such high-rate ingestion does not yet exist in smartphone hardware or operating systems.
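One common way to keep up with rapidly arriving samples is a bounded in-memory buffer flushed to durable storage in batches, so the writer never blocks the sensor path and any loss is counted rather than silent. The sketch below is illustrative; the names (IngestBuffer, push, flush) and sizes are assumptions, not the platform's actual design.

```python
# Bounded ingestion buffer with batched flushes and explicit drop accounting.
from collections import deque

class IngestBuffer:
    def __init__(self, capacity: int, batch_size: int):
        self._buf = deque()
        self.capacity = capacity
        self.batch_size = batch_size
        self.dropped = 0      # samples lost on overflow (visible, not silent)
        self.persisted = []   # stand-in for flash or database writes

    def push(self, sample) -> None:
        if len(self._buf) >= self.capacity:
            self.dropped += 1
            return
        self._buf.append(sample)
        if len(self._buf) >= self.batch_size:
            self.flush()      # amortize storage cost over a whole batch

    def flush(self) -> None:
        n = min(self.batch_size, len(self._buf))
        self.persisted.extend(self._buf.popleft() for _ in range(n))

buf = IngestBuffer(capacity=1000, batch_size=64)
for i in range(200):
    buf.push(i)
buf.flush()   # persist the final partial batch
```

Batching trades a small latency window for far fewer storage operations, which matters for both throughput and battery.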

Third, it needs to support reliable storage of a quickly-growing volume of sensor data, the archival of which is critical to the development and validation of new biomarkers.

Fourth, it is desirable to quickly analyze incoming data to monitor signal quality so that any errors in sensor attachment or placement can be promptly fixed to maximize data yield.
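A simple form of such monitoring is a per-window quality check that flags probable attachment or placement problems. In this sketch the plausible-range bounds and the 0.8 yield cutoff are illustrative placeholders, not MD2K's actual thresholds.

```python
# Flag a window of raw sensor values when too many samples fall outside a
# physiologically plausible range, suggesting a loose or misplaced sensor.
def window_quality(samples, lo, hi, min_yield=0.8):
    """Return (yield_fraction, ok) for one window of raw sensor values."""
    if not samples:
        return 0.0, False
    good = sum(1 for s in samples if lo <= s <= hi)
    frac = good / len(samples)
    return frac, frac >= min_yield

# 90 plausible samples and 10 out-of-range ones -> 0.9 yield, still acceptable.
frac, ok = window_quality([0.5] * 90 + [9.9] * 10, lo=0.0, hi=1.0)
```

Running such a check on-device allows the participant to be prompted to fix the sensor while data can still be recovered.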

System Overview

There are three core users of mCerebrum and Cerebral Cortex: (1) the study participant, who wears sensors and interacts with mCerebrum, which uploads data to Cerebral Cortex; (2) the health science researcher, who conducts studies, visualizes field data, and runs population-scale analyses; and (3) the data science researcher, who constructs models through machine learning, runs interactive analyses through web-based dashboards, and builds scalable data analytics across large populations.

Fifth, the smartphone and/or the cloud needs to support the sense-analyze-act pipeline for high-rate streaming sensor data. This is necessary to prompt self-reports (for collection of labels) as well as confirm/refute prompts for validation of new biomarkers in the field. Sense-analyze-act support is also needed to aid development and evaluation of sensor-triggered interventions.
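A toy version of that loop makes the three stages concrete: a detector runs over streaming windows, and when it fires, the act stage queues a confirm/refute self-report prompt. The detector, threshold, and event names here are all illustrative.

```python
# Minimal sense-analyze-act pipeline: detection on streaming windows triggers
# a self-report prompt for label collection and biomarker validation.
def analyze(window):
    """Pretend stress detector: fires when the window mean crosses a threshold."""
    return sum(window) / len(window) > 0.7

prompts = []
def act(event):
    """Queue a confirm/refute prompt for the detected episode."""
    prompts.append({"type": "confirm_refute", "event": event})

stream = [[0.1, 0.2, 0.1], [0.8, 0.9, 0.85], [0.2, 0.3, 0.1]]
for i, window in enumerate(stream):
    if analyze(window):          # sense -> analyze
        act(f"episode_{i}")      # analyze -> act
```

The same structure supports sensor-triggered interventions: the act stage delivers an intervention instead of (or in addition to) a prompt.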

Sixth, it needs seamless sharing of streaming data from multiple sensors to enable computation of multi-sensor biomarkers (e.g., stress, smoking, eating).
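Stream sharing of this kind is commonly built as a publish/subscribe bus: each sensor publishes to a named stream, and any number of biomarker modules subscribe to it concurrently. The sketch below is an in-process illustration with made-up names, not the platform's actual data bus.

```python
# In-process pub/sub bus: two biomarker modules consume the same ECG stream
# without each needing its own connection to the sensor.
class StreamBus:
    def __init__(self):
        self._subs = {}

    def subscribe(self, stream, callback):
        self._subs.setdefault(stream, []).append(callback)

    def publish(self, stream, sample):
        for cb in self._subs.get(stream, []):
            cb(sample)

bus = StreamBus()
stress_inputs, smoking_inputs = [], []
# Stress and smoking biomarkers both subscribe to the shared ECG stream.
bus.subscribe("ecg", stress_inputs.append)
bus.subscribe("ecg", smoking_inputs.append)
bus.publish("ecg", {"ts": 1, "mv": 0.4})
```

Decoupling producers from consumers this way is what lets a multi-sensor biomarker combine ECG, respiration, and accelerometry without bespoke wiring.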

Seventh, the platform needs to be general-purpose and extensible to support a wide variety of sensors, biomarkers, and study designs.

Eighth, it needs to be architecturally scalable so that it can support concurrent computation of a large number of biomarkers (each of which requires complex processing) without saturating the computational capacity or depleting the battery life of the smartphone.
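One architectural lever for this kind of scalability is computing shared features once and reusing them across biomarkers, rather than letting each biomarker recompute them. The sketch below illustrates the idea with Python's built-in memoization; the feature and biomarker definitions are invented for the example.

```python
# Shared-feature reuse: heart rate is computed once per window even though
# two biomarkers both depend on it, saving CPU and battery on the phone.
from functools import lru_cache

calls = {"heart_rate": 0}

@lru_cache(maxsize=None)
def heart_rate(window_id):
    calls["heart_rate"] += 1       # count real (non-cached) computations
    return 72 + window_id          # stand-in for actual ECG processing

def stress_biomarker(window_id):
    return heart_rate(window_id) > 80

def smoking_biomarker(window_id):
    return heart_rate(window_id) > 75

# Both biomarkers run on the same window; the feature is computed only once.
stress = stress_biomarker(5)
smoking = smoking_biomarker(5)
```
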

Ninth, the smartphone platform needs to carefully control interruptions of study participants from various sources (e.g., self-reports, ecological momentary assessments (EMA), ecological momentary interventions (EMI), and requests to fix sensor attachments), limiting user burden and cognitive overload while satisfying the numerous study requirements.
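A minimal form of such control routes every prompt source through one gate that enforces a daily cap and a minimum gap between prompts. The cap and gap values below are made up for illustration; a real study would tune them to its protocol.

```python
# One gate for all interruption sources (EMA, EMI, sensor-fix requests):
# enforce a per-day budget and a cooldown between delivered prompts.
class InterruptionGate:
    def __init__(self, max_per_day=10, min_gap_minutes=30):
        self.max_per_day = max_per_day
        self.min_gap = min_gap_minutes
        self.delivered = []   # minutes-since-midnight of delivered prompts

    def try_deliver(self, now_minutes, source):
        if len(self.delivered) >= self.max_per_day:
            return False      # daily burden cap reached
        if self.delivered and now_minutes - self.delivered[-1] < self.min_gap:
            return False      # too soon after the last prompt
        self.delivered.append(now_minutes)
        return True

gate = InterruptionGate(max_per_day=2, min_gap_minutes=30)
# 9:00 delivered; 9:10 blocked by cooldown; 10:00 delivered; 11:40 over cap.
results = [gate.try_deliver(t, "EMA") for t in (540, 550, 600, 700)]
```

A production scheduler would also weigh prompt priority and participant context, but the budget-plus-cooldown core is the same.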

Tenth, the cloud platform must support concurrent data collection from hundreds, if not thousands, of smartphones deployed in the field, reliably receiving the raw sensor data, derived features and biomarkers, and self-reports they offload.

Eleventh, the cloud platform needs to provide a dashboard to remotely monitor the quality of data collection and participant compliance so as to intervene when necessary to ensure high data-yield.
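The core dashboard metric for this is data yield per participant: the fraction of expected time for which data actually arrived. The function and field names below, and the 0.5 alert threshold, are illustrative placeholders.

```python
# Compliance monitoring: compute per-participant data yield and flag those
# a study coordinator should follow up with.
def data_yield(hourly_sample_counts, expected_hours):
    """Fraction of expected hours in which any data was received."""
    covered = sum(1 for c in hourly_sample_counts if c > 0)
    return covered / expected_hours

def flag_low_yield(participants, threshold=0.5):
    """Return participant ids whose yield falls below the alert threshold."""
    return [pid for pid, y in participants.items() if y < threshold]

yields = {
    "p01": data_yield([100, 0, 80, 90], expected_hours=4),  # 3 of 4 hours
    "p02": data_yield([0, 0, 5, 0], expected_hours=4),      # 1 of 4 hours
}
alerts = flag_low_yield(yields)
```

Surfacing these flags daily lets staff intervene (a phone call, a sensor swap) while the lost data can still be minimized.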

Twelfth, for mobile sensor big data analytics, the cloud platform must support export of sensor data, features, biomarkers, and self-reports for population-scale analysis, as well as offer exploratory visualization and analysis.

Last, but not least, the cloud platform must support annotation of data with metadata and provenance information so as to enable comparative analysis, reproducibility, and third-party research.
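Provenance annotation can be illustrated as follows: every derived datastream records which streams it was computed from and by which algorithm version, so a result can later be reproduced or compared. The DataStream structure and derive helper are invented for this sketch.

```python
# Attach metadata and provenance to derived datastreams so analyses are
# reproducible and comparable across studies.
from dataclasses import dataclass, field

@dataclass
class DataStream:
    name: str
    data: list
    metadata: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)  # names of input streams

def derive(name, inputs, fn, algorithm_version):
    """Compute a new stream from input streams, recording lineage as we go."""
    merged = [x for s in inputs for x in s.data]
    return DataStream(
        name=name,
        data=fn(merged),
        metadata={"algorithm_version": algorithm_version},
        provenance=[s.name for s in inputs],
    )

ecg = DataStream("ecg", [0.4, 0.6])
rip = DataStream("respiration", [0.2])
stress = derive("stress", [ecg, rip],
                lambda xs: [sum(xs) / len(xs)], algorithm_version="v1.2")
```

Given such records, a third party can trace any biomarker value back to its raw inputs and the exact algorithm that produced it.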

The big data computing architectures developed by MD2K for all three platforms (sensors, smartphones, and the cloud) are aimed at meeting all of the above requirements. The publications listed below provide details of these architectures.


Publications

  1. Bo-Jhang Ho, Bharathan Balaji, Nima Nikzad and Mani Srivastava.
    Emu: Engagement Modeling for User Studies. In UbiTention 2017: 2nd International Workshop on Smart & Ambient Notification and Attention Management. 2017. URL BibTeX

  2. Syed Monowar Hossain, Timothy Hnat, Nazir Saleheen, Nusrat Jahan Nasrin, Joseph Noor, Bo-Jhang Ho, Tyson Condie, Mani Srivastava and Santosh Kumar.
    mCerebrum: An mHealth Software Platform for Development and Validation of Digital Biomarkers and Interventions. In The ACM Conference on Embedded Networked Sensor Systems (SenSys). 2017. URL BibTeX

  3. Timothy Hnat, Syed Hossain, Nasir Ali, Simona Carini, Tyson Condie, Ida Sim, Mani Srivastava and Santosh Kumar.
    mCerebrum and Cerebral Cortex: A Real-time Collection, Analytic, and Intervention Platform for High-frequency Mobile Sensor Data. In AMIA (American Medical Informatics Association) 2017 Annual Symposium. 2017. BibTeX

  4. Barbara E Bierer, Rebecca Li, Mark Barnes and Ida Sim.
    A Global, Neutral Platform for Sharing Trial Data. New England Journal of Medicine, 2016. URL BibTeX

  5. Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali, Tyson Condie, Todd Millstein and Miryung Kim.
    BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark. In Proceedings of the 38th International Conference on Software Engineering. 2016, 784–795. URL, DOI BibTeX

  6. Cheng Zhang, Junrui Yang, Caleb Southern, Thad E Starner and Gregory D Abowd.
    WatchOut: Extending Interactions on a Smartwatch with Inertial Sensing. In Proceedings of the 2016 ACM International Symposium on Wearable Computers. 2016, 136–143. URL, DOI BibTeX

  7. Gabriel Reyes, Dingtian Zhang, Sarthak Ghosh, Pratik Shah, Jason Wu, Aman Parnami, Bailey Bercik, Thad Starner, Gregory D Abowd and Keith W Edwards.
    Whoosh: Non-Voice Acoustics for Low-Cost, Hands-Free, and Rapid Input on Smartwatches. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (to appear). 2016. URL BibTeX

  8. Markus Weimer, Yingda Chen, Byung-Gon Chun, Tyson Condie, Carlo Curino, Chris Douglas, Yunseong Lee, Tony Majestro, Dahlia Malkhi, Sergiy Matusevych, Brandon Myers, Shravan Narayanamurthy, Raghu Ramakrishnan, Sriram Rao, Russell Sears, Beysim Sezgin and Julia Wang.
    REEF: Retainable Evaluator Execution Framework. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 2015, 1343–1355. URL, DOI BibTeX

  9. Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad Ali Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein and Tyson Condie.
    Titian: Data Provenance Support in Spark. Proc. VLDB Endow. 9(3):216–227, 2015. URL, DOI BibTeX

  10. Salma Elmalaki, Lucas Wanner and Mani Srivastava.
    CAreDroid: Adaptation Framework for Android Context-Aware Applications. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. 2015, 386–399. URL, DOI BibTeX


Copyright © 2020 MD2K. MD2K was established by the National Institutes of Health Big Data to Knowledge Initiative (Grant #1U54EB020404)
Team: Cornell Tech, GA Tech, Harvard, U. Memphis, Northwestern, Ohio State, UCLA, UCSD, UCSF, UMass, U. Michigan, U. Utah, WVU