MD2K researchers at the University of California, Los Angeles, have had research papers accepted at two prestigious conferences. Dr. Tyson Condie and Matteo Interlandi, a postdoctoral scholar, were co-authors on both papers.
The research detailed in each paper focused on developing techniques that improve large-scale, or Big Data, analytics on the Apache Spark cluster computing platform.
“BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark” was presented at the 38th International Conference on Software Engineering (ICSE), held May 14-22 in Austin, Texas. It was co-authored by Muhammad Ali Gulzar, Interlandi, Seunghyun Yoo, Sai Deep Tetali, Condie, Todd Millstein, and Miryung Kim, all of UCLA.
ICSE, the premier conference in software engineering, is sponsored by the Association for Computing Machinery (ACM) and the IEEE Computer Society.
The paper detailed research on ways to make debugging Big Data applications less time-consuming and expensive. The researchers devised BigDebug, a set of real-time, interactive debugging primitives for Apache Spark.
“Big Data Analytics with Datalog Queries on Spark” will be presented at the 2016 ACM SIGMOD/PODS Conference, scheduled for June 26-July 1 in San Francisco, California. That paper was co-authored by Alexander Shkapsky, Mohan Yang, Interlandi, Hsuan Chiu, Condie, and Carlo Zaniolo. The research presented in this paper developed compilation and optimization techniques that efficiently support recursion in Apache Spark. Recursion is a programming technique that breaks a problem down into smaller pieces, solves each piece with the same algorithm, and then combines the results into a solution to the larger problem. Recursion is especially important in the analysis of graph-structured data.
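To make the idea of recursion over graph-structured data concrete, here is a minimal sketch in plain Python. It is not taken from the paper and does not use Spark; the function name `transitive_closure` and the sample edges are hypothetical, chosen only for illustration. It computes which nodes of a graph can reach which others, a classic recursive query that Datalog expresses in two rules: a path is either a single edge, or an existing path extended by one more edge.

```python
def transitive_closure(edges):
    """Return every (source, target) pair joined by a path of one or more edges.

    Illustrative only: this is a generic fixpoint loop, not the compilation
    or optimization technique described in the paper.
    """
    edges = set(edges)
    reachable = set(edges)  # base case: every edge is a path of length one
    frontier = set(edges)   # paths discovered in the previous round
    while frontier:
        # Recursive step: extend each newly found path by one more edge.
        extended = {(x, w) for (x, y) in frontier for (z, w) in edges if y == z}
        # Keep only paths not seen before; stop once a round adds nothing new.
        frontier = extended - reachable
        reachable |= frontier
    return reachable


# Hypothetical sample graph: a -> b -> c -> d
edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
```

Each round reuses the same rule on the paths found in the round before, which is why the loop ends as soon as no new paths appear, even when the graph contains cycles.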
MD2K, or the Center of Excellence in Mobile Sensor Data-to-Knowledge, is an NIH center funded through the Big Data-to-Knowledge (BD2K) initiative. MD2K is developing innovative tools to make it easier to gather, analyze and interpret health data generated by mobile and wearable sensors. The goal of the big data solutions being developed by MD2K is to reliably quantify physical, biological, behavioral, social, and environmental factors that contribute to health and disease risk.
Condie and Interlandi are part of the team that is developing a Big Data software analytics platform that will be able to efficiently process both population-scale data for biomedical discovery and individual data for just-in-time intervention.
The papers:
- BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark
- Big Data Analytics with Datalog Queries on Spark
The conferences: