Debjyoti Paul
PhD Student. Research Interest: learning driven big data analytics.


  • Geotagged US Tweets As Predictors Of County-Level Health Outcomes (Project Website)
    By Quynh C. Nguyen, Hsien-wen Meng, Geoffrey Loomis, Matt McCullough, Dapeng Li, Debjyoti Paul, Suraj Kath, Feifei Li, Elaine O. Nsoesie, Ken R. Smith
    To appear in American Journal of Public Health (AJPH), 2017.

    The objective is to leverage geotagged Twitter data to create national indicators of the social environment, with small-area indicators of prevalent sentiment and social modeling of health behaviors, and to test associations with county-level health outcomes while controlling for demographic characteristics. We used Twitter's streaming application programming interface to continuously collect a random 1% subset of publicly available geo-located tweets in the contiguous United States. We collected approximately 80 million geotagged tweets from 603,363 unique Twitter users in a 12-month period (April 2015-March 2016). Across 3135 US counties, Twitter indicators of happiness, food, and physical activity were associated with lower premature mortality, obesity, and physical inactivity. Alcohol-use tweets predicted higher alcohol-use-related mortality. Social media represents a new type of real-time data that may enable public health officials to examine movement of norms, sentiment, and behaviors that may portend emerging issues or outbreaks, thus providing a way to intervene to prevent adverse health events and measure the impact of health interventions.

  • Conference


  • Compass: Spatio Temporal Sentiment Analysis of US Election
    By Debjyoti Paul, Feifei Li, Murali Krishna Teja, Xin Yu, Richie Frost
    In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), pages 1585-1594, August 2017.

    With the widespread growth of various social network tools and platforms, analyzing and understanding societal response and crowd reaction to important and emerging social issues and events through social media data is increasingly an important problem. However, there are numerous challenges towards realizing this goal effectively and efficiently, due to the unstructured and noisy nature of social media data. The large volume of the underlying data also presents a fundamental challenge. Furthermore, in many application scenarios, it is often interesting, and in some cases critical, to discover patterns and trends based on geographical and/or temporal partitions, and keep track of how they will change over time. This brings up the interesting problem of spatio-temporal sentiment analysis from large-scale social media data. This paper investigates this problem through a data science project called "US Election 2016, What Twitter Says". The objective is to discover sentiment on Twitter towards either the Democratic or the Republican party at US county and state levels over any arbitrary temporal intervals, using a large collection of geotagged tweets from a period of 6 months leading up to the US Presidential Election in 2016. Our results demonstrate that by integrating and developing a combination of machine learning and data management techniques, it is possible to do this at scale with effective outcomes. The results of our project have the potential to be adapted towards solving and influencing other interesting social issues such as building neighborhood happiness and health indicators.

  • Compass: Spatio Temporal Sentiment Analysis of US Election What Twitter Says! (Project Website)
    By Debjyoti Paul, Feifei Li, Murali Krishna Teja, Xin Yu, Richie Frost
    In Proceedings of KDD (KDD 2017),  pages 1585-1594,  2017.