Debjyoti Paul
PhD student. Research interest: learning-driven big data analytics.

Journal



  • Geotagged US Tweets as Predictors of County-Level Health Outcomes
    By Quynh C. Nguyen, Hsien-wen Meng, Geoffrey Loomis, Matt McCullough, Dapeng Li, Debjyoti Paul, Suraj Kath, Feifei Li, Elaine O. Nsoesie, Ken R. Smith
    To appear in the American Journal of Public Health (AJPH), 2017.

    Our objective was to leverage geotagged Twitter data to create national indicators of the social environment, including small-area indicators of prevalent sentiment and of social modeling of health behaviors, and to test their associations with county-level health outcomes while controlling for demographic characteristics. We used Twitter's streaming application programming interface to continuously collect a random 1% subset of publicly available geolocated tweets in the contiguous United States. Over a 12-month period (April 2015–March 2016), we collected approximately 80 million geotagged tweets from 603,363 unique Twitter users. Across 3,135 US counties, Twitter indicators of happiness, food, and physical activity were associated with lower premature mortality, obesity, and physical inactivity. Tweets about alcohol use predicted higher alcohol-use-related mortality. Social media represents a new type of real-time data that may enable public health officials to examine the movement of norms, sentiment, and behaviors that may portend emerging issues or outbreaks, thus providing a way to intervene to prevent adverse health events and to measure the impact of health interventions.
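    The county-level aggregation step described above can be roughly illustrated by a sketch that turns per-tweet topic labels into county indicators. Each indicator is the share of a county's tweets flagged for a given topic. The sketch assumes that an upstream classifier has already labeled each tweet and that tweets are keyed by county FIPS code. The `Tweet` record, the `county_indicators` function, and the example codes are hypothetical and are not the paper's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

TOPICS = ("happy", "food", "physical_activity")


@dataclass
class Tweet:
    # Hypothetical record: county FIPS code plus topic labels assumed
    # to come from an upstream classifier (not shown).
    county_fips: str
    happy: bool
    food: bool
    physical_activity: bool


def county_indicators(tweets):
    """Return {county_fips: {topic: share of the county's tweets with that label}}."""
    totals = defaultdict(int)  # tweets seen per county
    hits = defaultdict(lambda: dict.fromkeys(TOPICS, 0))  # labeled tweets per county/topic
    for t in tweets:
        totals[t.county_fips] += 1
        for topic in TOPICS:
            if getattr(t, topic):
                hits[t.county_fips][topic] += 1
    # Normalize by each county's tweet volume so counties are comparable.
    return {
        fips: {topic: hits[fips][topic] / n for topic in TOPICS}
        for fips, n in totals.items()
    }


if __name__ == "__main__":
    sample = [
        Tweet("49035", happy=True, food=False, physical_activity=True),
        Tweet("49035", happy=False, food=True, physical_activity=False),
        Tweet("49049", happy=True, food=False, physical_activity=False),
    ]
    print(county_indicators(sample)["49035"])
    # {'happy': 0.5, 'food': 0.5, 'physical_activity': 0.5}
```

    Dividing by each county's tweet count keeps indicators comparable between populous and sparsely tweeting counties. The association tests against health outcomes, with demographic controls, would be a separate regression step that is not shown here.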

Conference


  • Compass: Spatio Temporal Sentiment Analysis of US Election
    By Debjyoti Paul, Feifei Li, Murali Krishna Teja, Xin Yu, Richie Frost
    In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), pages 1585–1594, August 2017.

    With the widespread growth of social network tools and platforms, analyzing and understanding societal response and crowd reaction to important and emerging social issues and events through social media data is an increasingly important problem. However, realizing this goal effectively and efficiently poses numerous challenges because social media data is unstructured and noisy. The large volume of the underlying data also presents a fundamental challenge. Furthermore, many application scenarios make it interesting, and in some cases critical, to discover patterns and trends over geographical and/or temporal partitions and to track how they change over time. This raises the problem of spatio-temporal sentiment analysis of large-scale social media data. This paper investigates the problem through a data science project called "US Election 2016, What Twitter Says". The objective is to discover Twitter sentiment toward either the Democratic or the Republican Party at the US county and state levels over arbitrary time intervals. The analysis uses a large collection of geotagged tweets from the six months leading up to the 2016 US Presidential Election. Our results demonstrate that a combination of machine learning and data management techniques, integrated and developed together, can achieve this at scale with effective outcomes. The results of our project could be adapted to address other social issues, such as building neighborhood happiness and health indicators.
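    Sentiment queries "over any arbitrary temporal interval" at both county and state granularity can be answered generically with precomputed per-region daily prefix sums. With that structure, each range query costs two lookups. The sketch assumes that tweets arrive already labeled with a party and a polarity of +1 or -1. The `SentimentIndex` class, the region keys, and the start date are hypothetical, and the sketch is not claimed to be the indexing scheme used in Compass.

```python
from collections import defaultdict
from datetime import date


class SentimentIndex:
    """Hypothetical index of net daily sentiment per (region, party)."""

    def __init__(self, start: date, num_days: int):
        self.start = start
        self.num_days = num_days
        # daily[(region, party)][d] = net sentiment (sum of +1/-1) on day d
        self.daily = defaultdict(lambda: [0] * num_days)
        # prefix[(region, party)][i] = sum of daily values for days 0..i-1
        self.prefix = {}

    def add(self, regions, party, day: date, polarity: int):
        """Record one labeled tweet under every region it belongs to."""
        offset = (day - self.start).days
        if not 0 <= offset < self.num_days:
            raise ValueError(f"{day} is outside the indexed period")
        for region in regions:
            self.daily[(region, party)][offset] += polarity

    def build(self):
        """Precompute prefix sums; call after all add() calls."""
        for key, counts in self.daily.items():
            acc = [0]
            for c in counts:
                acc.append(acc[-1] + c)
            self.prefix[key] = acc

    def query(self, region, party, first: date, last: date) -> int:
        """Net sentiment over [first, last], both inclusive, in O(1) after build()."""
        acc = self.prefix.get((region, party))
        if acc is None:
            return 0
        # Clamp the interval to the indexed period.
        i = max((first - self.start).days, 0)
        j = min((last - self.start).days + 1, self.num_days)
        return acc[j] - acc[i] if j > i else 0


if __name__ == "__main__":
    idx = SentimentIndex(date(2016, 5, 1), num_days=200)
    idx.add(["Salt Lake County, UT", "UT"], "dem", date(2016, 6, 1), +1)
    idx.add(["Utah County, UT", "UT"], "dem", date(2016, 6, 3), -1)
    idx.add(["Utah County, UT", "UT"], "dem", date(2016, 7, 1), +1)
    idx.build()
    print(idx.query("UT", "dem", date(2016, 6, 1), date(2016, 6, 30)))  # 0
    print(idx.query("UT", "dem", date(2016, 6, 1), date(2016, 7, 1)))  # 1
```

    Recording each tweet under both its county and its state lets one structure serve both levels of aggregation. A real deployment would likely also normalize net sentiment by tweet volume, because raw sums favor high-activity regions.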