Last summer, I had the opportunity to pursue my interest in the intersection of sociology and data science under Professor Sharath Chandra Guntuku in the Computer and Information Science Department at Penn. I am grateful that I could take advantage of this opportunity through a summer funding award from Wharton’s Summer Program for Undergraduate Research (SPUR).
During the program, I met with my research professor once a week to walk through the analyses I had done that week and discuss possible next steps. The weekly schedule kept me accountable and helped me finish my tasks on time.
Through the project, I analyzed how the perception and self-reporting of pain varied across different cultures in the United States. I chose to study pain because several trends in pain perception in the US are striking. For instance, I learned from one paper that the US is the only country where pain has been increasing over the past two decades. Contrary to expectations, pain in the US also increases until one’s 50s but decreases soon after. These patterns were also more prominent among those without a bachelor’s degree than among those with one. While some of these trends had been attributed to the rise in obesity and opioid use in the US, there was no clear explanation for why they were so prevalent. Thus, through my project, I wanted to look at how pain varies across cultures and whether the increase in pain could instead be attributed to specific social and cultural changes.
I started with two datasets: an annual national survey conducted by Gallup and a 1 percent random sample of tweets from Twitter. In the Gallup data, I analyzed self-reported pain across communities using the WP68 variable, which records responses to the question, “Did you feel pain for a lot of yesterday?” Gallup also provides weights for the dataset to ensure equal representation of all communities, so I applied these weights when calculating the percentage of each community that reported feeling pain.
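To make the weighting step concrete, here is a minimal Python sketch of a weighted percentage per community. It is only an illustration: the field names `community`, `wp68`, and `weight`, the coding of WP68 as 1 for "yes" and 0 for "no", and the sample rows are all assumptions made for this example, not the actual Gallup schema or data.

```python
from collections import defaultdict


def weighted_pain_share(rows):
    """Return the weighted percentage reporting pain for each community.

    Each row is a dict with hypothetical keys:
      "community": label for the respondent's group
      "wp68": 1 if the respondent reported pain, 0 otherwise (assumed coding)
      "weight": the survey weight attached to that respondent
    """
    pain_weight = defaultdict(float)
    total_weight = defaultdict(float)
    for row in rows:
        total_weight[row["community"]] += row["weight"]
        if row["wp68"] == 1:
            pain_weight[row["community"]] += row["weight"]
    # Weighted share = weight of "yes" answers / weight of all answers
    return {
        community: 100.0 * pain_weight[community] / total
        for community, total in total_weight.items()
        if total > 0
    }


# Made-up respondents, for illustration only
sample_rows = [
    {"community": "A", "wp68": 1, "weight": 2.0},
    {"community": "A", "wp68": 0, "weight": 1.0},
    {"community": "A", "wp68": 0, "weight": 1.0},
    {"community": "B", "wp68": 1, "weight": 0.5},
    {"community": "B", "wp68": 0, "weight": 1.5},
]

shares = weighted_pain_share(sample_rows)
print(shares)
```

In this toy data, one of three respondents in community A reports pain, but that respondent carries half of the community's total weight, so the weighted share differs from the simple unweighted proportion.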
I began by examining how the WP68 variable varied with social and demographic factors, since Gallup collects this information about its respondents. The Twitter dataset, however, had no information about users’ personal characteristics, so there I used the American Communities Project instead. The American Communities Project classifies each county in the US into one of 15 categories, such as big cities and native farmlands. Using these categories, I ran topic analysis to gauge how different communities talked about pain on Twitter. While college towns spoke about education and relationships, big cities talked about joint pain and drugs.
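The general idea of grouping tweets by community type before looking at their language can be sketched as follows. This is not the method used in the project: it swaps in plain word-frequency counts as a simple stand-in for topic analysis, and the county identifiers, the `county_to_type` mapping, the stopword list, and the example tweets are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a standard resource)
STOPWORDS = {"a", "is", "my", "so", "much", "the", "still", "another"}


def top_terms_by_community(tweets, county_to_type, k=3):
    """Return the k most frequent non-stopword terms per community type.

    tweets: iterable of (county_id, text) pairs
    county_to_type: dict mapping county_id to a community-type label
    """
    counts = defaultdict(Counter)
    for county, text in tweets:
        community = county_to_type.get(county)
        if community is None:
            continue  # skip tweets from counties with no known category
        words = re.findall(r"[a-z']+", text.lower())
        counts[community].update(w for w in words if w not in STOPWORDS)
    return {
        community: [word for word, _ in counter.most_common(k)]
        for community, counter in counts.items()
    }


# Hypothetical county-to-category mapping and made-up tweets
county_to_type = {"c1": "College Towns", "c2": "Big Cities"}
tweets = [
    ("c1", "Exam week is a pain, so much exam stress"),
    ("c1", "Another exam tomorrow"),
    ("c2", "My knee pain is back, knee surgery soon"),
    ("c2", "Knee still hurts"),
    ("c3", "This county is not in the mapping"),
]

top_terms = top_terms_by_community(tweets, county_to_type, k=3)
print(top_terms)
```

A full topic model would group co-occurring words into themes rather than ranking single words, but the grouping step, assigning each tweet to its county's category first, would look much the same.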
It was interesting to see the world through the lens of data, and the project gave me a unique view of what applied data science research looks like. The program allowed me to consider research as a potential career and, more importantly, taught me many skills for working in the real world and recovering from challenges!