SPUR 2024: Pak-whan Kanjanakosit W’27, SEAS’27

I bet everyone who codes knows about Stack Overflow, the go-to platform for answers to coding problems. Beyond programming, people can discuss other topics in other communities, from gardening advice and bicycles to updates on Apple technologies. Stack Overflow has long been a key resource for 25+ million users worldwide.

Sadly, we suspect that Stack Overflow might be dying. With the rise of Large Language Models (LLMs) like ChatGPT, programmers might be turning to these AI models for instant answers instead. For this reason, my research advisor, Professor Neha Sharma, and I are interested in how the dynamics of the platform have changed: is Stack Overflow still useful to its users, or is it being replaced to the point of no return? We focus on explaining how community structure has evolved, along with potential changes in users’ behavior, on Stack Overflow and on smaller Stack Exchange communities like Academia and Mathematica.

As someone without Python experience, I found this project a good kickstart for my interest in data analytics. Before diving deeper into the research, I spent some time familiarizing myself with Python through online courses on its data science packages, and I also practiced querying the Stack Overflow Data Dump with MySQL.

Getting started was a bit hard, as this was my first time doing research in data analysis. With my advisor’s guidance, I started small, reading the schemas to understand the dataset we were working with. Once I understood how the schema worked, questions about what results we needed to find came naturally. Later on, the literature review helped me understand what others were working on in the field of network science, which helped me come up with methodologies to answer those initial questions.

Getting results for the questions we asked was just the beginning: analyzing and interpreting the data was a complex task. To check whether our hypothesis was correct, we had to ask follow-up questions and, at times, look to other datasets to confirm or reject it. Working with real-world datasets is different from working with the ones in classes: some events show up as outliers or are hard to explain. The results of earlier questions opened the door to many more explorations in search of better conclusions.

My research situation was different from that of my peers: because I intended to spend the summer with my friends and family in Thailand, I proposed doing the research online. Throughout the summer, I practiced my time management skills, juggling research, the CIS1600 course, and preparation for a math placement test, while still making good memories with my high school friends and traveling with my family.

My SPUR research gave me a great opportunity to grow both personally and professionally, and I am grateful for this experience and for everyone who supported me along the way.