StackOverflow Assistant

StackOverflow is one of the most popular question-and-answer websites related to computer science and programming. A large amount of questions and answers is constantly being posted, however, the quality of these posts is often insufficient.

This project solves the problem of improving the quality of responses in StackOverflow. The goal is to develop an assistant that integrates into the forum interface and is displayed in the input field. It assesses the answer in an online mode while the user is typing it in. To access new answers, machine learning is applied and historical StackOverflow data is used to train classification models. The internal task that the models are solving is to predict the amount of user upvotes for an answer given the question it has been posted to.

Data

The data is taken from kaggle: https://www.kaggle.com/datasets/stackoverflow/stackoverflow. Data was preprocessed and cleaned up: removal of HTML tags, links, conversion to lowercase, replacement of abbreviations with full forms.

Binary classification

As the easiest version of the task, we create labeling for binary classification. The model has to predict whether the answer score is positive or not. Applying this filter results into 587993 (0.66%) positive, 300366 (0.33%) nonpositive samples in train subset and 65175 (0.65%) positive, 33588 (0.34%) nonpositive samples in test subset.

Metrics

To evaluate the classification results, the following indicators were calculated: accuracy, macro score F1 and weighted score F1.

In data_analysis.ipynb, the dataset was analyzed and training/test partitions were created. In tfidf_baseline.ipynb, the TF-IDF and logistic regression baseline were trained. In transformers_baseline.ipynb, the Transformer-based models were trained.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
data_analysis.ipynb		data_analysis.ipynb
tfidf_baseline.ipynb		tfidf_baseline.ipynb
transformers_baseline.ipynb		transformers_baseline.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

StackOverflow Assistant

Data

Binary classification

Metrics

About

Uh oh!

Releases

Packages

Languages

Etwaswie/StackOverflow

Folders and files

Latest commit

History

Repository files navigation

StackOverflow Assistant

Data

Binary classification

Metrics

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages