Stack-Overflow-Tags-Communities

This repository contains the dataset and the communities generated with the technique described in our paper, "Finding Semantic Relationships in Folksonomies", published at the IEEE/WIC/ACM International Conference on Web Intelligence 2018.

- Folders whose names start with the prefix "E_" contain the communities and Pajek network files generated for each model used to represent tags.
- The Co_occ folder contains the communities of the baseline.
- The manual_communities folder contains the communities created by the authors.
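
The Pajek network files in the "E_" folders can be inspected without Pajek itself. The following is a minimal sketch, assuming the files follow the common Pajek .net layout (a `*Vertices` section with `id "label"` lines, followed by an `*Edges` or `*Arcs` section with `source target [weight]` lines). The function name `read_pajek` is hypothetical and is not part of this repository.

```python
import re

# Matches a vertex line: numeric id followed by a quoted or unquoted label.
_VERTEX = re.compile(r'^(\d+)\s+(?:"([^"]*)"|(\S+))')

def read_pajek(path):
    """Return (labels, edges) from a Pajek .net file.

    labels maps vertex id -> tag label; edges is a list of
    (source_id, target_id, weight) tuples. Assumes the standard
    *Vertices / *Edges (or *Arcs) sections.
    """
    labels, edges = {}, []
    section = None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("%"):
                continue  # skip blank lines and comments
            if line.startswith("*"):
                section = line.split()[0].lower()  # e.g. "*vertices"
                continue
            if section == "*vertices":
                m = _VERTEX.match(line)
                if m:
                    label = m.group(2) if m.group(2) is not None else m.group(3)
                    labels[int(m.group(1))] = label
            elif section in ("*edges", "*arcs"):
                parts = line.split()
                # A missing weight is treated as 1.0 (an assumption of this sketch).
                weight = float(parts[2]) if len(parts) > 2 else 1.0
                edges.append((int(parts[0]), int(parts[1]), weight))
    return labels, edges
```

Calling `read_pajek` on one of the network files would give a label dictionary and an edge list that can be passed on to any graph library.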

- The dataset folder contains:
1. sof_tags: the tags used in our experiments to find communities
2. WIKI_sof: Wikipedia articles related to Stack Overflow tags, extracted from the wiki of each Stack Overflow tag
3. CAT_sof: Wikipedia category links related to Stack Overflow tags
4. CAT_sof_llda_keywords: labeled LDA keywords generated for the Wikipedia category links
5. E_excerpt: tag excerpts, with stopwords removed and words stemmed
6. E_llda: keywords generated by labeled LDA for each tag
7. E_wiki: labeled LDA keywords for the category links related to each tag

- The qualitative_analysis.xlsx file contains the detailed quantitative and qualitative results of our approach (explained in the discussion section of the paper).

- The trained GloVe models can be downloaded from https://goo.gl/MK6Cnt. The download contains the following files:

1. Stack_Overflow_Data_score_positive_Content_Words_Stem_Glove_2.vectors.txt.w2vec: GloVe word vectors trained on the SOF_we dataset
2. Stack_Overflow_Data_score_positive_tags_Glove_vec_min_count_10.vectors.txt.w2vec: GloVe word vectors trained only on the co-occurring tags in the SOF_we dataset
3. wiki_embeddings_Glove_vec_min_count_10.vectors.txt_w2vec: GloVe word vectors trained on the WIKI_we dataset
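
The vector files can be loaded with a few lines of Python. The following is a minimal sketch, assuming (as the .w2vec extension suggests, but which should be verified against the downloaded files) that they use the word2vec text format: a header line with the vocabulary size and vector dimension, then one `word v1 v2 ...` line per word. The names `load_w2v_text` and `cosine` are hypothetical helpers, not part of this repository.

```python
import math

def load_w2v_text(path):
    """Load word2vec-style text vectors into a dict of word -> list of floats.

    Assumes the first line is "<vocab_size> <dim>" and every following
    line is a word followed by <dim> space-separated numbers.
    """
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        _count, dim = map(int, fh.readline().split())
        for line in fh:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip malformed or empty lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With the vectors loaded, `cosine(vectors[tag_a], vectors[tag_b])` gives a similarity score between two tags. If gensim is installed, `KeyedVectors.load_word2vec_format(path, binary=False)` should read the same format and offers nearest-neighbour queries as well.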

Visit original content creator repository
https://github.com/imansaleh16/Stack-Overflow-Tags-Communities
