This repository contains the dataset and the communities generated with the technique described in our paper "Finding Semantic Relationships in Folksonomies", published at the IEEE/WIC/ACM International Conference on Web Intelligence 2018.

- Folders starting with the prefix "E_" contain the communities and Pajek network files generated for each model used to represent tags.
- The Co_occ folder contains the communities of the baseline.
- The manual_communities folder contains communities created by the authors.
- The dataset folder contains:
  1. sof_tags: tags used in our experiments to find communities
  2. WIKI_sof: Wikipedia articles related to Stack Overflow tags, extracted from the Stack Overflow tag wikis
  3. CAT_sof: Wikipedia category links related to Stack Overflow tags
  4. CAT_sof_llda_keywords: labeled LDA keywords generated for the Wikipedia category links
  5. E_excerpt: excerpts of tags, with stopwords removed and words stemmed
  6. E_llda: keywords generated by labeled LDA for each tag
  7. E_wiki: labeled LDA keywords for the category links related to each tag
- The qualitative_analysis.xlsx file contains details of the quantitative and qualitative results of our approach (explained in the discussion section of the paper).
- The trained GloVe models can be downloaded from https://goo.gl/MK6Cnt. The download contains the following files:
  1. Stack_Overflow_Data_score_positive_Content_Words_Stem_Glove_2.vectors.txt.w2vec: GloVe word vectors trained on the SOF_we dataset
  2. Stack_Overflow_Data_score_positive_tags_Glove_vec_min_count_10.vectors.txt.w2vec: GloVe word vectors trained on only the co-occurring tags in the SOF_we dataset
  3. wiki_embeddings_Glove_vec_min_count_10.vectors.txt_w2vec: word vectors trained on the WIKI_we dataset
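The GloVe files end in `.w2vec`, which suggests they were converted to the plain-text word2vec layout: a `vocab_size dim` header line followed by one `token v1 v2 ... vdim` line per word. That layout is an assumption, not something this README states. Under that assumption, the minimal stdlib-only sketch below loads such a file and ranks neighbours by cosine similarity. The helper names (`load_word2vec_text`, `cosine`, `most_similar`) are hypothetical, and the toy vectors stand in for a real file.

```python
import math
from typing import Dict, Iterable, List, Tuple


def load_word2vec_text(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word2vec *text* format (assumed layout of the .w2vec files):
    a 'vocab_size dim' header, then 'token v1 ... vdim' per line."""
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(vectors: Dict[str, List[float]], word: str,
                 topn: int = 5) -> List[Tuple[str, float]]:
    """Rank every other token by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scores.sort(key=lambda p: p[1], reverse=True)
    return scores[:topn]


# Toy data in the assumed format; for a real file use
# with open(path, encoding="utf-8") as f: vecs = load_word2vec_text(f)
sample = [
    "3 2",
    "java 1.0 0.0",
    "jvm 0.9 0.1",
    "css 0.0 1.0",
]
vecs = load_word2vec_text(sample)
print(most_similar(vecs, "java", topn=1))  # 'jvm' ranks first
```

If gensim is available, `KeyedVectors.load_word2vec_format(path, binary=False)` reads the same text layout and provides `most_similar` directly. Because the first file name contains "Stem", its vocabulary is probably stemmed, so lookups may need stemmed tokens.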
https://github.com/imansaleh16/Stack-Overflow-Tags-Communities
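The "E_" folders are described as holding Pajek network files. The sketch below reads such a file and builds a tag adjacency list, assuming the standard Pajek `.net` layout: a `*Vertices N` section with `id "label"` lines, then an `*Edges` or `*Arcs` section with `source target [weight]` lines. Whether the repository's files carry edge weights is an assumption, so a missing weight defaults to 1.0. The function names `read_pajek` and `neighbours` are hypothetical.

```python
import shlex
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_pajek(lines: Iterable[str]) -> Tuple[Dict[int, str], List[Tuple[int, int, float]]]:
    """Parse a Pajek .net file (assumed standard layout) into
    vertex labels and (source, target, weight) edges."""
    labels: Dict[int, str] = {}
    edges: List[Tuple[int, int, float]] = []
    section = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or Pajek comment
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()  # '*vertices', '*edges', '*arcs', ...
            continue
        parts = shlex.split(line)  # handles quoted labels such as "spring boot"
        if section == "*vertices":
            labels[int(parts[0])] = parts[1] if len(parts) > 1 else parts[0]
        elif section in ("*edges", "*arcs"):
            weight = float(parts[2]) if len(parts) > 2 else 1.0  # assumed default
            edges.append((int(parts[0]), int(parts[1]), weight))
    return labels, edges


def neighbours(labels: Dict[int, str],
               edges: List[Tuple[int, int, float]]) -> Dict[str, List[Tuple[str, float]]]:
    """Undirected adjacency list keyed by tag label."""
    adj: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for u, v, w in edges:
        lu, lv = labels.get(u, str(u)), labels.get(v, str(v))
        adj[lu].append((lv, w))
        adj[lv].append((lu, w))
    return adj


sample = """*Vertices 3
1 "java"
2 "jvm"
3 "css"
*Edges
1 2 0.8
2 3
""".splitlines()
labels, edges = read_pajek(sample)
adj = neighbours(labels, edges)
print(sorted(adj["jvm"]))  # [('css', 1.0), ('java', 0.8)]
```

Treating edges as undirected fits tag co-occurrence style networks; if a file uses `*Arcs`, dropping the reverse append in `neighbours` would keep the direction.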