Wednesday, March 24, 2010

List of terms and concepts

(character, word) n-gram, skipgram, word space model (WSM), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), string kernel, external plagiarism, intrinsic plagiarism, (dis)similarity measure, recall, precision, granularity, subsequence, synonyms, longest common subsequence, author style, genre, morpheme, suffix array, suffix tree, obfuscation, permutation, antonym, index, inverted index, crowd-sourcing, coding, hash function, stemming, Jaccard coefficient, Kullback–Leibler divergence, Levenshtein distance, stylometry, (character, word) frequency, cosine distance, sliding window, bag of (characters, words), outlier detection, space partitioning, kd-tree, metric tree, curse of dimensionality, dimensionality reduction, Principal Component Analysis (PCA), Isomap, locality sensitive hashing (LSH), authorship identification, stop word, cluster pruning, part of speech tag (POS or POST), Penn Treebank part of speech tag set, Snowball stemming algorithms, hyponym, hypernym, Kolmogorov Complexity measure, Lempel-Ziv compression, cohesion word, readability test, compression, Support Vector Machine (SVM), Artificial Neural Net, boosting, mean average precision (MAP), clipping, synset, arg max, kappa statistics, token, corpus, tf-idf, context-free grammar (CFG), (semantic, syntactic) class, Levin's verb classes, decision tree (DT), quasi-Newton method, sentence-to-sentence similarity, word correlation factor, n-gram phrase correlation, dot plot, fuzzy fingerprint, text chunk, text statistics, closed word class, Zipf's law, vocabulary richness, shingle, near duplicate detection, average word frequency class, text complexity, understandability, readability, Context Dependent Thinning (CDT), Random Permutation (RP)
