Monday, March 1, 2010

Initial project ideas

I will do my Master's thesis project at the Swedish Institute of Computer Science (SICS), where we will try to solve the mysteries of plagiarism detection. This means that I will have to dig into the fields of Computational Linguistics, Information Retrieval and Machine Learning... yay! :D

J and M at SICS have a lot of ideas about how to detect plagiarism, and these ideas might be kind of unique. They deal with finding the meaning of a text, along with other semantic patterns, and this might be something that has not been done much in plagiarism detection. From what I have heard, today's methods are mostly statistical and look at the words used in the text rather than the actual meaning of the text.

We have decided upon four different linguistic hypotheses that will be used when detecting plagiarism. But in order to describe them we need some notation: sj denotes the sentence at index j in a textual document, wi the word at index i in a sentence s, and oi a synonym of the word wi; wi and xi are both words, but not necessarily the same word or even synonyms.
Then two parts of the same text are considered the same if:
1. (Equality) s1 = w1 + w2 + ... + wn and s2 = w1 + w2 + ... + wn
2. (Synonyms) s1 = w1 + w2 + ... + wn and s2 = w1 + o2 + ... + wn
3. (Permutation) s1 = w1 + w2 + ... + wn and s2 = w2 + w1 + ... + wn
4. (Topicality) s1 = w1 + w2 + ... + wn and s2 = x1 + x2 + ... + xn, but s1 and s2 have the same semantic meaning.
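
To make the first three hypotheses a bit more concrete, here is a minimal Python sketch of how they could be checked for two sentences. Everything in it is an assumption for illustration only: the function names, the naive whitespace tokenizer and the tiny SYNONYMS table are made up, and a real system would need a proper tokenizer and a real lexical resource. Topicality is left out on purpose, since it needs some model of meaning that simple word comparison cannot provide.

```python
from collections import Counter

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "big": {"large"},
    "large": {"big"},
}


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def is_equal(s1, s2):
    """Hypothesis 1: the same words in the same order."""
    return tokenize(s1) == tokenize(s2)


def words_match(w, o):
    """True if the two words are identical or listed as synonyms."""
    return w == o or o in SYNONYMS.get(w, set())


def is_synonym_match(s1, s2):
    """Hypothesis 2: same length, each position is equal or a synonym."""
    t1, t2 = tokenize(s1), tokenize(s2)
    return len(t1) == len(t2) and all(words_match(w, o) for w, o in zip(t1, t2))


def is_permutation(s1, s2):
    """Hypothesis 3: the same words, possibly in a different order."""
    return Counter(tokenize(s1)) == Counter(tokenize(s2))
```

Note that in this sketch the checks overlap: two identical sentences also pass the synonym and permutation checks, so they are best read as increasingly loose notions of "the same" rather than as mutually exclusive cases.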

With these hypotheses we hope to revolutionise the world of plagiarism detection! Let's hope it works...

1 comment:

  1. Hey..Dis is Minni..good blog..

    I'm pursuing B-Tech and as a part of ma bachelors degree I need to submit a project..Incidentally I have chosen the same project and couldn't complete it..I'm struck up half way..Will u please help me out ? Thanks in advance
