Plagiarism detection: Status update

So, it has been a while since my last post, mainly because we have run into some problems in the project.

First, we had a problem with detecting plagiarism at a fine-grained level. Our models allowed us to decide whether or not a document contained plagiarized passages, but we needed to be able to locate the plagiarism at the character level. After some thinking, we decided it might be acceptable to give up some granularity, so we will now try to detect plagiarism at the sentence level. We hope that this level of granularity will provide enough detail, on the assumption that someone who plagiarizes will mostly copy full sentences. Let us hope that the upcoming experiments prove this correct. :)

Since we changed the level of granularity, I had to update the tagging in the training data so that we can learn at the sentence level instead of the previous character level. In doing so I ran into some difficulties, but I hope I have gotten past them now. Still, I suspect there might be a bug somewhere, because I saw some strange behavior during testing.
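To give an idea of what re-tagging from character level to sentence level can involve, here is a minimal sketch. It is not the code used in the project. It assumes that plagiarized passages are annotated as character offset ranges, uses a naive regex sentence splitter, and marks a sentence as plagiarized when at least half of its characters fall inside an annotated range. The function names `split_sentences` and `label_sentences` and the `min_fraction` threshold are hypothetical, chosen only for illustration.

```python
import re
from typing import List, Tuple

# A span is a (start, end) pair of character offsets; end is exclusive.
Span = Tuple[int, int]


def split_sentences(text: str) -> List[Span]:
    """Naive splitter (assumption): a sentence ends at a run of '.', '!' or '?'
    followed by whitespace or the end of the text."""
    spans: List[Span] = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        # Skip the whitespace that separates this sentence from the previous one.
        while start < end and text[start].isspace():
            start += 1
        if start < end:
            spans.append((start, end))
        start = end
    # Keep any trailing text that lacks terminal punctuation.
    while start < len(text) and text[start].isspace():
        start += 1
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def label_sentences(text: str, plagiarized: List[Span],
                    min_fraction: float = 0.5) -> List[Tuple[Span, bool]]:
    """Turn character-level annotations into one label per sentence.
    min_fraction is a hypothetical threshold, not a value from the project."""
    # Mark every character covered by at least one annotated passage,
    # so overlapping annotations are not counted twice.
    covered = [False] * len(text)
    for start, end in plagiarized:
        for i in range(max(0, start), min(end, len(text))):
            covered[i] = True
    labels = []
    for start, end in split_sentences(text):
        fraction = sum(covered[start:end]) / (end - start)
        labels.append(((start, end), fraction >= min_fraction))
    return labels


text = "First sentence is original. Second one was copied! Third is mine."
copied = (text.index("Second"), text.index("!") + 1)
for span, is_plagiarized in label_sentences(text, [copied]):
    print(text[span[0]:span[1]], "->", is_plagiarized)
# First sentence is original. -> False
# Second one was copied! -> True
# Third is mine. -> False
```

Sentences that only partly overlap an annotated passage are where the threshold matters, and a naive splitter like this one will also break on abbreviations such as "e.g.", so both are worth checking if the resulting labels look odd.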

I would say that we are now past the Design Phase of the project and in the Implementation Phase (at least I am :) ). So I expect that there will be many more problems and difficulties ahead. All in all, there are some exciting weeks coming up, and by the 1st of June we have to be done with the implementation, so cross your fingers that everything proceeds as smoothly as possible...

Until then...

Monday, May 10, 2010