So, as I have mentioned again and again, our Final Year Project is based on blogs.
It's basically a combination of three features:
Most Cited Topics
Opinion Retriever
Summarizer.
We are applying, and will keep applying, data mining and NLP (Natural Language Processing) techniques to perform the above three tasks.
We have collected quite a few blog posts through RSS feeds and are still in the process of collecting more. These posts are our data and will be the input to our system.
We are collecting them with a program called RSS Feeder by Omar Al Zabir, which collects posts and stores them in an Access database. It has a perfect user interface, but what we need is the database.
We intend that, by the end, our system will be able to perform the following (this line has been used frequently in our documents):
Most Cited Topics:
If the user states a time period, Blog Digger searches for and displays the hot topics of that time period.
Opinion Retriever:
The user specifies a topic and the opinion of bloggers on that topic is displayed in terms of percentages of positive and negative opinions.
Summarizer:
The summary of the selected blog posts is displayed.
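The percentage display of the Opinion Retriever can be pictured with a very small sketch. It assumes each retrieved post has already been labelled positive or negative by some earlier step; the `OpinionSummary` class and its boolean labels are made up for illustration and are not part of Blog Digger.

```csharp
using System;
using System.Collections.Generic;

static class OpinionSummary
{
    // Returns the share of positive labels as a percentage (0 to 100).
    // "labels" is a hypothetical per-post verdict: true = positive, false = negative.
    public static double PositivePercent(IList<bool> labels)
    {
        if (labels.Count == 0) return 0.0;   // no posts found on the topic
        int positive = 0;
        foreach (bool isPositive in labels)
            if (isPositive) positive++;
        return 100.0 * positive / labels.Count;
    }

    static void Main()
    {
        bool[] labels = { true, true, false, true };   // made-up sample verdicts
        double pos = PositivePercent(labels);
        Console.WriteLine("Positive: {0}%, Negative: {1}%", pos, 100.0 - pos);
    }
}
```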
Showing posts with label Blog Digger.
Friday, February 15, 2008
My Final Year Project
At the start of the 7th semester, we all had just one thing roaming about in all the rooms of our brains: the Final Year Project, aka FYP. This is all we thought about the entire day. The first question we would ask others would be "Did you get FYP?", mostly followed by "Have you made a group yet?" and then "What are you guys planning to make?".
We would stay in the university till 4:30, discussing ideas, trying to get some good ones, searching over the net, talking to people, discussing with our teachers.
It seemed like every idea we came up with got rejected. No, it didn't just seem like it; it was what was really happening.
It seemed like we spent most of our day going to the ground floor to discuss an idea with a teacher and ask for his ideas (he really gave us some excellent ones), and then going to other teachers to get them approved. So most of the day was spent travelling from the first floor to the ground floor and back up again, from one corridor to the next, from one corner to the other.
After browsing through many ideas, we got one that we thought would be accepted by the committee. We didn't have many doubts about it being accepted. What we were worried about was whether we would be able to develop it. But not everything goes the way you expect it to, does it?
And such was the fate of this little, harmless expectation of ours. Our idea was brutally rejected by the committee.
And we were back to square one, again.
The prospect of going through the same ordeal of searching, analyzing (we never really did this part), discussing and getting approval horrified us. The fact that not many project ideas had been rejected did nothing to improve our depressed state.
We again started going to teachers for ideas, also trying to find out why our first idea was rejected and whether we could do anything to change its status from rejected to everyone's favorite status, "Accepted".
We couldn't do anything about the first idea, but one of our teachers again presented us with the ideas he had already given us, the ones we had failed to understand.
We were now left with no option other than to pick one of those ideas.
We had realized by then that we couldn't come up with any good ideas, so we had better accept reality and try to get interested in one of the proposed ideas.
We liked both the ideas he gave, but we chose the one that had to do with blogs.
Thus, I am here writing a blog about my project that is based on blogs.
If it weren't for my project, I don't think I would ever have started blogging. But now that I have started, I am liking it.
A shameful confession:
I didn't know a thing about blogs till this project.
So we started a project that was a data mining project. Basically, blog mining.
But our supervisor, the teacher who gave us the idea for this project, did not want us to name the project "Blog Miner".
He wanted us to come up with a different name. He said, and I quote:
"Are you people going to think of a name, or should I ask my mother... she is very fond of naming things."
And so we named it "Blog Digger".
Tuesday, February 5, 2008
WordNet
So here's a somewhat technical post.
See how hard I am trying to write a technical blog?
So, we (a group of 4 people) are currently working on a project; I will tell you the details in another post.
During this project, we faced the problem of identifying whether a word is a noun or not. We found the answer immediately: WordNet.
WordNet is basically a database that classifies words by the parts of speech they fall under. There are many other things WordNet can help you with. Getting a word's hypernym (its more general term), for instance.
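To make the noun check concrete, here is a minimal sketch of the lookup. It does not touch WordNet itself: the `Sample` dictionary is a tiny hand-made stand-in for the WordNet data, and the class and method names are invented for this example.

```csharp
using System;
using System.Collections.Generic;

static class PosLookup
{
    // Tiny hand-made sample standing in for WordNet: word -> parts of speech.
    static readonly Dictionary<string, string[]> Sample =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "run",     new[] { "noun", "verb" } },
            { "quickly", new[] { "adverb" } },
            { "happy",   new[] { "adjective" } }
        };

    // True when the word is known and at least one of its senses is a noun.
    public static bool IsNoun(string word)
    {
        string[] tags;
        return Sample.TryGetValue(word, out tags) && Array.IndexOf(tags, "noun") >= 0;
    }

    static void Main()
    {
        Console.WriteLine(IsNoun("run"));
        Console.WriteLine(IsNoun("quickly"));
    }
}
```

A word can belong to several parts of speech, so the check asks whether "noun" is among them rather than expecting a single answer.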
You can find the WordNet application on its website:
http://wordnet.princeton.edu/
We downloaded the application from this site; it also gives you the source code of WordNet (after all, it's open source).
But the problem was using WordNet in our code.
We are working in C#. WordNet is a C++ project.
We tried to call C++ code from C#.
For this we first tried to get the hang of DLLs. We made a sample DLL in C++ and called it from C++ code. Worked perfectly. Yahooooo!!!!
But then we tried to call the same C++ DLL from C# code.
Unsuccessful.
Tried hard to get it right.
Surfed through blogs and articles.
Posted questions on the forum.
Got answers too.
But nothing worked.
Yeah, I know we are dumb. :(
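For reference, the usual route from C# into a native DLL is P/Invoke. The following is only a sketch under assumptions: `Sample.dll` and its `Add` function are hypothetical and are not the DLL described in this post. Two common stumbling blocks are C++ name mangling (the export needs `extern "C"`) and a calling convention that does not match on both sides.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeSample
{
    // Hypothetical export. The C++ side would need something like:
    //   extern "C" __declspec(dllexport) int Add(int a, int b);
    // Without extern "C" the exported name is mangled and the lookup fails.
    [DllImport("Sample.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int Add(int a, int b);

    // Returns false instead of throwing when the DLL or the export cannot be loaded.
    public static bool TryAdd(int a, int b, out int sum)
    {
        sum = 0;
        try
        {
            sum = Add(a, b);
            return true;
        }
        catch (DllNotFoundException) { return false; }        // Sample.dll not found
        catch (EntryPointNotFoundException) { return false; } // no export named Add
        catch (BadImageFormatException) { return false; }     // e.g. 32-bit vs 64-bit mismatch
    }

    static void Main()
    {
        int sum;
        if (TryAdd(2, 3, out sum))
            Console.WriteLine("Add(2, 3) = " + sum);
        else
            Console.WriteLine("Sample.dll or its Add export could not be loaded.");
    }
}
```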
But then, while trying to get our hands on the WordNet database, we found a site:
http://sourceforge.net/project/showfiles.php?group_id=135112
which provides the WordNet database in MySQL.
We downloaded it, and yes, we thought that we could use it.
Unfortunately we are working on Microsoft SQL Server. Not a big problem. We could just as easily have used MySQL. Not much difference.
But we always try to do the hardest, dumbest and most senseless thing on earth.
And we continued with the tradition here too.
We decided to convert the database to SQL Server.
And yeah, I did think about downloading a tool that could convert MySQL to SQL Server.
But I was only able to find tools that convert a MySQL database into a SQL Server database. I found none that could do the same with query files.
Then I tried to create a MySQL database from the query files.
Again I faced some problems.
Now, when you download the zip file from the above-mentioned site, it provides you with the query files to create and populate the database.
The schema is also given on the website.
Using the schema, we created the database on Microsoft SQL Server 2005.
And then we used the provided query files to populate the database. But of course they were in MySQL syntax, so we wrote a small C# program which reads the MySQL queries from a .txt file, does the necessary manipulation on them and then writes the SQL Server queries to another .txt file.
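As an illustration of the kind of manipulation such a conversion can involve, here is a sketch that rewrites two spellings MySQL accepts but SQL Server does not: backtick-quoted identifiers and backslash-escaped single quotes. Whether the WordNet query files actually contain either is an assumption; the `DialectFix` class and the sample statement are made up for this example.

```csharp
using System;
using System.Text;

static class DialectFix
{
    // Rewrites two MySQL-only spellings into their SQL Server forms:
    //   `name`  ->  [name]   (quoted identifiers)
    //   \'      ->  ''       (escaped single quote inside a string)
    public static string ToSqlServer(string mysql)
    {
        StringBuilder sb = new StringBuilder(mysql.Length);
        bool inString = false;      // currently inside a '...' literal
        bool openBacktick = true;   // the next backtick opens an identifier
        for (int i = 0; i < mysql.Length; i++)
        {
            char c = mysql[i];
            if (inString && c == '\\' && i + 1 < mysql.Length)
            {
                char next = mysql[i + 1];
                if (next == '\'') { sb.Append("''"); i++; continue; }
                if (next == '\\' || next == '"') { sb.Append(next); i++; continue; }
                sb.Append(c);       // other escapes (\n, \t, ...) are left untouched in this sketch
                continue;
            }
            if (c == '\'')
            {
                // A doubled quote inside a string is an escaped quote in both dialects.
                if (inString && i + 1 < mysql.Length && mysql[i + 1] == '\'')
                {
                    sb.Append("''");
                    i++;
                    continue;
                }
                inString = !inString;
                sb.Append(c);
                continue;
            }
            if (c == '`' && !inString)
            {
                sb.Append(openBacktick ? '[' : ']');
                openBacktick = !openBacktick;
                continue;
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Made-up sample statement, not taken from the real query files.
        Console.WriteLine(ToSqlServer("INSERT INTO `word` VALUES (1,'o\\'clock');"));
    }
}
```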
Seems easy enough, right?
Well, it was easy, but very time-consuming. Or maybe I am dumb and couldn't find a less time-consuming way.
What I did was open the query file in Firefox (I wasn't able to open it with Microsoft SQL Server, Internet Explorer or Notepad).
I wanted to populate one table at a time. So I went through the page and found the queries for the table I wanted to populate. Then I would copy and paste one line into Notepad. That one line would actually comprise a large number of queries. Then I saved the .txt file and converted it into a .txt file with SQL Server queries through the following C# code:
using System;
using System.IO;

namespace ConvertMySQLtoSQL
{
    class Program
    {
        static void Main(string[] args)
        {
            // Input file: the one long line copied from the MySQL query file.
            // It is assumed to hold only the value tuples: (..),(..),(..);
            String Path = "Osense.txt";
            String TbName = "sense";

            String Content = File.ReadAllText(Path);

            // Put every tuple on its own line and end it with a semicolon.
            // Caveat: a "),"
            // inside a quoted string value would be split as well.
            Content = Content.Replace("),", ");\n");
            String[] Lines = Content.Split(new char[] { '\n', '\r' },
                                           StringSplitOptions.RemoveEmptyEntries);

            // Write one single-row INSERT statement per tuple.
            String OutputFile = TbName + ".txt";
            using (StreamWriter Writer = new StreamWriter(OutputFile))
            {
                foreach (String Line in Lines)
                {
                    String Row = Line.Trim();
                    if (Row.Length == 0) continue;   // skip blank leftovers
                    Writer.WriteLine("INSERT INTO " + TbName + " VALUES " + Row);
                }
            }
        }
    }
}
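One weak spot of splitting on the text ")," is that the same two characters can also occur inside a quoted string value. The sketch below shows one way to split the tuples while tracking quotes and parentheses. It is an alternative under assumptions, not the code that was actually used; `TupleSplitter` and the sample input are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class TupleSplitter
{
    // Splits "(1,'a'),(2,'b),(c');" into one "(...)" string per row,
    // ignoring parentheses and commas that sit inside '...' literals.
    public static List<string> SplitRows(string values)
    {
        List<string> rows = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inString = false;   // currently inside a '...' literal
        int depth = 0;           // parenthesis nesting outside of strings
        for (int i = 0; i < values.Length; i++)
        {
            char c = values[i];
            if (inString)
            {
                current.Append(c);
                if (c == '\\' && i + 1 < values.Length)
                    current.Append(values[++i]);   // MySQL escape: keep the next char verbatim
                else if (c == '\'')
                    inString = false;
                continue;
            }
            if (c == '(') depth++;
            if (depth == 0) continue;              // skip "," ";" and whitespace between rows
            current.Append(c);
            if (c == '\'')
            {
                inString = true;
            }
            else if (c == ')' && --depth == 0)
            {
                rows.Add(current.ToString());      // one complete tuple
                current.Length = 0;
            }
        }
        return rows;
    }

    static void Main()
    {
        // Made-up sample: the second row contains "),(" inside a string.
        foreach (string row in SplitRows("(1,'a'),(2,'b),(c');"))
            Console.WriteLine("INSERT INTO sense VALUES " + row + ";");
    }
}
```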