Tuesday, February 5, 2008

WordNet

So here's a somewhat technical post.
See how hard I am trying to write a technical blog?

So, we (a group of 4 people) are currently working on a project; I will tell you the details in another post.
During this project, we faced the problem of identifying whether a word is a noun or not. We found the answer immediately: WordNet.

WordNet is basically a lexical database that classifies words by the parts of speech they fall under. There are many other things that WordNet can help you with. Getting the hypernyms of a word (its more general terms, like "animal" for "dog"), for instance.

You can find the WordNet application on its website:
http://wordnet.princeton.edu/

We downloaded the application from this site. It also gives you the source code of WordNet (after all, it's open source).

But the problem was using WordNet in our code.
We are working in C#. WordNet is a C++ project.
So we tried to call the C++ code from C#.

For this, we first tried to get the hang of DLLs. We made a sample DLL in C++. Called it from C++ code. Worked perfectly. Yahooooo!!!!
But then we tried to call the same C++ DLL from C# code.
Unsuccessful.
Tried hard to get it right.
Surfed through blogs and articles.
Posted questions on the forum.
Got answers too.
But nothing worked.
Yeah, I know we are dumb. :(
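
For anyone trying the same thing: the usual way to call native code from C# is P/Invoke, i.e. a DllImport declaration. Below is a minimal sketch, assuming Windows. It calls GetCurrentProcessId from kernel32.dll, a DLL that ships with the OS, instead of our sample DLL (which isn't shown in this post). The class name NativeCallDemo is made up for illustration. For a hand-written C++ DLL, the function usually also has to be exported with extern "C" and __declspec(dllexport) so that its name isn't mangled, and the calling convention given to DllImport has to match the one the DLL was built with. Whether any of that was the actual problem in our case is not known.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical demo class, not part of our project.
public static class NativeCallDemo
{
    // Declares a function that lives in a native DLL.
    // kernel32.dll is part of Windows, so nothing has to be built in C++ first.
    [DllImport("kernel32.dll")]
    public static extern uint GetCurrentProcessId();

    public static void Main()
    {
        // Ask the native side and the managed side for the same value.
        uint nativeId = GetCurrentProcessId();
        int managedId = Process.GetCurrentProcess().Id;

        Console.WriteLine("native: " + nativeId + ", managed: " + managedId);
    }
}
```

If both numbers match, the call into native code went through. Swapping in your own DLL means changing the DLL name and the method signature in the declaration, nothing else on the C# side.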


But then, while trying to get our hands on the WordNet database, we found a site:
http://sourceforge.net/project/showfiles.php?group_id=135112
which provides the WordNet database for MySQL.

We downloaded it. And yes, we thought that we could use it.

Unfortunately, we are working on Microsoft SQL Server. Not a big problem. We could just as easily have used MySQL. Not much difference.
But we always try to do the hardest, dumbest and most senseless thing on earth.
And we continued with the tradition here too.
We decided to convert the database to SQL Server.


And yeah, I did think about downloading software that could convert MySQL to SQL Server.
But I was only able to find tools that convert an existing MySQL database into a SQL Server database. Found none that could do the same with query files.

Then I tried to build a MySQL database from the query files.
Again faced some problems.

Now, when you download the zip file from the above-mentioned site, it provides you with the query files to create and populate the database.
The schema is also given on the website.
Using the schema, we created the database on Microsoft SQL Server 2005.
And then we used the query files provided to populate the database. But of course they were written for MySQL, so we wrote a small C# program which reads the MySQL queries from a .txt file, does the necessary manipulation on them and then writes the SQL Server queries to another .txt file.

Seems easy enough na?
Well, it was easy, but very time-consuming. Or maybe I am dumb and couldn't find a less time-consuming way.

What I did was open the query file in Firefox. (I wasn't able to open it with Microsoft SQL Server, Internet Explorer or Notepad.)
I wanted to populate one table at a time. So I went through the page and found the queries for the table I wanted to populate. Then I would copy and paste one line into Notepad. That one line actually comprises a large number of rows, each in its own (...) group, separated by commas. Then I saved the .txt file and converted it into a .txt file with one SQL Server INSERT per row (SQL Server 2005 does not accept several rows in one VALUES list) through the following C# code:

using System;
using System.IO;

namespace ConvertMySQLtoSQL
{
    class Program
    {
        static void Main(string[] args)
        {
            // Input: a .txt file holding the pasted "(...),(...),..." row groups of one table.
            string path = "Osense.txt";
            string tableName = "sense";

            string content;
            using (StreamReader reader = new StreamReader(path))
            {
                content = reader.ReadToEnd();
            }

            // Put each "(...)" group on its own line.
            // Caveat: this also splits inside a text value that happens to contain "),".
            content = content.Replace("),", ")\n");
            string[] rows = content.Split('\n');

            // Output: one INSERT statement per row, written to <table>.txt.
            string outputFile = tableName + ".txt";
            using (StreamWriter writer = new StreamWriter(outputFile))
            {
                foreach (string raw in rows)
                {
                    // Drop stray whitespace and any terminator left over from the paste.
                    string row = raw.Trim().TrimEnd(';', ',', ' ');
                    if (row.Length == 0)
                        continue; // skip blank lines instead of writing an empty INSERT

                    writer.WriteLine("INSERT INTO " + tableName + " VALUES " + row + ";");
                }
            }
        }
    }
}
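
One catch with the code above: replacing every ")," also cuts inside a text value that happens to contain those two characters, and MySQL dumps may escape a quote as \' which SQL Server does not understand (it wants ''). Here is a rough sketch of a more careful splitter, assuming the values use single-quoted strings with backslash escapes. The names InsertSplitter and SplitTuples are made up for this post, and it has not been tested against the real WordNet files.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original converter.
public static class InsertSplitter
{
    // Splits "(1,'a'),(2,'b');" into one "(...)" group per list entry,
    // ignoring commas and brackets that sit inside quoted strings.
    public static List<string> SplitTuples(string values)
    {
        var tuples = new List<string>();
        var current = new StringBuilder();
        bool inString = false;
        int depth = 0;

        for (int i = 0; i < values.Length; i++)
        {
            char c = values[i];

            if (depth == 0)
            {
                // Between groups: skip commas, semicolons and whitespace.
                if (c == '(') { depth = 1; current.Append(c); }
                continue;
            }

            if (inString)
            {
                if (c == '\\' && i + 1 < values.Length)
                {
                    char next = values[++i];
                    if (next == '\'') current.Append("''");          // \' becomes ''
                    else if (next == '\\' || next == '"') current.Append(next);
                    else current.Append(c).Append(next);              // other escapes left untranslated
                }
                else
                {
                    // A doubled '' closes and reopens the string, so it passes through unchanged.
                    if (c == '\'') inString = false;
                    current.Append(c);
                }
                continue;
            }

            current.Append(c);
            if (c == '\'') inString = true;
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0)
            {
                tuples.Add(current.ToString());
                current.Clear();
            }
        }
        return tuples;
    }

    public static void Main()
    {
        // Made-up sample data; the first value contains ")," inside a string.
        string pasted = "(1,'a),b'),(2,'it\\'s');";
        foreach (string tuple in SplitTuples(pasted))
            Console.WriteLine("INSERT INTO sense VALUES " + tuple + ";");
    }
}
```

With the sample line, this writes two INSERT statements, one for (1,'a),b') and one for (2,'it''s'), where the plain Replace approach would have cut the first row in half.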
