Hi all,
I'm trying to implement K-means on a database. The main task is to group ids by the similarity of their intensities and return the groups, with the number of clusters chosen by the user. I have a table like this:
id  I1  I2  I3  I4  I5
1    1   2   3   4   5
2   11  12  13  14  15
3   21  22  23  24  25
There may be several thousand ids, each with its own intensities.
The algorithm is:
1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
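The four steps above can be sketched roughly as follows. This is only a minimal sketch, not a finished implementation: the class name KMeansSketch, the method name Cluster, the jagged-array layout (one double[] per row of intensities) and the maxIterations safety limit are all assumptions made for illustration. Step 1 is left to the caller, who passes in the initial centroids.

```csharp
using System;

static class KMeansSketch
{
    // Squared Euclidean distance; the square root is not needed just to compare distances.
    static double SquaredDistance(double[] x, double[] y)
    {
        double sum = 0;
        for (int d = 0; d < x.Length; d++)
        {
            double diff = x[d] - y[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Runs steps 2-4 starting from the given centroids (step 1 is done by the caller).
    // Returns labels[i] = index of the cluster that points[i] ends up in.
    // The centroids array is updated in place.
    public static int[] Cluster(double[][] points, double[][] centroids, int maxIterations = 100)
    {
        int k = centroids.Length;
        int dims = points[0].Length;
        int[] labels = new int[points.Length];
        for (int i = 0; i < labels.Length; i++) labels[i] = -1; // nothing assigned yet

        for (int iter = 0; iter < maxIterations; iter++)
        {
            // Step 2: assign each point to the closest centroid.
            bool changed = false;
            for (int i = 0; i < points.Length; i++)
            {
                int best = 0;
                for (int c = 1; c < k; c++)
                {
                    if (SquaredDistance(points[i], centroids[c]) < SquaredDistance(points[i], centroids[best]))
                        best = c;
                }
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }

            // Step 4: if no assignment changed, the centroids would not move either, so stop.
            if (!changed) break;

            // Step 3: move each centroid to the mean of its members.
            double[][] sums = new double[k][];
            int[] counts = new int[k];
            for (int c = 0; c < k; c++) sums[c] = new double[dims];
            for (int i = 0; i < points.Length; i++)
            {
                counts[labels[i]]++;
                for (int d = 0; d < dims; d++) sums[labels[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++)
            {
                if (counts[c] == 0) continue; // empty cluster: keep its old centroid
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return labels;
    }
}
```

Because the centroids are overwritten in place, they should be copies of the chosen rows rather than references to the rows themselves. The returned labels array runs parallel to the rows, so the id of row i belongs to cluster labels[i].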
In my case the objects are the rows of intensities extracted from the database, and the k initial points are rows picked at random from those extracted rows. If I give k = 3, it has to make 3 clusters out of the database using the method above. (http://people.revoledu.com/kardi/tutorial/kMean/NumericalExample.htm)
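Picking the k starting rows could look something like the sketch below. It assumes the intensities are already in memory as one double[] per row; the names InitialCentroids, PickDistinctRows and CopyRows are made up for this example. A partial Fisher-Yates shuffle is used so that the same row is never chosen twice, which plain repeated calls to Random.Next would not guarantee.

```csharp
using System;

static class InitialCentroids
{
    // Picks k distinct row indexes in [0, rowCount) with a partial Fisher-Yates shuffle.
    public static int[] PickDistinctRows(int rowCount, int k, Random rnd)
    {
        if (k < 1 || k > rowCount)
            throw new ArgumentOutOfRangeException(nameof(k), "k must be between 1 and the number of rows");

        int[] indexes = new int[rowCount];
        for (int i = 0; i < rowCount; i++) indexes[i] = i;

        for (int i = 0; i < k; i++)
        {
            // Swap position i with a random position from i to the end (upper bound is exclusive).
            int j = rnd.Next(i, rowCount);
            int tmp = indexes[i];
            indexes[i] = indexes[j];
            indexes[j] = tmp;
        }

        int[] chosen = new int[k];
        Array.Copy(indexes, chosen, k);
        return chosen;
    }

    // Copies the chosen rows so that later centroid updates do not overwrite the data.
    public static double[][] CopyRows(double[][] rows, int[] chosen)
    {
        double[][] centroids = new double[chosen.Length][];
        for (int c = 0; c < chosen.Length; c++)
            centroids[c] = (double[])rows[chosen[c]].Clone();
        return centroids;
    }
}
```

Distinct row indexes do not rule out two rows with identical intensities, so two starting centroids can still coincide if the data contains duplicates.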
The distance in our case is the Euclidean distance between a row and a centroid: sqrt(sum((observed intensity - centroid intensity)^2)), where the sum runs over I1 to I5.
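As a small sketch of that distance, assuming each row and each centroid is held as a double[] of the same length (the class name IntensityDistance and the method name Euclidean are only illustrative):

```csharp
using System;

static class IntensityDistance
{
    // Euclidean distance between one row of intensities and one centroid.
    public static double Euclidean(double[] observed, double[] centroid)
    {
        if (observed.Length != centroid.Length)
            throw new ArgumentException("Both rows must have the same number of intensities");

        double sumOfSquares = 0;
        for (int m = 0; m < observed.Length; m++)
        {
            double diff = observed[m] - centroid[m];
            sumOfSquares += diff * diff;
        }
        return Math.Sqrt(sumOfSquares);
    }
}
```

For ids 1 and 2 in the table above, every difference is 10, so the sum of squares is 5 * 100 = 500 and the distance is sqrt(500), about 22.36.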
The code I have implemented so far is:
private void Form1_Load(object sender, EventArgs e)
{
    OdbcCommand m = new OdbcCommand();
    m.CommandText = "select I1,I2,I3,I4,I5 from ion_isotopic";
    m.Connection = OdbcCon;
    DataSet be = new DataSet();
    OdbcDataAdapter b = new OdbcDataAdapter(m);
    b.Fill(be);
    // close the connection
    // set the grid's data source
    dataGridView1.DataSource = be.Tables[0];
}
private void button2_Click(object sender, EventArgs e)
{
    // a and t are assumed to be double[,] fields declared on the form.
    // The grid's last row is the empty "new row" placeholder, so it is skipped.
    int rowCount = dataGridView1.RowCount - 1;
    int k = int.Parse(textBox4.Text);
    // Cluster index -> indexes of the grid rows assigned to that cluster.
    Dictionary<int, List<int>> d1 = new Dictionary<int, List<int>>();

    // Copy the five intensity columns (I1..I5) of every row into a.
    a = new double[dataGridView1.RowCount, 8];
    for (int i = 0; i < rowCount; i++)
    {
        for (int j = 0; j < 5; j++)
        {
            a[i, j] = double.Parse(dataGridView1.Rows[i].Cells[j].Value.ToString());
        }
    }

    // Pick k random rows as the initial centroids.
    richTextBox1.Clear();
    Random Rnd = new Random();
    t = new double[k, 5];
    for (int i = 0; i < k; i++)
    {
        int ch = Rnd.Next(0, rowCount);
        for (int j = 0; j < 5; j++)
        {
            t[i, j] = double.Parse(dataGridView1.Rows[ch].Cells[j].Value.ToString());
            richTextBox1.AppendText(dataGridView1.Rows[ch].Cells[0].Value.ToString() + ", " + t[i, j].ToString() + "\n");
        }
        d1.Add(i, new List<int>());
    }

    // Assign every row to the cluster with the closest centroid.
    double[] k1 = new double[k];
    for (int j = 0; j < rowCount; j++)
    {
        for (int i = 0; i < k; i++)
        {
            double v = 0; // sum of squared differences, reset for each centroid
            for (int m = 0; m < 5; m++)
            {
                v = v + Math.Pow(t[i, m] - a[j, m], 2);
            }
            k1[i] = Math.Sqrt(v);
        }
        int s = 0; // index of the closest centroid found so far
        for (int i1 = 1; i1 < k; i1++)
        {
            if (k1[i1] < k1[s])
            {
                s = i1;
            }
        }
        d1[s].Add(j);
    }
}
Now I don't know how to store the ids of the intensities under their cluster number and then send them back through the loop until the loop finishes.
This may not be a perfect explanation, but I hope it is clear enough to follow.
Please help me out with this.
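One possible way to do the bookkeeping, sketched under some assumptions: the ids are integers held in an array parallel to the intensity rows, and each pass of the loop produces a labels array where labels[i] is the cluster number of row i. The names ClusterBookkeeping, GroupIds and SameLabels are invented for this example.

```csharp
using System;
using System.Collections.Generic;

static class ClusterBookkeeping
{
    // Groups ids by cluster: result[c] holds the ids of all rows labelled c.
    public static Dictionary<int, List<int>> GroupIds(int[] ids, int[] labels, int k)
    {
        var clusters = new Dictionary<int, List<int>>();
        for (int c = 0; c < k; c++) clusters[c] = new List<int>(); // empty clusters stay visible
        for (int i = 0; i < ids.Length; i++) clusters[labels[i]].Add(ids[i]);
        return clusters;
    }

    // True when two passes produced the same assignment, i.e. the loop can stop.
    public static bool SameLabels(int[] previous, int[] current)
    {
        if (previous.Length != current.Length) return false;
        for (int i = 0; i < previous.Length; i++)
        {
            if (previous[i] != current[i]) return false;
        }
        return true;
    }
}
```

With this layout nothing has to be sent back into the loop except the labels array: each pass overwrites it, SameLabels compares it with the previous pass to decide when to stop, and GroupIds is called once at the end to get the ids per cluster.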
Thanks