Hello all,

I am trying to write a clustering algorithm where, given location of certain points (I.e, latitude and longitude) I need to choose 50 of these to be cluster centers such that the diameter of each cluster is minimized. To that end, I believe that the program should choose the next center to be the point that is farthest away from all the previous centers. This is the algorithm I am working on, however, I believe it is not working properly because it chooses points that are very close to each other to be centers:

while number of centers <= 50 {
	max_dist = -9999
	for each point not a center {
		min_dist = 9999
		for each center {
			distance = (distance between current non center and current center)
			if distance < min_dist {
				min_dist = distance
			}
		}
		if min_dist > max_dist {
			max_dist = min_dist
			next_center = current non center
		}
	}
	numer of centers++
	add next center to list of centers
	erase next center from list of non centers
}

If anyone can offer any insight on this, I would greatly appreciate it. The algorithm seems to only be selecting points far away from the first center (which is chosen randomly outside of the above loop.)

This is the algorithm I am working on, however, I believe it is not working properly because it chooses points that are very close to each other to be centers:

If anyone can offer any insight on this, I would greatly appreciate it. The algorithm seems to only be selecting points far away from the first center (which is chosen randomly outside of the above loop.)

The pseudocode looks okay; I've attached a quick little .NET console program that appears to confirm your approach. This suggests to me that there might be some errors in your implementation of that algorithm. If you post actual code, we could have a look at it.

I am trying to write a clustering algorithm where, given location of certain points (I.e, latitude and longitude) I need to choose 50 of these to be cluster centers such that the diameter of each cluster is minimized. To that end, I believe that the program should choose the next center to be the point that is farthest away from all the previous centers.

Questions:

  • Are you assigning non-center points to the cluster with the nearest center?
  • Are you limited to choosing existing points as cluster centers, or can you use any lat./long. coordinate for a center?
  • Does "diameter" mean the distance from a cluster's center to the farthest point in each cluster?

It sounds like an iterative approach--something like Lloyd's algorithm--might be appropriate:

  1. Pick an initial set of centers (randomly, or however you like)
  2. Determine which non-center points are in each cluster
  3. Find the centroid of each cluster (i.e., the average lat./long. for all points in the cluster)
  4. Use the centroid of each cluster as the new center for that cluster (or the existing point closest to the centroid if you're limited to those)
  5. If the new cluster is "good enough" (e.g., cluster centers have moved less than some minimum distance), you're done; otherwise, go back to step 2

The size of each cluster isn't directly checked by this method, but it should result in reasonably small groupings--clusters with most of their points grouped off-center tend to move in that direction as you iterate, which will likely end up assigning the outliers to other clusters.

commented: like the spirit :) ... +2

Hello,

Thank you for your suggestions. Turns out my algorithm was correct, problem was with how I was calculating the distance.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.