Hi,
I've got two tables in a database - one a list of schools (around 50 entries), and one a list of pupils (around 17,000). Both are spatially-attributed using PostGIS. Basically I need to allocate each pupil to their nearest school. Currently I'm looping through the schools repeatedly, using the ST_DISTANCE commands and 3(!) different SQL commands to find which pupils are nearest the school at the time. Once a pupil is allocated a school, they aren't selected in the next loop.
As a nasty hack this works fine but it takes a long time (> 10 mins) to run the code through enough repetitions. For my project specification I really need this to be much, much faster (< 5 mins hopefully), which might involve multithreading.
Regardless, this code doesn't appear very efficient. Can anybody suggest some improvements or better SQL queries?
Thanks very much! Nick Campbell
Table structures
SCHOOLS (id, name, urn [unique identifier], the_geom, capacity, cap_remain [remaining capacity after each loop])
RANDOM_CHILDREN (id, pref_school)
try:
conn = psycopg2.connect(......);
except:
print("Unable to connect");
sys.exit()
cur = conn.cursor()
cur2 = conn.cursor()
cur3 = conn.cursor()
scount = cur.execute("""SELECT COUNT(gid) FROM schools""")
scount = cur.fetchone()[0] ## Number of schools
pcount = cur.execute("""SELECT COUNT(id) FROM random_children""")
pcount = cur.fetchone()[0] ##Number of pupils
print scount,"schools and",pcount,"pupils."
i=0
while (i < 294): #Arbitrary number
schools = cur.execute("""SELECT name, the_geom, urn, capacity, cap_remain FROM schools WHERE cap_remain > 0 ORDER BY capacity ASC""")
rows = cur.fetchall()
print "Working school by school, allocating one at a time.."
for item in rows:
print " School name: ",item[0]," places:",item[4],"/",item[3]
loc = item[1]
urny = item[2]
nearest_pupil = cur2.execute("""SELECT id, ST_DISTANCE(the_geom, '%s') AS dist FROM random_children WHERE pref_school IS NULL ORDER BY dist ASC LIMIT 1""" % loc)
kids = cur2.fetchall()
bestkid = kids[0][0]
# Put the URN of the nearest school into the individual pupil
# Then....take one off the capacity of the school
# After one iteration, each school will have lost a "place"
# Keep going until all school places are filled
query = {'urn':urny, 'id':bestkid }
qry = "UPDATE random_children SET pref_school=%(urn)s WHERE id=%(id)s" % query # Gives child a school
#print "Query", qry
cur3.execute(qry)
qry = "UPDATE schools SET cap_remain = cap_remain-1 WHERE urn=%(urn)s" % query # Takes a place from school
#print "Query 2", qry
cur3.execute(qry)
conn.commit() # Must commit otherwise it doesn't run the command....
print " ....success"
i=i+1