Hey,
I've just joined up here hoping somebody might be able to help me with a project I've got on at work at the moment.
I've been learning Python with a "let's just do it and see what happens" approach, and I keep running into problems. I'm now 100% stuck on where to go next.
Basically, I've got a CSV with 4 columns in it:
Domain = string
Page = string
Linking = string
Size = integer
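Reading those four columns into named, typed rows makes the later steps easier to follow than indexing with r[0]..r[3]. This is a minimal sketch that assumes the file has no header row, exactly four columns per line, and a Size value that parses as a whole number; `Row`, `read_rows` and the sample data are hypothetical names made up for illustration.

```python
import csv
import io
from typing import List, NamedTuple


class Row(NamedTuple):
    # One CSV line, in the column order Domain, Page, Linking, Size
    domain: str
    page: str
    linking: str
    size: int


def read_rows(lines) -> List[Row]:
    # `lines` is any iterable of text lines, e.g. a file opened with newline=""
    rows = []
    for record in csv.reader(lines):
        if not record:  # skip blank lines
            continue
        domain, page, linking, size = record  # expects exactly four columns
        rows.append(Row(domain, page, linking, int(size)))
    return rows


# Hypothetical sample data standing in for the real CSV file
sample = io.StringIO("example.com,Page1,other.org,10\nexample.net,Page1,other.org,5\n")
rows = read_rows(sample)
print(rows[0])
```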
I need to perform various operations on these that seemed basic to me at first but soon got really complicated.
I'm converting my CSV to a GraphML file (XML-based) that will open in yEd.
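For reference, the bare structure of a GraphML file (a `graphml` root, one `graph`, then `node` and `edge` elements) can be written with the standard library's `xml.etree.ElementTree`. This is only a sketch: it assumes directed edges and string node ids, and it leaves out labels, data keys and yEd's own styling markup, so nodes may need labelling inside yEd. `to_graphml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(node_ids, edges):
    # node_ids: iterable of unique string ids
    # edges: iterable of (source_id, target_id) pairs
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node_id in node_ids:
        ET.SubElement(graph, "node", id=node_id)
    for i, (source, target) in enumerate(edges):
        # Edge ids only need to be unique within the file
        ET.SubElement(graph, "edge", id=f"e{i}", source=source, target=target)
    # ElementTree escapes special characters in attribute values
    return ET.tostring(root, encoding="unicode")


print(to_graphml(["a", "b"], [("a", "b")]))
```

Writing the returned string to a `.graphml` file gives something yEd should be able to open, though it will not look like much until labels are added.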
I need to be able to get a list of all the 'nodes':
Every unique item within the 'Domain' column is a node
Every unique item within the 'Linking' column is a node
Every item within the 'Page' column is a node. However, this is where it gets complicated, and I'm struggling to put it in plain text: every unique combination of 'Domain' and 'Page' needs to be listed. For example, if "Page1" appeared twice but with a different 'Domain' each time, "Page1" would need to be listed twice. (I decided to do this with MD5 hashes.)
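The three node rules above map naturally onto two sets and one dictionary. In this sketch, assuming rows are (domain, page, linking, size) tuples, a page node is keyed by an MD5 of Domain and Page joined with a separator character (assumed never to occur in the data), so ("ab", "c") and ("a", "bc") get different ids. Domain and Linking values are kept as separate sets; whether a value appearing in both should be one node is left open. `page_node_id` and `collect_nodes` are hypothetical names.

```python
import hashlib


def page_node_id(domain, page):
    # "\x1f" (unit separator) is assumed not to appear in Domain or Page values
    return hashlib.md5(f"{domain}\x1f{page}".encode("utf-8")).hexdigest()


def collect_nodes(rows):
    # rows: iterable of (domain, page, linking, size) tuples
    domains = set()   # every unique Domain
    linking = set()   # every unique Linking
    pages = {}        # MD5 id -> (domain, page), one entry per unique pair
    for domain, page, link, _size in rows:
        domains.add(domain)
        linking.add(link)
        pages[page_node_id(domain, page)] = (domain, page)
    return domains, linking, pages


rows = [
    ("example.com", "Page1", "other.org", 10),
    ("example.net", "Page1", "other.org", 5),
    ("example.com", "Page1", "third.org", 2),
]
domains, linking, pages = collect_nodes(rows)
print(sorted(domains), sorted(linking), len(pages))
```

Here "Page1" ends up as two page nodes because it appears under two different domains. A plain tuple key like `(domain, page)` would deduplicate just as well; the MD5 is only useful as a fixed-length string id for the GraphML file.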
That is the first stage of this project anyway. There is another part after this (connecting all the nodes up), but I can't get onto that until I solve this :(
This is the code that I currently use:
# Import needed packages
import csv
import decimal
import hashlib
from operator import itemgetter

from useful_funcs import collections  # my helper module (printy, removeduplicates)

# Import CSV (or database in future); the csv module expects newline=""
with open("C:\\Users\\RobH\\Desktop\\xml.csv", "r", newline="") as inputFile:
    reader = csv.reader(inputFile)
    # Store CSV (or DB) in memoryTable; each row is [Domain, Page, Linking, Size]
    memoryTable = [row for row in reader]

# Hash columns 0 and 1 (Domain, Page) and append the MD5 hex digest as column 4.
# The "|" separator (assumed not to appear in the data) stops e.g.
# ("ab", "c") and ("a", "bc") from producing the same hash.
for r in memoryTable:
    string2Hash = "|".join(r[0:2])
    r.append(hashlib.md5(string2Hash.encode("utf-8")).hexdigest())

# Hash2 columns 0, 1 and 2 (Domain, Page, Linking) and append it as column 5
for r in memoryTable:
    string4Hash = "|".join(r[0:3])
    r.append(hashlib.md5(string4Hash.encode("utf-8")).hexdigest())

# Sort memoryTable by hash so rows with the same Domain+Page are adjacent
memoryTable.sort(key=itemgetter(4))

# Copy memoryTable to hashTable and hash2Table (new lists holding the same rows).
# hash2Table is sorted by its own hash so its duplicates are adjacent too.
hashTable = list(memoryTable)
hash2Table = sorted(memoryTable, key=itemgetter(5))

# Remove all hash duplicates from hashTable (one row per Domain+Page)
hashTable2 = collections.removeduplicates(hashTable, 4)
# collections.printy(hashTable2)

# Search memoryTable for hash duplicates and add up all values for first edges
# Append added up hash values to hashTable

# Remove all hash2 duplicates from hash2Table (nodes)
hash2Table2 = collections.removeduplicates(hash2Table, 5)
# collections.printy(hash2Table2)

# Search memoryTable for hash2 duplicates and add up all Size values for second edges,
# then append the total to a copy of each hash2Table2 row
hash2Table2_2 = []
for row in hash2Table2:
    total = sum(decimal.Decimal(r[3]) for r in memoryTable if r[5] == row[5])
    hash2Table2_2.append(row + [total])
# collections.printy(hash2Table2_2)
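The "add up all values" steps in the comments above (summing Size for every row that shares a hash) can also be done in a single pass with the standard library's `collections.Counter`, keyed directly on tuples, with no sorting or deduplicating first. This sketch assumes, from those comments, that first edges are keyed on Domain+Page and second edges on Domain+Page+Linking, and that Size is already a number. `sum_sizes` is a hypothetical name; the stdlib `collections` imported here is separate from the `useful_funcs.collections` helper module.

```python
from collections import Counter


def sum_sizes(rows):
    # rows: iterable of (domain, page, linking, size) tuples with numeric size
    first_edges = Counter()   # (domain, page) -> total size
    second_edges = Counter()  # (domain, page, linking) -> total size
    for domain, page, linking, size in rows:
        first_edges[(domain, page)] += size
        second_edges[(domain, page, linking)] += size
    return first_edges, second_edges


rows = [
    ("example.com", "Page1", "other.org", 10),
    ("example.com", "Page1", "other.org", 5),
    ("example.com", "Page1", "third.org", 2),
]
first_edges, second_edges = sum_sizes(rows)
print(first_edges[("example.com", "Page1")])                  # 17
print(second_edges[("example.com", "Page1", "other.org")])    # 15
```

The keys of `second_edges` are also exactly the unique Domain+Page+Linking combinations, so this replaces the separate deduplication step for that table.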
And the collections module (in my useful_funcs package) is:
def printy(hashTable):
    # Print each row of a table on its own line
    for r in hashTable:
        print(r)

def removeduplicates(hashTable, column):
    # Keep only the first row for each distinct value in `column`.
    # Works whether or not the table is sorted, and never modifies
    # the input while looping over it.
    seen = set()
    result = []
    for r in hashTable:
        if r[column] not in seen:
            seen.add(r[column])
            result.append(r)
    return tuple(result)
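Another way to drop duplicates is to sort by the column first and then keep one row from each run of equal values; `itertools.groupby` performs exactly that adjacency check. This is a sketch, and it only works if the table is sorted by the same column that is being deduplicated. `removeduplicates_sorted` is a hypothetical name, and unlike a seen-set approach it returns rows in key order rather than their original order.

```python
from itertools import groupby
from operator import itemgetter


def removeduplicates_sorted(table, column):
    # Sort by `column` so equal values form a single run; sorted() is stable,
    # so the row kept from each run is the earliest one in `table`.
    key = itemgetter(column)
    return tuple(next(group) for _value, group in groupby(sorted(table, key=key), key=key))


table = [
    ["d1", "p1", "x"],
    ["d2", "p2", "y"],
    ["d1", "p1", "z"],
]
print(removeduplicates_sorted(table, 1))
```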
If nobody wants to help me, that's OK. I'm sure I'll solve it at some point, but at the moment it's really, REALLY annoying me :(
Thanks a lot,
Rob