Hi there!
I have run into a new problem, this time with the re.findall() function.
The objective of this code is to iterate over the rows of an Excel sheet and write them to another Excel sheet, with the species name and the gene name in separate columns.
The regular expression seems to work fine for the first 4 rows of the Excel sheet, but the last 3 are not printed, even though they contain the same sort of names as the 4 that work. So it should be working...
*Example:
species name: gene name:
Homo sapiens CYP2C19
Homo sapiens CYP2C9
Danio rerio CYP39A1
Xenopus leavis CYP39A1 **Text is printed until here **
Mus musculus Cyp2c65
Mus musculus Cyp2c66
Danio rerio Cyp2c38
*
Without the re.findall() call all the rows are read, so is this a bug in re.findall()?
Can someone tell me what I'm doing wrong?
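To separate the regular expression from the Excel handling, the pattern used in the code below can be tried directly on the example names above. This is only a minimal sketch: the list `names` is a hand-typed copy of the example rows, and the `re.IGNORECASE` variant is an assumption added purely for comparison, not part of the original script.

```python
import re

# Hand-typed copies of the example rows (not read from the Excel file).
names = [
    "Homo sapiens CYP2C19",
    "Homo sapiens CYP2C9",
    "Danio rerio CYP39A1",
    "Xenopus leavis CYP39A1",
    "Mus musculus Cyp2c65",
    "Mus musculus Cyp2c66",
    "Danio rerio Cyp2c38",
]

# Same pattern as in the script: the literal "CYP" is matched case-sensitively.
pattern = re.compile(r'(\w+\s\w+)\s*(CYP\w+)')
# Assumed variant, for comparison only: identical pattern, but ignoring case.
pattern_nocase = re.compile(r'(\w+\s\w+)\s*(CYP\w+)', re.IGNORECASE)

# findall() returns a list of (group 1, group 2) tuples, or [] when nothing matches.
strict = [pattern.findall(name) for name in names]
relaxed = [pattern_nocase.findall(name) for name in names]

for name, s, r in zip(names, strict, relaxed):
    print("%-24s strict=%r  ignorecase=%r" % (name, s, r))
```

Comparing the two columns of output shows which of the example names the original pattern accepts and which it returns an empty list for.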
Here is part of my code (reading only the rows of the Excel sheet):
(see below for the Excel sheet used in this test)
import xlrd
import xlwt
import re
# inputfile:
wb = xlrd.open_workbook('Test_input.xls')
#Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)
print "Number of rows: %s Number of cols: %s" % (sh.nrows, sh.ncols)
# Create an output workbook and worksheets
wbk = xlwt.Workbook()
sheet_total = wbk.add_sheet('names total')
sheet_split = wbk.add_sheet('names split')
#Check the sheet names
wb.sheet_names()
#Algorithm for reading and writing from file to file per row:
#Index individual cells:
rowx = 1
colx = 0
row = 0 # row counter for new Excel sheet
counter_row = 1 # while counter
print 'Printing rows of Excel sheet:'
sheet_total.write(row,0,'Rows') # writes header in new Excel sheet
sheet_split.write(row,0,'Rows')
sheet_split.write(row,1,'Rows')
while counter_row < sh.nrows:
    row_cell = sh.cell(rowx,colx).value
    tuples = re.findall(r'(\w+\s\w+)\s*(CYP\w+)', row_cell)
    print 'TUPLES:', tuples
    rowx += 1
    print 'print_row:', rowx, colx, row_cell
    row += 1
    for tuple in tuples:
        print tuple ## The whole match, print on sheet 1
        sheet_total.write(row,0,tuple)
        print tuple[0] ## Species name (group 1), print sheet 2, col 1
        sheet_split.write(row,0,tuple[0])
        print tuple[1] ## Gene name (group 2), print sheet 2, col 2
        sheet_split.write(row,1,tuple[1])
    if rowx == sh.nrows:
        rowx = 1
    counter_row += 1
wbk.save('reformatted.data.xls')
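
In the loop above, a cell for which re.findall() returns an empty list is skipped without any message, because the inner for loop simply does not run. A minimal sketch of the same row-by-row split that also records the cells with no match could look like this. The names `split_cells` and `cells` are hypothetical, and a plain list stands in for the Excel column so the example runs without xlrd.

```python
import re

# Pattern copied unchanged from the script.
PATTERN = re.compile(r'(\w+\s\w+)\s*(CYP\w+)')

def split_cells(cell_values):
    """Split each cell into species and gene name.

    Returns (matched, unmatched):
      matched   -- list of (row_index, species, gene) tuples
      unmatched -- list of (row_index, cell_text) for cells with no match
    """
    matched, unmatched = [], []
    # Row numbering starts at 1 because row 0 holds the header.
    for row_index, text in enumerate(cell_values, 1):
        hits = PATTERN.findall(text)
        if not hits:
            unmatched.append((row_index, text))
        for species, gene in hits:
            matched.append((row_index, species, gene))
    return matched, unmatched

# Hypothetical stand-in for the values of column 0 below the header row.
cells = ["Homo sapiens CYP2C19", "Mus musculus Cyp2c65"]
matched, unmatched = split_cells(cells)
print(matched)
print(unmatched)
```

With xlrd, the list could presumably be built from `sh.col_values(colx, 1)` instead of the hand-typed `cells`, and each entry of `matched` written with `sheet_split.write(...)` as in the original loop.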