Hi all Perl programmers,
I have data like this, with 6 columns:
LINES XY1 XY2 XY3 XY4 XY5
P1 Z/Z T/T -/- T/T T/T
P2 A/A A/A G/G Z/Z T/T
1 G/G T/T G/G T/T G/G
2 T/T A/A C/C C/C T/T
3 T/T G/G T/T G/G T/T
4 A/A C/C A/A A/A A/A
5 A/A A/A T/T T/T A/A
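To make the layout above concrete, here is a minimal sketch of loading such a table into Perl, assuming the columns are separated by whitespace and the first row is the header. The name `parse_table` is hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Parse whitespace-separated genotype rows into:
#   - a list of marker names (XY1 ... XY5),
#   - a hash mapping each row label (P1, P2, 1, 2, ...) to its genotypes,
#   - the row labels in their original order.
sub parse_table {
    my (@rows) = @_;
    my @header = split ' ', shift @rows;   # LINES XY1 ... XY5
    shift @header;                         # drop the "LINES" label
    my (%calls, @order);
    for my $row (@rows) {
        next unless $row =~ /\S/;          # skip blank lines
        my ($label, @genotypes) = split ' ', $row;
        $calls{$label} = \@genotypes;
        push @order, $label;
    }
    return (\@header, \%calls, \@order);
}

my @sample = (
    'LINES XY1 XY2 XY3 XY4 XY5',
    'P1 Z/Z T/T -/- T/T T/T',
    'P2 A/A A/A G/G Z/Z T/T',
    '1 G/G T/T G/G T/T G/G',
);
my ($markers, $calls, $order) = parse_table(@sample);
print "markers: @$markers\n";
print "$_: @{ $calls->{$_} }\n" for @$order;
```

For a real file, the same rows could be read with `open my $fh, '<', $path or die "Cannot open $path: $!";` followed by `my @rows = <$fh>;`, where `$path` is a placeholder for the actual file name. The `split ' '` form also discards the trailing newline on each row.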
- First, I want to find how many columns (XY1 to XY5) differ between P1 and P2.
  eq means: P1 and P2 contain the same letters (alleles); if either P1 or P2 contains Z/Z or -/-, I also treat the column as eq.
- Next, I will compare the column values of line 1 with P2 across all columns (XY1 to XY5), working horizontally, and then continue with the remaining lines 2 to 5. If the values match I want to give 1, otherwise 0.
- For each of lines 1 to 5, I will sum these values across columns XY1 to XY5, but count only the columns that differ between P1 and P2.
- I will calculate the percentage of matching with P2 for each of lines 1 to 5 by dividing the sum by the number of markers that differ between P1 and P2.
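The steps above could be sketched in Perl roughly as follows. This is only a sketch under some assumptions: the function names (`is_missing`, `parent_flags`, `score_line`) are made up for illustration, a line is compared with P2 by plain string equality (so Z/Z or -/- in P2 scores 0 for the line), and the percentage is reported as 0 when no column differs between P1 and P2.

```perl
use strict;
use warnings;

# Z/Z and -/- are treated as missing calls (taken from the eq rule).
sub is_missing {
    my ($g) = @_;
    return $g eq 'Z/Z' || $g eq '-/-';
}

# Step 1: flag each column as 'eq' or 'nq' for P1 versus P2.
sub parent_flags {
    my ($p1, $p2) = @_;
    my @flags;
    for my $i (0 .. $#$p1) {
        my $same = is_missing($p1->[$i]) || is_missing($p2->[$i])
                || $p1->[$i] eq $p2->[$i];
        push @flags, $same ? 'eq' : 'nq';
    }
    return @flags;
}

# Steps 2 to 4: score one line against P2.
# Returns the 0/1 matches, the sum over 'nq' columns, and the percentage.
sub score_line {
    my ($line, $p2, $flags) = @_;
    my @match;
    my ($sum, $informative) = (0, 0);
    for my $i (0 .. $#$line) {
        my $hit = $line->[$i] eq $p2->[$i] ? 1 : 0;   # plain string comparison
        push @match, $hit;
        next unless $flags->[$i] eq 'nq';             # only columns where P1 and P2 differ
        $informative++;
        $sum += $hit;
    }
    # Assumption: report 0 when there are no differing columns at all.
    my $pct = $informative ? 100 * $sum / $informative : 0;
    return (\@match, $sum, $pct);
}

my %calls = (
    P1 => [qw(Z/Z T/T -/- T/T T/T)],
    P2 => [qw(A/A A/A G/G Z/Z T/T)],
    1  => [qw(G/G T/T G/G T/T G/G)],
    2  => [qw(T/T A/A C/C C/C T/T)],
    3  => [qw(T/T G/G T/T G/G T/T)],
    4  => [qw(A/A C/C A/A A/A A/A)],
    5  => [qw(A/A A/A T/T T/T A/A)],
);

my @flags = parent_flags($calls{P1}, $calls{P2});
print "P1 @flags\n";
for my $label (1 .. 5) {
    my ($match, $sum, $pct) = score_line($calls{$label}, $calls{P2}, \@flags);
    print "$label @$match $sum $pct\n";
}
```

With the sample data, only XY2 is flagged nq, so each line's sum and percentage depend on that single column. Heterozygous calls written in a different order (for example A/T versus T/A) would count as a mismatch here, since no normalisation is attempted.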
I am expecting output like this:
LINES XY1 XY2 XY3 XY4 XY5
P1 eq nq eq eq eq SUM %
P2 1
1 0 0 1 0 0 0 0
2 0 1 0 0 1 1 100
3 0 0 0 0 1 0 0
4 1 0 0 0 0 0 0
5 1 1 0 0 0 1 100
My real data has more than 5000 rows like this. At present I am doing this in Excel 2010 with various formulas, but it takes a lot of effort.
I would like to do this in Perl, but I am a newbie in Perl; so far I have only succeeded in reading the file and printing it to the screen.
I really need help solving this in Perl, with code. Any help would be appreciated.