I am very new to Perl and am not even sure whether I should be using awk instead. Hopefully someone out there can help me put together a script for this problem. Here is some sample data (it is tab-delimited in my file):
20 scaffold189_125 13634 13384 13884 D aga-miR-1890_MI -2 1.28e-01
20 scaffold189_125 17953 17703 18203 D aga-miR-1890_MI -2 1.28e-01
22 scaffold208_17_ 4663 4413 4913 D aga-miR-315_MIM -2 9.75e-03
20 scaffold1052_5_ 11884 11634 12134 D aga-miR-34_MIMA -1 4.34e-03
20 scaffold1052_5_ 11705 11455 11955 D aga-miR-73_MIMA -2 6.32e-02
20 scaffold208_22_ 4742 4498 4998 D aga-miR-24_MIMA -1 3.29e-01
I want to take a file with several columns. First, I want to sort the data by column 2 so that all lines with the same value in column 2 are grouped together. Lines that do not belong to any group should not be deleted. The grouped lines from the sample are:
20 scaffold189_125 13634 13384 13884 D aga-miR-1890_MI -2 1.28e-01
20 scaffold189_125 17953 17703 18203 D aga-miR-1890_MI -2 1.28e-01
20 scaffold1052_5_ 11884 11634 12134 D aga-miR-34_MIMA -1 4.34e-03
20 scaffold1052_5_ 11705 11455 11955 D aga-miR-73_MIMA -2 6.32e-02
Notice that scaffold208_17 and scaffold208_22 are not the same, so they do not form a group.
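The grouping step can be sketched as follows. This is only an illustration in Python rather than Perl or awk (the same idea maps onto a Perl hash of arrays); the function name `group_by_column2` and the `SAMPLE` list are made up for the example, and it assumes every line has tab-separated fields with the key in column 2.

```python
from collections import defaultdict

# Hypothetical sample rows, rebuilt as tab-delimited lines like the real file.
SAMPLE = [
    "\t".join(row.split())
    for row in [
        "20 scaffold189_125 13634 13384 13884 D aga-miR-1890_MI -2 1.28e-01",
        "20 scaffold189_125 17953 17703 18203 D aga-miR-1890_MI -2 1.28e-01",
        "22 scaffold208_17_ 4663 4413 4913 D aga-miR-315_MIM -2 9.75e-03",
        "20 scaffold208_22_ 4742 4498 4998 D aga-miR-24_MIMA -1 3.29e-01",
    ]
]


def group_by_column2(lines):
    """Map each column-2 value to the rows sharing it, in input order."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        groups[fields[1]].append(fields)  # fields[1] is column 2
    return groups


groups = group_by_column2(SAMPLE)
for name, rows in groups.items():
    print(name, len(rows))
```

Because the match is on the whole column-2 string, scaffold208_17_ and scaffold208_22_ end up in separate single-line groups.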
Then, for each set of lines with matching values in column 2, I want the program to treat the values in columns 4 and 5 of a line as a range. It should then scan the column 3 values of the other lines in that group and determine whether each one falls within that range. If a value does fall within the range, I want that line deleted; if it does not, the line stays. I want this to loop over every line with a matching column 2, checking whether the value in column 3 falls within any of the ranges of the other lines in the group. This will remove redundancy from my data set. I then need the program to do this for every group with a matching column 2.
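One way to read the rule above, sketched in Python rather than Perl or awk (the logic should port directly to a Perl hash keyed on column 2). Two points are assumptions, not something the description settles: the range endpoints are treated as inclusive, and when two lines fall inside each other's ranges, the one that appears first in the file is kept. The names `drop_redundant` and `SAMPLE` are made up for the example.

```python
# Hypothetical sample rows, rebuilt as tab-delimited lines like the real file.
SAMPLE = [
    "\t".join(row.split())
    for row in [
        "20 scaffold189_125 13634 13384 13884 D aga-miR-1890_MI -2 1.28e-01",
        "20 scaffold189_125 17953 17703 18203 D aga-miR-1890_MI -2 1.28e-01",
        "22 scaffold208_17_ 4663 4413 4913 D aga-miR-315_MIM -2 9.75e-03",
        "20 scaffold1052_5_ 11884 11634 12134 D aga-miR-34_MIMA -1 4.34e-03",
        "20 scaffold1052_5_ 11705 11455 11955 D aga-miR-73_MIMA -2 6.32e-02",
        "20 scaffold208_22_ 4742 4498 4998 D aga-miR-24_MIMA -1 3.29e-01",
    ]
]


def drop_redundant(lines):
    """Keep a line unless its column 3 lies in the range of an earlier kept line
    with the same column 2. Input order is preserved."""
    kept = []
    kept_ranges = {}  # column-2 value -> list of (low, high) from kept lines
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        key = fields[1]
        position = int(fields[2])
        low, high = sorted((int(fields[3]), int(fields[4])))
        ranges = kept_ranges.setdefault(key, [])
        # Assumption: endpoints are inclusive.
        if any(lo <= position <= hi for lo, hi in ranges):
            continue  # redundant with an earlier kept line in this group
        ranges.append((low, high))
        kept.append(line)
    return kept


result = drop_redundant(SAMPLE)
for line in result:
    print(line)
```

A line is only compared against ranges from other lines, since its own column 3 would always fall inside its own columns 4 and 5 range. No explicit sort is needed here because the dictionary lookup does the grouping, so the surviving lines come out in their original order.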
The output would include any lines that are not part of a group (such as scaffold208_17) and the non-redundant lines belonging to a group. For the set above:
20 scaffold189_125 13634 13384 13884 D aga-miR-1890_MI -2 1.28e-01
20 scaffold189_125 17953 17703 18203 D aga-miR-1890_MI -2 1.28e-01
22 scaffold208_17_ 4663 4413 4913 D aga-miR-315_MIM -2 9.75e-03
20 scaffold1052_5_ 11884 11634 12134 D aga-miR-34_MIMA -1 4.34e-03
20 scaffold208_22_ 4742 4498 4998 D aga-miR-24_MIMA -1 3.29e-01
Many thanks!