Hello everyone,
I am trying to find duplicates for part of a string, not the whole string. The strings are stored in a file, one per line, and many of the lines (though not all) look something like this:
0:1:CME,20100601,14:07:53.375,CCD,GE,201009,FUT,XGCCD,0G4L7D294,1ig53ov1n1qm3z,Leg Fill,00006L3W,S,00000,2,1,99.175,20100601,14:01:04
where the '0' at the very beginning is common throughout a block of lines. The next block will have a '1' common throughout the block, and so on. The part of the string from CCD until the end can be duplicated, and I have to find how many such duplicate lines there are for each '0', '1', and so on. The file can contain any combination of strings, not just the one in the example above, but if it contains duplicates at all, it is the part starting from the 'C' of 'CCD' until the end that is repeated.
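A minimal Python sketch of the counting step described above. It rests on two assumptions that are not confirmed anywhere in this post: that the block number is the text before the first ':' on the line, and that the repeated part always begins at the fourth comma-separated field (where 'CCD' sits in the example). The function name `count_duplicates` and the `key_field` parameter are made up for illustration.

```python
from collections import Counter, defaultdict

def count_duplicates(lines, key_field=3):
    """Return {block_id: {key: count}} for keys seen more than once in a block.

    Assumption: the block id is the text before the first ':' and the
    part to compare starts at comma-separated field `key_field` (0-based).
    """
    counts = defaultdict(Counter)
    for line in lines:
        # Split off only the leading fields; the remainder stays in one piece.
        fields = line.strip().split(",", key_field)
        if len(fields) <= key_field or ":" not in fields[0]:
            continue  # line does not have the assumed layout, skip it
        block_id = fields[0].split(":", 1)[0]
        counts[block_id][fields[key_field]] += 1
    # Keep only the keys that occur more than once in their block.
    return {
        block: {key: n for key, n in counter.items() if n > 1}
        for block, counter in counts.items()
    }
```

Since the function accepts any iterable of lines, an open file object can be passed to it directly. Lines that do not match the assumed layout are skipped rather than treated as errors, because not every line in the file has this format.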
After I find the duplicates, I have to compare them with another file that contains all the unique strings extracted from the first file (the one with duplicates). I want to know whether the file with the non-duplicate values contains every string that appears in the first file, that is, I want to make sure that all of the strings have been extracted uniquely and stored in the other file.
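The comparison against the second file could be sketched as a set difference. This assumes, without confirmation from the post, that the file of unique values stores only the compared part of each string (from 'CCD' onward), one per line, and that this part starts at the fourth comma-separated field in the first file. The names `extract_key` and `check_unique_file` are hypothetical.

```python
def extract_key(line, key_field=3):
    """Return the part of the line from field `key_field` onward, or None."""
    fields = line.strip().split(",", key_field)
    return fields[key_field] if len(fields) > key_field else None

def check_unique_file(source_lines, unique_lines, key_field=3):
    """Return (missing, repeated).

    missing  -- keys found in the source lines but absent from the unique lines
    repeated -- number of surplus entries in the unique lines (0 if truly unique)
    Assumption: each non-empty unique line holds exactly one key.
    """
    source_keys = set()
    for line in source_lines:
        key = extract_key(line, key_field)
        if key is not None:
            source_keys.add(key)
    unique_list = [line.strip() for line in unique_lines if line.strip()]
    unique_keys = set(unique_list)
    missing = source_keys - unique_keys
    repeated = len(unique_list) - len(unique_keys)
    return missing, repeated
```

An empty `missing` set together with `repeated == 0` would mean every string from the first file appears exactly once in the second. If the second file actually stores whole lines rather than just the compared part, `extract_key` would need to be applied to its lines as well.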
Can anyone please help? I would be grateful.