Hello everyone,

I have an issue with reading duplicate records in a text file.

I need to read the file, look for any duplicate data/keys in it, and write

all the duplicate records to another file.

How can I do that in a loop? :-/

Any help? :)

Thanks!

That really depends on what you are classifying as a duplicate record. What is your primary key? Is it composite? Can you provide an example of the data?

I smell homework here... :)

Okay, so start us off. You have a text file. Can you open it and read lines from it, in a loop? Go ahead and do that, just dumping the lines straight to the screen.

That'll make a good first step.

Heheheh... ;)

Actually, my issue is how to get the duplicate data in the text file (.txt) and display it.

All I know is how to display the data without the duplicates.

I need to get the duplicate data and put it in a file.

Here's my code.
I have a Scanner, an ArrayList, and a TreeSet.
In the Scanner's while loop I add each line to the ArrayList:


while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    arrayList.add(line);
}

// add the whole ArrayList to the TreeSet, which drops the duplicates
set.addAll(arrayList);

// iterate over the set to print the data without the duplicates
Iterator<String> it = set.iterator();
while (it.hasNext()) {
    System.out.println("print data w/o duplicates " + it.next());
}

How can I display only the duplicates?

thanks!:)

Again, what IS a duplicate record?

Take this data for example:
Id  Name            Title      Salary
1   Rodgers, Frank  Developer  55,000
2   Smith, Joe      Developer  55,000
3   Rodgers, Frank  Team Lead  55,000

Are records 2 & 3 duplicate b/c they share the same salary as 1?
Is 3 a duplicate of 1 b/c they share the same name?
Is 2 a duplicate of 1 b/c they share the same title?
Are none of them duplicates b/c they have different id numbers?

Solving your homework depends on what a duplicate record actually is. Simply saying you need duplicate records is too abstract to code; you need parameters that define the duplication.

"i need to read the file and look for any duplicates data/keys in the text file and write them to another file"

Can you clarify this? Are you trying to eliminate duplicate lines of data, or do you need to parse the lines into data and keys? That would be an added step.

In any case, if you want to display the duplicates only, maybe you should check each item against the rest of the list when you're reading it into the list. If it's already in the list, do whatever you need to do with it - write it to a file, skip adding it to the list, put it in another list, paint it blue and ship it to Waukegan, whatever you like.
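For what it's worth, here is a minimal sketch of that check-as-you-read idea. It assumes a "duplicate" simply means a line that is exactly equal to an earlier line; the class and method names (DuplicateLines, findDuplicates) and the sample input are made up for illustration. ArrayList.contains scans the whole list on every call, so this is fine for small files but slow for large ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DuplicateLines {
    // Returns every line that repeats an earlier line, in input order.
    // Assumption: two lines are duplicates when String.equals() says so.
    static List<String> findDuplicates(Scanner scanner) {
        List<String> seen = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (seen.contains(line)) {
                // already in the list: keep it aside as a duplicate
                duplicates.add(line);
            } else {
                seen.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // A Scanner over a String stands in for the real file here.
        Scanner scanner = new Scanner("apple\npear\npear\nplum\n");
        System.out.println(findDuplicates(scanner)); // prints [pear]
    }
}
```

What you do with the returned list (write it to a file, print it, and so on) is up to you.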

Actually, I'm not trying to eliminate the duplicate lines; all I need is to put all the duplicate lines into another file.

For instance, my file "data.txt" has duplicate lines:

AAAAA
BBBBB
BBBBB
CCCCC

I need to copy the [BBBBB] lines to another file.

How should I do that in a loop?

Thanks

Maybe you should check each item against the rest of the list when you're reading it into the list. If it's already in the list, do whatever you need to do with it - write it to a file, skip adding it to the list, put it in another list, paint it blue and ship it to Waukegan, whatever you like.

Or, if you want a second loop after you've read everything into the list, you pretty much have to check each item against every other item. Now you're talking about the generic problem of eliminating duplicated items from a list.
The easiest thing to do is to sort the list and go through it: is this item like the one after it? If so, put it in a second list. Write the second list to the file.

I don't know how to display the duplicate lines in a loop, because whenever I loop over the ArrayList it just displays every line in 'data.txt'.

// this displays every line, even the ones that are not duplicates
for (int i = 0; i < arrayList1.size(); i++) {
    System.out.println(arrayList1.get(i));
}

Thanks

I just need to display the duplicate lines. How should I do that? :)

Yes, that just goes through and for each item in the ArrayList, it prints it.

That's not what you want, though. For each item in the list, you want to check if it's a duplicate of some other item in the list.

Suppose you have a list of Strings:

blueberry
tangerine
apricot
kiwi
durian

and I give you another String:
gorgonzola

How do you check whether it's in the list?

Hello again,

Thanks for replying to this thread. I've already figured out how to do it. :)

Each time you read an element from the txt file, you start a loop and compare it to all the elements read until that point.

You probably also have to check that the place where you dump the duplicates is itself free of duplicates. If your data.txt looks like this:

AAA
BBB
BBB
BBB

Your duplicates file will probably look like this:
BBB
BBB

...
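If you want each duplicated line to show up only once in the output, one option is to count the occurrences first. This is just a sketch under that assumption; DuplicateCounter and distinctDuplicates are made-up names, and the lines are assumed to be already read into a List<String>.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateCounter {
    // Returns each line that occurs more than once, listed a single time,
    // in order of first appearance.
    static List<String> distinctDuplicates(List<String> lines) {
        // LinkedHashMap keeps the keys in insertion order
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum); // count = count + 1
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("AAA", "BBB", "BBB", "BBB");
        System.out.println(distinctDuplicates(lines)); // prints [BBB]
    }
}
```

With the data.txt above, this gives a single BBB instead of two.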

While writing this in the quick reply window, I realized the thread has 2 pages and saw it's already solved.

Then you can (probably) use some kind of Set, which doesn't allow duplicates, and test for membership with someSet#contains.

Use a Hashtable to store records as they come in. Looking up whether an element is in a Hashtable is O(1) on average. If hashtable.containsKey(record) returns true, print it to file; otherwise put the record in the table :)
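A rough sketch of the hash-based lookup, end to end with file I/O. Note that it swaps in a HashSet for the Hashtable, since only the keys matter here; Set.add returns false when the element is already present, so one call does both the lookup and the insert. The names DuplicateWriter, writeDuplicates, and duplicates.txt are assumptions for illustration, and every repeated occurrence is written (BBB three times in the input gives BBB twice in the output).

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class DuplicateWriter {
    // Copies every line that repeats an earlier line from input to output.
    // Returns the number of lines written.
    static int writeDuplicates(Path input, Path output) throws IOException {
        Set<String> seen = new HashSet<>();
        int written = 0;
        try (Scanner scanner = new Scanner(input);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // add() returns false if the line was already in the set
                if (!seen.add(line)) {
                    out.println(line);
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("data.txt");          // file name from the thread
        Path output = Paths.get("duplicates.txt");   // hypothetical output name
        if (!Files.exists(input)) {
            System.out.println("data.txt not found");
            return;
        }
        int count = writeDuplicates(input, output);
        System.out.println(count + " duplicate line(s) written");
    }
}
```

Each lookup is constant time on average, so the whole file is handled in one pass.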

I've already figured it out, thanks again! :)

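Assuming we have already declared arrayList and finalFile, and the lines have already been added to the ArrayList (which I think you can do):

Sort the list, then loop through it with a for loop and compare the value at (i) with the value at (i + 1).

// assumption: arrayList is a List<String> and finalFile is an open PrintWriter
Collections.sort(arrayList); // needs import java.util.Collections
// stop one short of the end so that get(i + 1) is always valid
for (int i = 0; i + 1 < arrayList.size(); i++) {
    // equals() compares contents (Strings or boxed numbers); == would compare references
    if (arrayList.get(i).equals(arrayList.get(i + 1))) {
        finalFile.println(arrayList.get(i)); // or print it to the screen
    }
}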

r0n, do you remember that Disney cartoon of the Sorcerer's Apprentice? The one where Mickey Mouse conjures up an endless stream of animated brooms?

I don't know why that came to mind, but maybe you should mark this thread as "closed".

Oh, I forgot to close it. Thanks anyway :)
