Duplicates project

Question

dhandb01 0 Newbie Poster

16 Years Ago

Hi there, guys!
Could you please help me with this program?
I have done quite a lot, but there are still some problems. Here is a brief overview of my project.

The aim of this project is to design and develop a useful utility program, named duplicates, to locate and report on duplicate files in, and below, a named directory. Your implementation of duplicates will be invoked with zero or more valid command-line options, and one directory name.
With no command-line options (i.e. only a directory name is provided) duplicates will simply list 4 things (with just one integer per line):

the number of files found,
the total size of all files found (i.e. the count of all bytes occupied by all files),
the number of unique files (i.e. any duplicate is only counted once), and
the possible minimum total size of all files found (i.e. the sizes of duplicated files are only counted once).
Files and directories (other than the "starting" directory indicated on the command-line) which cannot be read should be silently ignored (no error messages should be printed).
For the standard project, the "starting" directory will contain only regular files and sub-directories. In particular, there will be no hard- or symbolic-links.
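
The phrase "in, and below, a named directory" implies that sub-directories have to be descended into, not just counted as entries. A minimal sketch of such a traversal follows, assuming POSIX `opendir`/`readdir`/`stat`; the function name `walk`, its parameters, and the fixed 4096-byte path buffer are illustrative choices, not part of the project brief.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: walks `dir` and everything below it, adding the number
 * of regular files to *count and their sizes to *bytes. Entries that cannot
 * be read are skipped silently, as the brief requires. */
static void walk(const char *dir, int include_hidden, long *count, long long *bytes)
{
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return;                              /* unreadable directory: ignore */

    struct dirent *de;
    while ((de = readdir(dp)) != NULL) {
        /* Always skip "." and "..", otherwise the recursion never ends. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* Hidden entries are only considered when -a was given. */
        if (!include_hidden && de->d_name[0] == '.')
            continue;

        char path[4096];                     /* assumed large enough for this sketch */
        int n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                        /* path would be truncated: ignore */

        struct stat sb;
        if (stat(path, &sb) != 0)
            continue;                        /* cannot stat: ignore */

        if (S_ISDIR(sb.st_mode)) {
            walk(path, include_hidden, count, bytes);   /* descend */
        } else if (S_ISREG(sb.st_mode)) {
            (*count)++;
            *bytes += (long long)sb.st_size;
        }
    }
    closedir(dp);
}
```

With a layout of 22 files plus two sub-directories of 4 files each, a traversal of this shape would count 30 regular files, because directories are descended into instead of being counted as entries.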

An explanation of each of the command-line options follows. Support for the command-line option marked with a chili is only required in the Advanced version of this project (see later):

-a By default, hidden and configuration files (conventionally those beginning with a '.' character) are not considered by duplicates. Providing the -a option requests that all files be considered.
-l duplicates lists the duplicate files found. Each line of output will consist of the names of two or more files that are duplicates of each other. The filenames of duplicate files (on each line) must be separated by the TAB character.
-m duplicates minimizes the total number of bytes required to store all files' data, as described in the "Advanced tasks" section.
-t duplicates simply tests if the named directory contains any duplicate files. duplicates produces no output at all, simply terminating with EXIT_SUCCESS if there are no duplicates (i.e. storage is minimized) or with EXIT_FAILURE otherwise.
-v By default, duplicates performs its actions silently. No output, other than the requested output, appears on the stdout channel, although some errors may be reported on the stderr channel. Providing the -v option requests that duplicates be more verbose in its output, reporting to stderr its actions.
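
The options above could be parsed with POSIX `getopt`. This is only a sketch: the `struct options` type and the `parse_options` function are hypothetical names introduced here, and the function is meant to be called once per program run.

```c
#define _POSIX_C_SOURCE 200809L   /* makes getopt() visible under strict -std=c99 */
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical container for the flags described in the brief. */
struct options {
    bool all;       /* -a: also consider files beginning with '.' */
    bool list;      /* -l: list duplicates, TAB-separated */
    bool minimize;  /* -m: advanced task */
    bool test;      /* -t: exit status only, no output */
    bool verbose;   /* -v: report actions on stderr */
};

/* Parses the command line. Returns the index in argv of the directory name,
 * or -1 if an unknown option was seen or the number of remaining arguments
 * is not exactly one. */
static int parse_options(int argc, char *argv[], struct options *opts)
{
    int c;
    while ((c = getopt(argc, argv, "almtv")) != -1) {
        switch (c) {
        case 'a': opts->all = true;      break;
        case 'l': opts->list = true;     break;
        case 'm': opts->minimize = true; break;
        case 't': opts->test = true;     break;
        case 'v': opts->verbose = true;  break;
        default:  return -1;             /* getopt has already complained on stderr */
        }
    }
    /* After the options, exactly one argument (the directory) must remain. */
    return (optind == argc - 1) ? optind : -1;
}
```

A caller would zero-initialise a `struct options`, call `parse_options(argc, argv, &opts)` from `main`, and treat a return value of -1 as a usage error.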
Detecting duplicate files:
To detect duplicate files we'll employ a cryptographic checksum function named SHA2 (pronounced 'shar-2'). SHA2 examines the contents of a file and produces a fixed-length summary of its contents. Cryptographic checksum functions are designed by mathematicians and those developing encryption and security software.
Here is an implementation of the function - strSHA2.c

Two or more files are considered identical if their cryptographic checksums are identical. To date, no two (different) files have ever been found with identical cryptographic checksums! For this project, we'll use a string to store this representation, and two files will be considered identical if their SHA2 string representations are identical. The function strSHA2, with the following prototype:

char *strSHA2(char *filename);
will be provided for this project (it is not a standard C99 function). If strSHA2 can read the indicated file, it will return a dynamically allocated string holding the SHA2 string representation of the file's contents. If the indicated file cannot be read, strSHA2 will return NULL. Note that you do not have to understand the SHA2 function or its implementation for this project.
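
Since strSHA2 can return NULL, its result should be checked before it is passed to `strcmp` or `strcpy`. The sketch below assumes the checksum strings have already been collected into an array; `struct entry`, `struct totals` and `summarise` are hypothetical names, and the quadratic comparison loop is chosen for clarity, not speed. Because the string is described as dynamically allocated, the caller would presumably also `free()` each checksum once it is no longer needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one per file visited. `checksum` is assumed to be the
 * string returned by strSHA2(), or NULL if the file could not be read. */
struct entry {
    const char *checksum;
    long long   size;
};

/* The four numbers the brief asks for when no options are given. */
struct totals {
    size_t    nfiles;       /* number of readable files */
    long long total_bytes;  /* total size of those files */
    size_t    nunique;      /* duplicates counted once */
    long long min_bytes;    /* sizes of duplicates counted once */
};

static struct totals summarise(const struct entry *e, size_t n)
{
    struct totals t = {0, 0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        if (e[i].checksum == NULL)
            continue;                       /* unreadable file: silently ignored */

        t.nfiles++;
        t.total_bytes += e[i].size;

        /* Has an earlier readable file produced the same checksum string? */
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (e[j].checksum != NULL && strcmp(e[i].checksum, e[j].checksum) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            t.nunique++;
            t.min_bytes += e[i].size;
        }
    }
    return t;
}
```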

Now the problem is that when I pass a directory name which has 24 items in it, 22 files and 2 folders (each containing 4 more files), it gives the following output:

Total no of files found : 24
Total size : .....
Total no of unique files : 21
Size of the uniques files : ....

I have not been able to work out what I am doing wrong. Could you please help me with this?
Thanks in advance.

#include "fileHandle.h"
#include <string.h> 
#include <stdio.h>  
#include <stdlib.h> 
#include <sys/types.h> 
#include <sys/stat.h> 
#include <dirent.h> 
#include <errno.h> 
#include "strSHA2.h" 
#include <fcntl.h>

int getNumberOfFiles(char *directoryName){ // 
	DIR             *dip;
	struct dirent   *dit;
	int i=0;
	if ((dip = opendir(directoryName)) == NULL)
	{
			perror("opendir");
			return 0;
	}
	while ((dit = readdir(dip)) != NULL){
				if (dit->d_name[0] == '.'){
					    	continue;
				}
				i++;
			}
	rewinddir(dip);
	if (closedir(dip) == -1)
	{
			perror("closedir");
			return 0;
	}
	return i;
}

void listFiles(char* directoryName, fileInfo files[]){
		DIR             *dip;
		struct dirent   *dit;

		int             i = 0;
		struct stat sb;

		if ((dip = opendir(directoryName)) == NULL)
		{
				perror("opendir");
				return;
		}
		strcat(directoryName,"/");

		char currentFile[50];
		while ((dit = readdir(dip)) != NULL){
			i++;
		}
		rewinddir(dip);
		i=0;
		while ((dit = readdir(dip)) != NULL)
		{
			    if (dit->d_name[0] == '.'){
			    	continue;
			    }
				strcpy(currentFile,directoryName);
				strcat(currentFile,dit->d_name);
				strcpy(files[i].fileName,currentFile);
				strcpy(files[i].fileChechsum,strSHA2(currentFile));
				stat(files[i].fileName,&sb);
				files[i].size = sb.st_size;
				i++;
		}


		if (closedir(dip) == -1)
		{
				perror("closedir");
				return;
		}

}

void findDuplicates(fileInfo files[], int numberOfFiles){
	char shrrs[numberOfFiles][68];
	int filesProcessed = 0;
	int singleFiles = 0;
	int i=0;
	char current[68];
	int isDuplicate = 0;
	int size = 0;
	int minimumSize = 0;
	for (filesProcessed = 0;filesProcessed < numberOfFiles;filesProcessed++){
		isDuplicate = 0;
		strcpy(current,files[filesProcessed].fileChechsum);
		size += files[filesProcessed].size;
		for (i=0;i<singleFiles;i++){
			if (strcmp(current,shrrs[i])==0){
				isDuplicate = 1;
			}
		}
		if(isDuplicate == 0){
			strcpy(shrrs[singleFiles],current);
			singleFiles++;
			minimumSize += files[filesProcessed].size;
		}
	}
	printf("Total Number of Files : %d\n", numberOfFiles);
	printf("Total Size : %d\n", size);
	printf("Number of Unique files : %d\n", singleFiles);
	printf("Possible Minimum Total Size : %d\n", minimumSize);
}

Cheers!!!!!

c encryption storage

All 3 Replies

Salem 5,265 Posting Sage

16 Years Ago

Well since you missed this:

Our Software Development forum category encompasses topics related to application programming and software design. When posting programming code, encase it in [code], [code=syntax], where 'syntax' is any language found within our Code Snippets section, or [icode], for inline code, bbcode tags. Also, to keep DaniWeb a student-friendly place to learn, don't expect quick solutions to your homework. We'll help you get started and exchange algorithm ideas, but only if you show that you're willing to put in effort as well.

Failed to read any of these threads:
Read before posting
The complete idiots guide to using code tags
Hey, dickwad, use the fucking code tags

And completely missed the watermark at the back of the edit window telling you about code tags.

Did you even press "preview" to check that you weren't posting a pile of fetid dingo's kidneys?

Sheesh - are you blind, or just stupid?

It's not just you. You're just the latest in an endless stream of barely sentient noobs who rush to the great white telephone and basically puke all over the floor of the forum. Do you think the regulars and mods have nothing better to do than offer janitorial services to the incontinent?

Swing by here at some point as well.

Oh well, that's my Jagged Little Pill for you to swallow, I'm going to carry on listening to Alanis singing the same - ha!

Edited 12 Years Ago by Dani because: Formatting fixed

Aia commented: Code tags? What tags? I ain't seeing any code tags. +11

dhandb01 0 Newbie Poster · Answer 1 · 2008-10-26T21:04:48+00:00

Well, I'm sorry that I missed the code tags.
Could you please help me with this code? I really need some help with this!

dhandb01 0 Newbie Poster · Answer 2 · 2008-10-26T21:30:42+00:00

I was hoping to get output like this:

Total no of files found : 30 (22+4+4)
....
....
....

Could anybody help me see where I have gone wrong?
