Hi there, guys!
Could you please help me with this program?
I have done quite a lot, but there are still some problems. Here is a brief overview of my project:
The aim of this project is to design and develop a useful utility program, named duplicates, to locate and report on duplicate files in, and below, a named directory. Your implementation of duplicates will be invoked with zero or more valid command-line options, and one directory name.
With no command-line options (i.e. only a directory name is provided) duplicates will simply list 4 things (with just one integer per line):
the number of files found,
the total size of all files found (i.e. the count of all bytes occupied by all files),
the number of unique files (i.e. any duplicate is only counted once), and
the possible minimum total size of all files found (i.e. the sizes of duplicated files are only counted once).
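For reference, the four values listed above could be computed roughly as follows. This is only a sketch: the struct file_record and struct summary types and both function names are made up for illustration and are not part of the project spec. It assumes each file's checksum string and size have already been collected.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for this sketch; the real project may differ. */
struct file_record {
    const char *checksum;   /* checksum string of the file's contents */
    long size;              /* size in bytes */
};

/* Hypothetical container for the four values of the default output. */
struct summary {
    int  n_files;      /* number of files found */
    long total_size;   /* bytes occupied by all files */
    int  n_unique;     /* duplicates counted once */
    long min_size;     /* bytes if each duplicate were stored once */
};

/* A file counts as "unique" the first time its checksum appears. */
struct summary summarize(const struct file_record *files, int n)
{
    struct summary s = {0, 0, 0, 0};
    for (int i = 0; i < n; i++) {
        int seen_before = 0;
        for (int j = 0; j < i; j++) {
            if (strcmp(files[i].checksum, files[j].checksum) == 0) {
                seen_before = 1;
                break;
            }
        }
        s.n_files++;
        s.total_size += files[i].size;
        if (!seen_before) {
            s.n_unique++;
            s.min_size += files[i].size;
        }
    }
    return s;
}

/* Prints just one integer per line, with no labels. */
void print_summary(const struct summary *s)
{
    printf("%d\n%ld\n%d\n%ld\n",
           s->n_files, s->total_size, s->n_unique, s->min_size);
}
```

The pairwise comparison is quadratic in the number of files, which should be fine for small directories; sorting by checksum or using a hash table would be the usual alternative for large ones.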
Files and directories (other than the "starting" directory indicated on the command-line) which cannot be read should be silently ignored (no error messages should be printed).
For the standard project, the "starting" directory will contain only regular files and sub-directories. In particular, there will be no hard- or symbolic-links.
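Because the spec counts files "in, and below" the named directory, the traversal presumably has to descend into sub-directories and count only regular files. Here is a minimal sketch of that idea. The name count_regular_files and its include_hidden flag are invented here, skipping hidden directories along with hidden files is an assumption, and the sketch treats the starting directory like any other (the real program would probably want to report an error for it).

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: counts regular files in dir and everything below it.
   Entries that cannot be opened or examined are silently ignored. */
int count_regular_files(const char *dir, int include_hidden)
{
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return 0;                        /* unreadable directory: ignore */

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL) {
        const char *name = entry->d_name;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
            continue;                    /* never revisit self or parent */
        if (name[0] == '.' && !include_hidden)
            continue;                    /* hidden entry, skipped by default */

        char path[4096];
        int len = snprintf(path, sizeof path, "%s/%s", dir, name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                    /* path too long for this sketch */

        struct stat sb;
        if (stat(path, &sb) != 0)
            continue;                    /* cannot examine it: ignore */
        if (S_ISDIR(sb.st_mode))
            count += count_regular_files(path, include_hidden);
        else if (S_ISREG(sb.st_mode))
            count++;                     /* directories are never counted */
    }
    closedir(dp);
    return count;
}
```

With a directory holding 22 files plus 2 sub-directories of 4 files each, a walk like this would report 30 files, whereas a count of the top-level entries alone gives 24.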
An explanation of each of the command-line options follows. Support for the command-line option marked with a chili is only required in the Advanced version of this project (see later):
-a By default, hidden and configuration files (conventionally those beginning with a '.' character) are not considered by duplicates. Providing the -a option requests that all files be considered.
-l duplicates lists the duplicate files found. Each line of output will consist of the names of two or more files that are duplicates of each other. The filenames of duplicate files (on each line) must be separated by the TAB character.
-m duplicates minimizes the total number of bytes required to store all files' data, as described in the "Advanced tasks" section.
-t duplicates simply tests if the named directory contains any duplicate files. duplicates produces no output at all, simply terminating with EXIT_SUCCESS if there are no duplicates (i.e. storage is minimized) or with EXIT_FAILURE otherwise.
-v By default, duplicates performs its actions silently. No output, other than the requested output, appears on the stdout channel, although some errors may be reported on the stderr channel. Providing the -v option requests that duplicates be more verbose in its output, reporting to stderr its actions.
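The options described above could be parsed with the POSIX getopt function. The following is just a sketch: struct options and parse_options are hypothetical names, and resetting optind to 1 so the function can be called more than once is an assumption that holds on glibc but may need adjusting on other platforms.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical flags record, one field per option in the spec. */
struct options {
    int all;       /* -a: also consider hidden files */
    int list;      /* -l: list duplicates, TAB-separated */
    int minimize;  /* -m: advanced task */
    int test;      /* -t: report only through the exit status */
    int verbose;   /* -v: report actions to stderr */
};

/* Returns the argv index of the directory name, or -1 on a usage error. */
int parse_options(int argc, char *argv[], struct options *opts)
{
    struct options none = {0, 0, 0, 0, 0};
    int c;

    *opts = none;
    opterr = 0;   /* keep getopt from printing its own messages */
    optind = 1;   /* start from the first argument */
    while ((c = getopt(argc, argv, "almtv")) != -1) {
        switch (c) {
        case 'a': opts->all = 1;      break;
        case 'l': opts->list = 1;     break;
        case 'm': opts->minimize = 1; break;
        case 't': opts->test = 1;     break;
        case 'v': opts->verbose = 1;  break;
        default:  return -1;          /* unknown option */
        }
    }
    if (argc - optind != 1)
        return -1;                    /* exactly one directory is expected */
    return optind;
}
```

A caller would then use argv[index] as the starting directory and, for -t, exit with EXIT_SUCCESS or EXIT_FAILURE depending on whether any duplicates were found.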
Detecting duplicate files:
To detect duplicate files we'll employ a cryptographic checksum function named SHA2 (pronounced 'shar-2'). SHA2 examines the contents of a file and produces a fixed-length summary of its contents. Cryptographic checksum functions are designed by mathematicians and those developing encryption and security software.
Here is an implementation of the function - strSHA2.c
Two or more files are considered identical if their cryptographic checksums are identical. To date, no two (different) files have ever been found with identical cryptographic checksums! For this project, we'll use a string to store this representation, and two files will be considered identical if their SHA2 string representations are identical. The function strSHA2, with the following prototype:
char *strSHA2(char *filename);
will be provided for this project (it is not a standard C99 function). If strSHA2 can read the indicated file, it will return a dynamically allocated string holding the SHA2 string representation of the file's contents. If the indicated file cannot be read, strSHA2 will return NULL. Note that you do not have to understand the SHA2 function or its implementation for this project.
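Since strSHA2 can return NULL and otherwise hands back dynamically allocated memory, each call probably needs a NULL check and a matching free. Here is a hedged sketch of a wrapper that does both. The names hash_fn, checksum_into and demo_hash are invented for illustration; demo_hash is only a stand-in so the sketch can be tried without strSHA2.c, and it assumes the returned string comes from malloc so that free is the right way to release it.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as the strSHA2 prototype given above. */
typedef char *(*hash_fn)(char *filename);

/* Copies the checksum of filename into out (capacity outsize bytes).
   Returns 0 on success, -1 if the file could not be hashed or out is
   too small to hold the string. */
int checksum_into(char *filename, hash_fn hasher, char *out, size_t outsize)
{
    char *sum = hasher(filename);
    if (sum == NULL)
        return -1;                  /* unreadable: the caller should skip it */

    int fits = strlen(sum) < outsize;
    if (fits)
        strcpy(out, sum);
    free(sum);                      /* release the dynamically allocated string */
    return fits ? 0 : -1;
}

/* Stand-in hasher for trying the sketch: NOT a real checksum.
   It returns a heap copy of the name, or NULL for an empty name. */
char *demo_hash(char *filename)
{
    if (filename[0] == '\0')
        return NULL;
    char *copy = malloc(strlen(filename) + 1);
    if (copy != NULL)
        strcpy(copy, filename);
    return copy;
}
```

In the real program the call would presumably be checksum_into(path, strSHA2, buffer, sizeof buffer), and a return of -1 would mean the entry is left out of the counts.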
Now the problem is that when I pass a directory name which has 24 items in it, 22 files and 2 folders (each containing 4 more files), it gives the following output:
Total no of files found : 24
Total size : .....
Total no of unique files : 21
Size of the uniques files : ....
I was not able to understand what I was doing wrong. Could you please help me with this?
Thanks in advance.
#include "fileHandle.h"
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <errno.h>
#include "strSHA2.h"
#include <fcntl.h>
/* Counts the non-hidden entries directly inside directoryName.
   Sub-directories are counted as entries and are not descended into. */
int getNumberOfFiles(char *directoryName){
    DIR *dip;
    struct dirent *dit;
    int i = 0;

    if ((dip = opendir(directoryName)) == NULL)
    {
        perror("opendir");
        return 0;
    }
    while ((dit = readdir(dip)) != NULL){
        if (dit->d_name[0] == '.'){   /* hidden entries, "." and ".." */
            continue;
        }
        i++;
    }
    if (closedir(dip) == -1)
    {
        perror("closedir");
        return 0;
    }
    return i;
}
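/* Fills files[] with the path, checksum and size of every non-hidden entry
   directly inside directoryName. files[] must have room for each of them. */
void listFiles(char* directoryName, fileInfo files[]){
    DIR *dip;
    struct dirent *dit;
    int i = 0;
    struct stat sb;
    char currentFile[50];
    char *checksum;

    if ((dip = opendir(directoryName)) == NULL)
    {
        perror("opendir");
        return;
    }
    while ((dit = readdir(dip)) != NULL)
    {
        if (dit->d_name[0] == '.'){
            continue;
        }
        /* Build "directory/name" without modifying the caller's string;
           snprintf truncates rather than overflowing currentFile. */
        snprintf(currentFile, sizeof currentFile, "%s/%s", directoryName, dit->d_name);
        strcpy(files[i].fileName, currentFile);
        /* strSHA2 returns NULL if it cannot read the entry, and the
           string it returns is dynamically allocated, so free it. */
        checksum = strSHA2(currentFile);
        if (checksum != NULL){
            strcpy(files[i].fileChechsum, checksum);
            free(checksum);
        } else {
            files[i].fileChechsum[0] = '\0';
        }
        files[i].size = (stat(files[i].fileName, &sb) == 0) ? sb.st_size : 0;
        i++;
    }
    if (closedir(dip) == -1)
    {
        perror("closedir");
        return;
    }
}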
/* Prints the file count, the total size, the number of unique files and
   the minimum total size (each distinct checksum counted once). */
void findDuplicates(fileInfo files[], int numberOfFiles){
    char shrrs[numberOfFiles][68];   /* distinct checksums seen so far */
    int filesProcessed = 0;
    int singleFiles = 0;
    int i = 0;
    char current[68];
    int isDuplicate = 0;
    int size = 0;
    int minimumSize = 0;

    for (filesProcessed = 0; filesProcessed < numberOfFiles; filesProcessed++){
        isDuplicate = 0;
        strcpy(current, files[filesProcessed].fileChechsum);
        size += files[filesProcessed].size;
        for (i = 0; i < singleFiles; i++){
            if (strcmp(current, shrrs[i]) == 0){
                isDuplicate = 1;
                break;   /* one match is enough */
            }
        }
        if (isDuplicate == 0){
            strcpy(shrrs[singleFiles], current);
            singleFiles++;
            minimumSize += files[filesProcessed].size;
        }
    }
    printf("Total Number of Files : %d\n", numberOfFiles);
    printf("Total Size : %d\n", size);
    printf("Number of Unique files : %d\n", singleFiles);
    printf("Possible Minimum Total Size : %d\n", minimumSize);
}
Cheers!