File searching, replacing, copying...
A. Third Edition
This is comp444 lab1 and it is concentrated on file operations such as searching, replacing, copying...
I want to keep this simple version for possible future modification. I have to backup my work because it is
very dangerous in debugging such that there are two occasions that I almost destroyed my source code by running
the program in my source code directory.
COMP 444
System Software Design
Programming Assignment One
Due Date: midnight on Thursday, February 3
In this assignment, you¡¯ll get a chance to utilize a number of the calls that we¡¯ve looked at so far.
In addition, you¡¯ll have an opportunity to practice your C programming¡since a number of you
are no doubt a little rusty at this point.
Objective: You work for a company that is named after the company founder. Unfortunately, the
company founder is named Stanley G. Moneyloser. Consultants have been contacted and after
great expense have determined that the company¡¯s lagging sales may be associated with the fact
that buying products from the Moneyloser Corporation may not be very appealing to potential
customers. Consequently, Stanley has decided to change the name to the Moneymaker
Corporation.
Unfortunately, there are thousands of files on the company¡¯s UNIX server that have the name
Moneyloser in them. Rather than trying to track all of them down and change them manually,
Stanley has asked you to write some kind of application that can do this automatically (Yes, I
realize that, in practice, this would probably be done with Perl). Based upon some serious
thought, the following requirements have been specified:
1. The application will take one command line argument: the name of a directory in which to
start the search.
2. The search will be recursive. As you find new directories inside the current directory, you will
have to visit them as well.
3. Symbolic links may exist within any directory. You should be able to handle the symbolic links
properly.
4. In terms of processing, your job will be to open all regular files and read their contents.
Remember, Unix treats all flies the same way ¨C text or binary.
5. You will be looking for the following sequence of 10 characters: ¡°Moneyloser¡±. You don¡¯t want
to just look for ¡°Money¡± since this may occur in places that have nothing to do with the
company name. Note: there are many ways to search for patterns. Some techniques ¨C like
regex matching ¨C can be quite powerful. However, in the current case, the matching can be
carried out simply by taking each consecutive sequence of 10 bytes and comparing it to the
Moneyloser ¡°test¡± string.
6. If a match is found in one or more files in the current directory, you will carry out the following
steps:
A. Create a new directory called currentdirector_oldfiles. For example, if the current
directory is called ¡°foo¡±, the new directory will be called foo_oldfiles. It should be created
as a sub-directory of foo (e.g., foo/foo_oldfiles). The new directory must be given
permissions equivalent to ¡°rwxrwx---¡°.
B. Create a file called mod.log in the oldfiles directory. It should be given permissions of ¡°rw-
------¡°. In fact, all files stored to the oldfiles directory should have these same permissions.
C. For each occurrence of ¡°Moneyloser¡± found in a given file (there can be many), write a
line to mod.log that looks like ¡°filename changed at #¡±, where # is the byte offset of the
text to be changed. As a concrete example, you might have something like ¡°abc.txt
changed at 167¡±.
¡¡
C.The idea of programFrom point of view of compiler project, it is a toy scanner with much simplified functionality. But it is
difficult for me to debug c in Linux. Another reason is the awkward C grammar. There is no "bool" type, there
is no function overloading, there is no ...
name: Qingzhe Huang ID: 5037735 file list: functionality summary main.c the main program myhead.h my own head file including all necessary header files errHandle.c a simple error handle module can be complicated in future fileBuf.h header of a FILE-like structure, I prefer to implementing my own buffered I/O fileBuf.c my own buffered I/O, mimic C-Library like getc, fopen, fclose etc. searchString.h major implementation is here, like traverse directory, handle log etc. searchString.c handle log, creating directory, handle repeating directory, etc. dynaArray.h my own version of self-incremental array, basically can be unlimited, any type of node dynaArray.c user must supply both compare and eqaul function pointer to achieve dynamic array makefile make file backup souce code to upper directory when finished compiling major features: a) I implement my own version of fgetc, fputc, fopen, fseek, fclose, fflush, itoa etc, so that they can be used as independent library in future. That is one reason why this lab takes me more time than expected. b) I implement my own version of dynamic array which can self-increment when more elements are added. Also, it maintains order when new elements are inserted. So, I think my way has the highest efficiency compared any sorting methods because elements are sorted when they are copied into array instead of sorting after all are copied into array. That is why I didn't use any sorting library in this assignment. It is very handy to bookkeep what directory has been searched, and it is very easy to sort file records and avoid overwrite file searching records into "mod.log" file. The basic idea is that whenever you insert a new node (It is passed as void*, so, it can be a pointer to anything, integer, char*, struct etc.) into array, you have also pass two function pointer as parameter. These function pointer takes two paramenters, both void*. One function is to compare two elements and return "TRUE" when first is smaller than second element. The other function pointer returns "TRUE" when two elements are equal. Therefore, when inserting the new element, the dynamic array will search and insert it into appropriate place so that array is in ascending order. Then you don't need to sort them any more. c) Generally speaking, I assume serious mistakes happen when any system call fails. And I handle them very simply by exiting program. One only exception is that "creating cur_oldfiles directory" which maybe created before. Another trivial one is that soft link might point to some invalid file or directory and I simply ignore them when this happens. d) Originally I designed one highest-efficiency-algorithm by searching words, replacing words, copying old files during one pass. Later I realized there is some logical difficulty handling these three jobs during one pass as they are more clear-cut in logic. So, I sacrifice efficiency by doing them in three passes. As a result, my program can search any capital or small case combinations of "MoNeyLosER", but I can only replace them with a fixed string "Moneymaker" since my program only records the file offset when spotting them. e) My searching key words algorithm is based on idea of Deterministic Finite Automaton(DFA) without using any fancy library. Because I think this has the highest efficiency and reliability. Any algorithm must read character one by one internally. Therefore instead of copying overhead, my program recognize "moneyloser" by repeating calling "mygetc". f) Generally speaking, my program should be able to handle arbitrarily depth of directory since my storage array are all self- incremental. However, Linux system has certain limits on length of path, I also follow this. However, for number of files or directories under one directory, there is no limit. g) Instead book-keeping the actual path of directory searched, I record its inode number in my dynamic array which is space- saving and fast and reliable when soft link is involved in the name of path. h) There is one known limit for soft link. It is malicious circular link. i.e. softlink1 pointing to softlink2 and softlink2 pointing to softlink1. There is no system call can get around this problem. i) The main program only accept one parameter as directory name. When it is invalid, program gives error messange and exits. j) If file name contains "hereisMonEyLOsErname", then my program will replace the string with "hereisMoneymakername" and if in the file there exists string "MoneYLosER", it will also replace with "Moneymaker". k) If the "curdirectory_oldfiles" doesn't contain a file named "mod.log", I will assume the directory is created by user. Otherwise I will not search the directory. major self-defined data structures and data types: a) defined in "fileBuf.h" and it is a "FILE-like" structure which records all buffered-I/O information. struct FileBuf { int fd; int size; int offset; int cur; char buf[BuffSize]; }; typedef struct FileBuf* FilePtr; b) defined in "dynaArray.h" and it is structure which stores all necessary information of self-incremental array. //each is a pointer struct DynaArray { void** ptr; int len;//max size, the index of last node int cur;//current position, relative index of last node int fold; }; typedef struct DynaArray* DynaArrayPtr; c) defined in "searchString.h" and it is used to store the searching result. struct FileLog { char fileName[MaxNameLength]; int offset; }; typedef struct FileLog Log;
D.The major functions
major self-defined data structures and data types: a) defined in "fileBuf.h" and it is a "FILE-like" structure which records all buffered-I/O information. struct FileBuf { int fd; int size; int offset; int cur; char buf[BuffSize]; }; typedef struct FileBuf* FilePtr; b) defined in "dynaArray.h" and it is structure which stores all necessary information of self-incremental array. //each is a pointer struct DynaArray { void** ptr; int len;//max size, the index of last node int cur;//current position, relative index of last node int fold; }; typedef struct DynaArray* DynaArrayPtr; c) defined in "searchString.h" and it is used to store the searching result. struct FileLog { char fileName[MaxNameLength]; int offset; };
typedef struct FileLog Log;
E.Further improvement
¡¡
F.File listing
1. myhead.h
2. searchString.h
3. searchString.c
4. main.c
5. errHandle.c
6. dynaArray.h
7. dynaArray.c
8. fileBuf.h
9. fileBuf.c
10. makefile
¡¡
¡¡
file name: myhead.h
#ifndef MYHEAD_H #define MYHEAD_H #include <sys/types.h> #include <sys/stat.h> #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <dirent.h> #include <unistd.h> #include <string.h> #include <fcntl.h> //define three mode and it is purely like // DFA=Deterministic Finite Automaton #define FindNext 0 #define Comparing 1 typedef int bool; #define TRUE 1 #define FALSE 0 //for a particular file NAME, NOT path //const int MaxNameLength=20; #define MaxPathLength 128 #define MaxNameLength 20 #define MaxIntLength 20 #define BuffSize 128 void errHandle(char* msg); #endif
¡¡
file name: searchString.h
#ifndef SEARCHSTRING_H #define SEARCHSTRING_H #include "myhead.h" #include "fileBuf.h" #include "dynaArray.h" #include <utime.h> #define MaxPathLength 128 struct FileLog { char fileName[MaxNameLength]; int offset; }; typedef struct FileLog Log; bool findString(char* fileName, char* str, DynaArrayPtr arrayPtr); //search in a file with my own file information struct FileBuf*, //target is the string for searching //repeating call this function and return each location, //return -1 indicating end of file int searchString(FilePtr stream, char* target); bool compLog(void* first, void* second); bool equalLog(void* first, void* second); Log* createLog(char* theName, int pos); //I have to admit that my code is a little bit too fancy to show off //however, I do try to write a general directory traverse function, //but I later found out I have to make it specific for this particular assignment //a general dir traverse function void dirTraverse(char* path, DynaArrayPtr dirList); void fileHandler(char* name, DynaArrayPtr arrayPtr); #endif
file name: searchString.c
#include "searchString.h" //funny, isn't it? char* moneyLoser="Moneyloser"; char* moneyMaker="Moneymaker"; static bool renameFile(char* path, char* oldName); void displayLog(DynaArrayPtr arrayPtr) { int index; Log* log; for (index=0; index<arrayPtr->cur; index++) { log=(Log*)(arrayPtr->ptr[index]); //printf("file name: %s ; offset: %d\n", log->fileName, log->offset); } } char* getFileName(char* path) { char* ptr=path; while (*ptr!='\0') { ptr++; } while (*ptr!='/') { ptr--; } ptr++;//pass over / return ptr; } bool findString(char* fileName, char* str, DynaArrayPtr arrayPtr) { int fd, n=0; FilePtr stream; Log* log; bool result=FALSE; char* chPtr; if ((fd=open(fileName, O_RDONLY))<0) { printf("should be permission denied?\n"); errHandle("cannot open searching files\n"); } if ((stream=myfopen(fd))==NULL) { errHandle("openfile error\n"); } chPtr=getFileName(fileName); while ((n=searchString(stream, str))!=-1) { result=TRUE; log=createLog(chPtr, n); addOne(arrayPtr, (void*)log, compLog, equalLog); //printf("test: find target at %d\n", n); } return result; } //starting a particular file offset and search the target string //if found return the file offset of target, else return -1 int searchString(FilePtr stream, char* target) { char ch; int result=-1;//this is the starting offset of word //the length of word compared int index, length; int mode; index=0; length=strlen(target); mode=FindNext;//initial status //-1 means EOF while ((ch=mygetc(stream))!=-1) { switch (mode) { case Comparing: if (issamechar(ch, target[index])) { index++; if (index==length) { return result; } } else { if (issamechar(ch, target[0])) { index=1; result=myftell(stream); } else { index=0; result=-1;//not found mode=FindNext; } } break; case FindNext: if (issamechar(ch, target[0])) { index=1; result=myftell(stream)-1; mode=Comparing; } break; }//switch } return result; } bool compLog(void* first, void* second) { int result; Log* log1, *log2; log1=(Log*)first; log2=(Log*)second; if ((result=strcmp(log1->fileName, log2->fileName))==0) { //strictly less return log1->offset < log2->offset; } else { return result < 0; } } /* //searching and copying, replacing for one single pass, it is the highest efficiency bool searchReplace(char* sourceFile, char* targetFile, char* strSource, char* strTarget, DynaArrayPtr arradyPtr) { int fdsource, int fdtarget; FilePtr source, target; char ch; Log* log; int mode, index, length; bool result=FALSE; length=strlen(strSource); if ((fdSource=open(sourceFile, O_RDWR))<0) { errHandle("cannot open source file\n"); } if ((fdTarget=open(targetFile, O_WRONLY))<0) { errHandle("cannot open target file\n"); } source=myfopen(fdsource); target=myfopen(fdtarget); mode=FindNext; index=0; while ((ch=mygetc(source))!=-1) { myputc(target, ch);//copy always first switch (mode) { case Comparing: if (issamechar(ch, strSource[index])) { index++; if (index==length) { //find match, first rewind source, ready to write back myfseek(source, -length); //first write down the offset into pointer of dynamic array log=createLog(sourceFile, myftell(source)); addOne(arrayPtr, (void*)log, compLog, equalLog); result=TRUE; index=0; while (index<length) { myputc(source, strTarget[index++]); } //reset state index=0; mode=FindNext; } } else { index=0; mode=FindNext; } break; case FindNext: if (issamechar(ch, strSource[0])) { index=1;//this is explicit mode=Comparing; } //else remain unchanged break; }//switch } return result; } */ Log* createLog(char* theName, int pos) { Log* result; if ((result =(Log*)malloc(sizeof(Log)))==NULL) { errHandle("create log fails\n"); } strcpy(result->fileName, theName); result->offset=pos; return result; } /* void writeLog(int fdLog, Log* log) { char offsetBuf[MaxPathLength];//not necessarily so large myputs(fdLog, log->fileName); itoa(log->offset, offsetBuf, MaxPathLength); //myfseek(fdLog, 1); myputc(fdLog, ' '); myputs(fdLog, offsetBuf); } */ void logHandler(char* logName, DynaArrayPtr arrayPtr) { FilePtr stream; int fdLog; char pathBuf[MaxPathLength]; char offsetBuf[MaxPathLength]; char logBuf[MaxPathLength+MaxIntLength+1]; DynaArrayPtr fileRec; Log* log; int index=0, len, newSize=0; if ((fdLog=open(logName, O_RDWR|O_CREAT, S_IRUSR|S_IWUSR))<0) { errHandle("error when open log files\n"); } stream=myfopen(fdLog); fileRec=createArray(); while (mygets(stream, pathBuf, MaxPathLength)!=-1) { //myfseek(fdLog, 1); if (mygets(stream, offsetBuf, MaxPathLength)==-1) { errHandle("corrupted mod.log record\n"); } log=createLog(pathBuf, atoi(offsetBuf)); addOne(fileRec, (void*)log, compLog, equalLog); } for (index=0; index<arrayPtr->cur; index++) { addOne(fileRec, arrayPtr->ptr[index], compLog, equalLog); } if (lseek(stream->fd, 0, SEEK_SET)<0) { errHandle("lseek error when begin writing log\n"); } for (index=0; index<fileRec->cur; index++) { log=(Log*)fileRec->ptr[index]; strcpy(logBuf,log->fileName); strcat(logBuf, " "); len=strlen(logBuf); //starting from '\0' if (itoa(log->offset, logBuf+len, MaxIntLength)<0) { errHandle("itoa error when retrieve file offset\n"); } strcat(logBuf, "\n"); len=strlen(logBuf);//include the '\0' if (len!=write(stream->fd, logBuf, len)) { errHandle("write error when writing file offset\n"); } newSize+=len; } if (ftruncate(stream->fd, newSize)<0) { errHandle("ftruncate error when writing log files\n"); } } bool equalLog(void* first, void* second) { Log* log1, *log2; log1=(Log*)first; log2=(Log*)second; return strcmp(log1->fileName, log2->fileName)==0&&log1->offset==log2->offset; } bool renameFile(char* path, char* oldName) { //char tempBuf[MaxPathLength]; char oldNameBuf[MaxPathLength]; char newNameBuf[MaxPathLength]; struct stat statBuf; bool result=FALSE; int index=0, pos=-1, len=0, targetLen=strlen(moneyLoser); while (oldName[index]!='\0') { if (issamechar(oldName[index], moneyLoser[len])) { len++; if (len==targetLen) { result=TRUE;//repeating break; } if (!result) { pos=index; result=TRUE; } } else { //restart if (issamechar(oldName[index], moneyLoser[0])) { len=1; result=TRUE; pos=index; } else { len=0; result=FALSE; } } index++; } if (!result) { return result; } //search if there is a file name is moneyloser strcpy(oldNameBuf, path); strcat(oldNameBuf, "/"); strcpy(newNameBuf, oldNameBuf); strcat(oldNameBuf, oldName); /* //copy first "pos" part strncpy(tempBuf, oldName, pos); strcat(tempBuf, moneyMaker); strcat(newNameBuf, tempBuf); strcpy(tempBuf, oldName+pos+targetLen); strcat(newNameBuf, tempBuf); */ len=strlen(newNameBuf); strcat(newNameBuf, oldName); len+=pos; newNameBuf[len]='\0'; strcat(newNameBuf, moneyMaker); strcat(newNameBuf, oldName+pos+targetLen); if (lstat(newNameBuf, &statBuf)==0) { printf("probably there is already moneyMaker file in directory, so ignore renaming\n"); } else { if (rename(oldNameBuf, newNameBuf)<0) { printf("rename %s to %s failed,reason unknown and ignore it\n", oldName, newNameBuf); } else { //printf("rename %s to moneymaker successful\n", oldName); result=TRUE; } } return result; } //this is purely because Mokhov suggest a fancy functionality such that //multiple directory name can be executed by passing them as parameter //is it logical? //I have to admit that my code is a little bit too fancy to show off //however, I do try to write a general directory traverse function, //but I later found out I have to make it specific for this particular assignment //a general dir traverse function void dirTraverse(char* path, DynaArrayPtr dirList) { int pathLength; int index=0; char dirName[MaxPathLength]; DIR* dp; int* pDirNum=(int*)malloc(sizeof(int));//I have to make a dynamic variable DynaArrayPtr arrayPtr; char fileName[MaxPathLength]; char logName[MaxPathLength]; struct stat statBuf; struct dirent* dirNode; pathLength=strlen(path); strcpy(fileName, path); arrayPtr=createArray();//create a dynamic array for Log of file index=pathLength; while (path[index]!='/') { index--; } index++; strcpy(dirName, path); strcat(dirName, "/"); strcat(dirName, path+index); strcat(dirName, "_oldfiles"); strcpy(logName, dirName); strcat(logName, "/"); strcat(logName, "mod.log"); if (stat(path, &statBuf)<0) { errHandle("stat error for dir\n"); } *pDirNum=statBuf.st_ino; addOne(dirList, (void*)pDirNum, compInt, equalInt); if ((dp=opendir(path))==NULL) { errHandle("open dir error\n"); } while ((dirNode=readdir(dp))!=NULL) { if (strcmp(dirNode->d_name, ".")==0||strcmp(dirNode->d_name, "..")==0) { continue; } /* //a test if (strcasecmp(dirNode->d_name, moneyLoser)==0) { printf("before \n"); //strcpy(dirNode->d_name, moneyMaker); printf("after %s\n", dirNode->d_name); } */ if (renameFile(path, dirNode->d_name)) { rewinddir(dp); continue; } fileName[pathLength]='/'; fileName[pathLength+1]='\0'; strcat(fileName, dirNode->d_name); //it is safe to open it first with lstat if the softlink point to invalid file if (lstat(fileName, &statBuf)<0) { errHandle("lstat error here for \n"); } if (S_ISLNK(statBuf.st_mode)) { if (stat(fileName, &statBuf)<0) { printf("<%s> is an invalid softlink, ignore it\n"); continue;//important!! } } else { //reopen if (stat(fileName, &statBuf)<0) { errHandle("stat error for reopen\n"); } } switch(statBuf.st_mode & S_IFMT) { case S_IFREG: //handle regular files fileHandler(fileName, arrayPtr); break; case S_IFLNK: if (stat(fileName, &statBuf)<0) { printf("<%s> is an invalid softlink, ignore it\n"); } //softlink handler break; case S_IFDIR: //recursive call, and check if it is already in our list //inode is the most reliable one to check if (!findOne(dirList, (void*)&statBuf.st_ino, equalInt)) { //printf("here is it and ready for dir %s\n", dirName); //printf("comp filename %s vs. olddirname %s\n\n", fileName, dirName); if (strcmp(fileName, dirName)==0)//not the cur_oldfiles { //CHECK IF mod.log can be open if (open(logName, O_RDONLY)>=0)//IT IS INDEED CREATED BY ME { continue; } } dirTraverse(fileName, dirList); } else { printf("<%s> is a directory already in list, ignore it\n", fileName); } break; default: //just ignore break; }//switch }//while //here we need to copy all files //printf("ready to handle log %s\n", logName); logHandler(logName, arrayPtr); } void fileHandler(char* name, DynaArrayPtr arrayPtr) { char fullPath[MaxPathLength]; char dirName[MaxPathLength]; char fileName[MaxPathLength]; char tempBuf[MaxPathLength]; char backupName[MaxPathLength]; int index, dirIndex=0; int fdCopy; struct stat statBuf; struct utimbuf timeBuf; if (stat(name, &statBuf)<0) { errHandle("stat error during file handling\n"); } //printf("before search file %s\n", name); //displayLog(arrayPtr); if (findString(name, moneyLoser, arrayPtr)) { //test //printf("target found in file %s\n", name); index=0; //copy file name while (name[index]!='\0') { fullPath[index]=name[index]; index++; } //remove file name while (name[index]!='/') { index--; } strcpy(fileName, name+index+1); fullPath[index+1]='\0';//include the last '/' index--;//pass over the last '/' //find current directory name while (name[index]!='/') { index--; } index++;//pass over the second '/' while (name[index]!='/') { dirName[dirIndex]=name[index]; index++; dirIndex++; } dirName[dirIndex]='\0'; strcat(dirName, "_oldfiles"); strcpy(tempBuf, fullPath); strcat(tempBuf, dirName); if (mkdir(tempBuf, S_IRUSR|S_IWUSR|S_IXUSR|S_IXGRP|S_IRGRP|S_IWGRP)<0) { //this indicate the directory is already there or error? } strcpy(backupName, tempBuf); strcat(backupName, "/"); strcat(backupName, fileName);//create backup file name //printf("to copy %s to %s\n", name, backupName); copyFile(name, backupName); timeBuf.actime=statBuf.st_atime; timeBuf.modtime=statBuf.st_mtime; if (utime(backupName, &timeBuf)<0) { errHandle("utime error when file handling\n"); } index=0; //open current file for replace if ((fdCopy=open(name, O_WRONLY))<0) { //printf("file name to be replaced by opening %s\n", name); errHandle("open error when replacing\n"); } while (index<arrayPtr->cur) { if (strcmp(((Log*)(arrayPtr->ptr[index]))->fileName, fileName)==0) { replace(fdCopy, ((Log*)(arrayPtr->ptr[index]))->offset, moneyMaker); } index++; } } else { //printf("nothing found to replace\n"); //printf("no target found in filename=%s \n\n", name); } //printf("after search file %s\n", name); //displayLog(arrayPtr); }
file name: main.c
#include "myhead.h" #include "searchString.h" #include "dynaArray.h" #include "fileBuf.h" const int ArrayLength=100; void showInt(void* ptr); int main(int argc, char* argv[]) { DynaArrayPtr dynaPtr; if (argc!=2) { errHandle("usage:<main.o> <directoryname>\n"); } dynaPtr=createArray(); dirTraverse(argv[1], dynaPtr); return 0; }
file name: errHandle.c
#include "myhead.h" extern int errno; void errHandle(char* msg) { if (msg!=NULL) { perror(msg); } else { strerror(errno); } if (errno==EACCES) { printf("cannot open file because permission denied, so ignore it\n"); } else { exit(errno); } }
file name: dynaArray.h
#ifndef DYNAARRAY_H #define DYNAARRAY_H #include "myhead.h" #define DefaultLength 20 //each is a pointer struct DynaArray { void** ptr; int len;//max size, the index of last node int cur;//current position, relative index of last node int fold; }; typedef struct DynaArray* DynaArrayPtr; //define a function pointer for comparing in sorting typedef bool CompFunc(void* first, void* second); //define a show func type typedef void ShowFunc(void* ptr); typedef bool EqualFunc(void* ptr1, void* ptr2); //bool compLog(void* first, void* second); //create with default length and fold is 1 DynaArrayPtr createArray(); //this can be only called internally after a node is added void expand(DynaArrayPtr arrayPtr); //this is typical C++ member function, void addEnd(DynaArrayPtr arrayPtr, void* ptr);//without sorting //user need to supply with comp function, return true for first '<' second //it must be strictly smaller, as equal used to indicate it is a copy //the addOne function should return the index of node inserted, -1 if already there int addOne(DynaArrayPtr arrayPtr, void* ptr, CompFunc* compFunc, EqualFunc* equalFunc); //int addDistinct(DynaArrayPtr arrayPtr, void* ptr, CompFunc* compFunc, EqualFunc* equalFunc); void displayArray(DynaArrayPtr arrayPtr, ShowFunc* showFunc); //bool equalLog(void* first, void* second); bool compInt(void* first, void* second); bool equalInt(void*first, void*second); int findOne(DynaArrayPtr arrayPtr, void* ptr, EqualFunc* equalFunc); #endif
file name: dynaArray.c
#include "dynaArray.h" bool findOne(DynaArrayPtr arrayPtr, void* ptr, EqualFunc* equalFunc) { int index=0; while (index<arrayPtr->cur) { if (equalFunc(ptr, arrayPtr->ptr[index])) { return TRUE; } index++; } return FALSE; } /* bool compLog(void* first, void* second) { int result; Log* log1, *log2; log1=(Log*)first; log2=(Log*)second; if ((result=strcmp(log1->fileName, log2->fileName))==0) { //strictly less return log1->offset < log2->offset; } else { return result < 0; } } */ bool compInt(void* first, void* second) { return *((int*)first) < *((int*)second); } bool equalInt(void* first, void*second) { return *((int*)first) == *((int*)second); } //it is a pity that in C, there is no function overloading and default parameter, so //I have to copy paste all code in "addOne" //int addDistinct(DynaArrayPtr arrayPtr, void* ptr, CompFunc* compFunc, EqualFunc* equalFunc) /* bool equalLog(void* first, void* second) { Log* log1, *log2; log1=(Log*)first; log2=(Log*)second; return strcmp(log1->fileName, log2->fileName)==0&&log1->offset==log2->offset; } */ //user need to supply with comp function, return true for first '<' second //it must be strictly smaller, as equal used to indicate it is a copy //the addOne function should return the index of node inserted, -1 if already there int addOne(DynaArrayPtr arrayPtr, void* ptr, CompFunc* compFunc , EqualFunc* equalFunc) { int pos=0, i; //1st case, there is no node in array while (pos<arrayPtr->len) { if (!compFunc(ptr, arrayPtr->ptr[pos])) { pos++; } else { if (ptr==arrayPtr->ptr[pos]) { //2nd case, it is already there return -1;//it is a copy } if (equalFunc(ptr, arrayPtr->ptr[pos])) { return -1; } //should insert here, and there must have space, shift first for (i=arrayPtr->len; i>pos; i--) { arrayPtr->ptr[i]=arrayPtr->ptr[i-1]; } //3rd case, it is shifted first break; } } arrayPtr->ptr[pos]=ptr;//insert arrayPtr->len++; arrayPtr->cur++; if (arrayPtr->cur==DefaultLength) { expand(arrayPtr); } } //delete dynamically allocated memory void closeArray(DynaArrayPtr arrayPtr) { free(arrayPtr->ptr); free(arrayPtr); } //create with default length and fold is 1 DynaArrayPtr createArray() { DynaArrayPtr result; if ((result=(DynaArrayPtr)malloc(DefaultLength*sizeof(void*)))==NULL) { errHandle("cannot alloc memory for dyna array creation\n"); } result->len=0;//the actual number of pointers in array result->cur=0;//internal use only result->fold=1; if ((result->ptr=(void**)malloc(DefaultLength*sizeof(void*)))==NULL) { errHandle("cannot alloc memory for dyna array\n"); } return result; } //user must pass a user-defined function pointer for displaying struct void displayArray(DynaArrayPtr arrayPtr, ShowFunc* showFunc) { int i; for (i=0; i<arrayPtr->len; i++) { showFunc(arrayPtr->ptr[i]); } } //this is typical C++ member function, void addEnd(DynaArrayPtr arrayPtr, void* ptr) { arrayPtr->ptr[arrayPtr->len++]=ptr; arrayPtr->cur++; if (arrayPtr->cur==DefaultLength) { expand(arrayPtr); } } void expand(DynaArrayPtr arrayPtr) { arrayPtr->fold++; if ((arrayPtr->ptr=realloc(arrayPtr->ptr, arrayPtr->fold*DefaultLength*sizeof(void*)))==NULL) { errHandle("expand error for dyna array\n"); } arrayPtr->cur=0;//it should be }
file name: fileBuf.h
#ifndef FILEBUF_H #define FILEBUF_H #include "myhead.h" //the max path string length #define MaxPathLength 128 struct FileBuf { int fd; int size; int offset; int cur; char buf[BuffSize]; }; typedef struct FileBuf* FilePtr; //my version of fgets() int mygets(FilePtr stream, char* buffer, int bufSize); //my version of int to char* or itoa int itoa(int num, char* buf, int bufSize); //if you are using myputc before you have to myfflush by yourself //ok, I do it for you. int myputs(FilePtr stream, char* str); //specify if it is white space char //notice here, in C, there is no boolean type and I define it //as alias of int in "myhead.h" bool isBreak(char ch); void copyFile(char* source, char* target); int myftell(FilePtr stream); void replace(int fd, int offset, char* str); //just return the file buffer, regardless read of write FilePtr myfopen(int fd); //a-z, A-Z bool myisalpha(char ch); //my edition of getc char mygetc(struct FileBuf* stream); void myputc(FilePtr stream, char ch); //my version of fclose void myfclose(FilePtr stream); //fflush, user of writing must call this at end of file void myfflush(FilePtr stream); //typedef void FileHandler(struct dirent* dirNode, struct stat* statPtr, char* fullPath); typedef void FileHandler(void* param1, void* param2, void* param3, void* param4); //void dirTraverse(char* path, FileHandler* funcPtr); #endif
¡¡
¡¡
file name: fileBuf.c
#include "fileBuf.h" //the reason I don't want to combine both copy and replace together is //that in the possible sorting like quick sort which is not stable sorting method //the offset to be replaced in one file maybe sorted in different order, //i.e. foo.txt 169 foo.txt 140 foo.txt 249 //in such situation, it is very unsafe to copy a chunk of file and then //replace. So, divide the job into TWO pass is a safe way. //if you are using myputc before you have to myfflush by yourself //ok, I do it for you. int myputs(FilePtr stream, char* str) { int len=strlen(str); int result; myfflush(stream); result= write(stream->fd, (void*)str, len); if (result>=0) { stream->offset+=result; stream->cur=0; } return result; } /* int myrewind(FilePtr stream) { if (lseek(stream->fd, 0, SEEK_SET)<0) { errHandler("lseek error when rewind file ptr\n"); } stream *///useless and inconsistency //my version of int to char* or itoa int itoa(int num, char* buf, int bufSize) { int len=bufSize-1; int temp, start=0; div_t result; bool negative=FALSE; buf[len]='\0'; //at least 2 if (bufSize<2) { return -1; } if (num<0) { negative=TRUE; temp=-1*num; } else { temp=num; } len--; do { result=div(temp, 10); buf[len]='0'+result.rem; temp=result.quot; len--; } while (temp!=0&&len>=0); if (temp==0)//ok { if (negative) { if (len>0) { buf[len]='-'; } else { return -1;//unable to add '-' sign } } else { len++; } //shift array while (len<bufSize) { buf[start]=buf[len]; start++; len++; } return start;//the count } else { return -1; } } //always from current, writing must flush by user first int myfseek(FilePtr stream, int offset) { if ((stream->offset=lseek(stream->fd, offset, SEEK_CUR))<0) { errHandle("lseek error\n"); } stream->cur=0;//need to restart to get char return stream->offset; } //my version of fgets() //return length of string read, -1 means EOF or error param int mygets(FilePtr stream, char* buffer, int bufSize) { int length=0, mode; char ch; //it must be bigger than 1 if (bufSize<2) { return -1; } mode=FindNext; while ((ch=mygetc(stream))!=-1) { switch (mode) { case FindNext: if (!isBreak(ch)) { mode=Comparing; buffer[0]=ch; length=1; } break; case Comparing: if (length==bufSize) { buffer[length]='\0'; return length; } if (isBreak(ch)) { buffer[length]='\0'; return length; } else { buffer[length]=ch; length++; } }//switch }//while if (mode==Comparing) { buffer[length]='\0'; return length; } else { //also means EOF return -1; } } //always case insensitive and ch2 is gurantee to be alphabet bool issamechar(char ch1, char ch2) { if (ch1==ch2) { return TRUE; } if (ch2<='Z') { return ch1-ch2=='a'-'A'; } else { return ch2-ch1=='a'-'A'; } } //it is of course low efficiency to first copy and replace, but //it is easy to code void copyFile(char* source, char* target) { int fdOld, fdNew, n; char buf[BuffSize]; if ((fdOld=open(source, O_RDONLY))<0) { errHandle("open error when copying file\n"); } if ((fdNew=open(target, O_WRONLY|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR))<0) { errHandle("open error when copying file\n"); } while ((n=read(fdOld, buf, BuffSize))!=0) { if (n!=write(fdNew, buf, n)) { errHandle("write error when copying file\n"); } } } void replace(int fd, int offset, char* str) { int size; if (str==NULL) { printf("cannot replace empty string\n"); return; } //strlen is just a function and it will be used more than once size=strlen(str); if (lseek(fd, offset, SEEK_SET)<0) { errHandle("lseek error when replacing\n"); } if (size!=write(fd, str, size)) { errHandle("write error when replace\n"); } } //my version of fclose void myfclose(FilePtr ptr) { free(ptr); } //actually it only create file buff struct for both read and write purpose FilePtr myfopen(int fd) { FilePtr result=(FilePtr)malloc(sizeof(struct FileBuf)); if (result==NULL) { errHandle("cannot allocate memory for FileBuf\n"); } result->fd=fd; result->offset=0; result->cur=0; /* if ((result->size=read(fd, result->buf, BuffSize))==-1) { errHandle("unable to start reading file for myfopen\n"); } */ result->size=0;//first setup as 0 return result; } //fflush void myfflush(FilePtr stream) { if (stream->cur>0) { if (stream->cur!=write(stream->fd, stream->buf, stream->cur)) { errHandle("fflush error\n"); } stream->cur=0;//ready for new write } } //putc void myputc(FilePtr stream, char ch) { if (stream->cur>= stream->size) { if (BuffSize !=write(stream->fd, stream->buf, BuffSize)) { errHandle("write error of file buffer\n"); } stream->cur=0; } stream->offset++; stream->buf[stream->cur]=ch; stream->cur++; } //my verstion of fgetc char mygetc(FilePtr stream) { //if buff is fully read if (stream->cur >= stream->size) { /* if (stream->size==0)//must be end of file { return -1; //return EOF; } */ //if erro happened during reading if ((stream->size=read(stream->fd, stream->buf, BuffSize))<0) { errHandle("read error of file buf\n"); } if (stream->size==0)//must be end of file { return -1; //return EOF; } //else reset cur stream->cur=0; } //offset is always inc for each call stream->offset++; return stream->buf[stream->cur++]; } //what is white space bool isBreak(char ch) { return (ch==' '||ch=='\t'||ch=='\n'||ch=='\0'); } int myftell(FilePtr stream) { return stream->offset; } bool myisalpha(char ch) { return ((ch>='A'&&ch<='Z')||(ch>='a'&&ch<='z')); }
file name: makefile
all: main.o cp -f ./main.o ./dirtest/ @echo "make complete for main.o " cp -f *.c ../labbackup/ cp -f *.h ../labbackup/ fileBuf.o : fileBuf.c fileBuf.h myhead.h @echo "compiling fileBuf.o..." gcc -g -c fileBuf.c -o fileBuf.o errHandle.o : errHandle.c myhead.h @echo "compiling errHanle module..." gcc -g -c errHandle.c -o errHandle.o main.o : searchString.o main.c errHandle.o myhead.h fileBuf.o dynaArray.o @echo "compiling main.o..." gcc -g main.c searchString.o errHandle.o fileBuf.o dynaArray.o -o main.o searchString.o : searchString.h searchString.c myhead.h @echo "compiling searchString.o..." gcc -g -c searchString.c -o searchString.o dynaArray.o : dynaArray.h dynaArray.c myhead.h @echo "compiling dynaArray.o..." gcc -g -c dynaArray.c -o dynaArray.o clear : @echo "clear up...\n" rm *.o
How to run?
1. You need the file name as argument in command line
[qingz_hu@alamanni ~/lab1] % cd dirtest [qingz_hu@alamanni dirtest] % ./main.o ./dirtest1 <./dirtest1/dirtest2/softlink2> is a repeat directory which is already in search list [qingz_hu@alamanni dirtest] % ../make ../make: Command not found. [qingz_hu@alamanni dirtest] % cd .. [qingz_hu@alamanni ~/lab1] % make compiling searchString.o... gcc -g -c searchString.c -o searchString.o compiling main.o... gcc -g main.c searchString.o errHandle.o fileBuf.o dynaArray.o -o main.o cp -f ./main.o ./dirtest/ make complete for main.o cp -f *.c ../labbackup/ cp -f *.h ../labbackup/ [qingz_hu@alamanni ~/lab1] % cd dirtest [qingz_hu@alamanni dirtest] % ./main.o ./dirtest1 <./dirtest1/dirtest2/softlink2> is a directory already in list, ignore it [qingz_hu@alamanni dirtest] %