File Type Statistics Using C


Introduction

This project builds on the file and directory interfaces of Linux. Using the lstat function and directory operations, you will write a program that recursively counts files by type. The result gives a concrete picture of which file types make up a Linux file system, and the finished program can also be used as a practical tool in study and work.

👀 Preview

$ ./file_type .
regular files  =       2, 66.67 %
directories    =       1, 33.33 %
block special  =       0,  0.00 %
char special   =       0,  0.00 %
FIFOs          =       0,  0.00 %
symbolic links =       0,  0.00 %
sockets        =       0,  0.00 %

🎯 Tasks

In this project, you will learn:

  • How to implement a program in C that recursively counts file types in a directory using Linux file and directory interfaces.

🏆 Achievements

After completing this project, you will be able to:

  • Use the lstat function to obtain file information in Linux.
  • Perform directory operations such as opening directories and reading directory entries.
  • Create a program that recursively counts different file types, including regular files, directories, block special files, character special files, named pipes, symbolic links, and sockets.
  • Calculate and display the percentage of each file type within a directory.

Basic Knowledge and Creating Project Files

Next, we will introduce the steps from conception to implementation, mainly applying the following C language knowledge points:

  • stat structure and lstat function, opendir, readdir functions, dirent structure, recursion, function calls, etc.

The program is built from five functions: main, myftw, dopath, myfunc, and path_alloc.

  • The myfunc function receives one path at a time, determines its file type, and increments the matching counter.
  • The dopath function walks the tree recursively: it determines whether the current path is a directory and, if so, descends into each of its entries.
  • The myftw function obtains the buffer for the full path from path_alloc, copies the starting pathname into it, and then calls dopath.
  • The path_alloc function allocates the memory that stores the full path and reports its size.

Create a new file named file_type.c in the ~/project directory, and open it in your preferred code editor.

cd ~/project
touch file_type.c

Design the main Function

The main function first receives the command-line arguments and checks their validity. It then calls myftw to count the files of each type. Finally, it computes the percentage of each file type and prints the results.

#include <dirent.h>
#include <limits.h>
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

#define FTW_F 1 /* Flag for non-directory files */
#define FTW_D 2 /* Flag for directory files */
#define FTW_DNR 3 /* Flag for unreadable directory files */
#define FTW_NS 4 /* Flag for files that cannot be accessed by stat */

static char *fullpath;
static size_t pathlen;

/* Define the function for handling files */
typedef int Myfunc(const char *, const struct stat *, int);
static Myfunc myfunc;
static int myftw(char *, Myfunc *);
static int dopath(Myfunc *);
char *path_alloc(size_t *size);

static long nreg, ndir, nblk, nchr, nfifo, nslink, nsock, ntot;

int main(int argc, char *argv[])
{
 int ret;

 // Perform input validity check
 if (argc != 2)
 {
  printf("Invalid command input! \n");
  exit(1);
 }

 /* Calculate the number of various types of files */
 ret = myftw(argv[1], myfunc);

 /* Calculate the total number of files */
 ntot = nreg + ndir + nblk + nchr + nfifo + nslink + nsock;

 /* Avoid division by 0 to improve program stability */
 if (ntot == 0)
  ntot = 1;

 /* Print the percentage of various types of files */
 printf("regular files  = %7ld, %5.2f %%\n", nreg,
  nreg*100.0 / ntot);
 printf("directories    = %7ld, %5.2f %%\n", ndir,
  ndir*100.0 / ntot);
 printf("block special  = %7ld, %5.2f %%\n", nblk,
  nblk*100.0 / ntot);
 printf("char special   = %7ld, %5.2f %%\n", nchr,
  nchr*100.0 / ntot);
 printf("FIFOs          = %7ld, %5.2f %%\n", nfifo,
  nfifo*100.0 / ntot);
 printf("symbolic links = %7ld, %5.2f %%\n", nslink,
  nslink*100.0 / ntot);
 printf("sockets        = %7ld, %5.2f %%\n", nsock,
  nsock*100.0 / ntot);
 exit(ret);
}
  • fullpath: Pointer to the buffer that stores the full path of the file currently being processed.
  • pathlen: Size, in bytes, of the buffer that fullpath points to.
  • nreg: The number of regular files.
  • ndir: The number of directory files.
  • nblk: The number of block special files.
  • nchr: The number of character special files.
  • nfifo: The number of named pipes.
  • nslink: The number of symbolic link files.
  • nsock: The number of socket files.
  • ntot: Total number of files.
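
The percentage arithmetic and the division-by-zero guard in main can also be isolated in a small helper. The following is only a sketch of that idea: the names percent_of and print_row are hypothetical and are not part of the project's code. Instead of forcing the total to 1, the helper returns 0.0 when the total is 0, which prints the same result.

```c
#include <stdio.h>

/* Hypothetical helper: share of `count` in `total`, as a percentage.
   Returns 0.0 when total is 0 instead of dividing by zero. */
static double percent_of(long count, long total)
{
    if (total == 0)
        return 0.0;
    /* Multiplying by 100.0 first forces floating-point division. */
    return count * 100.0 / total;
}

/* Hypothetical helper: print one row in the same layout that main uses. */
static void print_row(const char *label, long count, long total)
{
    printf("%-14s = %7ld, %5.2f %%\n", label, count, percent_of(count, total));
}
```

With these helpers, each pair of printf lines in main could be written as a single call such as print_row("regular files", nreg, ntot).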

Design the myftw Function

The myftw function saves the starting pathname in a global character buffer. It first calls path_alloc to allocate the buffer and grows it with realloc if the pathname does not fit. Because fullpath is a global variable, the other functions can use it directly. Finally, myftw calls dopath to process the pathname, whether or not it is a directory.

static int myftw(char *pathname, Myfunc *func)
{
 /* Allocate space for the string array to save the path */
 fullpath = path_alloc(&pathlen);

 /* If the allocated space is not enough to save the path, use realloc to reallocate */
 if (pathlen <= strlen(pathname)) {
  pathlen = strlen(pathname) * 2;
  if ((fullpath = realloc(fullpath, pathlen)) == NULL) {
   printf("realloc failed!\n");
   exit(1);
  }
 }

 /* Save the pathname parameter in the full path. Pay attention: fullpath is a global variable
    and can be called by dopath */
 strcpy(fullpath, pathname);

 /* Call the dopath function */
 return(dopath(func));
}

/* Path array allocation */
char *path_alloc(size_t* size)
{
 char *p = NULL;
 if (!size)
  return NULL;
 p = malloc(256);
 if (p)
  *size = 256;
 else
  *size = 0;
 return p;
}
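
When realloc fails, it returns NULL and leaves the old block allocated, so assigning its result directly to the only pointer loses that block. A common pattern keeps the result in a temporary variable first. The following is a minimal sketch of that pattern; grow_buffer is a hypothetical helper, not one of the project's functions, and the starting size of 256 simply mirrors path_alloc.

```c
#include <stdlib.h>

/* Hypothetical helper: make sure *buf can hold at least `needed` bytes.
   *size is the current capacity. Returns 0 on success and -1 on failure;
   on failure *buf and *size are unchanged and still valid. */
static int grow_buffer(char **buf, size_t *size, size_t needed)
{
    size_t newsize = (*size > 0) ? *size : 256;
    char *tmp;

    if (needed <= *size)
        return 0;                       /* already large enough */

    while (newsize < needed) {          /* double until the request fits */
        if (newsize > (size_t)-1 / 2)
            return -1;                  /* doubling would overflow */
        newsize *= 2;
    }

    tmp = realloc(*buf, newsize);
    if (tmp == NULL)
        return -1;                      /* the old block is untouched */

    *buf = tmp;
    *size = newsize;
    return 0;
}
```

A caller would pass the addresses of its pointer and size variables, for example grow_buffer(&fullpath, &pathlen, strlen(pathname) + 1), and decide how to react when the return value is -1.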

Design the dopath Function

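  • int lstat(const char *path, struct stat *buf) Function: obtain information about a file. It returns 0 on success and -1 on failure. When the file is a symbolic link, lstat returns information about the symbolic link itself, while stat returns information about the file that the link points to. Both functions fill in a stat structure. The exact layout is platform-specific; on Linux the structure contains the following fields:
struct stat {
dev_t           st_dev;     /* ID of the device containing the file */
ino_t           st_ino;     /* Inode number */
mode_t          st_mode;    /* File type and permissions */
nlink_t         st_nlink;   /* Number of hard links */
uid_t           st_uid;     /* User ID of the owner */
gid_t           st_gid;     /* Group ID of the owner */
dev_t           st_rdev;    /* Device ID (for special files) */
off_t           st_size;    /* Total size in bytes */
struct timespec st_atim;    /* Time of last access */
struct timespec st_mtim;    /* Time of last modification */
struct timespec st_ctim;    /* Time of last status change */
blksize_t       st_blksize; /* Preferred block size for I/O */
blkcnt_t        st_blocks;  /* Number of 512-byte blocks allocated */
};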
  • DIR *opendir(const char *path) Function: open a directory stream for the directory path. Return value: a pointer to the directory stream on success, or NULL on failure.

  • struct dirent *readdir(DIR *dir) Function: readdir() returns the next directory entry in the directory stream dir. Return value: on success, readdir() returns a pointer to the next directory entry. At the end of the directory stream, or if an error occurs, readdir() returns NULL.
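
The difference between lstat and stat is easiest to see on a symbolic link. The sketch below is an illustration only; the helper names is_symlink, target_is_regular, and report are hypothetical. The _XOPEN_SOURCE definition is an assumption about the build: it requests the POSIX declarations explicitly and is harmless with the default gcc settings used in this project.

```c
#define _XOPEN_SOURCE 700  /* request POSIX/XSI declarations such as lstat */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `path` itself is a symbolic link, 0 if not, -1 on error.
   lstat does not follow the link, so it sees the link's own type. */
static int is_symlink(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) < 0)
        return -1;
    return S_ISLNK(sb.st_mode) ? 1 : 0;
}

/* Returns 1 if `path` resolves to a regular file, 0 if not, -1 on error.
   stat follows symbolic links, so it describes the target. */
static int target_is_regular(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) < 0)
        return -1;
    return S_ISREG(sb.st_mode) ? 1 : 0;
}

/* Print both answers for one path. */
static void report(const char *path)
{
    printf("%s: link itself? %d, regular file after following? %d\n",
           path, is_symlink(path), target_is_regular(path));
}
```

For a symbolic link that points to a regular file, both helpers return 1: lstat reports the link, and stat reports the file behind it. This is why the counting program uses lstat. With stat, symbolic links would never be counted as such.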

/* dopath is used to determine if it is a directory, and then choose whether to directly enter the myfunc function for counting, or recursively call the dopath function. */
static int dopath(Myfunc* func)
{
 struct stat statbuf;
 struct dirent *dirp;
 DIR *dp;
 int ret, n;

 /* Call the lstat function to obtain the stat information of the pathname. If it fails, call the func function and pass FTW_NS */
 if (lstat(fullpath, &statbuf) < 0)
  return(func(fullpath, &statbuf, FTW_NS));

 /* Check the st_mode of the file stat structure. If it is not a directory, call the func function and pass FTW_F, and then determine the file type by myfunc */
 if (S_ISDIR(statbuf.st_mode) == 0)
  return(func(fullpath, &statbuf, FTW_F));

 /* The last case is that the pathname represents a directory. func normally returns 0, so dopath does not return here and goes on to process the entries of the directory */
 if ((ret = func(fullpath, &statbuf, FTW_D)) != 0)
  return(ret);
 /* Path processing: make sure the buffer can hold "/", the longest entry name, and the terminating null byte */
 n = strlen(fullpath);
 if (n + NAME_MAX + 2 > pathlen) {
  while (n + NAME_MAX + 2 > pathlen)
   pathlen *= 2;
  if ((fullpath = realloc(fullpath, pathlen)) == NULL) {
   printf("realloc failed!\n");
   exit(1);
  }
 }
 fullpath[n++] = '/';
 fullpath[n] = 0;

 /* Handle each entry in the directory */
 if ((dp = opendir(fullpath)) == NULL)
  return(func(fullpath, &statbuf, FTW_DNR));
 while ((dirp = readdir(dp)) != NULL) {
  if (strcmp(dirp->d_name, ".") == 0  ||
      strcmp(dirp->d_name, "..") == 0)
    continue;  /* Ignore the current directory (.) and the parent directory (..) to avoid infinite loop */
  strcpy(&fullpath[n], dirp->d_name); /* Append the name of the current directory entry after "/" */
  if ((ret = dopath(func)) != 0) /* Then recursively call dopath with the new pathname */
   break;
 }
 fullpath[n-1] = 0;

 /* Close the directory */
 if (closedir(dp) < 0)
  printf("can't close directory %s\n", fullpath);
 return(ret);
}
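
The directory-reading loop inside dopath can be tried in isolation, without the recursion. The following sketch counts the entries of a single directory; count_entries is a hypothetical helper, and the errno check is an addition that distinguishes a read error from the normal end of the directory, since readdir returns NULL in both cases.

```c
#define _XOPEN_SOURCE 700  /* request POSIX declarations explicitly */
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: count the entries of one directory, skipping "."
   and "..". Returns the count, or -1 if the directory cannot be opened
   or read. */
static long count_entries(const char *dirpath)
{
    DIR *dp;
    struct dirent *dirp;
    long count = 0;

    if ((dp = opendir(dirpath)) == NULL)
        return -1;

    errno = 0;  /* lets us tell an error apart from end of directory */
    while ((dirp = readdir(dp)) != NULL) {
        if (strcmp(dirp->d_name, ".") == 0 ||
            strcmp(dirp->d_name, "..") == 0)
            continue;
        count++;
    }
    if (errno != 0) {       /* readdir stopped because of an error */
        closedir(dp);
        return -1;
    }

    closedir(dp);
    return count;
}
```

Calling count_entries(".") returns the number of entries directly inside the current directory. Unlike dopath, it does not descend into subdirectories.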

Design the myfunc Function

The main purpose of the myfunc function is to determine the file type from the stat information and count it. S_IFMT is a bit mask that extracts the file type bits from st_mode; the result is compared with constants such as S_IFREG and S_IFLNK.

The same tests can also be written with the following macros. They take st_mode as a parameter, much like a function call:

  • S_ISBLK: Test if it is a block special file.
  • S_ISCHR: Test if it is a character special file.
  • S_ISDIR: Test if it is a directory.
  • S_ISFIFO: Test if it is a FIFO (named pipe).
  • S_ISREG: Test if it is a regular file.
  • S_ISLNK: Test if it is a symbolic link.
  • S_ISSOCK: Test if it is a socket.
static int myfunc(const char *pathname, const struct stat *statptr, int type)
{
 switch (type) {

 /* Handling for non-directory files */
 case FTW_F:
  switch (statptr->st_mode & S_IFMT) {
  case S_IFREG: nreg++;  break;
  case S_IFBLK: nblk++;  break;
  case S_IFCHR: nchr++;  break;
  case S_IFIFO: nfifo++; break;
  case S_IFLNK: nslink++; break;
  case S_IFSOCK: nsock++; break;
  case S_IFDIR: /* Directories should arrive with type FTW_D, not FTW_F */
   printf("for S_IFDIR for %s\n", pathname);
  }
  break;

 /* Handling for directory files */
 case FTW_D:
  ndir++;
  break;

 /* Handling for unreadable directories */
 case FTW_DNR:
  printf("%s directory is unreadable\n", pathname);
  break;

 /* Handling for files that lstat cannot access */
 case FTW_NS:
  printf("%s error in stat\n", pathname);
  break;
 default:
  printf("Type %d is unrecognized, pathname is %s\n", type, pathname);
 }
 return(0);
}
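
The switch in myfunc masks st_mode with S_IFMT, while the macros listed earlier test one type at a time. Both approaches give the same answer. The sketch below places them side by side; the function names and the label strings are hypothetical and chosen only for this illustration.

```c
#define _XOPEN_SOURCE 700  /* request the S_IFxxx constants explicitly */
#include <string.h>
#include <sys/stat.h>

/* Classify with the S_IFMT mask, the way myfunc does. */
static const char *type_by_mask(mode_t mode)
{
    switch (mode & S_IFMT) {
    case S_IFREG:  return "regular";
    case S_IFDIR:  return "directory";
    case S_IFBLK:  return "block";
    case S_IFCHR:  return "char";
    case S_IFIFO:  return "fifo";
    case S_IFLNK:  return "symlink";
    case S_IFSOCK: return "socket";
    default:       return "unknown";
    }
}

/* Classify with the S_ISxxx test macros instead. */
static const char *type_by_macro(mode_t mode)
{
    if (S_ISREG(mode))  return "regular";
    if (S_ISDIR(mode))  return "directory";
    if (S_ISBLK(mode))  return "block";
    if (S_ISCHR(mode))  return "char";
    if (S_ISFIFO(mode)) return "fifo";
    if (S_ISLNK(mode))  return "symlink";
    if (S_ISSOCK(mode)) return "socket";
    return "unknown";
}

/* Returns 1 if both approaches give the same label for `mode`. */
static int types_agree(mode_t mode)
{
    return strcmp(type_by_mask(mode), type_by_macro(mode)) == 0;
}
```

The switch form suits myfunc because it dispatches on all types in one place; the macro form reads better for a single test, such as the S_ISDIR check in dopath.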

Compile and Test

Enter the following commands in the terminal to compile and run the program:

cd ~/project
gcc -o file_type file_type.c
./file_type .
labex:project/ $ ls
file_type file_type.c

labex:project/ $ gcc -o file_type file_type.c

labex:project/ $ ./file_type .
regular files  =       2, 66.67 %
directories    =       1, 33.33 %
block special  =       0,  0.00 %
char special   =       0,  0.00 %
FIFOs          =       0,  0.00 %
symbolic links =       0,  0.00 %
sockets        =       0,  0.00 %

The results show that in the current directory, regular files account for 66.67% and directories account for 33.33%. The single directory counted is the starting directory (.) itself.

To count the file types in system directories, you can run the command with sudo. Even then, some files still cannot be accessed, so they are not counted.
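
When lstat cannot access a file, it returns -1 and stores the reason in errno. The program above prints only a generic message in that case. The following sketch shows one way to report the actual reason; lstat_or_report is a hypothetical helper and is not part of the project's code.

```c
#define _XOPEN_SOURCE 700  /* request POSIX declarations such as lstat */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: call lstat on `path`. On success, fill *sb and
   return 0. On failure, print the reason to stderr and return the
   errno value. */
static int lstat_or_report(const char *path, struct stat *sb)
{
    if (lstat(path, sb) < 0) {
        int saved = errno;  /* save it before another call changes it */
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
        return saved;
    }
    return 0;
}
```

A path that does not exist yields ENOENT. A path below a directory that the user may not search yields EACCES, which is the typical case when scanning system directories without sufficient privileges.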


Summary

This project deepens your understanding of the Linux file system. You have practiced directory operations and worked with the stat structure that stores file information. The finished program is a practical Linux tool for counting file types.
