Manage Files in Unix


Introduction

A file in Unix is a sequence of bytes. Access to it is sequential by default, and files are accessed through file descriptors.

Moreover, each process has an I/O pointer that marks its current position in the file it is operating on.

Each user has its own view of open files: if several users open the same file, each user's process has its own I/O pointer.

File Descriptors

A Unix process sees all I/O devices as a set of descriptors. This follows from the homogeneity between files and devices.

File descriptors are non-negative integers that identify open files.

A Unix process has:

  • stdin: file descriptor 0 (standard input)
  • stdout: file descriptor 1 (standard output)
  • stderr: file descriptor 2 (standard error)

These file descriptors are opened automatically by the shell for each process and are connected to its input and output. Every new request to operate on a file (or on a device) produces a new file descriptor for the process.

Each process has its own descriptor table, the data structure that records its open descriptors.

Processes interact with I/O through the open-read-write-close paradigm: open is the prologue operation, close is the epilogue, and reads and writes happen in between. The I/O pointer marks the current position in the file where the process is reading or writing.
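
As a rough sketch of the open-read-write-close paradigm, the helper below opens a file, reads it in chunks, writes each chunk to an already open descriptor, and closes the file. The name copy_to_fd and the 256-byte buffer are assumptions made for illustration, not part of any library.

```c
#include <fcntl.h>  // open
#include <unistd.h> // read, write, close, unlink

// Hypothetical helper: copies the file at `path` to the open descriptor
// `out_fd`. Returns the number of bytes copied, or -1 on error.
long copy_to_fd(const char *path, int out_fd) {
    char buf[256];
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);                // prologue
    if (fd == -1) {
        return -1;
    }
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (write(out_fd, buf, (size_t)n) != n) { // treat short writes as errors
            close(fd);
            return -1;
        }
        total += n;
    }
    if (close(fd) == -1 || n == -1) {             // epilogue
        return -1;
    }
    return total;
}
```

Calling copy_to_fd(path, 1) would print the file on standard output, since descriptor 1 is already open when the process starts.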

Files and multi-user

A process gets the UID and GID of the user that launches it. For each file, the kernel stores the UID and GID of the process that created it. When a process accesses a file, the kernel checks the effective UID and GID of that process.

Every user in a Unix operating system is identified by a distinct integer (called UserID).

Three types of UID are defined for a process, and they can be changed dynamically depending on the privileges a task needs.

The three types of UID are:

  1. Real UserID:
    For a process, the Real UserID is simply the UserID of the user that started it. It identifies who owns the process.

  2. Effective UserID:
    It is normally the same as the Real UserID, but it is sometimes changed to let a non-privileged user access files that only a privileged user such as root can access. This is the UID the kernel uses for access checks.

  3. Saved UserID:
    It is used when a process running with elevated privileges (generally root) needs to do some under-privileged work, which it does by temporarily switching to a non-privileged account.

While performing under-privileged work, the effective UID is changed to some lower-privilege value, and the privileged UID is kept in the saved UserID so that the process can switch back to the privileged account when the task is completed.

A process can access a file if:

  1. the process UID is 0 (the root user has UID 0)
  2. the process UID is equal to the file owner UID and the owner rights allow it
  3. the process UID is not equal to the file owner UID, but the process GID is equal to the file owner GID and the group rights allow it
  4. the process UID and GID are not equal to the file owner UID and GID, but the rights for others allow it
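
The four rules can be modelled with a small function. This is only a simplified sketch: may_access and the WANT_* constants are names invented here, and supplementary groups, ACLs and capabilities are deliberately left out.

```c
#include <sys/types.h> // uid_t, gid_t, mode_t

// Requested access, using the same bit values as one "rwx" triplet.
enum { WANT_READ = 4, WANT_WRITE = 2, WANT_EXEC = 1 };

// Simplified model of the four rules. Returns 1 if a process with the
// given effective UID/GID may access the file, 0 otherwise.
int may_access(uid_t euid, gid_t egid,
               uid_t file_uid, gid_t file_gid,
               mode_t file_mode, int want) {
    if (euid == 0) {
        return 1;                                 // rule 1: root
    }
    if (euid == file_uid) {
        return ((file_mode >> 6) & want) == want; // rule 2: owner bits
    }
    if (egid == file_gid) {
        return ((file_mode >> 3) & want) == want; // rule 3: group bits
    }
    return (file_mode & want) == want;            // rule 4: other bits
}
```

For a file with mode 0640, for example, the owner may read and write, a process in the file's group may only read, and everybody else is refused.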

System Call

Programmers can invoke Unix system functions (such as opening a file, creating a process, or letting two processes communicate) using system calls.

Unlike an ordinary procedure, a system call is executed by the operating system.

System calls are also called primitives because they are the elementary actions of the Unix virtual machine. They have the following properties:

  • base operations: all other operations are built from them
  • atomic operations: executed without interruption
  • protected operations: executed in the kernel environment

In case of error, a system call returns -1 and sets errno to indicate the cause.
You can use this return value to check whether a system call succeeded or failed.
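
A minimal sketch of this check, assuming a hypothetical helper named open_error_code: it compares the return value of open() with -1 and, on failure, reports the errno value that explains why.

```c
#include <errno.h>  // errno, ENOENT
#include <fcntl.h>  // open, O_RDONLY
#include <unistd.h> // close

// Hypothetical helper: tries to open `path` read-only.
// Returns 0 on success, otherwise the errno value set by open().
int open_error_code(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return errno; // -1 signals the failure, errno says why
    }
    close(fd);
    return 0;
}
```

For a path that does not exist, the value returned is ENOENT.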

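System calls that operate on files at a low level are available as C functions, such as creat(), open(), close(), read(), write() and lseek(), which are described below.

Note that most of these system calls are declared in the <unistd.h> header, while open() and creat() are declared in <fcntl.h>.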

Standard I/O Library

The <stdio.h> library is built on top of system calls. It contains functions that access files at a higher level: instead of file descriptors it uses streams, represented by data structures of type FILE.

It follows that:

  • stdin is a stream associated with the standard input file descriptor
  • stdout is a stream associated with the standard output file descriptor
  • stderr is a stream associated with the standard error file descriptor

Compared with raw system calls, it offers formatted I/O and buffering, which usually makes it more efficient.
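
The link between a stream and its descriptor can be inspected with the POSIX function fileno(). The wrapper descriptor_of below is a hypothetical name used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L // makes fileno() visible in strict C modes
#include <stdio.h> // FILE, fileno, stdin, stdout, stderr

// Returns the file descriptor that backs a stdio stream.
int descriptor_of(FILE *stream) {
    return fileno(stream);
}
```

In a process started normally, descriptor_of(stdin), descriptor_of(stdout) and descriptor_of(stderr) give 0, 1 and 2.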

Operations over files

Because of the homogeneity between files and devices, the functions below can also be used on devices.

Create a file

The following code creates an empty file, whose name is passed as first parameter.

#include <stdio.h>    // ISO C99 Standard: 7.19 Input/output
#include <stdlib.h>   // ISO C99 Standard: 7.20 General utilities
#include <fcntl.h>    // POSIX Standard: 6.5 File Control Operations
#include <sys/stat.h> // S_IRUSR and the other mode bits

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "Error: file name not specified.\n");
        exit(EXIT_FAILURE);
    }

    // file name in file system
    char *name = argv[1];

    // file attributes
    // S_IRUSR: Read permission for the file owner.
    // S_IRGRP: Read permission for the file's group.
    // S_IROTH: Read permission for users other than the file owner.
    // S_IWUSR: Write permission for the file owner.
    // S_IWGRP: Write permission for the file's group.
    mode_t mode = S_IRUSR|S_IRGRP|S_IROTH|S_IWUSR|S_IWGRP;

    // new file descriptor returned from creat
    int fd = creat(name, mode);
    // equivalent: int fd = open(name, O_CREAT|O_WRONLY|O_TRUNC, mode);

    if (fd == -1) {
        perror("creat");
        exit(EXIT_FAILURE);
    }
    printf("%s file has been created\n", name);
    printf("file descriptor = %d\n", fd);

    return 0;
}

// gcc -o main create-test.c && ./main ./test-file.txt

The mode argument specifies the file permission bits to be used in creating the file.

The full list of mode bits can be found in the GNU C Library manual, in the section on the mode bits for access permission.

Open a file

The following code opens a file, whose name is passed as first parameter.

#include <stdio.h>  // ISO C99 Standard: 7.19 Input/output
#include <stdlib.h> // ISO C99 Standard: 7.20 General utilities
#include <fcntl.h>  // POSIX Standard: 6.5 File Control Operations

int main(int argc, char **argv) {
    if (argc < 2) {
        printf("Error: file name not specified.\n");
        exit(1);
    }

    char *file_name = argv[1]; // file name in file system

    // open flags: write only, every write appends to the end of the file
    int flags = O_WRONLY|O_APPEND;

    // new file descriptor returned from open
    int fd = open(file_name, flags);
    if(fd < 0) {
        fprintf(stderr, "Error while opening %s file\n", file_name);
        exit(EXIT_FAILURE);
    }
    printf("file descriptor = %d\n", fd);
    return 0;
}

// gcc -o main open-file.c && ./main ./test-file.txt

The <fcntl.h> header defines constants such as O_RDONLY, O_WRONLY and O_RDWR.

Values for oflag (second argument of open) are constructed by a bitwise-inclusive OR of flags from the following list.

Applications should specify exactly one of the first five values (file access modes) below in the value of oflag:

  • O_EXEC: Open for execute only (non-directory files)
  • O_RDONLY: Open for reading only
  • O_RDWR: Open for reading and writing
  • O_SEARCH: Open directory for search only
  • O_WRONLY: Open for writing only

Any combination of the following may be used:

  • O_APPEND: If set, the file offset shall be set to the end of the file prior to each write.
  • O_CLOEXEC: If set, the FD_CLOEXEC flag for the new file descriptor shall be set.
  • O_CREAT: If the file exists, this flag has no effect. Otherwise, if O_DIRECTORY is not set the file shall be created as a regular file.
  • O_DIRECTORY: If path resolves to a non-directory file, fail and set errno to [ENOTDIR].
  • O_DSYNC: Write I/O operations on the file descriptor shall complete as defined by synchronized I/O data integrity completion.
  • O_EXCL: If O_CREAT and O_EXCL are set, open() shall fail if the file exists.

There are many other constants: please refer to the open manual page.
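
As an illustration of combining flags, the sketch below uses O_CREAT together with O_EXCL to create a file only if it does not exist yet. The helper name create_exclusive and the 0644 mode are assumptions made for this example.

```c
#include <errno.h>  // errno, EEXIST
#include <fcntl.h>  // open, O_WRONLY, O_CREAT, O_EXCL
#include <unistd.h> // close, unlink

// Hypothetical helper: creates `path` only if it does not exist yet.
// Returns a write-only descriptor, or -1 with errno set
// (EEXIST if the file is already there).
int create_exclusive(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

A second call with the same path fails with errno set to EEXIST, which makes this combination handy for lock files.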

Close a file

When you create or open a file, you should close it before the program finishes in order to free up resources.

close() returns zero on success. On error, it returns -1 and sets errno to indicate the error.

Moreover, the perror() function prints a message on standard error describing the last error encountered during a call to a system or library function. It can be a good alternative to fprintf(stderr, "...") for describing what led to an error during a system call.

#include <stdio.h>  // ISO C99 Standard: 7.19 Input/output
#include <stdlib.h> // ISO C99 Standard: 7.20 General utilities
#include <fcntl.h>  // POSIX Standard: 6.5 File Control Operations
#include <unistd.h> // POSIX Standard: 2.10 Symbolic Constants

int main(int argc, char **argv) {
    // ...
    // new file descriptor returned from creat
    int fd = creat(name, mode);

    // ...
    // close the file descriptor
    if(close(fd) == -1) {
        perror("file descriptor close");
        exit(EXIT_FAILURE);
    }

    return 0;
}

Since there is a maximum number of file descriptors that can stay open at the same time, it is important to close file descriptors that a process no longer uses.
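
Closing a descriptor makes its number available again: open() always returns the lowest descriptor number that is not in use. The sketch below checks this for a single-threaded process; descriptor_is_reused is a hypothetical helper name.

```c
#include <fcntl.h>  // open, O_RDONLY
#include <unistd.h> // close

// Opens `path`, closes it, and opens it again. Returns 1 if the second
// open() received the same descriptor number as the first, 0 if not,
// and -1 on error. Assumes no other thread opens files in the meantime.
int descriptor_is_reused(const char *path) {
    int first = open(path, O_RDONLY);
    if (first == -1) {
        return -1;
    }
    if (close(first) == -1) {
        return -1;
    }
    int second = open(path, O_RDONLY);
    if (second == -1) {
        return -1;
    }
    close(second);
    return first == second;
}
```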

Read & Write

read() and write() always start from the current I/O pointer position and advance it by the number of bytes transferred. When the pointer reaches the end of the file, read() returns 0.

read() and write() operations are atomic: the kernel executes each call as a single, uninterrupted step. Note, however, that a call may transfer fewer bytes than requested, so always check the return value.

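#include <stdio.h>  // perror
#include <unistd.h> // read, write

int main() {
    int fd_in = 0;  // file descriptor to read from (standard input)
    int fd_out = 1; // file descriptor to write to (standard output)
    char buf[80];   // buffer that receives the bytes read and supplies the bytes to write
    ssize_t nread, nwrite; // number of bytes actually read/written

    /* READ: at most sizeof(buf) bytes */
    nread = read(fd_in, buf, sizeof(buf));
    if (nread < 0) {
        perror("read");
        return 1;
    }

    /* WRITE: exactly the bytes that were read */
    nwrite = write(fd_out, buf, nread);
    if (nwrite < 0) {
        perror("write");
        return 1;
    }

    return 0;
}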
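
Because write() may transfer fewer bytes than requested, a common pattern is to call it in a loop until everything has been written. The following is a minimal sketch of that pattern; write_all is a hypothetical helper name, not a library function.

```c
#include <errno.h>  // errno, EINTR
#include <unistd.h> // write, read, pipe, close

// Hypothetical helper: keeps calling write() until all `len` bytes of
// `buf` are written. Returns 0 on success, -1 on error.
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR) {
                continue; // interrupted by a signal: retry
            }
            return -1;
        }
        buf += n;         // skip the bytes already written
        len -= (size_t)n;
    }
    return 0;
}
```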

Copy a file

The code below copies the content of one file into another file:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#define perm 0644

int main(int argc, char **argv) {
    if (argc < 3) {
        fprintf(stderr, "Usage: %s <input file> <output file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    char* input_file = argv[1];
    char* output_file = argv[2];

    // file descriptors
    int fd_infile, fd_outfile;

    // number of bytes read
    ssize_t n_read;

    char buffer[BUFSIZ];

    // open infile
    fd_infile = open(input_file, O_RDONLY);
    if (fd_infile == -1) {
        perror(input_file);
        exit(EXIT_FAILURE);
    }

    // create outfile
    fd_outfile = creat(output_file, perm);
    if (fd_outfile == -1) {
        perror(output_file);
        close(fd_infile);
        exit(EXIT_FAILURE);
    }

    // while the number of bytes read is greater than 0
    while ((n_read = read(fd_infile, buffer, BUFSIZ)) > 0) {
        // write the bytes to the out file
        if (write(fd_outfile, buffer, n_read) != n_read) {
            perror("write");
            exit(EXIT_FAILURE);
        }
    }

    // close files
    close(fd_infile);
    close(fd_outfile);

    return 0;
}

// gcc -o main copy-from-file-to-another.c
// ./main ./hello.txt out.txt
// ./main ./hello.txt /dev/pts/NUM

Because of the homogeneity between files and devices, you can also specify a device as the output file. In this way, you can print the file content to a terminal window.

Open a terminal and run the ps a command, which reports a snapshot of the current processes, to find out the device name (pts/2 in this example).
Compile the code above into an executable called main and, in place of out.txt, specify /dev/pts/2 as shown below:

ps a
    PID TTY      STAT   TIME COMMAND
   8836 pts/1    Ss+    0:00 /bin/bash
   8874 pts/2    Ss     0:00 /bin/bash
   8907 pts/2    R+     0:00 ps a

./main ./hello.txt /dev/pts/2

Insert characters

Insert characters in a file (given as first argument) from standard input.

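#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#define perm 0644

void check_arguments_errors(int);

int main(int argc, char **argv) {
    check_arguments_errors(argc);

    printf("Insert characters from standard input. Press Ctrl + D to finish.\n");

    char* file_path = argv[1];
    int fd;
    ssize_t n_read;

    // buffer for the characters read from standard input
    char buff[80];

    // open the file for writing; if that fails, try to create it
    if ((fd = open(file_path, O_WRONLY)) < 0) {
        if ((fd = creat(file_path, perm)) < 0) {
            perror(file_path);
            exit(1);
        }
    }

    printf("Opened or created %s file with fd = %d\n", file_path, fd);

    // file descriptor 0 is standard input
    while ((n_read = read(0, buff, sizeof(buff))) > 0) {
        if (write(fd, buff, n_read) < n_read) {
            close(fd);
            exit(2);
        }
    }

    close(fd);
    return 0;
}

// exit with an error message if the file name is missing
void check_arguments_errors(int argc) {
    if (argc < 2) {
        fprintf(stderr, "Error: file name not specified.\n");
        exit(1);
    }
}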

// gcc -o main insert-characters-in-file.c
// ./main ./out.txt

Append string

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#define perm 0644

/**
 * @brief Check whether the given string matches the pattern.
 * Return 1 if the second char is 's' and the penultimate char is a digit,
 * 0 otherwise.
 */
int pattern(char *s) {
    size_t len = strlen(s);
    if (len < 2) {
        return 0; // too short to have a second and a penultimate char
    }
    return (s[1] == 's' && s[len-2] >= '0' && s[len-2] <= '9' ? 1 : 0);
}

/**
 * @brief Strings given as input are appended to a file
 * only if they satisfy the pattern. The file name is a parameter.
 */
int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "Error: file name not specified.\n");
        exit(1);
    }

    int fd;
    char string[80], answer[3], eol = '\n';
    long int pos = 0;

    char* file_path = argv[1];

    printf("Append strings from standard input that match a given pattern\n");
    printf("File name: %s\n", file_path);

    // open in write mode
    if ((fd = open(file_path, O_WRONLY)) < 0) {
        // if the file does not exist, create it
        if ((fd = creat(file_path, perm)) < 0) {
            perror(file_path);
            exit(1);
        }
    } else {
        // if it does exist, move to the end of it
        pos = lseek(fd, 0, SEEK_END);
    }

    printf("file contains %ld byte(s)\n", pos);

    // the field widths keep scanf from overflowing the buffers
    while (
        printf("Do you want to exit? (Y/n): "), scanf("%2s", answer) == 1 &&
        strcmp(answer, "Y")
    ) {
        printf("string to insert: ");
        if (scanf("%79s", string) != 1) {
            break;
        }

        // if the pattern is satisfied, append to the file
        if (pattern(string)) {
            write(fd, string, strlen(string));
            write(fd, &eol, 1);
        }
    }

    close(fd);
    return 0;
}

// cd intro-examples
// gcc -o main strings-append-on-file.c && ./main test.txt

lseek

lseek provides non-sequential (random) access: it moves the I/O pointer of the file for the invoking process.

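// origin selects the reference point for offset:
// SEEK_SET (0) = from the beginning of the file
// SEEK_CUR (1) = from the current position
// SEEK_END (2) = from the end of the file
int origin;

int fd;       // an open file descriptor
off_t offset; // displacement in bytes, may be negative

// returns the new position, or -1 on error
off_t new_position = lseek(fd, offset, origin);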
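
One common use of lseek is to find the size of a file by seeking to its end. The sketch below does that and then restores the previous position; size_of_fd is a hypothetical helper name.

```c
#include <fcntl.h>     // open
#include <sys/types.h> // off_t
#include <unistd.h>    // lseek, read, write, close, unlink

// Returns the size in bytes of the file behind `fd` by seeking to its
// end, then restores the previous position. Returns -1 on error.
off_t size_of_fd(int fd) {
    off_t current = lseek(fd, 0, SEEK_CUR); // remember where we are
    if (current == (off_t)-1) {
        return -1;
    }
    off_t end = lseek(fd, 0, SEEK_END);     // offset 0 from the end = size
    if (end == (off_t)-1) {
        return -1;
    }
    if (lseek(fd, current, SEEK_SET) == (off_t)-1) {
        return -1;
    }
    return end;
}
```

Since the position is restored, a read() issued after the call continues from where the process was before.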

Access rights over a file

access

Verify the access rights over a file.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "Error: file name not specified.\n");
        exit(1);
    }

    // file name in file system
    char *name = argv[1];

    // R_OK (usually 04) read access
    // W_OK (usually 02) write access
    // X_OK (usually 01) execute access
    // F_OK (usually 00) existence
    if(access(name, R_OK) == 0) {
        printf("Current user has read access over %s file.\n", name);
    }
    if(access(name, W_OK) == 0) {
        printf("Current user has write access over %s file.\n", name);
    }

    if(access(name, X_OK) == 0) {
        printf("Current user has execute access over %s file.\n", name);
    }

    if(access(name, F_OK) == 0) {
        printf("%s file exists.\n", name);
    }

    return 0;
}

// gcc -o main access-test.c && ./main ../test-file.txt
// Docs: https://man7.org/linux/man-pages/man3/access.3p.html

chmod

Change access rights over a file.

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(int argc, char ** argv) {
    if (argc < 3) {
        fprintf(stderr, "Usage: %s <file> <octal mode>\n", argv[0]);
        exit(1);
    }

    char *name = argv[1];
    // parse the mode as octal, e.g. "640" means S_IRUSR|S_IWUSR|S_IRGRP
    mode_t new_mode = (mode_t) strtol(argv[2], NULL, 8);

    if(chmod(name, new_mode) == 0) {
      printf("Successfully changed %s file permission to %o.\n", name, (unsigned int) new_mode);
    } else {
      perror("chmod");
    }

    return 0;
}

// gcc -o main chmod-test.c && ./main ../test-file.txt 640
// Docs: https://man7.org/linux/man-pages/man3/chmod.3p.html
// mode bits: https://www.gnu.org/software/libc/manual/html_node/Permission-Bits.html

link

Create a (hard) link to an existing file.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char ** argv) {
    if (argc < 3) {
        fprintf(stderr, "Usage: %s <existing file> <new link>\n", argv[0]);
        exit(1);
    }

    char *source_name = argv[1];
    char *dest_name = argv[2];

    int link_return_value = link(source_name, dest_name);

    if(link_return_value == 0) {
      printf("Created %s link to %s\n", dest_name, source_name);
    } else {
      // link returns -1 on error and sets errno
      perror("link");
    }

    return 0;
}

// gcc -o main link-test.c && ./main ../test-file.txt ./link-file.txt
// Docs: https://man7.org/linux/man-pages/man3/link.3p.html

unlink

Delete a link to an existing file. When the number of links drops to zero and no process has the file open, the file is deleted and its disk space is freed.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "Error: file name not specified.\n");
        exit(1);
    }

    char *name = argv[1];

    int unlink_return_value = unlink(name);

    if(unlink_return_value == 0) {
      printf("Unlinked %s\n", name);
    } else {
      // unlink returns -1 on error and sets errno
      perror("unlink");
    }

    return 0;
}

// gcc -o main unlink-test.c && ./main ./link-file.txt
// Docs: https://man7.org/linux/man-pages/man3/unlink.3p.html
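
The effect of link and unlink on the number of links can be observed through the st_nlink field returned by stat(). The sketch below assumes a hypothetical helper called link_count.

```c
#include <fcntl.h>    // open
#include <sys/stat.h> // stat, struct stat
#include <unistd.h>   // link, unlink, close

// Returns the number of hard links to `path`, or -1 on error
// (for example when the file no longer exists).
long link_count(const char *path) {
    struct stat info;
    if (stat(path, &info) == -1) {
        return -1;
    }
    return (long)info.st_nlink;
}
```

A freshly created file has a count of 1. Each successful link() to it adds one, and each unlink() removes one.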

stat & fstat

Get information about a file. The stat struct is defined in <sys/stat.h>. stat() takes a file name, while fstat() takes the descriptor of an already open file.

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Error: wrong number of arguments.\n");
        exit(1);
    }

    char *name = argv[1];
    struct stat buff;

    if (stat(name, &buff) == 0) {
        // the field types vary between systems, so cast before printing
        printf("Buffer fields:\n");
        printf("mode (octal) = %o\n", (unsigned int) buff.st_mode);
        printf("inode = %lu\n", (unsigned long) buff.st_ino);
        printf("device ID = %lu\n", (unsigned long) buff.st_dev);
        printf("link numbers = %lu\n", (unsigned long) buff.st_nlink);
        printf("owner UID = %u\n", (unsigned int) buff.st_uid);
        printf("owner GID = %u\n", (unsigned int) buff.st_gid);
        printf("file length in bytes = %ld\n", (long) buff.st_size);
        printf("last access time = %ld\n", (long) buff.st_atime);
        printf("last modify time = %ld\n", (long) buff.st_mtime);
        printf("last status change time = %ld\n", (long) buff.st_ctime);
    } else {
        perror("stat");
    }

    return 0;
}

// cd system-calls/access-rights
// gcc -o main stat-test.c && ./main ../test-file.txt
// Docs: https://man7.org/linux/man-pages/man3/stat.3p.html
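
fstat works like stat but takes an open file descriptor instead of a file name. As a small sketch, the helper below (is_regular_file_fd is a name assumed for this example) uses it to tell regular files apart from directories and devices.

```c
#include <fcntl.h>    // open
#include <sys/stat.h> // fstat, struct stat, S_ISREG
#include <unistd.h>   // close, unlink

// Returns 1 if `fd` refers to a regular file, 0 if it refers to something
// else (directory, device, pipe, ...), and -1 on error.
int is_regular_file_fd(int fd) {
    struct stat info;
    if (fstat(fd, &info) == -1) {
        return -1;
    }
    return S_ISREG(info.st_mode) ? 1 : 0;
}
```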

Conclusion

Useful links: