I/O Management

Introduction

Let’s start this article with some schemas.

Schemas

Hardware architecture of a computing system

Logic management of I/O subsystem for devices

Logic Management of I/O subsystem 2

Device manager

Terminology

Buffering

Buffering areas store data during transfer from devices and application processes’ memory areas. Buffering is useful to narrow down the differences between different speed of production and consuming of data, and to parallelize I/O accesses.

Exception management

During I/O operations may occur different anomalous events which can be masked and hidden from the users, or communicated and propagated to processes and users.

Naming

Each device has a unique identifier.

Polling

Device polling refers to a technique that lets the operating system periodically poll devices, instead of relying on the devices to generate interrupts when they need attention.

When done properly, polling gives more control to the operating system on when and how to handle devices, with a number of advantages in terms of system responsiveness and performance.

Spooling

Technique to manage shared resources. It avoids any kind of interference over resources and reduces waiting time for processes.

Interrupt-driven I/O

In synchronous management, each process which wants to start an I/O operation is blocked waiting for the OS to terminate that request.

In asynchronous management, when the I/O operation ends, controller sends an hardware interrupt to the OS, which can then wake up the blocker process.

Interrupt-driven I/O management avoids polling and the related inefficiency lead by waiting times of processes.

Drivers are part of the operative system, they manage the devices.

A driver sends commands to controller and manage interruptions.

Hard Disk Management

Hard Disks provide mass memory, which is used by file system.

The sector is the minimum unit of allocation and transfer of memory. A sector is identified by number of disk side, track number and sector number inside the track.

Hard Disk performance is measured is term of average transfer time (TF), which is the sum of:

TA: average access time (time to move the head)
TT: average transfer data time (time to actually transfer data)

Moreover, the TA is the sum of:

ST: time to longitudinally move the disk head on the required track (seek time)
RL: time to rotate the disk in order to read the required sector (rotational time)

Disk performance is expressed in terms of round per minutes, usually between 5'400 and 15'000 rmp. TT is in the order of microseconds. ST and RL are in the order of milliseconds.

In a concurrent system, the file system must manage many requests from many processes. To reduce data access waiting time, scheduling policy must be established.

The most common policies are:

FCFS: First Come First Served
SSTF: Shortest Seek Time First
SCAN: from 1st to last cylinder

RAID Disks

Redundant Array of Independent Disks systems are used to improve performance, reliability and fault tolerance.

It is based on different levels.

RAID 0 - Striping

A single logic volume is stored into different disks. The way data is allocated allows to parallelize I/O operations.

RAID 0

RAID 1: Mirroring

The data is repeated on two disks. The first disk is mirrored on the second disk. Data is always written over both the disks. Reading can be parallelized.

RAID 1

RAID 2: Striping with ECC

A central controller synchronizes the disks by making them spin at the same angular orientation so that they all reach the index simultaneously. RAID 2 uses bit-level striping, and each sequential bit is placed on a different hard drive.

The error correcting code (ECC) used is the Hamming code parity, which is calculated across bits and stored separately in at least a single drive.

RAID 2

Other RAIDs types are:

RAID 3: Striping with ECC stored as parity
RAID 4: Striping with parity stored on a drive
RAID 5: Striping with parity across more drives

File System Management

File systems allow memorizing data on secondary memory. From (user) application to secondary memory, there are the following levels of abstraction:

Logical Structure: gives file and directory abstractions
Access: access and protection policies
Physical Structure: allocate file’s logic records on physical blocks of mass memory
Virtual Device: gives the physic block abstraction to interpreter mass memory as a list of blocks

Logic Structure

Gives file and directory abstractions and the relative system calls to manage them. A UNIX process interpreters the I/O word around it as a set of descriptors because of homogeneity between files and device. Processes interact with I/O according to open-read-write-close paradigm: prologue and epilogue operations.

Access

A multi-user system must manage the access to files in order to guarantee the privacy for all users. Foreach file, the system could be defined different classes of users and different ways to access the file, which usually are:

r: read
w: write
x: execute

Physical Structure

The most important allocation methods are: adjacent allocation, linked list allocation (with or without FAT table) and index allocation.

In Adjacent Allocation a file is allocated in an empty area major than the file dimension. This leads to fragmentation and needs periodical defragmentation routines of mass memory.

In Linked List Allocation there are no fragmentation issues. The sequential access is efficient.

Linked List Allocation with FAT table uses a FAT table as a data structure that describes blocks allocation map.

Lastly, Index Allocation in a non-adjacent allocation which does not causes fragmentation.

UNIX File System Management

A file descriptor is a pointer to an i-node. It is a non-negative integer that identifies an opened file.

In UNIX’s System Programming, many variables contains fd in their names to refer the fact that they store file descriptors.

Each process has an I/O pointer that points to the current position of the file over which the process is currently operating. Each user as its own vision about opened files: if more users open the same file, each user process has its own I/O pointer.

UNIX File System Management

UNIX Access

Take a look at File Protection slice of Introduction to UNIX article.

UNIX Physical Structure

Each file is represented by a descriptor (data structure) called i-node. i-node structure for devices is:

ID number
device class (blocks/chars) i-node structure for files and directories is:
Owner: UID and GID
file type
permissions
bits: SUID (Set owner User ID), SGID (Set owner Group ID) and sticky
links number
file dimension
allocation: addresses of thirteen blocks (index allocation)

Index allocation in Unix works in this way. The first ten addresses (from 0 to 9) directly point to blocks on mass memory. The 11th address has a level of indirectness: it points to a block that holds addresses to other 13 blocks.

The 12th address has two levels of indirectness and 13-th address has tree levels of indirectness. UNIX Physical Structure

The disk is subdivided into fixed-dimension blocks:

boot-block
superblock: describe the file system
i-list: i-node list
Data block

Calling open() system call lead to the allocation of an element (a file descriptor) in the first available position of Table of Process Opened Files. Then, a new record is added to Table of System Opened Files and a copy of its i-node is added to Active File Table (if the file has not been opened by another process).

If more processes open the same file, they:

share a single entry in Active File Table
have separate entries in Table of System Opened Files

open() system call

Let’s take a deeper look at the case of a process that opens a file before creating a child process. Both father and child share the same entry in Table of Process Opened Files. Due to this, they also share the same i-node inside Active File Table and the same I/O pointer.

open() system call parent & child processes

Linux File System

Let’s take, for example, any Linux-based Operative System. Its File System Hierarchy Standard is well-defined at pathname.com/fhs/.

As you can read in paragraph 3.2 Requirements of fhs-2.3.pd, the following directories (or symbolic links to directories) are required in / (root).

Directory	Description
`bin`	Essential command binaries
`boot`	Static files of the boot loader
`dev`	Device files
`etc`	Host-specific system configuration
`lib`	Essential shared libraries and kernel modules
`media`	Mount point for removable media
`mnt`	Mount point for mounting a filesystem temporarily
`opt`	Add-on application software packages
`sbin`	Essential system binaries
`srv`	Data for services provided by this system
`tmp`	Temporary files
`usr`	Secondary hierarchy
`var`	Variable data

Protection & Security

Protection, on an operative system, is given by the guarantee that system resources are provided only to authorized subjects. Resources are meant as both logics and physics (CPU, memory, but also files, semaphores, pipes, …). Subjects are users, processes and procedures. Protection objective is to ensure that each active party in a system uses the resources only in the ways allowed by given predefined policies. The Reference Monitor is the operative system component that mediate between processors’ access requests and resources.

Least Privilege

Least Privilege principle tells that, in any instant of time, a process must access only to those strictly necessaries resources that allow the process to perform its right working.

Protection Domain

A protection domain defines a set of resources (objects, O_n) and the corresponding access rights (read, write, execute) allowed to subjects included in that domain. A process (subject) works inside a protection domain that specifies the resources the process can use and with which access rights.

For example, protection domain D1 could consist of the following entities:

<O₁, { read, write } >
<O₂, {read, write, execute, print } >
<O₃, { read } >

Protection domain D2:

<O₁, { write } >
<O₂, { read, write } >
<O₃, { read, print } >

Protection domain D3:

<O₁, { read } >
<O₂, { read, print } >
<O₃, { execute, print } >

Access Matrix

Each matrix’s cell (indicated with access_matrix(i, j) C-style function) defines the set of access rights that a subject operating on a specific domain i can perform over the object j. Starting from this table, it’s possible to archive an ordered set of triple:

<domain, object, set of access rights>

When an operation M have to be executed on a domain D_i over an object O_j, the operative system looks for triple <D_i,O_j,R_k> with M ∈ R_k If M exists, it can be executed, otherwise it’ll throw an exception.

Access Matrix

Static VS Dynamic Domains

If the domain is static, the combination process-domain can’t change during the process’ life. If the domain is dynamic, a switch right allow a process to change its protection domain.

Access Matrix with switch

The problems related to Access Matrix come in play when the matrix starts to have a huge number of columns and rows, and many cells are left empty. To overcome those issues, the operative system can use tools like ACL (Access Control List) or Capability List.

Access Control List

Foreach object, the ACL provides an order couple:

<domain, set of access rights>

Those couples are valid only for domains with a non-empty set of access rights. When an operation M have to be executed on a domain D_i over an object O_j, the operative system looks inside the ACL for <D_i,R_k> with M ∈ R_k

If M does not exists, M is searched in a sort of default list. If M doesn’t even exists in this list, an exception is thrown. To improve efficiency, the operative system can use the “default list” before using the ACL.

Capability List

Foreach domain, the Capability List provides the set of objects and the corresponding access rights. For example:

D₁: <O₁, rights>, <O₂, rights>, ... and D₂: <O₃, rights>, <O₇, rights>, ...

An object is usually identified by its physic name or by its capability. The ownership of a capability corresponds to the authorization of execute a certain operation. Capability List is protected and managed by the operative system: it cannot be manipulated by a user process.

Quotes

References:

polling(4)

I/O Management

Introduction

Schemas

Hardware architecture of a computing system

Logic management of I/O subsystem for devices

Device manager

Terminology

Buffering

Exception management

Naming

Polling

Spooling

Interrupt-driven I/O

Hard Disk Management

RAID Disks

RAID 0 - Striping

RAID 1: Mirroring

RAID 2: Striping with ECC

File System Management

Logic Structure

Access

Physical Structure

UNIX File System Management

UNIX Access

UNIX Physical Structure

Share file between processes in UNIX

Linux File System

Protection & Security

Least Privilege

Protection Domain

Access Matrix

Static VS Dynamic Domains

Access Control List

Capability List

Quotes