Linux IO Model

Buffered IO (default)

alias: normal IO

Read

A read can be divided into 2 stages:

  1. Waiting for the data (from disk or network) to be ready in a kernel buffer (the page cache for files, where disk data is loaded through DMA; the socket buffer for network data)
  2. Copying the data from the kernel to the process

Write

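Write-back model: write() returns once the data has been copied into the page cache, and the kernel flushes the dirty pages to disk later.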
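
A minimal Python sketch of the write-back behaviour, assuming a Linux-like system. The function name `buffered_write_then_sync` is made up for illustration. `os.write` returns once the bytes are in the page cache; `os.fsync` is the explicit step that waits for them to reach the disk.

```python
import os


def buffered_write_then_sync(path, payload):
    """Write through the page cache, then force the dirty pages to disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # write() returns as soon as the bytes are copied into the page cache;
        # the disk may not have been touched yet.
        written = os.write(fd, payload)
        # fsync() blocks until the kernel has flushed this file's dirty pages.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

Without the `fsync` call the data would still be written eventually, but the process would have no indication of when.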

Direct IO

Read

Only 1 stage: the data is loaded straight into process space by DMA, bypassing the page cache.

Write

Writes go directly to disk, bypassing the page cache.
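
A hedged Python sketch of a direct read, assuming Linux. `direct_read` and `BLOCK_SIZE` are illustrative names, and 4096 is only an assumed alignment, since the real requirement depends on the device and filesystem. O_DIRECT generally expects the buffer address and the transfer size to be aligned, so the sketch reads into an anonymous `mmap` buffer (page-aligned) and returns None when the filesystem rejects the flag.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed alignment; the real value depends on device and filesystem


def direct_read(path, size=BLOCK_SIZE):
    """Read up to `size` bytes with O_DIRECT; return None if it is unsupported."""
    o_direct = getattr(os, "O_DIRECT", None)  # the flag is missing on some platforms
    if o_direct is None:
        return None
    buf = mmap.mmap(-1, size)  # anonymous mmap memory is page-aligned
    try:
        fd = os.open(path, os.O_RDONLY | o_direct)
        try:
            # The kernel transfers straight into this user-space buffer.
            n = os.readv(fd, [buf])
            return buf[:n]
        finally:
            os.close(fd)
    except OSError:
        return None  # some filesystems or sizes are rejected (typically EINVAL)
    finally:
        buf.close()
```

Callers that need the data regardless would fall back to a normal buffered read when the function returns None.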

Comparison

  • Buffered beats Direct
    1. It decouples the process from the disk
    2. The page cache reduces the number of disk reads
  • Direct beats Buffered
    1. Self-caching applications (e.g. databases) can rely on their own cache management
    2. It avoids the memory copy between kernel space and user space

IO Model

The rest of this section discusses the IO models for buffered reads.

IO Model Matrix

                 Blocking                           Non-blocking
Synchronous      1. Blocking IO (default socket,    2. Non-blocking IO
                    file read/write)
Asynchronous     3. IO multiplexing                 4. AIO
                    (select, poll, epoll)

Block vs Sync

Blocking and synchrony are two independent dimensions.

Sync / Async

Determined by how the result is communicated (the function call is the request, the return value is the response).

  1. Sync: each function call returns with the response.
  2. Async: each function call returns without the response, which is delivered later.

blocking / non-blocking

Determined by whether the process needs to wait.

  1. Blocking: the process has to wait until the function completes
  2. Non-blocking: the process can do other things in the meantime

Typical IO Model

Blocking IO (Blocking + Sync)

The application blocks until the system call is complete (data transferred or error).

The process is blocked during both stages.

e.g.
  • socket
  • stream IO
  • normal read/write
pros & cons
  • pros:
    • no delay
    • easy for developing
  • cons:
    • inefficient
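
A small Python sketch of the blocking model, using a local socket pair in place of a real network peer. `blocking_read` and its `delay` parameter are illustrative names. The `recv` call does not return until the producer thread has sent the data, so the caller is stalled for roughly `delay` seconds.

```python
import socket
import threading
import time


def blocking_read(delay=0.2):
    """Return (data, seconds spent blocked) for one blocking recv()."""
    reader, writer = socket.socketpair()

    def produce():
        time.sleep(delay)          # the data is not ready yet
        writer.sendall(b"hello")

    producer = threading.Thread(target=produce)
    start = time.monotonic()
    producer.start()
    data = reader.recv(1024)       # blocks through both stages
    waited = time.monotonic() - start
    producer.join()
    reader.close()
    writer.close()
    return data, waited
```

The measured wait makes the cost visible: the calling thread can do nothing else until the data arrives.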

Non-blocking IO (non-Blocking + Sync)

This model requires repeated calls (polling) to check for completion.

e.g.
  • Java NIO
  • read/write with O_NONBLOCK flag
pros & cons
  • pros
    • the process can do other things while waiting for the data to be ready
  • cons
    • higher latency, because the process cannot read the data the moment it is ready in the kernel; it only notices on its next poll
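
The polling loop can be sketched in Python as below, using a pipe set to non-blocking mode. `polling_read`, `delay` and `poll_interval` are illustrative names. Each failed `os.read` raises `BlockingIOError` (EAGAIN) instead of blocking, and the sleep stands in for the "other things" the process would do between polls.

```python
import os
import threading
import time


def polling_read(delay=0.2, poll_interval=0.05):
    """Poll a non-blocking pipe until data arrives; return (data, attempts)."""
    r, w = os.pipe()
    os.set_blocking(r, False)      # same effect as setting O_NONBLOCK on the fd

    def produce():
        time.sleep(delay)
        os.write(w, b"hello")

    producer = threading.Thread(target=produce)
    producer.start()
    attempts = 0
    while True:
        attempts += 1
        try:
            data = os.read(r, 1024)    # returns at once, with data or EAGAIN
            break
        except BlockingIOError:
            time.sleep(poll_interval)  # placeholder for other useful work
    producer.join()
    os.close(r)
    os.close(w)
    return data, attempts
```

The data can sit in the kernel for up to `poll_interval` before the next poll picks it up, which is the latency cost listed above.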

IO multiplexing (Blocking + Async)

Similar to non-blocking IO; the only difference is that the “other things” are listening on other IO channels.

e.g.

select, poll, epoll

pros & cons
  • pros
    • a single thread listens on multiple IO channels, without the context switch overhead of one thread per channel
  • cons
    • an additional system call (select) for each read
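
A minimal Python sketch of multiplexing with the standard `selectors` module, which picks epoll on Linux when it is available. `multiplex_read` and the channel names passed to it are illustrative. One thread blocks in `select` on several sockets at once, then issues a separate `recv` for each channel that became readable.

```python
import selectors
import socket


def multiplex_read(messages):
    """Send each payload on its own socket pair and collect them with one selector."""
    sel = selectors.DefaultSelector()
    pairs = []
    for name, payload in messages.items():
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=name)
        writer.sendall(payload)
        pairs.append((reader, writer))

    received = {}
    while len(received) < len(messages):
        events = sel.select(timeout=1.0)   # system call 1: wait on all channels
        if not events:
            break                          # nothing became readable in time
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(1024)  # system call 2: the read
            sel.unregister(key.fileobj)

    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return received
```

For example, `multiplex_read({"a": b"one", "b": b"two"})` collects both payloads on a single thread.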

Asynchronous non-blocking I/O (non-Blocking + Async)

The read request returns immediately, indicating that the read was successfully initiated. The application can then perform other processing while the background read operation completes. When the read response arrives, a signal or a thread-based callback can be generated to complete the I/O transaction.

e.g.

AIO (Linux)
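
Python's standard library does not wrap Linux AIO, so the sketch below only emulates the shape of the model in user space with a worker thread and a callback. It is not kernel AIO. `async_read` and `demo_async_read` are illustrative names. The request returns immediately, the caller carries on, and completion is reported through the callback.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def async_read(executor, fd, size, callback):
    """Start a read in the background and return at once; `callback` gets the data."""
    future = executor.submit(os.read, fd, size)
    future.add_done_callback(lambda f: callback(f.result()))
    return future


def demo_async_read():
    """Return (was the read still pending after the call?, list of delivered data)."""
    r, w = os.pipe()
    done = threading.Event()
    results = []

    def on_complete(data):
        results.append(data)
        done.set()

    with ThreadPoolExecutor(max_workers=1) as executor:
        future = async_read(executor, r, 1024, on_complete)
        pending = not future.done()    # the caller is free while the read is in flight
        os.write(w, b"hello")          # the data arrives later
        done.wait(timeout=2.0)         # completion is reported through the callback
    os.close(r)
    os.close(w)
    return pending, results
```

The caller never waits inside the read itself, in either stage; it only learns of the result when the callback fires.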

Signal-Driven IO (half-Blocking + Async)

An uncommon model.

Only stage 2 is blocked.

Strictly speaking, it could be regarded as blocking IO.
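
A hedged Python sketch of signal-driven IO on Linux, assuming a single-threaded process. `signal_driven_read` is an illustrative name. The socket is put in O_ASYNC mode so that the kernel raises SIGIO when data arrives (stage 1), and the data is then fetched with an ordinary blocking `recv` (stage 2). To keep the example deterministic it waits for the signal with `sigtimedwait` instead of installing a handler; a real program would usually keep working and react in a handler.

```python
import fcntl
import os
import signal
import socket


def signal_driven_read(timeout=2.0):
    """Wait for SIGIO (stage 1), then do a normal blocking recv (stage 2)."""
    reader, writer = socket.socketpair()
    # Keep SIGIO pending instead of letting its default action end the process.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGIO})
    try:
        fcntl.fcntl(reader, fcntl.F_SETOWN, os.getpid())        # who receives SIGIO
        flags = fcntl.fcntl(reader, fcntl.F_GETFL)
        fcntl.fcntl(reader, fcntl.F_SETFL, flags | os.O_ASYNC)  # enable signal-driven IO

        writer.sendall(b"hello")   # data arrival makes the kernel raise SIGIO

        # Stage 1: no read call is pending; the kernel notifies us with a signal.
        info = signal.sigtimedwait({signal.SIGIO}, timeout)
        # Stage 2: the copy from kernel to process still happens in a blocking call.
        return reader.recv(1024) if info is not None else None
    finally:
        reader.close()
        writer.close()
        signal.sigtimedwait({signal.SIGIO}, 0)   # discard any SIGIO still pending
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

The function returns None if no signal arrives within `timeout` seconds.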

Summary

Reference