Compilation process

Process Graph

General Description

  • General compilation (steps 1-3 below)
    • translates each source code file into a corresponding machine code (object) file
    • leaves undefined functions/symbols to be filled in by linker
  • Link
    • links the object code with the library code to produce an executable file

Specific Description

  1. Preprocessing -E
  • Removes comments, expands macros, and expands the included files.
  • Lines in the code that begin with the “#” character are preprocessor directives.
  2. Compilation -S
  • Translates the preprocessed code into assembly instructions specific to the target processor architecture.
  3. Assembly -c
  • Translates the assembly instructions into object code. The output consists of the actual instructions to be run by the target processor.
  • Leaves the addresses of external functions undefined, to be filled in later by the linker.
  • The contents of the output file are in a binary format and can be inspected using hexdump or od.
  4. Linking (no stage flag of its own; -o only names the output file)
  • Fills in the addresses of all external functions that are called with the addresses of their actual definitions.
  • Combines object files and libraries into a single executable file that can be run.
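
As a rough illustration of the stages above, the sketch below is a small C file that touches each one. The names GREETING, SQUARE, square, text_length, and greeting are made up for this example. Stopping gcc with -E should show the macros already expanded, and stopping with -c should leave strlen as an undefined symbol in the object file (nm marks such symbols with U), assuming the compiler does not inline the call.

```c
#include <string.h>            /* directive: the preprocessor pastes in the header text */

#define GREETING "hello"       /* object-like macro, replaced textually */
#define SQUARE(x) ((x) * (x))  /* function-like macro, expanded at each use */

/* After preprocessing, the body reads: return ((n) * (n)); */
int square(int n)
{
    return SQUARE(n);
}

/* strlen is only declared here (through <string.h>). Its definition lives
   in the C library, so the object file refers to it as an external symbol
   whose address the linker fills in later. */
size_t text_length(const char *s)
{
    return strlen(s);
}

/* After preprocessing, the body reads: return "hello"; */
const char *greeting(void)
{
    return GREETING;
}
```

The file has no main on purpose: it can still be preprocessed, compiled, and assembled, but linking it alone into an executable would fail until some other object file provides main.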

Executable file: Can be loaded (copied) into memory and executed

Linking types

static linking

when

at build time, in the link step that follows compilation. The static linker has two major tasks:

  • Symbol resolution: associates each symbol reference with exactly one symbol definition. Each symbol names a function or a variable.
  • Relocation: relocates the code and data sections and modifies symbol references so that they point to the relocated memory locations.
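
The two tasks can be sketched with a toy model. This is a simplification for illustration only, not how a real linker is implemented. The types symbol and reloc, the functions resolve and link_image, and the idea of an image made of word-sized slots are all assumptions of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* A symbol definition: a name plus the address it received once the
   sections were laid out. */
struct symbol { const char *name; unsigned address; };

/* A relocation entry: "slot `offset` of the image must hold the address
   of the symbol called `name`". */
struct reloc { size_t offset; const char *name; };

/* Symbol resolution: find exactly one definition for a reference.
   Returns NULL when the symbol is undefined or defined more than once. */
static const struct symbol *resolve(const struct symbol *defs, size_t ndefs,
                                    const char *name)
{
    const struct symbol *found = NULL;
    for (size_t i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].name, name) == 0) {
            if (found != NULL)
                return NULL;          /* duplicate definition */
            found = &defs[i];
        }
    }
    return found;
}

/* Relocation: patch every recorded slot with the resolved address.
   Returns 0 on success, -1 if any reference cannot be resolved. */
int link_image(unsigned *image, const struct reloc *relocs, size_t nrelocs,
               const struct symbol *defs, size_t ndefs)
{
    for (size_t i = 0; i < nrelocs; i++) {
        const struct symbol *sym = resolve(defs, ndefs, relocs[i].name);
        if (sym == NULL)
            return -1;
        image[relocs[i].offset] = sym->address;
    }
    return 0;
}
```

An unresolved name here plays the role of the familiar "undefined reference" link error. Real linkers also handle weak symbols, archives, and several relocation types, which this sketch leaves out.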

static lib

The linker copies the code the program uses from each static library (.a on Unix, .lib on Windows) into the executable image.

features

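  • pros:
    1. does not require the library to be present on the system when the program is run
    2. faster startup (no symbols to resolve at load time) and more portable (the executable is self-contained)
    3. fewer ways to fail at run time (no missing or mismatched library versions)
  • cons:
    1. larger executable file, and more memory use because each program carries its own copy of the library code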

dynamic linking

when

  • load time (the loader resolves the libraries when the program is loaded into memory, before it starts executing)
    • usually for fixed functionality (e.g. C run-time library)
  • run time (the program loads a dynamic library when it needs it)
    • pros:
      • more dynamic functionality, such as plugin loading through the LoadLibrary() API on Windows or dlopen() on POSIX systems;
      • lazy mode: libraries are loaded only when first needed, which speeds up program startup
    • cons: the program has to manage library loading/freeing and function lookup manually
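
A minimal sketch of run-time loading, using the POSIX dlopen/dlsym/dlclose calls rather than the Windows LoadLibrary() API mentioned above. The helper name call_from_library and the unary_fn type are made up for this example, and it assumes the looked-up function takes and returns a double.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed signature of the function we look up: double f(double). */
typedef double (*unary_fn)(double);

/* Loads lib_path, looks up fn_name, calls it with x, and stores the result
   in *out. Returns 0 on success, -1 if loading or lookup fails. */
int call_from_library(const char *lib_path, const char *fn_name,
                      double x, double *out)
{
    /* Manual loading: RTLD_LAZY defers function binding until first use. */
    void *handle = dlopen(lib_path, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Manual function lookup by name. The cast through void ** is the
       usual idiom for storing a dlsym result in a function pointer. */
    unary_fn fn;
    *(void **)(&fn) = dlsym(handle, fn_name);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = fn(x);

    /* Manual freeing: release the library once it is no longer needed. */
    dlclose(handle);
    return 0;
}
```

On a typical Linux system, call_from_library("libm.so.6", "cos", 0.0, &r) would be one way to try it; the library name is system-specific, and older glibc versions need the program linked with -ldl.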

dynamic (shared) lib

Multiple processes can load the same dynamic library (.so on Linux, .dll on Windows, .dylib on macOS). There is only one physical copy of the library code in system memory, and each process can map that code at whatever virtual address it likes.

features

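  • pros:
    1. smaller executable files and less memory use, because the library code is shared
    2. easy library updates: a library can be replaced without relinking the programs that use it
  • cons:
    1. more ways to fail at run time (e.g. a missing library or an incompatible version)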

ELF Format

  • loaded into memory
    • .text: program code
    • .rodata: read-only data (const variables, string literals)
    • .data: initialized global and static variables
    • .bss: uninitialized global and static variables
  • mainly of interest in relocatable ELF (object files)
    • .symtab: symbol table (it also remains in executables unless they are stripped)
    • .rel.text: relocation info for the .text section (addresses of instructions that will need to be modified in the executable)
    • .rel.data: relocation info for the .data section (addresses of pointer data that will need to be modified in the merged executable)
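
As a rough guide to how C source maps onto these sections, the sketch below marks where each definition typically ends up. All names are made up for this example, and the exact placement depends on the compiler, its flags, and the target; on x86-64 the relocation sections are usually named .rela.text and .rela.data.

```c
const char banner[] = "ready";  /* read-only data: typically .rodata */
int counter = 42;               /* initialized global: .data */
int hits;                       /* uninitialized global: typically .bss, zero at startup */
static int calls;               /* uninitialized static: .bss */
int *counter_ptr = &counter;    /* pointer data in .data: its value is an address, so the
                                   object file records a relocation entry for it */

/* The machine code of functions goes into .text. */
int bump(void)
{
    calls++;
    hits++;
    return ++counter;
}

int call_count(void)
{
    return calls;
}
```

Compiling this with -c and listing the section headers of the object file (for example with readelf -S) is one way to check the placement on a given system.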
