分布式系统 - 0 OS
Index
- Overview
- File system
- Process Management
- CPU Scheduling
- Memory Mangement
- I/O System
- Socket
- Ubuntu Linux Cmd
Overview
Simplicity: the most important characteristics.
File System
The most important role of the system
- Block and Fragments
- File Type
- i-number, i-list, i-node
- File structure
- Mount table
- Layout
- Free BSD File System API
Block and Fragments
Hardware disk sector is usually 512 bytes.
4.2BSD use two block sizes for one files, a large block and a smaller fragment. Size of block:fragment is 8:1. Min block size is 4K. Generally, 8K:1K and 4K:0.5K are seperately for big or small file environment.
File types:
- ordinary disk files
- dir
- special files
File existes independently of dir. Dir is use to map file name and file. Dir entry includes file name <-> i-number. In FreeBSD, length of file name is variable up to 255 bytes, so dir entry also include the entry length information.
But dir must appear as an entity in exactly one other dir except root, so form a strictly rooted tree.
i-number, i-list, i-node
i-number is index of i-list. i-list stored in a known part of device. i-list stores i-node.
Info saved in i-node includes: file owner's user id/group id, protection bits, physical disk addr (normal file: 13 device addr, 1-4 level 128 512byte block addressing (In FreeBSD, there are 15 device addr); special file: device name: device type + device number), file size, time of creation, last use and modification, number of links to the file, file type, etc.
File structure
Store file's device, i-number, read/write pointer. File structure table exites in system space, shared by processes. Per process, there is a process open files table in system space, key is the file descriptor (counter), stores file structure table index.
Mount table
Maintain mapping device name and i-number of the mount point and device numer of the mounted device file. Operated by mount(). In FreeBSD, a bit in i-node indicates the i-node has a file system mounted on it.
Misc
Each dir always has at least two entries. "." and "..".
Hard link is just same as the real file.
No link may exist between one file system and other mounted file system.
In FreeBSD, symbolic link infinite loop is prevented by counting the number of symbolic links encoutered during path-name search. Max value is 8.
set-user-ID bit: tenth protection bit of file.
create() truncate an existed file to zero length and open the file for writing.
No lock in file system. It is unnecessary and unsufficient.
Reading and writing on one file are sequential, i.e. next I/O shall continue from the seek position of last I/O.
when read return n = 0, file end reached.
lseek return the actual offset from the file beginning.
Layout
Each file system's first sector is the boot block, which possibly containing a primary bootstroop program, which may call a 2nd bootstap program residing in the next 7.5K.
Then ther is superblock. Superblock contains static parameters of the file system, e.g. size, block/fragment size, allocation parameters, free block list. In FreeBSD, for speed, superblock is kept in memory and sync to disk every 30s.
4.3BSD introduces cylinder group to allow allocation localization. Each cylinder group has same superblock for fault-tolerence, cylinder block including free data block and free inodes bit map and allocation statistics, inode array and data blocks.
i-node array immediately following superblock. It is also kept in memory for speed.
Free BSD File System API
creat, open, read, write, close, unlink, trunc, lseek, dup, dup2, fcntl, ioctl, stat, rename, chmod, chown, link, symlink,
dir has another set of system calls, mkdir, rmdir, cd, opendir, readdir, closedir, etc.
Process Management
- Image & Process
- API
- Signals
- Process Group (job)
- PCB
Image & Process
image is an execution environment, including memory image, register value, open files, current dir, etc. Process is the execution of image.
Sys data structure and user space consists the swappable process image.
API
fork() make two same processes with same copies of memory image and opened files list. Parent and child is differentiated by the return value: 0 is child, for parent, it is the process id of the child process.
pipe() can be passed to child with fork() for communicate among these processes. Pipe size is usually 4K bytes.
execute/execve() replaces process code and data segments with the executed file, but open files list, current dir, pipes are not changed.
wait() wait the first occurence of exit() of child processes.
In Free BSD, API includes: fork, execve(replace process memory space with a new program), exit, wait, wait3(allow parent collect performance statistics about the child), getuid, geteudi, getgid, getegid, getgroups. also include other Info Manipulation related API: getitimer, setitimer, gettimeofday, settimeofday, getpid, getgid, gethosname.
Signals
20 kinds. interrupt - SIGINT, quit - SIGQUIT (generate a core), SIGILL (illegel instruction), SIGSEGV (access illegal memory space).
SIGKILL can not be ignored or caught by a signal handler. Others can be.
No priority among signals. Multiple same signals shall trigger handler only one time.
Other signaling: SIGWINCH-window change size,
kill() can generate signal.
Process Group (job)
Child inherits process group propriety from parent. setpgrg() can change group.
Processes in a group can communicate by pipe. Signals can be sent to all member of a group, e.g. SIGTTIN, SIGTTOU, SIGCONT, SIGSTOP.
PCB
Process Structure
Process structure is refered when decide process swap in/out. includes pid, priority, pointers to parent and youngest living child, pointer to text strucutre, pointer to page tables, and pointers to other control blocks. Scheduler maintains the ready queue as a doubly linked list of process structure. It is always in memory.
Text Structure
Text structure incudes page table when swap out and pointer to the process list that sharing this text segment. It is always in memory.
Page tables map process virtual memory to physical memory.
System Data Structure
user structure or u structure is mapped read-only in process virtual address space, so it is readable by user process, but only writeable by kernel. When process is swapped in, the content is useful, which includes a copy of PCB which stores a lot of information, including process's general registers, stack pointer, program counter, page-table base registers. system call parameters and return values, associated uid and gid, signals, timer, quota structure, current dir, and open files table.
When a system call is made, process enters into system mode from user mode. In system mode, a kernel stack is used instead of user stack. Kernel stake follows u structure and compose the system data structure for the process.
User Space
User memory part of an image includes three segments, text, data, stack. Stack grows downward from the highest address. text is usually read-only, but debugger can write it for insertion of breakpoint.
fork() & vfork()
fork() copy the u structure. so keep the open files table, etc. info, but fork() creates a new page table, allocate new memory for data and stack segment and copy them from parent process.
vfork() just use parent's page table and avoids the copy, in this case, parent shall suspend and wait child call execve or terminate so that it don't change the memory that child needs. and generally child shall also call execve immediately to change its virtual addr space completely. Note child also shoudn't change memory during this interval.
CPU Scheduling
Larger priority number means lower priority.
Disk I/O and other key task has priority less than "pzero" and cannot be killed by signals. nice() can change user process priority.
Process aging is used.
round-robin reschedule every 0.1s and recompute priority every 1s.
Memory Mangement
Demand paging is used. Page size is selected to optimally align processor's L1 and L2 caches.
Page is locked during page fetching for avoding page replacement.
Second chance page replacement algorithm: sweep non kernel main memory map, check page table entry of the frame. If entry is invalid, add it to free list, otherwise made it invalid and relaimable, so have 2nd chance. BSD Tahoe use reference bit for this goal.
It is configurable that min data pages and max used memory of a process.
Kernel check the free frames threshold (1/4 of total memory) several time per second to see if need awaken pagedaemon process (process 2). (swapper is process 0 and init is process 1). pagedaemon can stop sweep when free frames threshold is reached again. It is also configurable to control it not use more than 10% of CPU.
Memory also affects process schedule. If load is high, process's free memory is low and average available memory is low, the process shall be swapped out.
Minfree pages must be kept for interrupt processing.
I/O System
I/O sys consists of buffer cache sys, general device driver, and specific HW driver.
Three kinds: block devices(disk/tape, 512bytes/block, using block buffer cache, which size is usually from 1K to 8K, need align the paging-cluster size, usually 1K bytes), character devices(terminal, memroy, etc. note terminal or like devices use C-lists buffer for character buffer, which is linked list with size smaller than block buffer cache, usually 28bytes), socket interface.
Buffer cache consists of many buffer headers, which includes corresponding memory pointer, device number and device block number.
Empty buffer's header are organized by three lists, LRU(Least Recently Used) list, AGE list(not used or invalid), EMPTY list(no memory associated).
When read/write, cache uses empty buffer from AGE list firstly. If AGE list is empty, sync a buffer in LRU list to disk and then use it. Write is also done periodically.
EMPTY list is used for buffer of multiple, smaller blocks in one 8K memory block.
Dir block are written synchronously for avoding chaos.
Raw device interface simply maintain a queue of pending transfer in disk driver. Each record include r/w indicator, main memory addr(usually in 512byte increments), device addr(usually a disk sector addr) and transfer size(in sectors).
For C-lists device, read generally use a raw queue and another canonical queue. Only at the end of line, the end-of-line character shall trigger the data conversion from raw queue to cannonical queue and then send to user process. Raw mode skip this conversion and make program react to every keystroke.
Socket
shutdown() can terminate one direction of communication.
select() can be used to multiplex r/w of several file and/or socket.
Ubuntu Linux Cmd
make program > & errs
dpkg -l
/etc/apt/sources.list
apt-cache search
apt-cache show
sudo apt-get updates
sudo apt-get install
quota -u ubuntu
tail
zgrep
du -h . | sort -nr | less
df -h
baobab
sudo /sbin/hdparm /dev/hdc | grep dma
/etc/hdparm.conf
sudo dhclient
sudo lsmod
sudo mount /dev/cdrom -t iso9660 -r /cdrom
sudo eject
sudo crontab -e
sudo mount -a
top
ps ax
sudo fdisk -l
free -ml
iwconfig
lshal
lspci
lsusb
lshw
sudo deluser a -remove-home
man -k
man -f
sudo updatedb
sudo locate stdio.h
/etc/protocols
/etc/rpc
/etc/services
touch
make -n
gcc -E
gcc -c
gcc -g
indent
Reference:
- "The UNIX TimeSharing System*, D. M. Ritchie and K. Thompson, 1974
- appendix of Operating Systems Concepts book by Silberschatz, Galvin, and Gagne
- Linux Unleashed, Third Edition, Tim Parker, ISBN: 0672313723
- The Official Ubuntu Book, By Benjamin Mako Hill, Jono Bacon, Corey Burger, Jonathan Jesse, Ivan Krstic, Prentice Hall, August 11, 2006, ISBN-10: 0-13-243594-2, ISBN-13: 978-0-13-243594-9

0 Comments:
发表评论
<< Home