KTU B.Tech S5 Lecture Notes Unix Shell Programming
IT 363 Unix Shell Programming
Introduction to Unix: Architecture of Unix
What is Unix?
The Unix operating system is a set of programs that act as a link between the computer and the user.
The set of programs that allocates the system resources and coordinates all the details of the computer’s internals is called the operating system or the kernel.
Users communicate with the kernel through a program known as the shell. The shell is a command line interpreter; it translates commands entered by the user and converts them into a language that is understood by the kernel.
- Unix was originally developed in 1969 at Bell Labs by a group of AT&T employees: Ken Thompson, Dennis Ritchie, Douglas McIlroy, and Joe Ossanna.
- Several Unix variants are available. Solaris, AIX, HP-UX, and BSD are a few examples. Linux is a freely available Unix-like system.
- Several people can use a Unix computer at the same time; hence Unix is called a multiuser system.
- A user can also run multiple programs at the same time; hence Unix is a multitasking environment.
Here is a basic block diagram of a Unix system −
All versions of Unix share the following four basic components −
- Kernel− The kernel is the heart of the operating system. It interacts with the hardware and handles most tasks, such as memory management, task scheduling, and file management.
- Shell− The shell is the utility that processes your requests. When you type in a command at your terminal, the shell interprets the command and calls the program that you want. The shell uses standard syntax for all commands. C Shell, Bourne Shell and Korn Shell are the most famous shells which are available with most of the Unix variants.
- Commands and Utilities− There are various commands and utilities that you can use in your day-to-day activities. cp, mv, cat, and grep are a few examples. There are over 250 standard commands, plus numerous others provided through third-party software. All the commands come with various options.
- Files and Directories− All the data of Unix is organized into files. All files are then organized into directories. These directories are further organized into a tree-like structure called the filesystem.
Features of Unix
Some key features of the Unix architecture concept are:
- Unix systems use a centralized operating system kernel, which manages system and process activities.
- All non-kernel software is organized into separate, kernel-managed processes.
- Unix systems are preemptively multitasking: multiple processes can run at the same time, or within small time slices and nearly at the same time, and any process can be interrupted and moved out of execution by the kernel. This is known as thread
- Files are stored on disk in a hierarchical file system, with a single top location throughout the system (root, or “/”), with both files and directories, subdirectories, sub-subdirectories, and so on below it.
- With few exceptions, devices and some types of communications between processes are managed and visible as files or pseudo-files within the file system hierarchy. This is known as everything is a file. However, Linus Torvalds states that this is inaccurate and may be better rephrased as “everything is a stream of bytes”.
The UNIX operating system supports the following features and capabilities:
- Multitasking and multiuser
- Programming interface
- Use of files as abstractions of devices and other objects
- Built-in networking (TCP/IP is standard)
- Persistent system service processes called “daemons”, managed by init or inetd
Introduction to the Unix file system
A file system is a logical collection of files on a partition or disk. A partition is a container for information and can span an entire hard drive if desired.
Your hard drive can have various partitions, each of which usually contains only one file system, such as one partition housing the / file system and another containing the /home file system.
One file system per partition allows for the logical maintenance and management of differing file systems.
Everything in Unix is considered to be a file, including physical devices such as DVD-ROMs, USB devices, and floppy drives.
Unix uses a hierarchical file system structure, much like an upside-down tree, with root (/) at the base of the file system and all other directories spreading from there.
A Unix filesystem is a collection of files and directories that has the following properties −
- It has a root directory (/) that contains other files and directories.
- Each file or directory is uniquely identified by its name, the directory in which it resides, and a unique identifier, typically called an inode.
- By convention, the root directory has an inode number of 2 and the lost+found directory has an inode number of 3. Inode numbers 0 and 1 are not used. File inode numbers can be seen by specifying the -i option to the ls command.
- It is self-contained. There are no dependencies between one filesystem and another.
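The inode property above can be checked from the shell. The following is a minimal sketch, assuming a scratch directory created with mktemp; the file names notes.txt and renamed.txt are illustrative only. It shows that renaming a file inside one file system changes the name but not the inode number.

```shell
# Sketch only: the temporary directory and file names are illustrative.
workdir=$(mktemp -d)
echo "hello" > "$workdir/notes.txt"

# ls -i prints "<inode number> <name>"; awk keeps only the number.
inode_before=$(ls -i "$workdir/notes.txt" | awk '{print $1}')
echo "notes.txt has inode $inode_before"

# Renaming within one file system changes the name, not the inode.
mv "$workdir/notes.txt" "$workdir/renamed.txt"
inode_after=$(ls -i "$workdir/renamed.txt" | awk '{print $1}')
echo "renamed.txt has inode $inode_after"

rm -rf "$workdir"
```

The actual numbers printed depend on the file system, so none are shown here.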
The directories have specific purposes and generally hold the same types of information for easily locating files. Following are the directories that exist on the major versions of Unix −
|S.No.||Directory||Description|
|1||/||This is the root directory, which should contain only the directories needed at the top level of the file structure|
|2||/bin||This is where the executable files are located. These files are available to all users|
|3||/dev||These are device drivers|
|4||/etc||Supervisor directory commands, configuration files, disk configuration files, valid user lists, groups, ethernet, hosts, where to send critical messages|
|5||/lib||Contains shared library files and sometimes other kernel-related files|
|6||/boot||Contains files for booting the system|
|7||/home||Contains the home directory for users and other accounts|
|8||/mnt||Used to mount other temporary file systems, such as cdrom and floppy for the CD-ROM drive and floppy diskette drive, respectively|
|9||/proc||Contains all processes marked as a file by process number or other information that is dynamic to the system|
|10||/tmp||Holds temporary files used between system boots|
|11||/usr||Used for miscellaneous purposes, and can be used by many users. Includes administrative commands, shared files, library files, and others|
|12||/var||Typically contains variable-length files such as log and print files and any other type of file that may contain a variable amount of data|
|13||/sbin||Contains binary (executable) files, usually for system administration. For example, the fdisk and ifconfig utilities|
|14||/kernel||Contains kernel files|
Navigating the File System
Now that you understand the basics of the file system, you can begin navigating to the files you need. The following commands are used to navigate the system −
|S.No.||Command||Description|
|1||cat filename||Displays the contents of a file|
|2||cd dirname||Moves you to the identified directory|
|3||cp file1 file2||Copies one file/directory to the specified location|
|4||file filename||Identifies the file type (binary, text, etc.)|
|5||find dir -name filename||Finds a file/directory|
|6||head filename||Shows the beginning of a file|
|7||less filename||Browses through a file from the end or the beginning|
|8||ls dirname||Shows the contents of the directory specified|
|9||mkdir dirname||Creates the specified directory|
|10||more filename||Browses through a file from the beginning to the end|
|11||mv file1 file2||Moves the location of, or renames a file/directory|
|12||pwd||Shows the current directory the user is in|
|13||rm filename||Removes a file|
|14||rmdir dirname||Removes a directory|
|15||tail filename||Shows the end of a file|
|16||touch filename||Creates a blank file or modifies an existing file or its attributes|
|17||whereis filename||Shows the location of a file|
|18||which filename||Shows the location of a file if it is in your PATH|
You can use the man pages to check the complete syntax of each command mentioned here.
The df Command
The first way to manage your partition space is with the df (disk free) command. The command df -k displays the disk space usage in kilobytes, as shown below −
$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vzfs 10485760 7836644 2649116 75% /
/devices 0 0 0 0% /devices
$
Some of the directories, such as /devices, show 0 in the kbytes, used, and avail columns as well as 0% for capacity. These are special (or virtual) file systems, and although they reside on the disk under /, by themselves they do not consume disk space.
The df -k output is generally the same on all Unix systems. Here’s what it usually includes −
|S.No.||Column||Description|
|1||Filesystem||The physical file system name|
|2||kbytes||Total kilobytes of space available on the storage medium|
|3||used||Total kilobytes of space used (by files)|
|4||avail||Total kilobytes available for use|
|5||capacity||Percentage of total space used by files|
|6||Mounted on||What the file system is mounted on|
You can use the -h (human readable) option to display the output in a format that shows the size in easier-to-understand notation.
The du Command
The du (disk usage) command shows the disk space used by the directories you specify.
This command is helpful if you want to determine how much space a particular directory is taking. The following command displays number of blocks consumed by each directory. A single block may take either 512 Bytes or 1 Kilo Byte depending on your system.
$ du /etc
10 /etc/cron.d
126 /etc/default
6 /etc/dfs
…
$
The -h option makes the output easier to comprehend −
$ du -h /etc
5k /etc/cron.d
63k /etc/default
3k /etc/dfs
…
$
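To try df and du without touching system directories, here is a minimal sketch that measures a scratch directory. The directory layout and the file name sample.log are assumptions made for illustration. The -s option (one summary line) and -k option (1 KB units) are standard du options.

```shell
# Sketch only: paths and sizes are illustrative.
workdir=$(mktemp -d)
mkdir "$workdir/logs"

# Write about 200 KB of data so the directory has a visible size.
dd if=/dev/urandom of="$workdir/logs/sample.log" bs=1024 count=200 2>/dev/null

# -s summarises the whole tree in one line; -k reports 1 KB units.
usage_kb=$(du -sk "$workdir" | awk '{print $1}')
echo "$workdir uses about ${usage_kb} KB"

# df on the same path reports the file system that holds it.
df -k "$workdir"

rm -rf "$workdir"
```

The reported size is normally a little larger than the data written, because du counts allocated blocks rather than bytes.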
Mounting the File System
A file system must be mounted in order to be usable by the system. To see what is currently mounted (available for use) on your system, use the following command −
$ mount
/dev/vzfs on / type reiserfs (rw,usrquota,grpquota)
proc on /proc type proc (rw,nodiratime)
devpts on /dev/pts type devpts (rw)
$
The /mnt directory, by the Unix convention, is where temporary mounts (such as CDROM drives, remote network drives, and floppy drives) are located. If you need to mount a file system, you can use the mount command with the following syntax −
mount -t file_system_type device_to_mount directory_to_mount_to
For example, if you want to mount a CD-ROM to the directory /mnt/cdrom, you can type −
$ mount -t iso9660 /dev/cdrom /mnt/cdrom
This assumes that your CD-ROM device is called /dev/cdrom and that you want to mount it to /mnt/cdrom. Refer to the mount man page for more specific information or type mount -h at the command line for help information.
After mounting, you can use the cd command to navigate the newly available file system through the mount point you just made.
Unmounting the File System
To unmount (remove) the file system from your system, use the umount command by identifying the mount point or device.
For example, to unmount cdrom, use the following command −
$ umount /dev/cdrom
The mount command enables you to access your file systems, but on most modern Unix systems, the automount function makes this process invisible to the user and requires no intervention.
User and Group Quotas
The user and group quotas provide the mechanisms by which the amount of space used by a single user or all users within a specific group can be limited to a value defined by the administrator.
Quotas operate around two limits that allow the user to take some action if the amount of space or number of disk blocks starts to exceed the administrator-defined limits −
- Soft Limit− If the user exceeds the limit defined, there is a grace period that allows the user to free up some space.
- Hard Limit− When the hard limit is reached, regardless of the grace period, no further files or blocks can be allocated.
There are a number of commands to administer quotas −
|S.No.||Command||Description|
|1||quota||Displays disk usage and limits for a user or group|
|2||edquota||This is a quota editor. User or group quotas can be edited using this command|
|3||quotacheck||Scans a filesystem for disk usage, creates, checks and repairs quota files|
|4||setquota||This is a command line quota editor|
|5||quotaon||This announces to the system that disk quotas should be enabled on one or more filesystems|
|6||quotaoff||This announces to the system that disk quotas should be disabled for one or more filesystems|
|7||repquota||This prints a summary of the disk usage and quotas for the specified file systems|
The Unix file system has a hierarchical (or tree-like) structure with its highest level directory called root (denoted by /, pronounced slash). … Similar to the concept of the process parent–child relationship, all files on a Unix system are related to one another. That is, files also have a parent–child existence.
File Types in Unix: Ordinary or Regular Files, Directories, Device (Special) Files, Links, Named Pipes, and Sockets
Ordinary or Regular Files
A large majority of the files found on UNIX and Linux systems are ordinary files. Ordinary files contain ASCII (human-readable) text, executable program binaries, program data, and more.
A directory is a binary file used to track and locate other files and directories. The binary format is used so that directories containing large numbers of filenames can be searched quickly.
Device (Special) Files
Device or special files are used for device I/O on UNIX and Linux systems. They appear in a file system just like an ordinary file or a directory.
On UNIX systems there are two flavors of special files for each device, character special files and block special files. Linux systems only provide one special file for each device.
When a character special file is used for device I/O, data is transferred one character at a time. This type of access is called raw device access.
When a block special file is used for device I/O, data is transferred in large fixed-size blocks. This type of access is called block device access.
A link is a tool used for having multiple filenames that reference a single file on a physical disk. They appear in a file system just like an ordinary file or a directory.
Like special files, links also come in two different flavors. There are hard links and symbolic links.
Hard links are not separate copies of the original file. Each hard link is an additional directory entry that refers to the same inode, so it shares the original file’s data and attributes (i.e. location on disk, file access permissions, etc.). If the original name is deleted, the data can still be accessed using the hard link.
On the other hand, symbolic links contain a pointer, or pathname, to the original file. If the original file is deleted, its data can no longer be accessed using the symbolic link, and the link is then considered to be a stale link.
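The difference between the two kinds of link shows up when the original name is deleted. The following is a minimal sketch, assuming a scratch directory; the names original.txt, hard.txt, and soft.txt are made up for the example.

```shell
# Sketch only: file names are illustrative.
workdir=$(mktemp -d)
echo "important data" > "$workdir/original.txt"

ln "$workdir/original.txt" "$workdir/hard.txt"   # second name for the same data
ln -s original.txt "$workdir/soft.txt"           # stores the path "original.txt"

# Delete the original name.
rm "$workdir/original.txt"

# The hard link still reaches the data.
hard_content=$(cat "$workdir/hard.txt")

# [ -e ] follows symbolic links, so it fails for a link whose target is gone.
if [ -e "$workdir/soft.txt" ]; then
    soft_state="valid"
else
    soft_state="stale"
fi

echo "hard link still reads: $hard_content"
echo "symbolic link is now: $soft_state"

rm -rf "$workdir"
```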
Named pipes are tools that allow two or more system processes to communicate with each other using a file that acts as a pipe between them. This type of communication is known as interprocess communication, or IPC for short.
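As a small illustration of a named pipe, the sketch below creates one with mkfifo and passes a single line from a background writer to a reader in the same script. The pipe name channel and the message text are assumptions for the example; in practice the writer and reader are usually separate programs.

```shell
# Sketch only: the pipe name and message are illustrative.
workdir=$(mktemp -d)
mkfifo "$workdir/channel"

# Writer runs in the background; opening a FIFO blocks until a reader arrives.
echo "message via pipe" > "$workdir/channel" &

# Reader: take one line from the pipe.
read -r received < "$workdir/channel"
wait    # let the background writer finish

echo "reader got: $received"
rm -rf "$workdir"
```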
Sockets are also tools used for interprocess communication. The difference between sockets and pipes is that sockets will facilitate communication between processes running on different systems, or over the network.
With so many different types of files, it’s often wise to identify a file’s type before performing any operation with it. The ls -l command and the file command are useful for determining file types.
Consider the long listing of the livefirelabs1 file:
-rw-rw-r-- 1 student1 student1 0 Jun 27 18:55 livefirelabs1
The first character of the first field indicates the file type. In this example, the first character is a - (hyphen) indicating that livefirelabs1 is an ordinary or regular file.
Consider the long listing of the live1 file:
lrwxrwxrwx 1 student1 student1 13 Jun 27 17:57 live1 -> livefirelabs1
The first character of the first field is the letter l indicating live1 is a symbolic link.
The following is a table listing what characters represent what types of files:
- Ordinary or Regular File
d Directory
c Character special file
b Block special file
l Symbolic link
p Named pipe
s Socket
The file command is also helpful for determining file types. The syntax for this command is:
$ file filename
If the file is an ordinary file, the file command will also indicate what the file contains.
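Inside a script, the same check can be made with the shell's test operators instead of reading ls -l output. The function describe_type below is a hypothetical helper written for this sketch, not a standard command. It tests for a symbolic link first because the other operators follow links.

```shell
# describe_type is a hypothetical helper, not a standard utility.
describe_type() {
    if   [ -L "$1" ]; then echo "symbolic link"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -p "$1" ]; then echo "named pipe"
    elif [ -b "$1" ]; then echo "block special file"
    elif [ -c "$1" ]; then echo "character special file"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -f "$1" ]; then echo "ordinary file"
    else echo "unknown or missing"
    fi
}

# Recreate the two files from the listings in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/livefirelabs1"
ln -s livefirelabs1 "$workdir/live1"

type_regular=$(describe_type "$workdir/livefirelabs1")
type_link=$(describe_type "$workdir/live1")
type_dir=$(describe_type "$workdir")
type_char=$(describe_type /dev/null)

echo "livefirelabs1: $type_regular"
echo "live1: $type_link"
echo "workdir: $type_dir"
echo "/dev/null: $type_char"

rm -rf "$workdir"
```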
Unix: Operations on files
Copying and moving files
The exercises in this chapter make use of a file named “unixpast.txt”. Download a copy of this file and make sure you save it in your exercises directory.
Copy file (cp)
The command cp file1 file2 makes a copy of file1 and writes it to file2.
$ cp unixpast.txt unixpast.copy
Move file or directory (mv)
The command mv file1 file2 moves (or renames) file1 to file2.
To move a file from one place to another, use the mv command. This has the effect of moving rather than copying the file.
It can also be used to rename a file, by moving the file to the same directory, but giving it a different name.
We are now going to move the file unixpast.copy to your backup directory.
First, change working directory to your exercises directory. Then, inside the exercises directory, type
$ mv unixpast.copy repository
Type ls and ls repository to see if it has worked.
Create a backup of your unixpast.txt file by copying it to a file called unixpast.copy
Removing files and directories
rm (remove), rmdir (remove directory)
To delete (remove) a file, use the rm command. As an example, we are going to create a copy of the unixpast.txt file and then delete it.
Inside your exercises directory, type the following sequence of commands:
$ cp unixpast.txt tempfile.txt
$ ls
$ rm tempfile.txt
$ ls
You can use the rmdir command to remove a directory (make sure it is empty first). Try to remove the repository directory. You will not be able to since Unix will not let you remove a non-empty directory.
Create a directory called emphemera in your exercises directory, using the command mkdir. Check that it exists. Then remove it using the command rmdir. Check that it is gone
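The behaviour of rmdir on a non-empty directory can be observed safely in a scratch area. This is a minimal sketch; the directory and file names mirror the exercise but are created under a temporary directory so nothing in your exercises directory is touched.

```shell
# Sketch only: everything is created under a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/repository"
touch "$workdir/repository/unixpast.copy"

# rmdir refuses to remove a directory that still contains files.
if rmdir "$workdir/repository" 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# Empty the directory, then try again.
rm "$workdir/repository/unixpast.copy"
if rmdir "$workdir/repository" 2>/dev/null; then
    second_try="removed"
else
    second_try="refused"
fi

echo "non-empty directory: $first_try"
echo "empty directory: $second_try"
rm -rf "$workdir"
```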
Displaying the contents of a file on the screen
The command cat can be used to display the contents of a text file on the screen. Type:
$ cat unixpast.txt
As you can see, the file is longer than the size of the window, so it scrolls past, making it unreadable.
The command less writes the contents of a file onto the screen a page at a time. Type
$ less unixpast.txt
Press [space] if you want to see another page, and q if you want to quit reading. As you can see, less is more convenient than cat for viewing long files.
Using less, you can search through a text file for a keyword (pattern). For example, to search through unixpast.txt for the word “open”, type:
$ less unixpast.txt
Then, still in less, type / (slash) followed by the word to search for, here /open:
As you can see, less finds and highlights the keyword. Type n to search for the next occurrence of the word.
The head command writes the first ten lines of a file to the screen.
First clear the screen then type:
$ head unixpast.txt
$ head -5 unixpast.txt
What difference did the -5 make to the head command?
The tail command writes the last ten lines of a file to the screen.
Clear the screen and type
$ tail unixpast.txt
How can you view the last 15 lines of the file?
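If you do not have unixpast.txt to hand, head and tail can be tried on a generated file. The sketch below builds a 20-line sample file (an assumption for the example) and uses the -n option, which is the standard way to give a line count; the older form head -5 shown above is also accepted by most implementations.

```shell
# Sketch only: sample.txt is a generated stand-in for unixpast.txt.
workdir=$(mktemp -d)

# Build a 20-line file: "line 1" ... "line 20".
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$workdir/sample.txt"

# First three lines, then keep only the last of those.
third_line=$(head -n 3 "$workdir/sample.txt" | tail -n 1)

# Last line of the file.
last_line=$(tail -n 1 "$workdir/sample.txt")

echo "third line: $third_line"
echo "last line: $last_line"
rm -rf "$workdir"
```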
File and directory conventions
Unix allows any character except slash (/) and NULL (\000) to be part of a file or directory name. The maximum length for a Unix file name is 255 bytes.
However, it is not a good idea to have unprintable characters in a file or directory name. Some other characters that have special meanings, such as / : * ? & $, should be avoided unless you enjoy quoting them. Also, whitespace within names is a pain on the command line. The safest way to name a Unix file is to use only ASCII alphanumeric characters (letters and numbers), together with - (hyphen), _ (underscore) and . (dot).
|Bad filenames||Good filenames||Why bad?|
|notes-17/10/2012.txt||notes-2012-10-17.txt||contains special characters|
|load page.php||load_page.php||contains space|
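The naming advice can be turned into a quick check. The function is_safe_name below is a hypothetical helper for this sketch. It accepts a name only if every character is an ASCII letter, a digit, a dot, an underscore, or a hyphen, which is stricter than what Unix itself requires.

```shell
# is_safe_name is a hypothetical helper, not a standard utility.
# It succeeds (exit status 0) only for names made of letters, digits, ".", "_" and "-".
is_safe_name() {
    case "$1" in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;   # empty, or contains some other character
        *) return 0 ;;
    esac
}

for name in "notes-2012-10-17.txt" "notes-17/10/2012.txt" "load_page.php" "load page.php"; do
    if is_safe_name "$name"; then
        echo "good: $name"
    else
        echo "bad:  $name"
    fi
done
```

The two names from the "Bad filenames" column are rejected because of the slashes and the space.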
A valid file name usually starts with an ASCII letter, digit, dot, or underscore, and may end with a dot followed by a group of letters indicating the type of the file. Files consisting of PHP code may for instance be named with the ending .php (e.g. index.php). Unix does not normally care what file types you use (some other programs may care), but following this convention lets you use simple commands such as ls *.php to list all PHP files in the current directory.
Directories and executable files (commands) are usually without a file type extension.
In Unix, a directory is also a file. So the rules and conventions for files apply also to directories.
File ownership is an important component of Unix that provides a secure method for storing files. Every file in Unix has the following attributes −
- Owner permissions− The owner’s permissions determine what actions the owner of the file can perform on the file.
- Group permissions− The group’s permissions determine what actions a user, who is a member of the group that a file belongs to, can perform on the file.
- Other (world) permissions− The permissions for others indicate what action all other users can perform on the file.
The Permission Indicators
While using ls -l command, it displays various information related to file permission as follows −
$ ls -l /home/amrood
-rwxr-xr-- 1 amrood users 1024 Nov 2 00:10 myfile
drwxr-xr-- 1 amrood users 1024 Nov 2 00:10 mydir
Here, the first column represents different access modes, i.e., the permission associated with a file or a directory.
The permissions are broken into groups of threes, and each position in the group denotes a specific permission, in this order: read (r), write (w), execute (x) −
- The first three characters (2-4) represent the permissions for the file’s owner. For example, -rwxr-xr-- represents that the owner has read (r), write (w) and execute (x) permission.
- The second group of three characters (5-7) consists of the permissions for the group to which the file belongs. For example, -rwxr-xr-- represents that the group has read (r) and execute (x) permission, but no write permission.
- The last group of three characters (8-10) represents the permissions for everyone else. For example, -rwxr-xr-- represents that there is read (r) only permission.
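The character positions listed above can be pulled apart with cut. This is a minimal sketch on a scratch file; the file name myfile is reused from the listing, and chmod is used here only to put the file into the -rwxr-xr-- state.

```shell
# Sketch only: a scratch copy of "myfile" is created for the demonstration.
workdir=$(mktemp -d)
touch "$workdir/myfile"
chmod u=rwx,g=rx,o=r "$workdir/myfile"    # gives -rwxr-xr--

# The first ten characters of ls -l are the type plus the nine permission bits.
mode=$(ls -ld "$workdir/myfile" | cut -c1-10)

file_type=$(printf '%s' "$mode" | cut -c1)       # character 1
owner_perms=$(printf '%s' "$mode" | cut -c2-4)   # characters 2-4
group_perms=$(printf '%s' "$mode" | cut -c5-7)   # characters 5-7
other_perms=$(printf '%s' "$mode" | cut -c8-10)  # characters 8-10

echo "type=$file_type owner=$owner_perms group=$group_perms other=$other_perms"
rm -rf "$workdir"
```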
File Access Modes
The permissions of a file are the first line of defense in the security of a Unix system. The basic building blocks of Unix permissions are the read, write, and execute permissions, which have been described below −
- Read − Grants the capability to read, i.e., view the contents of the file.
- Write − Grants the capability to modify, or remove the content of the file.
- Execute − A user with execute permission can run a file as a program.
Directory Access Modes
Directory access modes are listed and organized in the same manner as any other file. There are a few differences that need to be mentioned −
- Read − Access to a directory means that the user can read the contents. The user can look at the filenames inside the directory.
- Write − Access means that the user can add or delete files from the directory.
- Execute − Executing a directory doesn’t really make sense, so think of this as a traverse permission. A user must have execute access to the bin directory in order to execute the ls command, and execute access to a directory in order to cd into it.
To change the file or the directory permissions, you use the chmod (change mode) command. There are two ways to use chmod — the symbolic mode and the absolute mode.
Using chmod in Symbolic Mode
The easiest way for a beginner to modify file or directory permissions is to use the symbolic mode. With symbolic permissions you can add, delete, or specify the permission set you want by using the operators in the following table.
|S.No.||Chmod operator||Description|
|1||+||Adds the designated permission(s) to a file or directory.|
|2||-||Removes the designated permission(s) from a file or directory.|
|3||=||Sets the designated permission(s).|
Here’s an example using testfile. Running ls -l on the testfile shows that the file’s permissions are as follows −
$ ls -l testfile
-rwxrwxr-- 1 amrood users 1024 Nov 2 00:10 testfile
Then each example chmod command from the preceding table is run on the testfile, followed by ls -l, so you can see the permission changes −
$ chmod o+wx testfile
$ ls -l testfile
-rwxrwxrwx 1 amrood users 1024 Nov 2 00:10 testfile
$ chmod u-x testfile
$ ls -l testfile
-rw-rwxrwx 1 amrood users 1024 Nov 2 00:10 testfile
$ chmod g=rx testfile
$ ls -l testfile
-rw-r-xrwx 1 amrood users 1024 Nov 2 00:10 testfile
Here’s how you can combine these commands on a single line −
$ chmod o+wx,u-x,g=rx testfile
$ ls -l testfile
-rw-r-xrwx 1 amrood users 1024 Nov 2 00:10 testfile
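The combined command can be verified on a scratch file. This is a minimal sketch; the helper show_mode is hypothetical and simply trims ls -l output to the ten mode characters. Note that the operator and permissions are written without spaces (g=rx), since chmod treats each space-separated word as a separate argument.

```shell
# Sketch only: testfile is created in a temporary directory.
workdir=$(mktemp -d)
touch "$workdir/testfile"

# show_mode is a hypothetical helper: print only the ten mode characters.
show_mode() {
    ls -ld "$1" | cut -c1-10
}

chmod u=rwx,g=rwx,o=r "$workdir/testfile"      # starting point
start_mode=$(show_mode "$workdir/testfile")

chmod o+wx,u-x,g=rx "$workdir/testfile"        # three changes in one command
final_mode=$(show_mode "$workdir/testfile")

echo "before: $start_mode"
echo "after:  $final_mode"
rm -rf "$workdir"
```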
Using chmod with Absolute Permissions
The second way to modify permissions with the chmod command is to use a number to specify each set of permissions for the file.
Each permission is assigned a value, as the following table shows, and the total of each set of permissions provides a number for that set.
|Number||Octal Permission Representation||Ref|
|0||No permission||---|
|1||Execute permission||--x|
|2||Write permission||-w-|
|3||Execute and write permission: 1 (execute) + 2 (write) = 3||-wx|
|4||Read permission||r--|
|5||Read and execute permission: 4 (read) + 1 (execute) = 5||r-x|
|6||Read and write permission: 4 (read) + 2 (write) = 6||rw-|
|7||All permissions: 4 (read) + 2 (write) + 1 (execute) = 7||rwx|
Here’s an example using the testfile. Running ls -l on the testfile shows that the file’s permissions are as follows −
$ ls -l testfile
-rwxrwxr-- 1 amrood users 1024 Nov 2 00:10 testfile
Then each example chmod command from the preceding table is run on the testfile, followed by ls -l, so you can see the permission changes −
$ chmod 755 testfile
$ ls -l testfile
-rwxr-xr-x 1 amrood users 1024 Nov 2 00:10 testfile
$ chmod 743 testfile
$ ls -l testfile
-rwxr---wx 1 amrood users 1024 Nov 2 00:10 testfile
$ chmod 043 testfile
$ ls -l testfile
----r---wx 1 amrood users 1024 Nov 2 00:10 testfile
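The arithmetic behind absolute mode (read = 4, write = 2, execute = 1, summed per group) can be sketched as a pair of shell functions. Both triplet_to_digit and perms_to_octal are hypothetical helpers written for illustration; they take a nine-character permission string without the leading file-type character.

```shell
# Hypothetical helpers: convert "rwxr-xr-x" style strings to octal digits.
triplet_to_digit() {
    digit=0
    case "$1" in r??) digit=$((digit + 4)) ;; esac   # read
    case "$1" in ?w?) digit=$((digit + 2)) ;; esac   # write
    case "$1" in ??x) digit=$((digit + 1)) ;; esac   # execute
    echo "$digit"
}

perms_to_octal() {
    owner=$(printf '%s' "$1" | cut -c1-3)
    group=$(printf '%s' "$1" | cut -c4-6)
    other=$(printf '%s' "$1" | cut -c7-9)
    echo "$(triplet_to_digit "$owner")$(triplet_to_digit "$group")$(triplet_to_digit "$other")"
}

octal_a=$(perms_to_octal "rwxr-xr-x")
octal_b=$(perms_to_octal "rwxr---wx")
octal_c=$(perms_to_octal "---r---wx")
echo "rwxr-xr-x -> $octal_a"
echo "rwxr---wx -> $octal_b"
echo "---r---wx -> $octal_c"
```

The three strings are the permission sets produced by the chmod 755, 743, and 043 examples.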
Changing Owners and Groups
When an account is created on Unix, the system assigns an owner ID and a group ID to the user. All the permissions mentioned above are also assigned based on the owner and the groups.
Two commands are available to change the owner and the group of files −
- chown− The chown command stands for “change owner” and is used to change the owner of a file.
- chgrp− The chgrp command stands for “change group” and is used to change the group of a file.
The chown command changes the ownership of a file. The basic syntax is as follows −
$ chown user filelist
The value of the user can be either the name of a user on the system or the user id (uid) of a user on the system.
The following example will help you understand the concept −
$ chown amrood testfile
$
Changes the owner of the given file to the user amrood.
NOTE − The super user, root, has the unrestricted capability to change the ownership of any file but normal users can change the ownership of only those files that they own.
Changing Group Ownership
The chgrp command changes the group ownership of a file. The basic syntax is as follows −
$ chgrp group filelist
The value of group can be the name of a group on the system or the group ID (GID) of a group on the system.
Following example helps you understand the concept −
$ chgrp special testfile
$
Changes the group of the given file to special group.
SUID and SGID File Permission
Often when a command is executed, it will have to be executed with special privileges in order to accomplish its task.
As an example, when you change your password with the passwd command, your new password is stored in the file /etc/shadow.
As a regular user, you do not have read or write access to this file for security reasons, but when you change your password, you need to have the write permission to this file. This means that the passwd program has to give you additional permissions so that you can write to the file /etc/shadow.
Additional permissions are given to programs via a mechanism known as the Set User ID (SUID) and Set Group ID (SGID) bits.
When you execute a program that has the SUID bit enabled, you inherit the permissions of that program’s owner. Programs that do not have the SUID bit set are run with the permissions of the user who started the program.
This is the case with SGID as well. Normally, programs execute with your group permissions, but instead your group will be changed just for this program to the group owner of the program.
The SUID and SGID bits will appear as the letter “s” if the permission is available. The SUID “s” bit will be located in the permission bits where the owners’ execute permission normally resides.
For example, the command −
$ ls -l /usr/bin/passwd
-r-sr-xr-x 1 root bin 19031 Feb 7 13:47 /usr/bin/passwd*
$
Shows that the SUID bit is set and that the command is owned by the root. A capital letter S in the execute position instead of a lowercase s indicates that the execute bit is not set.
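The lowercase s versus capital S distinction can be reproduced on a harmless scratch file. This is a minimal sketch; the file name tool is an assumption, and the file is empty, so setting the SUID bit on it grants nothing.

```shell
# Sketch only: "tool" is an empty scratch file, not a real program.
workdir=$(mktemp -d)
touch "$workdir/tool"
chmod 755 "$workdir/tool"

# Set the SUID bit while the owner's execute bit is on: shown as lowercase s.
chmod u+s "$workdir/tool"
mode_with_x=$(ls -ld "$workdir/tool" | cut -c1-10)

# Remove the owner's execute bit: the same position now shows capital S.
chmod u-x "$workdir/tool"
mode_without_x=$(ls -ld "$workdir/tool" | cut -c1-10)

# The -u test operator reports whether the SUID bit is set.
if [ -u "$workdir/tool" ]; then suid_state="set"; else suid_state="not set"; fi

echo "with execute:    $mode_with_x"
echo "without execute: $mode_without_x"
echo "SUID bit is $suid_state"
rm -rf "$workdir"
```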
If the sticky bit is enabled on the directory, files can only be removed if you are one of the following users −
- The owner of the sticky directory
- The owner of the file being removed
- The super user, root
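A sticky directory can be recognised by the letter t in the last position of its permissions. The sketch below sets the bit on a scratch directory using absolute mode, where a leading 1 in front of the usual three digits selects the sticky bit. The directory name shared is an assumption for the example.

```shell
# Sketch only: "shared" is a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/shared"

# 1777: the leading 1 is the sticky bit, 777 gives rwx to everyone.
chmod 1777 "$workdir/shared"

shared_mode=$(ls -ld "$workdir/shared" | cut -c1-10)
echo "shared directory mode: $shared_mode"

rm -rf "$workdir"
```

This is the same mode that /tmp carries on most systems, which is why users can create files there but cannot remove each other's files.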
To set the SUID and SGID bits on a directory, try the following command −
$ chmod ug+s dirname
$ ls -l
drwsr-sr-x 2 root root 4096 Jun 19 06:45 dirname
$
Introduction: Files and Inodes
In Unix/Linux, a file is a sequence of bytes without structure. Any necessary structure (e.g. for a database) is added by the programs that manipulate the data in the file. Linux itself doesn’t know about the internal structure of a database file – all it does is return bytes.
1.1 Even hardware devices have file names
Unix/Linux tries its best to treat every device attached to it as if it were a list of bytes. Therefore, everything, including network cards, hard drives, partitions, keyboards, printers, and plain files are treated as file-like objects and each has a name in the file system.
- Your computer memory is /dev/mem.
- Your first hard disk is /dev/sda.
- A terminal (keyboard and screen) is /dev/tty1.
$ ls -li /dev/mem /dev/sda /dev/tty1
5792 crw-r----- 1 root kmem 1, 1 Oct 13 02:30 /dev/mem
 888 brw-rw---- 1 root disk 8, 0 Oct 13 02:30 /dev/sda
5808 crw-rw---- 1 root tty 4, 1 Oct 13 02:31 /dev/tty1
Most input and output devices and directories are treated as files in Linux. If you have sufficient permissions, you can directly read all these devices using their file system names. Recent versions of Unix/Linux have evolved directories into non-readable (non-file) objects.
1.2 Index Nodes = Inodes
As with most things computer-related, things in the file system are not stored by name, they are stored by number. Linux stores the data and information about each disk object (e.g. a file or a directory) in a numbered data structure called an “index node” or inode.
Each inode is identified by a unique inode number that can be shown using the -i option to the ls command:
$ ls -l -i /usr/bin/perl*
266327 -rwxr-xr-x 2 root root 10376 Mar 18 2013 /usr/bin/perl
266327 -rwxr-xr-x 2 root root 10376 Mar 18 2013 /usr/bin/perl5.14.2
266331 -rwxr-xr-x 2 root root 45183 Mar 18 2013 /usr/bin/perlbug
266328 -rwxr-xr-x 1 root root 224 Mar 18 2013 /usr/bin/perldoc
266329 -rwxr-xr-x 1 root root 125 Mar 18 2013 /usr/bin/perldoc.stub
266330 -rwxr-xr-x 1 root root 12318 Mar 18 2013 /usr/bin/perlivp
266331 -rwxr-xr-x 2 root root 45183 Mar 18 2013 /usr/bin/perlthanks
The program /usr/bin/perl, above, is not stored on disk with its name perl; it is stored somewhere else, under inode number 266327. Unix/Linux directories are what map file system names (e.g. perl) to inode numbers (e.g. 266327). In the example above, you can see that file /usr/bin/perl is really inode number 266327 (and that another name perl5.14.2 leads to the same inode!). When you access the perl program, the system finds the perl name in a directory, paired with the inode number 266327 that holds the actual data, and then the system has to go elsewhere on disk to that inode number to access the data for the perl program. File data is stored under inode numbers, not under names.
Every file has its name entered in a directory and is assigned a unique inode number. Each file name can be mapped to only one single inode number, but one inode number may have many names (as is the case with perl, above).
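The "one inode, many names" situation is easy to create yourself with ln. This is a minimal sketch using scratch files; the names data.txt and alias.txt are made up. The second field of ls -l is the link count, that is, the number of names that lead to the inode.

```shell
# Sketch only: data.txt and alias.txt are illustrative names.
workdir=$(mktemp -d)
echo "some data" > "$workdir/data.txt"

# Link count with a single name.
links_before=$(ls -ld "$workdir/data.txt" | awk '{print $2}')

# Give the same inode a second name.
ln "$workdir/data.txt" "$workdir/alias.txt"

inode_data=$(ls -i "$workdir/data.txt" | awk '{print $1}')
inode_alias=$(ls -i "$workdir/alias.txt" | awk '{print $1}')
links_after=$(ls -ld "$workdir/data.txt" | awk '{print $2}')

echo "data.txt  -> inode $inode_data"
echo "alias.txt -> inode $inode_alias"
echo "link count went from $links_before to $links_after"
rm -rf "$workdir"
```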
Inode numbers are specific to a file system inside a disk partition. Every file on a file system (in that partition) has a unique inode number. Numbering is done separately for each file system, so different disk partitions may have file system objects with the same inode numbers.
Every Linux file system is created new with a large set of available inodes. You can list the free inodes using df -i. Older types of file systems can never make more inodes, even if there is lots of disk space available; when all the inodes are used up, the file system can create no more files until some files are deleted to free some inodes.
2 File System Diagrams
Most diagrams showing file systems and links in Unix texts are wrong and range from confusing to seriously misleading. Here’s the truth, complete with an ASCII-art file system diagram below.
Names for inodes (names for files, directories, devices, etc.) are stored on disk in directories. Only the names and the associated inode numbers are stored in the directory; the actual disk space for whatever data is being named is stored in the inode, not in the directory. The names and numbers are kept in the directory; the names are not kept with the data.
In the directory, beside each name, is the index number (inode number) indicating where to find the disk space used to actually store the thing being named. You can see this name-inode pairing using ls -i:
$ ls -i /usr/bin/perl*
266327 /usr/bin/perl         266329 /usr/bin/perldoc.stub
266327 /usr/bin/perl5.14.2   266330 /usr/bin/perlivp
266331 /usr/bin/perlbug      266331 /usr/bin/perlthanks
266328 /usr/bin/perldoc
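The same name-inode pairing can be observed on any system without relying on the perl files above. This is a minimal sketch that builds a scratch directory; the names demo_dir, alpha, and beta are made up for illustration, and the inode numbers printed will differ from system to system.

```shell
# Create a scratch directory holding two empty files (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/alpha" "$demo_dir/beta"

# ls -i prints each name preceded by the inode number paired with it.
pairs=$(ls -i "$demo_dir")
printf '%s\n' "$pairs"

# Remove the scratch directory again.
rm -rf "$demo_dir"
```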
The crucial thing to know is that the names and the actual storage for the things being named are in separate places. Most texts make the error of writing Unix file system diagrams that put the names right on the things that are being named. That is misleading and the cause of many misunderstandings about Unix/Linux files and directories. Names exist one level above (separate from) the items that they name:
WRONG - names on things        RIGHT - names above things
=======================        ==========================

      R O O T           --->        [etc,bin,home]    <-- ROOT directory
     /   |   \                      /     |      \
  etc   bin   home      ---> [passwd]  [ls,rm]   [abcd0001]
   |   /   \     \               |       /   \        |
   |  ls   rm  abcd0001 --->     |   <data> <data> [.bashrc]
   |              |              |                    |
 passwd        .bashrc  --->   <data>              <data>
Directories are lists of names and numbers, as shown by the square-bracketed lists in the diagram on the right, above. (The actual inode numbers are omitted from this small diagram.) The name of each thing (file, directory, special file, etc.) is kept in a directory, separate from the storage space for the thing it names. This allows inodes to have multiple names and names in multiple directories; all the names can refer to the same storage space by using the same inode number.
In the correct diagram on the right, the directories give names to the objects below them in the tree. The top directory on the right is the ROOT directory inode, containing the list of names etc, bin, and home (and others). Because there is no name level above the ROOT directory to give it a name, the ROOT directory has no name!
The line leading downwards from the name bin in the ROOT directory indicates that the name bin is paired with an inode number that is another directory inode containing the list of names in the bin directory, including names ls and rm (and others). The line leading down from ls in the bin directory inode leads to the data inode for the file /bin/ls. There is no name kept with the data inode – the name is up in the directory above it.
The ROOT inode has no name because there is no directory above it to give it one! Every other directory has a name because there is a directory inode above it that contains its name.
3 Inodes manage disk blocks
The actual data for each Unix file or directory stored on disk is managed by numbered on-disk data structures called “inodes” (index nodes). One inode is allocated for each file and each directory. Unix inodes have unique numbers, not names, and it is these numbers that are kept in directories alongside the names. The -i option to ls shows these inode numbers.
A Unix inode manages the disk storage space for a file or a directory. The inode contains a list of pointers to the disk blocks that belong to that file or directory. The larger the file or directory, the more disk block pointers it needs in the inode. Also stored in the inode are the attributes of the file or directory (permissions, owner, group, size, access/modify times, etc.); but, not the name of the file or directory. Inodes have only numbers, attributes, and disk blocks – not names. The names are kept separately, in directories.
Everything in a Unix file system has a unique inode number that manages the storage for that thing: every file, directory, special file, etc. Files and directories are both managed with inodes.
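One way to see that the name is not part of the inode is to rename a file and compare inode numbers. The sketch below assumes a scratch directory made with mktemp; demo_dir, oldname, and newname are hypothetical names used only for this illustration.

```shell
# Create a scratch file with a little data in it.
demo_dir=$(mktemp -d)
echo "some data" > "$demo_dir/oldname"

# ls -i prints "inode name"; keep only the inode number.
inode_before=$(ls -i "$demo_dir/oldname" | awk '{print $1}')

# Renaming inside one directory only rewrites the directory entry.
mv "$demo_dir/oldname" "$demo_dir/newname"
inode_after=$(ls -i "$demo_dir/newname" | awk '{print $1}')

echo "before: $inode_before  after: $inode_after"
rm -rf "$demo_dir"
```

Both numbers should be the same: the rename changed the name stored in the directory, while the inode that holds the attributes and disk block pointers was left alone.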
4 Directory inodes hold all the names
File system names are stored in directory inodes. The names are not kept in the same inodes with the things that they name. The name of a file or directory is not kept in the inode with the file attributes or pointers to disk blocks; the name is kept in a directory somewhere else.
Directories are what give names to inodes on Unix. Directories can be thought of as “files containing lists of names and inode numbers”. Files have disk blocks containing file data; directories also have disk blocks; but, the blocks contain lists of names and inode numbers.
Like most other inodes, directory inodes contain attribute information about the inode (permissions, owner, etc.) and one or more disk block pointers in which to store data; but, what is stored in the disk blocks of a directory is not file data but directory data (names and inode numbers).
A Unix directory is simply a list of pairs of names and associated inode numbers. That is all – the disk blocks of Unix directories contain only names and inode numbers. The rest of the attribute information about an item named in a directory (the type, permissions, owner, etc.) is kept with the inode associated with the name. You must use the inode number from the directory to find the inode on disk to read its attribute information; reading the directory only tells you the name and inode number. (Some modern Unix/Linux file systems also cache a second copy of the inode type in the directory to speed up common file system browsing operations.)
Reading a Unix directory tells you only some names and inode numbers; you know nothing about the types, sizes, owners, or modify times of those inodes unless you actually go out to the separate inode on disk and access them to read the attributes. Without actually accessing the inode, you can’t know most of the attributes of the file system object; you can’t even know if the inode is a file inode or a directory inode.
To find out attribute information of some file system object, which is stored with the inode, not in the directory, you must first use the inode number associated with the object to find the inode of the item and look at the item’s attributes. This is why ls or ls -i are much faster than ls -l:
- ls or ls -i only need to read the names and inode numbers from the directory – no additional inode access is needed because no other attributes are being queried. Reading the one directory inode is sufficient.
- ls -l has to display attribute information, so it has to do a separate inode lookup to find out the inode attribute information for every inode in the directory. A directory with 100 names in it requires 100 separate inode lookups to fetch the attributes.
No attribute information about the things named in the directory is kept in the directory (except on those modern file systems where caching is enabled). The directory only contains pairs of names and inode numbers.
To find a thing by name, the system goes to a directory inode, looks up the name in the disk space allocated to that directory, finds the inode number associated with the name, then goes out to the disk a second time and finds that inode on the disk. If that inode is another directory, the process repeats from left-to-right along the pathname until the inode of the last pathname component (on the far right in the pathname) is found. Then the disk block pointers of that last inode can be used to find the data contents of the last pathname component.
(The storage for each directory is itself managed by an inode, so the inode for the directory itself contains attribute information about the directory, not about the things named in the directory. Use ls -ld to see the attributes of the directory inode itself.)
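The difference between listing the contents of a directory and listing the directory inode itself can be tried out as follows. This is a small sketch using a scratch directory; demo_dir and the file names one, two, and three are assumptions made for the example.

```shell
# Create a scratch directory with three names in it.
demo_dir=$(mktemp -d)
touch "$demo_dir/one" "$demo_dir/two" "$demo_dir/three"

# ls -l shows the attributes of each thing named inside the directory.
ls -l "$demo_dir"

# ls -ld shows the attributes of the directory inode itself: one line only.
dir_listing=$(ls -ld "$demo_dir")
printf '%s\n' "$dir_listing"

rm -rf "$demo_dir"
```

The ls -ld line begins with the letter d, the file type recorded in the directory's own inode.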
4.1 Damaged directories create orphans
The name and inode number pairing in a Unix directory is the only connection between a name and the thing it names on disk. The name is kept separate from the data belonging to the thing it names (the actual inode on disk). If a disk error damages a directory inode or the directory disk blocks, file data is not usually lost, since the actual data for the things named in the directory is stored in inodes separate from the directory itself. If a directory is damaged, only the names of the things are lost and the inodes become “orphan” inodes without names. The storage used for the things themselves is elsewhere on disk and may be undamaged. You can run a file system recovery program such as fsck to recover the data (but not the names).
The name of an item (file, directory, etc.) and its inode number are kept in a directory. The directory storage for that name and number is managed by its own inode that is separate from the inode of each thing in the directory. The name and number are stored in the directory inode; the data for the item named is stored in its own inode somewhere else.
5 Multiple names – hard links
Because (1) a file is managed by an inode with a unique number, (2) the name of the file is not kept in that inode, and (3) directories pair names with inode numbers, a Unix file (inode) can be given multiple names by having multiple name-and-inode pairs in one or more directories.
Inode 123 may be paired with the name cat in one directory and the same 123 may be paired with the name dog in the same or a different directory. Either name leads to the same 123 file inode and the same data and attributes. Though there appear to be two different files cat and dog in the directory, the only thing different between the two is the name – both names lead to the same inode and therefore to the same data and attributes (permissions, owner, etc.).
5.1 Link counts count names; ln creates, rm removes only a name
Multiple names for the same inode are called “hard links”. The ln command can create a new name (a new hard link) in a directory for an existing inode. The system keeps a “link count” in each inode that counts the number of names each inode has been given. The rm command removes a name (a hard link) from a directory, decreasing the link count. When the link count for an inode goes to zero, the inode has no names and the inode is recycled and all the storage and data used by the item is released.
The rm command does not remove files; it removes names for files. When all the names are gone, the system removes the file and releases the space.
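The link count behaviour described above can be observed with ln and rm in a scratch directory. This is only a sketch: demo_dir is a hypothetical name, cat and dog are the example names from the text, and the link count is read from the second field of ls -l output.

```shell
# One file, one name: the link count starts at 1.
demo_dir=$(mktemp -d)
echo "meow" > "$demo_dir/cat"

# Give the same inode a second name; the link count becomes 2.
ln "$demo_dir/cat" "$demo_dir/dog"
links_two=$(ls -l "$demo_dir/dog" | awk '{print $2}')

# Remove one name; the data stays reachable through the other name.
rm "$demo_dir/cat"
links_one=$(ls -l "$demo_dir/dog" | awk '{print $2}')
remaining_data=$(cat "$demo_dir/dog")

echo "links after ln: $links_two, links after rm: $links_one, data: $remaining_data"
rm -rf "$demo_dir"
```

On an ordinary local file system that supports hard links, the counts should read 2 and then 1, and the data should still be meow after the name cat is gone.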
6 Tracing Inodes in Pathnames
When you look at a Unix pathname, remember that the slashes separate the names of pathname components. All the components to the left of the rightmost slash must be directories, including the “empty” ROOT directory name to the left of the leftmost slash. For example:
/home/alex/foobar
In the above example, there are three slashes and therefore four pathname components. The “empty” name in front of the first slash is the name of the ROOT directory. The ROOT directory doesn’t have a name. (Some books get around this by calling the ROOT directory “slash” or /. That is wrong. ROOT doesn’t have a name – slashes separate names.)
- Inside the ROOT directory is the name of the home directory.
- Inside the home directory is the name of the alex directory.
- Inside the alex directory is the name foobar.
The last (rightmost) component of a pathname can be a file or a directory (or other); for this example, let’s assume foobar is a file name.
Below is a file system diagram written correctly, with the names for things shown one level above the things to which the names actually refer. Each box represents an inode; the inode numbers for the box are given beside the box, on the left. Inside the directory inodes you can see the pairing of names and inode numbers. (These inode numbers are made up – see your actual Unix system for the real inode numbers.) One of the inodes, #12, is not a directory; it is an inode for a file and contains the file data. The downward arrows trace two paths (hard links) to the same #12 file data, /home/alex/foobar and /home/alex/literature/barfoo:
We will trace the inodes for these two pathnames in the diagram below. Follow the downward-pointing arrows:
      +----+-----+----------------------------------------+
  #2  |. 2 |.. 2 | home 5 | usr 9 | tmp 11 | etc 23 | ... |
      +----+-----+----------------------------------------+
                    |
                    | The inode #2 above is the ROOT directory. It has the
                    | name "home" in it. The *directory* "home" is not
                    | here; only the *name* is here. The ROOT directory
                    | itself does not have a name!
                    V
      +----+-----+--------------------------------------------------+
  #5  |. 5 |.. 2 | alex 31 | leslie 36 | pat 39 | abcd0001 21 | ... |
      +----+-----+--------------------------------------------------+
                    |
                    | The inode #5 above is the "home" directory. The name
                    | "home" isn't here; it's up in the ROOT directory,
                    | above. This directory has the name "alex" in it.
                    V
      +----+-----+---------------------------------------------------+
  #31 |. 31|.. 5 | foobar 12 | temp 15 | literature 7 | demo 6 | ... |
      +----+-----+---------------------------------------------------+
                    |  The inode #31 above is     |
                    |  the "alex" directory. The  |
                    |  name "alex" isn't here;    |
                    |  it's up in the "home"      |
                    |  directory, above. This     |
                    |  directory has the names    |
                    |  "foobar" and "literature"  |
                    |  in it.                     |
                    |                             V
      +----+-----+--|-------------------------------------------+
  #7  |. 7 |.. 31|  |  barfoo 12 | morestuf 123 | junk 99 | ... |
      +----+-----+--|-------------------------------------------+
                    |    |
                    |    |  The inode #7 above is the "literature" directory.
                    |    |  The name "literature" isn't here; it's up
                    |    |  in the "alex" directory. This directory has
                    |    |  the name "barfoo" in it.
                    |    |
                    V    V
                 *-----------*  This inode #12 on the left is a file inode.
                 | file data |  It contains the data blocks for the file.
  #12            | file data |  This file happens to have two names, "foobar"
                 | file data |  and "barfoo", but those names are not here.
                 *-----------*  The names of this file are up in the two
                                directories that point to this file, above.
The pathname /home/alex/foobar starts at the nameless ROOT directory, inode #2. It travels through two more directory inodes and stops at file inode #12. Using all four inode numbers, /home/alex/foobar could be written as #2->#5->#31->#12.
The pathname /home/alex/literature/barfoo starts at the ROOT inode and travels through three more directory inodes. It stops at the same #12 file inode as /home/alex/foobar. Using all five inode numbers, /home/alex/literature/barfoo could be written as #2->#5->#31->#7->#12.
Thus, /home/alex/foobar and /home/alex/literature/barfoo are two pathnames leading to the same inode #12 file data. The names foobar and barfoo are two names for the same file and are called “hard links”.
7 Tracing Pathname 1: /home/alex/foobar
Let’s examine each of the above inodes.
The box below represents the layout of names and inode numbers inside the actual disk space given to the nameless ROOT directory, inode #2:
      +----+-----+----------------------------------------+
  #2  |. 2 |.. 2 | home 5 | usr 9 | tmp 11 | etc 23 | ... |
      +----+-----+----------------------------------------+
The above ROOT directory has the name home in it, paired with inode #5. The actual disk space of the directory home is not here; only the name home is here, alongside its inode number #5. To read the actual contents of the home directory, you have to find the disk space managed by inode #5 somewhere else on disk and look there.
The above ROOT directory pairing of home with inode #5 is what gives the home directory its name. The name home is separate from the disk space for home. The ROOT directory itself does not have a name because it has no parent directory to give it a name!
The ROOT directory is the only directory that is its own parent. If you look at the ROOT directory above, you will see that both the name . and the name .. in this ROOT directory are paired with inode #2, the inode number of the ROOT directory. Following either name . or .. will lead to inode #2 and right back to this same ROOT inode.
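The claim that ROOT is its own parent can be checked with ls -di, which lists a directory itself (-d) together with its inode number (-i). This is a minimal sketch; the variable names are illustrative, and the real inode number of ROOT depends on the file system, so it will not necessarily be 2 as in the made-up diagram.

```shell
# Inode number of the ROOT directory itself.
root_inode=$(ls -di / | awk '{print $1}')

# Inode number reached by following the name .. inside ROOT.
root_parent_inode=$(ls -di /.. | awk '{print $1}')

echo "/ is inode $root_inode; /.. is inode $root_parent_inode"
```

The two numbers should match, because the name .. in the ROOT directory is paired with the inode of the ROOT directory itself.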
Let us move to the storage space for the home directory at inode #5.
The box below represents the layout of names and inode numbers inside the actual disk space given to the home directory, inode #5:
      +----+-----+--------------------------------------------------+
  #5  |. 5 |.. 2 | alex 31 | leslie 36 | pat 39 | abcd0001 21 | ... |
      +----+-----+--------------------------------------------------+
The name home for this inode isn’t in this inode; the name home is up in the ROOT directory. This home directory has the name alex in it, paired with inode #31. The directory alex is not here; only the name alex is here. To read the alex directory, you have to find inode #31 on disk and look there. (In fact, until you look up inode #31 and find out that it is a directory, you have no way of even knowing that the name alex is a name of a directory!)
Let us move to the storage space for the alex directory at inode #31.
The box below represents the layout of names and inode numbers inside the actual disk space given to the alex directory, inode #31:
      +----+-----+---------------------------------------------------+
  #31 |. 31|.. 5 | foobar 12 | temp 15 | literature 7 | demo 6 | ... |
      +----+-----+---------------------------------------------------+
The name alex for this inode isn’t in this inode; the name alex is up in the home directory. This alex directory has the name foobar in it, paired with inode #12. The file foobar is not here; only the name foobar is here. To read the data from file foobar, you have to find inode #12 on disk and look there. (In fact, until you look up inode #12 and find out that it is a plain file, you have no way of even knowing that the name foobar is a name of a plain file!)
Let us move to the storage space for the foobar file at inode #12.
The box below represents the actual disk space given to the foobar file, inode #12:
      *-----------*
  #12 | file data |
      *-----------*
The name foobar for this inode isn’t in this inode; the name foobar is up in the alex directory. This foobar inode is a file inode, not a directory inode, and the attributes of this inode will indicate that.
The inode for a file contains pointers to disk blocks that contain file data, not directory data. There are no special directory names . and .. in files. There are no names here at all; the disk block pointers in this inode point to just file data (whatever is in the file).
This completes the inode trace for /home/alex/foobar: #2->#5->#31->#12
8 Tracing Pathname 2: /home/alex/literature/barfoo
Let’s now trace the inode path for the name /home/alex/literature/barfoo. This pathname is a “hard link” to /home/alex/foobar; both the foobar and barfoo names point to the same inode number. Let’s see how:
The trace from ROOT through /home/alex is the same as before. Things change in our second trace because of /home/alex/literature. If we look at the alex directory inode #31 we see that the name literature is paired with inode #7:
      +----+-----+---------------------------------------------------+
  #31 |. 31|.. 5 | foobar 12 | temp 15 | literature 7 | demo 6 | ... |
      +----+-----+---------------------------------------------------+
The alex directory inode #31 above says to follow the trail to the literature name we must go to inode #7. (We won’t know whether the #7 inode for literature is a file or a directory until we get there!)
The box below represents the layout of names and inode numbers inside the actual disk space given to the literature directory, inode #7, which turns out to be a directory:
      +----+-----+------------------------------------------+
  #7  |. 7 |.. 31| barfoo 12 | morestuf 123 | junk 99 | ... |
      +----+-----+------------------------------------------+
The name literature for this inode isn’t in this inode; the name literature is up in the alex directory inode #31. This literature directory inode #7 has the name barfoo in it, paired with inode #12. The actual data for the thing that is barfoo is not here; only the name barfoo is here. You will recall that we have seen inode #12 in the previous trace.
Above, in the alex directory (inode #31), inode #12 was also paired with the name foobar. In the literature directory (inode #7), inode #12 is paired with the name barfoo. Inode #12 has two different names; names foobar and barfoo are both hard links to the same inode #12:
$ ls -i /home/alex/foobar /home/alex/literature/barfoo
12 /home/alex/foobar  12 /home/alex/literature/barfoo
Two names means the “link count” of inode #12 is set to “two”. Both names lead to the same #12 inode and thus to the same data and same attributes. This is one single file with two names. A change to the file data using the name foobar changes the data in inode #12. That changes the file data for the name barfoo too, because foobar and barfoo are two names for the same #12 inode storage.
Everything about data inode #12 except its name is kept with the inode. The only thing different in a long listing of foobar and barfoo will be the names; everything else (file type, permissions, owner, group, link count, size, modification times, etc.) is part of inode #12 and must therefore be identical for the two names. Neither name is more “original” than the other; both names have equal status. To release the #12 inode storage, you have to delete both names (so the link count drops to zero).
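The two-name arrangement traced above can be rebuilt in a scratch directory to watch the shared inode at work. This sketch assumes the tree is created under a temporary directory (demo_dir is a hypothetical name) rather than under the real /home, so the inode numbers will not be 12 as in the made-up diagram.

```shell
# Rebuild the example tree under a scratch directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/home/alex/literature"
echo "first draft" > "$demo_dir/home/alex/foobar"

# Create the second name (hard link) for the same inode.
ln "$demo_dir/home/alex/foobar" "$demo_dir/home/alex/literature/barfoo"

inode_foobar=$(ls -i "$demo_dir/home/alex/foobar" | awk '{print $1}')
inode_barfoo=$(ls -i "$demo_dir/home/alex/literature/barfoo" | awk '{print $1}')
link_count=$(ls -l "$demo_dir/home/alex/foobar" | awk '{print $2}')

# Append through one name, then read through the other.
echo "second draft" >> "$demo_dir/home/alex/literature/barfoo"
via_foobar=$(cat "$demo_dir/home/alex/foobar")

echo "foobar inode: $inode_foobar, barfoo inode: $inode_barfoo, links: $link_count"
printf '%s\n' "$via_foobar"
rm -rf "$demo_dir"
```

The line appended through barfoo shows up when reading foobar, since both names lead to one inode and one set of data blocks.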
9 Path Traversal
Let’s use the above inode data to follow a valid path such as:
/home/alex/literature/barfoo
Start on the left and walk the tree to the right. To be a valid Unix path, everything to the left of the rightmost slash must be a directory. (Thus, ROOT, home, alex, and literature must be directories, if this is a valid pathname.)
Start with the nameless ROOT directory in front of the first slash (ROOT doesn’t have a name, since it does not appear in any parent directory) and look for the first pathname component (home) inside that directory (inside inode #2).
Let’s trace the pathname:
/home/alex/literature/barfoo
Look in the ROOT directory (located in inode #2) for the name of the first pathname component: home. We find the name home inside the ROOT directory, paired with inode #5. Go back out to the disk to find inode #5 that is the actual home directory.
Note how the names are separate from the things they name. The actual directory inode #5 of the home directory is not the same as the inode #2 of the ROOT directory that contains the directory name home. The name is stored in a different place (#2) than the thing it names (#5).
In inode #5, the directory that has the name home, look for the name alex. We find alex paired with inode #31. Go back out to the disk to find inode #31 that is the actual alex directory. Again, the name alex is contained in directory inode #5 (home) and that name is stored separately from inode #31 that is the actual alex directory.
In inode #31, the directory that has the name alex, look for the name literature. We find literature paired with inode #7. Go back out to the disk to find inode #7 that is the actual literature directory. Again, the name literature is contained in directory inode #31 (alex) and that name is stored separately from the inode #7 that is the actual literature directory.
In inode #7, the directory that has the name literature, look for the name barfoo. We find it paired with inode #12. Go back out to the disk to find inode #12 that is the actual data of the file barfoo. Again, the name barfoo is contained in directory inode #7 (literature) and that name is stored separately from the inode #12 that is the actual data of the file. The name of a file is not part of the inode that makes up the actual file data.
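The left-to-right walk described above can be imitated with a small shell function that prints the inode number found at each step. This is a sketch of the idea, not how the kernel actually resolves pathnames; trace_path and demo_dir are hypothetical names introduced for the illustration, and the function expects an absolute pathname.

```shell
# Print the inode number of every pathname component, left to right.
# The body runs in a subshell so its IFS and globbing changes stay local.
trace_path() (
    current=/
    ls -di "$current"                  # start at the nameless ROOT directory
    set -f                             # no wildcard expansion while splitting
    IFS=/                              # split the pathname at the slashes
    for name in $1; do
        [ -n "$name" ] || continue     # skip the empty name before the first slash
        current=${current%/}/$name
        ls -di "$current"              # inode paired with this name
    done
)

# Build a short path to trace under a scratch directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/alex/literature"
echo "text" > "$demo_dir/alex/literature/barfoo"

trace=$(trace_path "$demo_dir/alex/literature/barfoo")
printf '%s\n' "$trace"
rm -rf "$demo_dir"
```

Each output line pairs an inode number with the pathname prefix resolved so far, ending with the inode of the file barfoo.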
9.1 Permissions on data vs. permissions on directories
You now have found the disk node (inode) that is your file data: inode #12. The name of this file, barfoo, is stored up in inode #7 that is the literature directory. The name is separate from the data it names.
If file data inode #12 has appropriate permission attributes, you can read or write the data in the file. It is the permission attributes on the inode containing the file data that govern what you can do with the data. The permissions on the inode of the directory containing the name of the file (directory inode #7) don’t control what you can do with the data of the file.
If any of the inodes of the directories leading down to the file inode #12 don’t give you search permission, you won’t be able to reach the file’s data inode that way and won’t be able to access the file’s data using those directories, but perhaps some other directories may lead you to the same inode #12, if the file has another name.
To access and read the data in a file path such as:
/home/alex/literature/barfoo
you need appropriate search permissions on the ROOT directory inode, the home directory inode, the alex directory inode, the literature directory inode, and finally read permissions on the barfoo file data inode #12.
It is the barfoo file data inode #12 permissions that determine whether or not you can read or change the data of the file. Reading or changing the data in the file requires permissions on the inode #12 that contains the data blocks of the file itself.
It is the literature directory inode permissions (inode #7) that determine what you can do with the name of the file, because the literature directory (inode #7) is where the name barfoo is kept. Changing, linking to, or removing the name of a file operates on the inode of the directory in which the file name appears; altering the name has nothing to do with reading or changing the inode that contains the data blocks of the file itself.
You can have no permissions on the inode that contains the data blocks of the file itself (it may even be owned by some other user) and still you may be able to rename or remove the name of the file from a directory on whose inode you do have permissions. The name (or names) of a file is stored in a directory inode, separate from the inode that manages the data blocks of the file.
Names are separate from the things that they name. The permissions of the names are also separate from the permissions of the data.
Changing a name only requires write/execute permissions on a directory. No permissions are needed on the inode of the thing being renamed. Changing the content of a file only requires write permissions on the data inode of the file itself, not on the directory that holds the name of the file.
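The separation between permissions on names and permissions on data can be tried out as follows. This is a sketch under the assumption that it is run by an ordinary user in a scratch directory; demo_dir, report, and renamed are made-up names, and the superuser bypasses these permission checks, so the read shown at the end would succeed for root.

```shell
# Create a file, then remove every permission from the file's own inode.
demo_dir=$(mktemp -d)
echo "secret" > "$demo_dir/report"
chmod 000 "$demo_dir/report"

# The rename still works: it only rewrites an entry in demo_dir,
# on which we hold write and search (execute) permission.
if mv "$demo_dir/report" "$demo_dir/renamed"; then
    rename_result=ok
else
    rename_result=failed
fi
echo "rename: $rename_result"

# Reading the data is governed by the file inode, so this is expected
# to fail for an ordinary user (root bypasses the check).
cat "$demo_dir/renamed" 2>/dev/null || echo "read denied by file permissions"

rm -rf "$demo_dir"
```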
10 Links and Directories
- Normally when you do ls -l dir you see the permissions of the contents of the directory, not the directory itself. What command and options are needed to see the access permissions and link count of a directory, instead of the contents of a directory? (RTFM)
- When you are inside a directory, what is the name you use to refer to the directory itself? (This name works inside any directory.) What name always refers to the unique parent directory?
- How many links (names) does a brand new, empty directory have? Why isn’t it just one link, as it is for a new file? (In other words, why does a new file have one link and a new directory have more than that?)
- Why does creating a sub-directory in a directory cause the directory’s link (name) count to increase by one for every sub-directory created? (Recall that a link count is a count of names.)
- Why doesn’t the link (name) count of the directory increase when you create files in the directory?
- Give the Unix command and its output that shows the inode number and owners of the following directories:
- your current directory
- your parent directory
- your HOME directory
- the directory named /home
- the ROOT directory
- the directory named /root
Note: Show only one line of output for each single directory; do not show the contents of the directory. Use a command (and options) that will show only the directory itself, not its contents. (RTFM)