USTAR Format
This section was original adapted from the IBM AIX manpages, and covers
what I believe to be the ustar format.
Tape Archive Header Block
Every tar archive is composed of several "files". Each file has
a header block describing the file, followed by
zero or more blocks that give the contents of the file. The
end-of-archive indicator consists of
two blocks filled with binary zeros. Each block is a fixed
size of 512 bytes.
Blocks are grouped for physical I/O based on the "blocking factor".
These groups can be written using a single write(2) call. On magnetic
tape, the result of this write is a single tape record (sometimes
called a tape block; confusing isn't it?). The last record is
always a full 512 bytes. Blocks after the end-of-archive zeros contain
undefined data.
The header block structure is shown in the following table. All lengths
and offsets are in decimal, and all lengths are in bytes. All times are
in Unix standard format (seconds since epoch, UTC).
Field Name |
Offset |
Length |
Contents |
name |
0 |
100 |
filename (no path, no slash) |
mode |
100 |
8 |
file mode |
uid |
108 |
8 |
file mode |
gid |
116 |
8 |
group id |
size |
124 |
12 |
length of file contents |
mtime |
136 |
12 |
last modification time |
cksum |
148 |
8 |
file and header checksum |
typeflag |
156 |
1 |
file type |
linkname |
157 |
100 |
linked path name or file name |
magic |
257 |
6 |
format representation for tar |
version |
263 |
2 |
version representation for tar |
uname |
265 |
32 |
user name |
gname |
297 |
32 |
group name |
devmajor |
329 |
8 |
major device representation |
devmajor |
337 |
8 |
minor device representation |
prefix |
345 |
155 |
path name, no trailing slashes |
Names are preserved only if the characters are chosen from the POSIX
portable file-name character set or if the same extended character
set is used between systems. During a read operation, a file can be
created only if the original file can be accessed using the open,
stat, chdir, fcntl, or opendir subroutine.
Header Block Fields
Each field within the header block and each character on the archive
medium are contiguous. There is no padding between fields. More information
about the specific fields and their values follows:
- name
- The file's path name is created using this field, or by using
this field in connection with the prefix field. If the prefix field
is included, the name of the file is prefix/name. This field is null-terminated
unless every character is non-null.
- mode
- Provides 9 bits for file permissions and 3 bits for SUID, SGID,
and SVTX modes. All values for this field are in octal. During a read
operation, the designated mode bits are ignored if the user does not
have equal (or higher) permissions or if the modes are not supported.
Numeric fields are terminated with a space and a null byte. The tar.h
file contains the following possible values for this field:
Flag |
Octal |
Description |
TSUID |
04000 |
set user ID on execution |
TSGID |
02000 |
set group ID on execution |
TSVTX |
01000 |
reserved |
TUREAD |
00400 |
read by owner |
TUWRITE |
00200 |
write by owner |
TUEXEC |
00100 |
execute or search by owner |
TGREAD |
00040 |
read by group |
TGWRITE |
00020 |
write by group |
TGEXEC |
00010 |
execute or search by group |
TOREAD |
00004 |
read by others |
TOWRITE |
00002 |
write by others |
TOEXEC |
00001 |
execute or search by other |
- uid
- Extracted from the corresponding archive fields unless a user
with appropriate privileges restores the file. In that case, the field
value is extracted from the password and group files instead. Numeric
fields are terminated with a space and a null byte.
- gid
- Extracted from the corresponding archive fields unless a user
with appropriate privileges restores the file. In that case, the field
value is extracted from the password and group files instead. Numeric
fields are terminated with a space and a null byte.
- size
- Value is 0 when the typeflag field is set to LNKTYPE. This field
is terminated with a space only.
- mtime
- Value is obtained from the modification-time field of the stat
subroutine. This field is terminated with a space only.
- chksum
- On calculation, the sum of all bytes in the header structure
are treated as spaces. Each unsigned byte is added to an unsigned
integer (initialized to 0) with at least 17-bits precision. Numeric
fields are terminated with a space and a null byte.
- typeflag
- The tar.h file contains the following possible values for
this field:
Flag |
Value |
Description |
REGTYPE |
'0' |
regular file |
AREGTYPE |
'\0' |
regular file |
LNKTYPE |
'1' |
link |
SYMTYPE |
'2' |
reserved |
CHRTYPE |
'3' |
character special |
BLKTYPE |
'4' |
block special |
DIRTYPE |
'5' |
directory (in this case, the size field has no meaning) |
FIFOTYPE |
'6' |
FIFO special (archiving a FIFO file archives its existence, not contents) |
CONTTYPE |
'7' |
reserved |
If other values are used, the file is extracted as a regular file
and a warning issued to the standard error output. Numeric fields
are terminated with a space and a null byte.
The LNKTYPE flag represents a link to another file, of any type, previously
archived. Such linked-to files are identified by each file having
the same device and file serial number. The linked-to name is specified
in the linkname field, including a trailing null byte.
- linkname
- Does not use the prefix field to produce a path name. If
the path name or linkname value is too long, an error message is returned
and any action on that file or directory is canceled. This field is
null-terminated unless every character is non-null.
- magic
- Contains the TMAGIC value, reflecting the extended tar archive
format. In this case, the uname and gname fields will contain the
ASCII representation for the file owner and the file group. If a file
is restored by a user with the appropriate privileges, the uid and
gid fields are extracted from the password and group files (instead
of the corresponding archive fields). This field is null-terminated.
TMAGIC is equal to the string "USTAR", null-terminated.
- version
- Represents the version of the tar command used to archive
the file. This field is terminated with a space only.
- uname
- Contains the ASCII representation of the file owner. This field
is null-terminated.
- gname
- Contains the ASCII representation of the file group. This field
is null-terminated.
- devmajor
- Contains the device major number. Terminated with a space
and a null byte.
- devminor
- Contains the device minor number. Terminated with a space
and a null byte.
- prefix
- If this field is non-null, the file's path name is created
using the prefix/name values together. Null-terminated unless every
character is non-null.