Proofread docs, move filename reads into helper function load_string

parent 30fcaa8e07
commit 4b0318036f

README.md: 37 lines changed
```diff
@@ -17,13 +17,14 @@ Contents
 
 ## Features
 For a more full specification of the format, please see [Format.md](Format.md)
-- No padding between metadata fields or data segments so it only stores the data
-required to recreate the original file
+- No padding between metadata fields or data segments. Only the data required to
+recreate the original file is stored.
 - Optional inline checksumming using a choice of md5, sha1 or sha256 algorithms
 to ensure data integrity
 - Easily parallelized library code
 - Uses generic `Read` and `Write` interfaces from Rust `std` to support reading
-archive nodes from anything that can supply a stream of data
+archive nodes from anything that can supply a stream of data. This could be a
+file, or it could be stdin/stdout, or a network connection.
 
 ## Building
 The minimum supported Rust version (MSRV) for this project is currently Rust 1.65.
```
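The `Read`/`Write` bullet above is the core API property: archive bytes can come from any reader. As a minimal illustration using only `std` (no haggis-specific calls, error handling simplified), stdin can stand in for a file or a network socket:

```rust
use std::io::{self, BufReader, Read};

// Any `Read` implementor can supply archive bytes; here stdin stands in
// for a file or a network connection.
fn read_prefix<R: Read>(reader: &mut R) -> io::Result<[u8; 4]> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    Ok(prefix)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut reader = BufReader::new(stdin.lock());
    // The first four bytes could, for example, be checked against the
    // zstd magic number before deciding how to decode the stream.
    let prefix = read_prefix(&mut reader)?;
    println!("first four bytes: {prefix:?}");
    Ok(())
}
```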
```diff
@@ -39,7 +40,7 @@ git = "https://codeberg.org/jeang3nie/haggis.git"
 The `parallel` feature enables parallel file operations via
 [Rayon](https://crates.io/crates/rayon). When creating an archive, files will be
 read and checksummed in separate threads and the data passed back to the main
-thread for writing an archive. During extraction, the main thread reads the
+thread for writing the archive. During extraction, the main thread reads the
 archive and passes each node to a worker thread to verify it's checksum and write
 the file to disk.
 
```
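The pipeline described in this hunk is the classic worker-to-writer channel shape. Here is a minimal sketch of that shape in plain `std`; the file list and the "checksum" are placeholders, and the real crate uses Rayon rather than hand-spawned threads:

```rust
use std::{sync::mpsc, thread};

// Workers read and "checksum" files, then hand results to the main
// thread, which is the only writer of the archive.
fn main() {
    let files = vec!["a.txt", "b.txt"]; // placeholder file list
    let (tx, rx) = mpsc::channel();
    for path in files {
        let tx = tx.clone();
        thread::spawn(move || {
            let data = std::fs::read(path).unwrap_or_default();
            // Stand-in for md5/sha1/sha256 checksumming.
            let checksum: u64 = data.iter().map(|b| u64::from(*b)).sum();
            tx.send((path, data, checksum)).expect("receiver alive");
        });
    }
    drop(tx); // close the channel so the loop below terminates
    for (path, data, checksum) in rx {
        // Single writer: append the node for `path` to the archive here.
        println!("{path}: {} bytes, checksum {checksum}", data.len());
    }
}
```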
```diff
@@ -73,10 +74,12 @@ easy packaging. This feature leverages the
 The venerable Unix archiver, Tar, has the benefit of being ubiquitous on every Unix
 and Unix-like operating system. Beyond that, tar is a rather clunky format with a
 number of design flaws and quirks.
-- The original Tar specification had a hard limit in path names of 100 bytes
-- The Ustar revision of the original Tar specification only partially fixed the
-100 byte filename limit by adding a separate field in which to store the directory
-component of the pathname. Pathnames are still limited in size to 350 bytes.
+- The original Tar specification had a hard limit in path names of 100 bytes.
+- The Ustar revision of the original specification only partially fixed the 100
+byte filename limit by adding a separate field in which to store the directory
+component of the pathname. Pathnames are still limited in size to 350 bytes,
+with 250 bytes allocated for the parent directory and 100 bytes to the file
+name.
 - GNU tar fixed the filename limitation with GNU tar headers. GNU tar headers are
 not documented anywhere other than the GNU tar source code, so other implementations
 have ignored the GNU format and it never caught on.
```
```diff
@@ -94,16 +97,17 @@ number of design flaws and quirks.
 
 Compared with Tar, Haggis takes a different approach. All integer values are stored
 as little endian byte arrays, exactly the same as the in memory representation of a
-little endian computer. All metadata strings are preceded by their length, requiring
+little endian processor. All metadata strings are preceded by their length, requiring
 no padding between fields. The actual contents of regular files are written as a byte
 array, and again preceded by the length in bytes, so once again no padding is required.
 
 If you've gotten this far, you might be noticing some differences in design philosophy.
 - Ascii is great for humans to read but terrible for computers. Since archives are
-read by computers, not humans, ascii is bad.
+read by computers, not humans, ascii is not a great choice for a format designed
+to be read by computers and not humans.
 - Padding is extra bytes. Sure, that overhead tends to get squashed after compressing
 an archive, but it requires more memory to create the extra zeroes and more memory
-to extract them. Better to not use padding everywhere.
+to extract them. Better to avoid padding altogether.
 - Using offsets would always have lead to embarrassingly shortsighted limitations
 such as the filename length limitation that has plagued Tar from day one. Variable
 length fields are easily handled by storing their length first.
```
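For illustration, the length-prefixed encoding described in this hunk fits in a few lines; `write_string` is a hypothetical helper, not part of the haggis API:

```rust
use std::io::{self, Write};

// A string is stored as a u16 length in little-endian byte order,
// followed by the raw UTF-8 bytes. No padding is needed because the
// reader knows exactly how many bytes to consume.
fn write_string<W: Write>(writer: &mut W, s: &str) -> io::Result<()> {
    let len = u16::try_from(s.len()).map_err(|_| io::ErrorKind::InvalidInput)?;
    writer.write_all(&len.to_le_bytes())?;
    writer.write_all(s.as_bytes())
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    write_string(&mut buf, "README.md")?;
    assert_eq!(&buf[..2], &9u16.to_le_bytes()); // length first
    assert_eq!(&buf[2..], b"README.md"); // then the bytes
    Ok(())
}
```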
```diff
@@ -117,10 +121,10 @@ and settled on [zstd](https://github.com/facebook/zstd) as being so superior as to
 make all other common compression schemes irrelevant for **general** usage. Gzip and
 Bzip2 have woefully lower compression ratios and terrible performance. The
 [xz](https://tukaani.org/xz/) compression algorithm offers much better compression at
-the cost of poor performance. Meta may be evil overall, but zstd offers compression
-ratios on par with xz and performance that is higher than all three major competitors.
-Zstd now comes pre-installed on virtually every Linux system and is easily installed
-on BSD and other Unix-like systems. It is the new standard.
+the cost of poor performance. Zstd offers compression ratios on par with xz with
+performance that is higher than all three major competitors. Zstd now comes
+pre-installed on virtually every Linux system and is easily installed on BSD and
+other Unix-like systems. It is the new standard.
 
 Other compression schemes could have been implemented into the library code, but
 that would add to the maintenance burden while not adding significantly useful
```
```diff
@@ -130,7 +134,8 @@ Haggis. Better to encourage the use of one good compression format and discourage
 the continued use of legacy software.
 
 If you absolutely **must** compress a haggis archive using gzip or bzip2, you can
-do so manually. The *haggis* binary does not provide this functionality. Don't ask.
+do so manually, or pipe output from one program to another. The *haggis* reference
+binary does not provide this functionality. Don't ask.
 
 ## Contributing
 Contributions are always welcome. Please run `cargo fmt` and `cargo clippy` and
```
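One way to compress an archive manually from Rust is with the `zstd` crate, whose `Decoder` the binary already uses when reading; the paths below are placeholders:

```rust
use std::{fs::File, io};

fn main() -> io::Result<()> {
    let mut input = File::open("archive.haggis")?;
    let output = File::create("archive.haggis.zst")?;
    // Level 0 selects zstd's default compression level.
    let mut encoder = zstd::stream::Encoder::new(output, 0)?;
    io::copy(&mut input, &mut encoder)?;
    encoder.finish()?; // finalize the zstd frame
    Ok(())
}
```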
```diff
@@ -93,23 +93,11 @@ impl FileType
            Ok(Self::Normal(file))
        }
        Flag::HardLink => {
-           let mut len = [0; 2];
-           reader.read_exact(&mut len)?;
-           let len = u16::from_le_bytes(len);
-           let mut buf = Vec::with_capacity(len.into());
-           let mut handle = reader.take(len.into());
-           handle.read_to_end(&mut buf)?;
-           let s = String::from_utf8(buf)?;
+           let s = crate::load_string(reader)?;
            Ok(Self::HardLink(s))
        }
        Flag::SoftLink => {
-           let mut len = [0; 2];
-           reader.read_exact(&mut len)?;
-           let len = u16::from_le_bytes(len);
-           let mut buf = Vec::with_capacity(len.into());
-           let mut handle = reader.take(len.into());
-           handle.read_to_end(&mut buf)?;
-           let s = String::from_utf8(buf)?;
+           let s = crate::load_string(reader)?;
            Ok(Self::SoftLink(s))
        }
        Flag::Directory => Ok(Self::Directory),
```
```diff
@@ -1,7 +1,7 @@
 #![warn(clippy::all, clippy::pedantic)]
 use {
     clap::ArgMatches,
-    haggis::{Algorithm, Listing, ListingKind, ListingStream, Message, Stream, StreamMessage},
+    haggis::{Algorithm, Listing, ListingKind, ListingStream, NodeStream, Message, StreamMessage},
     indicatif::{ProgressBar, ProgressStyle},
     std::{
         fs::{self, File},
@@ -176,7 +176,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let file = file.cloned().unwrap_or("stdin".to_string());
     let handle = if zst {
         let reader = Decoder::new(fd)?;
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
         let handle = if matches.get_flag("quiet") {
             Some(thread::spawn(move || {
                 progress(&file, &receiver, u64::from(stream.length));
@@ -189,7 +189,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
         handle
     } else {
         let reader = BufReader::new(fd);
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
         let handle = if matches.get_flag("quiet") {
             Some(thread::spawn(move || {
                 progress(&file, &receiver, u64::from(stream.length));
@@ -281,7 +281,7 @@ fn list_unsorted(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let fd = File::open(file)?;
     if matches.get_flag("zstd") {
         let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
         for node in stream {
             let node = node?;
             let li = Listing::from(node);
@@ -304,7 +304,7 @@ fn list(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let zst = matches.get_flag("zstd") || haggis::detect_zstd(&mut fd)?;
     let list = if zst {
         let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
         let mut list = vec![];
         for node in stream {
             let node = node?;
```
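To show the renamed re-export in context, here is a sketch that opens an archive and counts its nodes, mirroring the iteration pattern in the `list_unsorted` hunk above; the path is a placeholder:

```rust
use haggis::NodeStream;
use std::{fs::File, io::BufReader};

fn main() -> Result<(), haggis::Error> {
    let fd = File::open("archive.haggis")?;
    let stream = NodeStream::new(BufReader::new(fd))?;
    // Iteration yields `Result<Node, Error>` items, as in the diff above.
    let mut count = 0_u64;
    for node in stream {
        let _node = node?;
        count += 1;
    }
    println!("{count} nodes");
    Ok(())
}
```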
src/lib.rs: 17 lines changed

```diff
@@ -33,7 +33,7 @@ pub use {
     listing_stream::ListingStream,
     node::Node,
     special::Special,
-    stream::Stream,
+    stream::Stream as NodeStream,
 };
 
 #[cfg(feature = "parallel")]
@@ -54,6 +54,16 @@ pub fn detect_zstd<R: Read + Seek>(reader: &mut R) -> Result<bool, Error> {
     Ok(buf == ZSTD_MAGIC)
 }
 
+pub(crate) fn load_string<R: Read>(reader: &mut R) -> Result<String, Error> {
+    let mut len = [0; 2];
+    reader.read_exact(&mut len)?;
+    let len = u16::from_le_bytes(len);
+    let mut buf = Vec::with_capacity(len.into());
+    let mut handle = reader.take(len.into());
+    handle.read_to_end(&mut buf)?;
+    Ok(String::from_utf8(buf)?)
+}
+
 #[allow(clippy::similar_names)]
 /// Creates a haggis archive from a list of files
 /// # Errors
```
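A round trip against an in-memory reader shows the new helper's contract. This is a hypothetical unit test, not part of the commit, and it assumes the crate's `Error` type implements `Debug`:

```rust
#[cfg(test)]
mod tests {
    use std::io::Cursor;

    #[test]
    fn load_string_round_trip() {
        // Encode a length-prefixed string by hand: u16 little-endian
        // length, then the raw bytes.
        let mut bytes = 5u16.to_le_bytes().to_vec();
        bytes.extend_from_slice(b"hello");
        let mut reader = Cursor::new(bytes);
        let s = crate::load_string(&mut reader).unwrap();
        assert_eq!(s, "hello");
    }
}
```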
```diff
@@ -91,7 +101,7 @@ pub fn create_archive_stdout(
 }
 
 #[allow(clippy::similar_names)]
-/// Streams a haggis archive over something which implements `Write`
+/// Creates and streams a haggis archive over something which implements `Write`
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 pub fn stream_archive<W: Write>(
@@ -183,7 +193,8 @@ pub fn par_create_archive_stdout(
     Ok(())
 }
 
-/// Streams a Haggis archive from a list of files, processing each file in parallel
+/// Creates and streams a Haggis archive from a list of files, processing each
+/// file in parallel
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 #[cfg(feature = "parallel")]
```
src/node.rs: 11 lines changed

```diff
@@ -66,10 +66,8 @@ impl Node {
     /// # Errors
     /// Returns `crate::Error` if io fails or the archive is incorrectly formatted
     pub fn read<T: Read>(reader: &mut T) -> Result<Self, Error> {
-        let mut len = [0; 2];
-        reader.read_exact(&mut len)?;
-        let len = u16::from_le_bytes(len);
-        if len == 0 {
+        let name = crate::load_string(reader)?;
+        if name.is_empty() {
             return Ok(Self {
                 name: String::new(),
                 mode: 0,
@@ -79,9 +77,6 @@ impl Node {
                 filetype: FileType::Eof,
             });
         }
-        let mut name = Vec::with_capacity(len.into());
-        let mut handle = reader.take(len.into());
-        handle.read_to_end(&mut name)?;
         let mut buf = [0; 18];
         reader.read_exact(&mut buf)?;
         let uid: [u8; 4] = buf[0..4].try_into()?;
@@ -92,7 +87,7 @@ impl Node {
         let (flag, mode) = Flag::extract_from_raw(raw_mode)?;
         let filetype = FileType::read(reader, flag)?;
         Ok(Self {
-            name: String::from_utf8(name)?,
+            name,
             uid: u32::from_le_bytes(uid),
             gid: u32::from_le_bytes(gid),
             mtime: u64::from_le_bytes(mtime),
```
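`Node::read` now treats an empty name as the end-of-archive marker, so the terminator on the write side is just a zero length prefix. The helper below is a hypothetical sketch of that, not code from this crate:

```rust
use std::io::{self, Write};

// Hypothetical: end a haggis archive by writing an empty,
// length-prefixed name, which `Node::read` maps to `FileType::Eof`.
fn write_eof<W: Write>(writer: &mut W) -> io::Result<()> {
    writer.write_all(&0u16.to_le_bytes())
}
```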