diff --git a/README.md b/README.md
index 849c246..9f4d099 100644
--- a/README.md
+++ b/README.md
@@ -17,13 +17,14 @@ Contents
 ## Features
 For a more full specification of the format, please see [Format.md](Format.md)
 
-- No padding between metadata fields or data segments so it only stores the data
-  required to recreate the original file
+- No padding between metadata fields or data segments. Only the data required to
+  recreate the original file is stored.
 - Optional inline checksumming using a choice of md5, sha1 or sha256 algorithms to
   ensure data integrity
 - Easily parallelized library code
 - Uses generic `Read` and `Write` interfaces from Rust `std` to support reading
-  archive nodes from anything that can supply a stream of data
+  archive nodes from anything that can supply a stream of data. This could be a
+  file, or it could be stdin/stdout, or a network connection.
 
 ## Building
 The minimum supported Rust version (MSRV) for this project is currently Rust 1.65.
@@ -39,7 +40,7 @@ git = "https://codeberg.org/jeang3nie/haggis.git"
 The `parallel` feature enables parallel file operations via
 [Rayon](https://crates.io/crates/rayon). When creating an archive, files will be
 read and checksummed in separate threads and the data passed back to the main
-thread for writing an archive. During extraction, the main thread reads the
+thread for writing the archive. During extraction, the main thread reads the
 archive and passes each node to a worker thread to verify it's checksum and write
 the file to disk.
 
@@ -73,10 +74,12 @@ easy packaging. This feature leverages the
 The venerable Unix archiver, Tar, has the benefit of being ubiquitous on every Unix
 and Unix-like operating system. Beyond that, tar is a rather clunky format with a
 number of design flaws and quirks.
-- The original Tar specification had a hard limit in path names of 100 bytes
-- The Ustar revision of the original Tar specification only partially fixed the
-  100 byte filename limit by adding a separate field in which to store the directory
-  component of the pathname. Pathnames are still limited in size to 350 bytes.
+- The original Tar specification had a hard limit in path names of 100 bytes.
+- The Ustar revision of the original specification only partially fixed the 100
+  byte filename limit by adding a separate field in which to store the directory
+  component of the pathname. Pathnames are still limited in size to 350 bytes,
+  with 250 bytes allocated for the parent directory and 100 bytes to the file
+  name.
 - GNU tar fixed the filename limitation with GNU tar headers. GNU tar headers are
   not documented anywhere other than the GNU tar source code, so other implementations
   have ignored the GNU format and it never caught on.
@@ -94,16 +97,17 @@ number of design flaws and quirks.
 
 Compared with Tar, Haggis takes a different approach. All integer values are stored
 as little endian byte arrays, exactly the same as the in memory representation of a
-little endian computer. All metadata strings are preceded by their length, requiring
+little endian processor. All metadata strings are preceded by their length, requiring
 no padding between fields. The actual contents of regular files are written as a byte
 array, and again preceded by the length in bytes, so once again no padding is required.
 
 If you've gotten this far, you might be noticing some differences in design philosophy.
 - Ascii is great for humans to read but terrible for computers. Since archives are
-  read by computers, not humans, ascii is bad.
+  read by computers, not humans, ascii is not a great choice for a format designed
+  to be read by computers and not humans.
 - Padding is extra bytes. Sure, that overhead tends to get squashed after compressing
   an archive, but it requires more memory to create the extra zeroes and more memory
-  to extract them. Better to not use padding everywhere.
+  to extract them. Better to avoid padding altogether.
 - Using offsets would always have lead to embarrassingly shortsighted limitations
   such as the filename length limitation that has plagued Tar from day one. Variable
   length fields are easily handled by storing their length first.
@@ -117,10 +121,10 @@ and settled on [zstd](https://github.com/facebook/zstd) as being so superior as
 make all other common compression schemes irrelevant for **general** usage. Gzip and
 Bzip2 have woefully lower compression ratios and terrible performance. The
 [xz](https://tukaani.org/xz/) compression algorithm offers much better compression at
-the cost of poor performance. Meta may be evil overall, but zstd offers compression
-ratios on par with xz and performance that is higher than all three major competitors.
-Zstd now comes pre-installed on virtually every Linux system and is easily installed
-on BSD and other Unix-like systems. It is the new standard.
+the cost of poor performance. Zstd offers compression ratios on par with xz with
+performance that is higher than all three major competitors. Zstd now comes
+pre-installed on virtually every Linux system and is easily installed on BSD and
+other Unix-like systems. It is the new standard.
 
 Other compression schemes could have been implemented into the library code, but
 that would add to the maintenance burden while not adding significantly useful
@@ -130,7 +134,8 @@ Haggis. Better to encourage the use of one good compression format and discourag
 the continued use of legacy software.
 
 If you absolutely **must** compress a haggis archive using gzip or bzip2, you can
-do so manually. The *haggis* binary does not provide this functionality. Don't ask.
+do so manually, or pipe output from one program to another. The *haggis* reference
+binary does not provide this functionality. Don't ask.
 
 ## Contributing
 Contributions are always welcome. Please run `cargo fmt` and `cargo clippy` and
diff --git a/src/filetype.rs b/src/filetype.rs
index 1b1a7f2..e8ed298 100644
--- a/src/filetype.rs
+++ b/src/filetype.rs
@@ -93,23 +93,11 @@ impl FileType {
                 Ok(Self::Normal(file))
             }
             Flag::HardLink => {
-                let mut len = [0; 2];
-                reader.read_exact(&mut len)?;
-                let len = u16::from_le_bytes(len);
-                let mut buf = Vec::with_capacity(len.into());
-                let mut handle = reader.take(len.into());
-                handle.read_to_end(&mut buf)?;
-                let s = String::from_utf8(buf)?;
+                let s = crate::load_string(reader)?;
                 Ok(Self::HardLink(s))
             }
             Flag::SoftLink => {
-                let mut len = [0; 2];
-                reader.read_exact(&mut len)?;
-                let len = u16::from_le_bytes(len);
-                let mut buf = Vec::with_capacity(len.into());
-                let mut handle = reader.take(len.into());
-                handle.read_to_end(&mut buf)?;
-                let s = String::from_utf8(buf)?;
+                let s = crate::load_string(reader)?;
                 Ok(Self::SoftLink(s))
             }
             Flag::Directory => Ok(Self::Directory),
diff --git a/src/haggis.rs b/src/haggis.rs
index 4e5d1a7..5d3a476 100644
--- a/src/haggis.rs
+++ b/src/haggis.rs
@@ -1,7 +1,7 @@
 #![warn(clippy::all, clippy::pedantic)]
 use {
     clap::ArgMatches,
-    haggis::{Algorithm, Listing, ListingKind, ListingStream, Message, Stream, StreamMessage},
+    haggis::{Algorithm, Listing, ListingKind, ListingStream, NodeStream, Message, StreamMessage},
     indicatif::{ProgressBar, ProgressStyle},
     std::{
         fs::{self, File},
@@ -176,7 +176,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let file = file.cloned().unwrap_or("stdin".to_string());
     let handle = if zst {
         let reader = Decoder::new(fd)?;
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
         let handle = if matches.get_flag("quiet") {
             Some(thread::spawn(move || {
                 progress(&file, &receiver, u64::from(stream.length));
@@ -189,7 +189,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
         handle
     } else {
         let reader = BufReader::new(fd);
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
         let handle = if matches.get_flag("quiet") {
             Some(thread::spawn(move || {
                 progress(&file, &receiver, u64::from(stream.length));
@@ -281,7 +281,7 @@ fn list_unsorted(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let fd = File::open(file)?;
     if matches.get_flag("zstd") {
         let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
         for node in stream {
             let node = node?;
             let li = Listing::from(node);
@@ -304,7 +304,7 @@ fn list(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let zst = matches.get_flag("zstd") || haggis::detect_zstd(&mut fd)?;
     let list = if zst {
         let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
         let mut list = vec![];
         for node in stream {
             let node = node?;
diff --git a/src/lib.rs b/src/lib.rs
index 23e0475..9e2ee2d 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -33,7 +33,7 @@ pub use {
     listing_stream::ListingStream,
     node::Node,
     special::Special,
-    stream::Stream,
+    stream::Stream as NodeStream,
 };
 
 #[cfg(feature = "parallel")]
@@ -54,6 +54,16 @@ pub fn detect_zstd(reader: &mut R) -> Result {
     Ok(buf == ZSTD_MAGIC)
 }
 
+pub(crate) fn load_string(reader: &mut R) -> Result {
+    let mut len = [0; 2];
+    reader.read_exact(&mut len)?;
+    let len = u16::from_le_bytes(len);
+    let mut buf = Vec::with_capacity(len.into());
+    let mut handle = reader.take(len.into());
+    handle.read_to_end(&mut buf)?;
+    Ok(String::from_utf8(buf)?)
+}
+
 #[allow(clippy::similar_names)]
 /// Creates a haggis archive from a list of files
 /// # Errors
@@ -91,7 +101,7 @@ pub fn create_archive_stdout(
 }
 
 #[allow(clippy::similar_names)]
-/// Streams a haggis archive over something which implements `Write`
+/// Creates and streams a haggis archive over something which implements `Write`
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 pub fn stream_archive(
@@ -183,7 +193,8 @@ pub fn par_create_archive_stdout(
     Ok(())
 }
 
-/// Streams a Haggis archive from a list of files, processing each file in parallel
+/// Creates and streams a Haggis archive from a list of files, processing each
+/// file in parallel
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 #[cfg(feature = "parallel")]
diff --git a/src/node.rs b/src/node.rs
index b7debb4..d863ccf 100644
--- a/src/node.rs
+++ b/src/node.rs
@@ -66,10 +66,8 @@ impl Node {
     /// # Errors
     /// Returns `crate::Error` if io fails or the archive is incorrectly formatted
     pub fn read(reader: &mut T) -> Result {
-        let mut len = [0; 2];
-        reader.read_exact(&mut len)?;
-        let len = u16::from_le_bytes(len);
-        if len == 0 {
+        let name = crate::load_string(reader)?;
+        if name.is_empty() {
             return Ok(Self {
                 name: String::new(),
                 mode: 0,
@@ -79,9 +77,6 @@ impl Node {
                 filetype: FileType::Eof,
             });
         }
-        let mut name = Vec::with_capacity(len.into());
-        let mut handle = reader.take(len.into());
-        handle.read_to_end(&mut name)?;
         let mut buf = [0; 18];
         reader.read_exact(&mut buf)?;
         let uid: [u8; 4] = buf[0..4].try_into()?;
@@ -92,7 +87,7 @@ impl Node {
         let (flag, mode) = Flag::extract_from_raw(raw_mode)?;
         let filetype = FileType::read(reader, flag)?;
         Ok(Self {
-            name: String::from_utf8(name)?,
+            name,
             uid: u32::from_le_bytes(uid),
             gid: u32::from_le_bytes(gid),
             mtime: u64::from_le_bytes(mtime),
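
Reviewer note: the length-prefixed string layout that `load_string` reads (a two byte little endian length followed by that many UTF-8 bytes) can be exercised on its own. The following is only a sketch under stated assumptions, not code from this patch: `store_string` is a hypothetical writer added for the round trip, the `R: Read` bound is assumed, and `std::io::Error` stands in for the crate's own error type.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical writer (not part of the patch): emits a u16 little endian
/// length prefix followed by the string's UTF-8 bytes, with no padding.
fn store_string<W: Write>(writer: &mut W, s: &str) -> io::Result<()> {
    let len = u16::try_from(s.len()).map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidInput, "string longer than u16::MAX bytes")
    })?;
    writer.write_all(&len.to_le_bytes())?;
    writer.write_all(s.as_bytes())
}

/// Stand-in for the helper in the patch; the body mirrors the diff, but the
/// generic bound and the error type are assumptions made for this sketch.
fn load_string<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut len = [0; 2];
    reader.read_exact(&mut len)?;
    let len = u16::from_le_bytes(len);
    let mut buf = Vec::with_capacity(len.into());
    let mut handle = reader.take(len.into());
    handle.read_to_end(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    let mut bytes = Vec::new();
    store_string(&mut bytes, "etc/fstab")?;
    // A zero length string; `Node::read` in the patch treats an empty name as end of archive.
    store_string(&mut bytes, "")?;

    // The 9 byte name is preceded by its length as two little endian bytes.
    assert_eq!(&bytes[..2], &[9, 0]);

    let mut reader = Cursor::new(bytes);
    assert_eq!(load_string(&mut reader)?, "etc/fstab");
    assert!(load_string(&mut reader)?.is_empty());
    Ok(())
}
```

Because the empty string round-trips as a bare zero length, the `name.is_empty()` check in `Node::read` sees the same bytes as the old `len == 0` check did.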