Proofread docs, move filename reads into helper function load_string

2024-02-23 19:10:50 -05:00
commit 4b0318036f
parent 30fcaa8e07
5 changed files with 45 additions and 46 deletions
--- a/README.md
+++ b/README.md
@@ -17,13 +17,14 @@ Contents

 ## Features
 For a fuller specification of the format, please see [Format.md](Format.md)
- No padding between metadata fields or data segments so it only stores the data
-  required to recreate the original file
+- No padding between metadata fields or data segments. Only the data required to
+  recreate the original file is stored.
 - Optional inline checksumming using a choice of md5, sha1 or sha256 algorithms
  to ensure data integrity
 - Easily parallelized library code
 - Uses generic `Read` and `Write` interfaces from Rust `std` to support reading
-  archive nodes from anything that can supply a stream of data
+  archive nodes from anything that can supply a stream of data. This could be a 
+  file, or it could be stdin/stdout, or a network connection.

 ## Building
 The minimum supported Rust version (MSRV) for this project is currently Rust 1.65.
@@ -39,7 +40,7 @@ git = "https://codeberg.org/jeang3nie/haggis.git"
 The `parallel` feature enables parallel file operations via
 [Rayon](https://crates.io/crates/rayon). When creating an archive, files will be
 read and checksummed in separate threads and the data passed back to the main
-thread for writing an archive. During extraction, the main thread reads the
+thread for writing the archive. During extraction, the main thread reads the
 archive and passes each node to a worker thread to verify its checksum and write
 the file to disk.

@@ -73,10 +74,12 @@ easy packaging. This feature leverages the
 The venerable Unix archiver, Tar, has the benefit of being ubiquitous on every Unix
 and Unix-like operating system. Beyond that, tar is a rather clunky format with a
 number of design flaws and quirks.
- The original Tar specification had a hard limit in path names of 100 bytes
- The Ustar revision of the original Tar specification only partially fixed the
-  100 byte filename limit by adding a separate field in which to store the directory
-  component of the pathname. Pathnames are still limited in size to 350 bytes.
+- The original Tar specification had a hard limit in path names of 100 bytes.
+- The Ustar revision of the original specification only partially fixed the 100
+  byte filename limit by adding a separate field in which to store the directory
+  component of the pathname. Pathnames are still limited in size to 350 bytes,
+  with 250 bytes allocated for the parent directory and 100 bytes to the file
+  name.
 - GNU tar fixed the filename limitation with GNU tar headers. GNU tar headers are
  not documented anywhere other than the GNU tar source code, so other implementations
  have ignored the GNU format and it never caught on.
@@ -94,16 +97,17 @@ number of design flaws and quirks.
  
 Compared with Tar, Haggis takes a different approach. All integer values are stored
 as little endian byte arrays, exactly the same as the in memory representation of a
-little endian computer. All metadata strings are preceded by their length, requiring
+little endian processor. All metadata strings are preceded by their length, requiring
 no padding between fields. The actual contents of regular files are written as a byte
 array, and again preceded by the length in bytes, so once again no padding is required.

 If you've gotten this far, you might be noticing some differences in design philosophy.
 - Ascii is great for humans to read but terrible for computers. Since archives are
-  read by computers, not humans, ascii is bad.
+  read by computers, not humans, ascii is not a great choice for an archive
+  format.
 - Padding is extra bytes. Sure, that overhead tends to get squashed after compressing
  an archive, but it requires more memory to create the extra zeroes and more memory
-  to extract them. Better to not use padding everywhere.
+  to extract them. Better to avoid padding altogether.
 - Using offsets would always have led to embarrassingly shortsighted limitations
  such as the filename length limitation that has plagued Tar from day one. Variable
  length fields are easily handled by storing their length first.
@@ -117,10 +121,10 @@ and settled on [zstd](https://github.com/facebook/zstd) as being so superior as
 make all other common compression schemes irrelevant for **general** usage. Gzip and
 Bzip2 have woefully lower compression ratios and terrible performance. The
 [xz](https://tukaani.org/xz/) compression algorithm offers much better compression at
-the cost of poor performance. Meta may be evil overall, but zstd offers compression
-ratios on par with xz and performance that is higher than all three major competitors.
-Zstd now comes pre-installed on virtually every Linux system and is easily installed
-on BSD and other Unix-like systems. It is the new standard.
+the cost of poor performance. Zstd offers compression ratios on par with xz with
+performance that is higher than all three major competitors. Zstd now comes
+pre-installed on virtually every Linux system and is easily installed on BSD and
+other Unix-like systems. It is the new standard.

 Other compression schemes could have been implemented into the library code, but
 that would add to the maintenance burden while not adding significantly useful
@@ -130,7 +134,8 @@ Haggis. Better to encourage the use of one good compression format and discourag
 the continued use of legacy software.

 If you absolutely **must** compress a haggis archive using gzip or bzip2, you can
-do so manually. The *haggis* binary does not provide this functionality. Don't ask.
+do so manually, or pipe output from one program to another. The *haggis* reference
+binary does not provide this functionality. Don't ask.

 ## Contributing
 Contributions are always welcome. Please run `cargo fmt` and `cargo clippy` and
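
Review note on the README changes above: the "length first, no padding" layout can be illustrated with a small sketch. It assumes the two-byte little endian length prefix that the read path in this commit uses. `store_string` is a hypothetical name chosen for illustration, not a function from the haggis crate, and `std::io::Error` stands in for the crate's own error type.

```rust
use std::io::{self, Write};

/// Writes `s` as a little endian `u16` byte count followed by the raw UTF-8
/// bytes, with no padding before or after.
/// Hypothetical helper for illustration; not part of the haggis API.
pub fn store_string<W: Write>(writer: &mut W, s: &str) -> io::Result<()> {
    // A two-byte prefix can only describe strings up to 65535 bytes long.
    if s.len() > usize::from(u16::MAX) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "string does not fit in a u16 length prefix",
        ));
    }
    let len = s.len() as u16;
    // Length first, stored exactly as a little endian machine holds it.
    writer.write_all(&len.to_le_bytes())?;
    // Then the bytes themselves; the reader knows where the field ends.
    writer.write_all(s.as_bytes())
}
```

Under these assumptions a nine-byte path such as `etc/fstab` occupies eleven bytes in total, where a fixed-width field would need padding out to the field size.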
--- a/src/filetype.rs
+++ b/src/filetype.rs
@@ -93,23 +93,11 @@ impl FileType {
                Ok(Self::Normal(file))
            }
            Flag::HardLink => {
-                let mut len = [0; 2];
-                reader.read_exact(&mut len)?;
-                let len = u16::from_le_bytes(len);
-                let mut buf = Vec::with_capacity(len.into());
-                let mut handle = reader.take(len.into());
-                handle.read_to_end(&mut buf)?;
-                let s = String::from_utf8(buf)?;
+                let s = crate::load_string(reader)?;
                Ok(Self::HardLink(s))
            }
            Flag::SoftLink => {
-                let mut len = [0; 2];
-                reader.read_exact(&mut len)?;
-                let len = u16::from_le_bytes(len);
-                let mut buf = Vec::with_capacity(len.into());
-                let mut handle = reader.take(len.into());
-                handle.read_to_end(&mut buf)?;
-                let s = String::from_utf8(buf)?;
+                let s = crate::load_string(reader)?;
                Ok(Self::SoftLink(s))
            }
            Flag::Directory => Ok(Self::Directory),
--- a/src/haggis.rs
+++ b/src/haggis.rs
@@ -1,7 +1,7 @@
 #![warn(clippy::all, clippy::pedantic)]
 use {
    clap::ArgMatches,
-    haggis::{Algorithm, Listing, ListingKind, ListingStream, Message, Stream, StreamMessage},
+    haggis::{Algorithm, Listing, ListingKind, ListingStream, NodeStream, Message, StreamMessage},
    indicatif::{ProgressBar, ProgressStyle},
    std::{
        fs::{self, File},
@@ -176,7 +176,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
    let file = file.cloned().unwrap_or("stdin".to_string());
    let handle = if zst {
        let reader = Decoder::new(fd)?;
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
        let handle = if matches.get_flag("quiet") {
            Some(thread::spawn(move || {
                progress(&file, &receiver, u64::from(stream.length));
@@ -189,7 +189,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
        handle
    } else {
        let reader = BufReader::new(fd);
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
        let handle = if matches.get_flag("quiet") {
            Some(thread::spawn(move || {
                progress(&file, &receiver, u64::from(stream.length));
@@ -281,7 +281,7 @@ fn list_unsorted(matches: &ArgMatches) -> Result<(), haggis::Error> {
    let fd = File::open(file)?;
    if matches.get_flag("zstd") {
        let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
        for node in stream {
            let node = node?;
            let li = Listing::from(node);
@@ -304,7 +304,7 @@ fn list(matches: &ArgMatches) -> Result<(), haggis::Error> {
    let zst = matches.get_flag("zstd") || haggis::detect_zstd(&mut fd)?;
    let list = if zst {
        let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
        let mut list = vec![];
        for node in stream {
            let node = node?;
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -33,7 +33,7 @@ pub use {
    listing_stream::ListingStream,
    node::Node,
    special::Special,
-    stream::Stream,
+    stream::Stream as NodeStream,
 };

 #[cfg(feature = "parallel")]
@@ -54,6 +54,16 @@ pub fn detect_zstd<R: Read + Seek>(reader: &mut R) -> Result<bool, Error> {
    Ok(buf == ZSTD_MAGIC)
 }

+pub(crate) fn load_string<R: Read>(reader: &mut R) -> Result<String, Error> {
+    let mut len = [0; 2];
+    reader.read_exact(&mut len)?;
+    let len = u16::from_le_bytes(len);
+    let mut buf = Vec::with_capacity(len.into());
+    let mut handle = reader.take(len.into());
+    handle.read_to_end(&mut buf)?;
+    Ok(String::from_utf8(buf)?)
+}
+
 #[allow(clippy::similar_names)]
 /// Creates a haggis archive from a list of files
 /// # Errors
@@ -91,7 +101,7 @@ pub fn create_archive_stdout(
 }

 #[allow(clippy::similar_names)]
-/// Streams a haggis archive over something which implements `Write`
+/// Creates and streams a haggis archive over something which implements `Write`
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 pub fn stream_archive<W: Write>(
@@ -183,7 +193,8 @@ pub fn par_create_archive_stdout(
    Ok(())
 }

-/// Streams a Haggis archive from a list of files, processing each file in parallel
+/// Creates and streams a Haggis archive from a list of files, processing each
+/// file in parallel
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 #[cfg(feature = "parallel")]
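
Review note on `load_string`: the sketch below reproduces the new helper outside the crate so its behaviour can be tried against an in-memory reader. `std::io::Error` stands in for `crate::Error`, which is an assumption made only to keep the example self-contained. `load_string_exact` is a hypothetical stricter variant, not something this commit adds. It is included because `take` followed by `read_to_end` caps the read at `len` bytes but does not require that many bytes to arrive, so a truncated input yields a shorter string rather than an error at this point.

```rust
use std::io::{self, Read};

/// Mirrors the helper added in this commit, with `io::Error` substituted
/// for `crate::Error` so the sketch compiles on its own.
pub fn load_string<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut len = [0; 2];
    reader.read_exact(&mut len)?;
    let len = u16::from_le_bytes(len);
    let mut buf = Vec::with_capacity(len.into());
    // Reads at most `len` bytes; stops early without error if the input ends.
    let mut handle = reader.take(len.into());
    handle.read_to_end(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Hypothetical stricter variant (not in the commit): fails with
/// `UnexpectedEof` when fewer than `len` bytes are available.
pub fn load_string_exact<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut len = [0; 2];
    reader.read_exact(&mut len)?;
    let mut buf = vec![0u8; usize::from(u16::from_le_bytes(len))];
    reader.read_exact(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Fed the bytes `[3, 0, b'f', b'o', b'o']` through a `std::io::Cursor`, both functions return `"foo"`. A zero length prefix produces an empty string, which is the case `Node::read` treats as the end-of-archive marker. Whether the lenient short read matters in practice depends on the caller: in `Node::read` the following `read_exact` for the fixed-size fields would likely surface a truncated name anyway.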
--- a/src/node.rs
+++ b/src/node.rs
@@ -66,10 +66,8 @@ impl Node {
    /// # Errors
    /// Returns `crate::Error` if io fails or the archive is incorrectly formatted
    pub fn read<T: Read>(reader: &mut T) -> Result<Self, Error> {
-        let mut len = [0; 2];
-        reader.read_exact(&mut len)?;
-        let len = u16::from_le_bytes(len);
-        if len == 0 {
+        let name = crate::load_string(reader)?;
+        if name.is_empty() {
            return Ok(Self {
                name: String::new(),
                mode: 0,
@@ -79,9 +77,6 @@ impl Node {
                filetype: FileType::Eof,
            });
        }
-        let mut name = Vec::with_capacity(len.into());
-        let mut handle = reader.take(len.into());
-        handle.read_to_end(&mut name)?;
        let mut buf = [0; 18];
        reader.read_exact(&mut buf)?;
        let uid: [u8; 4] = buf[0..4].try_into()?;
@@ -92,7 +87,7 @@ impl Node {
        let (flag, mode) = Flag::extract_from_raw(raw_mode)?;
        let filetype = FileType::read(reader, flag)?;
        Ok(Self {
-            name: String::from_utf8(name)?,
+            name,
            uid: u32::from_le_bytes(uid),
            gid: u32::from_le_bytes(gid),
            mtime: u64::from_le_bytes(mtime),