Proofread docs, move filename reads into helper function load_string
parent 30fcaa8e07
commit 4b0318036f
README.md | 37

@@ -17,13 +17,14 @@ Contents

 ## Features

 For a fuller specification of the format, please see [Format.md](Format.md)

-- No padding between metadata fields or data segments so it only stores the data
-  required to recreate the original file
+- No padding between metadata fields or data segments. Only the data required to
+  recreate the original file is stored.
 - Optional inline checksumming using a choice of md5, sha1 or sha256 algorithms
   to ensure data integrity
 - Easily parallelized library code
 - Uses generic `Read` and `Write` interfaces from Rust `std` to support reading
-  archive nodes from anything that can supply a stream of data
+  archive nodes from anything that can supply a stream of data. This could be a
+  file, or it could be stdin/stdout, or a network connection.

 ## Building

 The minimum supported Rust version (MSRV) for this project is currently Rust 1.65.
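A minimal sketch of what that last bullet enables, based only on the API visible in this commit (`NodeStream::new`, iterating a stream of nodes, `Listing::from`); the exact signatures are inferred from the diff below, not from documentation:

```rust
use haggis::{Listing, NodeStream};
use std::{fs::File, io::Read};

// List an archive from a file or from stdin with the same code path,
// since `NodeStream` only asks for something that implements `Read`.
fn list_archive(path: Option<&str>) -> Result<(), haggis::Error> {
    let reader: Box<dyn Read> = match path {
        // The diff's own functions use `File::open(file)?`, so haggis::Error
        // is assumed to convert from io::Error.
        Some(p) => Box::new(File::open(p)?),
        None => Box::new(std::io::stdin().lock()),
    };
    let stream = NodeStream::new(reader)?;
    for node in stream {
        let listing = Listing::from(node?);
        let _ = listing; // print or otherwise inspect the listing here
    }
    Ok(())
}
```

Swapping `File::open` for a `TcpStream` or a zstd `Decoder` requires no other changes.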
@@ -39,7 +40,7 @@ git = "https://codeberg.org/jeang3nie/haggis.git"

 The `parallel` feature enables parallel file operations via
 [Rayon](https://crates.io/crates/rayon). When creating an archive, files will be
 read and checksummed in separate threads and the data passed back to the main
-thread for writing an archive. During extraction, the main thread reads the
+thread for writing the archive. During extraction, the main thread reads the
 archive and passes each node to a worker thread to verify its checksum and write
 the file to disk.
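As an illustration of the pattern this paragraph describes (the pattern only; this is not haggis's implementation), worker threads can read and checksum files while a single writer thread drains a channel. The checksum below is a toy stand-in for md5/sha1/sha256:

```rust
use rayon::prelude::*;
use std::{fs, io::Write, sync::mpsc, thread};

// Fan out: rayon worker threads read and checksum each file.
// Fan in: one writer thread receives the results over a channel.
fn archive_files(paths: Vec<String>, mut out: impl Write + Send + 'static) {
    let (tx, rx) = mpsc::channel::<(String, u32, Vec<u8>)>();
    let writer = thread::spawn(move || {
        for (name, sum, data) in rx {
            // A real archiver would serialize a node here; we just log it.
            let _ = writeln!(out, "{name}: {sum:08x} ({} bytes)", data.len());
        }
    });
    paths.par_iter().for_each_with(tx, |tx, path| {
        if let Ok(data) = fs::read(path) {
            // Toy checksum standing in for md5/sha1/sha256.
            let sum = data.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)));
            let _ = tx.send((path.clone(), sum, data));
        }
    });
    // Every sender is dropped once the parallel loop finishes, so the
    // writer's receive loop ends and the thread can be joined.
    writer.join().unwrap();
}
```

The `par_create_archive_stdout` function that appears later in this diff suggests the library exposes this same fan-out/fan-in shape behind the `parallel` feature.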
@@ -73,10 +74,12 @@ easy packaging. This feature leverages the

 The venerable Unix archiver, Tar, has the benefit of being ubiquitous on every Unix
 and Unix-like operating system. Beyond that, tar is a rather clunky format with a
 number of design flaws and quirks.
-- The original Tar specification had a hard limit in path names of 100 bytes
-- The Ustar revision of the original Tar specification only partially fixed the
-100 byte filename limit by adding a separate field in which to store the directory
-component of the pathname. Pathnames are still limited in size to 350 bytes.
+- The original Tar specification had a hard limit on path names of 100 bytes.
+- The Ustar revision of the original specification only partially fixed the 100
+  byte filename limit by adding a separate field in which to store the directory
+  component of the pathname. Pathnames are still limited in size to 255 bytes,
+  with 155 bytes allocated for the parent directory and 100 bytes to the file
+  name.
 - GNU tar fixed the filename limitation with GNU tar headers. GNU tar headers are
   not documented anywhere other than the GNU tar source code, so other implementations
   have ignored the GNU format and it never caught on.
@@ -94,16 +97,17 @@ number of design flaws and quirks.

 Compared with Tar, Haggis takes a different approach. All integer values are stored
 as little endian byte arrays, exactly the same as the in-memory representation of a
-little endian computer. All metadata strings are preceded by their length, requiring
+little endian processor. All metadata strings are preceded by their length, requiring
 no padding between fields. The actual contents of regular files are written as a byte
 array, and again preceded by the length in bytes, so once again no padding is required.

 If you've gotten this far, you might be noticing some differences in design philosophy.
 - Ascii is great for humans to read but terrible for computers. Since archives are
-  read by computers, not humans, ascii is bad.
+  read by computers, not humans, ascii is not a great choice for an archive
+  format.
 - Padding is extra bytes. Sure, that overhead tends to get squashed after compressing
   an archive, but it requires more memory to create the extra zeroes and more memory
-  to extract them. Better to not use padding everywhere.
+  to extract them. Better to avoid padding altogether.
 - Using offsets would always have led to embarrassingly shortsighted limitations
   such as the filename length limitation that has plagued Tar from day one. Variable
   length fields are easily handled by storing their length first.
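To make the length-prefix rule concrete, here is a sketch of the write side; `store_string` is a hypothetical name, while the read side is exactly the `load_string` helper this commit adds to src/lib.rs:

```rust
use std::io::{self, Write};

// Hypothetical mirror of this commit's load_string: a u16 length in
// little endian bytes, then the raw bytes, with no padding on either side.
fn store_string<W: Write>(writer: &mut W, s: &str) -> io::Result<()> {
    let len = u16::try_from(s.len()).expect("metadata strings fit in a u16");
    writer.write_all(&len.to_le_bytes())?;
    writer.write_all(s.as_bytes())
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    store_string(&mut buf, "hello")?;
    // 2 length bytes (5, 0) followed by exactly 5 content bytes.
    assert_eq!(buf, [5, 0, b'h', b'e', b'l', b'l', b'o']);
    Ok(())
}
```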
@@ -117,10 +121,10 @@ and settled on [zstd](https://github.com/facebook/zstd) as being so superior as

 make all other common compression schemes irrelevant for **general** usage. Gzip and
 Bzip2 have woefully lower compression ratios and terrible performance. The
 [xz](https://tukaani.org/xz/) compression algorithm offers much better compression at
-the cost of poor performance. Meta may be evil overall, but zstd offers compression
-ratios on par with xz and performance that is higher than all three major competitors.
-Zstd now comes pre-installed on virtually every Linux system and is easily installed
-on BSD and other Unix-like systems. It is the new standard.
+the cost of poor performance. Zstd offers compression ratios on par with xz and
+performance that is higher than all three major competitors. Zstd now comes
+pre-installed on virtually every Linux system and is easily installed on BSD and
+other Unix-like systems. It is the new standard.

 Other compression schemes could have been implemented into the library code, but
 that would add to the maintenance burden while not adding significantly useful
@@ -130,7 +134,8 @@ Haggis. Better to encourage the use of one good compression format and discourage

 the continued use of legacy software.

 If you absolutely **must** compress a haggis archive using gzip or bzip2, you can
-do so manually. The *haggis* binary does not provide this functionality. Don't ask.
+do so manually, or pipe output from one program to another. The *haggis* reference
+binary does not provide this functionality. Don't ask.

 ## Contributing

 Contributions are always welcome. Please run `cargo fmt` and `cargo clippy` and
@@ -93,23 +93,11 @@ impl FileType {

                 Ok(Self::Normal(file))
             }
             Flag::HardLink => {
-                let mut len = [0; 2];
-                reader.read_exact(&mut len)?;
-                let len = u16::from_le_bytes(len);
-                let mut buf = Vec::with_capacity(len.into());
-                let mut handle = reader.take(len.into());
-                handle.read_to_end(&mut buf)?;
-                let s = String::from_utf8(buf)?;
+                let s = crate::load_string(reader)?;
                 Ok(Self::HardLink(s))
             }
             Flag::SoftLink => {
-                let mut len = [0; 2];
-                reader.read_exact(&mut len)?;
-                let len = u16::from_le_bytes(len);
-                let mut buf = Vec::with_capacity(len.into());
-                let mut handle = reader.take(len.into());
-                handle.read_to_end(&mut buf)?;
-                let s = String::from_utf8(buf)?;
+                let s = crate::load_string(reader)?;
                 Ok(Self::SoftLink(s))
             }
             Flag::Directory => Ok(Self::Directory),
@@ -1,7 +1,7 @@

 #![warn(clippy::all, clippy::pedantic)]
 use {
     clap::ArgMatches,
-    haggis::{Algorithm, Listing, ListingKind, ListingStream, Message, Stream, StreamMessage},
+    haggis::{Algorithm, Listing, ListingKind, ListingStream, NodeStream, Message, StreamMessage},
     indicatif::{ProgressBar, ProgressStyle},
     std::{
         fs::{self, File},
@@ -176,7 +176,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let file = file.cloned().unwrap_or("stdin".to_string());
     let handle = if zst {
         let reader = Decoder::new(fd)?;
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
         let handle = if matches.get_flag("quiet") {
             Some(thread::spawn(move || {
                 progress(&file, &receiver, u64::from(stream.length));
@@ -189,7 +189,7 @@ fn extract(matches: &ArgMatches) -> Result<(), haggis::Error> {
         handle
     } else {
         let reader = BufReader::new(fd);
-        let mut stream = Stream::new(reader)?;
+        let mut stream = NodeStream::new(reader)?;
         let handle = if matches.get_flag("quiet") {
             Some(thread::spawn(move || {
                 progress(&file, &receiver, u64::from(stream.length));
@@ -281,7 +281,7 @@ fn list_unsorted(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let fd = File::open(file)?;
     if matches.get_flag("zstd") {
         let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
         for node in stream {
             let node = node?;
             let li = Listing::from(node);
@@ -304,7 +304,7 @@ fn list(matches: &ArgMatches) -> Result<(), haggis::Error> {
     let zst = matches.get_flag("zstd") || haggis::detect_zstd(&mut fd)?;
     let list = if zst {
         let reader = Decoder::new(fd)?;
-        let stream = Stream::new(reader)?;
+        let stream = NodeStream::new(reader)?;
         let mut list = vec![];
         for node in stream {
             let node = node?;
src/lib.rs | 17

@@ -33,7 +33,7 @@ pub use {

     listing_stream::ListingStream,
     node::Node,
     special::Special,
-    stream::Stream,
+    stream::Stream as NodeStream,
 };

 #[cfg(feature = "parallel")]
@@ -54,6 +54,16 @@ pub fn detect_zstd<R: Read + Seek>(reader: &mut R) -> Result<bool, Error> {
     Ok(buf == ZSTD_MAGIC)
 }

+pub(crate) fn load_string<R: Read>(reader: &mut R) -> Result<String, Error> {
+    let mut len = [0; 2];
+    reader.read_exact(&mut len)?;
+    let len = u16::from_le_bytes(len);
+    let mut buf = Vec::with_capacity(len.into());
+    let mut handle = reader.take(len.into());
+    handle.read_to_end(&mut buf)?;
+    Ok(String::from_utf8(buf)?)
+}
+
 #[allow(clippy::similar_names)]
 /// Creates a haggis archive from a list of files
 /// # Errors
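Since `load_string` is `pub(crate)`, it can only be exercised from inside the crate. A test along these lines (a sketch, not part of this commit) would pin down the wire format:

```rust
#[cfg(test)]
mod tests {
    use std::io::Cursor;

    #[test]
    fn load_string_reads_length_prefixed_utf8() {
        // A little endian u16 length (5, 0) followed by five bytes of UTF-8.
        let bytes = [5u8, 0, b'h', b'a', b'g', b'g', b'i'];
        let mut reader = Cursor::new(&bytes[..]);
        assert_eq!(crate::load_string(&mut reader).unwrap(), "haggi");
        // Exactly len bytes were consumed, so the reader is positioned
        // right after the string, ready for the next field.
        assert_eq!(reader.position(), 7);
    }
}
```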
@@ -91,7 +101,7 @@ pub fn create_archive_stdout(
 }

 #[allow(clippy::similar_names)]
-/// Streams a haggis archive over something which implements `Write`
+/// Creates and streams a haggis archive over something which implements `Write`
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 pub fn stream_archive<W: Write>(
@@ -183,7 +193,8 @@ pub fn par_create_archive_stdout(
     Ok(())
 }

-/// Streams a Haggis archive from a list of files, processing each file in parallel
+/// Creates and streams a Haggis archive from a list of files, processing each
+/// file in parallel
 /// # Errors
 /// Returns `crate::Error` if io fails or several other error conditions
 #[cfg(feature = "parallel")]
src/node.rs | 11

@@ -66,10 +66,8 @@ impl Node {
     /// # Errors
     /// Returns `crate::Error` if io fails or the archive is incorrectly formatted
     pub fn read<T: Read>(reader: &mut T) -> Result<Self, Error> {
-        let mut len = [0; 2];
-        reader.read_exact(&mut len)?;
-        let len = u16::from_le_bytes(len);
-        if len == 0 {
+        let name = crate::load_string(reader)?;
+        if name.is_empty() {
             return Ok(Self {
                 name: String::new(),
                 mode: 0,
@@ -79,9 +77,6 @@ impl Node {
                 filetype: FileType::Eof,
             });
         }
-        let mut name = Vec::with_capacity(len.into());
-        let mut handle = reader.take(len.into());
-        handle.read_to_end(&mut name)?;
         let mut buf = [0; 18];
         reader.read_exact(&mut buf)?;
         let uid: [u8; 4] = buf[0..4].try_into()?;
@@ -92,7 +87,7 @@ impl Node {
         let (flag, mode) = Flag::extract_from_raw(raw_mode)?;
         let filetype = FileType::read(reader, flag)?;
         Ok(Self {
-            name: String::from_utf8(name)?,
+            name,
             uid: u32::from_le_bytes(uid),
             gid: u32::from_le_bytes(gid),
             mtime: u64::from_le_bytes(mtime),
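For reference, the fixed block that `Node::read` parses after the name is 18 bytes. Only `uid` at offset 0..4 is explicit above; the ordering of the remaining fields is an assumption drawn from the struct initializer, so treat this sketch accordingly:

```rust
// Assumed layout of the 18 byte metadata block read by Node::read above:
// uid (u32) + gid (u32) + mtime (u64) + raw mode/flag (u16), little endian.
fn split_metadata(buf: &[u8; 18]) -> (u32, u32, u64, u16) {
    let uid = u32::from_le_bytes(buf[0..4].try_into().unwrap());
    let gid = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    let mtime = u64::from_le_bytes(buf[8..16].try_into().unwrap());
    let raw_mode = u16::from_le_bytes(buf[16..18].try_into().unwrap());
    (uid, gid, mtime, raw_mode) // 4 + 4 + 8 + 2 = 18 bytes, no padding
}
```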