Add draft post about C and what's missing

Nathan Fisher 2023-08-10 12:08:08 -04:00
parent 9e3ff319c5
commit ef6788aec6
2 changed files with 90 additions and 0 deletions

@ -0,0 +1,61 @@
Meta(
title: "Just how much is really missing in C?",
summary: Some("An experiment in polyglot programming"),
published: None,
tags: [
"C",
"programming",
"haggis",
],
)
---
A while back, I was working on a package manager for HitchHiker in Rust. The package format was based on tar and included a checksum for each file. To avoid re-reading the same data used to create the tar nodes when it came time for checksumming, the tar support had to be built into the package manager itself. This gave me an up-close look at the format. I didn't particularly like what I saw.
Long story short, I started thinking about how things could be done better. Rather than slicing everything up into 512-byte blocks and padding the final one for each node, why not just specify the file length and then read that number of bytes, eliminating the padding? Rather than recording numbers in ASCII, why not just lay out integers as little-endian bytes, just as they are represented in memory on most machines? That way we can store an integer in a much smaller space. Instead of recording all possible combinations of metadata for every node, why not specify what type of file that node represents and then only record the metadata that is actually useful? On it went, and on a wild hair I started implementing it, calling it the Haggis protocol because it's a bunch of ground up bytes stuffed into a rather nasty casing.
=> https://codeberg.org/jeang3nie/Haggis Haggis in Rust
=> https://codeberg.org/jeang3nie/seahag Haggis in C
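To make the length-prefix and little-endian ideas concrete, here is a minimal sketch in C. It is not the actual Haggis layout: the 64-bit width, the record shape and the names `put_u64_le`, `get_u64_le` and `write_record` are all assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 64-bit value as 8 little-endian bytes, whatever the host byte order. */
static void put_u64_le(uint8_t out[8], uint64_t v) {
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Read 8 little-endian bytes back into a 64-bit value. */
static uint64_t get_u64_le(const uint8_t in[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}

/*
 * Hypothetical record: an 8-byte length followed by exactly that many bytes
 * of data, with no padding. Returns the number of bytes written, or 0 if
 * the buffer is too small.
 */
static size_t write_record(uint8_t *buf, size_t cap, const void *data, uint64_t len) {
    if (cap < 8 || len > cap - 8) {
        return 0;
    }
    put_u64_le(buf, len);
    memcpy(buf + 8, data, (size_t)len);
    return 8 + (size_t)len;
}
```

Shifting byte by byte instead of copying the integer's memory directly keeps the output identical on big-endian hosts. For comparison, tar stores a file's size as a 12-byte octal ASCII field and rounds the data up to the next 512-byte boundary.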
Since I like to explore different programming languages, and have been wanting to do so in a bit more depth than I have in the past, I took this as an opportunity to do some polyglot programming and implement Haggis in Rust, Zig, Hare and C. The Rust implementation is nearly complete and came together quickly for me since I use it all of the time. Zig is not far off, with Hare trailing a bit behind right now. All three of those languages were a great fit for this project, but what has really surprised me is how much fun I've been having writing this in C. Now, I've never been great at C. I've used it for small projects, mostly command line utilities, but never for anything ambitious. I didn't get serious about learning to program until about four years ago, and by that time Rust was quite a viable option, so I've been using Rust for anything ambitious ever since. But I'm not one of those Rustaceans who thinks that using a language such as C is a sin. On the contrary, there are times when C is definitely the most appropriate tool. For instance, if one is writing software to be included in the base system of any of the BSDs, then C is the appropriate tool because the C compiler and runtime are the only compiler and runtime available there. And while I've tried out Rust on microcontrollers, I still prefer C for embedded work due to the smaller binary sizes, which let you stuff more functionality into the very limited space.
## What do I miss most in C?
### And how hard is it to get it back?
I find algebraic types (tagged unions) to be an indispensable tool, and any modern language that doesn't have them goes to the bottom of the pile for me. C obviously doesn't have them.
Both Zig and Rust have some form of growable array, HashMap, and BTreeMap right in their standard libraries. The closest you get in C is in sys/queue.h, in the form of queue types backed by linked lists. But Musl libc omits sys/queue.h, so I won't ever use it, as that would reduce portability.
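As a rough idea of what rolling your own growable array looks like, here is a minimal sketch. The `int_vec` type and its functions are hypothetical names, not taken from any library or from the Haggis code, and the doubling growth strategy is just one common choice.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; all names are illustrative. */
typedef struct {
    int *items;
    size_t len;
    size_t cap;
} int_vec;

/* Append a value, doubling the capacity when full. Returns 0 on success, -1 on failure. */
static int int_vec_push(int_vec *v, int value) {
    if (v->len == v->cap) {
        /* Refuse to grow if the doubled size in bytes would overflow size_t. */
        if (v->cap > SIZE_MAX / (2 * sizeof *v->items)) {
            return -1;
        }
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *tmp = realloc(v->items, new_cap * sizeof *v->items);
        if (tmp == NULL) {
            return -1; /* the old buffer is still valid and still owned by v */
        }
        v->items = tmp;
        v->cap = new_cap;
    }
    v->items[v->len++] = value;
    return 0;
}

/* Release the buffer and reset the array to its empty state. */
static void int_vec_free(int_vec *v) {
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

A zero-initialized `int_vec v = {0};` is a valid empty array, since `realloc` with a NULL pointer behaves like `malloc`.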
Another big omission is testing. Zig, Rust and Hare all have integrated testing frameworks.
I could probably mention a bunch of other things, but to me these are the biggest issues. Basically, with most modern languages it's like building a house with all of the various parts you can find in a modern big box home store. You still need some specialist knowledge, but in large part you can just nail, screw and bolt ready-made pieces together. In contrast, C gives you a chainsaw, a shovel and other basic tools with which you are expected to cut down trees, dig a foundation and basically make all of it yourself, including any special tools you might want or need. You can of course go buy tools off the random guy down the way, but it'll cost you. It might not be a monetary cost, but the tools from "down the way" are probably not completely fit for purpose, might not be maintained, might have missing parts and in general maybe just aren't worth it, which is why so many C programmers just roll their own. And that's basically what I do, too, when dropping down to C.
What you get in return, though, is a quite liberating "trust the programmer" philosophy. That "trust the programmer" phrase comes from the "spirit of C" as described in the rationale for the ANSI C standard. What this means is that, even though a language like Rust claims to get you as close to the hardware as C, there are still things one can do simply in C that can't be done the same way in Rust. The C compiler is so trusting that I find myself laughing sometimes at it, feeling as if I've just pulled off some great con, amazed that I just got away with murder. This has downsides of course, and when you combine it with the lack of an official testing framework you can easily paint yourself into a corner where things are crashing at runtime and it's going to take hours or days to track down the issues causing it.
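The missing test framework can be partly papered over in the same roll-your-own spirit. The following is a minimal sketch, assuming nothing beyond the standard library; `CHECK`, `run_tests` and `sum_to` are hypothetical names invented for this example.

```c
#include <stdio.h>

/* Hypothetical helpers; not part of any standard library. */
static int tests_run = 0;
static int tests_failed = 0;

/* Record a failure with its location instead of aborting, so later checks still run. */
#define CHECK(cond)                                              \
    do {                                                         \
        tests_run++;                                             \
        if (!(cond)) {                                           \
            tests_failed++;                                      \
            fprintf(stderr, "%s:%d: check failed: %s\n",         \
                    __FILE__, __LINE__, #cond);                  \
        }                                                        \
    } while (0)

/* Example function under test. */
static unsigned sum_to(unsigned n) {
    return n * (n + 1) / 2;
}

static void test_sum_to(void) {
    CHECK(sum_to(0) == 0);
    CHECK(sum_to(4) == 10);
}

/* Run every test function and return a value usable as a process exit status. */
static int run_tests(void) {
    test_sum_to();
    printf("%d checks, %d failed\n", tests_run, tests_failed);
    return tests_failed == 0 ? 0 : 1;
}
```

A separate test binary would simply `return run_tests();` from its `main`, which lets a Makefile target fail the build when any check fails.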
### Tagged unions in C
The lack of algebraic data types turns out to not be that huge of a problem because you can roll your own. C has enums and unions. A tagged union is just a union with an enum tag. Pop an enum and a union into a struct, write functions to ensure that the union is only ever accessed after checking the enum "tag", and make sure that you only ever access that struct using those functions. Voilà, you have a tagged union. If this is part of a public API, maybe don't directly expose the underlying struct, just the functions which access it safely.
```
typedef enum {
    bar,
    baz,
} tag;

typedef union {
    unsigned long num;
    char *name;
} data;

typedef struct {
    tag tg;
    data payload;
} foo;

void do_stuff_with_foo(foo *f) {
    switch (f->tg) {
    case bar:
        // do stuff with f->payload.num as an unsigned long
        break;
    case baz:
        // do different stuff with f->payload.name as a string
        break;
    }
}
```
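The "only ever access that struct using those functions" part could look something like the sketch below. It repeats the type definitions so that it stands alone, and the constructor and accessor names (`foo_from_num`, `foo_get_num` and so on) are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>

typedef enum { bar, baz } tag;

typedef struct {
    tag tg;
    union {
        unsigned long num;
        char *name;
    } payload;
} foo;

/* Constructors set the tag and the matching union member together. */
static foo foo_from_num(unsigned long num) {
    foo f = { .tg = bar, .payload.num = num };
    return f;
}

static foo foo_from_name(char *name) {
    foo f = { .tg = baz, .payload.name = name };
    return f;
}

/* Accessors refuse to read a member that the tag says is not active. */
static bool foo_get_num(const foo *f, unsigned long *out) {
    if (f->tg != bar) {
        return false;
    }
    *out = f->payload.num;
    return true;
}

static bool foo_get_name(const foo *f, char **out) {
    if (f->tg != baz) {
        return false;
    }
    *out = f->payload.name;
    return true;
}
```

Returning a `bool` and writing through an out parameter forces the caller to handle the wrong-tag case, which is about as close as C gets to a checked pattern match. Nothing stops a caller from poking at `payload` directly, though, which is why hiding the struct behind an opaque pointer is worth considering for a public API.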

@ -0,0 +1,29 @@
Meta(
title: "Sortix: a hobby OS with potential",
summary: None,
published: None,
tags: [
"Unix",
"Sortix",
"programming",
],
)
---
I like periodically checking out hobby OS projects and seeing what I can learn from them. Sortix is a small hobby OS with the goal of being a clean and modern POSIX implementation. It's been in development since 2011 and managed to go self-hosting with its 1.0 release in 2016. It is now capable of running a fair amount of third party software, including Vim, Emacs, Nano, Python, and Perl, which at this point makes it a nice solid base on which to build. Expectations should be appropriate for a hobby system, however, as this is largely a one-man project and is not production ready.
I've actually been using code from Sortix for a few years now in HitchHiker, as they have a nice cleaned up fork of Libz that went into the HitchHiker base system early on. It's pretty much a drop-in replacement for the official Libz, but the code is missing a lot of the snarled mass of #ifdef..#else..#endif from the original, as support for long dead operating systems and platforms has been removed. This makes the code 1000% easier to follow and maintain. What I haven't done in quite a while is actually boot it up in a VM and check out what Sortix is like today.
## Sortix development environment (Linux)
As I mentioned earlier, Sortix has been self-hosting since 2016, but that only applies to the base system. Ports are still being developed via cross compilation, as there are a lot of tools that developers take for granted which haven't made it into Sortix yet. So if you want to build master plus all of the ports, you need to set up a cross development environment. Alternatively, you can download a nightly ISO and run that, but the nightlies aren't always up to date. Well, that and I like to poke at things and DIY, so I wanted to go the long route. There are official instructions on the website:
=> https://sortix.org/man/man7/cross-development.7.html
I got my cross toolchain set up on Linux but would suggest the following additional instructions.
* Run a clean shell, preferably Dash, by starting it with `env -i HOME=$HOME TERM=$TERM /bin/dash`
* If you have Isl installed it can confuse the gcc build. Manually disable Isl support by appending the arg `--without-isl` to the configure options
There was a small change I applied to the `gettext` port because its build system got confused by the packages I had installed on my host system and attempted to compile in C# support. It's a common failure when authors use GNU autotools that their configure tests will run against the host libraries rather than the target libraries if you are cross compiling. In this case I just appended `--disable-csharp` to the args passed in to `configure`, which fixed the problem. I sent the tiny diff upstream in a pull request. After that, everything built without errors.
Sortix is a small system. The system itself (kernel and userland minus ports) compiles in about a minute or less. The bulk of the compilation time is spent on ports, but since there aren't a lot of ports it's still pretty quick.
## Sortix development environment (FreeBSD)
Cross compiling Sortix on FreeBSD requires more work because the author was obviously working from Linux using GNU tools. Rather than trying to patch all of the Linuxisms in his Makefiles, I installed GNU coreutils, GNU tar and GNU sed into the same prefix as the cross toolchain. Similar to building on Linux earlier, I started a clean shell, this time using `/bin/sh`.