Parsing arguments in Rust with no dependencies

21 points by chmaynard 4 days ago

shaftway 2 hours ago

Neat writeup.

I did a very similar thing in Rust, but to solve a different problem. I wanted to be able to easily make nice rich command line arguments for bash scripts without cluttering up the script too much. My minimal demo is:

    $ cat demo.sh
    eval "$(argparse-sh --string text -- "$@")";
    echo "$TEXT"
    
    $ ./demo.sh "Hello world"
    Hello world

I always think it's valuable to build these things for yourself; maybe not always valuable to ship them. It de-magic-ifies the libraries you use and can only help you grow as a programmer.

I'm still learning as a Rust developer, and I'm sure there are terrible things in my code. The hard part for me is finding ways to make things more idiomatic in isolation. I don't have a code review feedback loop I can use to speed up improvement.

https://github.com/Hounshell/argparse-sh

epage an hour ago

Context: maintainer of clap, a Cargo team member

Regarding the CLI parser

> This will take in our arguments (from something like std::env::args) and return our matches, or an error.

`std::env::args` will panic on non-UTF8 content, like a file path. You could instead error on non-UTF8 content. Until recently, you had to pull in a dependency or reinvent some non-trivial stuff to properly deal with `OsStr`s. There are now `unsafe` functions for dealing with them. I'd like to extend things further to have a proper "pattern" API for `OsStr` which would allow almost everything a CLI parser needs to deal with `OsStr` without a dependency and without `unsafe`.
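
A minimal sketch of the "error instead of panic" option, assuming you are fine rejecting non-UTF-8 arguments outright. The helper name `utf8_args` is made up for illustration; only `std::env::args_os` and `OsString::into_string` come from the standard library.

```rust
use std::ffi::OsString;

/// Convert raw OS arguments to `String`s, returning an error (instead of
/// panicking) on the first argument that is not valid Unicode.
/// `utf8_args` is a hypothetical helper name, not a std API.
fn utf8_args<I>(raw: I) -> Result<Vec<String>, String>
where
    I: IntoIterator<Item = OsString>,
{
    raw.into_iter()
        .map(|arg| {
            // `into_string` hands back the original `OsString` on failure.
            arg.into_string()
                .map_err(|bad| format!("argument is not valid UTF-8: {:?}", bad))
        })
        .collect()
}
```

You would call it as `utf8_args(std::env::args_os())` and report the `Err` like any other usage error.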

---

Regarding the discussion on dependencies, I think there are reasonable and valid situations to be careful of adding dependencies (see https://tweedegolf.nl/en/blog/119/sudo-rs-depencencies-when-... and the follow up https://www.reddit.com/r/rust/comments/1b92j0k/sudors_depend...) but the reasoning here focuses on the wrong things imo.

> That would add 23 dependencies to my little project, if you count transitive dependencies. This can go up higher if you turn on a few features: derive, env, unicode, and wrap_help bring you up to 38 dependencies!

People overly focus on dependency counts. Yes, they mention dependency counts aren't a meaningful metric later but the lack of nuance here suggests they've not internalized that, including talking about the impact of optional dependencies when they advocate for optional dependencies later.

Clap can be trimmed down to just 4 dependencies. One of those exists for build performance. One might be able to be removed but is very lightweight. The last is functionality that would exist either way, whether in its own crate or another.

> More concretely, by having no external dependencies you reduce your bug surface area. Sure, you own all the bugs now—but you won't get leftpad-ed, and you won't get dependabot alerts for third-removed transitive dependencies that now you've gotta patch.

crates.io is leftpad safe in all but the most extreme cases (law enforcement forces the deletion of a crate).

As I point out at the beginning, you already have a bug in this trivial code, one that is often hit when people think a CLI parser is trivial and they don't need dependencies.

> On the other hand, you miss out on nice things.

I think this is an understatement. imo one of the reasons we are seeing a lot of high-quality CLIs out there is because it's so easy to build on the work of others.

You also get very inconsistent results, which makes the user experience much worse. Take the CLI parser shown here: it doesn't handle many conventions people expect, like multiple short flags (`-zxvf`). Having to deal with each CLI parser's quirks or only living with a subset of them all is not great.
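
For reference, the short-flag clustering convention can be sketched roughly like this. It assumes every flag in the cluster is a plain boolean switch (in real `tar`, `f` takes a value, which this ignores), and `expand_short_flags` is a made-up name.

```rust
/// Split a combined short-flag argument such as `-zxvf` into
/// `-z`, `-x`, `-v`, `-f`.
/// Assumption: every flag in the cluster is a boolean switch with no value.
fn expand_short_flags(arg: &str) -> Vec<String> {
    match arg.strip_prefix('-') {
        // Something follows a single dash, and it is not a long `--opt`.
        Some(rest) if !rest.is_empty() && !rest.starts_with('-') => {
            rest.chars().map(|c| format!("-{}", c)).collect()
        }
        // Long options, a lone `-`, and positionals pass through unchanged.
        _ => vec![arg.to_string()],
    }
}
```

Even this small piece has edge cases (a lone `-`, long options), and that is before value-taking flags enter the picture.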

> I think more things should be built from scratch and, ideally, without dependencies. You get to know the problem space better, and most things don't need the big sophisticated solution—but you pay for the whole dependency you pull in.

In creating a "product", the problem space of CLI parsing is not core. Same with a lot of what other dependencies provide. Instead of reinventing the wheel, you can better focus on the core of what you are trying to provide.

As for big sophisticated solutions, let's take the CLI space. There are many CLI parsers that you can pick from to adapt to the needs of your specific problem (https://github.com/rosetta-rs/argparse-rosetta-rs) but do you want to go into discovery mode for every dependency for every project, pivot between them as requirements change, or deal with bouncing between APIs for non-core parts of your projects? I don't.

axegon_ an hour ago

Side note: thanks to everyone involved in the development of Clap, working with it is truly a pleasure.

oguz-ismail an hour ago

>`std::env::args` will panic on non-UTF8 content, like a file path.
Tell me this is a joke.
- Hemospectrum an hour ago
  
  The documentation is quite clear on this point:
  > The returned iterator will panic during iteration if any argument to the process is not valid Unicode. If this is not desired, use the args_os function instead.
  std::env::args_os encodes paths as an OsString, which is allowed to contain invalid Unicode. You can then perform your own Unicode validation as needed, instead of the "ASAP" behavior of std::env::args.
  https://doc.rust-lang.org/std/env/fn.args.html
- namibj an hour ago
  
  What part do you hope/expect to be a joke there?
- Analemma_ an hour ago
  
  I think this is fine? 99.99% of the time in application-land I want to be working with valid UTF-8 only, and an equal percentage of the time, filenames and CLI args cooperate. And as the sibling comments say, this is all well-documented.
  Frankly I think the onus should be on operating systems to get with the program and be UTF-8 everywhere (I think UCS-2 on Windows and "bag of bytes" filenames on Linux are braindead), but until that happens we at least have std::env::args_os as an escape hatch.

Arch-TK 2 hours ago

I dislike clap too, it requires too much work to configure it in a sensible way (sensible being defined as working like most unix utilities have worked for the entire time I've used them), it comes with very "modern" defaults which while I appreciate are aiming to improve the situation, when I'm writing a utility in Rust, it's usually a port of something I wrote in another language and I don't want to deviate unnecessarily. There's also the aspect of just how many dependencies it pulls in.

But, there are a few issues with this argument parser.

First and foremost, while there's no problem with forcing your option names to be str/String, you should still process OsStr/OsString unless none of your arguments are ever planning to be OS paths. The reason for this is that making your programs accept all the valid unix path names (which might not be valid UTF-8) is just the right thing to do, the alternative is an arbitrary restriction on your end users. It's about as annoying to run into these kinds of issues as it is to run into applications which don't handle spaces in filenames.
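
Roughly what that looks like, as a sketch rather than a full parser: the option name is compared as text, but the value stays an OS string. The `--input` option and the `parse_input` name are hypothetical, and everything that is not `--input` is silently ignored here.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Parse a hypothetical `--input <path>` option from raw OS arguments.
/// The option name is compared as UTF-8 text, but the path value is kept as
/// a `PathBuf`, so it never has to be valid Unicode.
fn parse_input<I>(raw: I) -> Result<Option<PathBuf>, String>
where
    I: IntoIterator<Item = OsString>,
{
    let mut args = raw.into_iter();
    let mut input = None;
    while let Some(arg) = args.next() {
        // `to_str` returns `None` for non-UTF-8 input, which then simply
        // cannot match an option name.
        if arg.to_str() == Some("--input") {
            let value = args.next().ok_or("--input requires a path")?;
            input = Some(PathBuf::from(value));
        }
    }
    Ok(input)
}
```

Fed from `std::env::args_os()`, this accepts any path the OS accepts.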

Next, there's the inability to handle multiple short options combined.

Also there's the lack of proper handling for options which require arguments vs options with optional arguments (-ovalue, -o value, --opt=value and --opt value should all work for the former case, but for the latter case it only makes sense to accept -ovalue and --opt=value due to the implications in the alternative case). Although this isn't that important and generally confuses people anyway so maybe it should be avoided.
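
To make the required-argument case concrete, here is a rough sketch that accepts all four spellings for one option. The names `-o`, `--opt` and `take_required` are placeholders, and it works on `&str` only to keep the example short.

```rust
/// Extract the value of a required-argument option (`-o` / `--opt`, both
/// placeholder names) from `args`. Returns the last value seen, if any,
/// plus the arguments that were not consumed.
fn take_required(args: &[&str]) -> Result<(Option<String>, Vec<String>), String> {
    let mut value = None;
    let mut rest = Vec::new();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        if arg == "-o" || arg == "--opt" {
            // Separate form (`-o value`, `--opt value`): consume the next argument.
            match iter.next() {
                Some(&v) => value = Some(v.to_string()),
                None => return Err(format!("{} requires a value", arg)),
            }
        } else if let Some(v) = arg.strip_prefix("--opt=") {
            // Attached long form: `--opt=value`.
            value = Some(v.to_string());
        } else if let Some(v) = arg.strip_prefix("-o") {
            // Attached short form `-ovalue`; a bare `-o` was handled above.
            value = Some(v.to_string());
        } else {
            rest.push(arg.to_string());
        }
    }
    Ok((value, rest))
}
```

For an option whose argument is optional, only the two attached branches would apply, since consuming the next argument could swallow a positional.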

Last (in this list, but no guarantee it's exhaustive), there's no handling for `--` to end passing options. This can have security implications.
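
A sketch of the `--` convention, assuming the usual rule that everything after the first bare `--` is positional. `split_at_terminator` is a made-up name, and it only classifies arguments, it does not interpret them.

```rust
/// Split arguments into options and positionals, treating everything after
/// the first bare `--` as positional even if it starts with a dash.
fn split_at_terminator(args: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut options = Vec::new();
    let mut positionals = Vec::new();
    let mut options_ended = false;
    for &arg in args {
        if options_ended {
            positionals.push(arg.to_string());
        } else if arg == "--" {
            // The terminator itself is dropped.
            options_ended = true;
        } else if arg.starts_with('-') && arg != "-" {
            options.push(arg.to_string());
        } else {
            // Plain words and a lone `-` (conventionally stdin) are positional.
            positionals.push(arg.to_string());
        }
    }
    (options, positionals)
}
```

Without this, a file that happens to be named `-rf` and arrives via a shell glob gets treated as flags, which is where the security concern comes from.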

It's a bit of a shame there isn't a zero dependency direct clone of python's argparse. Or something like that even in the standard library. argparse is relatively easy to use, not necessarily designed to be low overhead or fast (god help you if you're in a situation where option parsing is your bottleneck, but I can also appreciate the desire for not wasting cycles where there's no reason to waste them).

I think it's a good idea that people are writing their own low-dependency programs. But it's important that you understand the subject matter in detail if you plan on doing something like this for anything you're hoping to be used by anyone other than yourself.

While clap deviates a lot from the expectations of an option parser (I think part of the deviation is that the people behind clap want to do things "better" than they've been done in the past, but the problem with this motivation is that at some point better isn't important if it is at odds with interface design which has been around for a long time), it does for the most part handle most of these things in the expected way.

For me personally, I would reach for getargs (specifically, my own fork of getargs, which handles ArgsOs in a way I find to be optimal), which can handle all of the things outlined above correctly. There's also lexopt which looked promising when I last looked at it.

epage an hour ago

> I dislike clap too, it requires too much work to configure it in a sensible way (sensible being defined as working like most unix utilities have worked for the entire time I've used them), it comes with very "modern" defaults which while I appreciate are aiming to improve the situation, when I'm writing a utility in Rust, it's usually a port of something I wrote in another language and I don't want to deviate unnecessarily. There's also the aspect of just how many dependencies it pulls in.

Which deviations are you concerned about?

> It's a bit of a shame there isn't a zero dependency direct clone of python's argparse. Or something like that even in the standard library. argparse is relatively easy to use, not necessarily designed to be low overhead or fast (god help you if you're in a situation where option parsing is your bottleneck, but I can also appreciate the desire for not wasting cycles where there's no reason to waste them).

As a maintainer of a CLI parser, I think there is too much policy to put in the standard library. If you go for something much simpler, like lexopt, I think it's more doable but then again, I'm finding I'm writing my own lexopt-like library because it has too much policy in it.

tantalor an hour ago

> benefit of keeping my project's dependencies much lighter

This seems like a pointless exercise.

Hemospectrum an hour ago

Compile times are Rust's single biggest weakness. A lot of work is going into speeding up the compiler, but right now, the biggest wins in compile time reduction come from reorganizing your own code and eliminating overly generic dependencies, particularly those that introduce loads of transitive dependencies for building procedural macros. Clap and Serde are major offenders here. For many projects, eliminating such dependencies can speed up build times by a factor of 10 (and similarly reduce disk usage). Depending on your circumstances, it can be worth the effort.
- tantalor an hour ago
  
  TIL, this is missing context, thanks for explaining.