About 2 months ago I would have said the same as the author, but I kept running up against the hard edges of Rust: the borrow checker. I realised that while I really liked using algebraic data types (e.g. enums) and pattern matching, the borrow checker and the low-level memory concerns meant I spent a lot of time fighting the language instead of fighting the PL issues at the heart of my project. So while tokenising/parsing was nice, interpreting and typechecking became the bane of my existence.
With that realisation I started looking for another, more suitable language. I knew the FP aspects of Rust were what I was looking for, so at first I considered something like F#, but I didn't like that it's tied to Microsoft/.NET. Looking a bit further, I could have gone with something like Zig/C, but then I'd lose the FP niceness I'm looking for. I also spent a fair amount of time looking at Go, but eventually decided that 1. I wanted a fair amount of syntax sugar, and 2. golang is a server-side language; a lot of its features and libraries are geared towards that use case.
Finally I found OCaml. What really convinced me was seeing that the syntax is like a friendly version of Haskell, or like Rust without lifetimes. In fact the first Rust compiler was written in OCaml, and OCaml is well known in the programming-language space. I'm still learning OCaml so I'm not sure I can give a fair review yet, but so far it's exactly what I was looking for.
Bringing up golang always annoys me for some reason. Like, it's really practical: it's fast (though not actually low-level), it compiles fast, and most importantly it's very popular and has all the libraries. It seems like I should use it. But I just almost irrationally hate the language itself. Everything about it is just so ugly. It's a language invented in 2009 by some C people who were apparently oblivious to everything that had excited PL-design folks for the previous 20 years. PHP in 2009 was already a more modern and better-designed language than goddamn golang. And golang hasn't really improved since. I just cannot let it go somehow.
It is worse than that: Go initially lacked generics (introduced by CLU and ML in 1976), and it still doesn't do even basic Pascal enumerations (1970), offering the iota/const dance instead, let alone anything from the 1990s programming-language design surface.
I only advocate for it in scenarios where a garbage-collected C is more than enough, regardless of the anti-GC naysayers; e.g. see the TamaGo unikernel.
The term you are looking for is sum types (albeit in a gimped form in the case of Pascal). Enumerations refer to the values applied to the type, quite literally, and are identical in Pascal to every other language with enumerations, including Go. There is only so much you can do with what is little more than a counter.
I'm fairly sure he's referring to enumerations actually.
Pascal doesn't require case matching of enumerations to be exhaustive, but this can be turned on as a compiler warning in modern Pascal environments, FreePascal / Lazarus and such.
Go only has enums by convention, hence the "iota dance" referred to. I've argued before that this does qualify as "having enums" but just barely.
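For readers who haven't seen it, a minimal sketch of the "iota dance" looks like this (type and constant names are mine, for illustration):

```go
package main

import "fmt"

// Go has no enum keyword; the convention is a named integer type plus a
// const block using iota to generate successive values.
type Color int

const (
	Red Color = iota // 0
	Green            // 1
	Blue             // 2
)

func (c Color) String() string {
	switch c {
	case Red:
		return "Red"
	case Green:
		return "Green"
	case Blue:
		return "Blue"
	}
	// Nothing prevents out-of-range values: Color(42) compiles fine,
	// and the switch above is not checked for exhaustiveness.
	return "unknown"
}

func main() {
	fmt.Println(Green)     // Green
	fmt.Println(Color(42)) // unknown
}
```

This is the "just barely" part: you get named constants and a distinct type, but no exhaustiveness checking and no guarantee a value is one of the declared members.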
It wouldn't have been difficult to do a much better job of it, is the thing.
> Pascal doesn't require case matching of enumerations to be exhaustive
Normally in Pascal you would not match on the enumeration at all, but rather on the sum types.
type
  Foo = (Bar, Baz);

case foo of
  Bar: ...; // Matches type Bar
  Baz: ...; // Matches type Baz
end
The only reason for enumerations in Pascal (and other languages with similar constructs) is because under the hood the computer needs a binary representation to identify the type, and an incrementing number (an enum) is a convenient source for an identifier. In a theoretical world where the machine is magic you could have the sum types without enums, but in this reality...
Thus, yes, in practice it is possible to go around the type system and get the enumerated value out with Ord(foo), but at that point it's just an integer and your chance at exhaustive matching is out the window. It is the type system that allows more flexibility in what the compiler can tell you, not the values generated by the enumeration.
> Go only has enums by convention
"Enums by convention" would be manually typing 1, 2, 3, 4, etc. into the code. Indeed, that too is an enumeration, but not as provided by the language. Go actually has enums as a first-class feature of the language[1]. You even say so yourself later on, so this statement is rather curious. I expect you are confusing enums with sum types again.
[1] Arguably Pascal doesn't even have that, only using enums as an implementation detail to support its sum types. Granted, the difference is inconsequential in practice.
Pascal enums are not sum types, because they are not the sum of multiple types. They are an enumeration of discrete values, which is why they're called enums.
Sum types in Pascal are called variant records:
type
  FooKind = (Foo, Bar, Baz); (* An enum *)

  FooOrBaz = record (* This is the sum type *)
    case foo: FooKind of
      Foo: (quux: Double);
      Bar: (zot, zap: Double);
      Baz: (xyzzy: String);
  end;
Rust conflates the 'enum' keyword with sum types. Pascal does not do this. One of us is confused about what a sum type is. It isn't me.
As for Go, my full opinion on that subject may be found here. If you're... curious. Let's say.
This is the exact same thing, except in addition to the tag there is also a data component. Yes, this is the more traditional representation of sum types, but having an "undefined" data component is still identifiable as a sum type. It is the tag that makes the union a sum type.
In Typescript terms, which I think illustrates this well, it is conceptually the difference between:
{ kind: 0 } | { kind: 1 }
and
{ kind: 0; data: T } | { kind: 1; data: U }
Which is to say that there is no difference with respect to the discussion here.
> They are an enumeration of discrete values
Yes, the "tag" is populated with an enumerator. There is an enumerator involved, that much is certain, but it is outside of the type system as the user sees it. It's just an integer generated to serve as an identifier – an identifier like those seen in the above examples – but provided automatically. The additional information you can gain from it, like exhaustive matching, comes at the type level, not from the number itself.
> Rust conflates the 'enum' keyword with sum types.
Right, because it too uses an enumerator to generate the tag value. Like Ord(foo) before, you can access the enumerated value in Rust with something like
mem::discriminant(&foo)
The spoken usage of 'enum' in Rust is ultimately misplaced, I agree. An enumerator is not a type! But it is not wrong in identifying that an enumerator is involved. It conflates 'enum' only in the very same way you have here.
But then what purpose would it serve? The last 30 years has brought no lack of new languages, not to mention evolution of older languages. Just use one of them.
The purpose of creating yet another language with Go was to break from what everyone else was doing, to see if a "simple" language would stop developers from playing with fun language toys all day to instead focus on actual engineering.
It’s really not. The strawman that if we add a few features that have stood the test of time in every other language we’ll end up with C++ is just not true. Nobody is proposing adding SFINAE-based conditional compilation, rvalue references, multiple inheritance, or any of the million other Byzantine features that make C++ virtually impossible to use correctly. Adding sum types and a match statement does not necessarily start you down that path.
I know, I'm in the same boat. What I realized is that I just need to avoid the companies using Go, and I don't really need to be vocal about my dislike. It's not my loss if others find the language useful, but for me it either solves problems I'm not interested in solving, or the language and tooling just don't do it for me.
But, I can always just write Rust and be happy where I am. Or, to be honest, would not be very unhappy with F#, Haskell or Ocaml either.
And they also will avoid you! A monthly Go meetup in San Francisco impressed me as the only meetup I have ever been to where no one (in a crowded venue) seemed to want to talk to anyone outside their clique.
Go is definitely used a lot more outside of google as of late.
Anecdotally, I would say that where a lot of companies would have used Java in the past, they are now turning to Go for their server-side/backend service implementations.
> The standard library is unimpressive (to be generous)
Coming from Python, this is one of the major things that I just can't get past with golang (despite having to use it for work). The standard library has a lot of really interesting/impressive/useful things to cover niche cases, but is missing a lot of what I would consider basic functionality that I keep running into requiring me to go get an external module to solve the problem.
Then, on top of that, the documentation for external modules is extremely terrible. In many cases the best you can get is API documentation in the form of "these are the functions, this is what they take and return" with no explanation of what those values need to be, what the function does with them, and so on; a simple list of functions. In others, there is that plus example code which doesn't work because it hasn't been updated since the last time backwards-incompatible changes were made so you end up down a rabbit hole of trying to debug someone else's wrong code.
The only thing letting me write effective golang at this point is that VSCode can autocomplete a lot of method calls, API calls, and so on, and then tell me what parameters they need, but even then I'm just guessing about what function might exist and what it might be called.
The language itself is okay and the more I use it the more I understand why they implemented all the stuff I hate (like a lack of proper error handling leading to half of my lines of code being boilerplate `if err != nil` blocks), but if the tooling around it wasn't so good no one would take it remotely seriously.
You're intended to run gofmt on every save. golang is designed to be a sort of straitjacket that forces everyone to write code in the same way (style etc.) so that junior devs can understand it clearly.
And so that people (of any level) don't bikeshed over silly things like tabs and spaces or { and newlines.
I really like this about go - that it formats code for you, and miss it in other languages where we have linters but not formatters, which is a terrible idea IMO.
You can force everyone on a project to use a formatter just as easily in any of those languages as you can in Go, with a few lines in your CI job definitions. Whether they're third-party or not is irrelevant.
I’m convinced there’s a contingent of devs who don’t like/grok abstraction. And it overlaps partially with stated goals of an easy language to onboard inexperienced devs with.
Nothing wrong with that, but it will probably never work for me. Newer versions of Java are much more enjoyable to work with versus Go.
> I’m convinced there’s a contingent of devs who don’t like/grok abstraction.
I am one of those. I grok abstractions just fine (have commercially written idiomatically obtuse Scala and C#, some Haskell for fun, etc.), but I don't enjoy them.
I use them, of course (writing everything in raw asm is unproductive for most tasks), but rather than getting that warm fuzzy feeling most programmers seem to get when they finish writing a fancy clever abstraction and it works on the first try, I get it when I look at a piece of code I've written and realize there is nothing extraneous to take away, that it is efficient and readable in the sense of being explicit and clear, rather than hiding all the complexity away in order to look pretty or maximize more abstract concerns (reusability, DRY, etc.).
This mindset is a very good fit for writing compute-heavy numerical code, GPU stuff and lots of systems level code, not so much for being a cog in a large team on enterprise web backends, so I mostly write numerical code for physics simulations. You can write many other things this way and get very fast and bloatfree websites or anything else, but it doesn't work well in large teams or people using "industry best practices". It also makes me prefer C to Rust.
> I’m convinced there’s a contingent of devs who don’t like/grok abstraction
I am one of them. I don't like Go, though. Enums and tagged unions aren't abstractions but fundamental features in my book. It's pretty transparent how they look in memory and there's nothing hidden about them.
What does confuse me are things like macros or annotations that magically insert something and make the code incomprehensible. I'm sure it's convenient to use, but it makes my brain try to manually translate it to simple instructions like a foreign language.
In my free time I like using Rust without custom traits (except a few iterators), that's close to the sweet spot for me.
I guess I’m in that camp. I can come up with a good abstraction after working on a problem for a while and refactor it into my code. Or I can come up with a really simple abstraction (e.g. a Go interface with 2-3 methods), and that usually works well. But I try to avoid starting a project by defining a bunch of abstractions, since I just end up writing loads of boilerplate. Yes, I’m probably doing some things wrong.
Strange that you bracket don't like/don't understand together like that.
The vast majority understand abstractions just fine, though each takes time to understand. However most people like their own abstractions best, and those of other people less. To me hell is living in a world of bad abstractions created by someone else.
Every abstraction created adds to the cognitive load when reading the code and to the maintenance burden of that code. So you have an abstraction budget, which is usually overspent IME and needs to be carefully controlled. Most of the most horrible codebases are horrible because they have too many of the wrong sort of abstraction.
Personally, I don't want to write any new code in something that doesn't have ADTs, or the moral equivalent (Java's sealed classes). I've already written a lifetime of code without them, so I suppose part of that is not wanting to write another 20 years of the same code. :)
Man, Go gets a lot of hate on here. It's certainly not the most flexible language. If I want flexibility + speed, I tend to choose Nim for my projects. But for practical projects that I want other people to be able to pick up quickly, I usually opt for Go. I'm building a whole product manufacturing rendering system for my company, and the first-class parallelism and concurrency made it super pleasant.
I will say that the error propagation is a pain a lot of the time, but I can appreciate being forced to handle errors everywhere they pop up, explicitly.
So much of language opinion is based on people's progression of languages. My progression (of serious professional usage) looked like this:
Java -> Python -> C++ -> Rust -> Go
I have to say, given this progression going to Rust from C++ was wonderful, and going to Go from Rust was disappointing. I run into serious language issues almost daily. The one I ran into yesterday was that defer's function arguments are evaluated immediately (even if the underlying type is a reference!).
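A minimal sketch of that defer surprise (function names are mine): defer evaluates its call's arguments immediately, at the point of the defer statement; only the call itself is postponed.

```go
package main

import "fmt"

// Passing the variable as an argument snapshots it at defer time.
func snapshotAtDefer() (deferred, final int) {
	x := 1
	defer func(v int) { deferred = v }(x) // x is copied NOW, while x == 1
	x = 42
	final = x
	return // the deferred call runs after this, seeing the old value
}

// Capturing the variable in a closure instead defers the read too.
func closureAtDefer() (deferred, final int) {
	x := 1
	defer func() { deferred = x }() // x is read when the deferred call runs
	x = 42
	final = x
	return
}

func main() {
	d, f := snapshotAtDefer()
	fmt.Println(d, f) // 1 42
	d, f = closureAtDefer()
	fmt.Println(d, f) // 42 42
}
```

The closure form is the usual workaround when you want the value as it is at function exit.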
I'm curious how one ends up with such an ahistorical sequence. I'd expect it to be more aligned with the actual PL history. Mainstream PLs have had a fairly logical progression, with each generation solving well-understood problems from the previous one, and building on top of the previous generation's abstractions.
Turbo Pascal for education, C as the professional lingua franca in the mid-90s (manual memory management). C++ was all the rage in the late 90s (OOP, STL). Java got hot around 2003 (GC, a canonical concurrency library and memory model). Scala grew in popularity around 2010-2012 (FP for the masses, much less verbosity, mainstream ADTs and pattern matching). Kotlin was cobbled together later to have the Scala syntactic sugar without the Haskell-on-the-JVM complexity.
And then they came up with golang which completely broke with any intellectual tradition and went back to before the Java heyday.
Rust feels like a Scala with pointers so the "C++ => Rust" transition looks analogous to the "Java => Scala" one.
Go is definitely of the “worse is better” philosophy. You can basically predict what someone will think of Go if you know how they feel about that design philosophy.
I remember that famous rant about how Go’s stdlib file api assumes Unix, and doesn’t handle Windows very well.
If you are against “worse is better” like the author, that’s a show stopping design flaw.
If you are for it, you would slap a windows if statement and add a unit test when your product crosses that bridge.
The problem is that most of the time, errors are not to be handled but only bubbled up. I've also seen it in Java with checked exceptions: the more explicit error handling is, the more developers feel they should somehow try to do _something_ with the error when the correct thing to do would actually be to fail in the most straightforward manner. The resulting code is often much heavier than necessary because of this and the stacktraces also get polluted by overwrapping.
The error handling is one of the worst parts of Go for me. Every call that can fail ends up being followed by 3 lines of error handling even if it's just propagating the error up. The actual logic get drowned out.
I would kill for some kind of `err_yield(err)` construct that handles propagating the error if it's the caller's problem to deal with.
That said, I discovered that Go has the ability to basically encapsulate one error inside of another with a message; for example, if you get an err because your HTTP call returned a 404, you can pass that up and say "Unable to contact login server: <404 error here>". But then the caller takes that error and says "Could not authenticate user: <login error here>", and _their_ caller returns "Could not complete upload: <authentication error here>" and you end up with a four-line string of breadcrumbs that is ostensibly useful but not very readable.
Python's `raise from` produces much more readable output, since you get full chained tracebacks rather than just a bunch of strings that force you to follow the trail yourself to figure out where the error was.
Fast to compile, fast to run, simple cross-compilation, a big standard library, good tooling…
As ugly and ad-hoc as the language feels, it’s hard to deny that what a lot of people want is just good built-in tooling.
I was going to say that maybe the initial lack of generics helped keep compile times low for go, but OCaml manages to have good compile times and generics, so maybe that depends on the implementation of generics (would love to hear from someone with a better understanding of this).
There are a million little decisions that affect compile time. A big factor here is inlining. When you inline functions, you may improve the generated code or you may make it worse. It’s hard to predict the result because the improvements may come about because of various other code transformation passes which you perform after inlining. After inlining, the compiler detects that certain code paths are impossible, certain calls can be devirtualized, etc., and this can enable more inlining.
Rust is designed with the philosophy of zero-cost abstractions. (I don’t like the name, because the cost is never zero, but it is what it is.) The abstractions usually involve a lot of function calls and you need a compiler with aggressive inlining in order to get reasonable performance out of Rust. Usage of generics still results in the same non-virtual calls which can be inlined. But the compiler then has to do a lot of work to evaluate inlining for every instantiation of every generic.
Go is designed with the philosophy of simple abstractions, which may come with a cost. Generics are implemented in a way that means you are still doing a lot of dynamic dispatch. If you need speed in Go, you should be writing the monomorphic code yourself. Generics don’t get instantiated for every single type you use them with. They only get instantiated for every “shape” of type.
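A sketch of the trade-off described above (function names are mine; the "shape" instantiation detail is my understanding of the current gc compiler, which stencils per gcshape and passes a hidden dictionary for type-specific operations):

```go
package main

import "fmt"

// Generic version: the compiler instantiates one copy per "gcshape"
// (e.g. all pointer types share one), so some dispatch can stay dynamic
// via the dictionary rather than being fully monomorphized.
func SumGeneric[T int | float64](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

// Monomorphic version: one concrete type, nothing to look up at run
// time - the "write it yourself" option suggested above for hot paths.
func SumFloat64(xs []float64) float64 {
	var total float64
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	fmt.Println(SumGeneric([]int{1, 2, 3}))      // 6
	fmt.Println(SumFloat64([]float64{1.5, 2.5})) // 4
}
```

(In this particular example int and float64 are distinct value shapes, so each does get its own instantiation; the sharing kicks in for types with the same underlying shape, most notably pointers.)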
The point is that different people have different understandings of "cost." You're correct that that kind of cost is what "zero cost abstractions" means, but there are other costs, like "how long did it take to write the code," that people think about.
Cognitive cost is the most important cost to minimize.
A Rust project's cognitive cost budget comes out of what's left over after the language is done spending. This is true of any language, but many language designers do not discount cognitive costs to zero, which, with the "zero cost abstraction" slogan, Rust explicitly does.
Thank god I'm not the only one. I can still remember when the Go zealots were everywhere (it's cooled down now). Every feature Go didn't have was "too complicated and useless", while the few features it did have were "essential and perfect".
I've really tried giving Go a go, but it's truly the only language I find physically revolting. For a language that's supposed to be easy to learn, it made sooooo many weird decisions, seemingly just to be quirky. Every single other C-ish language declares types either as "String thing" or "thing: String". For no sane reason at all, Go went with "thing String". etc. etc.
I GENUINELY believe that 80% of Go's success has nothing to do with the language itself, and everything to do with it being bankrolled and used by a huge company like Google.
I recommend giving it a second chance. You will at least realize that the "thing string" problem isn't a problem, it's just something you find aesthetically displeasing.
One thing I've learned over the years is that if you go with the grain — not against it — of a language (or any system, really), the design tends to become apparent quicker. "When in Rome," and so forth. Cultural displeasure tends to disappear if you give the native way an earnest chance rather than resisting it. For example, in the beginning, marking identifiers as public by giving them a capital letter struck me as the ugliest thing ever. I don't mind it now. It's never going to be something I love looking at, but it does have the benefit of making declarations' visibility extremely obvious.
I don't think Go's popularity is due to Google at all. Google the company has never really promoted Go (unlike Microsoft with C# and Sun with Java, for example). Go is still treated as a bastard stepchild in many Google projects such as Protobuf/gRPC, Beam, and Google Cloud. The Go team has never seemed very enthusiastic about PR, either. There was that one big redesign of the Go site, but relatively little after that.
I think Go grew by word of mouth more than anything. Projects like Kubernetes, Prometheus, Traefik, etc. helped a lot. Don't forget that it took years for Go to become popular. It wasn't taken very seriously by many in the beginning. Go was not popular within Google until relatively recently. For many years the only serious thing written in Go internally at Google, as I understand it, was the dl.google.com backend.
I believe most successes of languages happen because of corporate backing and tooling/library ecosystem rather than language. It's not like most people are using Java because they are in love with the language features.
Personally, I also think that if you removed memory safety from Rust overnight, people would still use it. Rust is appealing not because it's memory safe. For some uses it is, but most people flock to Rust because it offers an alternative to C++ without fifty years of accumulated cruft. Rust is a modern language, with a well-working package manager/build tool and a wide ecosystem of libraries for every use case. Memory safety and the other features are just a cherry on top. If Rust used garbage collection, I am sure it would still be very popular just because of those other things.
Other languages like D or Nim tried to fit into that space also, but they don't have the budget to really make it. Most of work on those languages is done by unpaid volunteers, so there's little direction and there's a lot of one man projects.
> It's a language invented in 2009 by some C-people who are apparently oblivious to everything that excited PL design folks for the last 20 years (as of 2009).
Is there a term equivalent to "armchair quarterback" in programming? Most programmers are already in armchairs.
It's the equivalent of yelling at the TV that the ultra-successful mega-athlete sucks. I can't imagine the thought process that goes into thinking Ken Thompson, Rob Pike and Robert Griesemer are complete idiots who have no clue what they were doing.
It was not deliberate, it was ignorance. Time and again, the Go team made comments in various forums for years showing they really knew nothing about programming language development past 2000.
All they knew was C and that they wanted to create a language that compiles faster than C++. That's all.
golang feels like someone wanted to write a "web focused" version of C, but decided to ignore every issue and complaint about C raised in the past 25 years
It's a very simple and straightforward language, which I think is why people like it, but it's just a pain to use. It feels like it fights any attempt at using it to do things optimally or quickly.
Do people actually care that much about languages? I mean, we're here writing English, which is a complete dumpster fire. Go is undeniable perfection compared to the horror that is English. Clearly you and I don't care that much about languages.
I expect people like Go because of its tooling (what also saves English), which was a million miles ahead of the pack when it first came out. Granted, everyone else took notice, so the gap has started to narrow.
> It feels like it fights any attempt at using it to do things optimally or quickly.
Serious question: Is that because you are trying to write code in another language with Go syntax? Go unquestionably requires a unique mental model that doesn't transfer from other languages; even those that appear similar on the surface. Because of that, I posit that it is a really hard language to learn. It is easy to get something working, but I mean truly learn it.
While every programming language requires its own mental model, Go seems to take it to another level (before reaching a completely different paradigm). I expect that is because its lack of features prevents you from papering over "misuse" like is possible in other, more featureful languages, so you feel it right away instead of gradually being able build the right mental model.
Golang was created specifically so that Google could mitigate the downsides of lowering their hiring standards. It doesn't have any higher design aspirations.
"The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt." - Rob Pike
I suppose in a sense this is rejecting the "keyboard jockeys", but probably not in the way you mean.
You cannot separate the tool used to solve a problem from the problem itself. The choice of tool is a critical and central consideration.
I think you're giving far too much weight to that off the cuff quote from one of the creators of Go.
Really I think it's more useful to view it as a better C in the less is more tradition, compared to say C++ and Java, which at the time were pretty horrible. That's my understanding of its origin. It makes sense in that context; it doesn't have pretensions to be a super advanced state of the art language, but more of a workhorse, which as Pike pointed out here could be useful for onboarding junior programmers.
Certain things about it I think have proven really quite useful and I wish other languages would adopt them:
* It's easy to read precisely because the base language is so boring
* Programs almost never break on upgrade - this is wonderful
* Fewer dependencies, not more
* Formatters for code
Lots of little things (struct tags for example) I'm not so keen on but I think it's pretty successful in meeting its rather modest goals.
> Really I think it's more useful to view it as a better C
But Go is nothing at all like C, and it's completely unsuitable for most of the situations where C is used. I'm having trouble even imagining what you're getting at with this comparison. The largest areas of overlap I can think of are "vaguely similar syntax style" and "equally bad and outdated type system". Pretty much everything else of substance is different. Go is GC'd, Go has a runtime, etc.
That's saying the same thing. If you give someone the ability to understand a brilliant language, they will turn their attention to the language and away from the problem. That's just human nature. Shiny attracts us. Let's be honest, we all have way more fun diving deep into crafting the perfect type in Haskell. But Pike indicates that if you constrain developers to bounds that they already know, where they can't turn their attention to learning about what is brilliant, then they won't be compelled away from what actually matters – the engineering.
Is there any evidence that the Go style of constraints increases productivity or code quality or other metrics compared to "shiny" languages? I've heard that point repeated many times, but people have done a decent amount of engineering in many other languages too, without the need to be limited like that.
I expect nobody outside of Google has ever truly taken the time to study it. When was the last time you saw a programmer actually research the effectiveness of their tools and not just land on "I like this. It feels right." I never have!
But I'm not sure it matters. Go was created to test the theory, not because a theory was proven. It didn't have to be successful. It may be that the studies didn't happen even within Google, although that is our greatest chance. We do know Google actually cares about data, unlike programmers.
That said, since Go was released it seems every other language has tried to copy it with its own twist, so while that may not come from a place of evidence, it would appear that the feeling of increased productivity[1] was felt.
[1] Or something adjacent. Focusing on engineering isn't necessarily about productivity. You can't discount productivity, but it is not the top engineering concern, especially in a place like Google.
I think one core thing you have to do with ASTs in Rust is to absolutely not store strings or the like in your AST. Instead you want to use something like an interned-string library (so you get cheap clones and interning), and for things like positions in text you want to use plain indices. Absolutely avoid storing references if you can avoid it!
The more your stuff is held in things that are cheap/free to clone, the less you have to fight the borrow checker… since you can get away with clones!
And for actual interpretation there are libraries that can help a lot with memory management, with arenas and the like. It's super specialized stuff, but it helps give you perf + usability. Projects like Ruffle use this heavily, and it's nice once you figure out the patterns.
Having said that, OCaml and Haskell are both languages that will do all of this “for free” with their built-in garbage collection… I just like the idea of going very fast in Rust.
I've been writing a lot of Golang in the last year and I wouldn't use it for writing a parser. It's just a modernised C; the model it provides is very simple (coming from C# the simplicity actually made it harder to learn!) and is very well suited to small, focused applications where a low conceptual load is beneficial and the trade-off of verbosity is acceptable.
F# or even the latest version of C# are what I would recommend. Yes Microsoft are involved but if you're going to live in a world where you won't touch anything created by evil corporations then you're going to have a hard time. Java, Golang, Python, TypeScript/Javascript and Swift all suffer from this. That leaves you with very little choice.
I'd be interested in hearing your thoughts over OCaml after a year or so of using it. The Haskell-likes are very interesting but Haskell itself has a poor learning curve / benefit ratio for me (Rust is similar there actually; I mastered C# and made heavy use of the type system but that involved going very very deep into some holes and I don't have the time to do that with Rust).
F# and OCaml are still functionally identical to the point that many programs would compile in either, right? F#, OCaml, and Rust seem a lot more similar to me than any of them are to Haskell, or Go for that matter. I like Haskell, but my brain hasn't made thinking that way native yet.
I wouldn't call Go a 'server side' language. The Go compiler is written in Go, for example! Cross compilation and (relatively) small binaries make it super easy for distribution. Syntax sugar is a fair point though, it doesn't lend itself to functional-y pattern matching.
Do you know how they avoid the GC in the Go implementation of the Go compiler? If I understand correctly they need to implement the Go garbage collector in their Go implementation of the Go compiler. But Go already has a garbage collector. So how do they avoid invoking Go's garbage collector so that they can implement the garbage collector of the Go language they are implementing?
Not sure if I'm making sense but I'd like to know more about this from those who understand this more than I do.
We can think of the compiler as a function from a string to a string - from high-level code (HLC) to low-level code (LLC). LLC can include the garbage collection code (if it is run as a standalone executable instead of garbage collection being done by a separate runtime).
The compiler executable itself is running in a compilation process P which uses memory and has its own garbage collection. (The compiler executable was itself generated by a compilation, using a compiler written in Go itself (self-hosting) or, initially, in another language.)
But the compilation process P is unrelated to the process Q in which the generated code, LLC, will run when first executed. The OS which runs LLC doesn't even know about the compiler - LLC is just another binary file. The garbage collection in P doesn't affect garbage collection in Q.
Indeed, it should be easy for the compiler to generate an assembly program which constantly keeps allocating more memory until the system runs out - say, when compiling a loop that allocates a struct on each of a billion iterations. Unless, of course, you explicitly also generate a garbage collector as part of the low level code.
Your question does become very interesting in the realm of security: there is a famous paper called "Reflections on Trusting Trust" where a compiled compiler can still contain backdoors even if its own source code is trustworthy, because the compiler which compiled it had backdoors.
Remember that a compiler generates an executable file (it can almost be thought of as an ASM transpiler); this file must contain everything the language needs to operate (an oversimplification), so that includes the runtime as well as the compiled instructions from the user's code. This is in contrast to an interpreter, which doesn't require you to pack all the implementation details into a binary; instead you can use the host language's runtime.
All this to say: the output of a compiler is by necessity not tied to the language the compiler is written in, instead it is tied to the machine the executable should run on. A compiler "merely" translates instructions from a high level language to a machine executable one. So stuff like a GC must be coded, compiled and then "injected" into the binary so the user's code can interact with it. In an interpreted language this isn't necessary, since the host language is already running and contains these tools which would otherwise have to be injected into the binary.
They just use the implementation from the last version of the compiler, which you can follow back in a long chain to the first implementation. As for the implementation of the garbage collector, it probably just doesn't allocate anything. The basics of a garbage collector are a function "alloc" and another one "collect". The function to allocate memory usually looks something like this:
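The code snippet didn't make it into this comment, but the shape of such an alloc function is roughly this - a toy bump allocator, sketched in Rust purely for illustration (Go's real runtime allocator is far more involved). The point is that `alloc` is just arithmetic over a slab of memory obtained from the OS, so it doesn't itself depend on a GC:

```rust
// Toy bump allocator (illustrative sketch, not Go's actual design).
struct BumpHeap {
    slab: Vec<u8>, // stands in for memory mapped from the OS
    next: usize,   // offset of the next free byte
}

impl BumpHeap {
    fn new(capacity: usize) -> Self {
        BumpHeap { slab: vec![0; capacity], next: 0 }
    }

    // Hand out the next `size` bytes, or None when the heap is
    // exhausted (a real runtime would trigger a collection here).
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.next + size > self.slab.len() {
            return None;
        }
        let offset = self.next;
        self.next += size;
        Some(offset)
    }
}
```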
> They just use the implementation from the last version of the compiler
The garbage collector isn’t part of the compiler, it’s part of the runtime. It’s worth being clear about this distinction because I think it’s the root of the OP’s confusion.
How does clang, a C++ compiler that is itself written in C++, use <feature from C++> that it is itself implementing?
Why wouldn’t it be able to?
I don’t understand how your question specifically relates to garbage collection, or why the compiler would need to avoid it. The Go compiler is a normal Go program and garbage collection works in it the same way it does in any other Go program.
I’ve never used Go myself, but according to this https://go.dev/doc/install/source you need a Go compiler to compile Go. However, for the early versions, you needed a C compiler to compile Go.
So at some point, someone wrote enough of a Go GC in C to support enough of Go to compile itself.
I think they're asking how the code in the Go runtime (not the compiler, that being an interesting but also maybe non-obvious distinction!) that implements the garbage collector, a core feature of the language, avoids needing the garbage collector to already exist to be able to run, being written in the language that it's a core feature of. I suspect the answer is just something like "by very carefully not using language features that might tempt the compiler to emit something that requires an allocation". I think it's a fair question as it's not really obvious that that's possible--do you just avoid calling make() and new() and forming pointers to local variables that might escape? Do you need to run on a magical goroutine that won't try to grow its stack with gc-allocated segments? Can you still use slices (probably yes, just not append() or the literal syntax), closures (probably only trivial ones without local captures?), maps (probably no)...?
I think the relevant code is https://github.com/golang/go/blob/master/src/runtime/mgc.go and adjacent files. I see some annotations like //go:systemstack, //go:nosplit, //go:nowritebarrier that are probably relevant but I wouldn't know if there's any other specific requirements for that code.
This is correct but it's never as hard as it seems.
First, that is a problem only for the very first version of X. Then you use X for version X+1.
Second, building from source usually doesn't mean having to build every single dependency. Some .so or .dll files are already on the system. Only when one has to build everything from scratch does the first step have to solve the original X-from-X problem, but I think that even a Gentoo full system build doesn't start with a user keying bytes into RAM with switches (?), setting the program counter of the CPU and its registers to eventually start the bootstrap process.
Well now I've got to go check out the go compiler! That sounds really interesting. I was mainly referring to go having a lot more developed concurrency features, which while they're great I didn't really want to use them for my toy language, it seemed like I was throwing away a lot of what makes golang great just because of the nature of my project.
The rest of the golang ecosystem I found really nice actually, and imo it has a really great set of tools for reading/writing files - and also I like that everything is a part of the go binary; it certainly is easier than juggling between opam and dune (used by OCaml, for example).
That's fair, the concurrency features are very handy though optional of course.
The ecosystem and tooling are great, probably the best I've worked with. But the main reason I reach for Go is that it's got tiny mental overhead. There's a handful of language features so it becomes obvious what to use, so you can focus on the actual goal of the project.
There are some warts of course. Heavy IO code can be riddled with err checks (actually, why I find it a bit awkward for servers). Similarly the stdlib is quite verbose when doing file system manipulation, I may try https://github.com/chigopher/pathlib because Python's pathlib is by far my favourite interface.
Lots of folks use Golang on the client side, even on mobile (for which Go has really great support with go-mobile). Of course it adds around 10-20 MB to your binary and memory footprint, but in today's world that's almost nothing. I think Tailscale, for example, uses Golang as a cross-platform Wireguard layer in their mobile and desktop apps; it seems to work really well. You wouldn't build a native UI using Golang of course, but for low-level stuff it's fantastic. Tinygo even allows you to write Golang for microcontrollers or the web via WebAssembly; lots of things aren't supported there, but a large part of the standard library is.
Saying the language adds 10-20 MB and then going on to say it's almost nothing is avoiding the issue raised.
The footprint always matters and we should use the right tool for the right job.
Go is not a great language to write parsers in IMO; it just doesn't have anything that makes such a task nice. That being said, people seem to really dislike Go here, which is fine, but somewhat surprising. Go is extremely simple. If you take a look at its creators' pedigree, that should make a ton of sense: they want you to make small, focused utilities and systems that use message passing (typically via a network) as the primary means of scaling complexity. What I find odd about this is that it was originally intended as a C++ replacement.
Message passing is a horrible means of scaling complexity, unless you are so big and have so many developers working with you that you can't use anything else.
I love Go for writing servers. And in fact, I do it professionally. But I totally agree that for parsers, it’s not the right tool for the job.
First off, the only way to express union types is with runtime reflection. You might as well be coding in Python (but without the convenient syntax sugar).
Second off, “if err != nil” is really terrible in parsers. I’m actually somewhat of a defender of Go’s error handling approach in servers. Sure, it could have used a more convenient syntax. But in servers, I almost never return an error without handling it or adding additional context. The same isn’t true in parsers, though. Almost half of my parser code was error checks that simply wouldn’t exist in other languages.
For Rust, I think the value proposition is if you are also writing a virtual machine or an interpreter, your compiler front end can be written in the same language as your backend. Your other alternatives are C and C++, but then you don’t have sum types. You could write the front end in Ocaml, but then you would have to write the backend and runtime in some other language anyways.
The borrow checker is definitely a pain, but it stops being such a pain once you design your types around ownership and passing around non-owned pointers or references or indexes.
This. I've found the same: being effective in Rust really requires that you change your way of thinking about your data structures (and code layout). Once I realized that, I was no longer fighting the borrow checker and I've been able to build complex code that more or less worked immediately. As I look back on it, I think what a pain it would have been to write and debug in C, although doing it in C would appear to be "easier".
Did you discover Scala 3 and give it a thought? I think of it as Rust with an _overall_ stronger type-system, but where you don't have to worry about memory management. It has an amazing standard library, particularly around collections. You get access to the amazing JVM ecosystem. And more. Martin Odersky in fact sees Scala's future lying in being a simpler Rust.
Also, regarding F#. It runs on .NET, and indeed, since the ecosystem and community are very small, you need to rely on .NET (basically C#) libraries. But it's really not "tied" to Microsoft and is open source.
OCaml has been pretty common tool to write parsers for many years. Not a bad choice.
I've written parsers professionally with Rust for two companies now. I have to say the issues you had with the borrow checker are just in the beginning. After working with Rust a bit you realize it works miracles for parsers. Especially if you need to do runtime parsing in a network service serving large traffic. There are some good strategies we've found out to keep the borrow checker happy and at the same time writing the fastest possible code to serve our customers.
I highly recommend taking a look at how flat vectors for the AST and typed vector indices work. E.g. you have a vector for types as `Vec<Type>` and fields in types as `Vec<(TypeId, Field)>`. Keep these sorted, so you can implement lookups with a binary search, which works quite well with CPU caches and is definitely faster than a hashmap lookup.
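A sketch of that layout (type and field names invented here): fields are kept sorted by their owning `TypeId`, so all fields of one type can be located with two binary searches over a contiguous, cache-friendly slice.

```rust
// Typed index: a thin newtype over u32, cheap to copy and compare.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct TypeId(u32);

#[derive(Debug)]
struct Field {
    name: String,
}

// `fields` is sorted by TypeId. Two binary searches find the range of
// fields belonging to `ty`; the result is a borrowed slice, no allocation.
fn fields_of(fields: &[(TypeId, Field)], ty: TypeId) -> &[(TypeId, Field)] {
    let start = fields.partition_point(|(t, _)| *t < ty);
    let end = fields.partition_point(|(t, _)| *t <= ty);
    &fields[start..end]
}
```

`partition_point` (stable Rust) does the binary search; keeping the vector sorted at build time is the only invariant you have to maintain.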
The other cool thing with writing parsers with Rust is how there are great high level libraries for things like lexing:
The cool thing with Logos is that it keeps the source data as a string under the surface and just refers to specific locations in it. Now use these tokens as the basis for your AST, which is all flat data structures and IDs. Simplify the usage with a type:
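The type in question is presumably something like a "walker": a `Copy` handle pairing an ID with a reference to the flat storage, so call sites read like an object graph while the data stays flat. A minimal sketch (names assumed, not any particular project's actual API):

```rust
#[derive(Clone, Copy)]
struct TypeId(u32);

struct Type {
    name: &'static str,
}

struct Ast {
    types: Vec<Type>,
}

// A walker is just (borrow of the AST, ID) - it is Copy, so it can be
// passed around freely without reference counting.
#[derive(Clone, Copy)]
struct TypeWalker<'a> {
    ast: &'a Ast,
    id: TypeId,
}

impl<'a> TypeWalker<'a> {
    fn name(&self) -> &'a str {
        self.ast.types[self.id.0 as usize].name
    }
}

impl Ast {
    fn walk(&self, id: TypeId) -> TypeWalker<'_> {
        TypeWalker { ast: self, id }
    }
}
```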
From here you can introduce string interning if needed, it's easy to extend. What I like about this design is how all the IDs and Walkers are Copy, so you can pass them around as you like. There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.
I understand Rust feels hard, especially in the beginning. You need to program more like you'd write C++, but with Rust you are forced to play it safe. I would say an amazing strategy is to first write a prototype in OCaml; it's really good for that. Then, if you need to be faster, do a rewrite in Rust.
Thanks for your comment, you've given me a lot to chew on and I think I'll need to bookmark this page.
> I've written parsers professionally with Rust for two companies now
If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now),
> Especially if you need to do runtime parsing in a network service serving large traffic.
I almost expected something like this, it just makes sense with how the language is positioned. I'm not sure if you've been following cloudflare's pingora blogs but I've found them very interesting because of how they are able to really optimise parts of their networking without looking like a fast-inverse-sqrt.
> There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.
I really like the sound of this, it wasn't necessarily confusing to work with Rc and Weak but more I had to put in a lot of extra thought up front (which is also valuable don't get me wrong).
> I would say an amazing strategy is to first write a prototype with Ocaml, it's really good for that.
Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.
> If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now),
You do not need to write programming languages to need parsers and lexers. My last company was Prisma (https://prisma.io) where we had our own schema definition language, which needed a parser. The first implementation was nested structures and reference counting, which was very buggy and hard to fix. We rewrote it with the index/walker strategy described in my previous comment and got a significant speed boost and the whole codebase became much more stable.
The company I'm working for now is called Grafbase (https://grafbase.com). We aim to be the fastest GraphQL federation platform, which we are in many cases already due to the same design principles. We need to be able to parse GraphQL schemas, and one of our devs wrote a pretty fast library for that (also uses Logos):
And we also need to parse and plan the operation for every request. Here, again, the ID-based model works miracles. It's fast and easy to work with.
> I really like the sound of this, it wasn't necessarily confusing to work with Rc and Weak but more I had to put in a lot of extra thought up front (which is also valuable don't get me wrong).
These are _very annoying_ to work with. If you reach a model from the `Weak` side, you need to upgrade it first (and unwrap), which makes passing references either hard or impossible depending on what you want to do. It's also not great for CPU caches if your data is too nested. Keep everything flat and sorted. In the beginning it's a bit more work and thinking, but it scales much better when your project grows.
> Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.
You're already on the right path if you're interested in Ocaml. Keep going.
I should've expected prisma! It's actually my main "orm" for my TS web projects, so thanks for that! Also grafbase seems interesting, I've had my fair share of issues with federated apollo servers so it'd be interesting to check out.
> If you come from the `Weak` side to a model, you need to upgrade it first (and unwrap), which makes passing references either hard or impossible depending on what you want to do.
You're literally describing my variable environment, eventually I just said fuggit and added a bunch of unsafe code to the core of it just to move past these issues.
There's also the phenomenal pest library. It probably wouldn't be as fast, but I've found that parsing usually isn't a performance-critical part of a system. If it is, manually writing the parser is definitely the way to go.
> Especially if you need to do runtime parsing in a network service serving large traffic
Yeah, that's the focus of it, and the thing Rust is well suited for.
All the popular Rust parsing libraries aren't even focused on what most people mean by "parser". They can't support language parsing at all, but you only discover that after you've spent weeks fighting with the type system to get to the real PL problems.
Rust itself is parsed by a set of specialized libraries that won't generalize to other languages. Everything else is aimed at parsing data structures.
There's also rust-analyzer, which is a separate binary. It should compile with a stable Rust compiler. I remember reading its source together with the Zig compiler's. Both are quite impressive codebases.
I agree on F#. It changed my C && OO perspective in fantastic ways, but I too can't support anything Microsoft anymore.
But, seeing as OCaml was the basis for F#, I have a question, though:
Does OCaml allow the use of specifically sized integer types?
I seem to remember in my various explorations that OCaml just has a kind of "number" type. If I want a floating point variable, I want a specific 32- or 64- or 128-bit version; same with my ints. I did very much like F# having the ability to specify the size and signedness of my int vars.
F# is a far better option from a practical standpoint when compared to the alternatives. By simple virtue of using .NET and having access to a very wide selection of libraries, library availability becomes a non-issue when deciding to solve a particular business case. It also has an alternate compiler, Fable, which can target JS, allowing the use of F# on the front end.
Other options have worse support and weaker tooling, and often not even more open development process (e.g. you can see and contribute to ongoing F# work on Github).
This tired opinion ".net bad because microsoft bad" has zero practical relevance to actually using C# itself, and even more so F#, and it honestly needs to die out because it borders on mental illness. You can hate Microsoft products, I do so too, and still judge a particular piece of technology and the people that work on it on their merits.
You wouldn't be losing FP niceness with Zig, and the pattern matching and enum situation is also similar to Rust. Even better, in a few areas, for example arbitrary-width integers and enum tagging in unions/structs. Writing parsers and low level device drivers is actually quite comfortable in Zig.
I had a similar journey of enlightenment that likewise led me to OCaml. Unless you're doing low-level systems programming, OCaml will give you the "if it compiles, it's probably right" vibe with much less awkward stuff to type.
With some patience and practice, I think reasoning about borrows becomes second nature. And what it buys you with lexing/parsing is the ability to do zero-copy parsing.
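Zero-copy here means tokens borrow slices of the original input rather than owning copies of the text; a minimal sketch:

```rust
// Zero-copy lexing: each token is just a borrowed slice of the input,
// so nothing is copied or allocated per token.
#[derive(Debug, PartialEq)]
enum Token<'src> {
    Word(&'src str),
    Number(&'src str),
}

fn lex(src: &str) -> Vec<Token<'_>> {
    src.split_whitespace()
        .map(|s| {
            if s.chars().all(|c| c.is_ascii_digit()) {
                Token::Number(s) // the slice points back into `src`
            } else {
                Token::Word(s)
            }
        })
        .collect()
}
```

The lifetime on `Token<'src>` is exactly what the borrow checker buys you: the compiler proves the source string outlives every token that refers to it.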
FSharp is OCaml to a great extent. So if you don't have the need to stay away from MS/.NET: it is more 'open source' than the rest of MS products. MS did release FSharp with an open-source license.
But, it does still run on .NET.
At this point, isn't every major language controlled by one main corporate entity?
Except Python? But Python doesn't have algebraic types, or very complete pattern matching.
I think you missed something if you felt the borrow checker made things too hard. You can just copy and move on. Most languages do less efficient things anyway.
Oh no, you're right - especially looking at my last few commits this is very much what some parts of the project became. And when I was looking at it I felt like I was throwing away so much of the goodness Rust provides and it really irritated me.
Looking at Primeys' comment he actually gave some really interesting suggestions on how to manage this without needing Rc / weak pointers or copying loads of dynamic memory all over the place. Instead you have a flat structure of copy-able elements, giving you better cache locality and a really easy way to work with them.
> I considered something like F# but I didn't like that it's tied to microsoft/.NET.
Could you explain your thought process when deciding to not use F# because it runs on top of .NET? (both of which are open-source, and .NET is what makes F# fast and usable in almost every domain)
I am genuinely curious too. .NET is a very mature, very performant runtime, and I think of F#, a beautiful, productive language, running on it a big pro. Perhaps things used to be different about/regarding Microsoft?
Yeah. I'm having so much fun with F# that I absolutely did not anticipate. Sure, it's something everyone using .NET knows about but I genuinely underestimated it and wish more people gave it a try. Such a good language.
As for the hate - my pet theory is that developers need something like a sacrificial lamb to blame their misfortunes on, and a banner to rally under, which often happens to be "against that other group" or "against that competing language". Because .NET is a platform that happens to be made by Microsoft and is host to two very powerful multi-paradigm languages, it becomes a point of contention for many. From what I've seen, other languages do not receive so much undeserved hate, while here on HN some, like Go, Ruby or the BEAM family, receive copious amounts of unjustified praise not rooted in technical merits.
Having written probably several hundred kloc of both Haskell and OCaml, I strongly prefer Haskell. A very simple core concept wrapped in an extremely powerful shell. Haskell is a lot better for parsing tasks because (among other considerations) its more powerful type system can better express constraints on grammars.
This is, to me, an odd way to approach parsing. I get the impression the author is relatively inexperienced with Rust and the PL ideas it builds on.
A few notes:
* The AST would, I believe, be much simpler defined as an algebraic data type. It's not like the sqlite grammar is going to randomly grow new nodes that require the extensibility their convoluted encoding provides. The encoding they use looks like what someone familiar with OO, but not algebraic data types, would come up with.
* "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism. E.g. functions. The defining features of macros is they run at compile-time.
* The work on parser combinators would be a good place to start to see how to structure parsing in a clean way.
> I get the impression the author is relatively inexperienced
The author never claimed to be an experienced programmer. The title of the blog is "Why I love ...". Your notes look fair to me, but calling out inexperience is unnecessary IMO. I love it if someone loves programming. I think that's great. Experience will come.
If someone didn't study the state of the art of tokenising and parsing and still wants to write about it, it's absolutely ok to call it out as being written by someone who has only a vague idea of what they're talking about.
It's definitely necessary: it provides an answer for those who do have knowledge about parsing, read this, and wonder why the author didn't use some other, commonly used practice instead.
Strongly disagree. There should be a higher standard of articles. This amateur "look what I can do" is just noise. Here's an idea, don't tell the world about what you've done unless it is something new. We don't care and it wastes our time and fills the internet with shit. Not everyone deserves a medal for pooping.
Posts on here sometimes come from the world expert, and sometimes from enthusiastic amateurs.
I wrote a compiler in school many years ago, but besides thinking "this project is only one a world class expert or an enthusiastic amateur would attempt", I wasn't immediately sure which I was dealing with.
> * "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism. E.g. functions. The defining features of macros is they run at compile-time.
In the context of the blog post, he wants to generate structure definitions.
This is not possible with functions.
I don't know. Having written a small parser [0] for Forsyth-Edwards chess notation [1] Haskell takes the cake here in terms of simplicity and legibility; it reads almost as clearly as BNF, and there is very little technical ceremony involved, letting you focus on the actual grammar of whatever it is you are trying to parse.
Yes, a long time ago [0]. Depending on your needs, stack might still have advantages as the direct tool used by the developer (as it uses cabal underneath anyway).
You don't need to switch to Stack (as other commenters suggest) to have isolated builds, project sandboxes, etc. If you want to bootstrap a specific compiler version, a la nvm/pyenv/opam, use GHCup with a Cabal project setup: https://www.haskell.org/ghcup/
I like haskell a lot, but it's not like there's any shortage of reasons why people don't use it. Replicating parser-combinators in other languages is a huge win.
HN "bros" (in the ugliest sense of the word "bro") showing their sick nature by viciously downvoting a perfectly innocuous comment.
Seems like many of them have nothing better to do, probably because of layoffs and statistics and linear algebra and 'predicting the next token' (hee hee) in the input, based on gigantic corpuses of data, masquerading as "AI", and many of those bros were/are worthless anyway.
Whoops, it was a typo. I do know how to use the sed command, at least the basics; see my previous use of it ( https://news.ycombinator.com/item?id=42084984 ). But thanks, good catch.
But this is not unaided Haskell, it's a parser combinator library, isn't it?
Do you see an obvious reason why a similar approach won't work in Rust? E.g. winnow [1] seems to offer declarative enough style, and there are several more parser combinator libraries in Rust.
data Color = Color
{ r :: Word8
, g :: Word8
, b :: Word8
} deriving Show
hex_primary :: Parser Word8
hex_primary = toWord8 <$> sat isHexDigit <*> sat isHexDigit
where toWord8 a b = read ['0', 'x', a, b]
hex_color :: Parser Color
hex_color = do
_ <- char '#'
Color <$> hex_primary <*> hex_primary <*> hex_primary
Sure, it works in Rust, but it's a pretty far cry from being as simple or legible - there's a lot of extra boilerplate in the Rust.
I think it's a stretch to call parser combinator code in Haskell simple or legible. Most Haskell code is simple and legible if you know enough Haskell to read it, but Haskell isn't exactly a simple or legible language.
Haskell demonstrates the use of parser combinators very well, but I'd still use parser combinators in another language. Parser combinators are implemented in plenty of languages, including Rust, and actually doing anything with the parsed output becomes a lot easier once you leave the Haskell domain.
I'd say Haskell is even simpler than Rust: the syntactic sugar of monads/do-notation makes writing parsers easy. The same sugar transfers to most other problem domains.
But it doesn't take much to go from 0 to a parser combinator library. I roll my own each year for advent of code. It starts at like 100 lines of code (which practically writes itself - very hard to stray outside of what the types enforce) and I grow it a bit over the month when I find missing niceties.
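For illustration, the core of such a library really is tiny. In Rust (rather than Haskell), a parser can just be a function from input to an optional (value, remaining input) pair, and combinators compose those functions; a sketch:

```rust
// A tiny parser-combinator core: a parser is any Fn(&str) -> Option<(T, rest)>.
type PResult<'a, T> = Option<(T, &'a str)>;

// Match an exact string at the front of the input.
fn literal<'a>(expected: &'a str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input| input.strip_prefix(expected).map(|rest| (expected, rest))
}

// Apply `p` zero or more times, collecting results. Note: `p` must
// consume input on success, or this loops forever.
fn many<'a, T>(
    p: impl Fn(&'a str) -> PResult<'a, T>,
) -> impl Fn(&'a str) -> PResult<'a, Vec<T>> {
    move |mut input| {
        let mut out = Vec::new();
        while let Some((v, rest)) = p(input) {
            out.push(v);
            input = rest;
        }
        Some((out, input))
    }
}
```

From here, `map`, `or`, and a sequencing combinator follow the same pattern, and the type signatures keep you from straying far.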
I have experience writing parsers (lexers) in Ragel, using Go, Java, C++, and C.
I must say, once you have some boilerplate generator in place, raw C is as good as the Rust code the author describes. Maybe even better, because of its simplicity.
For example, this is the most of code necessary to have a JSON parser:
https://github.com/gritzko/librdx/blob/master/JSON.lex
So, just to kick this off: I wrote an eBPF disassembler and (half-hearted) emulator in Rust and I also found it a pleasant language to do parsing-type stuff in. But: I think the author cuts against their argument when they manage to necessitate a macro less than 1/6th of the way into their case study. A macro isn't quite code-gen, but it also doesn't quite feel like working idiomatically within the language, either.
Again: not throwing shade. I think this is a place where Rust is genuinely quite strong.
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import qualified Data.Text as T
type ParseError = String
csgParse :: T.Text -> Either ParseError Int
csgParse = eitherResult . parse parser where
parser = do
as <- many' $ char 'a'
let n = length as
count n $ char 'b'
count n $ char 'c'
char '\n'
return n
ghci> csgParse "aaabbbccc\n"
Right 3
You used a monadic parser, and monadic parsers are known to be able to parse context-sensitive grammars. But they hide the fact that they are combinators, implemented with closures underneath. For example, that "count n $ char 'b'" can be as complex as parsing a set of statements containing expressions with an operator specified (symbol, fixity, precedence) earlier in the code.
In Haskell, it is easy - parameterize your expression grammar with operators, apply them, parse text. This will work even with Applicative parsers, even unextended.
Using parser combinator library "nom", this should probably do what you'd want:
fn parse_abc(input: &str, n: usize) -> IResult<&str, (Vec<char>, Vec<char>, Vec<char>)> {
    let (input, result) = tuple((
        many_m_n(n, n, char('a')),
        many_m_n(n, n, char('b')),
        many_m_n(n, n, char('c')),
    ))(input)?;
    Ok((input, result))
}
It parses (the beginning of) the input, ensuring `n` repetitions of 'a', 'b', and 'c'. Parse errors are reported through the return type, and the remaining characters are returned for the application to deal with as it sees fit.
I can take my parser combinator library that I use for high-level compiler parsers, and use that same library in a no-std setting and compile it to a micro-controller, and deploy that as a high-performance protocol parser in an embedded environment. Exact same library! Just with fewer String and more &'static str.
So toying around with compilers translates my skill-set rather well into doing embedded protocol parsers.
That talk is great, but I remember some discussion later about Go actually NOT using this technique because of goroutine scheduling overhead and/or inefficient memory allocation patterns? The best discussion I could find is [1].
Another great talk about making efficient lexers and parsers is Andrew Kelley's "Practical Data Oriented Design" [2]. Summary: "it explains various strategies one can use to reduce memory footprint of programs while also making the program cache friendly which increase throughput".
The parallelism provided by the goroutines caused races and eventually led to abandoning the design in favor of the lexer storing state in an object, which was a more faithful simulation of a coroutine. Proper coroutines would have avoided the races and been more efficient than goroutines.
I feel like that talk has more to do with expressing concurrency, in problems where concurrency is a natural thing to think about, than it does with lexing.
Something that was hard when I wrote a full AST parser in Rust was representing a hierarchy of concrete AST types, with upcasting and downcasting. I was able to figure out a way, but it required some really weird type shenanigans (eg PhantomData) and some macros. Looks like they had to do crazy macros here too
Hmmm, yeah, Rust’s ADTs and matching syntax would be great, until you got to the up/down casting. I'm not experienced enough in Rust to know if there are good ways to handle it. Dynamic traits, maybe?
[trying to remind myself how this works because it's been a while]
So it's got macros for defining "union types", which combine a bunch of individual structs into an enum with same-name variants, and implement From and TryFrom to box/unbox the structs in their group's enum
ASTInner is a struct that holds the Any (all possible AST nodes) enum in its `details` field, alongside some other info we want all AST nodes to have
And then AST<TKind> is a struct that holds (1) an RC<ASTInner>, and (2) a PhantomData<TKind>, where TKind is the (hierarchical) type of AST struct that it's known to contain
AST<TKind> can then be:
1. Downcast to a TKind (basically just unboxing it)
2. Upcast to an AST<Any>
3. Recast to a different AST<TKind> (changing the box's PhantomData type but not actually transforming the value). This uses trait implementations (implemented by the macros) to automatically know which parent types it can be "upwardly casted to", and which more-specific types it can try and be casted to
The above three methods also have try_ versions
What this means then is you can write functions against, e.g., AST<Expression>. You will have to pass an AST<Expression>; an AST<BooleanLiteral> can be infallibly recast to an AST<Expression>, but an AST<Any> can only try_recast to AST<Expression> (returning an Option<AST<Expression>>)
Another cool property of this is that there are no dynamic traits, and the only heap pointers are the Rc's between AST nodes (and at the root node). Everything else is enums and concrete structs; the re-casting happens solely with that PhantomType, at the type level, without actually changing any data or even cloning the Rc unless you unbox the details (in downcast())
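A rough, hypothetical sketch of the shape of this pattern; the names (AnyNode, ASTInner, AST, Any, BooleanLiteral) mirror the description above but are my own reconstruction, not the actual codebase's API:

```rust
use std::marker::PhantomData;
use std::rc::Rc;

// The closed set of node variants, combined into one enum (done by macros
// in the described codebase; written out by hand here).
enum AnyNode {
    BooleanLiteral(bool),
    NumberLiteral(f64),
    // ... other node kinds
}

// Per-node data shared by all AST nodes; `details` holds the variant.
struct ASTInner {
    details: AnyNode,
}

// A typed handle over untyped data: TKind exists only at compile time,
// carried by PhantomData. Recasting never touches the data itself.
struct AST<TKind> {
    inner: Rc<ASTInner>,
    _kind: PhantomData<TKind>,
}

// Marker types standing in for levels of the hierarchy.
struct Any;
struct BooleanLiteral;

impl<TKind> AST<TKind> {
    // Infallible upcast: clone the Rc, swap the phantom tag.
    fn upcast(&self) -> AST<Any> {
        AST { inner: Rc::clone(&self.inner), _kind: PhantomData }
    }
}

impl AST<Any> {
    // Fallible recast: consult the runtime tag, keep the same Rc.
    fn try_recast_boolean(&self) -> Option<AST<BooleanLiteral>> {
        match self.inner.details {
            AnyNode::BooleanLiteral(_) => {
                Some(AST { inner: Rc::clone(&self.inner), _kind: PhantomData })
            }
            _ => None,
        }
    }
}

fn main() {
    let node = AST::<BooleanLiteral> {
        inner: Rc::new(ASTInner { details: AnyNode::BooleanLiteral(true) }),
        _kind: PhantomData,
    };
    let any = node.upcast();
    assert!(any.try_recast_boolean().is_some());
}
```

The real version would generate the trait impls for every valid parent/child pair via macros; the point here is just that the casts only change the phantom type parameter.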
I worked in this codebase for a while and the dev experience was actually quite nice once I got all this set up. But figuring it out in the first place was a nightmare
I'm wondering now if it would be possible/worthwhile to extract it into a crate
Well good luck parsing sqlite syntax! I had to write a (fairly small) subset sqlite parser for work a couple of years ago. I really like sqlite, it's always a source of inspiration.
So how do you debug code written with macros like this, or come into it as a new user of the codebase?
I’m imagining seeing the node! macro used, and seeing the macro definition, but still having a tough time knowing exactly what code is produced.
Do I just use the Example and see what type hints I get from it? Can I hover over it in my IDE and see an expanded version? Do I need to reference the compiled code to be sure?
(I do all my work in JS/TS so I don’t touch any macros; just curious about the workflow here!)
Rust is really several languages: "vanilla" Rust, declarative macros, and proc macros. Each has a slightly different capability set and a different dialect. You get used to working with each in turn over time.
Also, unit tests are generally a good playground area for understanding the impact of modifying a macro.
rust-analyzer, the Rust LSP used in e.g. VSCode, can expand declarative and proc macros recursively.
it isn't too bad, although the fewer proc macros in a code base, the better. declarative macros are slightly easier to grok, but much easier to maintain and test. (i feel the same way about opaque codegen in other languages.)
Imperative rust is really good for parsing, but you can also get a long way with regexes. Especially if you are just prototyping or doing Advent of Code.
I do still like declarative parsing over imperative, so I wrote https://docs.rs/inpt on top of the regex crate. But Andrew Gallant gets all the credit, the regex crate is overpowered.
I think, macros aside, most of these features are ML-family language features as well. Rust stands out because it implements them as efficient, zero-overhead abstractions.
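For instance, the ML-style ADT plus exhaustive pattern matching that the thread keeps coming back to looks like this in Rust (a minimal illustration of my own):

```rust
// An ML-style algebraic data type and an exhaustive pattern match over it.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    // The compiler rejects this match if a variant is missing.
    match e {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    // (2 + 3) * 4
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
        Box::new(Expr::Num(4)),
    );
    assert_eq!(eval(&e), 20);
}
```

OCaml or Haskell would express the same thing almost line for line; Rust's distinction is that the enum is laid out as a plain tagged union with no boxing beyond what you write yourself.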
There's plenty of complex programming languages out there. Some are worth putting the time into. If you can program well in some other language you can get your head around Rust - give it some time - it's worth it.
Does anyone have a good EBNF notation for Sqlite? I tried to make a tree-sitter grammar, which produces C code and great Rust bindings for it. But they use some lemon parser. Not sure how to read the grammar from that.
Also, I'm collecting several LALR(1) grammars here: https://mingodad.github.io/parsertl-playground/playground/, a Yacc/Lex-compatible online editor/interpreter that can generate EBNF (for railroad diagrams), SQL, and C++ from the grammars. Select "SQLite3 parser (partially working)" from "Examples", then click "Parse" to see the parse tree for the content in "Input source".
Not EBNF or anything standard, but possibly readable enough. It is an LR(1) grammar that was tested on all the test cases in SQLite's test suite at the time:
The grammar contains things you won't have seen before, like Prio(). Think of them as macros. It all gets translated to LR(1) productions, which you can ask it to print out. LR(1) productions are simpler than EBNF. They look like:
I'll throw in a plug for https://pest.rs/ a PEG-based parser-generator library in Rust. Delightful to work with and removes so much of the boilerplate involved in a parser.
I have been using this tool. The best feature imho is that you can quickly iterate on the grammar in the browser using the online editor in the homepage.
I was struggling, though, with the lack of strong typing in the returned parse tree. I think some improvements have been made there which I have not had a chance to look into yet.
I cannot agree less, C++ is the best and always will be. You youngsters made up this new dialect that can also compile with the C++ compiler. This is like people putting VS Code in dark mode thinking they're now also working in the Terminal like the Gods of Binary.
I expect they are thinking of the "Safe C++" proposal P3390. This proposes to provide the syntax and other features needed to grant (a subset of the future) C++ the same safety properties as safe Rust via an equivalent mechanism (a borrow checker for C++ and the lifetime annotations to drive it, the destructive move, the nominative typing and so on).
Much as you might anticipate (although perhaps its designer Sean Baxter did not), this was not kindly looked upon by many C++ programmers and members of WG21 (the C++ committee).
The larger thing that "Safe C++" and the reaction to it misses is that Rust's boon is its Culture. The "Safe C++" proposal gives C++ a potential safety technology but does not and cannot gift it the accompanying Safety Culture. Government programmes to demand safety will be most effective - just as with other types of safety - if they deliver an improved culture not just technological change.
That sounds significantly more like C++ trying to be a dialect of Rust, rather than the other way around. I don't think that was the GGP's main gripe.
But more importantly, Safe C++ is just not a thing yet. People seem to discount the herculean effort that was required to properly implement the borrow checker, the thousands of little problems that needed to be solved for it to be sound, not to mention a few really, really hard problems, like variance, lifetimes in higher-kinded trait bounds, generic associated types, and how lifetimes interact with a Hindley-Milner type system in general.
Not trying to discount Safe C++'s efforts of course. I really hope they, too, succeed. I also hope they manage to find a syntax that's less... what it is now.
I don't think Safe C++ has a Hindley-Milner type system? I think it's just the "Just the machine integers wearing funny hats†" types from C which were passed on to C++
In K&R C this very spartan type system makes some sense, there's no resources, you're on a tiny Unix machine, you'd otherwise be grateful for an assembler. In C++ it does look kinda silly, like an SUV with a lawnmower engine. Or one of those very complicated looking board games which turns out to just be Snakes and Ladders with more steps.
But I don't think Safe C++ fixes that anyhow.
† Technically maybe the C pointer types are not just the integers wearing a funny hat. That's one of many unresolved soundness bugs in the language, hence ISO/IEC DTS 6010 (which will some day become a TR)
No, Safe C++ does not have that type system. I was just trying to emphasize the amount of, let's be honest, downright genius that had to go into that lifetime specification and borrow checker implementation.
For C++, it'll be about cramming lifetimes into diamond-inheritance OOP, which... feels even harder.
Safe C sounds like a much, much more believable project, if such a proposal were to exist.
About 2 months ago I would have said the same as the author, but I kept running up against the hard edge of Rust: the borrow checker. I realised that while I really liked using algebraic data types (e.g. enums) and pattern matching, the borrow checker and the low-level memory concerns meant I spent a lot of time fighting the borrow checker instead of the PL issues at the heart of my project. So while tokenising/parsing was nice, interpreting and typechecking became the bane of my existence.
With that realisation I started looking for another, more suitable language. I knew the FP aspects of Rust were what I was looking for, so at first I considered something like F#, but I didn't like that it's tied to Microsoft/.NET. Looking a bit further, I could have gone with something like Zig/C, but then I'd lose the FP niceness I'm looking for. I also spent a fair amount of time looking at Go, but eventually decided against it because 1. I wanted a fair amount of syntactic sugar, and 2. golang is a server-side language; a lot of its features and libraries are geared towards that use case.
Finally I found OCaml. What really convinced me was seeing that the syntax is like a friendly version of Haskell, or like Rust without lifetimes. In fact the first Rust compiler was written in OCaml, and OCaml is well known in the programming-language space. I'm still learning OCaml so I'm not sure I can give a fair review yet, but so far it's exactly what I was looking for.
Bringing up golang always annoys me for some reason. Like, it's really practical: it is fast (though not actually low-level), it compiles fast, and most importantly it is very popular and has all the libraries. It seems like I should use it. But I just almost irrationally hate the language itself. Everything about it is just so ugly. It's a language invented in 2009 by some C-people who were apparently oblivious to everything that had excited PL-design folks for the preceding 20 years. PHP in 2009 was already a more modern and better-designed language than goddamn golang. And golang hasn't really improved since. I just cannot let it go somehow.
It is worse than that: Go initially lacked generics (introduced by CLU and ML in 1976), and it still doesn't do even basic Pascal enumerations (1970), offering the iota/const dance instead, let alone the 1990s' programming-language design surface.
I only advocate for it on the scenarios where a garbage collected C is more than enough, regardless of the anti-GC naysayers, e.g. see TamaGo Unikernel.
> still doesn't do even basic Pascal enumerations
The term you are looking for is sum types (albeit in a gimped form in the case of Pascal). Enumerations refer to the value applied to the type, quite literally, and is identical in Pascal as every other language with enumerations, including Go. There is only so much you can do with what is little more than a counter.
I'm fairly sure he's referring to enumerations actually.
Pascal doesn't require case matching of enumerations to be exhaustive, but this can be turned on as a compiler warning in modern Pascal environments, FreePascal / Lazarus and such.
Go only has enums by convention, hence the "iota dance" referred to. I've argued before that this does qualify as "having enums" but just barely.
It wouldn't have been difficult to do a much better job of it, is the thing.
> Pascal doesn't require case matching of enumerations to be exhaustive
Normally in Pascal you would not match on the enumeration at all, but rather on the sum types.
The only reason for enumerations in Pascal (and other languages with similar constructs) is that under the hood the computer needs a binary representation to identify the type, and an incrementing number (an enum) is a convenient source for an identifier. In a theoretical world where the machine is magic you could have the sum types without enums, but in this reality... Thus, yes, in practice it is possible to go around the type system and get the enumerated value out with Ord(foo), but at that point it's just an integer and your chance at exhaustive matching is out the window. It is the type system that allows more flexibility in what the compiler can tell you, not the values generated by the enumeration.
> Go only has enums by convention
"Enums by convention" would be manually typing 1, 2, 3, 4, etc. into the code. Indeed, that too is an enumeration, but not as provided by the language. Go actually has enums as a first-class feature of the language[1]. You even say so yourself later on, so this statement is rather curious. I expect you are confusing enums with sum types again.
[1] Arguably Pascal doesn't even have that, only using enums as an implementation detail to support its sum types. Granted, the difference is inconsequential in practice.
Pascal enums are not sum types, because they are not the sum of multiple types. They are an enumeration of discrete values, which is why they're called enums.
Sum types in Pascal are called variant records:
Rust conflates the 'enum' keyword with sum types. Pascal does not do this. One of us is confused about what a sum type is. It isn't me.

As for Go, my full opinion on that subject may be found here. If you're... curious. Let's say.
https://news.ycombinator.com/item?id=40224485
> This is the sum type
This is the exact same thing, except in addition to the tag there is also a data component. Yes, this is the more traditional representation of sum types, but having an "undefined" data component is still identifiable as a sum type. It is the tag that makes the union a sum type.
In Typescript terms, which I think illustrates this well, it is conceptually the difference between:
and

Which is to say that there is no difference with respect to the discussion here.

> They are an enumeration of discrete values
Yes, the "tag" is populated with an enumerator. There is an enumerator involved, that is certain, but it is outside of the type system as the user sees it. It's just an integer generated to serve as an identifier, like those seen in the above examples, but provided automatically. The additional information you can gain from it, like exhaustive matching, comes at the type level, not from the number itself.
> Rust conflates the 'enum' keyword with sum types.
Right, because it too uses an enumerator to generate the tag value. Like Ord(foo) before, you can access the enumerated value in Rust with something like
The spoken usage of 'enum' in Rust is ultimately misplaced, I agree. An enumerator is not a type! But it is not wrong in identifying that an enumerator is involved. It conflates 'enum' only in the very same way you have here.

There's absolutely a difference between an enumeration and a sum type.
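The access alluded to above is presumably Rust's `as` conversion of a fieldless enum variant to its integer discriminant; a minimal illustration (my own example, not the commenter's elided snippet):

```rust
// A fieldless Rust enum: each variant is backed by an integer discriminant,
// assigned by an enumerator much like Pascal's Ord() or Go's iota.
#[derive(Clone, Copy)]
enum Color {
    Red,   // 0
    Green, // 1
    Blue,  // 2
}

fn main() {
    // `as` exposes the underlying enumerated value.
    assert_eq!(Color::Red as i32, 0);
    assert_eq!(Color::Green as i32, 1);
    assert_eq!(Color::Blue as i32, 2);
}
```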
So every language has to implement every feature released in the last 50 years?
Not necessarily, but designing for stone-age computers isn't ideal either; even C, Fortran, and COBOL have progressed during those 50 years.
I expect it to implement features that have become ubiquitous in every other mainstream language from the last 30 years, yes.
But then what purpose would it serve? The last 30 years has brought no lack of new languages, not to mention evolution of older languages. Just use one of them.
The purpose of creating yet another language with Go was to break from what everyone else was doing, to see if a "simple" language would stop developers from playing with fun language toys all day to instead focus on actual engineering.
Arguably it was successful in that.
That's how you end up with C++ and soon C#.
No, the person you were replying to was advocating for the intersection of ubiquitous features. C++ seems to be aiming for the union.
We also end up with C23, Fortran 2023, COBOL 2023, Scheme R7RS,... even those oldies embrace modern ideas.
It’s really not. The strawman that if we add a few features that have stood the test of time in every other language we’ll end up with C++ is just not true. Nobody is proposing adding SFINAE-based conditional compilation, rvalue references, multiple inheritance, or any of the million other Byzantine features that make C++ virtually impossible to use correctly. Adding sum types and a match statement does not necessarily start you down that path.
When we can call it Go++
I know, I'm on the same boat. What I realized is I just need to avoid the companies using Go and I don't really need to be vocal about my dislike. It's not my loss if others find the language useful, but for me it either solves problems I'm not interested in solving, or the language and tooling just does not make it for me.
But, I can always just write Rust and be happy where I am. Or, to be honest, would not be very unhappy with F#, Haskell or Ocaml either.
> I just need to avoid the companies using Go
and they also will avoid you! A monthly go-lang meetup in San Francisco impressed me as the only meetup I have ever been to where no one (in a crowded venue) seemed to want to talk to anyone outside their clique
>What I realized is I just need to avoid the companies using Go
What do you mean, exactly?
I'd imagine not seeking employment at Google is a big part.
Go is definitely used a lot more outside of google as of late.
Anecdotally I would say where a lot of companies would have used Java in the past they are now turning to go for their server-side/backend service implementations.
Also it feels like pretty much everything in the k8s/container/etc. space is go-related, which kind of makes sense.
I've had to use go occasionally and it feels like the language is designed to stop me from achieving my goals.
The standard library is unimpressive (to be generous); it has plenty of footguns like C but none of its flexibility.
Also, for some reason, parentheses AND \n are both required, so you get the worst of C and Python there.
> The standard library is unimpressive (to be generous)
Coming from Python, this is one of the major things that I just can't get past with golang (despite having to use it for work). The standard library has a lot of really interesting/impressive/useful things to cover niche cases, but is missing a lot of what I would consider basic functionality that I keep running into requiring me to go get an external module to solve the problem.
Then, on top of that, the documentation for external modules is extremely terrible. In many cases the best you can get is API documentation in the form of "these are the functions, this is what they take and return" with no explanation of what those values need to be, what the function does with them, and so on; a simple list of functions. In others, there is that plus example code which doesn't work because it hasn't been updated since the last time backwards-incompatible changes were made so you end up down a rabbit hole of trying to debug someone else's wrong code.
The only thing letting me write effective golang at this point is that VSCode can autocomplete a lot of method calls, API calls, and so on, and then tell me what parameters they need, but even then I'm just guessing about what function might exist and what it might be called.
The language itself is okay and the more I use it the more I understand why they implemented all the stuff I hate (like a lack of proper error handling leading to half of my lines of code being boilerplate `if err != nil` blocks), but if the tooling around it wasn't so good no one would take it remotely seriously.
You're intended to run gofmt on every save. golang is designed to be a sort of straitjacket that forces everyone to write code in the same way (style etc.) so that junior devs can understand it clearly.
And so that people (of any level) don't bikeshed over silly things like tabs and spaces or { and newlines.
I really like this about go - that it formats code for you, and miss it in other languages where we have linters but not formatters, which is a terrible idea IMO.
What mainstream languages don’t have formatters nowadays? Rust has rustfmt, C and related languages have clang-tidy, python has Black…
I mean built in that everyone uses. There are of course third party formatters.
You can force everyone on a project to use a formatter just as easily in any of those languages as you can in Go, with a few lines in your CI job definitions. Whether they're third-party or not is irrelevant.
that's your preference but not universal
I’m convinced there’s a contingent of devs who don’t like/grok abstraction. And it overlaps partially with stated goals of an easy language to onboard inexperienced devs with.
Nothing wrong with that, but it will probably never work for me. Newer versions of Java are much more enjoyable to work with versus Go.
> I’m convinced there’s a contingent of devs who don’t like/grok abstraction.
I am one of those. I grok abstractions just fine (have commercially written idiomatically obtuse Scala and C#, some Haskell for fun, etc.), but I don't enjoy them.
I use them, of course (writing everything in raw asm is unproductive for most tasks), but rather than getting that warm fuzzy feeling most programmers seem to get when they finish writing a fancy clever abstraction and it works on the first try, I get it when I look at a piece of code I've written and realize there is nothing extraneous to take away, that it is efficient and readable in the sense of being explicit and clear, rather than hiding all the complexity away in order to look pretty or maximize more abstract concerns (reusability, DRY, etc.).
This mindset is a very good fit for writing compute-heavy numerical code, GPU stuff and lots of systems level code, not so much for being a cog in a large team on enterprise web backends, so I mostly write numerical code for physics simulations. You can write many other things this way and get very fast and bloatfree websites or anything else, but it doesn't work well in large teams or people using "industry best practices". It also makes me prefer C to Rust.
>I get it when I look at a piece of code I've written and realize there is nothing extraneous to take away,
"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."
- Antoine de Saint-Exupery
https://www.brainyquote.com/quotes/antoine_de_saintexupery_1...
> I’m convinced there’s a contingent of devs who don’t like/grok abstraction
I am one of them. I don't like Go, though. Enums and tagged unions aren't abstractions but fundamental features in my book. It's pretty transparent how they look in memory and there's nothing hidden about them.
What does confuse me are things like macros or annotations that magically insert something and make the code incomprehensible. I'm sure it's convenient to use, but it makes my brain try to manually translate it to simple instructions like a foreign language.
In my free time I like using Rust without custom traits (except a few iterators), that's close to the sweet spot for me.
I guess I’m in that camp. I can come up with a good abstraction after working on a problem for a while and refactor it into my code. Or I can come up with a really simple abstraction (eg a Go interface with 2-3 methods), and that usually works well. But I try to avoid starting a project by defining a bunch of abstractions, since I just end up writing loads of boiler plate. Yes, I’m probably doing some things wrong.
Sounds about right. Proper abstractions are difficult to get right up front, might as well pull them out only when they're obvious and profitable.
Strange that you bracket don't like/don't understand together like that.
The vast majority understand abstractions just fine, though each takes time to understand. However most people like their own abstractions best, and those of other people less. To me hell is living in a world of bad abstractions created by someone else.
Every abstraction created adds to the cognitive load when reading the code and to the maintenance burden of that code. So you have an abstraction budget, which is usually overspent IME and needs to be carefully controlled. Most of the most horrible codebases are horrible because they have too many of the wrong sort of abstraction.
Everyone lands at a different spot.
Personally, I don't want to write any new code in something that doesn't have ADTs, or the moral equivalent (Java's sealed classes). I've already written a lifetime of code without them, so I suppose part of that is not wanting to write another 20 years of the same code. :)
If you don’t like subclasses changing code, isn’t inheritance the real problem?
Man, Go gets a lot of hate on here. It's certainly not the most flexible language. If I want flexibility + speed, I tend to choose Nim for my projects. But for practical projects that I want other people to be able to pick up quickly, I usually opt for Go. I'm building a whole product manufacturing rendering system for my company, and the first-class parallelism and concurrency made it super pleasant.
I will say that the error propagation is a pain a lot of the time, but I can appreciate being forced to handle errors everywhere they pop up, explicitly.
So much of language opinion is based on people's progression of languages. My progression (of serious professional usage) looked like this:
Java -> Python -> C++ -> Rust -> Go
I have to say, given this progression going to Rust from C++ was wonderful, and going to Go from Rust was disappointing. I run into serious language issues almost daily. The one I ran into yesterday was that defer's function arguments are evaluated immediately (even if the underlying type is a reference!).
https://go.dev/play/p/zEQ77TIP8Iy
Perhaps with a progression Java -> Go -> Rust moving to rust could feel slow and painful.
I'm curious how one ends up with such ahistorical sequence. I'd expect it to be more aligned with the actual PL history. Mainstream PLs have had a fairly logical progression with each generation solving well understood problems from the previous one. And building on top of the previous generation's abstractions.
Turbo Pascal for education, C as professional lingua franca in mid-90s (manual memory management). C++ was all the rage in late 90s (OOP,STL) . Java got hot around 2003 (GC, canonical concurrency library and memory model). Scala grew in popularity around 2010-2012 (FP for the masses, much less verbosity, mainstream ADTs and pattern matching). Kotlin was cobbled together to have the Scala syntactic sugar without the Haskell-on-the-JVM complexity later.
And then they came up with golang which completely broke with any intellectual tradition and went back to before the Java heyday.
Rust feels like a Scala with pointers so the "C++ => Rust" transition looks analogous to the "Java => Scala" one.
>I'm curious how one ends up with such ahistorical sequence.
they are all actively in-use.. if gp is earlier in their career, it could all be in last 10 years.
Go is definitely of the “worse is better” philosophy. You can basically predict what someone will think of Go if you know how they feel about that design philosophy.
I remember that famous rant about how Go’s stdlib file api assumes Unix, and doesn’t handle Windows very well.
If you are against “worse is better” like the author, that’s a show stopping design flaw.
If you are for it, you would slap a windows if statement and add a unit test when your product crosses that bridge.
The problem is that most of the time, errors are not to be handled but only bubbled up. I've also seen it in Java with checked exceptions: the more explicit error handling is, the more developers feel they should somehow try to do _something_ with the error when the correct thing to do would actually be to fail in the most straightforward manner. The resulting code is often much heavier than necessary because of this and the stacktraces also get polluted by overwrapping.
I use golang for work and have done a fair amount of Rust programming. Rust feels like the higher level language. This really shouldn't be the case.
Go's error-handling patterns, while lacking every established feature that would make them ergonomic, are baffling.
Embarrassing that developers are still forgetting nil pointer checks in 2024.
The error handling is one of the worst parts of Go for me. Every call that can fail ends up being followed by 3 lines of error handling even if it's just propagating the error up. The actual logic get drowned out.
I would kill for some kind of `err_yield(err)` construct that handles propagating the error if it's the caller's problem to deal with.
That said, I discovered that Go has the ability to basically encapsulate one error inside of another with a message; for example, if you get an err because your HTTP call returned a 404, you can pass that up and say "Unable to contact login server: <404 error here>". But then the caller takes that error and says "Could not authenticate user: <login error here>", and _their_ caller returns "Could not complete upload: <authentication error here>" and you end up with a four-line string of breadcrumbs that is ostensibly useful but not very readable.
Python's `raise from` produces much more readable output, since it amounts to much more than just a bunch of concatenated strings that force you to follow the chain yourself to figure out where the error was.
Fast to compile, fast to run, simple cross-compilation, a big standard library, good tooling…
As ugly and ad-hoc as the language feels, it’s hard to deny that what a lot of people want is just good built-in tooling.
I was going to say that maybe the initial lack of generics helped keep compile times low for go, but OCaml manages to have good compile times and generics, so maybe that depends on the implementation of generics (would love to hear from someone with a better understanding of this).
There are a million little decisions that affect compile time. A big factor here is inlining. When you inline functions, you may improve the generated code or you may make it worse. It’s hard to predict the result because the improvements may come about because of various other code transformation passes which you perform after inlining. After inlining, the compiler detects that certain code paths are impossible, certain calls can be devirtualized, etc., and this can enable more inlining.
Rust is designed with the philosophy of zero-cost abstractions. (I don’t like the name, because the cost is never zero, but it is what it is.) The abstractions usually involve a lot of function calls and you need a compiler with aggressive inlining in order to get reasonable performance out of Rust. Usage of generics still results in the same non-virtual calls which can be inlined. But the compiler then has to do a lot of work to evaluate inlining for every instantiation of every generic.
Go is designed with the philosophy of simple abstractions, which may come with a cost. Generics are implemented in a way that means you are still doing a lot of dynamic dispatch. If you need speed in Go, you should be writing the monomorphic code yourself. Generics don’t get instantiated for every single type you use them with. They only get instantiated for every “shape” of type.
> Rust is designed with the philosophy of zero-cost abstractions. (I don’t like the name, because the cost is never zero, but it is what it is.)
So when the generated asm is the same between the abstraction and the non-abstraction version, where's the cost?
The point is that different people have different understandings of "cost." You're correct that that kind of cost is what "zero cost abstractions" means, but there are other costs, like "how long did it take to write the code," that people think about.
Cognitive cost is the most important cost to minimize.
A Rust project's cognitive cost budget comes out of what's left over after the language is done spending. This is true of any language, but many language designers do not discount cognitive costs to zero, which, with the "zero cost abstraction" slogan, Rust explicitly does.
Thank god I'm not the only one. I can still remember when the Go zealots were everywhere (it's cooled down now). Every feature Go didn't have was "too complicated and useless", while the few features it did have were "essential and perfect".
I've really tried giving Go a go, but it's truly the only language I find physically revolting. For a language that's supposed to be easy to learn, it made sooooo many weird decisions, seemingly just to be quirky. Every single other C-ish language declares types either as "String thing" or "thing: String". For no sane reason at all, Go went with "thing String". etc. etc.
I GENUINELY believe that 80% of Go's success has nothing to do with the language itself, and everything to do with it being bankrolled and used by a huge company like Google.
I recommend giving it a second chance. You will at least realize that the "thing string" problem isn't a problem, it's just something you find aesthetically displeasing.
One thing I've learned over the years is that if you go with the grain — not against it — of a language (or any system, really), the design tends to become apparent quicker. "When in Rome," and so forth. Cultural displeasure tends to disappear if you give the native way an earnest chance rather than resisting it. For example, in the beginning, marking identifiers as public by giving them a capital letter struck me as the ugliest thing ever. I don't mind it now. It's never going to be something I love looking at, but it does have the benefit of making declarations' visibility extremely obvious.
I don't think Go's popularity is due to Google at all. Google the company has never really promoted Go (unlike Microsoft with C# and Sun with Java, for example). Go is still treated as a bastard stepchild in many Google projects such as Protobuf/gRPC, Beam, and Google Cloud. The Go team has never seemed very enthusiastic about PR, either. There was that one big redesign of the Go site, but relatively little after that.
I think Go grew by word of mouth more than anything. Projects like Kubernetes, Prometheus, Traefik, etc. helped a lot. Don't forget that it took years for Go to become popular. It wasn't taken very seriously by many in the beginning. Go was not popular within Google until relatively recently. For many years the only serious thing written in Go internally at Google, as I understand it, was the dl.google.com backend.
I believe most successes of languages happen because of corporate backing and tooling/library ecosystem rather than language. It's not like most people are using Java because they are in love with the language features.
Personally I also think that if you removed memory safety overnight from Rust, people would still use it. Rust is appealing not because it's memory safe. For some uses it is, but most people flock to Rust because it offers an alternative to C++ without fifty years of accumulated cruft. Rust is a modern language, with a well-working package manager/build tool and a wide ecosystem of libraries for every use case. Memory safety and the other features are just a cherry on top. If Rust used garbage collection, I am sure it would still be very popular just because of those other things.
Other languages like D or Nim tried to fit into that space too, but they don't have the budget to really make it. Most of the work on those languages is done by unpaid volunteers, so there's little direction and a lot of one-man projects.
Whatever the reasons for using the language or not, tokenising and parsing are absolutely not a problem you want to solve with it.
> It's a language invented in 2009 by some C-people who are apparently oblivious to everything that excited PL design folks for the last 20 years (as of 2009).
Is there a term equivalent to "armchair quarterback" in programming? Most programmers are already in armchairs.
It's the equivalent of yelling at the TV that the ultra-successful mega-athlete sucks. I can't imagine the thought process that goes into thinking Ken Thompson, Rob Pike and Robert Griesemer are complete idiots who have no clue what they were doing.
No one said they are complete idiots.
They made a deliberate decision to design a language that did not take many developments in PL design since the 70's into account.
They had their reasons, which make sense in the context of their employer and their backgrounds.
Many people, myself included, prefer to program in languages that do not focus so much on simplicity.
It was not deliberate, it was ignorance. Time and again, the Go team made comments in various forums for years showing they really knew nothing about programming language development past 2000.
All they knew was C and that they wanted to create a language that compiles faster than C++. That's all.
You're talking about Ken Thompson, Rob Pike and Robert Griesemer, among others.
You're not doing yourself any favors.
golang feels like someone wanted to write a "web focused" version of C, but decided to ignore every issue and complaint about C raised in the past 25 years
It's a very simple and straightforward language, which I think is why people like it, but it's just a pain to use. It feels like it fights any attempt at using it to do things optimally or quickly.
> which I think is why people like it
Do people actually care that much about languages? I mean, we're here writing English, which is a complete dumpster fire. Go is undeniable perfection compared to the horror that is English. Clearly you and I don't care that much about languages.
I expect people like Go because of its tooling (what also saves English), which was a million miles ahead of the pack when it first came out. Granted, everyone else took notice, so the gap has started to narrow.
> It feels like it fights any attempt at using it to do things optimally or quickly.
Serious question: Is that because you are trying to write code in another language with Go syntax? Go unquestionably requires a unique mental model that doesn't transfer from other languages; even those that appear similar on the surface. Because of that, I posit that it is a really hard language to learn. It is easy to get something working, but I mean truly learn it.
While every programming language requires its own mental model, Go seems to take it to another level (before reaching a completely different paradigm). I expect that is because its lack of features prevents you from papering over "misuse" like is possible in other, more featureful languages, so you feel it right away instead of gradually being able to build the right mental model.
> But I just almost irrationally hate the language itself.
That's the point. It's a rejection of the keyboard jockeys who become more concerned with the code itself than the problem being solved.
Golang was created specifically so that Google could mitigate the downsides of lowering its hiring standards. It doesn't have any higher design aspirations.
"The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt." - Rob Pike
I suppose in a sense this is rejecting the "keyboard jockeys", but probably not in the way you mean.
You cannot separate the tool used to solve a problem from the problem itself. The choice of tool is a critical and central consideration.
I think you're giving far too much weight to that off the cuff quote from one of the creators of Go.
Really I think it's more useful to view it as a better C in the less is more tradition, compared to say C++ and Java, which at the time were pretty horrible. That's my understanding of its origin. It makes sense in that context; it doesn't have pretensions to be a super advanced state of the art language, but more of a workhorse, which as Pike pointed out here could be useful for onboarding junior programmers.
Certain things about it I think have proven really quite useful and I wish other languages would adopt them:
* It's easy to read precisely because the base language is so boring
* Programs almost never break on upgrade - this is wonderful
* Fewer dependencies, not more
* Formatters for code
Lots of little things (struct tags for example) I'm not so keen on but I think it's pretty successful in meeting its rather modest goals.
> Really I think it's more useful to view it as a better C
But Go is nothing at all like C, and it's completely unsuitable for most of the situations where C is used. I'm having trouble even imagining what you're getting at with this comparison. The largest areas of overlap I can think of are "vaguely similar syntax style" and "equally bad and outdated type system". Pretty much everything else of substance is different. Go is GC'd, Go has a runtime, etc.
Clearly you see it differently from people who use go, have a nice day.
That's saying the same thing. If you give someone the ability to understand a brilliant language, they will turn their attention to the language and away from the problem. That's just human nature. Shiny attracts us. Let's be honest, we all have way more fun diving deep into crafting the perfect type in Haskell. But Pike indicates that if you constrain developers to bounds that they already know, where they can't turn their attention to learning about what is brilliant, then they won't be compelled away from what actually matters – the engineering.
Is there any evidence that the Go style of constraints increases productivity or code quality or other metrics compared to "shiny" languages? I've heard that point repeated many times, but people have done a decent amount of engineering in many other languages too, without the need to be limited like that.
I expect nobody outside of Google has ever truly taken the time to study it. When was the last time you saw a programmer actually research the effectiveness of their tools and not just land on "I like this. It feels right." I never have!
But I'm not sure it matters. Go was created to test the theory, not because a theory was proven. It didn't have to be successful. It may be that the studies didn't happen even within Google, although that is where they were most likely to happen. We do know Google actually cares about data, unlike programmers.
That said, since Go was released it seems every other language has tried to copy it with their own twist, so while that may not come from a place of evidence, it would appear that the feeling of increased productivity[1] was felt.
[1] Or something adjacent. Focusing on engineering isn't necessarily about productivity. You can't discount productivity, but it is not the top engineering concern, especially in a place like Google.
I think one core thing that you have to do with ASTs in Rust is to absolutely not store strings or the like in your AST. Instead you want to use something like static string libraries (so you get cheap clones and interning), and for things like positions in text you want to use just indices. Absolutely avoid storing references if you can avoid it!
The more your stuff is held in things that are cheap/free to clone, the less you have to fight the borrow checker… since you can get away with clones!
And for actual interpretation there are libraries that can help a lot with memory management, with arenas and the like. It's super specialized stuff but helps to give you perf + usability. Projects like Ruffle use this heavily and it's nice when you figure out the patterns.
Having said that, OCaml and Haskell are both languages that will do all of this “for free” with their built-in GC… I just like the idea of going very fast in Rust.
I've been writing a lot of Golang in the last year and I wouldn't use it for writing a parser. It's just a modernised C; the model it provides is very simple (coming from C#, the simplicity actually made it harder to learn!) and is very well suited to small, focused applications where a low conceptual load is beneficial and the trade-off of verbosity is acceptable.
F# or even the latest version of C# are what I would recommend. Yes Microsoft are involved but if you're going to live in a world where you won't touch anything created by evil corporations then you're going to have a hard time. Java, Golang, Python, TypeScript/Javascript and Swift all suffer from this. That leaves you with very little choice.
I'd be interested in hearing your thoughts over OCaml after a year or so of using it. The Haskell-likes are very interesting but Haskell itself has a poor learning curve / benefit ratio for me (Rust is similar there actually; I mastered C# and made heavy use of the type system but that involved going very very deep into some holes and I don't have the time to do that with Rust).
F# and OCaml are still functionally identical to the point that many programs would compile in either, right? F#, OCaml and Rust seem a lot more similar to me than any of them are to Haskell, or Go for that matter. I like Haskell, but my brain hasn't made thinking that way native yet.
I wouldn't use it to write a hello world :D
Python suffers by having been created by an evil corporation?
Have I missed something GvR or his team did?
I wouldn't call Go a 'server side' language. The Go compiler is written in Go, for example! Cross compilation and (relatively) small binaries make it super easy for distribution. Syntax sugar is a fair point though, it doesn't lend itself to functional-y pattern matching.
> The Go compiler is written in Go, for example!
Do you know how they avoid the GC in the Go implementation of the Go compiler? If I understand correctly they need to implement the Go garbage collector in their Go implementation of the Go compiler. But Go already has a garbage collector. So how do they avoid invoking Go's garbage collector so that they can implement the garbage collector of the Go language they are implementing?
Not sure if I'm making sense but I'd like to know more about this from those who understand this more than I do.
We can think of the compiler as a function from a string to a string: from high-level code (HLC) to low-level code (LLC). LLC can include the garbage collection code (if it is run as a standalone executable instead of garbage collection being done by a separate runtime).
The compiler executable itself is running in a compilation process P which uses memory and has its own garbage collection. (The compiler executable was itself generated by a compilation, using a compiler written in Go itself (self-hosting) or, initially, in another language.)
But the compilation process P is unrelated to the process Q in which the generated code, LLC, will run when first executed. The OS which runs LLC doesn't even know about the compiler - LLC is just another binary file. The garbage collection in P doesn't affect garbage collection in Q.
Indeed, it should be easy for the compiler to generate an assembly program which constantly keeps allocating more memory until the system runs out, when compiling, say, a loop that runs a billion times and allocates a struct on each iteration. Unless, of course, you explicitly also generate a garbage collector as part of the low-level code.
Your question does become very interesting in the realm of security, there is a famous paper called "Trusting Trust" where a compiled compiler can still have backdoors even if the compiled code is trustworthy and the compiler code is trustworthy but the code which compiled the compiler had backdoors.
Remember that a compiler generates an executable file (can almost be thought of as an ASM transpiler), this file must contain everything the language needs to operate (oversimplification) so that includes the runtime as well as the compiled instructions from the user's code. This is compared to an interpreter which doesn't require you to pack all the implementation details into a binary, so instead you can use the host language's runtime.
All this to say: the output of a compiler is by necessity not tied to the language the compiler is written in, instead it is tied to the machine the executable should run on. A compiler "merely" translates instructions from a high level language to a machine executable one. So stuff like a GC must be coded, compiled and then "injected" into the binary so the user's code can interact with it. In an interpreted language this isn't necessary, since the host language is already running and contains these tools which would otherwise have to be injected into the binary.
They just use the implementation from the last version of the compiler, which you can follow back in a long chain to the first implementation. As for the implementation of the garbage collector, it probably just doesn't allocate anything. The basics of a garbage collector are a function "alloc" and another one "collect". The function to allocate memory usually looks something like this:
As you can see, it doesn't need to allocate any memory to do this.

> They just use the implementation from the last version of the compiler
The garbage collector isn’t part of the compiler, it’s part of the runtime. It’s worth being clear about this distinction because I think it’s the root of the OP’s confusion.
Sure, but what’ll really bake OP’s noodle is that Go’s GC is written in Go too. :) [0]
[0] https://go.dev/src/runtime/mgc.go
How does clang, a C++ compiler that is itself written in C++, use <feature from C++> that it is itself implementing?
Why wouldn’t it be able to?
I don’t understand how your question specifically relates to garbage collection, or why the compiler would need to avoid it. The Go compiler is a normal Go program and garbage collection works in it the same way it does in any other Go program.
I’ve never used Go myself, but according to this https://go.dev/doc/install/source you need a Go compiler to compile Go. However, for the early versions, you needed a C compiler to compile Go.
So at some point, someone wrote enough of a Go GC in C to support enough of Go to compile itself.
I don't understand the question as it's written.
But the shape of the question feels like you're asking about whether an interpreter (which the compiler is not) uses the GC of the host language?
I think they're asking how the code in the Go runtime (not the compiler, that being an interesting but also maybe non-obvious distinction!) that implements the garbage collector, a core feature of the language, avoids needing the garbage collector to already exist to be able to run, being written in the language that it's a core feature of. I suspect the answer is just something like "by very carefully not using language features that might tempt the compiler to emit something that requires an allocation". I think it's a fair question as it's not really obvious that that's possible--do you just avoid calling make() and new() and forming pointers to local variables that might escape? Do you need to run on a magical goroutine that won't try to grow its stack with gc-allocated segments? Can you still use slices (probably yes, just not append() or the literal syntax), closures (probably only trivial ones without local captures?), maps (probably no)...?
I think the relevant code is https://github.com/golang/go/blob/master/src/runtime/mgc.go and adjacent files. I see some annotations like //go:systemstack, //go:nosplit, //go:nowritebarrier that are probably relevant but I wouldn't know if there's any other specific requirements for that code.
Why would the runtime not be allowed to use GC? I can understand the question being: how do you implement the GC code without using GC?
Yeah that's the code I mean
On a high level the question is "how do you bootstrap X, if you need X to bootstrap X?".
This is correct but it's never as hard as it seems.
First, that is a problem only for the very first version of X. Then you use X for version X+1.
Second, building from source usually doesn't mean having to build every single dependency; some .so or .dll files are already on the system. Only when one has to build everything from scratch does the first step have to solve the original X-from-X problem, but I think even a Gentoo full system build doesn't start with a user keying bytes into RAM with switches (?) and setting the program counter of the CPU and its registers to eventually start the bootstrap process.
Definitely not making sense. Other answers appear to assume you don't know what a compiler is, but I'm not so sure. Re-state the question perhaps?
Well now I've got to go check out the Go compiler! That sounds really interesting. I was mainly referring to Go having much more developed concurrency features, which, while great, I didn't really want to use for my toy language; it seemed like I was throwing away a lot of what makes golang great just because of the nature of my project.
The rest of the golang ecosystem I found really nice actually, and imo it has a really great set of tools for reading/writing files. I also like that everything is a part of the go binary; it certainly is easier than juggling between opam and dune (used for OCaml, for example).
That's fair, the concurrency features are very handy though optional of course.
The ecosystem and tooling are great, probably the best I've worked with. But the main reason I reach for Go is that it's got tiny mental overhead. There's a handful of language features so it becomes obvious what to use, so you can focus on the actual goal of the project.
There are some warts of course. Heavy IO code can be riddled with err checks (actually, why I find it a bit awkward for servers). Similarly the stdlib is quite verbose when doing file system manipulation, I may try https://github.com/chigopher/pathlib because Python's pathlib is by far my favourite interface.
Lots of folks use Golang on the client side, even on mobile (for which Go has really great support with go-mobile). Of course it adds around 10-20 MB to your binary and memory footprint, but in today's world that's almost nothing. I think Tailscale e.g. uses Golang as a cross-platform WireGuard layer in their mobile and desktop apps; it seems to work really well. You wouldn't build a native UI using Golang of course, but for low-level stuff it's fantastic. TinyGo even allows you to write Golang for microcontrollers or the web via WebAssembly; lots of things aren't supported there, but a large part of the standard library is.
Saying the language adds 10-20 MB and then going on to say it's almost nothing is avoiding the issue raised. The footprint always matters, and we should use the right tool for the job.
It's not ignoring it, it's saying that 20 MB of data isn't really a lot these days, which is objectively true for most contexts.
Go is not a great language to write parsers in IMO; it just doesn't have anything that makes such a task nice. That being said, people seem to really dislike Go here, which is fine, but somewhat surprising. Go is extremely simple. If you take a look at its creators' pedigree, that should make a ton of sense: they want you to make small, focused utilities and systems that use message passing (typically via a network) as the primary means of scaling complexity. What I find odd about this is that it was originally intended as a C++ replacement.
Go is simple at the cost of increasing the complexity of stuff written in it.
Message passing is a horrible means of scaling complexity, unless you are so big and have so many developers working with you that you can't use anything else.
I love Go for writing servers. And in fact, I do it professionally. But I totally agree that for parsers, it’s not the right tool for the job.
First off, the only way to express union types is with runtime reflection. You might as well be coding in Python (but without the convenient syntax sugar).
Second off, “if err != nil” is really terrible in parsers. I'm actually somewhat of a defender of Go's error handling approach in servers. Sure, it could have used a more convenient syntax. But in servers, I almost never return an error without handling it or adding additional context. The same isn't true in parsers, though. Almost half of my parser code was error checks that simply wouldn't exist in other languages.
For Rust, I think the value proposition is if you are also writing a virtual machine or an interpreter, your compiler front end can be written in the same language as your backend. Your other alternatives are C and C++, but then you don’t have sum types. You could write the front end in Ocaml, but then you would have to write the backend and runtime in some other language anyways.
The borrow checker is definitely a pain, but it stops being such a pain once you design your types around ownership and passing around non-owned pointers or references or indexes.
This. I've found the same: being effective in Rust really requires that you change your way of thinking about your data structures (and code layout). Once I realized that, I was no longer fighting the borrow checker and I've been able to build complex code that more or less worked immediately. As I look back on it, I think what a pain it would have been to write and debug in C, although doing it in C would appear to be "easier".
Did you discover Scala 3 and give it a thought? I think of it as Rust with an _overall_ stronger type-system, but where you don't have to worry about memory management. It has an amazing standard library, particularly around collections. You get access to the amazing JVM ecosystem. And more. Martin Odersky in fact sees Scala's future lying in being a simpler Rust.
Also, regarding F#. It runs on .NET, and indeed, since the ecosystem and community are very small, you need to rely on .NET (basically C#) libraries. But it's really not "tied" to Microsoft and is open source.
OCaml has been pretty common tool to write parsers for many years. Not a bad choice.
I've written parsers professionally with Rust for two companies now. I have to say the issues you had with the borrow checker are just in the beginning. After working with Rust a bit you realize it works miracles for parsers. Especially if you need to do runtime parsing in a network service serving large traffic. There are some good strategies we've found out to keep the borrow checker happy and at the same time writing the fastest possible code to serve our customers.
I highly recommend taking a look at how flat vectors for the AST and typed vector indices work. E.g. you have a vector for types as `Vec<Type>` and fields in types as `Vec<(TypeId, Field)>`. Keep these sorted so you can implement lookups with a binary search, which works quite well with CPU caches and is definitely faster than a hashmap lookup.
The other cool thing with writing parsers with Rust is how there are great high level libraries for things like lexing:
https://crates.io/crates/logos
The cool thing with Logos is it keeps the source data as a string under the surface, and just refers to specific locations in it. Now use these tokens as the basis for your AST, which is all flat data structures and IDs. Simplify the usage with a type:
Now you can specialize these with type aliases, and implement methods on them. From here you can introduce string interning if needed; it's easy to extend. What I like about this design is how all the IDs and Walkers are Copy, so you can pass them around as you like. There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.

I understand Rust feels hard, especially in the beginning. You need to program more like you'd write C++, but with Rust you are forced to play it safe. I would say an amazing strategy is to first write a prototype in OCaml; it's really good for that. Then, if you need to be faster, do a rewrite in Rust.
Thanks for your comment, you've given me a lot to chew on and I think I'll need to bookmark this page.
> I've written parsers professionally with Rust for two companies now
If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've only built them non-professionally until now).
> Especially if you need to do runtime parsing in a network service serving large traffic.
I almost expected something like this, it just makes sense with how the language is positioned. I'm not sure if you've been following cloudflare's pingora blogs but I've found them very interesting because of how they are able to really optimise parts of their networking without looking like a fast-inverse-sqrt.
> There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.
I really like the sound of this, it wasn't necessarily confusing to work with Rc and Weak but more I had to put in a lot of extra thought up front (which is also valuable don't get me wrong).
> I would say an amazing strategy is to first write a prototype with OCaml, it's really good for that.
Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.
> If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now).
You do not need to write programming languages to need parsers and lexers. My last company was Prisma (https://prisma.io) where we had our own schema definition language, which needed a parser. The first implementation was nested structures and reference counting, which was very buggy and hard to fix. We rewrote it with the index/walker strategy described in my previous comment and got a significant speed boost and the whole codebase became much more stable.
The company I'm working for now is called Grafbase (https://grafbase.com). We aim to be the fastest GraphQL federation platform, which we are in many cases already due to the same design principles. We need to be able to parse GraphQL schemas, and one of our devs wrote a pretty fast library for that (also uses Logos):
https://crates.io/crates/cynic-parser
And we also need to parse and plan the operation for every request. Here, again, the ID-based model works miracles. It's fast and easy to work with.
> I really like the sound of this. It wasn't necessarily confusing to work with Rc and Weak, but more that I had to put in a lot of extra thought up front (which is also valuable, don't get me wrong).
These get _very annoying_ to work with. If you come from the `Weak` side to a model, you need to upgrade it first (and unwrap), which makes passing references either hard or impossible depending on what you want to do. It's also not great for CPU caches if your data is too nested. Keep everything flat and sorted. In the beginning it's a bit more work and thinking, but it scales much better when your project grows.
> Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.
You're already on the right path if you're interested in OCaml. Keep going.
I should've expected prisma! It's actually my main "orm" for my TS web projects, so thanks for that! Also grafbase seems interesting, I've had my fair share of issues with federated apollo servers so it'd be interesting to check out.
> If you come from the `Weak` side to a model, you need to upgrade it first (and unwrap), which makes passing references either hard or impossible depending on what you want to do.
You're literally describing my variable environment, eventually I just said fuggit and added a bunch of unsafe code to the core of it just to move past these issues.
There's also the phenomenal pest library. It probably wouldn't be as fast, but I've found that parsing usually isn't a performance-critical part of a system. If it is, manually writing the parser is definitely the way to go.
> Especially if you need to do runtime parsing in a network service serving large traffic
Yeah, that's the focus of it, and the kind of thing Rust serves well.
All the popular Rust parsing libraries aren't even focused on the use case most people mean by "parser". They can't support language parsing at all, but you only discover that after you've spent weeks fighting the type system to get to the real PL problems.
Rust itself is parsed by a set of specialized libraries that won't generalize to other languages. Everything else is aimed at parsing data structures.
There is also rust-analyzer, which is a separate binary. It should compile with a stable Rust compiler. I remember reading its source together with the Zig compiler's. Both are quite impressive codebases.
(OCaml Question Ahead)
I agree on F#. It changed my C && OO perspective in fantastic ways, but I too can't support anything Microsoft anymore.
But, seeing as OCaml was the basis for F#, I have a question:
Does OCaml allow the use of specifically sized integer types?
I seem to remember from my various explorations that OCaml just has a kind of "number" type. If I want a floating-point variable, I want a specific 32- or 64- or 128-bit version; same with my ints. I did very much like F# having the ability to specify the size and signedness of my int vars.
Thanks in advance, OCaml folks.
F# is a far better option from a practical standpoint when compared to the alternatives, by simple virtue of using .NET and having access to a very wide selection of libraries, which makes it a non-issue when deciding to solve a particular business case. It also has an alternate compiler, Fable, which can target JS, allowing the use of F# on the front-end.
Other options have worse support and weaker tooling, and often no more open a development process (e.g. you can see and contribute to ongoing F# work on GitHub).
This tired opinion ".NET bad because Microsoft bad" has zero practical relevance to actually using C# itself, and even more so F#, and it honestly needs to die out because it borders on mental illness. You can hate Microsoft products, I do so too, and still judge a particular piece of technology and the people that work on it on their merits.
You wouldn't be losing FP niceness with Zig, and the pattern matching and enum situation is also similar to Rust. Even better, in a few areas, for example arbitrary-width integers and enum tagging in unions/structs. Writing parsers and low level device drivers is actually quite comfortable in Zig.
I had a similar journey of enlightenment that likewise led me to OCaml. Unless you're doing low-level systems programming, OCaml will give you the "if it compiles, it's probably right" vibe with much less awkward stuff to type.
With some patience and practice, I think reasoning about borrows becomes second nature. And what it buys you with lexing/parsing is the ability to do zero-copy parsing.
F# is OCaml to a great extent. So if you don't have the need to stay away from MS/.NET: it is more "open source" than the rest of MS products. MS did release F# with an open-source license.
But, it does still run on .NET.
At this point, isn't every major language controlled by one main corporate entity?
Except Python? But Python doesn't have algebraic types, or very complete pattern matching.
I still believe that a variant of python that has algebraic types and pattern matching beats Rust for writing parsers quickly.
My effort has been in adding these features to a front end language that transpiles to an underlying FP language, including but not limited to Rust.
I think you missed something if you felt the borrow checker made things too hard. You can just copy and move on. Most languages do less efficient things anyway.
Oh no, you're right - especially looking at my last few commits this is very much what some parts of the project became. And when I was looking at it I felt like I was throwing away so much of the goodness Rust provides and it really irritated me.
Looking at Primeys' comment, he actually gave some really interesting suggestions on how to manage this without needing Rc/Weak pointers or copying loads of dynamic memory all over the place. Instead you have a flat structure of copyable elements, giving you better cache locality and a really easy way to work with them.
> I considered something like F# but I didn't like that it's tied to microsoft/.NET.
Could you explain your thought process when deciding to not use F# because it runs on top of .NET? (both of which are open-source, and .NET is what makes F# fast and usable in almost every domain)
I am genuinely curious too. .NET is a very mature, very performant runtime, and I think of F#, a beautiful, productive language, running on it as a big pro. Perhaps things used to be different with Microsoft?
Yeah. I'm having so much fun with F# that I absolutely did not anticipate. Sure, it's something everyone using .NET knows about but I genuinely underestimated it and wish more people gave it a try. Such a good language.
As for the hate: my pet theory is that developers need something like a sacrificial lamb to blame their misfortunes on, and a banner to rally under, which often happens to be "against that other group" or "against that competing language". Because .NET is a platform that happens to be made by Microsoft and is host to two very powerful multi-paradigm languages, it becomes a point of contention for many. From what I've seen, other languages do not receive so much undeserved hate, and here on HN some, like Go, Ruby, or the BEAM family, receive copious amounts of unjustified praise not rooted in technical merits.
Having written probably several hundred kloc of both Haskell and OCaml, I strongly prefer Haskell. A very simple core concept wrapped in an extremely powerful shell. Haskell is a lot better for parsing tasks because (among other considerations) its more powerful type system can better express constraints on grammars.
This is, to me, an odd way to approach parsing. I get the impression the author is relatively inexperienced with Rust and the PL ideas it builds on.
A few notes:
* The AST would, I believe, be much simpler defined as algebraic data types. It's not like the sqlite grammar is going to randomly grow new nodes that require the extensibility their convoluted encoding provides. The encoding they use looks like what someone familiar with OO, but not algebraic data types, would come up with.
* "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism, e.g. functions. The defining feature of macros is that they run at compile time.
* The work on parser combinators would be a good place to start to see how to structure parsing in a clean way.
> I get the impression the author is relatively inexperienced
The author never claimed to be an experienced programmer. The title of the blog is "Why I love ...". Your notes look fair to me, but calling out inexperience is unnecessary IMO. I love it if someone loves programming. I think that's great. Experience will come.
If someone didn't study the state of the art of tokenising and parsing and still wants to write about it, it's absolutely ok to call it out as being written by someone who has only a vague idea of what they're talking about.
>> I get the impression the author is relatively inexperienced
> calling out inexperience is unnecessary IMO. I love it if someone loves programming. I think that's great.
I'll observe that the commenter did not make the value judgement about inexperience that you appear to think they did.
> calling out inexperience is unnecessary IMO
I don't know the author, so it's useful for me to see in the comments that some people think they are not so experienced.
Doesn't mean I won't respect the author at all, it's great that they write about what they do!
"calling out" is too fuzzy a term to be useful. It covers "mentioning" and "accusing". I wouldn't use it, for that reason.
"unnecessary" is the same. Who defines what's necessary? Is Hacker News necessary?
It's definitely necessary: it provides an answer for those who do have knowledge about parsing, read this, and wonder why the author didn't use this other often-used practice instead.
Strongly disagree. There should be a higher standard of articles. This amateur "look what I can do" is just noise. Here's an idea, don't tell the world about what you've done unless it is something new. We don't care and it wastes our time and fills the internet with shit. Not everyone deserves a medal for pooping.
Posts on here sometimes come from the world expert, and sometimes from enthusiastic amateurs.
I wrote a compiler in school many years ago, but besides thinking "this project is only one a world class expert or an enthusiastic amateur would attempt", I wasn't immediately sure which I was dealing with.
> * "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism, e.g. functions. The defining feature of macros is that they run at compile time.
In the context of the blog post, he wants to generate structure definitions. This is not possible with functions.
I don't know. Having written a small parser [0] for Forsyth-Edwards chess notation [1], I find Haskell takes the cake here in terms of simplicity and legibility; it reads almost as clearly as BNF, and there is very little technical ceremony involved, letting you focus on the actual grammar of whatever it is you are trying to parse.
[0] https://github.com/ryandv/chesskell/blob/master/src/Chess/Fa...
[1] https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...
Haskell definitely takes the cake in terms of leveraging parser combinators, but you’re still stuck with Haskell to deal with the result.
That's what they call a "win-win".
For some of us, "being stuck with Haskell" isn't a problem.
For the rest, being stuck with real-world problems instead of self-inflicted ones is preferable :-)
Doesn't seem so, otherwise the computing world wouldn't be so full of NIH syndrome. :)
Re-inventing known language features in inferior languages isn't more real-world; it's self-inflicted kool-aid thirst ^_^
Has Cabal been fixed yet?
Yes, a long time ago [0]. Depending on your needs, stack might still have advantages as the direct tool used by the developer (as it uses cabal underneath anyway).
[0] https://stackoverflow.com/a/51016806/4126514
At least since August 2017: https://downloads.haskell.org/~cabal/Cabal-3.0.0.0/doc/users...
You don't need to switch to Stack (as other commenters suggest) to have isolated builds, project sandboxes, etc. If you want to bootstrap a specific compiler version, a la nvm/pyenv/opam, use GHCup with a Cabal project setup: https://www.haskell.org/ghcup/
Yes. Use stack [0].
[0] https://docs.haskellstack.org/en/stable/
I like haskell a lot, but it's not like there's any shortage of reasons why people don't use it. Replicating parser-combinators in other languages is a huge win.
As someone who really enjoys Haskell, I used to think like that. But I realized for problems like parsing, it really is just excellent.
Don't make it sound as if it's bad; it's actually superb on all these levels: the type level, the SMP runtime, and throughput.
$ echo "Haskell" | sed 's/ke/-ki'
Has-kill
$
HN "bros" (in the ugliest sense of the word "bro") showing their sick nature by viciously downvoting a perfectly innocuous comment.
Seems like many of them have nothing better to do, probably because of layoffs and statistics and linear algebra and 'predicting the next token' (hee hee) in the input, based on gigantic corpuses of data, masquerading as "AI", and many of those bros were/are worthless anyway.
| sed '/k/sk'
Has-skill
$
Write the full transform in Haskell?
Whoops, it was a typo. I do know how to use the sed command, at least the basics; see my previous use of it ( https://news.ycombinator.com/item?id=42084984 ). But thanks, good catch.
But this is not unaided Haskell, it's a parser combinator library, isn't it?
Do you see an obvious reason why a similar approach won't work in Rust? E.g. winnow [1] seems to offer declarative enough style, and there are several more parser combinator libraries in Rust.
[1]: https://docs.rs/winnow/latest/winnow/
I think it's a stretch to call parser combinator code in Haskell simple or legible. Most Haskell code is simple and legible if you know enough Haskell to read it, but Haskell isn't exactly a simple or legible language.
Haskell demonstrates the use of parser combinators very well, but I'd still use parser combinators in another language. Parser combinators are implemented in plenty of languages, including Rust, and actually doing anything with the parsed output becomes a lot easier once you leave the Haskell domain.
I'd say Haskell is even simpler than Rust: the syntactic sugar of monads/do-notation makes writing parsers easy. The same sugar transfers to most other problem domains.
The nom crate has an RGB parser example: https://docs.rs/nom/latest/nom/#example
It’s slightly longer, but more legible.
But it doesn't take much to go from 0 to a parser combinator library. I roll my own each year for advent of code. It starts at like 100 lines of code (which practically writes itself - very hard to stray outside of what the types enforce) and I grow it a bit over the month when I find missing niceties.
I wouldn't consider FEN a great parsing example, simply because it can be implemented in a simple function with a single loop.
Just a few days ago, I wrote a FEN "parser" for an experimental quad-bitboard implementation. It almost wrote itself.
P.S.: I am the author of chessIO on Hackage
I have experience writing parsers (lexers) in Ragel, using Go, Java, C++, and C. I must say, once you have some boilerplate generator in place, raw C is as good as the Rust code the author describes. Maybe even better, because of its simplicity. For example, this is most of the code necessary for a JSON parser: https://github.com/gritzko/librdx/blob/master/JSON.lex
In fact, that eBNF only produces the lexer. The parser part is not that impressive either: 120 LoC and quite repetitive. https://github.com/gritzko/librdx/blob/master/JSON.c
So, I believe, a parser infrastructure evolves till it only needs eBNF to make a parser. That is the saturation point.
That repetitiveness can be seen as a downside, not a virtue. And I feel that Rust's ADTs make working with the resulting syntax tree much easier.
Though I agree that a little code generation and/or macro magic can make C significantly more workable.
I love love love ragel.
Won't the code here:
https://github.com/gritzko/librdx/blob/master/JSON.lex
accept "[" as valid json?
(pick zero of everything in JSON except one delimiter...)

I usually begin with the RFCs:
https://datatracker.ietf.org/doc/html/rfc4627#autoid-3
I'm not sure one can implement JSON with Ragel... I believe Ragel can only handle regular languages, and JSON is context-free.
That is a lexer, so yes, it accepts almost any sequence of valid tokens. Pure Ragel only parses regular languages, but there are ways.
So, just to kick this off: I wrote an eBPF disassembler and (half-hearted) emulator in Rust and I also found it a pleasant language to do parsing-type stuff in. But: I think the author cuts against their argument when they manage to necessitate a macro less than 1/6th of the way into their case study. A macro isn't quite code-gen, but it also doesn't quite feel like working idiomatically within the language, either.
Again: not throwing shade. I think this is a place where Rust is genuinely quite strong.
How can one define an infinite grammar in Rust?
E.g., a context-free rule S ::= abc|aabbcc|aaabbbccc|... can effectively parse a^Nb^Nc^N, which is an example of a context-sensitive language.
This is a simple example, but something like that can be seen in practice. One example is when language allows definition of operators.
So, how does Rust handle that?
In Haskell I think it's something like:
The question comes from Haskell, yes: https://byorgey.wordpress.com/2012/01/05/parsing-context-sen...
You used a monadic parser; monadic parsers are known to be able to parse context-sensitive grammars. But they hide the fact that they are combinators, implemented with closures beneath. For example, that "count n $ char 'b'" can be as complex as parsing a set of statements containing expressions with an operator specified (symbol, fixity, precedence) earlier in the code.
In Haskell, it is easy - parameterize your expression grammar with operators, apply them, parse text. This will work even with Applicative parsers, even unextended.
But in Rust? I haven't seen how it can be done.
Using the parser combinator library "nom", this should probably do what you'd want:
It parses (the beginning of) the input, ensuring `n` repetitions of 'a', 'b', and 'c'. Parse errors are reported through the return type, and the remaining characters are returned for the application to deal with as it sees fit.

https://play.rust-lang.org/?version=stable&mode=debug&editio...
> this should probably do what you'd want
If you have to specify N, no, it doesn't.
Link us your eBPF disassembler if you can. Sounds cool.
It's not. If you wrote one, it'd be more interesting than mine.
One mind-blowing experience for me:
I can take my parser combinator library that I use for high-level compiler parsers, and use that same library in a no-std setting and compile it to a micro-controller, and deploy that as a high-performance protocol parser in an embedded environment. Exact same library! Just with fewer String and more &'static str.
So toying around with compilers translates my skill-set rather well into doing embedded protocol parsers.
Related, I love Rob Pike's talk about lexical Scanning in Go (2011).
Educational and elegant approach.
https://www.youtube.com/watch?v=HxaD_trXwRE
That talk is great, but I remember some discussion later about Go actually NOT using this technique because of goroutine scheduling overhead and/or inefficient memory allocation patterns? The best discussion I could find is [1].
Another great talk about making efficient lexers and parsers is Andrew Kelley's "Practical Data Oriented Design" [2]. Summary: "it explains various strategies one can use to reduce memory footprint of programs while also making the program cache friendly which increase throughput".
--
1: https://news.ycombinator.com/item?id=31649617
2: https://www.youtube.com/watch?v=IroPQ150F6c
Yeah I actually remember that too, this article mentions it:
Coroutines for Go - https://research.swtch.com/coro
The parallelism provided by the goroutines caused races and eventually led to abandoning the design in favor of the lexer storing state in an object, which was a more faithful simulation of a coroutine. Proper coroutines would have avoided the races and been more efficient than goroutines.
I feel like that talk has more to do with expressing concurrency, in problems where concurrency is a natural thing to think about, than it does with lexing.
Something that was hard when I wrote a full AST parser in Rust was representing a hierarchy of concrete AST types, with upcasting and downcasting. I was able to figure out a way, but it required some really weird type shenanigans (eg PhantomData) and some macros. Looks like they had to do crazy macros here too
Curious what the rest of the prior art looks like
Hmmm, yeah, Rust's ADTs and matching syntax would be great, until you got to the up/down casting. I'm not experienced enough in Rust to know if there are good ways to handle it. Dynamic traits maybe?
Sorry to bother you, but would that be open-source by any chance? Is there any public repo available? Thank you.
Yup! You can find it here: https://github.com/brundonsmith/bagel-rs/blob/master/src/mod...
[trying to remind myself how this works because it's been a while]
So it's got macros for defining "union types", which combine a bunch of individual structs into an enum with same-name variants, and implement From and TryFrom to box/unbox the structs in their group's enum
ASTInner is a struct that holds the Any (all possible AST nodes) enum in its `details` field, alongside some other info we want all AST nodes to have
And then AST<TKind> is a struct that holds (1) an RC<ASTInner>, and (2) a PhantomData<TKind>, where TKind is the (hierarchical) type of AST struct that it's known to contain
AST<TKind> can then be:
1. Downcast to a TKind (basically just unboxing it)
2. Upcast to an AST<Any>
3. Recast to a different AST<TKind> (changing the box's PhantomData type but not actually transforming the value). This uses trait implementations (implemented by the macros) to automatically know which parent types it can be "upwardly cast to", and which more-specific types it can try to be cast to
The above three methods also have try_ versions
What this means then is you can write functions against, e.g., AST<Expression>. You will have to pass an AST<Expression>, but e.g. an AST<BooleanLiteral> can be infallibly recast to an AST<Expression>, while an AST<Any> can only try_recast to AST<Expression> (returning an Option<AST<Expression>>)
Another cool property of this is that there are no dynamic traits, and the only heap pointers are the Rc's between AST nodes (and at the root node). Everything else is enums and concrete structs; the re-casting happens solely with that PhantomType, at the type level, without actually changing any data or even cloning the Rc unless you unbox the details (in downcast())
I worked in this codebase for a while and the dev experience was actually quite nice once I got all this set up. But figuring it out in the first place was a nightmare
I'm wondering now if it would be possible/worthwhile to extract it into a crate
I wrote my fair share of parsers over the last year, and the one I liked a lot is from the Salsa examples; you can find it here[0].
[0] https://github.com/salsa-rs/salsa/blob/e4d36daf2dc4a09600975...
Maybe this can work as a quick glimpse into how a parser and lexer can work in Rust: https://github.com/jmaczan/0x6b73746b
I wrote it a long time ago and it's not fully implemented, though.
Sorry, my OCD is kicking in, but "Asterisk" is spelled wrong as "Asteriks" throughout your sample code.
Well good luck parsing sqlite syntax! I had to write a (fairly small) subset sqlite parser for work a couple of years ago. I really like sqlite, it's always a source of inspiration.
The railroad diagrams are tremendously useful:
https://www.sqlite.org/syntaxdiagrams.html
I don't think the lemon parser generator gets enough credit:
https://sqlite.org/src/doc/trunk/doc/lemon.html
With respect to the choice of language, any language with algebraic data types would work great. Even TypeScript would be great for this.
FWIW I wrote a small introduction to writing parsers by hand in Rust a while ago:
https://www.nhatcher.com/post/a-rustic-invitation-to-parsing...
So how do you debug code written with macros like this, or come into it as a new user of the codebase?
I’m imagining seeing the node! macro used, and seeing the macro definition, but still having a tough time knowing exactly what code is produced.
Do I just use the Example and see what type hints I get from it? Can I hover over it in my IDE and see an expanded version? Do I need to reference the compiled code to be sure?
(I do all my work in JS/TS so I don’t touch any macros; just curious about the workflow here!)
Run `cargo expand` and you'll see the resulting code.

Rust is really several languages: "vanilla" Rust, declarative macros, and proc macros. Each has a slightly different capability set and a different dialect. You get used to working with each in turn over time.
Also unit tests is generally a good playground area to understand the impacts of modifying a macro.
rust-analyzer, the Rust LSP used in e.g. VSCode, can expand declarative and proc macros recursively.
it isn't too bad, although the fewer proc macros in a code base, the better. declarative macros are slightly easier to grok, but much easier to maintain and test. (i feel the same way about opaque codegen in other languages.)
Imperative rust is really good for parsing, but you can also get a long way with regexes. Especially if you are just prototyping or doing Advent of Code.
I do still like declarative parsing over imperative, so I wrote https://docs.rs/inpt on top of the regex crate. But Andrew Gallant gets all the credit, the regex crate is overpowered.
I think, except for macros, most of these features are ML-family language features as well. Rust stands out because it can implement them in an efficient, zero-overhead way.
I find Megaparsec in Haskell quite expressive, based on my limited experience using nom in Rust.
I've found that the logos crate is really nice for writing lexers in rust
https://docs.rs/logos/0.14.2/logos/
Mentioning macros as a reason to love Rust goes against my experience with them.
I love using macros; writing them, however, is another story.
This is the third day in a row this article has been posted here. This time it got traction. Funny how HN works.
https://news.ycombinator.com/item?id=42055954
https://news.ycombinator.com/item?id=42058920
Every rust article: "Look how great this rust feature is and how clean and concise the resulting code is!"
Me: "How can a programming language be so damn complex? Am I just dumb?"
There's plenty of complex programming languages out there. Some are worth putting the time into. If you can program well in some other language you can get your head around Rust - give it some time - it's worth it.
Does anyone have a good EBNF notation for SQLite? I tried to make a tree-sitter grammar, which produces C code and great Rust bindings for it. But they use some Lemon parser. Not sure how to read the grammar from that.
The Lemon tool used by SQLite can output the grammar as an SQL database that you can manipulate. There is https://github.com/ricomariani/CG-SQL-author that goes way beyond that, but you'll need to create the Rust generation; you can play with it here with a Lua backend: https://mingodad.github.io/CG-SQL-Lua-playground/ .
Also, I'm collecting several LALR(1) grammars here: https://mingodad.github.io/parsertl-playground/playground/ . It is a Yacc/Lex-compatible online editor/interpreter that can generate EBNF for railroad diagrams, SQL, and C++ from the grammars. Select "SQLite3 parser (partially working)" from "Examples", then click "Parse" to see the parse tree for the content in "Input source".
I also created https://mingodad.github.io/plgh/json2ebnf.html to have a unified view of tree-sitter grammars, and https://mingodad.github.io/lua-wasm-playground/ where there is a Lua script to generate an alternative EBNF for writing tree-sitter grammars that can later be converted to the standard "grammar.js".
Not EBNF or anything standard, but possibly readable enough. It is an LR(1) grammar that was tested on all the test cases in SQLite's test suite at the time:
https://lrparsing.sourceforge.net/doc/examples/lrparsing-sql...
The grammar contains things you won't have seen before, like Prio(). Think of them as macros. It all gets translated to LR(1) productions, which you can ask it to print out. LR(1) productions are simpler than EBNF.

Documentation on what the macros do, and how to get it to spit out the LR(1) productions, is here: https://lrparsing.sourceforge.net/doc/html/
It was used to do a similar task the OP is attempting.
This is great. Do you have any pointers to where those tests are? It’s hard to test the grammar without those.
Edit: Never mind. I see it right there under the parser. Thanks!
It looks pretty much like BNF. Not too far off, anyway. https://sqlite.org/src/doc/trunk/doc/lemon.html#syntax
Perhaps this ANTLR v4 sqlite grammar? [1]
--
1: https://github.com/antlr/grammars-v4/tree/master/sql/sqlite
I actually have some experience porting these antlrs over to tree-sitter. I'll give it a shot.
I'll throw in a plug for https://pest.rs/ , a PEG-based parser-generator library in Rust. Delightful to work with, and it removes so much of the boilerplate involved in a parser.
I have been using this tool. The best feature imho is that you can quickly iterate on the grammar in the browser using the online editor in the homepage.
I was struggling, though, with the lack of strong typing in the returned parse tree, though I think some improvements have been made there which I did not have a chance to look into yet.
That feature is on the roadmap for Pest 3: https://github.com/pest-parser/pest/issues/882
I cannot agree less, C++ is the best and always will be. You youngsters made up this new dialect that can also compile with the C++ compiler. This is like people putting VS Code in dark mode thinking they're now also working in the Terminal like the Gods of Binary.
Rust being a dialect of C++ is certainly a novel take.
I expect they are thinking of the "Safe C++" proposal, P3390. This proposes to provide the syntax and other features needed to grant (a subset of future) C++ the same safety properties as safe Rust via equivalent mechanisms (a borrow checker for C++ and the lifetime annotations to drive it, destructive moves, nominal typing, and so on).
Much as you might anticipate (although perhaps its designer Sean Baxter did not), this was not kindly looked upon by many C++ programmers and members of WG21 (the C++ committee).
The larger thing that "Safe C++" and the reaction to it misses is that Rust's boon is its Culture. The "Safe C++" proposal gives C++ a potential safety technology but does not and cannot gift it the accompanying Safety Culture. Government programmes to demand safety will be most effective - just as with other types of safety - if they deliver an improved culture not just technological change.
That sounds significantly more like C++ trying to be a dialect of Rust, rather than the other way around. I don't think that was the GGP's main gripe.
But more importantly, Safe C++ is just not a thing yet. People seem to discount the herculean effort that was required to properly implement the borrow checker, the thousands of little problems that needed to be solved for it to be sound, not to mention a few really, really hard problems, like variance, lifetimes in higher-ranked trait bounds, generic associated types, and how lifetimes interact with a Hindley-Milner type system in general.
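Variance alone is subtle: `&mut T` must be invariant in `T`, or a short-lived reference could be smuggled into a longer-lived slot. A minimal Rust sketch (function name invented for illustration):

```rust
// Why `&mut T` must be invariant in `T`: if it were covariant,
// this function could store a short-lived reference into a slot
// the caller keeps using after the referent is gone.
fn overwrite<'a>(slot: &mut &'a str, value: &'a str) {
    *slot = value;
}

fn main() {
    let mut s: &'static str = "hello";
    let owned = String::from("world");
    // overwrite(&mut s, owned.as_str()); // rejected: `&mut` is
    // invariant, so `'a` must be `'static`, which `owned` cannot
    // satisfy -- otherwise `s` would dangle once `owned` is dropped.
    overwrite(&mut s, "world"); // fine: a string literal is `'static`
    assert_eq!(owned, "world");
    println!("{s}");
}
```

A borrow checker has to get this right for every combination of references, generics, and trait bounds, which is part of why the implementation effort was so large.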
Not trying to discount Safe C++'s efforts of course. I really hope they, too, succeed. I also hope they manage to find a syntax that's less... what it is now.
I don't think Safe C++ has a Hindley-Milner type system? I think it just has the "machine integers wearing funny hats†" types inherited from C and passed on to C++
In K&R C this very spartan type system makes some sense: there are no resources, you're on a tiny Unix machine, you'd otherwise be grateful for an assembler. In C++ it does look kinda silly, like an SUV with a lawnmower engine. Or one of those very complicated-looking board games which turns out to just be Snakes and Ladders with more steps.
But I don't think Safe C++ fixes that anyhow.
† Technically maybe the C pointer types are not just the integers wearing a funny hat. That's one of many unresolved soundness bugs in the language, hence ISO/IEC DTS 6010 (which will someday become a TS)
No, Safe C++ does not have that type system. I was just trying to emphasize the amount of, let's be honest, downright genius that had to go into that lifetime specification and borrow checker implementation.
For C++, it'll be about cramming lifetimes into diamond-inheritance OOP, which... feels even harder.
Safe C sounds like a much, much more believable project, if such a proposal were to exist.