Show HN: Skeletoken, a Package for Editing Tokenizers

github.com

1 points by stephantul 12 hours ago

Hello!

I work on Hugging Face tokenizers a lot in my day job. Editing tokenizers (e.g., adding or removing tokens) is super painful, so I wrote a library for working with the Hugging Face tokenizer format.
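To show why hand-editing gets fiddly, here is a minimal sketch in plain Python. It does not use Skeletoken's API. It removes one token from a toy `tokenizer.json`-style dict that uses a WordLevel model. The `remove_token` helper and the toy vocabulary are hypothetical, made up for illustration. The point is that deleting a single entry leaves a gap in the ids, so every id above it has to be renumbered, and `added_tokens` keeps its own copy of ids that must stay in sync.

```python
import json

# Toy tokenizer.json-style dict with a WordLevel model (hypothetical data).
# Real files contain more sections and may use BPE, WordPiece, or Unigram
# models, which store their vocabularies differently.
tokenizer = {
    "added_tokens": [
        {"id": 0, "content": "[UNK]", "single_word": False, "lstrip": False,
         "rstrip": False, "normalized": False, "special": True},
    ],
    "normalizer": None,
    "pre_tokenizer": {"type": "Whitespace"},
    "model": {
        "type": "WordLevel",
        "vocab": {"[UNK]": 0, "hello": 1, "world": 2, "foo": 3},
        "unk_token": "[UNK]",
    },
}


def remove_token(tok: dict, token: str) -> dict:
    """Hypothetical helper: drop `token` from a WordLevel vocab and renumber ids."""
    tok = json.loads(json.dumps(tok))  # deep copy via a JSON round trip
    vocab = tok["model"]["vocab"]
    removed_id = vocab.pop(token)
    # Shift every id above the removed one down by one to close the gap.
    tok["model"]["vocab"] = {
        t: (i - 1 if i > removed_id else i) for t, i in vocab.items()
    }
    # added_tokens carry their own ids, so they must be shifted the same way.
    tok["added_tokens"] = [
        {**a, "id": a["id"] - 1 if a["id"] > removed_id else a["id"]}
        for a in tok["added_tokens"]
        if a["content"] != token
    ]
    return tok


edited = remove_token(tokenizer, "hello")
print(edited["model"]["vocab"])  # {'[UNK]': 0, 'world': 1, 'foo': 2}
```

With a BPE model the `merges` list would also need checking, since merges that produce or consume the removed token would otherwise be left dangling. That bookkeeping is the kind of thing a dedicated editing library can take off your hands.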

It contains many useful tools for working with tokenizers: checking them, making them lowercase, etc.

There are still loads of features to add, and probably some bugs to iron out, but I’ve been using it and it seems to work well!

Please let me know what you think.

Stéphan