Before, when two file types matched the same glob, which file type won
was non-deterministic.
Now, the more specific file type wins. To decide this, we break up the
file name by its extensions and prioritize the more literal glob:
- If it's just `*`, then it's the lowest priority
- If it contains `*` and other logic, then it's next
- If it doesn't contain a `*`, then it's the highest priority
This leaves out other glob syntax like `{one,two}`, as those are
closed-ended and so are still considered specific.
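A minimal sketch of that ranking, assuming each candidate file type is represented by the glob that matched it. `glob_priority` and `pick_file_type` are hypothetical names, and breaking ties by file-type name is an assumption added here to keep the result deterministic; the rule above does not say how ties are handled.

```rust
/// Hypothetical specificity score for a file-type glob; higher wins.
/// Bare `*` < wildcard mixed with other text < no wildcard at all.
/// Closed-ended syntax like `{one,two}` is not inspected, so it counts as literal.
fn glob_priority(glob: &str) -> u8 {
    if glob == "*" {
        0
    } else if glob.contains('*') {
        1
    } else {
        2
    }
}

/// Given `(file type, glob)` pairs that all matched the same path, pick the
/// most specific one. Ties are broken by file-type name (an assumption) so
/// the result does not depend on input order.
fn pick_file_type<'a>(matches: &[(&'a str, &str)]) -> Option<&'a str> {
    matches
        .iter()
        .max_by(|(ty_a, glob_a), (ty_b, glob_b)| {
            glob_priority(glob_a)
                .cmp(&glob_priority(glob_b))
                // Reversed name comparison: the alphabetically first type wins a tie.
                .then_with(|| ty_b.cmp(ty_a))
        })
        .map(|(ty, _)| *ty)
}
```

For example, if a hypothetical `("lock", "*.lock")` and `("rust", "Cargo.lock")` both match, the literal `Cargo.lock` glob wins regardless of the order the pairs arrive in.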
Fixes #487
The previous method misaligned highlights when a line contained
double-width Asian characters:
```
39 | 한글 eglish
   |    ^^^^^^
```
This commit fixes the highlight to have correct alignment.
```
39 | 한글 eglish
   |      ^^^^^^
```
The `unicode-rs` crate used here is also used by the Rust compiler [1].
[1]: 34a6c9f26e/compiler/rustc_errors/src/emitter.rs (L861)
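The core of the fix is padding the caret line by display width instead of by character count. The sketch below assumes a simplified `char_width` that treats only a few CJK and Hangul ranges as double-width; it is an approximate stand-in for the `unicode-width` crate, not a copy of it, and `underline` is a hypothetical helper.

```rust
/// Approximate terminal width of a character: 2 for common CJK and Hangul
/// ranges, 1 for everything else. A simplified stand-in for `unicode-width`.
fn char_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F       // Hangul Jamo (initial consonants)
        | 0x2E80..=0x303E     // CJK radicals, symbols and punctuation
        | 0x3041..=0x33FF     // Hiragana, Katakana, CJK compatibility
        | 0x3400..=0x4DBF     // CJK Unified Ideographs Extension A
        | 0x4E00..=0x9FFF     // CJK Unified Ideographs
        | 0xAC00..=0xD7A3     // Hangul Syllables
        | 0xF900..=0xFAFF     // CJK Compatibility Ideographs
        | 0xFF01..=0xFF60     // Fullwidth forms
        | 0xFFE0..=0xFFE6 => 2,
        _ => 1,
    }
}

/// Display width of a string in terminal columns.
fn str_width(s: &str) -> usize {
    s.chars().map(char_width).sum()
}

/// Build the caret line underlining `len_bytes` bytes starting at
/// `start_byte` in `line`. Both offsets must fall on char boundaries.
/// Padding uses display width, not the number of characters.
fn underline(line: &str, start_byte: usize, len_bytes: usize) -> String {
    let prefix = &line[..start_byte];
    let target = &line[start_byte..start_byte + len_bytes];
    format!("{}{}", " ".repeat(str_width(prefix)), "^".repeat(str_width(target)))
}
```

For `한글 eglish`, the prefix `한글 ` is 3 characters but 5 columns wide, so `underline("한글 eglish", 7, 6)` pads with 5 spaces rather than 3.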
`go.mod` seems to be a specification file, which we tend to lump in with
the language itself, since a weirdly spelled dependency name will likely
show up in code too.
`go.sum` seems to be more like a lock file, which we quarantine into its
own file type.
Fixes #458
First, this centralizes the concept of lock files, focusing on intent
rather than syntax. We assume Python's `requirements.txt` is being used
like a regular lock file and not as a dependency specification.
Second, we ignore the content of lock files. Although a lock file will
generally contain names that could also show up in a dependency
specification, its large dependency tree makes those harder to manage.
We still check the dependency specification file, and its names will
match the user's code.
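A sketch of the intent-based split, assuming lock files are recognized by exact base file name and that ignoring the content means the body of the file is skipped entirely. `Handling`, `LOCK_FILES`, and `handling_for` are hypothetical names; the two file names are the ones mentioned above.

```rust
/// How a file's contents are treated during spell checking (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Handling {
    /// Regular file, including dependency specifications: check the content.
    Check,
    /// Lock file: recognized by name, but its content is not checked.
    SkipContent,
}

/// Lock files grouped by intent rather than syntax: `go.sum` is a checksum
/// list and `requirements.txt` is plain text, yet both act as lock files.
const LOCK_FILES: &[&str] = &["go.sum", "requirements.txt"];

/// Decide how to treat a file given its base name (not a full path).
fn handling_for(file_name: &str) -> Handling {
    if LOCK_FILES.contains(&file_name) {
        Handling::SkipContent
    } else {
        Handling::Check
    }
}
```

Keeping the decision in one list means a new ecosystem's lock file only needs to be added in one place.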
Fixes #445
For `rg`, keeping the file types strict makes sense. For spell
checking, `Cargo.toml` is much more closely related in handling to
`*.rs` than it is to `pyproject.toml`, due to ecosystem package names.
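A sketch of a spell-checking type table that groups manifests with their language, assuming each glob is either a literal file name or `*` followed by a suffix. `TYPES`, `glob_matches`, and `file_type` are hypothetical, and placing `go.mod` with `go` and `pyproject.toml` with `python` is an assumption following the reasoning above.

```rust
/// Hypothetical type table: each language type also owns its ecosystem's
/// manifest, since the manifest names the same packages the code imports.
const TYPES: &[(&str, &[&str])] = &[
    ("rust", &["*.rs", "Cargo.toml"]),
    ("go", &["*.go", "go.mod"]),
    ("python", &["*.py", "pyproject.toml"]),
];

/// Minimal glob support: a leading `*` means "ends with", otherwise exact match.
fn glob_matches(glob: &str, file_name: &str) -> bool {
    match glob.strip_prefix('*') {
        Some(suffix) => file_name.ends_with(suffix),
        None => glob == file_name,
    }
}

/// Return the first type whose globs match the given base file name.
fn file_type(file_name: &str) -> Option<&'static str> {
    TYPES
        .iter()
        .find(|(_, globs)| globs.iter().any(|g| glob_matches(g, file_name)))
        .map(|(name, _)| *name)
}
```

With this grouping, `Cargo.toml` and `main.rs` share the `rust` type and therefore the same handling, while `rg` would keep them apart.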
Part of #362
This cuts varcon lookup times in half, but I still suspect it is slower
than phf. Like with bsearch and unlike, the cost is consistent between
hits and misses.
At least this doesn't have the compile hit of PHF + unicase. Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
Before, we only guaranteed that some dicts were pre-sorted. Now, all of
them are guaranteed to be pre-sorted.
This also gives each dict the size check, so a lookup can be skipped
early.
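A minimal sketch of what a pre-sorted dict with a size check could look like, assuming ASCII case-insensitive keys and a check on word length. `Dict`, `new`, `find`, and the sample entries are hypothetical; the `debug_assert!` in `new` shows one way to make the pre-sorted guarantee checkable.

```rust
use std::cmp::Ordering;
use std::ops::RangeInclusive;

/// Hypothetical dictionary: entries sorted by ASCII-lowercased key, plus the
/// range of key lengths so out-of-range words never reach the search.
struct Dict {
    entries: &'static [(&'static str, &'static str)],
    word_len: RangeInclusive<usize>,
}

/// ASCII case-insensitive ordering, used both to verify sorting and to search.
fn cmp_ignore_ascii_case(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_lowercase())
        .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
}

impl Dict {
    fn new(
        entries: &'static [(&'static str, &'static str)],
        word_len: RangeInclusive<usize>,
    ) -> Self {
        // Binary search is only correct if keys are strictly sorted by the
        // same ordering used in `find`.
        debug_assert!(entries
            .windows(2)
            .all(|w| cmp_ignore_ascii_case(w[0].0, w[1].0) == Ordering::Less));
        Dict { entries, word_len }
    }

    fn find(&self, word: &str) -> Option<&'static str> {
        // Size check: reject words whose length no key has.
        if !self.word_len.contains(&word.len()) {
            return None;
        }
        self.entries
            .binary_search_by(|(key, _)| cmp_ignore_ascii_case(key, word))
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```

The length check costs one comparison and skips the search for words outside the range; misses inside the range still cost a full binary search.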
But this is really about refactoring in prep for playing with other
lookup options, like tries.