Commit graph

40 commits

Author SHA1 Message Date
Ed Page
5ae7bda8eb style: Silence clippy 2022-05-16 09:09:17 -05:00
Ed Page
5c83dec07b style: Remove unused variable 2021-12-14 15:41:52 -06:00
Ed Page
c8d1058a71 refactor(dict): Change typos-dict to trie
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
bbbf985777 perf(dict): Switch varcon to a burst-trie
This cuts varcon lookup times in half but I still suspect slower than
phf.  Like with bsearch and unlike, the cost is consistent between hits
and misses.

At least this doesn't have the compile hit of PHF + unicase.  Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
a1e95bc7c0 refactor(dict): Pull out table-lookup logic
Before, only some dicts did we guarentee were pre-sorted.  Now, all are
for-sure pre-sorted.

This also gives each dict the size-check to avoid lookup.

But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
Ed Page
b1cf03c7eb refactor(varcon): Move away from PHF
This is mostly to give implementation flexibility for changing out how
we store the data to reduce compilation memory usage.

This does have performance impact, jumping from ~220ns to ~320ns for a
dict lookup, according to our micro benchmarks.
2021-06-04 14:59:46 -05:00
Ed Page
b99f32dea8 perf(dict): Bypass vars when possible
Variant support slows us down by 10-50$.  I assume most people will run
with `en` and so most of this overhead is to waste.  So instead of
merging vars with dict, let's instead get a quick win by just skipping
vars when we don't need to.  If the assumptions behind this change over
time or if there is need for speeding up a specific locale, we can
re-address this.

Before:
```
check_file/Typos/code   time:   [35.860 us 36.021 us 36.187 us]
                        thrpt:  [8.0117 MiB/s 8.0486 MiB/s 8.0846 MiB/s]
check_file/Typos/corpus time:   [26.966 ms 27.215 ms 27.521 ms]
                        thrpt:  [21.127 MiB/s 21.365 MiB/s 21.562 MiB/s]
```
After:
```
check_file/Typos/code   time:   [33.837 us 33.928 us 34.031 us]
                        thrpt:  [8.5191 MiB/s 8.5452 MiB/s 8.5680 MiB/s]
check_file/Typos/corpus time:   [17.521 ms 17.620 ms 17.730 ms]
                        thrpt:  [32.794 MiB/s 32.999 MiB/s 33.184 MiB/s]
```

This puts us inline with `--no-default-features --features dict`

Fixes #253
2021-05-19 13:55:41 -05:00
Ed Page
d65fa79d0e refactor(dict): Make feature flag paths clearer 2021-05-18 19:45:11 -05:00
Ed Page
639e65b88a fix(dict): Handle cases from Linux
These were found while running `typos` on Linux and inspecting a
sampling of the results.  #249 represents additional changes to make.
There were some identifiers, that looked like hardware registers, that
I'm unsure of what can be done for them.
2021-05-18 12:02:03 -05:00
Ed Page
fb0dac4297 refactor(dict): Allow 0..n corrections in BuiltIn
The main use case is taking `ther` -> `there` and adding `the` and
`their`.
2021-05-18 12:02:03 -05:00
Ed Page
04e55e4e85 fix(dict): Correctly connect dict with varcon
We had a bug where `finallizes` with EnGb would not correct to
`finalises`
2021-05-17 21:23:12 -05:00
Ed Page
b830872ad0 chore: Update enumflags2 2021-05-13 10:20:15 -05:00
Ed Page
cec850890c
Merge pull request #238 from epage/range
fix(dict)!: Clarify word sizes with Ranges
2021-05-01 08:54:08 -05:00
Ed Page
6216fa0837 fix(dict)!: Clarify word sizes with Ranges
The code was generated with separate min / max, rather than using a
Range and ensuring the API is used correctly.
2021-04-30 21:33:33 -05:00
Ed Page
2fc1f5468e chore(cli): Allow building without expensive parts
The obvious case is building for docs.rs but this can be helpful for
special use cases or faster development iteration.
2021-04-30 21:31:25 -05:00
Ed Page
9bfb506c6d fix(typos)!: Clarify Case::Uppers name
`Scream` was referrin to `SCREAMING_CASE` but outside of that context, I
think `Upper` is more accurate.
2021-04-21 20:36:35 -05:00
Ed Page
b17f9c3a12 feat: Const some fns 2021-03-29 20:27:06 -05:00
Ed Page
ce16d38cfd perf(dict): Skip checking numbers 2020-11-11 18:52:23 -06:00
Ed Page
482d320407 fix(dict): Ensure we fall through to built-in dict 2020-11-11 12:22:29 -06:00
Ed Page
6bdbd821e3 perf(dict): Avoid hashing unknwon words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change
```
real    0m24.432s
user    0m32.492s
sys     0m4.190s
```
2020-11-10 20:57:04 -06:00
Ed Page
beaa0f4091 perf(dict): Avoid hashing unknwon words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change:
```
real    0m24.060s
user    0m31.559s
sys     0m4.258s
```
2020-11-10 20:57:00 -06:00
Ed Page
18e31fa578 perf: Avoid hashing withut custom dict
`HashMap::get` (at least hashbrown) hashes before getting and doesn't
check if dict is empty.  For the custom dict, a common use case will
have the dict be empty.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

Bypassing `HashMap::get`
```
real    0m16.415s
user    0m14.519s
sys     0m4.118s
```

On a moderately sized repo.
2020-11-10 20:56:54 -06:00
Ed Page
150c5bfdc1 perf: Hash faster for custom dicts
If we have to hash for the custom dict, we might as well be fast about
it.  We do not need a cryptographically secure algorithm since the
content is fixed for the user.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With ahash:
```
real    0m23.993s
user    0m30.800s
sys     0m4.440s
```
2020-11-10 20:56:49 -06:00
Ed Page
527b9837b4 feat: Custom dictionary support
Switching `valid-*` to just `*` where you map typo to correction, with
support for always-valid and never-valid.

Fixes #9
2020-10-27 21:15:25 -05:00
Ed Page
043692afe0 feat(dict): Override builtin dictionary
Sometimes you just have to live with a typo or its done intentionally
(like weird company names).  With this commit, a user can now identifier
blessed identifiers and words.

This is ostly what is needed for #9 but sometimes people will have
common typos that they'll want to provide corrections for.
2020-09-02 20:24:54 -05:00
Ed Page
ab4a5bbdaf feat: Support english dialects
The goal is to be as accepting and unobtrusive to new code bases as
possible.  To this end, we correct typos into the closest english
dialect.

If someone wants to opt-in, they can have typos correct to a specific
english dialect.

Fixes #52
Fixes #22
2020-08-20 19:37:37 -05:00
Ed Page
bc1302f01b feat: Support multiple, valid corrections
Some of the other spell checkers already do this. While I've not checked
where we might need it for our dictionary, this will be important for
dialects.
2020-07-04 20:52:48 -05:00
Ed Page
ce1ef2ca30 refactor!: Move dict implementation into CLI 2019-10-28 11:00:47 -06:00
Ed Page
164ee9cb84 refactor: Split bin/lib into separate crates 2019-08-08 10:04:51 -05:00
Ed Page
adcbe68621 refactor(dict): Split out a trait 2019-07-27 19:50:36 -06:00
Ed Page
953064e7d1 fix(dict): Fix should match typo's case
Fixes #10
2019-06-26 07:22:59 -06:00
Ed Page
a5b8636bdb refactor(dict): Allow for owned corrections 2019-06-24 21:46:40 -06:00
Ed Page
859769b835 refactor: Rename Symbol to Identifier
This is more descriptive
2019-06-24 21:46:39 -06:00
Ed Page
3d1fb3b1ae feat(parse): Process words composing symbols 2019-06-15 22:21:40 -06:00
Ed Page
905de9bd8d chore(CI): Fighting clippy 2019-06-14 14:53:34 -06:00
Ed Page
f1e3163ba2 fix: Clippy 2019-06-14 07:04:58 -06:00
Ed Page
9ccfc9c27d fix: Clippy 2019-06-14 06:51:22 -06:00
Ed Page
af66072272 feat(dict): Perform case-insensitive comparisons 2019-06-13 19:55:21 -06:00
Ed Page
b6aabc9392 refactor: Switch to bytes for symbol lookup 2019-04-16 18:15:12 -06:00
Ed Page
85ee5cfac9 fix(api): Split lib 2019-01-24 08:24:20 -07:00