Ed Page
32f5e6c682
refactor(typos)!: Bake ignores into parser
...
This is prep for other items to be ignored
BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens. Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
ded90f2387
perf(parser): Auto-detect unicode
...
For smaller, ascii-only content, this seems to be taking ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
Ed Page
95417f3a41
refactor(parser): Consolidate utf8/ascii logic
2021-06-29 05:10:02 -05:00
Ed Page
83b2804623
fix(ci): Don't fail codegen checks
2021-06-28 14:06:47 -05:00
Ed Page
4066d21790
style: Address clippy
2021-06-28 13:51:06 -05:00
Ed Page
3a4d039c4f
chore: Reduce code-gen memory usage
...
More `const fn` removals to reduce compilation memory use
2021-06-07 08:58:34 -05:00
Ed Page
04f5d40e57
chore: Release
2021-06-05 14:39:37 -05:00
Ed Page
2b1f565eaa
refactor(varcon): Remove reliance on const-fn
...
This dropped RSS (memory usage) from 4GB to 1.5GB when compiling.
The extra `match` could impact performance but not too concerned since
the default is to not look within vars.
2021-06-04 15:01:08 -05:00
Ed Page
b1cf03c7eb
refactor(varcon): Move away from PHF
...
This is mostly to give implementation flexibility for changing out how
we store the data to reduce compilation memory usage.
This does have performance impact, jumping from ~220ns to ~320ns for a
dict lookup, according to our micro benchmarks.
2021-06-04 14:59:46 -05:00
Ed Page
1cb9b37120
chore: Update codespell dict
...
Based on 2ed354c at https://github.com/codespell-project/codespell
2021-05-22 21:44:56 -05:00
Ed Page
3e66a99674
chore: Release
2021-05-21 20:41:02 -05:00
Ed Page
3995745362
chore: Release
2021-05-21 20:39:12 -05:00
Ed Page
b99f32dea8
perf(dict): Bypass vars when possible
...
Variant support slows us down by 10-50$. I assume most people will run
with `en` and so most of this overhead is to waste. So instead of
merging vars with dict, let's instead get a quick win by just skipping
vars when we don't need to. If the assumptions behind this change over
time or if there is need for speeding up a specific locale, we can
re-address this.
Before:
```
check_file/Typos/code time: [35.860 us 36.021 us 36.187 us]
thrpt: [8.0117 MiB/s 8.0486 MiB/s 8.0846 MiB/s]
check_file/Typos/corpus time: [26.966 ms 27.215 ms 27.521 ms]
thrpt: [21.127 MiB/s 21.365 MiB/s 21.562 MiB/s]
```
After:
```
check_file/Typos/code time: [33.837 us 33.928 us 34.031 us]
thrpt: [8.5191 MiB/s 8.5452 MiB/s 8.5680 MiB/s]
check_file/Typos/corpus time: [17.521 ms 17.620 ms 17.730 ms]
thrpt: [32.794 MiB/s 32.999 MiB/s 33.184 MiB/s]
```
This puts us inline with `--no-default-features --features dict`
Fixes #253
2021-05-19 13:55:41 -05:00
Ed Page
639e65b88a
fix(dict): Handle cases from Linux
...
These were found while running `typos` on Linux and inspecting a
sampling of the results. #249 represents additional changes to make.
There were some identifiers, that looked like hardware registers, that
I'm unsure of what can be done for them.
2021-05-18 12:02:03 -05:00
Ed Page
fb0dac4297
refactor(dict): Allow 0..n corrections in BuiltIn
...
The main use case is taking `ther` -> `there` and adding `the` and
`their`.
2021-05-18 12:02:03 -05:00
Ed Page
77cfccb392
refactor(varcon): Clarify check's meanings
2021-05-15 19:29:27 -05:00
Ed Page
b830872ad0
chore: Update enumflags2
2021-05-13 10:20:15 -05:00
Ed Page
7c803681c4
chore: Release
2021-05-13 09:58:09 -05:00
Ed Page
3b9061dece
Merge pull request #240 from crate-ci/dependabot/cargo/codegenrs-1.0.0
...
chore(deps): Bump codegenrs from 0.1.5 to 1.0.0
2021-05-01 09:04:51 -05:00
dependabot[bot]
d72fa7acba
chore(deps): Bump codegenrs from 0.1.5 to 1.0.0
...
Bumps [codegenrs](https://github.com/crate-ci/codegenrs ) from 0.1.5 to 1.0.0.
- [Release notes](https://github.com/crate-ci/codegenrs/releases )
- [Changelog](https://github.com/crate-ci/codegenrs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/crate-ci/codegenrs/compare/v0.1.5...v1.0.0 )
Signed-off-by: dependabot[bot] <support@github.com>
2021-05-01 07:01:59 +00:00
Ed Page
6216fa0837
fix(dict)!: Clarify word sizes with Ranges
...
The code was generated with separate min / max, rather than using a
Range and ensuring the API is used correctly.
2021-04-30 21:33:33 -05:00
Ed Page
f40ed5a328
style: Address clippy
2021-04-30 11:37:16 -05:00
Ed Page
517da7ecd2
perf(parser): Allow people to bypass unicode cost
2021-04-29 21:07:59 -05:00
Ed Page
09d2124d0f
perf(parser): Limit inner-loop assers
2021-04-29 18:31:05 -05:00
Ed Page
287c4cbfe9
refactor(parser): Give more impl flexibility
2021-04-29 18:31:05 -05:00
Ed Page
9cbc7410a4
fix(parser)!: Defer to Unicode XID for identifiers
...
This saves us from having to have configuration for every detail. If
people need more control, we can offer it later.
Fixes #225
2021-04-29 18:30:57 -05:00
Ed Page
f15cc58f71
fix(parser): Flip leading digits to work correctly
2021-04-29 18:30:14 -05:00
Ed Page
4b94352b7a
perf(parser): Try hand-rolled number parsing
2021-04-29 18:30:14 -05:00
Ed Page
6b92e345cc
perf(parser): Speed up UTF-8 validation
2021-04-27 21:17:46 -05:00
Ed Page
819702c82f
refactor(parser): Unify str/bytes code paths
...
The main goal is to support replacing the parser with `nom` where I need
access to `str` only functionality.
With crates like simdutf8, this might also offer up performance gains
since they see the biggest benefit when doing large blocks of
validation.
2021-04-27 21:17:43 -05:00
Ed Page
fce11d6c35
refactor(parser)!: Allow short-circuiting word splitting
...
This is prep for experiments with getting this information ahead of
time.
See #224
2021-04-27 21:17:38 -05:00
Ed Page
9bfb506c6d
fix(typos)!: Clarify Case::Upper
s name
...
`Scream` was referrin to `SCREAMING_CASE` but outside of that context, I
think `Upper` is more accurate.
2021-04-21 20:36:35 -05:00
Ed Page
1f4c587692
chore({{crate_name}}): Release {{version}}
2021-04-14 19:13:25 -05:00
Ed Page
b4459bef33
chore: Fix readme paths in Cargo.toml
2021-04-13 21:36:47 -05:00
Ed Page
d7978658d4
test(cli): Ensure we apply corrections
2021-04-10 19:13:48 -05:00
Ed Page
b5f606f201
refactor(typos): Simplify the top-level API
2021-03-01 11:50:23 -06:00
Ed Page
1010d2ffe5
refactor(tokenizer): Remove stale function
2021-03-01 11:50:23 -06:00
dependabot-preview[bot]
b8d3190ce9
chore(deps): bump itertools from 0.9.0 to 0.10.0
...
Bumps [itertools](https://github.com/bluss/rust-itertools ) from 0.9.0 to 0.10.0.
- [Release notes](https://github.com/bluss/rust-itertools/releases )
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md )
- [Commits](https://github.com/bluss/rust-itertools/compare/v0.9.0...v0.10.0 )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2021-01-03 03:40:45 +00:00
Ed Page
67222e9338
style: Address clippy
2021-01-02 13:49:28 -06:00
Ed Page
692f0ac095
refactor(typos): Focus API on primary use case
2021-01-02 13:10:40 -06:00
Ed Page
aba85df435
docs(typos): Clarify intent
2021-01-02 13:10:40 -06:00
Ed Page
48112a47e9
refactor(parser): Abstract over lifetimes
2021-01-02 13:10:30 -06:00
Ed Page
bc90bacff2
refactor(typos): Pull out file logic
2021-01-02 13:10:30 -06:00
Ed Page
e741f96de3
refactor(typos): Decouple parsing from checks
2021-01-02 13:10:22 -06:00
Ed Page
1e64080c05
refactor(typos): Open up the name Parser
2021-01-02 12:58:33 -06:00
Ed Page
7fdd0dee16
style(typos): Make parser ordering clearer
2021-01-02 12:58:33 -06:00
Ed Page
d5a781fd25
Merge pull request #179 from crate-ci/dependabot/cargo/unicode-segmentation-1.7.1
...
chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
2020-12-01 08:24:59 -06:00
dependabot-preview[bot]
5640d23b95
chore(deps): bump csv from 1.1.4 to 1.1.5
...
Bumps [csv](https://github.com/BurntSushi/rust-csv ) from 1.1.4 to 1.1.5.
- [Release notes](https://github.com/BurntSushi/rust-csv/releases )
- [Commits](https://github.com/BurntSushi/rust-csv/compare/1.1.4...1.1.5 )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:19:21 +00:00
dependabot-preview[bot]
7fa5a9eadf
chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
...
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation ) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases )
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:18:35 +00:00
Ed Page
d96de581f3
fix(report): Rendering issues with errors
...
- We aren't consistent in quoting words
- We used byte offsets rather than column counts
- We mixed styles between disallowed and corrections
Fixes #165
2020-11-24 18:52:24 -06:00
Ed Page
9b0cd5b5f0
fix(report): Show path for errors
2020-11-23 11:20:12 -06:00
Ed Page
869b916ca6
fix: Handle broken pipe
2020-11-21 21:57:12 -06:00
Ed Page
7a1fac7fab
refactor(report): Use native types
2020-11-11 18:44:27 -06:00
Ed Page
6bdbd821e3
perf(dict): Avoid hashing unknwon words
...
Bypass hashing when we know (through str::len) that a word won't be in
the dict.
Master:
```
real 0m26.675s
user 0m33.683s
sys 0m4.535s
```
With this change
```
real 0m24.432s
user 0m32.492s
sys 0m4.190s
```
2020-11-10 20:57:04 -06:00
Ed Page
beaa0f4091
perf(dict): Avoid hashing unknwon words
...
Bypass hashing when we know (through str::len) that a word won't be in
the dict.
Master:
```
real 0m26.675s
user 0m33.683s
sys 0m4.535s
```
With this change:
```
real 0m24.060s
user 0m31.559s
sys 0m4.258s
```
2020-11-10 20:57:00 -06:00
Ed Page
e4edbc5f7e
chore: Update dependencies
2020-11-10 19:47:13 -06:00
Ed Page
b7700fa214
refactor: Don't special case --files
2020-11-10 06:30:27 -06:00
Ed Page
628c011f77
fix(report): Ensure json output is clean
2020-11-10 06:30:27 -06:00
Ed Page
e12cd8ed55
refactor: Layer files/filenames on buffer processing
2020-11-10 06:30:27 -06:00
Ed Page
eb20ba9f11
refactor(report): Make Parse consistent with Typos
2020-11-10 06:30:27 -06:00
Ed Page
97f90da9bc
refactor: Move off of lazy_static
2020-11-10 06:30:27 -06:00
Ed Page
3bcd8a130e
refactor(report): Merge the typos types
2020-11-10 06:30:23 -06:00
Ed Page
fe282a0aea
refactor: Pull out common policy
2020-11-07 20:04:58 -06:00
Ed Page
736db10708
fix(format): Clarify message types
2020-10-28 21:01:33 -05:00
Ed Page
527b9837b4
feat: Custom dictionary support
...
Switching `valid-*` to just `*` where you map typo to correction, with
support for always-valid and never-valid.
Fixes #9
2020-10-27 21:15:25 -05:00
dependabot-preview[bot]
84e56b22b5
chore(deps): bump derive_more from 0.99.9 to 0.99.11
...
Bumps [derive_more](https://github.com/JelteF/derive_more ) from 0.99.9 to 0.99.11.
- [Release notes](https://github.com/JelteF/derive_more/releases )
- [Changelog](https://github.com/JelteF/derive_more/blob/master/CHANGELOG.md )
- [Commits](https://github.com/JelteF/derive_more/compare/v0.99.9...v0.99.11 )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-10-01 08:11:46 +00:00
Ed Page
a63dfa0f8c
perf: Faster binary-file detection
...
This switches us from a homegrown implementation to `context_inspector`
- Adds some optimizations by looking for the BoM.
- We used the same algorithm for finding Null bytes
- `context_inspector` caps how much of the buffer is searche though
Besides performance, `content_inspector` also has some known-binary
magic numbers to avoid bad detections.
Fixes #34
2020-08-21 16:29:11 -05:00
Ed Page
ab4a5bbdaf
feat: Support english dialects
...
The goal is to be as accepting and unobtrusive to new code bases as
possible. To this end, we correct typos into the closest english
dialect.
If someone wants to opt-in, they can have typos correct to a specific
english dialect.
Fixes #52
Fixes #22
2020-08-20 19:37:37 -05:00
Ed Page
294c25c67a
fix(dict): Missing a correction
2020-08-15 21:03:00 -05:00
Ed Page
5d7e91d214
fix(ci): Report more failures
2020-07-04 20:52:48 -05:00
Ed Page
bc1302f01b
feat: Support multiple, valid corrections
...
Some of the other spell checkers already do this. While I've not checked
where we might need it for our dictionary, this will be important for
dialects.
2020-07-04 20:52:48 -05:00
Ed Page
a5ed18ee46
fix(replace): Don't error on successful replacement
2020-07-04 20:52:47 -05:00
dependabot-preview[bot]
354fec1aa1
chore(deps): bump nom from 5.1.1 to 5.1.2
...
Bumps [nom](https://github.com/Geal/nom ) from 5.1.1 to 5.1.2.
- [Release notes](https://github.com/Geal/nom/releases )
- [Changelog](https://github.com/Geal/nom/blob/master/CHANGELOG.md )
- [Commits](https://github.com/Geal/nom/compare/5.1.1...5.1.2 )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-04 20:52:47 -05:00
dependabot-preview[bot]
146998f331
chore(deps): bump derive_more from 0.99.7 to 0.99.9
...
Bumps [derive_more](https://github.com/JelteF/derive_more ) from 0.99.7 to 0.99.9.
- [Release notes](https://github.com/JelteF/derive_more/releases )
- [Changelog](https://github.com/JelteF/derive_more/blob/master/CHANGELOG.md )
- [Commits](https://github.com/JelteF/derive_more/compare/v0.99.7...v0.99.9 )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-04 20:52:47 -05:00
Ed Page
f4400c2280
refactor(varcon): Pull out core types
2020-07-04 20:52:47 -05:00
Ed Page
4ae8f91436
feat(varcon): Make all types hashable
2020-07-04 20:52:47 -05:00
Ed Page
94bc42eb07
fix(varcon): Name of crate was wrong
2020-07-04 20:52:47 -05:00
Ed Page
7f983992bd
feat(dict): varcon dict
2020-07-04 20:52:47 -05:00
Ed Page
814ff82aff
refactor: Follow monorepo pattern elsewhere
2020-07-04 20:52:47 -05:00