Ed Page
2fc71b2e13
chore: Release
2022-10-11 09:48:09 -05:00
Robert
611bd09d9d
feat(dict): Add assorted typos
...
baged -> bagged
baged -> badge
codesbase -> codebase (+ variants)
depercate -> deprecate (+ variants)
fallthough -> fallthrough
2022-10-11 15:14:16 +11:00
Ed Page
16ca0accbb
chore: Release
2022-10-06 08:26:11 -05:00
Jonas Platte
6d6713180e
feat(dict): Add whaat -> what
2022-10-06 14:27:29 +02:00
Ed Page
2b667ffe55
fix: Correctly calculate trie
2022-10-04 10:57:28 -05:00
Ed Page
f78135acd2
chore: Bump MSRV to 1.64.0
2022-10-04 10:51:03 -05:00
Ed Page
32485c4bad
chore: Upgrade dependencies
2022-10-03 11:36:25 -05:00
Ed Page
fd5abef1a7
chore: Release
2022-09-22 13:38:17 -05:00
Robert
bcd622e33c
feat(dict): Add 'targest -> target'
2022-09-22 16:07:12 +10:00
Ed Page
668a94791b
chore: Release
2022-09-21 19:28:48 -05:00
Frank Steffahn
a2cc907420
feat(dict): Add 'pararmeter -> parameter'
2022-09-22 01:59:52 +02:00
Ed Page
384aaef311
chore: Release
2022-09-15 08:43:18 -05:00
Jonas Platte
e8e20f28bb
feat(dict): Add 'stte' typo
2022-09-15 10:35:19 +02:00
Ed Page
3161cd6a82
chore: Release
2022-09-06 09:25:25 -05:00
Yuta Hayashibe
7ee918f078
Removed their from the correction candidate of thje
2022-09-06 23:01:10 +09:00
Yuta Hayashibe
d207af69ae
Add some typos
2022-09-06 19:33:15 +09:00
Ed Page
c6d876294c
chore: Release
2022-09-01 10:43:57 -05:00
Ed Page
7f470e1721
Revert "Revert "fix: remove thead -> thread""
...
This reverts commit 1e58c65276.
2022-09-01 10:28:53 -05:00
Ed Page
c49aff00be
test: Make platform agnostic
2022-09-01 07:15:42 -05:00
Ed Page
fdb425c279
chore: Release
2022-08-30 09:28:57 -05:00
Robert
5483e8976a
feat(dict): add typos from Fig monorepo
2022-08-30 13:10:21 +10:00
Ed Page
4c2445fb57
chore: Release
2022-08-25 16:24:58 -05:00
Ed Page
cb91b89080
fix(parse): Ignore CSS hex values that start with digits
...
Fixes #542
2022-08-25 16:05:57 -05:00
Ed Page
0612303e7d
chore: Release
2022-08-23 09:24:26 -05:00
Ed Page
5896efe198
Merge pull request #540 from epage/typo
...
fix: Misc config updates
2022-08-23 09:22:23 -05:00
Ed Page
1e58c65276
Revert "fix: remove thead -> thread"
...
This reverts commit 69f89505d8.
2022-08-23 08:21:47 -05:00
Jonas Platte
272ac51fdb
feat(dict): Add typos for "inappropriate[ly]"
2022-08-17 13:56:52 +02:00
Ed Page
c7e576614e
chore: Release
2022-08-16 07:56:56 -05:00
Ed Page
2fce5f7f09
fix: Remove unused log dependency
2022-08-16 07:56:31 -05:00
Ed Page
62847112ff
chore: Release
2022-08-16 07:53:32 -05:00
Ed Page
2d51f44345
fix: Remove extra build dependency
2022-08-16 07:52:51 -05:00
Ed Page
9b70dca40c
chore: Release
2022-08-16 07:49:04 -05:00
Ed Page
d40d24b811
Merge pull request #537 from epage/thead
...
fix: remove thead -> thread
2022-08-16 07:47:32 -05:00
Uyarn
69f89505d8
fix: remove thead -> thread
...
This supersedes #533
Fixes #532
2022-08-16 07:40:30 -05:00
Jonas Platte
d6e9d52477
feat(dict): Add "deffer" to typo list
2022-08-16 14:20:40 +02:00
Ayaz Hafiz
6be109774b
Correct "opauqe" to "opaque"
...
I can't find any references to "opauqe" as an actual word, so I believe
this to be safe.
2022-08-15 11:27:45 -05:00
Ed Page
4d9c507595
chore: Release
2022-08-13 12:03:26 -05:00
Yuta Hayashibe
cb3736663e
Add other corrections
2022-08-13 23:50:43 +09:00
Yuta Hayashibe
50da882077
Add typos
2022-08-13 14:36:41 +09:00
Ed Page
80f1ed0290
chore: Bump MSRV to 1.60
2022-08-03 09:32:45 -05:00
Ed Page
ea38677643
chore: Update dependencies
2022-08-03 09:29:38 -05:00
Ed Page
a8599f6a19
test: Move codegen to tests
2022-08-03 09:07:04 -05:00
dependabot[bot]
4ffa72dac1
chore(deps): Bump once_cell from 1.12.0 to 1.13.0
...
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.12.0...v1.13.0)
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-08-01 07:02:51 +00:00
Ed Page
a6674d5be4
chore: Release
2022-07-22 11:09:08 -05:00
Ed Page
3e9caf0731
fix(dict): Run codegen for #516
2022-07-22 10:14:08 -05:00
Jonas Platte
e7be2d3983
feat(dict): Add 'anonymized' typos
2022-07-22 16:31:59 +02:00
dependabot[bot]
7106dff9b1
chore(deps): Bump clap from 3.1.18 to 3.2.8
...
Bumps [clap](https://github.com/clap-rs/clap) from 3.1.18 to 3.2.8.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.1.18...v3.2.8)
---
updated-dependencies:
- dependency-name: clap
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-07-01 07:04:47 +00:00
Ed Page
aff7161142
chore: Release
2022-06-15 16:11:53 -05:00
Ed Page
7c953a71ec
chore: Upgrade to 2021 edition
2022-06-01 06:53:10 -05:00
Ed Page
b15558f0f3
chore: Set rust-version
2022-06-01 06:51:59 -05:00
Ed Page
927308c726
chore: Release
2022-05-16 09:33:53 -05:00
Ed Page
5ae7bda8eb
style: Silence clippy
2022-05-16 09:09:17 -05:00
Ed Page
778fd7a53d
chore: Release
2022-05-10 14:24:11 -05:00
Ed Page
fd5398316f
fix(parser): Better short base64 detection
...
Previously, we bailed out if the string is too short (<90) and there
weren't non-alpha-base64 bytes present. What we ignored were the
padding bytes.
We key off of padding bytes to detect that a string is in fact base64
encoded. Like the other cases, there can be false positives but those
strings should show up elsewhere or the compiler will fail.
This was called out in #485
2022-05-10 14:02:59 -05:00
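The short-base64 heuristic described in fd5398316f can be sketched roughly as follows. This is only an illustration under stated assumptions: `looks_like_base64` and `MIN_UNAMBIGUOUS_LEN` are hypothetical names, and the real parser is structured differently. The 90-byte threshold is the one quoted in the commit message.

```rust
/// Hypothetical helper, not the crate's actual parser code.
/// Decides whether a token should be skipped as base64 data.
fn looks_like_base64(token: &str) -> bool {
    // Threshold quoted in the commit message; the constant name is an assumption.
    const MIN_UNAMBIGUOUS_LEN: usize = 90;

    let body = token.trim_end_matches('=');
    let padding = token.len() - body.len();
    // Base64 never carries more than two padding bytes.
    if body.is_empty() || padding > 2 {
        return false;
    }
    let is_base64_byte = |b: u8| b.is_ascii_alphanumeric() || b == b'+' || b == b'/';
    if !body.bytes().all(is_base64_byte) {
        return false;
    }
    // Long runs are treated as base64 without further evidence.
    if token.len() >= MIN_UNAMBIGUOUS_LEN {
        return true;
    }
    // Short runs need extra evidence: a non-alphabetic base64 byte,
    // or (the case this fix adds) trailing padding.
    let has_non_alpha = body.bytes().any(|b| !b.is_ascii_alphabetic());
    has_non_alpha || padding > 0
}
```

As the commit notes, a heuristic like this can still produce false positives; a plain word such as `hello` is rejected only because it has neither padding nor non-alphabetic bytes.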
Ed Page
bd5048def5
fix(parser): Allow backslashes after ignore items
...
To allow `\\` to start a token, we couldn't let it end a token. By
switching the terminator to a peek, we can now make it end a token
**and** start a token, allowing us to work better with Windows paths.
Fixes #481
2022-05-10 14:02:54 -05:00
Ed Page
1720e7d65e
fix(parser): Ignore items at end of input
2022-05-10 13:38:03 -05:00
Ed Page
7e15afe81f
test(parser): Add reproduction of #481
2022-05-10 12:58:19 -05:00
Ed Page
4869764f7b
test(parser): Remove unclear test case
...
Unsure why this case is here and it causes difficulties
2022-05-10 12:58:13 -05:00
Ed Page
ad89736832
refactor(parser): Clarify precedence levels
2022-05-10 12:58:08 -05:00
Ed Page
9f623c618b
chore: Release
2022-04-28 09:39:14 -05:00
Denis Kasak
29508a689b
feat(dict): Add typo identitiy -> identity
2022-04-28 16:24:18 +02:00
Ed Page
dcc3c0b11e
chore: Release
2022-04-25 11:49:02 -05:00
Jonas Platte
5f5ef1468d
feat(dict): Add 'signign' typo to words.csv
2022-04-25 11:26:08 -05:00
Jonas Platte
bbd71ab434
feat(dict): Add 'unencyrpted' typo to words.csv
2022-04-25 11:25:48 -05:00
SeongChan Lee
4e4f136ec6
Fix tokenizer for uppercase UUID
...
Microsoft toolchains usually emit UUID/GUID in UPPERCASE
2022-04-25 11:12:25 +09:00
Ed Page
7d3e9bb070
chore: Release
2022-04-18 09:39:53 -05:00
Ed Page
e63659c208
fix: Ignore CSS colors
...
Fixes #462
2022-04-18 09:19:44 -05:00
Ed Page
9c273c6cfb
Merge pull request #451 from crate-ci/dependabot/cargo/nom-7.1.1
...
chore(deps): Bump nom from 7.1.0 to 7.1.1
2022-04-01 09:34:31 -05:00
dependabot[bot]
0281c7023e
chore(deps): Bump nom from 7.1.0 to 7.1.1
...
Bumps [nom](https://github.com/Geal/nom) from 7.1.0 to 7.1.1.
- [Release notes](https://github.com/Geal/nom/releases)
- [Changelog](https://github.com/Geal/nom/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Geal/nom/compare/7.1.0...7.1.1)
---
updated-dependencies:
- dependency-name: nom
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-04-01 07:02:37 +00:00
dependabot[bot]
40080cb01e
chore(deps): Bump once_cell from 1.9.0 to 1.10.0
...
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.9.0...v1.10.0)
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-04-01 07:02:26 +00:00
Ed Page
86c54fffbf
style: Update clippy
2022-03-29 15:07:19 -05:00
Ed Page
1d16086495
chore: Release
2022-03-09 08:59:49 -06:00
Ed Page
ab61b33572
Merge pull request #443 from crate-ci/dependabot/cargo/unicode-segmentation-1.9.0
...
chore(deps): Bump unicode-segmentation from 1.8.0 to 1.9.0
2022-03-01 08:30:25 -06:00
dependabot[bot]
a58b735e5e
chore(deps): Bump unicode-segmentation from 1.8.0 to 1.9.0
...
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation) from 1.8.0 to 1.9.0.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases)
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits)
---
updated-dependencies:
- dependency-name: unicode-segmentation
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-03-01 07:03:33 +00:00
dependabot[bot]
f3107c4794
chore(deps): Bump clap from 3.0.13 to 3.1.3
...
Bumps [clap](https://github.com/clap-rs/clap) from 3.0.13 to 3.1.3.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.0.13...v3.1.3)
---
updated-dependencies:
- dependency-name: clap
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-03-01 07:03:22 +00:00
Ed Page
b686760935
chore: Release
2022-02-14 09:05:09 -06:00
Ed Page
c3bb4adfa1
fix(parser): Allow commas in urls
...
Got us closer to https://www.ietf.org/rfc/rfc3986.txt
Fixes #433
2022-02-14 08:49:55 -06:00
Ed Page
09203fd592
fix(parser): Recognize URLs with passwords
2022-02-14 08:21:56 -06:00
Ed Page
05773fe815
chore: Release
2022-02-08 07:12:19 -06:00
Sebastian Neubauer
fa5a724cec
feat(dict): Add more typos
2022-02-08 13:41:44 +01:00
Ed Page
8ddb09eff3
chore: Update dependencies
2022-02-01 10:34:12 -06:00
dependabot[bot]
a3f39efdc8
chore(deps): Bump clap from 3.0.0 to 3.0.13
...
Bumps [clap](https://github.com/clap-rs/clap) from 3.0.0 to 3.0.13.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/clap_complete-v3.0.0...v3.0.13)
---
updated-dependencies:
- dependency-name: clap
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-02-01 07:02:33 +00:00
Ed Page
5b7fe620ec
chore: Release
2022-01-26 14:32:31 -06:00
Ed Page
a39074fc7f
fix(parser): Detect shorter base64 values
...
This is part of the way to #413. In that case, they aren't providing
padding though.
2022-01-26 14:18:01 -06:00
Ed Page
2c5f2ecedd
chore: Release
2022-01-26 10:01:15 -06:00
Ed Page
3c78d65462
fix(parser): Don't stop on almost-printfs
...
When we added support for printf interpolation, we had to adjust our
separator matching to not eat the start of printf interpolation.
When doing so, I overlooked the need to still eat it in the catch-all.
If we don't, we then try to read `%` as part of the identifier and bail
out early.
Fixes #411
2022-01-26 09:39:23 -06:00
Ed Page
4b2e66487c
chore: Release
2022-01-24 20:35:08 -06:00
Ed Page
0c49c3ea2b
fix(parser): Allow markdown formatting around ordinals
...
Fixes #409
2022-01-24 20:01:06 -06:00
Ed Page
f7fd7c0e42
chore: Release
2022-01-21 10:39:27 -06:00
Ed Page
5598b5b3e9
fix(dict): Workes should also correct to workers
...
Fixes #402
2022-01-21 10:10:56 -06:00
Ed Page
71b53cb23e
chore: Release
2021-12-18 17:52:11 -06:00
Ed Page
5c83dec07b
style: Remove unused variable
2021-12-14 15:41:52 -06:00
Ed Page
469a9aedc2
chore: Release
2021-12-14 12:58:03 -06:00
Frank Steffahn
2748d6a148
fix(dict): Typo in Typos ( #3870
2021-12-14 12:54:48 -06:00
Ed Page
f99eb040de
chore: Update dependencies
2021-12-01 08:05:54 -06:00
Ed Page
3b3a944c93
fix: Detect descrepancy
...
Found this in the clap code base.
2021-11-24 15:09:01 -06:00
Ed Page
c0e8a2c932
chore: Release
2021-11-16 07:46:33 -06:00
Ed Page
8e29e94060
chore: Update cargo-release
2021-11-16 07:44:08 -06:00
Ed Page
3ca0aed0a7
Merge pull request #374 from Flakebi/fix-escape
...
Fix multiple escape sequences
2021-11-15 08:18:41 -06:00
Neubauer, Sebastian
3fc6089660
fix: Fix multiple escape sequences
...
If escape sequences follow straight after each other, there is no
delimiter in-between.
In such a case, parsing previously stopped and did not find any
typos further in the file.
2021-11-15 11:31:53 +01:00
Neubauer, Sebastian
76ec666970
feat(dict): Add more corrections
...
I encountered these when going through a codebase with another tool.
2021-11-12 23:02:08 +01:00
Ed Page
4f17586d08
chore: Update MSRV
2021-11-08 11:56:01 -06:00
Ed Page
a8ae8a5c26
chore: Update boilerplate
2021-11-08 10:11:02 -06:00
Ed Page
153f570ec9
chore: Release
2021-11-03 11:48:12 -05:00
Ed Page
fcac819478
fix: Address false positives
...
Hard to say how to handle `doen't` since we don't handle contractions.
For now, I've gone ahead and added corrections to the part of the
contraction. Hopefully that doesn't confuse people.
Part of #362
2021-10-23 08:21:53 -05:00
Ed Page
efae838e5c
perf: Remove some function overhead
...
Unfortunately, almost all of this is for corrections.
2021-09-14 21:09:30 -05:00
Ed Page
3cd24f5cca
chore: Release
2021-09-14 10:03:34 -05:00
Ed Page
e20879dae1
fix: Reduce false positives from ordinals
...
Just ignoring them since our focus is on programmer typos and these
can't be identifiers. This is simpler and is less work at runtime.
Fixes #331
2021-09-14 08:53:31 -05:00
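Ignoring ordinals, as e20879dae1 describes, amounts to recognizing tokens such as `1st` or `22nd` and skipping them. A minimal sketch, assuming a hypothetical `is_ordinal` helper that looks only at the token's shape (it does not check that the suffix matches the number):

```rust
/// Hypothetical helper: true for a run of ASCII digits followed by an
/// English ordinal suffix, e.g. "1st", "22nd", "3rd", "4th".
fn is_ordinal(token: &str) -> bool {
    // Index of the first non-digit byte, or the token length if all digits.
    let digits_end = token
        .bytes()
        .position(|b| !b.is_ascii_digit())
        .unwrap_or(token.len());
    // Need at least one digit, and the remainder must be exactly a suffix.
    digits_end > 0 && matches!(&token[digits_end..], "st" | "nd" | "rd" | "th")
}
```

Because such tokens start with a digit they cannot be identifiers, which is why skipping them outright is cheaper than teaching the dictionary about them.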
Ed Page
92e46848a3
chore: Update dependencies
2021-09-01 06:38:52 -05:00
Ed Page
dbea7ab1e0
chore: Release
2021-08-30 09:16:40 -05:00
Ville Skyttä
4fcd7ba16f
feat(dict): Suggest surrounded for surrouned too
2021-08-29 21:22:24 +03:00
Nick Mathewson
739d1a2f7c
Ignore hexadecimal "hashes" of length 32 or greater.
...
By experimentation (see ticket), it seems that same-case hexadecimal
strings of 32 characters or longer are almost never intended to hold
text. By treating such strings as ignored, we can resist a larger
category of false positives.
Closes #326.
2021-08-20 12:34:59 -04:00
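The rule from 739d1a2f7c (same-case hexadecimal strings of 32 or more characters are ignored) can be illustrated like this. The function name is hypothetical and the sketch is not the tokenizer's actual code:

```rust
/// Hypothetical helper: true when `token` looks like a hex "hash", i.e. it is
/// at least 32 characters long, all hex digits, and does not mix upper- and
/// lowercase letters.
fn is_ignorable_hex_hash(token: &str) -> bool {
    if token.len() < 32 {
        return false;
    }
    let all_hex = token.bytes().all(|b| b.is_ascii_hexdigit());
    let has_lower = token.bytes().any(|b| b.is_ascii_lowercase());
    let has_upper = token.bytes().any(|b| b.is_ascii_uppercase());
    all_hex && !(has_lower && has_upper)
}
```

Requiring a single case is what keeps ordinary mixed-case identifiers that happen to use only the letters `a` to `f` from being swallowed.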
Ed Page
613a0cba4b
chore: Iterate on release process
2021-08-16 11:23:25 -05:00
mendess
5747aba05d
Add instantialed as a typo for instantiated
2021-08-06 14:33:50 +01:00
Ed Page
2dce866937
chore: Release
2021-08-02 09:55:25 -05:00
Ed Page
a5f0dd8ee9
fix(token): Continue parsing on c-escape
2021-08-02 09:29:10 -05:00
Ed Page
3e5d2e0620
Merge pull request #324 from epage/escape
...
fix(token): Continue parsing on c-escape
2021-08-02 09:23:42 -05:00
Ed Page
fdeba0e71b
fix(token): Continue parsing on c-escape
2021-08-02 09:11:54 -05:00
dependabot[bot]
febcee3332
chore(deps): Bump env_logger from 0.8.4 to 0.9.0
...
Bumps [env_logger](https://github.com/env-logger-rs/env_logger) from 0.8.4 to 0.9.0.
- [Release notes](https://github.com/env-logger-rs/env_logger/releases)
- [Changelog](https://github.com/env-logger-rs/env_logger/blob/main/CHANGELOG.md)
- [Commits](https://github.com/env-logger-rs/env_logger/compare/v0.8.4...v0.9.0)
---
updated-dependencies:
- dependency-name: env_logger
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-08-01 07:05:08 +00:00
Ed Page
2304fc6735
chore: Release
2021-07-30 12:12:07 -05:00
Ed Page
9a8d41fcb2
chore: Release
2021-07-30 12:09:59 -05:00
Ed Page
2202b7f661
fix(parser): Handle c-escape/printf
...
Since our goal is 100% confidence in the results, it's better to not
check words than to correct the wrong words.
With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).
Fixes #3
2021-07-30 11:30:05 -05:00
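To make the behaviour in 2202b7f661 concrete, here is a simplified, ASCII-only sketch of skipping words glued to a possible c-escape (`\nfoo`) or printf substitution (`%dfoo`). `checkable_words` is a hypothetical name, and the real parser handles many more cases:

```rust
/// Hypothetical helper: returns the ASCII words that would still be checked,
/// dropping any word directly glued to a `\x` or `%x` marker.
fn checkable_words(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut words = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        let marker = (b == b'\\' || b == b'%')
            && i + 1 < bytes.len()
            && bytes[i + 1].is_ascii_alphabetic();
        if marker {
            // We cannot tell where the escape ends and the word begins,
            // so skip the whole glued run rather than risk a bad correction.
            i += 1;
            while i < bytes.len() && bytes[i].is_ascii_alphanumeric() {
                i += 1;
            }
        } else if b.is_ascii_alphabetic() {
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_alphanumeric() {
                i += 1;
            }
            words.push(&text[start..i]);
        } else {
            i += 1;
        }
    }
    words
}
```

For the input `hello\nfoo %dbar baz` (with a literal backslash), only `hello` and `baz` are returned; `nfoo` and `dbar` are never checked.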
Ed Page
3049852bfd
fix(dict): Avoid contraction false positive
...
Fixes #317
2021-07-30 10:42:57 -05:00
Ed Page
f60e798a2a
chore: Release
2021-07-27 15:31:01 -05:00
Ed Page
3486c23bdb
chore: Release
2021-07-27 15:29:18 -05:00
Ed Page
49459cede7
feat(dict): Add more corrections
2021-07-27 14:53:13 -05:00
Ed Page
6037eebfdc
style: Clippy
2021-07-27 14:28:16 -05:00
Ed Page
70fbd63b00
fix: Update dictionary
2021-07-27 14:21:00 -05:00
Ed Page
960471ae23
fix: Prevent old typos from coming back
2021-07-27 14:16:13 -05:00
Ed Page
4e99217896
test: Ensure words are stored lowercase
2021-07-27 14:16:12 -05:00
Ed Page
0008713395
test: Ensure words.csv stays sorted
2021-07-27 14:16:12 -05:00
Ed Page
41048d15b3
test: Prevent correcting corrections
2021-07-27 13:58:57 -05:00
Ed Page
fc4ec0e4a1
fix: Correcting to typos
2021-07-27 13:58:57 -05:00
Ed Page
5b29113ec8
refactor(typos): Remove unused calculations
...
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
Ed Page
7a2a5042a1
refactor(dict): Remove useless entries
2021-07-02 10:24:59 -05:00
Ed Page
4c2f2c434a
feat(dict): Shared PHF support
2021-07-01 11:14:30 -05:00
Ed Page
3b43272724
refactor(dict): Separate dictgen concerns
2021-07-01 11:00:33 -05:00
Ed Page
c8d1058a71
refactor(dict): Change typos-dict to trie
...
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
bbbf985777
perf(dict): Switch varcon to a burst-trie
...
This cuts varcon lookup times in half but I still suspect slower than
phf. Like with bsearch and unlike phf, the cost is consistent between hits
and misses.
At least this doesn't have the compile hit of PHF + unicase. Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb
refactor(dict): Be more cache conscious
2021-06-30 19:56:03 -05:00
Ed Page
f176055834
refactor(dict): Make room for trie logic
2021-06-30 19:56:03 -05:00
Ed Page
a1e95bc7c0
refactor(dict): Pull out table-lookup logic
...
Before, we only guaranteed that some dicts were pre-sorted. Now, all are
guaranteed to be pre-sorted.
This also gives each dict the size-check to avoid lookup.
But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
Ed Page
bfa7888f82
chore: Skip more releases
2021-06-29 15:39:28 -05:00
Ed Page
9149c4765d
chore: Release
2021-06-29 15:05:18 -05:00
Ed Page
c83f655109
feat(parser): Ignore URLs
...
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146
fix(parser): Ensure we get full base64
...
We greedily matched separators, including ones that might be part of
base64. This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b
feat(parser): Ignore emails
...
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).
This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6
feat(parser): Ignore base64
...
For now, we hardcode a minimum length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyway).
Fixes #287
2021-06-29 13:25:10 -05:00
Ed Page
23b6ad5796
feat(parser): Ignore SHA-1+
...
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b
fix(parser): Go ahead and do lower UUIDs
...
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1
feat(parser): Ignore UUIDs
...
We might be able to make this bail out earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase. RFC
4122 says that UUIDs must be generated lowercase, while input accepts
any case. The main issues are risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
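The lowercase-only check floated in 85082cdbb1 might look like the sketch below. Both function names are hypothetical; the grouping is the standard 8-4-4-4-12 textual UUID layout. A later commit in this log (4e4f136ec6) also accepts uppercase UUIDs, so this shows only the stricter variant discussed here:

```rust
/// Hypothetical custom hex check that accepts only lowercase digits.
fn is_lower_hex_digit(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'a'..=b'f')
}

/// Hypothetical helper: true for a lowercase UUID in 8-4-4-4-12 form.
fn is_lower_uuid(s: &str) -> bool {
    const GROUPS: [usize; 5] = [8, 4, 4, 4, 12];
    let parts: Vec<&str> = s.split('-').collect();
    parts.len() == GROUPS.len()
        && parts
            .iter()
            .zip(GROUPS.iter())
            .all(|(part, len)| part.len() == *len && part.bytes().all(is_lower_hex_digit))
}
```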
Ed Page
32f5e6c682
refactor(typos)!: Bake ignores into parser
...
This is prep for other items to be ignored
BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens. Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
ded90f2387
perf(parser): Auto-detect unicode
...
For smaller, ascii-only content, this seems to be taking ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
Ed Page
95417f3a41
refactor(parser): Consolidate utf8/ascii logic
2021-06-29 05:10:02 -05:00
Ed Page
83b2804623
fix(ci): Don't fail codegen checks
2021-06-28 14:06:47 -05:00
Ed Page
4066d21790
style: Address clippy
2021-06-28 13:51:06 -05:00
Ed Page
3a4d039c4f
chore: Reduce code-gen memory usage
...
More `const fn` removals to reduce compilation memory use
2021-06-07 08:58:34 -05:00
Ed Page
04f5d40e57
chore: Release
2021-06-05 14:39:37 -05:00
Ed Page
2b1f565eaa
refactor(varcon): Remove reliance on const-fn
...
This dropped RSS (memory usage) from 4GB to 1.5GB when compiling.
The extra `match` could impact performance but not too concerned since
the default is to not look within vars.
2021-06-04 15:01:08 -05:00
Ed Page
b1cf03c7eb
refactor(varcon): Move away from PHF
...
This is mostly to give implementation flexibility for changing out how
we store the data to reduce compilation memory usage.
This does have performance impact, jumping from ~220ns to ~320ns for a
dict lookup, according to our micro benchmarks.
2021-06-04 14:59:46 -05:00
Ed Page
1cb9b37120
chore: Update codespell dict
...
Based on 2ed354c at https://github.com/codespell-project/codespell
2021-05-22 21:44:56 -05:00
Ed Page
3e66a99674
chore: Release
2021-05-21 20:41:02 -05:00
Ed Page
3995745362
chore: Release
2021-05-21 20:39:12 -05:00
Ed Page
b99f32dea8
perf(dict): Bypass vars when possible
...
Variant support slows us down by 10-50%. I assume most people will run
with `en` and so most of this overhead is wasted. So instead of
merging vars with dict, let's instead get a quick win by just skipping
vars when we don't need to. If the assumptions behind this change over
time or if there is need for speeding up a specific locale, we can
re-address this.
Before:
```
check_file/Typos/code time: [35.860 us 36.021 us 36.187 us]
thrpt: [8.0117 MiB/s 8.0486 MiB/s 8.0846 MiB/s]
check_file/Typos/corpus time: [26.966 ms 27.215 ms 27.521 ms]
thrpt: [21.127 MiB/s 21.365 MiB/s 21.562 MiB/s]
```
After:
```
check_file/Typos/code time: [33.837 us 33.928 us 34.031 us]
thrpt: [8.5191 MiB/s 8.5452 MiB/s 8.5680 MiB/s]
check_file/Typos/corpus time: [17.521 ms 17.620 ms 17.730 ms]
thrpt: [32.794 MiB/s 32.999 MiB/s 33.184 MiB/s]
```
This puts us in line with `--no-default-features --features dict`
Fixes #253
2021-05-19 13:55:41 -05:00
Ed Page
639e65b88a
fix(dict): Handle cases from Linux
...
These were found while running `typos` on Linux and inspecting a
sampling of the results. #249 represents additional changes to make.
There were some identifiers, that looked like hardware registers, that
I'm unsure of what can be done for them.
2021-05-18 12:02:03 -05:00
Ed Page
fb0dac4297
refactor(dict): Allow 0..n corrections in BuiltIn
...
The main use case is taking `ther` -> `there` and adding `the` and
`their`.
2021-05-18 12:02:03 -05:00
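Allowing 0..n corrections per typo, as in fb0dac4297, suggests a table whose values are slices rather than single strings. A rough sketch with a hypothetical `corrections` function and a tiny hand-written table (the real dictionary is generated and far larger):

```rust
/// Hypothetical lookup: maps a typo to zero or more suggested corrections.
/// `None` means the word is not a known typo.
fn corrections(word: &str) -> Option<&'static [&'static str]> {
    // Illustrative entries only; the entry order is an assumption.
    const TABLE: &[(&str, &[&str])] = &[
        ("ther", &["there", "their", "the"]),
        ("whaat", &["what"]),
    ];
    TABLE
        .iter()
        .find(|(typo, _)| *typo == word)
        .map(|(_, fixes)| *fixes)
}
```

With a slice as the value, an entry could also carry no suggestions at all, which would flag a word as a typo without proposing a fix.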
Ed Page
77cfccb392
refactor(varcon): Clarify check's meanings
2021-05-15 19:29:27 -05:00
Ed Page
b830872ad0
chore: Update enumflags2
2021-05-13 10:20:15 -05:00
Ed Page
7c803681c4
chore: Release
2021-05-13 09:58:09 -05:00
Ed Page
3b9061dece
Merge pull request #240 from crate-ci/dependabot/cargo/codegenrs-1.0.0
...
chore(deps): Bump codegenrs from 0.1.5 to 1.0.0
2021-05-01 09:04:51 -05:00
dependabot[bot]
d72fa7acba
chore(deps): Bump codegenrs from 0.1.5 to 1.0.0
...
Bumps [codegenrs](https://github.com/crate-ci/codegenrs) from 0.1.5 to 1.0.0.
- [Release notes](https://github.com/crate-ci/codegenrs/releases)
- [Changelog](https://github.com/crate-ci/codegenrs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/codegenrs/compare/v0.1.5...v1.0.0)
Signed-off-by: dependabot[bot] <support@github.com>
2021-05-01 07:01:59 +00:00
Ed Page
6216fa0837
fix(dict)!: Clarify word sizes with Ranges
...
The code was generated with separate min / max, rather than using a
Range and ensuring the API is used correctly.
2021-04-30 21:33:33 -05:00
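The idea behind 6216fa0837, replacing separate min/max word lengths with a single range, can be sketched as below. `SizedDict` and its fields are hypothetical names, not the generated code's actual API:

```rust
use std::ops::RangeInclusive;

/// Hypothetical dictionary: a sorted word list plus the range of word
/// lengths it contains, so out-of-range words skip the lookup entirely.
struct SizedDict {
    word_len: RangeInclusive<usize>,
    sorted: &'static [&'static str],
}

impl SizedDict {
    fn contains(&self, word: &str) -> bool {
        // One range check replaces separate `len >= min && len <= max` tests.
        self.word_len.contains(&word.len()) && self.sorted.binary_search(&word).is_ok()
    }
}
```

For example, a dictionary built as `SizedDict { word_len: 5..=9, sorted: &["depercate", "targest", "whaat"] }` rejects a one-letter word without ever touching the sorted list.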
Ed Page
f40ed5a328
style: Address clippy
2021-04-30 11:37:16 -05:00
Ed Page
517da7ecd2
perf(parser): Allow people to bypass unicode cost
2021-04-29 21:07:59 -05:00
Ed Page
09d2124d0f
perf(parser): Limit inner-loop asserts
2021-04-29 18:31:05 -05:00
Ed Page
287c4cbfe9
refactor(parser): Give more impl flexibility
2021-04-29 18:31:05 -05:00
Ed Page
9cbc7410a4
fix(parser)!: Defer to Unicode XID for identifiers
...
This saves us from having to have configuration for every detail. If
people need more control, we can offer it later.
Fixes #225
2021-04-29 18:30:57 -05:00
Ed Page
f15cc58f71
fix(parser): Flip leading digits to work correctly
2021-04-29 18:30:14 -05:00
Ed Page
4b94352b7a
perf(parser): Try hand-rolled number parsing
2021-04-29 18:30:14 -05:00
Ed Page
6b92e345cc
perf(parser): Speed up UTF-8 validation
2021-04-27 21:17:46 -05:00
Ed Page
819702c82f
refactor(parser): Unify str/bytes code paths
...
The main goal is to support replacing the parser with `nom` where I need
access to `str` only functionality.
With crates like simdutf8, this might also offer up performance gains
since they see the biggest benefit when doing large blocks of
validation.
2021-04-27 21:17:43 -05:00
Ed Page
fce11d6c35
refactor(parser)!: Allow short-circuiting word splitting
...
This is prep for experiments with getting this information ahead of
time.
See #224
2021-04-27 21:17:38 -05:00
Ed Page
9bfb506c6d
fix(typos)!: Clarify Case::Upper's name
...
`Scream` was referring to `SCREAMING_CASE` but outside of that context, I
think `Upper` is more accurate.
2021-04-21 20:36:35 -05:00
Ed Page
1f4c587692
chore({{crate_name}}): Release {{version}}
2021-04-14 19:13:25 -05:00
Ed Page
b4459bef33
chore: Fix readme paths in Cargo.toml
2021-04-13 21:36:47 -05:00
Ed Page
d7978658d4
test(cli): Ensure we apply corrections
2021-04-10 19:13:48 -05:00
Ed Page
b5f606f201
refactor(typos): Simplify the top-level API
2021-03-01 11:50:23 -06:00
Ed Page
1010d2ffe5
refactor(tokenizer): Remove stale function
2021-03-01 11:50:23 -06:00
dependabot-preview[bot]
b8d3190ce9
chore(deps): bump itertools from 0.9.0 to 0.10.0
...
Bumps [itertools](https://github.com/bluss/rust-itertools) from 0.9.0 to 0.10.0.
- [Release notes](https://github.com/bluss/rust-itertools/releases)
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bluss/rust-itertools/compare/v0.9.0...v0.10.0)
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2021-01-03 03:40:45 +00:00
Ed Page
67222e9338
style: Address clippy
2021-01-02 13:49:28 -06:00
Ed Page
692f0ac095
refactor(typos): Focus API on primary use case
2021-01-02 13:10:40 -06:00
Ed Page
aba85df435
docs(typos): Clarify intent
2021-01-02 13:10:40 -06:00
Ed Page
48112a47e9
refactor(parser): Abstract over lifetimes
2021-01-02 13:10:30 -06:00
Ed Page
bc90bacff2
refactor(typos): Pull out file logic
2021-01-02 13:10:30 -06:00
Ed Page
e741f96de3
refactor(typos): Decouple parsing from checks
2021-01-02 13:10:22 -06:00
Ed Page
1e64080c05
refactor(typos): Open up the name Parser
2021-01-02 12:58:33 -06:00
Ed Page
7fdd0dee16
style(typos): Make parser ordering clearer
2021-01-02 12:58:33 -06:00
Ed Page
d5a781fd25
Merge pull request #179 from crate-ci/dependabot/cargo/unicode-segmentation-1.7.1
...
chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
2020-12-01 08:24:59 -06:00
dependabot-preview[bot]
5640d23b95
chore(deps): bump csv from 1.1.4 to 1.1.5
...
Bumps [csv](https://github.com/BurntSushi/rust-csv) from 1.1.4 to 1.1.5.
- [Release notes](https://github.com/BurntSushi/rust-csv/releases)
- [Commits](https://github.com/BurntSushi/rust-csv/compare/1.1.4...1.1.5)
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:19:21 +00:00
dependabot-preview[bot]
7fa5a9eadf
chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
...
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases)
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits)
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:18:35 +00:00