Commit graph

755 commits

Author SHA1 Message Date
Martin Fischer
d3e48b202b fix(cli): Allow ot and stap in sh 2023-08-08 18:24:44 +02:00
Martin Fischer
e5e8f25c8e fix(cli): Allow nd in css 2023-08-08 18:24:44 +02:00
Martin Fischer
5181c42c57 fix(cli): Allow Nd in man pages 2023-08-08 18:24:44 +02:00
Ed Page
6f0c32c802
Merge pull request #793 from not-my-profile/refactor-file-type-specifics
refactor(cli): Introduce file_type_specifics module
2023-08-08 11:22:10 -05:00
Martin Fischer
d9a1085018 refactor(cli): Abstract away regex ignores
This isn't perfect as this only helps when doing checks and not in the
parsing impls.

This supersedes #797
2023-08-08 10:40:22 -05:00
Martin Fischer
5dbe0948d3 docs(cli): Add comments to file-type specific ignores 2023-08-08 06:22:34 +02:00
Martin Fischer
fa39bca152 refactor(cli): Introduce file_type_specifics module
This makes the definition of file-type specifics less repetitive.

Resolves #759.
2023-08-08 06:22:34 +02:00
Ed Page
d4258b1aa0 fix(cli): Remove stray character on disallowed words 2023-08-07 16:24:51 -05:00
Ed Page
a3bf84ade6 chore(dict): Don't clear disallowed words 2023-08-07 16:24:51 -05:00
Ed Page
8a7996b4bc chore: Release 2023-08-01 10:43:28 -05:00
Ed Page
6e72bdc74c feat(dict): July updates
Fixes #777
2023-08-01 10:28:23 -05:00
Ed Page
143cc59fab chore: Release 2023-07-14 14:05:04 -05:00
Ed Page
e981fc41fb chore: Release 2023-07-14 14:04:02 -05:00
Ed Page
e14c4725cd chore: Release 2023-07-14 14:03:38 -05:00
Ed Page
2861ad8299 chore: Release 2023-07-14 14:03:01 -05:00
Ed Page
5a63cb3be6 chore: Release 2023-07-14 14:02:39 -05:00
Ed Page
ea0db833b5 chore: Release 2023-07-14 14:01:55 -05:00
Ed Page
37e2b40f24 chore: Release 2023-07-14 14:01:12 -05:00
Ed Page
a2c9d2076a Merge remote-tracking branch 'upstream/master' 2023-07-14 14:00:31 -05:00
Ed Page
9fa116eaf6 chore: Release 2023-07-14 13:59:11 -05:00
Ed Page
b6c78eb8ac refactor(typos): Upgrade to winnow 0.5 2023-07-14 13:29:24 -05:00
Ed Page
6f40717c8f refactor(typos): Switch to BStr for better debugging 2023-07-14 13:28:36 -05:00
Ed Page
4fd4537856 fix(varcon)!: Make API independent of winnow 2023-07-14 12:48:41 -05:00
Ed Page
6cc3e3f9e0 refactor(varcon)!: Upgrade to winnow 0.5 2023-07-14 12:44:39 -05:00
Ed Page
9426924f8f fix: Hide optional dependencies 2023-07-14 12:33:02 -05:00
Ed Page
0bde06af9a chore(varcon): Add parse tracing 2023-07-14 12:32:16 -05:00
Ed Page
e98fc52b0d chore(typos): Add parse tracing 2023-07-14 12:32:07 -05:00
Ed Page
a1ad167632 refactor(varcon): Resolve winnow deprecations 2023-07-14 12:23:13 -05:00
Ed Page
ca9612c045 chore: Release 2023-07-10 10:01:29 -05:00
Ed Page
f69eec1ce3
Merge pull request #729 from scop/feat/trim-in-extension
feat(cli): Strip `.in` suffix(es) before attempting filename match
2023-07-10 09:59:19 -05:00
Ville Skyttä
d6ac36f057 fix(cli): Make .in stripping work with non-UTF-8 filenames 2023-07-09 12:01:53 +03:00
Ed Page
ccdede0f8c chore: Release 2023-07-03 09:34:49 -05:00
Ed Page
25a5437d1f feat(dict): June updates
Fixes #733
2023-07-03 09:18:24 -05:00
Ed Page
2158ddd42c chore: Release 2023-06-30 09:43:35 -05:00
Ed Page
8913c8af5f fix(dict): Don't correct contiguities 2023-06-30 09:31:04 -05:00
Ed Page
2f61fa1697 chore: Release 2023-06-29 11:13:01 -05:00
Martin Fischer
b4cc2ed919 fix(dict): Don't correct unuseful 2023-06-26 23:15:46 +02:00
Martin Fischer
043179a22e fix(dict): Don't correct intension, intensional & intensionally 2023-06-26 23:15:13 +02:00
Martin Fischer
983b6a5827 fix(dict): Don't correct pervious & perviously 2023-06-26 23:14:39 +02:00
Martin Fischer
989136e755 fix(dict): Don't correct simulative 2023-06-26 23:13:37 +02:00
Ed Page
888116ae2f chore: Release 2023-06-26 15:43:01 -05:00
Martin Fischer
5553044346 fix(dict): Restore correction orders
This commit restores the correction orders lost in
09f096a096.

The commit was generated by running the commands:

    git show 09f096a0968e61e22c963a024c8c3d74453d812a~:./assets/words.csv HEAD:./assets/words.csv > assets/words.csv
    SNAPSHOTS=overwrite cargo test verify
    SNAPSHOTS=overwrite cargo test codegen
2023-06-26 22:14:06 +02:00
Martin Fischer
8d026ac23e feat(dict): Preserve correction order
We want to be able to recommend more likely corrections first,
e.g. for "poped" we want to recommend "popped" before "pooped".
2023-06-26 22:13:47 +02:00
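Preserving correction order comes down to deduplicating without sorting. The sketch below uses a hypothetical helper, `dedup_preserving_order`, that keeps the first-seen order. It is an illustration only, not the dictionary code from this commit.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drop duplicate corrections but keep first-seen
/// order, so the most likely correction is still suggested first.
fn dedup_preserving_order<'a>(corrections: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    corrections
        .iter()
        .copied()
        // `insert` returns false for a value already seen, which filters repeats
        .filter(|c| seen.insert(*c))
        .collect()
}
```

By contrast, storing the corrections in a sorted container would put "pooped" ahead of "popped", because the two words are ordered alphabetically.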
Martin Fischer
89d5a97a8a test: Add some tests for dict processing logic 2023-06-26 19:23:27 +02:00
Martin Fischer
49a0eaab7b refactor: Make dict processing logic testable
Previously all the dictionary cleanup logic was in the function:

    fn generate<W: std::io::Write>(file: &mut W, dict: &[u8])

which parsed the provided buffer as CSV and also took care of writing
the processed dictionary back as CSV.  This commit factors out the CSV
handling, leaving a `process` function behind so that it can be easily
tested in the following commit.
2023-06-26 19:22:04 +02:00
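The refactor separates pure processing from CSV I/O. A minimal sketch of that split follows. The cleanup rules inside `process` are assumptions: it drops corrections equal to the typo, drops entries left empty, and sorts by typo. Naive comma splitting stands in for the real CSV parser. This version of `generate` also returns `io::Result`, unlike the original signature.

```rust
use std::io::{self, Write};

/// Pure cleanup step with hypothetical rules. It does no I/O, so it is easy to unit test.
fn process(records: Vec<(String, Vec<String>)>) -> Vec<(String, Vec<String>)> {
    let mut out: Vec<(String, Vec<String>)> = records
        .into_iter()
        .map(|(typo, corrections)| {
            // Assumed rule: a typo should never "correct" to itself
            let corrections: Vec<String> =
                corrections.into_iter().filter(|c| *c != typo).collect();
            (typo, corrections)
        })
        .filter(|(_, corrections)| !corrections.is_empty())
        .collect();
    out.sort_by(|a, b| a.0.cmp(&b.0));
    out
}

/// I/O wrapper: parse the buffer, call `process`, write the result back.
/// Naive comma splitting (no quoting) stands in for real CSV handling.
fn generate<W: Write>(file: &mut W, dict: &[u8]) -> io::Result<()> {
    let text = String::from_utf8_lossy(dict);
    let records: Vec<(String, Vec<String>)> = text
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| {
            let mut fields = line.split(',').map(str::to_owned);
            let typo = fields.next().unwrap_or_default();
            (typo, fields.collect::<Vec<String>>())
        })
        .collect();
    for (typo, corrections) in process(records) {
        writeln!(file, "{},{}", typo, corrections.join(","))?;
    }
    Ok(())
}
```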
Ed Page
2fffb1bb2b chore: Release 2023-06-26 09:05:53 -05:00
Ed Page
37d4d626b6 fix(dict): Don't correct currency code
Fixes #767
2023-06-26 08:48:57 -05:00
Ed Page
d9e1ae0a39 chore: Release 2023-06-22 12:33:50 -05:00
Ed Page
5f1d3c23bc fix(config): Force-skip config files
This doesn't use `extend-exclude`, which means that `typos typos.toml`
will still be skipped.

This doesn't just skip the currently loaded config but any file name
that looks like a config, which might be a bit aggressive but allows us
to do layered config in the future...  We've been saying that for a
while.

Fixes #711
2023-06-22 12:20:07 -05:00
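Skipping "any file name that looks like a config" can be sketched as a file-name check. The list in `CONFIG_NAMES` is an assumption (`typos.toml`, `_typos.toml`, `.typos.toml`), and `is_config_file` is a hypothetical name. Check the typos documentation for the authoritative list.

```rust
use std::path::Path;

/// Assumed set of typos config file names, for illustration only.
const CONFIG_NAMES: &[&str] = &["typos.toml", "_typos.toml", ".typos.toml"];

/// True for any file whose name looks like a typos config, regardless of
/// directory and regardless of which config was actually loaded.
fn is_config_file(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| CONFIG_NAMES.contains(&name))
}
```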
Ed Page
4ccd3cba61
Merge pull request #757 from figsoda/deprecated
chore: remove usages of deprecated functions
2023-06-22 10:39:16 -05:00
figsoda
ef1907fb1e chore: Remove usages of deprecated functions 2023-06-22 11:22:46 -04:00
Ed Page
8074cc6029 chore: Release 2023-06-22 10:13:38 -05:00
Ed Page
9e293916d2 fix(config): Always apply type defaults
Fixes #760
2023-06-22 09:57:15 -05:00
Ed Page
72c773aade test(config): Verify bad merge 2023-06-22 09:16:23 -05:00
Ed Page
84c8b30e06 chore: Release 2023-06-21 14:51:20 -05:00
Ed Page
fce92e4e5f fix(config): User file types override default file types
This also ensures `typos --type-list` will report the glob in only one
place.

Fixes #754
2023-06-21 14:31:04 -05:00
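One way to read "user file types override default file types" is sketched below. A user type replaces a default type with the same name, and any glob claimed by a user type is removed from the default types, so each glob appears in one place. `merge_types` and the data shapes are hypothetical, not the actual config code.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical merge of file-type definitions (type name -> globs).
fn merge_types<'a>(
    defaults: &[(&'a str, Vec<&'a str>)],
    user: &[(&'a str, Vec<&'a str>)],
) -> BTreeMap<String, Vec<String>> {
    // Every glob that some user-defined type claims
    let claimed: HashSet<&str> = user
        .iter()
        .flat_map(|(_, globs)| globs.iter().copied())
        .collect();
    let mut merged = BTreeMap::new();
    for (name, globs) in defaults {
        // Drop default globs the user reassigned, so each glob is listed once
        let kept: Vec<String> = globs
            .iter()
            .filter(|glob| !claimed.contains(*glob))
            .map(|glob| glob.to_string())
            .collect();
        if !kept.is_empty() {
            merged.insert(name.to_string(), kept);
        }
    }
    // User types are inserted last, replacing any default of the same name
    for (name, globs) in user {
        merged.insert(name.to_string(), globs.iter().map(|glob| glob.to_string()).collect());
    }
    merged
}
```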
Ed Page
77f0389a9e test(config): Reproduce types bug 2023-06-21 14:17:19 -05:00
Ed Page
1a4e9428d6 chore: Release 2023-06-20 09:25:04 -05:00
Ed Page
427d127e8a fix: Ensure stdout is locked
In 3a29410c1b, we switched to anstream
which doesn't seem to be locking properly (rust-lang/cargo#12289).  For
now, we are working around it.

Fixes #749
2023-06-20 09:04:31 -05:00
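Holding the stdout lock can be illustrated with the standard library alone. The idea is to take `std::io::stdout().lock()` once and write every report through that guard, so output from other threads cannot interleave mid-message. `write_reports` is a hypothetical name, and the sketch does not reproduce the anstream workaround itself.

```rust
use std::io::{self, Write};

/// Write all report lines through one writer. Passing a `StdoutLock`
/// keeps stdout locked for the whole batch instead of once per line.
fn write_reports<W: Write>(out: &mut W, reports: &[&str]) -> io::Result<()> {
    for report in reports {
        writeln!(out, "{report}")?;
    }
    out.flush()
}
```

Usage: `let stdout = io::stdout(); let mut lock = stdout.lock(); write_reports(&mut lock, &reports)?;`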
Ed Page
59888680cc chore: Release 2023-06-19 10:28:02 -05:00
Tobias Klauser
28fbdec824
fix(dict): Don't correct "accreting"
"accreting" is the present participle of "accrete", see
https://en.wiktionary.org/wiki/accreting. Don't correct it to
"accrediting".
2023-06-19 16:58:17 +02:00
Viktor Szépe
16b84f46fd
Remove trailing space from Wikipedia dictionary 2023-06-08 17:57:56 +02:00
Ed Page
f37d3f8e3c chore: Release 2023-06-08 10:29:00 -05:00
Ed Page
172ff0bb5a chore: Release 2023-06-08 10:27:43 -05:00
Ed Page
7523a865f7 feat(dict): Pull in codespell items 2023-06-08 09:26:12 -05:00
Ed Page
a9d9fc03a2 test(dict): Report more cases to user 2023-06-08 09:23:10 -05:00
Ed Page
09f096a096 chore(dict): Automate more cleanup 2023-06-08 08:54:36 -05:00
Ed Page
ccc72d7b42 fix: Update 3rd party dicts 2023-06-08 07:56:08 -05:00
Ed Page
20b36ca07f chore: Release 2023-06-01 19:49:16 -05:00
Ed Page
27c9fe7c79 chore: Release 2023-06-01 06:22:12 -05:00
Ed Page
5277bf390e fix(dict): Allow plural deque -> deques 2023-06-01 06:07:12 -05:00
Ed Page
0ded1c8a4e fix(dict): Include May updates 2023-06-01 06:02:02 -05:00
Ed Page
a78f83bab3
Merge pull request #731 from crate-ci/renovate/criterion-0.x
chore(deps): update rust crate criterion to 0.5
2023-06-01 09:45:34 -05:00
renovate[bot]
e06f63c31d chore(deps): update compatible 2023-06-01 02:54:09 +00:00
renovate[bot]
9aa8d04d94
chore(deps): update rust crate criterion to 0.5 2023-06-01 00:53:02 +00:00
Ville Skyttä
9c74d015f3 feat(cli): Strip .in suffix(es) only on non-match
Makes user-assigned `.in` patterns work.
2023-05-25 15:24:04 +03:00
Ville Skyttä
90d4676dd7 feat(cli): Strip .in suffix(es)
`.in` is typically used for build system template input files,
containing some placeholders to replace. In some cases, multiple rounds
of replacements are used, each with their own `.in`, so remove all
trailing instances of it before attempting a filename match.

Closes https://github.com/crate-ci/typos/issues/727
2023-05-24 22:54:45 +03:00
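Together with the "only on non-match" and non-UTF-8 fixes above, the behavior can be sketched as follows. The name is tried as-is first, and only then with all trailing `.in` extensions removed. `Path::extension` and `PathBuf::set_extension` work on `OsStr`, so no UTF-8 conversion is needed. `strip_in_suffixes`, `file_type`, and the `lookup` closure are hypothetical stand-ins for the real matcher.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Remove every trailing `.in`, e.g. `configure.sh.in.in` -> `configure.sh`.
fn strip_in_suffixes(path: &Path) -> PathBuf {
    let mut stripped = path.to_path_buf();
    // `extension()` yields an `OsStr`, so non-UTF-8 names are handled too
    while stripped.extension() == Some(OsStr::new("in")) {
        stripped.set_extension("");
    }
    stripped
}

/// Try the name as-is first, so a user-assigned `*.in` glob still wins,
/// and fall back to the stripped name only on a non-match.
fn file_type<F>(path: &Path, lookup: F) -> Option<&'static str>
where
    F: Fn(&Path) -> Option<&'static str>,
{
    lookup(path).or_else(|| lookup(strip_in_suffixes(path).as_path()))
}
```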
Ed Page
38a1b19481 chore: Release 2023-05-22 13:44:05 -05:00
Ed Page
5c98b91f18 chore: Release 2023-05-19 08:51:04 -05:00
Ed Page
adfd866ed9 test(cli): Check more former false positives 2023-05-19 08:18:25 -05:00
Ed Page
641e734fe7 fix(dict): Don't correct add-ons
Fixes #721
2023-05-19 08:13:01 -05:00
Ed Page
9e01ccbd3e test(cli): Prevent false-positive regressions 2023-05-19 08:11:14 -05:00
Ed Page
78a3c66d00 chore: Release 2023-05-03 08:57:26 -05:00
Ed Page
b5b09d7129 chore: Release 2023-05-03 08:56:01 -05:00
Ed Page
83b6d30708
Merge pull request #719 from epage/april
fix: Add April, 2023's typos
2023-05-03 08:52:35 -05:00
Ed Page
f7c2691b63 fix: Add April, 2023's typos
Fixes #705
2023-05-03 08:28:35 -05:00
renovate[bot]
5131fb8167
chore(deps): update compatible 2023-05-01 15:30:51 +00:00
Ed Page
808e862bfb chore: Resolve deprecations 2023-04-27 23:26:55 -05:00
Ed Page
d17ca898d9 chore: Upgrade to 0.4.3 2023-04-27 23:24:25 -05:00
Ed Page
9433f016bb style: Fix formatting 2023-04-24 00:11:34 -05:00
Ed Page
64e40cffee chore: Release 2023-04-19 09:47:11 -05:00
Ed Page
78058ce3e3 chore: Release 2023-04-19 08:35:04 -05:00
Ed Page
7f65ff4f24 chore: Release 2023-04-12 22:24:06 -05:00
Ed Page
5145767575 chore: Update anstyle 2023-04-12 21:52:15 -05:00
Ed Page
217e403326 docs(cli): Show SSL cipher suites
See #438
2023-04-11 01:17:33 -05:00
Ed Page
66d82e5e51 chore: Release 2023-03-30 07:50:08 -05:00
Ed Page
8db59330b7 test(cli): Add UTF16 test 2023-03-30 07:45:24 -05:00
Ed Page
ae7f313230 fix(cli): Actually decode UTF-16
Two problems:
- I thought we had a UTF-16 test but apparently we didn't
- I didn't read enough fine print in the `encoding_rs` API

These combined meant the last release completely broke UTF-16 support.
2023-03-30 07:27:55 -05:00
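For reference, the sketch below decodes BOM-prefixed UTF-16 using only the standard library. It swaps in `u16::from_le_bytes`/`from_be_bytes` and `String::from_utf16_lossy` for the `encoding_rs` API that the commit refers to. It returns `None` when there is no UTF-16 BOM, or when the payload has an odd byte count. `decode_utf16_with_bom` is a hypothetical name.

```rust
/// Decode UTF-16 text according to its byte-order mark (BOM).
fn decode_utf16_with_bom(bytes: &[u8]) -> Option<String> {
    let (body, little_endian) = match bytes {
        [0xFF, 0xFE, rest @ ..] => (rest, true),  // UTF-16LE BOM
        [0xFE, 0xFF, rest @ ..] => (rest, false), // UTF-16BE BOM
        _ => return None,
    };
    if body.len() % 2 != 0 {
        return None; // not a whole number of 16-bit code units
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| {
            let pair = [pair[0], pair[1]];
            if little_endian { u16::from_le_bytes(pair) } else { u16::from_be_bytes(pair) }
        })
        .collect();
    // Unpaired surrogates become U+FFFD instead of aborting the decode
    Some(String::from_utf16_lossy(&units))
}
```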
Ed Page
144ee4d018 chore: Release 2023-03-29 21:55:35 -05:00
Ed Page
15cdad2a3f chore: Release 2023-03-29 21:54:57 -05:00
Ed Page
039edba3de fix(dict): Add March's typos
Fixes #677
2023-03-29 21:40:36 -05:00
Ed Page
98be58dbc9 refactor: Switch out the UTF-16 encoding impl
Fixes #702
2023-03-29 20:42:48 -05:00
renovate[bot]
e1a138b637 chore(deps): update compatible 2023-04-01 07:05:05 +00:00
Ed Page
6cf303d421 chore: Release 2023-03-18 04:20:06 -05:00
Ed Page
53e2855fa0 chore: Release 2023-03-18 04:19:19 -05:00
Ed Page
8a6fc1895d chore: Release 2023-03-18 04:18:47 -05:00
Ed Page
243b4efc9e chore: Update winnow 2023-03-18 04:11:55 -05:00
Ed Page
08f154e45b test: Try to fix CI 2023-03-18 02:15:16 -05:00
Ed Page
e15de8b72e chore: Release 2023-03-18 02:09:49 -05:00
Ed Page
ac46a6ba54 feat(config): Custom ignores
Typos primarily works off of identifiers and words.  We have built-in
support to detect constructs that span identifiers that should not be
spell checked, like UUIDs, emails, domains, etc.  This opens it up for
user-defined identifier-spanning constructs using regexes via
`extend-ignore-re`.

This works differently than any of the previous ways of ignoring things
because the regexes require extra parse passes.  Under the assumption
that (1) actual typos are rare and (2) files relying on
`extend-ignore-re` are rare, we only do these extra parse passes when a
typo is found, causing almost no performance hit in the expected case.

While this could be used for more generic types of ignores, it isn't the
most maintainable because it is separate from the source files in
question.  Ideally, we'd implement document settings / directives for
these cases (#316).
2023-03-18 01:25:39 -05:00
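The "extra pass only when a typo is found" strategy can be sketched with lazy initialization. `is_typo` stands in for the dictionary lookup, and `find_ignored` stands in for running the `extend-ignore-re` regexes. Both are hypothetical closures, so the sketch stays std-only. Words are split on single spaces for simplicity.

```rust
use std::ops::Range;

/// Return byte spans of typos, computing user-ignored spans at most once
/// per text, and only after the first candidate typo shows up.
fn check(
    text: &str,
    is_typo: impl Fn(&str) -> bool,
    find_ignored: impl Fn(&str) -> Vec<Range<usize>>,
) -> Vec<Range<usize>> {
    let mut ignored: Option<Vec<Range<usize>>> = None;
    let mut typos = Vec::new();
    let mut offset = 0;
    for word in text.split(' ') {
        let span = offset..offset + word.len();
        offset += word.len() + 1;
        if !is_typo(word) {
            continue; // expected case: no extra pass over the text
        }
        // First typo: pay for the extra pass now; later typos reuse it
        let spans = ignored.get_or_insert_with(|| find_ignored(text));
        let covered = spans.iter().any(|r| r.start <= span.start && span.end <= r.end);
        if !covered {
            typos.push(span);
        }
    }
    typos
}
```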
Ed Page
9d376417a0 test: Baseline for generic ignore 2023-03-18 01:20:01 -05:00
Ed Page
a1a601195e chore: Release 2023-03-17 23:59:36 -05:00
Ed Page
e4d2d0e54d fix: Actually ignore ignored identifiers 2023-03-17 23:47:25 -05:00
Ed Page
2ee6ef4654 test(cli): Show extend-ignore-identifiers-re bug 2023-03-17 23:45:54 -05:00
Ed Page
0eae00fee2 test(cli): Consolidate files 2023-03-17 23:30:24 -05:00
Ed Page
797574c10a chore: Release 2023-03-17 23:09:02 -05:00
Ed Page
af90817e50 feat(dict): extend-ignore-identifiers-re support
This opens the door for users to provide patterns for identifiers that
are always valid.  The key limitation is "identifiers".  Run `typos
--identifiers` to verify what you are trying to write the regex for.

Fixes #651
2023-03-17 22:40:55 -05:00
Ed Page
03286f0f82 chore: Release 2023-03-16 05:22:42 -05:00
Ed Page
52e1743c58 chore: Update to anstream 2023-03-16 05:07:38 -05:00
Ed Page
57502b53cc chore: Release 2023-03-16 03:47:08 -05:00
Ed Page
720bd7b28c fix(dict): Allow commitish
Fixes #690
2023-03-16 03:20:57 -05:00
Ed Page
9504315f7e chore: Update Styled pattern 2023-03-14 08:22:10 -05:00
Ed Page
dc0eafc7e5 chore: Release 2023-03-14 02:14:39 -05:00
Ed Page
3a29410c1b fix: Improve color env variable support
- `CLICOLOR=1` now works correctly
- `NO_COLOR=` now works correctly
- Auto-enable colors in CI


For running `typos` on the Linux kernel (176,210 typos to be printed), we went from 20.082s to
under 20.450s.  Exactly where in that range is unclear due to jitter on my system.
```console
$ hyperfine -L typos ./typos-main,./typos-anstream "{typos} ../../../linux" -i
Benchmark 1: ./typos-main ../../../linux
  Time (mean ± σ):     20.082 s ±  0.111 s    [User: 39.668 s, System: 0.474 s]
  Range (min … max):   19.961 s … 20.331 s    10 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: ./typos-anstream ../../../linux
  Time (mean ± σ):     20.426 s ±  0.104 s    [User: 40.301 s, System: 0.523 s]
  Range (min … max):   20.316 s … 20.661 s    10 runs

  Warning: Ignoring non-zero exit code.

Summary
  './typos-main ../../../linux' ran
    1.02 ± 0.01 times faster than './typos-anstream ../../../linux'

$ CLICOLOR_FORCE=1 hyperfine -L typos ./typos-anstream "{typos} ../../../linux" -i
Benchmark 1: ./typos-anstream ../../../linux
  Time (mean ± σ):     20.262 s ±  0.075 s    [User: 39.961 s, System: 0.542 s]
  Range (min … max):   20.154 s … 20.420 s    10 runs

  Warning: Ignoring non-zero exit code.

$ CLICOLOR=0 hyperfine -L typos ./typos-anstream "{typos} ../../../linux" -i
Benchmark 1: ./typos-anstream ../../../linux
  Time (mean ± σ):     20.296 s ±  0.065 s    [User: 40.003 s, System: 0.565 s]
  Range (min … max):   20.169 s … 20.383 s    10 runs

  Warning: Ignoring non-zero exit code.
```
2023-03-13 23:01:45 -05:00
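The environment-variable behavior listed above can be sketched as one decision function. Its precedence is an assumption based on the CLICOLOR / CLICOLOR_FORCE and NO_COLOR conventions, and it is not anstream's actual code. Environment access goes through the hypothetical `var` closure, so the logic can be tested. The check that `TERM` is not `dumb` is also an assumed heuristic.

```rust
/// Decide whether to emit ANSI colors (assumed precedence, for illustration).
fn color_enabled(var: impl Fn(&str) -> Option<String>, is_terminal: bool) -> bool {
    // CLICOLOR_FORCE set to anything but "0" forces color, even when piped
    if var("CLICOLOR_FORCE").map_or(false, |v| v != "0") {
        return true;
    }
    // NO_COLOR counts only when non-empty, so `NO_COLOR=` leaves color on
    let no_color = var("NO_COLOR").map_or(false, |v| !v.is_empty());
    // CLICOLOR=0 disables color; any other value (e.g. CLICOLOR=1) enables it
    let clicolor = var("CLICOLOR").map(|v| v != "0");
    let term_supports_color = var("TERM").map_or(false, |t| t != "dumb");
    // Assumption: CI services may lack a useful TERM but still render colors
    let in_ci = var("CI").is_some();
    is_terminal
        && !no_color
        && clicolor != Some(false)
        && (term_supports_color || clicolor == Some(true) || in_ci)
}
```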
Ed Page
28e7f17a65 chore: Release 2023-03-13 20:45:44 -05:00
WaterLemons2k
6855a78893 feat(ci): Use composite to run action
Using a composite action instead of Docker avoids building an image,
which makes the action faster.

If the `typos` command doesn't exist, download and extract it.
2023-03-14 23:04:09 +08:00
Ed Page
34289639e6 chore: Release 2023-03-13 12:33:33 -05:00
Ed Page
4de8a7c366
Merge pull request #679 from phip1611/erronerous
feat(dict): Add: erronerous -> erroneous
2023-03-13 12:32:01 -05:00
Ed Page
6003b48885
Merge pull request #680 from phip1611/existend
change(dict): existend => existed,existent
2023-03-13 12:31:45 -05:00
Ed Page
3d0de83fb6 chore: Release 2023-03-13 11:57:15 -05:00
Eric Fu
4eeb460bb7 fix: ignore go.mod by default 2023-03-13 23:53:05 +08:00
Philipp Schuster
cc26a8f693 change: existend => existed,existent 2023-03-12 14:42:09 +01:00
Philipp Schuster
500b9e3445 add: erronerous -> erroneous 2023-03-12 13:29:49 +01:00
Ed Page
5f7454815c
Merge pull request #685 from epage/shuffle
fix(pre-commit): Separate cli from pre-commit package
2023-03-13 09:32:39 -05:00
Ed Page
13dbffcf7c fix(pre-commit): Separate cli from pre-commit package
Fixes #682
2023-03-08 10:19:20 -06:00
Jonas Platte
f8ec64571f
feat(dict): Add empheral -> ephemeral 2023-03-08 14:01:24 +01:00
Ed Page
08a9831825 chore: Release 2023-03-05 20:43:46 -06:00
George Dietrich
3feda3ca91
feat: Add additional corrections 2023-03-06 09:41:47 -05:00
Ed Page
f4293b58c5 chore: Release 2023-02-28 06:30:27 -06:00
Ed Page
ce20d2f220 chore: Update dependencies 2023-02-28 05:46:19 -06:00
Ed Page
d752626069 chore: Update dependencies 2023-02-27 23:34:02 -06:00
Ed Page
8cef23a8f4 chore: Release 2023-02-27 16:14:07 -06:00
Jonas Platte
afcd316ddd
feat(dict): Add more encryption-related typos 2023-02-28 19:03:44 +01:00
Ed Page
9f306b2be9 chore: Release 2023-02-27 15:33:00 -06:00
Damian Barabonkov
6a4e0ead52 feat(dict): Add grouepd -> grouped 2023-02-28 18:19:46 +01:00
Ed Page
59a10c298a chore: Release 2023-02-23 10:44:11 -06:00
Ed Page
7cd5a8c99f fix(dict): Don't correct Referer 2023-02-23 10:24:07 -06:00
Ed Page
ed8683ab81 chore: Release 2023-02-22 11:26:17 -06:00
Ed Page
1ca59423d4 chore: Release 2023-02-22 11:25:22 -06:00
Ed Page
d99eb1601b refactor: Resolve deprecations 2023-02-21 11:11:24 -06:00
Ed Page
15e748d0e5 refactor: Switch to winnow 2023-02-21 10:41:45 -06:00
Jiralite
17cc43aaca
feat: Add "someoene" 2023-02-13 14:48:09 +00:00
Ed Page
6e14cefb85 chore: Release 2023-02-01 10:09:00 -06:00
Ed Page
adce192ca3 chore: Update dependencies 2023-02-01 09:31:38 -06:00
Jiralite
9094b0b9aa
feat: Add 3 typos 2023-02-01 14:34:33 +00:00
Ed Page
97770bdd02 chore: Release 2023-01-25 10:31:49 -06:00
Naïm Favier
3817b97017
fix(dict): "substitutents" → "substituents"
is a more likely replacement than "substitutes".
2023-01-24 15:53:38 +01:00
Naïm Favier
d9ace5fd25
fix(dict): "substituters" is valid
https://en.wiktionary.org/wiki/substituters
2023-01-24 15:52:27 +01:00
Olivier Delhomme
ee8446b127 Adds 'regylar' as a typo for 'regular'. 2023-01-17 20:37:35 +01:00
Ed Page
12c6491895 chore: Release 2023-01-16 08:43:06 -06:00
Jonas Platte
5391527894
feat(dict): Add serialzie -> serialize 2023-01-16 13:37:21 +01:00
renovate[bot]
4f6f07b904 chore(deps): update compatible 2023-01-01 02:13:39 +00:00
Ed Page
1d8996e205 chore: Release 2022-12-06 13:54:17 -06:00
Ed Page
c963f68083 fix(dict): Remove nilable
See conversation in #613
2022-12-06 10:47:14 -06:00
Ed Page
98c3a33cc6 chore: Release 2022-12-01 20:00:28 -06:00
renovate[bot]
aa2789b65f chore(deps): update safe 2022-12-02 00:14:56 +00:00
Ed Page
905a150be8 chore: Release 2022-11-21 22:25:30 -06:00
Nathan Baulch
d7b3b548f0 feat(dict): 133 assorted typos 2022-11-21 21:48:40 -06:00
Ed Page
39b28c3010 chore: Release 2022-11-03 22:28:10 -05:00
Ed Page
87a02e2a2a chore: Switch to workspace inheritance 2022-11-01 14:20:38 -05:00
Ed Page
1cd8a74031 chore: Upgrade dependencies 2022-11-01 14:14:35 -05:00
Ed Page
2dca8dea3c chore: Bump versions 2022-10-25 07:10:22 -05:00
Jonas Platte
f0e268bb7e
feat(dict): Add decreypted -> decrypted 2022-10-25 12:10:43 +02:00
Ed Page
4e7e37799b chore: Release 2022-10-20 07:08:07 -05:00
Jonas Platte
02afa6e98b
feat(dict): Add wrappning -> wrapping 2022-10-20 10:07:39 +02:00
Ed Page
2fc71b2e13 chore: Release 2022-10-11 09:48:09 -05:00
Robert
611bd09d9d
feat(dict): Add assorted typos
baged -> bagged
baged -> badge
codesbase -> codebase (+ variants)
depercate -> deprecate (+ variants)
fallthough -> fallthrough
2022-10-11 15:14:16 +11:00
Ed Page
16ca0accbb chore: Release 2022-10-06 08:26:11 -05:00
Jonas Platte
6d6713180e
feat(dict): Add whaat -> what 2022-10-06 14:27:29 +02:00
Ed Page
2b667ffe55 fix: Correctly calculate trie 2022-10-04 10:57:28 -05:00
Ed Page
f78135acd2 chore: Bump MSRV to 1.64.0 2022-10-04 10:51:03 -05:00
Ed Page
32485c4bad chore: Upgrade dependencies 2022-10-03 11:36:25 -05:00
Ed Page
fd5abef1a7 chore: Release 2022-09-22 13:38:17 -05:00
Robert
bcd622e33c
feat(dict): Add 'targest -> target' 2022-09-22 16:07:12 +10:00
Ed Page
668a94791b chore: Release 2022-09-21 19:28:48 -05:00
Frank Steffahn
a2cc907420 feat(dict): Add 'pararmeter -> parameter' 2022-09-22 01:59:52 +02:00
Ed Page
384aaef311 chore: Release 2022-09-15 08:43:18 -05:00
Jonas Platte
e8e20f28bb
feat(dict): Add 'stte' typo 2022-09-15 10:35:19 +02:00
Ed Page
3161cd6a82 chore: Release 2022-09-06 09:25:25 -05:00
Yuta Hayashibe
7ee918f078 Removed "their" from the correction candidates of "thje" 2022-09-06 23:01:10 +09:00
Yuta Hayashibe
d207af69ae Add some typos 2022-09-06 19:33:15 +09:00
Ed Page
c6d876294c chore: Release 2022-09-01 10:43:57 -05:00
Ed Page
7f470e1721 Revert "Revert "fix: remove thead -> thread""
This reverts commit 1e58c65276.
2022-09-01 10:28:53 -05:00
Ed Page
c49aff00be test: Make platform agnostic 2022-09-01 07:15:42 -05:00
Ed Page
fdb425c279 chore: Release 2022-08-30 09:28:57 -05:00
Robert
5483e8976a
feat(dict): add typos from Fig monorepo 2022-08-30 13:10:21 +10:00
Ed Page
4c2445fb57 chore: Release 2022-08-25 16:24:58 -05:00
Ed Page
cb91b89080 fix(parse): Ignore CSS hex values that start with digits
Fixes #542
2022-08-25 16:05:57 -05:00
Ed Page
0612303e7d chore: Release 2022-08-23 09:24:26 -05:00
Ed Page
5896efe198
Merge pull request #540 from epage/typo
fix: Misc config updates
2022-08-23 09:22:23 -05:00
Ed Page
1e58c65276 Revert "fix: remove thead -> thread"
This reverts commit 69f89505d8.
2022-08-23 08:21:47 -05:00
Jonas Platte
272ac51fdb
feat(dict): Add typos for "inappropriate[ly]" 2022-08-17 13:56:52 +02:00
Ed Page
c7e576614e chore: Release 2022-08-16 07:56:56 -05:00
Ed Page
2fce5f7f09 fix: Remove unused log dependency 2022-08-16 07:56:31 -05:00
Ed Page
62847112ff chore: Release 2022-08-16 07:53:32 -05:00
Ed Page
2d51f44345 fix: Remove extra build dependency 2022-08-16 07:52:51 -05:00
Ed Page
9b70dca40c chore: Release 2022-08-16 07:49:04 -05:00
Ed Page
d40d24b811
Merge pull request #537 from epage/thead
fix: remove thead -> thread
2022-08-16 07:47:32 -05:00
Uyarn
69f89505d8 fix: remove thead -> thread
This supersedes #533

Fixes #532
2022-08-16 07:40:30 -05:00
Jonas Platte
d6e9d52477
feat(dict): Add "deffer" to typo list 2022-08-16 14:20:40 +02:00
Ayaz Hafiz
6be109774b Correct "opauqe" to "opaque"
I can't find any references to "opauqe" as an actual word, so I believe
this to be safe.
2022-08-15 11:27:45 -05:00
Ed Page
4d9c507595 chore: Release 2022-08-13 12:03:26 -05:00
Yuta Hayashibe
cb3736663e Add other corrections 2022-08-13 23:50:43 +09:00
Yuta Hayashibe
50da882077 Add typos 2022-08-13 14:36:41 +09:00
Ed Page
80f1ed0290 chore: Bump MSRV to 1.60 2022-08-03 09:32:45 -05:00
Ed Page
ea38677643 chore: Update dependencies 2022-08-03 09:29:38 -05:00
Ed Page
a8599f6a19 test: Move codegen to tests 2022-08-03 09:07:04 -05:00
dependabot[bot]
4ffa72dac1
chore(deps): Bump once_cell from 1.12.0 to 1.13.0
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.12.0...v1.13.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-08-01 07:02:51 +00:00
Ed Page
a6674d5be4 chore: Release 2022-07-22 11:09:08 -05:00
Ed Page
3e9caf0731 fix(dict): Run codegen for #516 2022-07-22 10:14:08 -05:00
Jonas Platte
e7be2d3983
feat(dict): Add 'anonymized' typos 2022-07-22 16:31:59 +02:00
dependabot[bot]
7106dff9b1
chore(deps): Bump clap from 3.1.18 to 3.2.8
Bumps [clap](https://github.com/clap-rs/clap) from 3.1.18 to 3.2.8.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.1.18...v3.2.8)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-01 07:04:47 +00:00
Ed Page
aff7161142 chore: Release 2022-06-15 16:11:53 -05:00
Ed Page
7c953a71ec chore: Upgrade to 2021 edition 2022-06-01 06:53:10 -05:00
Ed Page
b15558f0f3 chore: Set rust-version 2022-06-01 06:51:59 -05:00
Ed Page
927308c726 chore: Release 2022-05-16 09:33:53 -05:00
Ed Page
5ae7bda8eb style: Silence clippy 2022-05-16 09:09:17 -05:00
Ed Page
778fd7a53d chore: Release 2022-05-10 14:24:11 -05:00
Ed Page
fd5398316f fix(parser): Better short base64 detection
Previously, we bailed out if the string was too short (<90) and there
weren't non-alpha-base64 bytes present.  That check overlooked the
padding bytes.

We key off of padding bytes to detect that a string is in fact base64
encoded.  Like the other cases, there can be false positives but those
strings should show up elsewhere or the compiler will fail.

This was called out in #485
2022-05-10 14:02:59 -05:00
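A simplified version of this heuristic is sketched below. The 90-byte threshold comes from the commit message. The alphabet check, the two-byte padding limit, and the length-multiple-of-4 rule are standard base64 facts. The overall structure, and the omission of typos' other checks, are assumptions. `looks_like_base64` is a hypothetical name. As the commit notes, false positives remain possible.

```rust
/// Heuristic: does `s` look like base64-encoded data?
fn looks_like_base64(s: &str) -> bool {
    let body = s.trim_end_matches('=');
    let padding = s.len() - body.len();
    let valid = !body.is_empty()
        && padding <= 2
        && body.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/');
    if !valid {
        return false;
    }
    if s.len() >= 90 {
        return true; // long runs of base64 bytes are treated as encoded data
    }
    // Short strings need extra evidence: a non-letter base64 byte (the older
    // rule) or, with this fix, correct `=` padding.
    let has_non_alpha = body.bytes().any(|b| !b.is_ascii_alphabetic());
    has_non_alpha || (padding > 0 && s.len() % 4 == 0)
}
```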
Ed Page
bd5048def5 fix(parser): Allow backslashes after ignore items
To allow `\\` to start a token, we couldn't let it end a token.  By
switching the terminator to a peek, we can now make it end a token
**and** start a token, allowing us to work better with Windows paths.

Fixes #481
2022-05-10 14:02:54 -05:00
Ed Page
1720e7d65e fix(parser): Ignore items at end of input 2022-05-10 13:38:03 -05:00
Ed Page
7e15afe81f test(parser): Add reproduction of #481 2022-05-10 12:58:19 -05:00
Ed Page
4869764f7b test(parser): Remove unclear test case
Unsure why this case is here and it causes difficulties
2022-05-10 12:58:13 -05:00
Ed Page
ad89736832 refactor(parser): Clarify precedence levels 2022-05-10 12:58:08 -05:00
Ed Page
9f623c618b chore: Release 2022-04-28 09:39:14 -05:00
Denis Kasak
29508a689b feat(dict): Add typo identitiy -> identity 2022-04-28 16:24:18 +02:00
Ed Page
dcc3c0b11e chore: Release 2022-04-25 11:49:02 -05:00
Jonas Platte
5f5ef1468d feat(dict): Add 'signign' typo to words.csv 2022-04-25 11:26:08 -05:00
Jonas Platte
bbd71ab434 feat(dict): Add 'unencyrpted' typo to words.csv 2022-04-25 11:25:48 -05:00
SeongChan Lee
4e4f136ec6 Fix tokenizer for uppercase UUID
Microsoft toolchains usually emit UUID/GUID in UPPERCASE
2022-04-25 11:12:25 +09:00
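Accepting uppercase UUIDs amounts to a case-insensitive hex check on the 8-4-4-4-12 layout. `is_uuid` below is a hypothetical helper, not the project's tokenizer. Note that it does not require consistent casing within a UUID.

```rust
/// Is `s` a UUID/GUID in 8-4-4-4-12 hex form, in either letter case?
fn is_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let lens = [8, 4, 4, 4, 12];
    groups.len() == lens.len()
        && groups
            .iter()
            .zip(lens)
            // `is_ascii_hexdigit` accepts both a-f and A-F
            .all(|(g, len)| g.len() == len && g.bytes().all(|b| b.is_ascii_hexdigit()))
}
```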
Ed Page
7d3e9bb070 chore: Release 2022-04-18 09:39:53 -05:00
Ed Page
e63659c208 fix: Ignore CSS colors
Fixes #462
2022-04-18 09:19:44 -05:00
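CSS hex colors come in four lengths: `#rgb`, `#rgba`, `#rrggbb`, and `#rrggbbaa`. Values that start with a digit, such as `#1a2b3c`, are covered too (compare the later fix for #542). `is_css_hex_color` is a hypothetical helper sketching such a check, not the parser in this commit.

```rust
/// Is `s` a CSS hex color such as `#fff`, `#1a2b3c` or `#ffffff80`?
fn is_css_hex_color(s: &str) -> bool {
    match s.strip_prefix('#') {
        // 3/4 digits are shorthand forms; 6/8 are full forms (with alpha for 4/8)
        Some(hex) => matches!(hex.len(), 3 | 4 | 6 | 8) && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}
```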
Ed Page
9c273c6cfb
Merge pull request #451 from crate-ci/dependabot/cargo/nom-7.1.1
chore(deps): Bump nom from 7.1.0 to 7.1.1
2022-04-01 09:34:31 -05:00
dependabot[bot]
0281c7023e
chore(deps): Bump nom from 7.1.0 to 7.1.1
Bumps [nom](https://github.com/Geal/nom) from 7.1.0 to 7.1.1.
- [Release notes](https://github.com/Geal/nom/releases)
- [Changelog](https://github.com/Geal/nom/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Geal/nom/compare/7.1.0...7.1.1)

---
updated-dependencies:
- dependency-name: nom
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-04-01 07:02:37 +00:00
dependabot[bot]
40080cb01e
chore(deps): Bump once_cell from 1.9.0 to 1.10.0
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-04-01 07:02:26 +00:00
Ed Page
86c54fffbf style: Update clippy 2022-03-29 15:07:19 -05:00
Ed Page
1d16086495 chore: Release 2022-03-09 08:59:49 -06:00
Ed Page
ab61b33572
Merge pull request #443 from crate-ci/dependabot/cargo/unicode-segmentation-1.9.0
chore(deps): Bump unicode-segmentation from 1.8.0 to 1.9.0
2022-03-01 08:30:25 -06:00
dependabot[bot]
a58b735e5e
chore(deps): Bump unicode-segmentation from 1.8.0 to 1.9.0
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation) from 1.8.0 to 1.9.0.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases)
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits)

---
updated-dependencies:
- dependency-name: unicode-segmentation
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-01 07:03:33 +00:00
dependabot[bot]
f3107c4794
chore(deps): Bump clap from 3.0.13 to 3.1.3
Bumps [clap](https://github.com/clap-rs/clap) from 3.0.13 to 3.1.3.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.0.13...v3.1.3)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-01 07:03:22 +00:00
Ed Page
b686760935 chore: Release 2022-02-14 09:05:09 -06:00
Ed Page
c3bb4adfa1 fix(parser): Allow commas in urls
Got us closer to https://www.ietf.org/rfc/rfc3986.txt

Fixes #433
2022-02-14 08:49:55 -06:00
Ed Page
09203fd592 fix(parser): Recognize URLs with passwords 2022-02-14 08:21:56 -06:00
Ed Page
05773fe815 chore: Release 2022-02-08 07:12:19 -06:00
Sebastian Neubauer
fa5a724cec feat(dict): Add more typos 2022-02-08 13:41:44 +01:00
Ed Page
8ddb09eff3 chore: Update dependencies 2022-02-01 10:34:12 -06:00
dependabot[bot]
a3f39efdc8
chore(deps): Bump clap from 3.0.0 to 3.0.13
Bumps [clap](https://github.com/clap-rs/clap) from 3.0.0 to 3.0.13.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/clap_complete-v3.0.0...v3.0.13)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-01 07:02:33 +00:00
Ed Page
5b7fe620ec chore: Release 2022-01-26 14:32:31 -06:00
Ed Page
a39074fc7f fix(parser): Detect shorter base64 values
This is part of the way to #413.  In that case, they aren't providing
padding though.
2022-01-26 14:18:01 -06:00
Ed Page
2c5f2ecedd chore: Release 2022-01-26 10:01:15 -06:00
Ed Page
3c78d65462 fix(parser): Don't stop on almost-printfs
When we added support for printf interpolation, we had to adjust our
separator matching to not eat the start of printf interpolation.

When doing so, I overlooked the need to still eat it in the catch-all.
If we don't, we then try to read `%` as part of the identifier and bail
out early.

Fixes #411
2022-01-26 09:39:23 -06:00
Ed Page
4b2e66487c chore: Release 2022-01-24 20:35:08 -06:00
Ed Page
0c49c3ea2b fix(parser): Allow markdown formatting around ordinals
Fixes #409
2022-01-24 20:01:06 -06:00
Ed Page
f7fd7c0e42 chore: Release 2022-01-21 10:39:27 -06:00
Ed Page
5598b5b3e9 fix(dict): Workes should also correct to workers
Fixes #402
2022-01-21 10:10:56 -06:00
Ed Page
71b53cb23e chore: Release 2021-12-18 17:52:11 -06:00
Ed Page
5c83dec07b style: Remove unused variable 2021-12-14 15:41:52 -06:00
Ed Page
469a9aedc2 chore: Release 2021-12-14 12:58:03 -06:00
Frank Steffahn
2748d6a148
fix(dict): Typo in Typos (#3870 2021-12-14 12:54:48 -06:00
Ed Page
f99eb040de chore: Update dependencies 2021-12-01 08:05:54 -06:00
Ed Page
3b3a944c93 fix: Detect descrepancy
Found this in the clap code base.
2021-11-24 15:09:01 -06:00
Ed Page
c0e8a2c932 chore: Release 2021-11-16 07:46:33 -06:00
Ed Page
8e29e94060 chore: Update cargo-release 2021-11-16 07:44:08 -06:00
Ed Page
3ca0aed0a7
Merge pull request #374 from Flakebi/fix-escape
Fix multiple escape sequences
2021-11-15 08:18:41 -06:00
Neubauer, Sebastian
3fc6089660 fix: Fix multiple escape sequences
If escape sequences follow straight after each other, there is no
delimiter in-between.
In such a case, parsing previously stopped and did not find any
typos further in the file.
2021-11-15 11:31:53 +01:00
Neubauer, Sebastian
76ec666970 feat(dict): Add more corrections
I encountered these when going through a codebase with another tool.
2021-11-12 23:02:08 +01:00
Ed Page
4f17586d08 chore: Update MSRV 2021-11-08 11:56:01 -06:00
Ed Page
a8ae8a5c26 chore: Update boilerplate 2021-11-08 10:11:02 -06:00
Ed Page
153f570ec9 chore: Release 2021-11-03 11:48:12 -05:00
Ed Page
fcac819478 fix: Address false positives
Hard to say how to handle `doen't` since we don't handle contractions.
For now, I've gone ahead and added corrections for that part of the
contraction.  Hopefully that doesn't confuse people.

Part of #362
2021-10-23 08:21:53 -05:00
Ed Page
efae838e5c perf: Remove some function overhead
Unfortunately, almost all of this is for corrections.
2021-09-14 21:09:30 -05:00
Ed Page
3cd24f5cca chore: Release 2021-09-14 10:03:34 -05:00
Ed Page
e20879dae1 fix: Reduce false positives from ordinals
Just ignoring them since our focus is on programmer typos and these
can't be identifiers.  This is simpler and is less work at runtime.

Fixes #331
2021-09-14 08:53:31 -05:00
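A minimal sketch of what ignoring ordinals can look like, assuming a token has already been isolated. `is_ordinal` is a hypothetical helper, not the typos implementation; it only checks for leading ASCII digits followed by `st`/`nd`/`rd`/`th` and, being lenient on purpose, does not check that the suffix agrees with the number.

```rust
/// Returns true if `token` looks like an English ordinal such as "1st" or "42nd".
/// Hypothetical helper: accepts any digit run plus an ordinal suffix, so "1th"
/// also passes, which is fine when the goal is only to skip these tokens.
fn is_ordinal(token: &str) -> bool {
    // Find where the leading ASCII digits end.
    let digits_end = token
        .char_indices()
        .find(|(_, c)| !c.is_ascii_digit())
        .map(|(i, _)| i)
        .unwrap_or(token.len());
    if digits_end == 0 {
        return false; // no leading number
    }
    let suffix = &token[digits_end..];
    matches!(suffix.to_ascii_lowercase().as_str(), "st" | "nd" | "rd" | "th")
}
```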
Ed Page
92e46848a3 chore: Update dependencies 2021-09-01 06:38:52 -05:00
Ed Page
dbea7ab1e0 chore: Release 2021-08-30 09:16:40 -05:00
Ville Skyttä
4fcd7ba16f feat(dict): Suggest surrounded for surrouned too 2021-08-29 21:22:24 +03:00
Nick Mathewson
739d1a2f7c Ignore hexadecimal "hashes" of length 32 or greater.
By experimentation (see ticket), it seems that same-case hexadecimal
strings of 32 characters or longer are almost never intended to hold
text.  By treating such strings as ignored, we can resist a larger
category of false positives.

Closes #326.
2021-08-20 12:34:59 -04:00
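A sketch of the heuristic described above, assuming tokens are already split out. `is_hex_hash` and the `MIN_LEN` name are hypothetical; the 32-character threshold and the same-case rule come from the commit message.

```rust
/// Hypothetical check: a run of at least 32 hex digits that uses a single
/// letter case is treated as a hash and ignored.
fn is_hex_hash(token: &str) -> bool {
    const MIN_LEN: usize = 32;
    if token.len() < MIN_LEN || !token.bytes().all(|b| b.is_ascii_hexdigit()) {
        return false;
    }
    let has_lower = token.bytes().any(|b| b.is_ascii_lowercase());
    let has_upper = token.bytes().any(|b| b.is_ascii_uppercase());
    // Mixed case is more likely to be real text; digits-only counts as same-case.
    !(has_lower && has_upper)
}
```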
Ed Page
613a0cba4b chore: Iterate on release process 2021-08-16 11:23:25 -05:00
mendess
5747aba05d Add instantialed as a typo for instantiated 2021-08-06 14:33:50 +01:00
Ed Page
2dce866937 chore: Release 2021-08-02 09:55:25 -05:00
Ed Page
a5f0dd8ee9 fix(token): Continue parsing on c-escape 2021-08-02 09:29:10 -05:00
Ed Page
3e5d2e0620
Merge pull request #324 from epage/escape
fix(token): Continue parsing on c-escape
2021-08-02 09:23:42 -05:00
Ed Page
fdeba0e71b fix(token): Continue parsing on c-escape 2021-08-02 09:11:54 -05:00
dependabot[bot]
febcee3332
chore(deps): Bump env_logger from 0.8.4 to 0.9.0
Bumps [env_logger](https://github.com/env-logger-rs/env_logger) from 0.8.4 to 0.9.0.
- [Release notes](https://github.com/env-logger-rs/env_logger/releases)
- [Changelog](https://github.com/env-logger-rs/env_logger/blob/main/CHANGELOG.md)
- [Commits](https://github.com/env-logger-rs/env_logger/compare/v0.8.4...v0.9.0)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-01 07:05:08 +00:00
Ed Page
2304fc6735 chore: Release 2021-07-30 12:12:07 -05:00
Ed Page
9a8d41fcb2 chore: Release 2021-07-30 12:09:59 -05:00
Ed Page
2202b7f661 fix(parser): Handle c-escape/printf
Since our goal is 100% confidence in the results, it's better to not
check words than to correct the wrong words.

With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).

Fixes #3
2021-07-30 11:30:05 -05:00
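A simplified sketch of the rule, assuming plain ASCII-letter words. `checkable_words` is hypothetical and only recognizes a single escape or conversion letter directly after `\` or `%`; printf flags and widths (`%5d`) are not handled. Because it keeps scanning after each skipped run, back-to-back escapes such as `\n\tfoo` do not stop parsing.

```rust
/// Splits `text` into ASCII-alphabetic words, skipping any word glued directly
/// to a C-style escape (`\nfoo`) or a printf conversion (`%dfoo`).
/// Hypothetical sketch: the escape letter is simply the first letter of the run.
fn checkable_words(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut words = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_alphabetic() {
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_alphabetic() {
                i += 1;
            }
            // A run that starts right after `\` or `%` might be an escape or
            // substitution, so we don't check it rather than risk a bad correction.
            let glued = start > 0 && matches!(bytes[start - 1], b'\\' | b'%');
            if !glued {
                words.push(&text[start..i]);
            }
        } else {
            i += 1; // separator (or non-ASCII byte): keep scanning
        }
    }
    words
}
```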
Ed Page
3049852bfd fix(dict): Avoid contraction false positive
Fixes #317
2021-07-30 10:42:57 -05:00
Ed Page
f60e798a2a chore: Release 2021-07-27 15:31:01 -05:00
Ed Page
3486c23bdb chore: Release 2021-07-27 15:29:18 -05:00
Ed Page
49459cede7 feat(dict): Add more corrections 2021-07-27 14:53:13 -05:00
Ed Page
6037eebfdc style: Clippy 2021-07-27 14:28:16 -05:00
Ed Page
70fbd63b00 fix: Update dictionary 2021-07-27 14:21:00 -05:00
Ed Page
960471ae23 fix: Prevent old typos from coming back 2021-07-27 14:16:13 -05:00
Ed Page
4e99217896 test: Ensure words are stored lowercase 2021-07-27 14:16:12 -05:00
Ed Page
0008713395 test: Ensure words.csv stays sorted 2021-07-27 14:16:12 -05:00
Ed Page
41048d15b3 test: Prevent correcting corrections 2021-07-27 13:58:57 -05:00
Ed Page
fc4ec0e4a1 fix: Correcting to typos 2021-07-27 13:58:57 -05:00
Ed Page
5b29113ec8 refactor(typos): Remove unused calculations
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
Ed Page
7a2a5042a1 refactor(dict): Remove useless entries 2021-07-02 10:24:59 -05:00
Ed Page
4c2f2c434a feat(dict): Shared PHF support 2021-07-01 11:14:30 -05:00
Ed Page
3b43272724 refactor(dict): Separate dictgen concerns 2021-07-01 11:00:33 -05:00
Ed Page
c8d1058a71 refactor(dict): Change typos-dict to trie
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
bbbf985777 perf(dict): Switch varcon to a burst-trie
This cuts varcon lookup times in half but I still suspect it is slower than
phf.  Like with bsearch and unlike, the cost is consistent between hits
and misses.

At least this doesn't have the compile hit of PHF + unicase.  Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb refactor(dict): Be more cache conscious 2021-06-30 19:56:03 -05:00
Ed Page
f176055834 refactor(dict): Make room for trie logic 2021-06-30 19:56:03 -05:00
Ed Page
a1e95bc7c0 refactor(dict): Pull out table-lookup logic
Before, we only guaranteed that some dicts were pre-sorted.  Now, all are
for-sure pre-sorted.

This also gives each dict the size-check to avoid lookup.

But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
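A sketch of a sorted-table lookup with the size check in front, assuming a hypothetical `Table` type holding `(typo, correction)` pairs sorted by typo. It illustrates the idea and is not the generated dictgen code.

```rust
use std::ops::RangeInclusive;

/// Hypothetical pre-sorted dictionary table.
struct Table {
    /// (typo, correction) pairs, sorted by typo so binary search is valid.
    entries: &'static [(&'static str, &'static str)],
    /// Shortest..=longest typo length present in `entries`.
    len_range: RangeInclusive<usize>,
}

impl Table {
    fn find(&self, word: &str) -> Option<&'static str> {
        // Cheap reject: no entry has this length, so skip the search entirely.
        if !self.len_range.contains(&word.len()) {
            return None;
        }
        self.entries
            .binary_search_by(|(key, _)| key.cmp(&word))
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```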
Ed Page
bfa7888f82 chore: Skip more releases 2021-06-29 15:39:28 -05:00
Ed Page
9149c4765d chore: Release 2021-06-29 15:05:18 -05:00
Ed Page
c83f655109 feat(parser): Ignore URLs
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146 fix(parser): Ensure we get full base64
We greedily matched separators, including ones that might be part of
base64.  This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b feat(parser): Ignore emails
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).

This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
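A sketch of a "good enough" email check in the same spirit, assuming whitespace-delimited tokens. `looks_like_email` is hypothetical and only requires a non-empty local part, one `@`, and a dotted domain. As the commit notes, an expression like `a@b.c` would still match.

```rust
/// Hypothetical "good enough" email detector; deliberately skips full
/// RFC 5322 validation.
fn looks_like_email(token: &str) -> bool {
    let Some((local, domain)) = token.split_once('@') else {
        return false;
    };
    let local_ok = !local.is_empty()
        && local
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b"._%+-".contains(&b));
    // Requires at least one dot, not at either end; a second `@` fails here.
    let domain_ok = domain.contains('.')
        && !domain.starts_with('.')
        && !domain.ends_with('.')
        && domain
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'.' || b == b'-');
    local_ok && domain_ok
}
```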
Ed Page
2a1e6ca0f6 feat(parser): Ignore base64
For now, we hardcoded a min length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyways).

Fixes #287
2021-06-29 13:25:10 -05:00
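A sketch of the base64 rule, assuming the token has already been isolated. `looks_like_base64` is hypothetical; the 90-byte minimum is the value from this commit (a later commit detects shorter values), and the padding check (at most two `=`) is an added assumption.

```rust
/// Hypothetical base64 heuristic: only base64 alphabet characters, optional
/// `=` padding, and long enough that short math like `a+b/c` is not skipped.
fn looks_like_base64(token: &str) -> bool {
    const MIN_LEN: usize = 90;
    let body = token.trim_end_matches('=');
    let padding = token.len() - body.len();
    token.len() >= MIN_LEN
        && padding <= 2
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}
```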
Ed Page
23b6ad5796 feat(parser): Ignore SHA-1+
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b fix(parser): Go ahead and do lower UUIDs
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1 feat(parser): Ignore UUIDs
We might be able to make this bail out earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase.  RFC
4122 says that UUIDs must be generated lowercase, while input accepts
any case.  The main issues are risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
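A sketch of UUID detection in the 8-4-4-4-12 layout, assuming a standalone token. `is_uuid` is hypothetical and, in line with the reasoning above, accepts hex digits of either case.

```rust
/// Hypothetical UUID check: five `-`-separated groups of hex digits with
/// lengths 8, 4, 4, 4 and 12. Case is not checked.
fn is_uuid(token: &str) -> bool {
    const GROUP_LENS: [usize; 5] = [8, 4, 4, 4, 12];
    let groups: Vec<&str> = token.split('-').collect();
    groups.len() == GROUP_LENS.len()
        && groups
            .iter()
            .zip(GROUP_LENS.iter())
            .all(|(g, &len)| g.len() == len && g.bytes().all(|b| b.is_ascii_hexdigit()))
}
```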
Ed Page
32f5e6c682 refactor(typos)!: Bake ignores into parser
This is prep for other items to be ignored

BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens.  Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
ded90f2387 perf(parser): Auto-detect unicode
For smaller, ascii-only content, this seems to be taking ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
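A sketch of the auto-detection idea, assuming a hypothetical `count_identifiers` front-end. `str::is_ascii` chooses the ASCII-only path; the fallback uses `char::is_alphanumeric` as a stand-in for the Unicode XID rules the parser actually defers to.

```rust
/// Counts identifier-like runs in `text`, taking an ASCII-only fast path when
/// the whole input is ASCII. Hypothetical helper for illustration.
fn count_identifiers(text: &str) -> usize {
    if text.is_ascii() {
        // Fast path: cheap ASCII-only character classification.
        text.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .filter(|word| !word.is_empty())
            .count()
    } else {
        // Slow path: Unicode-aware classification (stand-in for XID rules).
        text.split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|word| !word.is_empty())
            .count()
    }
}
```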
Ed Page
95417f3a41 refactor(parser): Consolidate utf8/ascii logic 2021-06-29 05:10:02 -05:00
Ed Page
83b2804623 fix(ci): Don't fail codegen checks 2021-06-28 14:06:47 -05:00
Ed Page
4066d21790 style: Address clippy 2021-06-28 13:51:06 -05:00
Ed Page
3a4d039c4f chore: Reduce code-gen memory usage
More `const fn` removals to reduce compilation memory use
2021-06-07 08:58:34 -05:00
Ed Page
04f5d40e57 chore: Release 2021-06-05 14:39:37 -05:00
Ed Page
2b1f565eaa refactor(varcon): Remove reliance on const-fn
This dropped RSS (memory usage) from 4GB to 1.5GB when compiling.

The extra `match` could impact performance but not too concerned since
the default is to not look within vars.
2021-06-04 15:01:08 -05:00
Ed Page
b1cf03c7eb refactor(varcon): Move away from PHF
This is mostly to give implementation flexibility for changing out how
we store the data to reduce compilation memory usage.

This does have performance impact, jumping from ~220ns to ~320ns for a
dict lookup, according to our micro benchmarks.
2021-06-04 14:59:46 -05:00
Ed Page
1cb9b37120 chore: Update codespell dict
Based on 2ed354c at https://github.com/codespell-project/codespell
2021-05-22 21:44:56 -05:00
Ed Page
3e66a99674 chore: Release 2021-05-21 20:41:02 -05:00
Ed Page
3995745362 chore: Release 2021-05-21 20:39:12 -05:00
Ed Page
b99f32dea8 perf(dict): Bypass vars when possible
Variant support slows us down by 10-50%.  I assume most people will run
with `en` and so most of this overhead is wasted.  So instead of
merging vars with dict, let's instead get a quick win by just skipping
vars when we don't need to.  If the assumptions behind this change over
time or if there is need for speeding up a specific locale, we can
re-address this.

Before:
```
check_file/Typos/code   time:   [35.860 us 36.021 us 36.187 us]
                        thrpt:  [8.0117 MiB/s 8.0486 MiB/s 8.0846 MiB/s]
check_file/Typos/corpus time:   [26.966 ms 27.215 ms 27.521 ms]
                        thrpt:  [21.127 MiB/s 21.365 MiB/s 21.562 MiB/s]
```
After:
```
check_file/Typos/code   time:   [33.837 us 33.928 us 34.031 us]
                        thrpt:  [8.5191 MiB/s 8.5452 MiB/s 8.5680 MiB/s]
check_file/Typos/corpus time:   [17.521 ms 17.620 ms 17.730 ms]
                        thrpt:  [32.794 MiB/s 32.999 MiB/s 33.184 MiB/s]
```

This puts us in line with `--no-default-features --features dict`

Fixes #253
2021-05-19 13:55:41 -05:00
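A sketch of the bypass, using hypothetical `Locale`, `base_dict`, and `variant_dict` stand-ins with made-up entries. The point is the early return: with plain `en`, every dialect's spelling is accepted, so the variant lookup never runs.

```rust
/// Hypothetical locales for this sketch only.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Locale {
    En, // accept every English dialect
    EnUs,
    EnGb,
}

/// Stand-in for the main typo dictionary.
fn base_dict(word: &str) -> Option<&'static str> {
    match word {
        "teh" => Some("the"),
        _ => None,
    }
}

/// Stand-in for the slower dialect/variant dictionary.
fn variant_dict(word: &str, locale: Locale) -> Option<&'static str> {
    match (locale, word) {
        (Locale::EnUs, "colour") => Some("color"),
        (Locale::EnGb, "color") => Some("colour"),
        _ => None,
    }
}

fn check_word(word: &str, locale: Locale) -> Option<&'static str> {
    if let Some(correction) = base_dict(word) {
        return Some(correction);
    }
    // With plain `en`, no dialect spelling is wrong, so skip the variant lookup.
    if locale == Locale::En {
        return None;
    }
    variant_dict(word, locale)
}
```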
Ed Page
639e65b88a fix(dict): Handle cases from Linux
These were found while running `typos` on Linux and inspecting a
sampling of the results.  #249 represents additional changes to make.
There were some identifiers that looked like hardware registers; I'm
unsure what can be done about them.
2021-05-18 12:02:03 -05:00
Ed Page
fb0dac4297 refactor(dict): Allow 0..n corrections in BuiltIn
The main use case is taking `ther` -> `there` and adding `the` and
`their`.
2021-05-18 12:02:03 -05:00
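A sketch of a lookup that returns 0..n corrections, using a hypothetical `corrections` function with stand-in entries. `None` means the word is valid; `Some` carries every suggestion, so `ther` can map to `there`, `the`, and `their`. An empty `Vec` would represent a known typo with no suggestion.

```rust
/// Hypothetical lookup: `None` = valid word, `Some(v)` = typo with `v.len()`
/// suggested corrections (possibly zero).
fn corrections(word: &str) -> Option<Vec<&'static str>> {
    match word {
        "ther" => Some(vec!["there", "the", "their"]),
        "teh" => Some(vec!["the"]),
        _ => None,
    }
}
```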
Ed Page
77cfccb392 refactor(varcon): Clarify check's meanings 2021-05-15 19:29:27 -05:00
Ed Page
b830872ad0 chore: Update enumflags2 2021-05-13 10:20:15 -05:00
Ed Page
7c803681c4 chore: Release 2021-05-13 09:58:09 -05:00
Ed Page
3b9061dece
Merge pull request #240 from crate-ci/dependabot/cargo/codegenrs-1.0.0
chore(deps): Bump codegenrs from 0.1.5 to 1.0.0
2021-05-01 09:04:51 -05:00
dependabot[bot]
d72fa7acba
chore(deps): Bump codegenrs from 0.1.5 to 1.0.0
Bumps [codegenrs](https://github.com/crate-ci/codegenrs) from 0.1.5 to 1.0.0.
- [Release notes](https://github.com/crate-ci/codegenrs/releases)
- [Changelog](https://github.com/crate-ci/codegenrs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/codegenrs/compare/v0.1.5...v1.0.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-01 07:01:59 +00:00
Ed Page
6216fa0837 fix(dict)!: Clarify word sizes with Ranges
The code was generated with separate min / max, rather than using a
Range and ensuring the API is used correctly.
2021-04-30 21:33:33 -05:00
Ed Page
f40ed5a328 style: Address clippy 2021-04-30 11:37:16 -05:00
Ed Page
517da7ecd2 perf(parser): Allow people to bypass unicode cost 2021-04-29 21:07:59 -05:00
Ed Page
09d2124d0f perf(parser): Limit inner-loop asserts 2021-04-29 18:31:05 -05:00
Ed Page
287c4cbfe9 refactor(parser): Give more impl flexibility 2021-04-29 18:31:05 -05:00
Ed Page
9cbc7410a4 fix(parser)!: Defer to Unicode XID for identifiers
This saves us from having to have configuration for every detail.  If
people need more control, we can offer it later.

Fixes #225
2021-04-29 18:30:57 -05:00
Ed Page
f15cc58f71 fix(parser): Flip leading digits to work correctly 2021-04-29 18:30:14 -05:00
Ed Page
4b94352b7a perf(parser): Try hand-rolled number parsing 2021-04-29 18:30:14 -05:00
Ed Page
6b92e345cc perf(parser): Speed up UTF-8 validation 2021-04-27 21:17:46 -05:00
Ed Page
819702c82f refactor(parser): Unify str/bytes code paths
The main goal is to support replacing the parser with `nom` where I need
access to `str` only functionality.

With crates like simdutf8, this might also offer up performance gains
since they see the biggest benefit when doing large blocks of
validation.
2021-04-27 21:17:43 -05:00
Ed Page
fce11d6c35 refactor(parser)!: Allow short-circuiting word splitting
This is prep for experiments with getting this information ahead of
time.

See #224
2021-04-27 21:17:38 -05:00
Ed Page
9bfb506c6d fix(typos)!: Clarify Case::Uppers name
`Scream` was referring to `SCREAMING_CASE` but outside of that context, I
think `Upper` is more accurate.
2021-04-21 20:36:35 -05:00
Ed Page
1f4c587692 chore({{crate_name}}): Release {{version}} 2021-04-14 19:13:25 -05:00
Ed Page
b4459bef33 chore: Fix readme paths in Cargo.toml 2021-04-13 21:36:47 -05:00
Ed Page
d7978658d4 test(cli): Ensure we apply corrections 2021-04-10 19:13:48 -05:00
Ed Page
b5f606f201 refactor(typos): Simplify the top-level API 2021-03-01 11:50:23 -06:00
Ed Page
1010d2ffe5 refactor(tokenizer): Remove stale function 2021-03-01 11:50:23 -06:00
dependabot-preview[bot]
b8d3190ce9
chore(deps): bump itertools from 0.9.0 to 0.10.0
Bumps [itertools](https://github.com/bluss/rust-itertools) from 0.9.0 to 0.10.0.
- [Release notes](https://github.com/bluss/rust-itertools/releases)
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bluss/rust-itertools/compare/v0.9.0...v0.10.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2021-01-03 03:40:45 +00:00
Ed Page
67222e9338 style: Address clippy 2021-01-02 13:49:28 -06:00
Ed Page
692f0ac095 refactor(typos): Focus API on primary use case 2021-01-02 13:10:40 -06:00
Ed Page
aba85df435 docs(typos): Clarify intent 2021-01-02 13:10:40 -06:00
Ed Page
48112a47e9 refactor(parser): Abstract over lifetimes 2021-01-02 13:10:30 -06:00
Ed Page
bc90bacff2 refactor(typos): Pull out file logic 2021-01-02 13:10:30 -06:00
Ed Page
e741f96de3 refactor(typos): Decouple parsing from checks 2021-01-02 13:10:22 -06:00
Ed Page
1e64080c05 refactor(typos): Open up the name Parser 2021-01-02 12:58:33 -06:00
Ed Page
7fdd0dee16 style(typos): Make parser ordering clearer 2021-01-02 12:58:33 -06:00
Ed Page
d5a781fd25
Merge pull request #179 from crate-ci/dependabot/cargo/unicode-segmentation-1.7.1
chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
2020-12-01 08:24:59 -06:00
dependabot-preview[bot]
5640d23b95
chore(deps): bump csv from 1.1.4 to 1.1.5
Bumps [csv](https://github.com/BurntSushi/rust-csv) from 1.1.4 to 1.1.5.
- [Release notes](https://github.com/BurntSushi/rust-csv/releases)
- [Commits](https://github.com/BurntSushi/rust-csv/compare/1.1.4...1.1.5)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:19:21 +00:00
dependabot-preview[bot]
7fa5a9eadf
chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases)
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:18:35 +00:00
Ed Page
d96de581f3 fix(report): Rendering issues with errors
- We aren't consistent in quoting words
- We used byte offsets rather than column counts
- We mixed styles between disallowed and corrections

Fixes #165
2020-11-24 18:52:24 -06:00
Ed Page
9b0cd5b5f0 fix(report): Show path for errors 2020-11-23 11:20:12 -06:00
Ed Page
869b916ca6 fix: Handle broken pipe 2020-11-21 21:57:12 -06:00
Ed Page
7a1fac7fab refactor(report): Use native types 2020-11-11 18:44:27 -06:00
Ed Page
6bdbd821e3 perf(dict): Avoid hashing unknown words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change:
```
real    0m24.432s
user    0m32.492s
sys     0m4.190s
```
2020-11-10 20:57:04 -06:00
Ed Page
beaa0f4091 perf(dict): Avoid hashing unknown words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change:
```
real    0m24.060s
user    0m31.559s
sys     0m4.258s
```
2020-11-10 20:57:00 -06:00
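A sketch of skipping the hash when a word's length rules it out, assuming a hypothetical `LenGuardedDict` wrapper around `std::collections::HashMap`. The length bounds are computed once from the keys.

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;

/// Hypothetical wrapper that remembers the shortest and longest key so words
/// outside that range are rejected without computing a hash.
struct LenGuardedDict {
    map: HashMap<String, String>,
    len_range: RangeInclusive<usize>,
}

impl LenGuardedDict {
    fn new(pairs: &[(&str, &str)]) -> Self {
        let map: HashMap<String, String> = pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        let min = map.keys().map(String::len).min().unwrap_or(0);
        let max = map.keys().map(String::len).max().unwrap_or(0);
        Self { map, len_range: min..=max }
    }

    fn get(&self, word: &str) -> Option<&str> {
        // `str::len` is O(1); hashing the word is not, so check length first.
        if !self.len_range.contains(&word.len()) {
            return None;
        }
        self.map.get(word).map(String::as_str)
    }
}
```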
Ed Page
e4edbc5f7e chore: Update dependencies 2020-11-10 19:47:13 -06:00
Ed Page
b7700fa214 refactor: Don't special case --files 2020-11-10 06:30:27 -06:00
Ed Page
628c011f77 fix(report): Ensure json output is clean 2020-11-10 06:30:27 -06:00
Ed Page
e12cd8ed55 refactor: Layer files/filenames on buffer processing 2020-11-10 06:30:27 -06:00
Ed Page
eb20ba9f11 refactor(report): Make Parse consistent with Typos 2020-11-10 06:30:27 -06:00
Ed Page
97f90da9bc refactor: Move off of lazy_static 2020-11-10 06:30:27 -06:00
Ed Page
3bcd8a130e refactor(report): Merge the typos types 2020-11-10 06:30:23 -06:00
Ed Page
fe282a0aea refactor: Pull out common policy 2020-11-07 20:04:58 -06:00
Ed Page
736db10708 fix(format): Clarify message types 2020-10-28 21:01:33 -05:00
Ed Page
527b9837b4 feat: Custom dictionary support
Switching `valid-*` to just `*` where you map typo to correction, with
support for always-valid and never-valid.

Fixes #9
2020-10-27 21:15:25 -05:00
dependabot-preview[bot]
84e56b22b5
chore(deps): bump derive_more from 0.99.9 to 0.99.11
Bumps [derive_more](https://github.com/JelteF/derive_more) from 0.99.9 to 0.99.11.
- [Release notes](https://github.com/JelteF/derive_more/releases)
- [Changelog](https://github.com/JelteF/derive_more/blob/master/CHANGELOG.md)
- [Commits](https://github.com/JelteF/derive_more/compare/v0.99.9...v0.99.11)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-10-01 08:11:46 +00:00
Ed Page
a63dfa0f8c perf: Faster binary-file detection
This switches us from a homegrown implementation to `content_inspector`:
- Adds some optimizations by looking for the BOM.
- It uses the same algorithm we used for finding null bytes
- `content_inspector` caps how much of the buffer is searched, though

Besides performance, `content_inspector` also has some known-binary
magic numbers to avoid bad detections.

Fixes #34
2020-08-21 16:29:11 -05:00
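A rough sketch of the kinds of checks listed above, assuming an in-memory buffer. `is_probably_binary` and the 1024-byte cap are hypothetical; this is not `content_inspector`'s implementation or API, and it omits the magic-number table.

```rust
/// Hypothetical binary check: a UTF-8 or UTF-16 BOM means text; otherwise a
/// NUL byte within the first `LIMIT` bytes means binary.
fn is_probably_binary(buf: &[u8]) -> bool {
    const LIMIT: usize = 1024; // assumed cap on how much of the buffer is searched
    const BOMS: [&[u8]; 3] = [b"\xEF\xBB\xBF", b"\xFF\xFE", b"\xFE\xFF"];
    // UTF-16 text contains NUL bytes, so the BOM must be checked first.
    if BOMS.iter().any(|bom| buf.starts_with(bom)) {
        return false;
    }
    buf.iter().take(LIMIT).any(|&b| b == 0)
}
```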
Ed Page
ab4a5bbdaf feat: Support english dialects
The goal is to be as accepting and unobtrusive to new code bases as
possible.  To this end, we correct typos into the closest English
dialect.

If someone wants to opt in, they can have typos correct to a specific
English dialect.

Fixes #52
Fixes #22
2020-08-20 19:37:37 -05:00
Ed Page
294c25c67a fix(dict): Missing a correction 2020-08-15 21:03:00 -05:00
Ed Page
5d7e91d214 fix(ci): Report more failures 2020-07-04 20:52:48 -05:00
Ed Page
bc1302f01b feat: Support multiple, valid corrections
Some of the other spell checkers already do this. While I've not checked
where we might need it for our dictionary, this will be important for
dialects.
2020-07-04 20:52:48 -05:00
Ed Page
a5ed18ee46 fix(replace): Don't error on successful replacement 2020-07-04 20:52:47 -05:00
dependabot-preview[bot]
354fec1aa1 chore(deps): bump nom from 5.1.1 to 5.1.2
Bumps [nom](https://github.com/Geal/nom) from 5.1.1 to 5.1.2.
- [Release notes](https://github.com/Geal/nom/releases)
- [Changelog](https://github.com/Geal/nom/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Geal/nom/compare/5.1.1...5.1.2)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-04 20:52:47 -05:00
dependabot-preview[bot]
146998f331 chore(deps): bump derive_more from 0.99.7 to 0.99.9
Bumps [derive_more](https://github.com/JelteF/derive_more) from 0.99.7 to 0.99.9.
- [Release notes](https://github.com/JelteF/derive_more/releases)
- [Changelog](https://github.com/JelteF/derive_more/blob/master/CHANGELOG.md)
- [Commits](https://github.com/JelteF/derive_more/compare/v0.99.7...v0.99.9)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-04 20:52:47 -05:00