Neubauer, Sebastian
76ec666970
feat(dict): Add more corrections
I encountered these when going through a codebase with another tool.
2021-11-12 23:02:08 +01:00
Ed Page
4f17586d08
chore: Update MSRV
2021-11-08 11:56:01 -06:00
Ed Page
a8ae8a5c26
chore: Update boilerplate
2021-11-08 10:11:02 -06:00
Ed Page
153f570ec9
chore: Release
2021-11-03 11:48:12 -05:00
Ed Page
fcac819478
fix: Address false positives
Hard to say how to handle `doen't` since we don't handle contractions.
For now, I've gone ahead and added corrections for that part of the
contraction. Hopefully that doesn't confuse people.
Part of #362
2021-10-23 08:21:53 -05:00
Ed Page
efae838e5c
perf: Remove some function overhead
Unfortunately, almost all of this is for corrections.
2021-09-14 21:09:30 -05:00
Ed Page
3cd24f5cca
chore: Release
2021-09-14 10:03:34 -05:00
Ed Page
e20879dae1
fix: Reduce false positives from ordinals
Just ignoring them since our focus is on programmer typos and these
can't be identifiers. This is simpler and is less work at runtime.
Fixes #331
2021-09-14 08:53:31 -05:00
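A minimal sketch of the kind of check described above, assuming an ordinal is one or more ASCII digits followed by `st`, `nd`, `rd`, or `th` (compared case-insensitively). `is_ordinal` is a hypothetical name and the rule is an assumption, not the crate's actual code.

```rust
/// Hypothetical helper: true for tokens like "1st", "22nd", "103rd", "4th".
/// Assumption: ASCII digits followed by an English ordinal suffix. The
/// digit/suffix pairing (e.g. "1th") is deliberately not validated, since the
/// only goal is to skip tokens that cannot be identifiers.
fn is_ordinal(token: &str) -> bool {
    // Byte index of the first non-digit character (or the end of the token).
    let digits_end = token
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(token.len());
    if digits_end == 0 {
        return false; // must start with at least one digit
    }
    let suffix = &token[digits_end..];
    ["st", "nd", "rd", "th"]
        .iter()
        .any(|s| suffix.eq_ignore_ascii_case(s))
}
```

A plain digit run like `1` is rejected because its suffix is empty, and `1stly` is rejected because the suffix must match exactly.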
Ed Page
92e46848a3
chore: Update dependencies
2021-09-01 06:38:52 -05:00
Ed Page
dbea7ab1e0
chore: Release
2021-08-30 09:16:40 -05:00
Ville Skyttä
4fcd7ba16f
feat(dict): Suggest `surrounded` for `surrouned` too
2021-08-29 21:22:24 +03:00
Nick Mathewson
739d1a2f7c
Ignore hexadecimal "hashes" of length 32 or greater.
By experimentation (see ticket), it seems that same-case hexadecimal
strings of 32 characters or longer are almost never intended to hold
text. By treating such strings as ignored, we can resist a larger
category of false positives.
Closes #326.
2021-08-20 12:34:59 -04:00
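The rule above (same-case hex, at least 32 characters) can be sketched as a single predicate. The 32-character threshold comes from the commit message; `is_hex_hash` and its structure are illustrative assumptions, not the project's implementation.

```rust
/// Hypothetical helper: treat a token as a hash (and ignore it) when it is at
/// least 32 hex digits long and its letters are all lowercase or all uppercase.
fn is_hex_hash(token: &str) -> bool {
    const MIN_LEN: usize = 32; // threshold taken from the commit message
    if token.len() < MIN_LEN || !token.bytes().all(|b| b.is_ascii_hexdigit()) {
        return false;
    }
    // Digits have no case, so only the letters a-f / A-F decide "same case".
    let has_lower = token.bytes().any(|b| b.is_ascii_lowercase());
    let has_upper = token.bytes().any(|b| b.is_ascii_uppercase());
    !(has_lower && has_upper)
}
```

Requiring a single case is what keeps mixed-case identifiers that happen to use only `a`-`f` letters from being swallowed.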
Ed Page
613a0cba4b
chore: Iterate on release process
2021-08-16 11:23:25 -05:00
mendess
5747aba05d
Add instantialed as a typo for instantiated
2021-08-06 14:33:50 +01:00
Ed Page
2dce866937
chore: Release
2021-08-02 09:55:25 -05:00
Ed Page
a5f0dd8ee9
fix(token): Continue parsing on c-escape
2021-08-02 09:29:10 -05:00
Ed Page
3e5d2e0620
Merge pull request #324 from epage/escape
fix(token): Continue parsing on c-escape
2021-08-02 09:23:42 -05:00
Ed Page
fdeba0e71b
fix(token): Continue parsing on c-escape
2021-08-02 09:11:54 -05:00
dependabot[bot]
febcee3332
chore(deps): Bump env_logger from 0.8.4 to 0.9.0
Bumps [env_logger](https://github.com/env-logger-rs/env_logger) from 0.8.4 to 0.9.0.
- [Release notes](https://github.com/env-logger-rs/env_logger/releases)
- [Changelog](https://github.com/env-logger-rs/env_logger/blob/main/CHANGELOG.md)
- [Commits](https://github.com/env-logger-rs/env_logger/compare/v0.8.4...v0.9.0)
---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-08-01 07:05:08 +00:00
Ed Page
2304fc6735
chore: Release
2021-07-30 12:12:07 -05:00
Ed Page
9a8d41fcb2
chore: Release
2021-07-30 12:09:59 -05:00
Ed Page
2202b7f661
fix(parser): Handle c-escape/printf
Since our goal is 100% confidence in the results, it's better to not
check words than to correct the wrong words.
With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).
Fixes #3
2021-07-30 11:30:05 -05:00
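A rough sketch of "ignore words after what might be a c-escape or printf substitution", assuming a toy tokenizer that splits on runs of ASCII alphanumerics and underscores. `checkable_words` is hypothetical and far simpler than the real parser; it only shows the skip rule.

```rust
/// Hypothetical tokenizer: split `text` into runs of ASCII alphanumerics and
/// underscores, dropping any run that starts right after `\` (possible C
/// escape such as `\nfoo`) or `%` (possible printf substitution such as `%dfoo`).
fn checkable_words(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    let mut words = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if !is_word(bytes[i]) {
            i += 1;
            continue;
        }
        let start = i;
        while i < bytes.len() && is_word(bytes[i]) {
            i += 1;
        }
        // Skip the whole word if it may be glued to an escape or format specifier.
        let prev = if start > 0 { Some(bytes[start - 1]) } else { None };
        if !matches!(prev, Some(b'\\' | b'%')) {
            words.push(&text[start..i]);
        }
    }
    words
}
```

Skipping all of `nbar` (rather than trying to peel off the `n`) errs on the side of not checking, in line with the "100% confidence" goal above.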
Ed Page
3049852bfd
fix(dict): Avoid contraction false positive
Fixes #317
2021-07-30 10:42:57 -05:00
Ed Page
f60e798a2a
chore: Release
2021-07-27 15:31:01 -05:00
Ed Page
3486c23bdb
chore: Release
2021-07-27 15:29:18 -05:00
Ed Page
49459cede7
feat(dict): Add more corrections
2021-07-27 14:53:13 -05:00
Ed Page
6037eebfdc
style: Clippy
2021-07-27 14:28:16 -05:00
Ed Page
70fbd63b00
fix: Update dictionary
2021-07-27 14:21:00 -05:00
Ed Page
960471ae23
fix: Prevent old typos from coming back
2021-07-27 14:16:13 -05:00
Ed Page
4e99217896
test: Ensure words are stored lowercase
2021-07-27 14:16:12 -05:00
Ed Page
0008713395
test: Ensure words.csv stays sorted
2021-07-27 14:16:12 -05:00
Ed Page
41048d15b3
test: Prevent correcting corrections
2021-07-27 13:58:57 -05:00
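The three test commits above (words stored lowercase, `words.csv` kept sorted, no correcting of corrections) describe dictionary invariants. The sketch below checks them over in-memory rows, assuming one `(typo, correction)` pair per row; `check_dictionary` and the row format are hypothetical, not the project's test code.

```rust
use std::collections::HashSet;

/// Hypothetical invariant checks over `(typo, correction)` rows.
fn check_dictionary(rows: &[(&str, &str)]) -> Result<(), String> {
    // 1. Rows are strictly sorted by typo.
    for pair in rows.windows(2) {
        if pair[0].0 >= pair[1].0 {
            return Err(format!("not sorted: {:?} >= {:?}", pair[0].0, pair[1].0));
        }
    }
    // 2. Typos are stored lowercase.
    for (typo, _) in rows {
        if typo.to_lowercase() != *typo {
            return Err(format!("not lowercase: {:?}", typo));
        }
    }
    // 3. No correction is itself listed as a typo.
    let typos: HashSet<&str> = rows.iter().map(|(typo, _)| *typo).collect();
    for (typo, correction) in rows {
        if typos.contains(correction) {
            return Err(format!("{:?} corrects to typo {:?}", typo, correction));
        }
    }
    Ok(())
}
```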
Ed Page
fc4ec0e4a1
fix: Correcting to typos
2021-07-27 13:58:57 -05:00
Ed Page
5b29113ec8
refactor(typos): Remove unused calculations
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
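For context on the refactor above: once filtering happens elsewhere, a `filter_map` whose closure always returns `Some` does the same work as a plain `map`, plus an `Option` wrap and unwrap per item. The functions below are a hypothetical illustration, not the crate's code.

```rust
/// Before: the closure always returns `Some`, so nothing is ever filtered out.
fn lengths_before(words: &[&str]) -> Vec<usize> {
    words.iter().filter_map(|w| Some(w.len())).collect()
}

/// After: a plain `map` expresses the same transformation directly.
fn lengths_after(words: &[&str]) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}
```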
Ed Page
7a2a5042a1
refactor(dict): Remove useless entries
2021-07-02 10:24:59 -05:00
Ed Page
4c2f2c434a
feat(dict): Shared PHF support
2021-07-01 11:14:30 -05:00
Ed Page
3b43272724
refactor(dict): Separate dictgen concerns
2021-07-01 11:00:33 -05:00
Ed Page
c8d1058a71
refactor(dict): Change typos-dict to trie
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
bbbf985777
perf(dict): Switch varcon to a burst-trie
This cuts varcon lookup times in half, but I still suspect it's slower than
PHF. Like with bsearch and unlike, the cost is consistent between hits
and misses.
At least this doesn't have the compile hit of PHF + unicase. Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb
refactor(dict): Be more cache conscious
2021-06-30 19:56:03 -05:00
Ed Page
f176055834
refactor(dict): Make room for trie logic
2021-06-30 19:56:03 -05:00
Ed Page
a1e95bc7c0
refactor(dict): Pull out table-lookup logic
Before, we only guaranteed that some dicts were pre-sorted. Now, all of
them are guaranteed to be pre-sorted.
This also gives each dict the size check to avoid a lookup.
But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
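One way to picture a pre-sorted table with a size check: reject words whose length falls outside the shortest and longest key before doing any search, then binary search the sorted keys. `Table`, its fields, and the exact-match (case-sensitive) comparison are assumptions for illustration; the real dictionaries are generated code.

```rust
/// Hypothetical sorted lookup table with a length pre-check.
struct Table {
    /// `(key, value)` pairs; must be sorted by key for binary search to be valid.
    entries: &'static [(&'static str, &'static str)],
    /// Shortest and longest key, used to reject words without searching.
    min_len: usize,
    max_len: usize,
}

impl Table {
    fn new(entries: &'static [(&'static str, &'static str)]) -> Self {
        // Enforce the "for-sure pre-sorted" guarantee up front.
        assert!(
            entries.windows(2).all(|w| w[0].0 < w[1].0),
            "entries must be sorted by key"
        );
        let min_len = entries.iter().map(|(k, _)| k.len()).min().unwrap_or(0);
        let max_len = entries.iter().map(|(k, _)| k.len()).max().unwrap_or(0);
        Table { entries, min_len, max_len }
    }

    fn find(&self, word: &str) -> Option<&'static str> {
        // Cheap size check: no key can match a word outside [min_len, max_len].
        if word.len() < self.min_len || word.len() > self.max_len {
            return None;
        }
        self.entries
            .binary_search_by(|(key, _)| (*key).cmp(word))
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```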
Ed Page
bfa7888f82
chore: Skip more releases
2021-06-29 15:39:28 -05:00
Ed Page
9149c4765d
chore: Release
2021-06-29 15:05:18 -05:00
Ed Page
c83f655109
feat(parser): Ignore URLs
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146
fix(parser): Ensure we get full base64
...
We greedily matched separators, including ones that might be part of
base64. This impacts the length calculation, so we want the base64
match to be as long as possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b
feat(parser): Ignore emails
This skips a lot of validation in favor of being "good enough" (comment
open/close matching, etc.).
This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
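A deliberately loose email check in the same "good enough" spirit: `local@domain`, where the local part is non-empty and the domain has at least one dot with non-empty labels. The allowed character sets and `looks_like_email` are assumptions, not the project's rules.

```rust
/// Hypothetical loose email matcher; skips most RFC 5322 validation on purpose.
fn looks_like_email(token: &str) -> bool {
    let Some((local, domain)) = token.split_once('@') else {
        return false;
    };
    let local_ok = !local.is_empty()
        && local
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "._%+-".contains(c));
    // Domain: dot-separated, non-empty labels of alphanumerics and hyphens.
    let domain_ok = domain.contains('.')
        && domain.split('.').all(|label| {
            !label.is_empty() && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        });
    local_ok && domain_ok
}
```

Note that under these assumptions an unspaced Python expression like `x@y.T` still matches, which is exactly the risk the commit accepts.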
Ed Page
2a1e6ca0f6
feat(parser): Ignore base64
For now, we hardcoded a min length of 90 bytes to avoid ambiguity
with math operations on variables (generally people use whitespace
anyways).
Fixes #287
2021-06-29 13:25:10 -05:00
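A sketch of base64 detection using the 90-byte minimum from the commit message. The standard alphabet (`A-Z a-z 0-9 + /`) with at most two trailing `=` is an assumption about what counts as base64 here, and `looks_like_base64` is a hypothetical name.

```rust
/// Hypothetical detector: standard base64 alphabet, at most two trailing `=`,
/// and at least 90 bytes long.
fn looks_like_base64(token: &str) -> bool {
    const MIN_LEN: usize = 90; // hardcoded minimum from the commit message
    if token.len() < MIN_LEN {
        return false;
    }
    let body = token.trim_end_matches('=');
    let padding = token.len() - body.len();
    padding <= 2
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}
```

The length floor is what keeps short, unspaced math such as `a+b/c`, which also uses only base64 characters, from being ignored.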
Ed Page
23b6ad5796
feat(parser): Ignore SHA-1+
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b
fix(parser): Go ahead and do lower UUIDs
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
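A sketch of UUID recognition that accepts lowercase as well as uppercase hex. The 8-4-4-4-12 grouping is the standard UUID text layout; the same-case requirement and `looks_like_uuid` are assumptions for illustration.

```rust
/// Hypothetical matcher for canonical 8-4-4-4-12 UUIDs in a single letter case.
fn looks_like_uuid(token: &str) -> bool {
    let groups: Vec<&str> = token.split('-').collect();
    let expected = [8, 4, 4, 4, 12];
    if groups.len() != expected.len()
        || groups.iter().zip(expected).any(|(g, n)| g.len() != n)
        || !groups.iter().all(|g| g.bytes().all(|b| b.is_ascii_hexdigit()))
    {
        return false;
    }
    // Accept all-lowercase or all-uppercase hex, but not a mix.
    let has_lower = token.bytes().any(|b| b.is_ascii_lowercase());
    let has_upper = token.bytes().any(|b| b.is_ascii_uppercase());
    !(has_lower && has_upper)
}
```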