Commit graph

130 commits

Author SHA1 Message Date
Neubauer, Sebastian
3fc6089660 fix: Fix multiple escape sequences
If escape sequences follow straight after each other, there is no
delimiter in-between.
In such a case, parsing previously stopped and did not find any
typos further in the file.
2021-11-15 11:31:53 +01:00
Ed Page
4f17586d08 chore: Update MSRV 2021-11-08 11:56:01 -06:00
Ed Page
a8ae8a5c26 chore: Update boilerplate 2021-11-08 10:11:02 -06:00
Ed Page
153f570ec9 chore: Release 2021-11-03 11:48:12 -05:00
Ed Page
fcac819478 fix: Address false positives
Hard to say how to handle `doen't` since we don't handle contractions.
For now, I've gone ahead and added corrections to the part of the
contraction.  Hopefully that doesn't confuse people.

Part of #362
2021-10-23 08:21:53 -05:00
Ed Page
efae838e5c perf: Remove some function overhead
Unfortunately, almost all of this is for corrections.
2021-09-14 21:09:30 -05:00
Ed Page
3cd24f5cca chore: Release 2021-09-14 10:03:34 -05:00
Ed Page
e20879dae1 fix: Reduce false positives from ordinals
Just ignoring them since our focus is on programmer typos and these
can't be identifiers.  This is simpler and is less work at runtime.

Fixes #331
2021-09-14 08:53:31 -05:00
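
A minimal sketch of what "ignoring ordinals" could look like, assuming an ordinal is one or more ASCII digits followed by `st`, `nd`, `rd`, or `th`. The function name `is_ordinal` is hypothetical and is not taken from the commit; the real parser may draw the line differently.

```rust
/// Hypothetical helper: returns true for tokens shaped like an ordinal,
/// i.e. one or more ASCII digits followed by exactly `st`, `nd`, `rd`,
/// or `th` (case-insensitive). Such tokens start with a digit, so they
/// cannot be identifiers in most languages.
fn is_ordinal(token: &str) -> bool {
    // Byte offset of the first non-digit character (or the end of the token).
    let digits_end = token
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(token.len());
    if digits_end == 0 {
        // No leading digits, so this is an ordinary word.
        return false;
    }
    let suffix = &token[digits_end..];
    ["st", "nd", "rd", "th"]
        .iter()
        .any(|s| suffix.eq_ignore_ascii_case(s))
}
```

A tokenizer could call such a check before the dictionary lookup and drop matching tokens, so that a suffix like `nd` never reaches the dictionary at all.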
Ed Page
92e46848a3 chore: Update dependencies 2021-09-01 06:38:52 -05:00
Ed Page
dbea7ab1e0 chore: Release 2021-08-30 09:16:40 -05:00
Ville Skyttä
4fcd7ba16f feat(dict): Suggest surrounded for surrouned too 2021-08-29 21:22:24 +03:00
Nick Mathewson
739d1a2f7c Ignore hexadecimal "hashes" of length 32 or greater.
By experimentation (see ticket), it seems that same-case hexadecimal
strings of 32 characters or longer are almost never intended to hold
text.  By treating such strings as ignored, we can resist a larger
category of false positives.

Closes #326.
2021-08-20 12:34:59 -04:00
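
The rule described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the name `is_hex_hash` is hypothetical, and "same-case" is read here as "does not mix upper-case and lower-case letters".

```rust
/// Hypothetical helper: returns true when `token` looks like a hexadecimal
/// "hash", meaning it is at least 32 characters long, consists only of hex
/// digits, and does not mix upper-case and lower-case letters.
fn is_hex_hash(token: &str) -> bool {
    const MIN_LEN: usize = 32; // threshold named in the commit message

    if token.len() < MIN_LEN || !token.bytes().all(|b| b.is_ascii_hexdigit()) {
        return false;
    }
    let has_lower = token.bytes().any(|b| b.is_ascii_lowercase());
    let has_upper = token.bytes().any(|b| b.is_ascii_uppercase());
    // Digits-only strings count as same-case too.
    !(has_lower && has_upper)
}
```

Short hex-looking words such as `deadbeef` stay subject to spell checking under this rule, since they fall below the length threshold.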
Ed Page
613a0cba4b chore: Iterate on release process 2021-08-16 11:23:25 -05:00
mendess
5747aba05d Add instantialed as a typo for instantiated 2021-08-06 14:33:50 +01:00
Ed Page
2dce866937 chore: Release 2021-08-02 09:55:25 -05:00
Ed Page
a5f0dd8ee9 fix(token): Continue parsing on c-escape 2021-08-02 09:29:10 -05:00
Ed Page
3e5d2e0620
Merge pull request #324 from epage/escape
fix(token): Continue parsing on c-escape
2021-08-02 09:23:42 -05:00
Ed Page
fdeba0e71b fix(token): Continue parsing on c-escape 2021-08-02 09:11:54 -05:00
dependabot[bot]
febcee3332
chore(deps): Bump env_logger from 0.8.4 to 0.9.0
Bumps [env_logger](https://github.com/env-logger-rs/env_logger) from 0.8.4 to 0.9.0.
- [Release notes](https://github.com/env-logger-rs/env_logger/releases)
- [Changelog](https://github.com/env-logger-rs/env_logger/blob/main/CHANGELOG.md)
- [Commits](https://github.com/env-logger-rs/env_logger/compare/v0.8.4...v0.9.0)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-01 07:05:08 +00:00
Ed Page
2304fc6735 chore: Release 2021-07-30 12:12:07 -05:00
Ed Page
9a8d41fcb2 chore: Release 2021-07-30 12:09:59 -05:00
Ed Page
2202b7f661 fix(parser): Handle c-escape/printf
Since our goal is 100% confidence in the results, it's better to not
check words than to correct the wrong words.

With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).

Fixes #3
2021-07-30 11:30:05 -05:00
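
To make the trade-off concrete, here is a rough sketch of a word splitter that drops any word glued to a preceding `\` or `%`. The function name `checkable_words` and the splitting rule (runs of alphabetic characters) are assumptions made for illustration; the actual parser is more involved.

```rust
/// Hypothetical helper: splits `text` into runs of alphabetic characters
/// and drops every run that directly follows `\` or `%`, since that run
/// might begin with a c-escape (`\nfoo`) or a printf substitution (`%dfoo`).
fn checkable_words(text: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut prev: Option<char> = None; // character seen just before `c`
    let mut start: Option<usize> = None; // byte offset where the current word began
    let mut skip = false; // whether the current word follows `\` or `%`

    for (i, c) in text.char_indices() {
        if c.is_alphabetic() {
            if start.is_none() {
                start = Some(i);
                skip = matches!(prev, Some('\\') | Some('%'));
            }
        } else if let Some(s) = start.take() {
            // A non-alphabetic character ends the current word.
            if !skip {
                words.push(&text[s..i]);
            }
        }
        prev = Some(c);
    }
    // Flush a word that runs to the end of the input.
    if let Some(s) = start {
        if !skip {
            words.push(&text[s..]);
        }
    }
    words
}
```

For the input `hello\nfoo %dbar world` (with a literal backslash), this yields only `hello` and `world`: `nfoo` and `dbar` go unchecked rather than risk a wrong correction.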
Ed Page
3049852bfd fix(dict): Avoid contraction false positive
Fixes #317
2021-07-30 10:42:57 -05:00
Ed Page
f60e798a2a chore: Release 2021-07-27 15:31:01 -05:00
Ed Page
3486c23bdb chore: Release 2021-07-27 15:29:18 -05:00
Ed Page
49459cede7 feat(dict): Add more corrections 2021-07-27 14:53:13 -05:00
Ed Page
6037eebfdc style: Clippy 2021-07-27 14:28:16 -05:00
Ed Page
70fbd63b00 fix: Update dictionary 2021-07-27 14:21:00 -05:00
Ed Page
960471ae23 fix: Prevent old typos from coming back 2021-07-27 14:16:13 -05:00
Ed Page
4e99217896 test: Ensure words are stored lowercase 2021-07-27 14:16:12 -05:00
Ed Page
0008713395 test: Ensure words.csv stays sorted 2021-07-27 14:16:12 -05:00
Ed Page
41048d15b3 test: Prevent correcting corrections 2021-07-27 13:58:57 -05:00
Ed Page
fc4ec0e4a1 fix: Correcting to typos 2021-07-27 13:58:57 -05:00
Ed Page
5b29113ec8 refactor(typos): Remove unused calculations
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
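
As a generic illustration of why this swap is safe (not the code touched by the commit): once a closure can no longer return `None`, `filter_map` does the same work as `map` plus a redundant `Option` wrap and check. Both function names below are hypothetical.

```rust
/// Leftover form: the closure always returns `Some`, so nothing is
/// ever filtered out.
fn lengths_filter_map(words: &[&str]) -> Vec<usize> {
    words.iter().filter_map(|w| Some(w.len())).collect()
}

/// Equivalent form after the cleanup: a plain `map` with no `Option`.
fn lengths_map(words: &[&str]) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}
```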
Ed Page
7a2a5042a1 refactor(dict): Remove useless entries 2021-07-02 10:24:59 -05:00
Ed Page
4c2f2c434a feat(dict): Shared PHF support 2021-07-01 11:14:30 -05:00
Ed Page
3b43272724 refactor(dict): Separate dictgen concerns 2021-07-01 11:00:33 -05:00
Ed Page
c8d1058a71 refactor(dict): Change typos-dict to trie
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
bbbf985777 perf(dict): Switch varcon to a burst-trie
This cuts varcon lookup times in half, but I still suspect it is slower than
phf.  Like with bsearch and unlike, the cost is consistent between hits
and misses.

At least this doesn't have the compile hit of PHF + unicase.  Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb refactor(dict): Be more cache conscious 2021-06-30 19:56:03 -05:00
Ed Page
f176055834 refactor(dict): Make room for trie logic 2021-06-30 19:56:03 -05:00
Ed Page
a1e95bc7c0 refactor(dict): Pull out table-lookup logic
Before, we only guaranteed that some dicts were pre-sorted.  Now, all
are guaranteed to be pre-sorted.

This also gives each dict the size-check to avoid lookup.

But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
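
A minimal sketch of a pre-sorted table with a size check, assuming the check compares the word's length against the shortest and longest keys before searching. The type `SortedTable`, its fields, and the sample entries are all hypothetical and only illustrate the idea.

```rust
/// Hypothetical table: `entries` must be sorted by key so that a binary
/// search is valid; `min_len` and `max_len` bound the key lengths.
struct SortedTable {
    entries: &'static [(&'static str, &'static str)],
    min_len: usize,
    max_len: usize,
}

impl SortedTable {
    fn find(&self, word: &str) -> Option<&'static str> {
        // Size check: a word shorter or longer than every key cannot
        // match, so the search is skipped entirely.
        if word.len() < self.min_len || word.len() > self.max_len {
            return None;
        }
        self.entries
            .binary_search_by(|(key, _)| (*key).cmp(word))
            .ok()
            .map(|i| self.entries[i].1)
    }
}

// Sample data for illustration only; keys are in sorted order.
static TABLE: SortedTable = SortedTable {
    entries: &[
        ("abandonned", "abandoned"),
        ("recieve", "receive"),
        ("teh", "the"),
    ],
    min_len: 3,
    max_len: 10,
};
```

Keeping the lookup behind one `find` method is what makes it possible to swap the binary search for another strategy, such as a trie, without touching callers.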
Ed Page
bfa7888f82 chore: Skip more releases 2021-06-29 15:39:28 -05:00
Ed Page
9149c4765d chore: Release 2021-06-29 15:05:18 -05:00
Ed Page
c83f655109 feat(parser): Ignore URLs
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146 fix(parser): Ensure we get full base64
We greedily matched separators, including ones that might be part of
base64.  This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b feat(parser): Ignore emails
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).

This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6 feat(parser): Ignore base64
For now, we hardcoded a min length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyways).

Fixes #287
2021-06-29 13:25:10 -05:00
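
The ambiguity mentioned above comes from `+` and `/` being part of the standard base64 alphabet, so an expression like `a+b/c` is also a valid base64 string. A rough sketch of a length-gated check, with the hypothetical name `looks_like_base64` and the simplifying assumption that at most two `=` padding characters may trail the token:

```rust
/// Hypothetical helper: returns true when `token` is at least 90 bytes
/// long and consists only of base64 alphabet characters, optionally
/// followed by up to two `=` padding characters.
fn looks_like_base64(token: &str) -> bool {
    const MIN_LEN: usize = 90; // threshold named in the commit message

    let body = token.trim_end_matches('=');
    let padding = token.len() - body.len();

    token.len() >= MIN_LEN
        && padding <= 2
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}
```

With the length gate in place, short unspaced arithmetic such as `a+b/c` is still tokenized and checked as ordinary identifiers.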
Ed Page
23b6ad5796 feat(parser): Ignore SHA-1+
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b fix(parser): Go ahead and do lower UUIDs
I need this for hash support anyways
2021-06-29 12:13:21 -05:00