
188 commits

Author SHA1 Message Date
Ed Page
c3bb4adfa1 fix(parser): Allow commas in urls
Got us closer to https://www.ietf.org/rfc/rfc3986.txt

Fixes #433
2022-02-14 08:49:55 -06:00
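A minimal sketch of URL detection in the spirit of the comma and password fixes (c3bb4adfa1, 09203fd592), assuming a byte-wise scanner. After a `scheme://` prefix it accepts every RFC 3986 unreserved and reserved character, so commas (a sub-delim) and `user:pass@` credentials stay inside the match. `is_url_byte` and `url_len` are hypothetical names, not the parser's actual code, and percent-encodings are not validated.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// True if `b` may appear in a URL per RFC 3986: unreserved, reserved,
/// or the `%` that starts a percent-encoding (not validated here).
fn is_url_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric()
        || matches!(b, b'-' | b'.' | b'_' | b'~') // unreserved
        || matches!(b, b':' | b'/' | b'?' | b'#' | b'[' | b']' | b'@') // gen-delims
        || matches!(
            b,
            b'!' | b'$' | b'&' | b'\'' | b'(' | b')' | b'*' | b'+' | b',' | b';' | b'='
        ) // sub-delims, including ','
        || b == b'%'
}

/// Byte length of the URL at the start of `s`, or 0 if `s` does not begin
/// with `scheme://`.
fn url_len(s: &str) -> usize {
    let bytes = s.as_bytes();
    // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    if !bytes.first().map_or(false, |b| b.is_ascii_alphabetic()) {
        return 0;
    }
    let scheme_len = bytes
        .iter()
        .take_while(|&&b| b.is_ascii_alphanumeric() || matches!(b, b'+' | b'-' | b'.'))
        .count();
    if !s[scheme_len..].starts_with("://") {
        return 0;
    }
    let start = scheme_len + "://".len();
    let rest = bytes[start..].iter().take_while(|&&b| is_url_byte(b)).count();
    start + rest
}
```

Because `,` is a URL byte, a comma that ends a sentence right after a URL is swallowed as well; for a spell checker that only means one fewer byte gets checked.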
Ed Page
09203fd592 fix(parser): Recognize URLs with passwords 2022-02-14 08:21:56 -06:00
Ed Page
5b7fe620ec chore: Release 2022-01-26 14:32:31 -06:00
Ed Page
a39074fc7f fix(parser): Detect shorter base64 values
This gets part of the way to #413.  In that case, though, they aren't
providing padding.
2022-01-26 14:18:01 -06:00
Ed Page
2c5f2ecedd chore: Release 2022-01-26 10:01:15 -06:00
Ed Page
3c78d65462 fix(parser): Don't stop on almost-printfs
When we added support for printf interpolation, we had to adjust our
separator matching so it would not eat the start of a printf interpolation.

When doing so, I overlooked the need to still eat it in the catch-all.
If we don't, we then try to read `%` as part of the identifier and bail
out early.

Fixes #411
2022-01-26 09:39:23 -06:00
Ed Page
4b2e66487c chore: Release 2022-01-24 20:35:08 -06:00
Ed Page
0c49c3ea2b fix(parser): Allow markdown formatting around ordinals
Fixes #409
2022-01-24 20:01:06 -06:00
Ed Page
71b53cb23e chore: Release 2021-12-18 17:52:11 -06:00
Ed Page
5c83dec07b style: Remove unused variable 2021-12-14 15:41:52 -06:00
Ed Page
469a9aedc2 chore: Release 2021-12-14 12:58:03 -06:00
Ed Page
c0e8a2c932 chore: Release 2021-11-16 07:46:33 -06:00
Neubauer, Sebastian
3fc6089660 fix: Fix multiple escape sequences
If escape sequences follow straight after each other, there is no
delimiter in-between.
In such a case, parsing previously stopped and did not find any
typos further in the file.
2021-11-15 11:31:53 +01:00
Ed Page
4f17586d08 chore: Update MSRV 2021-11-08 11:56:01 -06:00
Ed Page
a8ae8a5c26 chore: Update boilerplate 2021-11-08 10:11:02 -06:00
Ed Page
153f570ec9 chore: Release 2021-11-03 11:48:12 -05:00
Ed Page
efae838e5c perf: Remove some function overhead
Unfortunately, almost all of this is for corrections.
2021-09-14 21:09:30 -05:00
Ed Page
3cd24f5cca chore: Release 2021-09-14 10:03:34 -05:00
Ed Page
e20879dae1 fix: Reduce false positives from ordinals
Just ignoring them since our focus is on programmer typos and these
can't be identifiers.  This is simpler and is less work at runtime.

Fixes #331
2021-09-14 08:53:31 -05:00
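A minimal sketch of the ordinal check, assuming ordinals are ASCII digits followed by `st`, `nd`, `rd`, or `th` in any case. It does not enforce English agreement (`11st` passes), which seems acceptable since the goal is only to skip these tokens. `is_ordinal` and the Markdown-aware wrapper (trimming `*` and `_`, in the spirit of 0c49c3ea2b for #409) are hypothetical helpers, not the parser's code.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// True for tokens like `1st`, `22nd`, `103rd`, or `4TH`: one or more
/// ASCII digits followed by an English ordinal suffix in any case.
fn is_ordinal(token: &str) -> bool {
    let digits = token.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return false;
    }
    // Safe to slice: the prefix is all ASCII, so `digits` is a char boundary.
    let suffix = &token[digits..];
    ["st", "nd", "rd", "th"]
        .iter()
        .any(|s| suffix.eq_ignore_ascii_case(s))
}

/// Hypothetical wrapper: ignore Markdown emphasis such as `**1st**` or `_2nd_`.
fn is_ordinal_in_markdown(token: &str) -> bool {
    is_ordinal(token.trim_matches(|c: char| c == '*' || c == '_'))
}
```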
Ed Page
92e46848a3 chore: Update dependencies 2021-09-01 06:38:52 -05:00
Ed Page
dbea7ab1e0 chore: Release 2021-08-30 09:16:40 -05:00
Nick Mathewson
739d1a2f7c Ignore hexadecimal "hashes" of length 32 or greater.
By experimentation (see ticket), it seems that same-case hexadecimal
strings of 32 characters or longer are almost never intended to hold
text.  By treating such strings as ignored, we can resist a larger
category of false positives.

Closes #326.
2021-08-20 12:34:59 -04:00
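A minimal sketch of the rule described above, assuming the token has already been delimited: at least 32 hex digits, with no mix of upper- and lowercase letters. A SHA-1 (40 hex digits) or MD5 (32 hex digits) digest qualifies. `is_hex_hash` and `MIN_HASH_LEN` are hypothetical names.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// Minimum length taken from the commit message (32 hex digits).
const MIN_HASH_LEN: usize = 32;

/// True for long, same-case hexadecimal strings such as digests.
fn is_hex_hash(token: &str) -> bool {
    let all_hex = token.bytes().all(|b| b.is_ascii_hexdigit());
    let no_upper = token.bytes().all(|b| !b.is_ascii_uppercase());
    let no_lower = token.bytes().all(|b| !b.is_ascii_lowercase());
    token.len() >= MIN_HASH_LEN && all_hex && (no_upper || no_lower)
}
```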
Ed Page
613a0cba4b chore: Iterate on release process 2021-08-16 11:23:25 -05:00
Ed Page
2dce866937 chore: Release 2021-08-02 09:55:25 -05:00
Ed Page
a5f0dd8ee9 fix(token): Continue parsing on c-escape 2021-08-02 09:29:10 -05:00
Ed Page
fdeba0e71b fix(token): Continue parsing on c-escape 2021-08-02 09:11:54 -05:00
Ed Page
2304fc6735 chore: Release 2021-07-30 12:12:07 -05:00
Ed Page
2202b7f661 fix(parser): Handle c-escape/printf
Since our goal is 100% confidence in the results, it's better to not
check some words than to "correct" words that aren't actually wrong.

With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).

Fixes #3
2021-07-30 11:30:05 -05:00
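A minimal sketch of the conservative rule, assuming ASCII identifiers and treating `\` or `%` followed directly by a letter as a possible escape or printf substitution. It also exercises two follow-up concerns from this log: back-to-back escapes (3fc6089660) and a `%` that is not a substitution, which the catch-all branch must still consume (3c78d65462, #411). `checked_words` and `is_ident_byte` are hypothetical names, not the parser's API.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// True for bytes this sketch treats as part of an identifier (ASCII only).
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Returns the words to spell-check in `s`, skipping any word glued to a
/// possible C escape (`\nfoo`) or printf substitution (`%dfoo`).
fn checked_words(s: &str) -> Vec<&str> {
    let bytes = s.as_bytes();
    let mut words = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        let next_is_alpha = bytes.get(i + 1).map_or(false, |n| n.is_ascii_alphabetic());
        if (b == b'\\' || b == b'%') && next_is_alpha {
            // Possible escape or printf spec: skip the marker and the whole
            // identifier glued to it rather than risk a wrong correction.
            i += 1;
            while i < bytes.len() && is_ident_byte(bytes[i]) {
                i += 1;
            }
        } else if is_ident_byte(b) {
            let start = i;
            while i < bytes.len() && is_ident_byte(bytes[i]) {
                i += 1;
            }
            words.push(&s[start..i]);
        } else {
            // Catch-all: consume any other byte (including a lone `%` or `\`)
            // as a separator so parsing continues instead of stopping early.
            i += 1;
        }
    }
    words
}
```

Skipping `done` in `100%done` is a deliberate false negative: per the commit, not checking a word is preferable to a wrong correction.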
Ed Page
5b29113ec8 refactor(typos): Remove unused calculations
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
Ed Page
9149c4765d chore: Release 2021-06-29 15:05:18 -05:00
Ed Page
c83f655109 feat(parser): Ignore URLs
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146 fix(parser): Ensure we get full base64
We greedily matched separators, including ones that might be part of
base64.  This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b feat(parser): Ignore emails
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).

This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
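A minimal sketch of a "good enough" email check, assuming tokens have already been split on whitespace. It requires a non-empty local part and a dotted domain, so spaced Python operators (`a @ b`) never form one token and bare `a@b` fails the dotted-domain test, while `self.x@self.y` would still match (the risk the commit mentions). `is_email` is a hypothetical name, and the accepted local-part characters are a simplification, not full RFC 5322.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// True for tokens shaped like `local@domain.tld`.
fn is_email(token: &str) -> bool {
    let (local, domain) = match token.split_once('@') {
        Some(parts) => parts,
        None => return false,
    };
    // Simplified local part: letters, digits, and a few common symbols.
    let local_ok = !local.is_empty()
        && local
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b"._%+-".contains(&b));
    // Domain: at least one dot, every label non-empty and alphanumeric or `-`.
    let domain_ok = domain.contains('.')
        && domain.split('.').all(|label| {
            !label.is_empty() && label.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
        });
    local_ok && domain_ok
}
```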
Ed Page
2a1e6ca0f6 feat(parser): Ignore base64
For now, we hardcoded a min length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyway).

Fixes #287
2021-06-29 13:25:10 -05:00
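A minimal sketch of the base64 check, assuming a candidate token has already been delimited (including any `+`, `/`, and `=` bytes, per b673b81146). It uses the 90-byte minimum from the commit message, the standard alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`), and at most two `=` padding bytes. `is_base64` and `MIN_BASE64_LEN` are hypothetical names. Note that a long run of plain letters also passes this check.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// Minimum length from the commit message, to avoid matching `a+b/c`.
const MIN_BASE64_LEN: usize = 90;

/// True for long tokens drawn from the standard base64 alphabet.
fn is_base64(token: &str) -> bool {
    // Padding may only appear at the end, and at most twice.
    let body = token.trim_end_matches('=');
    let padding = token.len() - body.len();
    token.len() >= MIN_BASE64_LEN
        && padding <= 2
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}
```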
Ed Page
23b6ad5796 feat(parser): Ignore SHA-1+
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b fix(parser): Go ahead and do lower UUIDs
I need this for hash support anyway
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1 feat(parser): Ignore UUIDs
We might be able to make this bail out earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase.  RFC
4122 says that UUIDs must be generated lowercase, while input accepts
any case.  The main issues are risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
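A minimal sketch of UUID detection, assuming the RFC 4122 string form of five hyphen-separated hex groups (8-4-4-4-12). Following the commit's reading of the RFC, it accepts either case on input rather than requiring lowercase. `is_uuid` is a hypothetical name.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// True for the RFC 4122 string form `8-4-4-4-12` hex digits, in any case
/// (RFC 4122 outputs lowercase but accepts either case on input).
fn is_uuid(token: &str) -> bool {
    const GROUP_LENS: [usize; 5] = [8, 4, 4, 4, 12];
    let groups: Vec<&str> = token.split('-').collect();
    groups.len() == GROUP_LENS.len()
        && groups.iter().zip(GROUP_LENS.iter()).all(|(group, &len)| {
            group.len() == len && group.bytes().all(|b| b.is_ascii_hexdigit())
        })
}
```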
Ed Page
32f5e6c682 refactor(typos)!: Bake ignores into parser
This is prep for other items to be ignored

BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens.  Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
ded90f2387 perf(parser): Auto-detect unicode
For smaller, ASCII-only content, this seems to take ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
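A minimal sketch of the auto-detection idea, assuming identifiers are split out with a simple predicate: `str::is_ascii` selects a byte-wise fast path, and only non-ASCII input pays for `char` decoding. `char::is_alphanumeric` stands in here for the Unicode XID rules the parser later adopted (9cbc7410a4); `identifiers` and `is_ascii_ident_byte` are hypothetical names.

```rust
// Hypothetical sketch; not the typos parser's implementation.

/// Identifier bytes for the ASCII fast path.
fn is_ascii_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Splits `s` into identifier-like words, choosing a path based on content.
fn identifiers(s: &str) -> Vec<&str> {
    if s.is_ascii() {
        // Fast path: plain byte comparisons, no UTF-8 decoding.
        s.as_bytes()
            .split(|&b| !is_ascii_ident_byte(b))
            .filter(|w| !w.is_empty())
            // Cannot fail: every byte is ASCII, so any sub-slice is valid UTF-8.
            .map(|w| std::str::from_utf8(w).unwrap())
            .collect()
    } else {
        // Slow path: decode chars so non-ASCII letters stay inside words.
        s.split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|w| !w.is_empty())
            .collect()
    }
}
```

`is_ascii` still scans the whole input once, so the win comes from avoiding per-`char` decoding in the tokenizer itself.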
Ed Page
95417f3a41 refactor(parser): Consolidate utf8/ascii logic 2021-06-29 05:10:02 -05:00
Ed Page
3e66a99674 chore: Release 2021-05-21 20:41:02 -05:00
Ed Page
7c803681c4 chore: Release 2021-05-13 09:58:09 -05:00
Ed Page
f40ed5a328 style: Address clippy 2021-04-30 11:37:16 -05:00
Ed Page
517da7ecd2 perf(parser): Allow people to bypass unicode cost 2021-04-29 21:07:59 -05:00
Ed Page
09d2124d0f perf(parser): Limit inner-loop asserts 2021-04-29 18:31:05 -05:00
Ed Page
287c4cbfe9 refactor(parser): Give more impl flexibility 2021-04-29 18:31:05 -05:00
Ed Page
9cbc7410a4 fix(parser)!: Defer to Unicode XID for identifiers
This saves us from having to have configuration for every detail.  If
people need more control, we can offer it later.

Fixes #225
2021-04-29 18:30:57 -05:00
Ed Page
f15cc58f71 fix(parser): Flip leading digits to work correctly 2021-04-29 18:30:14 -05:00
Ed Page
4b94352b7a perf(parser): Try hand-rolled number parsing 2021-04-29 18:30:14 -05:00
Ed Page
6b92e345cc perf(parser): Speed up UTF-8 validation 2021-04-27 21:17:46 -05:00
Ed Page
819702c82f refactor(parser): Unify str/bytes code paths
The main goal is to support replacing the parser with `nom` where I need
access to `str` only functionality.

With crates like simdutf8, this might also offer up performance gains
since they see the biggest benefit when doing large blocks of
validation.
2021-04-27 21:17:43 -05:00
Ed Page
fce11d6c35 refactor(parser)!: Allow short-circuiting word splitting
This is prep for experiments with getting this information ahead of
time.

See #224
2021-04-27 21:17:38 -05:00
Ed Page
9bfb506c6d fix(typos)!: Clarify Case::Uppers name
`Scream` was referring to `SCREAMING_CASE` but outside of that context, I
think `Upper` is more accurate.
2021-04-21 20:36:35 -05:00
Ed Page
1f4c587692 chore({{crate_name}}): Release {{version}} 2021-04-14 19:13:25 -05:00
Ed Page
b4459bef33 chore: Fix readme paths in Cargo.toml 2021-04-13 21:36:47 -05:00
Ed Page
d7978658d4 test(cli): Ensure we apply corrections 2021-04-10 19:13:48 -05:00
Ed Page
b5f606f201 refactor(typos): Simplify the top-level API 2021-03-01 11:50:23 -06:00
Ed Page
1010d2ffe5 refactor(tokenizer): Remove stale function 2021-03-01 11:50:23 -06:00
dependabot-preview[bot]
b8d3190ce9 chore(deps): bump itertools from 0.9.0 to 0.10.0
Bumps [itertools](https://github.com/bluss/rust-itertools) from 0.9.0 to 0.10.0.
- [Release notes](https://github.com/bluss/rust-itertools/releases)
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bluss/rust-itertools/compare/v0.9.0...v0.10.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2021-01-03 03:40:45 +00:00
Ed Page
67222e9338 style: Address clippy 2021-01-02 13:49:28 -06:00
Ed Page
692f0ac095 refactor(typos): Focus API on primary use case 2021-01-02 13:10:40 -06:00
Ed Page
aba85df435 docs(typos): Clarify intent 2021-01-02 13:10:40 -06:00
Ed Page
48112a47e9 refactor(parser): Abstract over lifetimes 2021-01-02 13:10:30 -06:00
Ed Page
bc90bacff2 refactor(typos): Pull out file logic 2021-01-02 13:10:30 -06:00
Ed Page
e741f96de3 refactor(typos): Decouple parsing from checks 2021-01-02 13:10:22 -06:00
Ed Page
1e64080c05 refactor(typos): Open up the name Parser 2021-01-02 12:58:33 -06:00
Ed Page
7fdd0dee16 style(typos): Make parser ordering clearer 2021-01-02 12:58:33 -06:00
dependabot-preview[bot]
7fa5a9eadf chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases)
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:18:35 +00:00
Ed Page
d96de581f3 fix(report): Rendering issues with errors
- We aren't consistent in quoting words
- We used byte offsets rather than column counts
- We mixed styles between disallowed and corrections

Fixes #165
2020-11-24 18:52:24 -06:00
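A minimal sketch of the byte-offset to column conversion, assuming one column per Unicode scalar value (`char`). Grapheme clusters or display width would be more precise for combining marks and wide characters. `column_for_byte_offset` is a hypothetical name.

```rust
// Hypothetical sketch; not the report module's implementation.

/// Converts a byte offset within `line` into a 1-based column, counting one
/// column per `char`. Panics if `byte_offset` is not on a char boundary.
fn column_for_byte_offset(line: &str, byte_offset: usize) -> usize {
    line[..byte_offset].chars().count() + 1
}
```

For example, in `ééé typo` the typo starts at byte offset 7 (each `é` is two bytes) but at column 5.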
Ed Page
9b0cd5b5f0 fix(report): Show path for errors 2020-11-23 11:20:12 -06:00
Ed Page
869b916ca6 fix: Handle broken pipe 2020-11-21 21:57:12 -06:00
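A minimal sketch of one way to handle a broken pipe, assuming output goes through `std::io::Write`. Rust ignores `SIGPIPE` by default, so writing to a closed pipe (for example `typos | head`) surfaces as an `io::ErrorKind::BrokenPipe` error instead of killing the process, and `println!` would panic on it. `write_report` and `ClosedPipe` are hypothetical names.

```rust
use std::io::{self, Write};

// Hypothetical sketch; not the project's implementation.

/// Writes one report line, treating a closed reader as a normal exit.
fn write_report<W: Write>(out: &mut W, msg: &str) -> io::Result<()> {
    match writeln!(out, "{}", msg) {
        // The reader went away (e.g. `head` exited); stop quietly.
        Err(e) if e.kind() == io::ErrorKind::BrokenPipe => Ok(()),
        // Success, or any other error, is passed through unchanged.
        other => other,
    }
}

/// Test double that behaves like a pipe whose reader has exited.
struct ClosedPipe;

impl Write for ClosedPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "reader closed"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```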
Ed Page
7a1fac7fab refactor(report): Use native types 2020-11-11 18:44:27 -06:00
Ed Page
b7700fa214 refactor: Don't special case --files 2020-11-10 06:30:27 -06:00
Ed Page
628c011f77 fix(report): Ensure json output is clean 2020-11-10 06:30:27 -06:00
Ed Page
e12cd8ed55 refactor: Layer files/filenames on buffer processing 2020-11-10 06:30:27 -06:00
Ed Page
eb20ba9f11 refactor(report): Make Parse consistent with Typos 2020-11-10 06:30:27 -06:00
Ed Page
97f90da9bc refactor: Move off of lazy_static 2020-11-10 06:30:27 -06:00
Ed Page
3bcd8a130e refactor(report): Merge the typos types 2020-11-10 06:30:23 -06:00
Ed Page
fe282a0aea refactor: Pull out common policy 2020-11-07 20:04:58 -06:00
Ed Page
736db10708 fix(format): Clarify message types 2020-10-28 21:01:33 -05:00
Ed Page
527b9837b4 feat: Custom dictionary support
Switching `valid-*` to just `*` where you map typo to correction, with
support for always-valid and never-valid.

Fixes #9
2020-10-27 21:15:25 -05:00
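A minimal sketch of the lookup semantics, assuming a user table layered over a built-in typo table: a custom entry can force a word to always be valid, never be valid, or map it to one or more corrections (the multiple-corrections case comes from bc1302f01b). `Status`, `Dictionary`, and `check` are hypothetical names, words are assumed to be case-folded to ASCII lowercase, and this is not the project's config format.

```rust
use std::collections::HashMap;

// Hypothetical sketch; not the typos crate's API.

/// Result of looking up a word that may be a typo.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    /// Always accepted, even if the built-in table flags it.
    Valid,
    /// Always reported, with no suggested fix.
    Invalid,
    /// Reported, with one or more suggested fixes.
    Corrections(Vec<String>),
}

/// User entries take precedence over the built-in table.
struct Dictionary {
    custom: HashMap<String, Status>,
    builtin: HashMap<String, Vec<String>>,
}

impl Dictionary {
    /// Returns `None` when the word should not be reported.
    fn check(&self, word: &str) -> Option<Status> {
        let key = word.to_ascii_lowercase();
        match self.custom.get(&key) {
            Some(Status::Valid) => None,
            Some(status) => Some(status.clone()),
            None => self.builtin.get(&key).map(|c| Status::Corrections(c.clone())),
        }
    }
}
```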
dependabot-preview[bot]
84e56b22b5 chore(deps): bump derive_more from 0.99.9 to 0.99.11
Bumps [derive_more](https://github.com/JelteF/derive_more) from 0.99.9 to 0.99.11.
- [Release notes](https://github.com/JelteF/derive_more/releases)
- [Changelog](https://github.com/JelteF/derive_more/blob/master/CHANGELOG.md)
- [Commits](https://github.com/JelteF/derive_more/compare/v0.99.9...v0.99.11)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-10-01 08:11:46 +00:00
Ed Page
a63dfa0f8c perf: Faster binary-file detection
This switches us from a homegrown implementation to `content_inspector`:
- It adds some optimizations by looking for the BOM.
- We used the same algorithm for finding NUL bytes.
- `content_inspector` caps how much of the buffer is searched, though.

Besides performance, `content_inspector` also has some known-binary
magic numbers to avoid bad detections.

Fixes #34
2020-08-21 16:29:11 -05:00
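A minimal sketch of the kind of check described above, written as a homegrown function rather than `content_inspector`'s API: a byte-order mark means text, otherwise a NUL byte within a capped prefix means binary. `looks_binary`, `BOMS`, and the 1024-byte `SNIFF_LEN` cap are hypothetical choices for illustration.

```rust
// Hypothetical sketch of a homegrown check; this is not `content_inspector`'s API.

/// How much of the buffer to inspect (an assumed cap, for illustration).
const SNIFF_LEN: usize = 1024;

/// Byte-order marks for UTF-8, UTF-16 LE, and UTF-16 BE.
const BOMS: &[&[u8]] = &[b"\xEF\xBB\xBF", b"\xFF\xFE", b"\xFE\xFF"];

/// Guesses whether `buffer` holds binary data.
fn looks_binary(buffer: &[u8]) -> bool {
    // A BOM means text, even though UTF-16 text is full of NUL bytes.
    if BOMS.iter().any(|&bom| buffer.starts_with(bom)) {
        return false;
    }
    // Otherwise, a NUL byte near the start is a strong hint of binary data.
    let head = &buffer[..buffer.len().min(SNIFF_LEN)];
    head.contains(&0)
}
```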
Ed Page
5d7e91d214 fix(ci): Report more failures 2020-07-04 20:52:48 -05:00
Ed Page
bc1302f01b feat: Support multiple, valid corrections
Some of the other spell checkers already do this. While I've not checked
where we might need it for our dictionary, this will be important for
dialects.
2020-07-04 20:52:48 -05:00
Ed Page
a5ed18ee46 fix(replace): Don't error on successful replacement 2020-07-04 20:52:47 -05:00
dependabot-preview[bot]
146998f331 chore(deps): bump derive_more from 0.99.7 to 0.99.9
Bumps [derive_more](https://github.com/JelteF/derive_more) from 0.99.7 to 0.99.9.
- [Release notes](https://github.com/JelteF/derive_more/releases)
- [Changelog](https://github.com/JelteF/derive_more/blob/master/CHANGELOG.md)
- [Commits](https://github.com/JelteF/derive_more/compare/v0.99.7...v0.99.9)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-04 20:52:47 -05:00
Ed Page
814ff82aff refactor: Follow monorepo pattern elsewhere 2020-07-04 20:52:47 -05:00