
98 commits

Author SHA1 Message Date
Ed Page
9efaabdf84 style: Make clippy happy 2024-12-02 11:33:06 -06:00
Ed Page
bf98193204 perf(token): Don't allow unbounded backtrackable parsing
In some test data for rinja, they check some parsing corner cases.
Unfortunately for us, they also hit a performance corner case.
The entire file was a valid email username but without an `@`.
This meant that, for every byte, we checked that every byte after it was a
valid username but then backtracked at the end, repeating this until the
whole file was read.

Fixes #1088
2024-08-30 14:52:13 -05:00
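The corner case above is quadratic. Each starting offset rescans the rest of the file before backtracking. The sketch below shows one way to bound that work: cap how far the username scan may look for the `@`. The cap `MAX_USERNAME_LEN`, the simplified username byte set, and `email_prefix_len` are all hypothetical. This is not the code from the commit.

```rust
/// Hypothetical cap on how many bytes we scan while looking for the `@` that
/// ends an email username; the value is illustrative only.
const MAX_USERNAME_LEN: usize = 64;

/// Bytes allowed in the local part of an email address (simplified assumption).
fn is_username_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$%&'*+-/=?^_`{|}~.".contains(&b)
}

/// If `input[start..]` begins with `username@`, returns the length of that
/// prefix. At most `MAX_USERNAME_LEN + 1` bytes are examined, so trying this at
/// every offset of an `@`-less file costs O(n) instead of O(n^2).
fn email_prefix_len(input: &[u8], start: usize) -> Option<usize> {
    let rest = &input[start..];
    // Only look inside a bounded window; anything longer cannot match anyway.
    let window = &rest[..rest.len().min(MAX_USERNAME_LEN + 1)];
    let user_len = window.iter().take_while(|&&b| is_username_byte(b)).count();
    if user_len > 0 && user_len <= MAX_USERNAME_LEN && rest.get(user_len) == Some(&b'@') {
        Some(user_len + 1)
    } else {
        None
    }
}
```

The trade-off is that absurdly long usernames are no longer recognized as emails. That seems acceptable when the alternative is a parser that can stall on a single file.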
Ed Page
855ac08da3 refactor(token): Resolve deprecations 2024-08-23 09:16:46 -05:00
Ed Page
267121b5d6 style: Make clippy happy 2024-07-26 16:08:02 -05:00
Ed Page
6047fba1fe feat(tokens): Ignore JWTs
Fixes #1057
2024-07-10 11:57:46 -05:00
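A compact JWT (RFC 7519) is three base64url segments joined by `.`. Its header is a JSON object such as `{"alg":...}`, which encodes to a string starting with `eyJ`. The sketch below is a structural check under those assumptions. The name `looks_like_jwt` is hypothetical, and the check is not the parser from the commit.

```rust
/// Bytes of the URL-safe base64 alphabet (no `=` padding in compact JWTs).
fn is_base64url_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'-' || b == b'_'
}

/// Hypothetical check: three non-empty base64url segments separated by `.`,
/// where the first (the JSON header) starts with `eyJ`, the encoding of `{"`
/// followed by a letter.
fn looks_like_jwt(token: &str) -> bool {
    let parts: Vec<&str> = token.split('.').collect();
    parts.len() == 3
        && parts[0].starts_with("eyJ")
        && parts.iter().all(|p| !p.is_empty() && p.bytes().all(is_base64url_byte))
}
```

The `eyJ` prefix check keeps dotted identifiers like `example.com.au` from being treated as tokens to skip.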
Ed Page
5eab324cdd refactor(tokens): Simplify parser logic 2024-07-10 11:51:54 -05:00
Ed Page
8c8f52fe6a test(tokens): Show JWT behavior 2024-07-10 11:47:56 -05:00
Ed Page
dc42232bba test(tokens): Use snapshot testing 2024-07-10 11:47:12 -05:00
Ed Page
b6c895ea49 chore: Update from _rust/main template 2024-04-30 11:28:23 -05:00
Ed Page
de3014ef3c chore: Update winnow 2024-02-13 08:46:25 -06:00
Ed Page
fd7ab4a789 refactor: Prep for winnow 0.6 upgrade 2024-02-12 20:27:33 -06:00
Ed Page
2bf326a3c6 chore: Update winnow 2024-02-12 20:22:47 -06:00
Ed Page
8879269b0d fix(token): Don't crash on parsing unicode 2024-02-08 07:10:28 -06:00
Ed Page
b221e9ce56 chore: Clean up dependencies 2023-10-16 12:58:00 -05:00
Ed Page
0c05b217d4 style: Make clippy happy 2023-09-01 10:20:03 -05:00
Ed Page
b6c78eb8ac refactor(typos): Upgrade to winnow 0.5 2023-07-14 13:29:24 -05:00
Ed Page
6f40717c8f refactor(typos): Switch to BStr for better debugging 2023-07-14 13:28:36 -05:00
Ed Page
e98fc52b0d chore(typos): Add parse tracing 2023-07-14 12:32:07 -05:00
figsoda
ef1907fb1e chore: Remove usages of deprecated functions 2023-06-22 11:22:46 -04:00
Ed Page
808e862bfb chore: Resolve deprecations 2023-04-27 23:26:55 -05:00
Ed Page
243b4efc9e chore: Update winnow 2023-03-18 04:11:55 -05:00
Ed Page
ac46a6ba54 feat(config): Custom ignores
Typos primarily works off of identifiers and words.  We have built-in
support to detect constructs that span identifiers that should not be
spell checked, like UUIDs, emails, domains, etc.  This opens it up
for user-defined identifier-spanning constructs using regexes via
`extend-ignore-re`.

This works differently than any of the previous ways of ignoring things
because the regexes require extra parse passes.  Under the assumption
that (1) actual typos are rare and (2) the number of files relying on
`extend-ignore-re` is small, we only do these extra parse passes when a
typo is found, causing almost no performance hit in the expected case.

While this could be used for more generic types of ignores, it isn't the
most maintainable because it is separate from the source files in
question.  Ideally, we'd implement document settings / directives for
these cases (#316).
2023-03-18 01:25:39 -05:00
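The "only pay when a typo is found" strategy can be sketched as follows. `IgnorePattern` is a hypothetical stand-in for a compiled `extend-ignore-re` regex. To stay dependency-free it matches a literal substring. `filter_typos` and its `passes` counter are also illustrative names.

```rust
use std::ops::Range;

/// Stand-in for a compiled `extend-ignore-re` regex. It matches a literal
/// substring so the sketch needs no regex crate.
struct IgnorePattern(&'static str);

impl IgnorePattern {
    /// Byte ranges of every match in `text`.
    fn spans(&self, text: &str) -> Vec<Range<usize>> {
        text.match_indices(self.0).map(|(i, m)| i..i + m.len()).collect()
    }
}

/// Drops typos that fall entirely inside an ignored span. The extra passes over
/// `text` only run when there is at least one typo and one pattern, so the
/// common case (no typos) pays nothing. `passes` counts the extra passes.
fn filter_typos(
    text: &str,
    typos: Vec<Range<usize>>,
    patterns: &[IgnorePattern],
    passes: &mut usize,
) -> Vec<Range<usize>> {
    if typos.is_empty() || patterns.is_empty() {
        return typos; // fast path: no extra parse pass
    }
    let mut ignored = Vec::new();
    for pattern in patterns {
        *passes += 1;
        ignored.extend(pattern.spans(text));
    }
    typos
        .into_iter()
        .filter(|t| !ignored.iter().any(|ig| ig.start <= t.start && t.end <= ig.end))
        .collect()
}
```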
Ed Page
d99eb1601b refactor: Resolve deprecations 2023-02-21 11:11:24 -06:00
Ed Page
15e748d0e5 refactor: Switch to winnow 2023-02-21 10:41:45 -06:00
Ed Page
cb91b89080 fix(parse): Ignore CSS hex values that start with digits
Fixes #542
2022-08-25 16:05:57 -05:00
Ed Page
fd5398316f fix(parser): Better short base64 detection
Previously, we bailed out if the string was too short (<90) and there
weren't non-alpha-base64 bytes present.  What we overlooked were the
padding bytes.

We key off of padding bytes to detect that a string is in fact base64
encoded.  Like the other cases, there can be false positives but those
strings should show up elsewhere or the compiler will fail.

This was called out in #485
2022-05-10 14:02:59 -05:00
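The heuristic described above can be sketched as follows. Long runs count as base64. Short runs count only if they contain non-alphabetic base64 bytes or carry `=` padding. The sketch assumes that "non-alpha base64 bytes" means digits, `+`, and `/`, and that padded input must be a multiple of 4 bytes long. `looks_like_base64` is a hypothetical name.

```rust
/// Length at or above which an unpadded run is assumed to be base64
/// (the `<90` threshold from the commit message).
const MIN_UNPADDED_LEN: usize = 90;

fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/'
}

/// Assumption: "non-alpha base64 bytes" are digits, `+`, and `/`.
fn is_non_alpha_base64_byte(b: u8) -> bool {
    b.is_ascii_digit() || b == b'+' || b == b'/'
}

fn looks_like_base64(s: &str) -> bool {
    let bytes = s.as_bytes();
    // Split off trailing `=` padding (at most two bytes in valid base64).
    let body_len = bytes.iter().rposition(|&b| b != b'=').map_or(0, |i| i + 1);
    let (body, padding) = bytes.split_at(body_len);
    if body.is_empty() || padding.len() > 2 || !body.iter().all(|&b| is_base64_byte(b)) {
        return false;
    }
    if !padding.is_empty() {
        // Padding is the signal; padded base64 is always a multiple of 4 long.
        return bytes.len() % 4 == 0;
    }
    body.len() >= MIN_UNPADDED_LEN || body.iter().any(|&b| is_non_alpha_base64_byte(b))
}
```

With this rule, a short all-letter run like `YQ==` is skipped because of its padding, while a plain word like `hello` is still spell checked.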
Ed Page
bd5048def5 fix(parser): Allow backslashes after ignore items
To allow `\\` to start a token, we couldn't let it end a token.  By
switching the terminator to a peek, we can now make it end a token
**and** start a token, allowing us to work better with windows paths.

Fixes #481
2022-05-10 14:02:54 -05:00
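The difference between consuming the terminator and peeking at it fits in one function. `take_item` is a hypothetical helper that splits an ignored item off at the first backslash. It is not the winnow combinator the commit used.

```rust
/// Splits `input` at the first backslash into (item, rest).
/// With `peek_terminator`, the backslash stays in `rest` so it can also start
/// the next token (e.g. `\n` in a Windows path); otherwise it is consumed.
fn take_item(input: &str, peek_terminator: bool) -> (&str, &str) {
    match input.find('\\') {
        Some(i) if peek_terminator => (&input[..i], &input[i..]),
        Some(i) => (&input[..i], &input[i + 1..]),
        None => (input, ""),
    }
}
```

When the terminator is consumed, `abc\nfoo` leaves `nfoo` behind, and `nfoo` would then be spell checked as a word. When it is only peeked, `\nfoo` is left intact for the escape handling.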
Ed Page
1720e7d65e fix(parser): Ignore items at end of input 2022-05-10 13:38:03 -05:00
Ed Page
7e15afe81f test(parser): Add reproduction of #481 2022-05-10 12:58:19 -05:00
Ed Page
4869764f7b test(parser): Remove unclear test case
Unsure why this case is here, and it causes difficulties.
2022-05-10 12:58:13 -05:00
Ed Page
ad89736832 refactor(parser): Clarify precedence levels 2022-05-10 12:58:08 -05:00
SeongChan Lee
4e4f136ec6 Fix tokenizer for uppercase UUID
Microsoft toolchains usually emit UUID/GUID in UPPERCASE
2022-04-25 11:12:25 +09:00
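Accepting uppercase GUIDs comes down to matching hex digits in either case. The sketch below is a hypothetical recognizer for the 8-4-4-4-12 layout. It is not the tokenizer's actual parser.

```rust
/// Hypothetical 8-4-4-4-12 UUID check. Hex digits are accepted in either
/// case, since Microsoft tools commonly print GUIDs in uppercase.
fn looks_like_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    groups.len() == 5
        && groups.iter().zip([8usize, 4, 4, 4, 12]).all(|(g, len)| {
            g.len() == len && g.bytes().all(|b| b.is_ascii_hexdigit())
        })
}
```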
Ed Page
e63659c208 fix: Ignore CSS colors
Fixes #462
2022-04-18 09:19:44 -05:00
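CSS hex colors such as `#add` or `#bad` look like words, so they are worth recognizing up front. CSS allows `#` followed by 3, 4, 6, or 8 hex digits. Because a color may also begin with a digit (`#1a2b3c`), a letter-first identifier rule would miss it. `looks_like_css_color` below is a hypothetical sketch.

```rust
/// Hypothetical CSS hex color check: `#RGB`, `#RGBA`, `#RRGGBB`, or `#RRGGBBAA`.
/// Leading digits are fine, e.g. `#000` or `#1a2b3c`.
fn looks_like_css_color(s: &str) -> bool {
    match s.strip_prefix('#') {
        Some(hex) => matches!(hex.len(), 3 | 4 | 6 | 8) && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}
```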
Ed Page
c3bb4adfa1 fix(parser): Allow commas in urls
Gets us closer to https://www.ietf.org/rfc/rfc3986.txt

Fixes #433
2022-02-14 08:49:55 -06:00
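RFC 3986 lists `,` among the "sub-delims", so it is legal inside a URL. The sketch below lists the bytes RFC 3986 permits in a URI: unreserved characters, gen-delims, sub-delims, and `%` for percent-encoding. `uri_run_len` is a hypothetical helper that measures how far a URL-like run extends. The tokenizer's real URL parser is more structured than this.

```rust
/// Bytes RFC 3986 allows in a URI: ALPHA / DIGIT / "-._~" (unreserved),
/// ":/?#[]@" (gen-delims), "!$&'()*+,;=" (sub-delims), and `%`.
fn is_uri_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"-._~:/?#[]@!$&'()*+,;=%".contains(&b)
}

/// Length of the leading run of URI bytes in `s` (hypothetical helper).
fn uri_run_len(s: &str) -> usize {
    s.bytes().take_while(|&b| is_uri_byte(b)).count()
}
```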
Ed Page
09203fd592 fix(parser): Recognize URLs with passwords 2022-02-14 08:21:56 -06:00
Ed Page
a39074fc7f fix(parser): Detect shorter base64 values
This gets us part of the way to #413.  In that case, though, they aren't
providing padding.
2022-01-26 14:18:01 -06:00
Ed Page
3c78d65462 fix(parser): Don't stop on almost-printfs
When we added support for printf interpolation, we had to adjust our
separator matching to not eat the start of printf interpolation.

When doing so, I overlooked the need to still eat it in the catch-all.
If we don't, we then try to read `%` as part of the identifier and bail
out early.

Fixes #411
2022-01-26 09:39:23 -06:00
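The fix can be pictured as a separator skipper with two rules. A real printf conversion is left alone. A `%` that does not start one is still consumed by the catch-all. The printf check here (`%` followed by an ASCII letter) is a simplifying assumption, and both function names are hypothetical.

```rust
/// Hypothetical printf detection: `%` followed by an ASCII letter (`%d`, `%s`).
fn is_printf_start(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() >= 2 && b[0] == b'%' && b[1].is_ascii_alphabetic()
}

/// Number of leading separator bytes to skip. Identifiers and printf
/// conversions stop the skip; a lone `%` (as in `50% done`) is still eaten
/// here, otherwise the identifier parser would see `%` and bail out early.
fn separator_len(s: &str) -> usize {
    let mut len = 0;
    for (i, c) in s.char_indices() {
        if c.is_alphanumeric() || c == '_' || is_printf_start(&s[i..]) {
            break;
        }
        len = i + c.len_utf8();
    }
    len
}
```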
Ed Page
0c49c3ea2b fix(parser): Allow markdown formatting around ordinals
Fixes #409
2022-01-24 20:01:06 -06:00
Ed Page
5c83dec07b style: Remove unused variable 2021-12-14 15:41:52 -06:00
Neubauer, Sebastian
3fc6089660 fix: Fix multiple escape sequences
If escape sequences follow straight after each other, there is no
delimiter in-between.
In such a case, parsing previously stopped and did not find any
typos further in the file.
2021-11-15 11:31:53 +01:00
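Handling back-to-back escapes means looping until no further escape starts, rather than matching a single one. In the sketch below, each escape is taken as a backslash plus the alphanumeric run glued to it, which matches the "ignore words after c-escapes" rule from an earlier commit. `escape_run_len` is a hypothetical name.

```rust
/// Hypothetical: length of the run of C-style escapes at the start of `s`.
/// Each escape is `\` followed by an alphanumeric run; the loop keeps going so
/// `\n\tfoo` is consumed as a whole instead of stopping after `\n`.
fn escape_run_len(s: &str) -> usize {
    let b = s.as_bytes();
    let mut i = 0;
    while i + 1 < b.len() && b[i] == b'\\' && b[i + 1].is_ascii_alphanumeric() {
        i += 1; // skip the backslash
        while i < b.len() && b[i].is_ascii_alphanumeric() {
            i += 1; // skip the escape and any word glued to it
        }
    }
    i
}
```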
Ed Page
4f17586d08 chore: Update MSRV 2021-11-08 11:56:01 -06:00
Ed Page
efae838e5c perf: Remove some function overhead
Unfortunately, almost all of this is for corrections.
2021-09-14 21:09:30 -05:00
Ed Page
e20879dae1 fix: Reduce false positives from ordinals
Just ignoring them since our focus is on programmer typos and these
can't be identifiers.  This is simpler and is less work at runtime.

Fixes #331
2021-09-14 08:53:31 -05:00
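Ordinals start with a digit, so they can never be identifiers, and ignoring them is cheap. A minimal sketch, assuming only the English suffixes `st`, `nd`, `rd`, and `th` in lowercase (`is_ordinal` is a hypothetical name):

```rust
/// Hypothetical ordinal check: one or more ASCII digits followed by exactly
/// `st`, `nd`, `rd`, or `th` (lowercase only in this sketch).
fn is_ordinal(token: &str) -> bool {
    let digits = token.bytes().take_while(|b| b.is_ascii_digit()).count();
    digits > 0 && matches!(&token[digits..], "st" | "nd" | "rd" | "th")
}
```

The check does not verify that the suffix agrees with the number. That keeps it simple, and sloppy ordinals like `22th` are not the kind of programmer typo this tool targets.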
Nick Mathewson
739d1a2f7c Ignore hexadecimal "hashes" of length 32 or greater.
By experimentation (see ticket), it seems that same-case hexadecimal
strings of 32 characters or longer are almost never intended to hold
text.  By treating such strings as ignored, we can resist a larger
category of false positives.

Closes #326.
2021-08-20 12:34:59 -04:00
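The rule from the commit message is a length of at least 32 and hex digits all in the same case. It translates directly into code. `is_hex_hash` is a hypothetical name for this sketch.

```rust
/// Minimum length from the commit message.
const MIN_HASH_LEN: usize = 32;

/// Same-case hexadecimal strings of 32+ characters are treated as hashes.
fn is_hex_hash(token: &str) -> bool {
    let lower = token.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    let upper = token.bytes().all(|b| matches!(b, b'0'..=b'9' | b'A'..=b'F'));
    token.len() >= MIN_HASH_LEN && (lower || upper)
}
```

Requiring a single case keeps mixed-case runs, which are more likely to be real text, subject to spell checking.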
Ed Page
a5f0dd8ee9 fix(token): Continue parsing on c-escape 2021-08-02 09:29:10 -05:00
Ed Page
fdeba0e71b fix(token): Continue parsing on c-escape 2021-08-02 09:11:54 -05:00
Ed Page
2202b7f661 fix(parser): Handle c-escape/printf
Since our goal is 100% confidence in the results, it's better to not
check words than to correct the wrong words.

With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).

Fixes #3
2021-07-30 11:30:05 -05:00
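The "skip rather than miscorrect" policy can be sketched as a word splitter. Any run glued to a possible escape (`\nfoo`) or printf conversion (`%dfoo`) is dropped, because we cannot tell where the escape ends and the word begins. `checkable_words` is hypothetical and ASCII-only. It is much simpler than the real tokenizer.

```rust
/// Hypothetical splitter returning the words that are safe to spell check.
fn checkable_words(text: &str) -> Vec<&str> {
    let b = text.as_bytes();
    let mut words = Vec::new();
    let mut i = 0;
    while i < b.len() {
        if (b[i] == b'\\' || b[i] == b'%') && i + 1 < b.len() && b[i + 1].is_ascii_alphabetic() {
            // Possible escape/printf: skip the whole glued run.
            i += 1;
            while i < b.len() && b[i].is_ascii_alphanumeric() {
                i += 1;
            }
        } else if b[i].is_ascii_alphabetic() {
            let start = i;
            while i < b.len() && b[i].is_ascii_alphanumeric() {
                i += 1;
            }
            words.push(&text[start..i]);
        } else {
            i += 1; // catch-all: any other byte is a separator
        }
    }
    words
}
```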
Ed Page
5b29113ec8 refactor(typos): Remove unused calculations
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
Ed Page
c83f655109 feat(parser): Ignore URLs
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146 fix(parser): Ensure we get full base64
We greedily matched separators, including ones that might be part of
base64.  This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
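The fix amounts to measuring the base64 run before any separator matcher can claim bytes such as `+`, `/`, or trailing `=`. The sketch below assumes the standard alphabet with at most two padding bytes. `base64_run_len` is a hypothetical helper, not the commit's code.

```rust
/// Hypothetical: length of the leading base64 run, including up to two `=`
/// padding bytes, so none of it is lost to the separator matcher.
fn base64_run_len(s: &str) -> usize {
    let b = s.as_bytes();
    let body = b
        .iter()
        .take_while(|&&c| c.is_ascii_alphanumeric() || c == b'+' || c == b'/')
        .count();
    let padding = b[body..].iter().take_while(|&&c| c == b'=').take(2).count();
    body + padding
}
```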