renovate[bot]
5131fb8167
chore(deps): update compatible
2023-05-01 15:30:51 +00:00
Ed Page
808e862bfb
chore: Resolve deprecations
2023-04-27 23:26:55 -05:00
Ed Page
d17ca898d9
chore: Upgrade to 0.4.3
2023-04-27 23:24:25 -05:00
renovate[bot]
e1a138b637
chore(deps): update compatible
2023-04-01 07:05:05 +00:00
Ed Page
53e2855fa0
chore: Release
2023-03-18 04:19:19 -05:00
Ed Page
243b4efc9e
chore: Update winnow
2023-03-18 04:11:55 -05:00
Ed Page
e15de8b72e
chore: Release
2023-03-18 02:09:49 -05:00
Ed Page
ac46a6ba54
feat(config): Custom ignores
...
Typos primarily works off of identifiers and words. We have built-in
support to detect constructs that span identifiers that should not be
spell checked, like UUIDs, emails, domains, etc. This opens it up for
user-defined identifier-spanning constructs using regexes via
`extend-ignore-re`.
This works differently than any of the previous ways of ignoring things
because the regexes require extra parse passes. Under the assumption
that (1) actual typos are rare and (2) the number of files relying on
`extend-ignore-re` is small, we only do these extra parse passes when a
typo is found, causing almost no performance hit in the expected case.
While this could be used for more generic types of ignores, it isn't the
most maintainable because it is separate from the source files in
question. Ideally, we'd implement document settings / directives for
these cases (#316).
2023-03-18 01:25:39 -05:00
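The "only do the extra parse passes when a typo is found" idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `check_with_lazy_ignores` and both closures are hypothetical names, and the regex matching behind `extend-ignore-re` is abstracted into a closure so the sketch needs only the standard library.

```rust
use std::ops::Range;

/// Return the byte ranges of typos in `text`, dropping any that fall inside
/// an ignored span. The ignore pass runs only if at least one typo was found.
///
/// `typo_spans` and `ignore_spans` are hypothetical stand-ins for the spell
/// check and for the user-supplied `extend-ignore-re` regexes.
pub fn check_with_lazy_ignores<T, I>(
    text: &str,
    typo_spans: T,
    ignore_spans: I,
) -> Vec<Range<usize>>
where
    T: Fn(&str) -> Vec<Range<usize>>,
    I: Fn(&str) -> Vec<Range<usize>>,
{
    let typos = typo_spans(text);
    if typos.is_empty() {
        // Expected case: no typos, so the extra pass is never paid for.
        return typos;
    }
    // Rare case: a typo was found, so compute the ignored spans now.
    let ignored = ignore_spans(text);
    typos
        .into_iter()
        .filter(|t| !ignored.iter().any(|i| i.start <= t.start && t.end <= i.end))
        .collect()
}
```

With this shape, a file without typos never triggers the ignore closure at all, which is where the "almost no performance hit" claim would come from.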
Ed Page
f4293b58c5
chore: Release
2023-02-28 06:30:27 -06:00
Ed Page
d752626069
chore: Update dependencies
2023-02-27 23:34:02 -06:00
Ed Page
ed8683ab81
chore: Release
2023-02-22 11:26:17 -06:00
Ed Page
d99eb1601b
refactor: Resolve deprecations
2023-02-21 11:11:24 -06:00
Ed Page
15e748d0e5
refactor: Switch to winnow
2023-02-21 10:41:45 -06:00
Ed Page
adce192ca3
chore: Update dependencies
2023-02-01 09:31:38 -06:00
Ed Page
12c6491895
chore: Release
2023-01-16 08:43:06 -06:00
renovate[bot]
4f6f07b904
chore(deps): update compatible
2023-01-01 02:13:39 +00:00
Ed Page
39b28c3010
chore: Release
2022-11-03 22:28:10 -05:00
Ed Page
87a02e2a2a
chore: Switch to workspace inheritance
2022-11-01 14:20:38 -05:00
Ed Page
1cd8a74031
chore: Upgrade dependencies
2022-11-01 14:14:35 -05:00
Ed Page
16ca0accbb
chore: Release
2022-10-06 08:26:11 -05:00
Ed Page
f78135acd2
chore: Bump MSRV to 1.64.0
2022-10-04 10:51:03 -05:00
Ed Page
32485c4bad
chore: Upgrade dependencies
2022-10-03 11:36:25 -05:00
Ed Page
4c2445fb57
chore: Release
2022-08-25 16:24:58 -05:00
Ed Page
cb91b89080
fix(parse): Ignore CSS hex values that start with digits
...
Fixes #542
2022-08-25 16:05:57 -05:00
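As a rough illustration of what recognizing a CSS hex value involves, here is a small check that accepts values whether they start with a digit or a letter. The function name is hypothetical and the parser in this repository is structured differently; the only assumption baked in is that CSS hex colors have 3, 4, 6, or 8 hex digits after `#`.

```rust
/// Return true if `token` looks like a CSS hex color such as `#1f1f1f` or `#fff`.
/// Hypothetical helper for illustration only.
pub fn is_css_hex_color(token: &str) -> bool {
    match token.strip_prefix('#') {
        Some(digits) => {
            // CSS allows #rgb, #rgba, #rrggbb and #rrggbbaa.
            matches!(digits.len(), 3 | 4 | 6 | 8)
                && digits.bytes().all(|b| b.is_ascii_hexdigit())
        }
        None => false,
    }
}
```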
Ed Page
c7e576614e
chore: Release
2022-08-16 07:56:56 -05:00
Ed Page
2fce5f7f09
fix: Remove unused log dependency
2022-08-16 07:56:31 -05:00
Ed Page
4d9c507595
chore: Release
2022-08-13 12:03:26 -05:00
Ed Page
80f1ed0290
chore: Bump MSRV to 1.60
2022-08-03 09:32:45 -05:00
Ed Page
ea38677643
chore: Update dependencies
2022-08-03 09:29:38 -05:00
dependabot[bot]
4ffa72dac1
chore(deps): Bump once_cell from 1.12.0 to 1.13.0
...
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.12.0...v1.13.0)
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-08-01 07:02:51 +00:00
Ed Page
aff7161142
chore: Release
2022-06-15 16:11:53 -05:00
Ed Page
7c953a71ec
chore: Upgrade to 2021 edition
2022-06-01 06:53:10 -05:00
Ed Page
b15558f0f3
chore: Set rust-version
2022-06-01 06:51:59 -05:00
Ed Page
778fd7a53d
chore: Release
2022-05-10 14:24:11 -05:00
Ed Page
fd5398316f
fix(parser): Better short base64 detection
...
Previously, we bailed out if the string was too short (<90) and there
weren't non-alpha-base64 bytes present. What we ignored were the
padding bytes.
We key off of padding bytes to detect that a string is in fact base64
encoded. Like the other cases, there can be false positives but those
strings should show up elsewhere or the compiler will fail.
This was called out in #485
2022-05-10 14:02:59 -05:00
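A simplified sketch of keying off padding to accept short base64 strings is shown below. It is an assumption-laden illustration rather than the parser's actual logic: the function name is hypothetical, only the length and padding signals are modeled, and the non-alphabetic-byte check mentioned above is left out.

```rust
/// Length above which a base64-looking string is accepted without further
/// evidence (the 90-byte threshold mentioned in the commit message).
const MIN_UNAMBIGUOUS_LEN: usize = 90;

/// Return true if `s` is plausibly base64: either long enough, or short but
/// carrying well-formed `=` padding. Hypothetical helper for illustration.
pub fn looks_like_base64(s: &str) -> bool {
    let body = s.trim_end_matches('=');
    let padding = s.len() - body.len();
    let valid_alphabet = !body.is_empty()
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/');
    if !valid_alphabet || padding > 2 {
        return false;
    }
    // Padded base64 always has a total length that is a multiple of 4.
    let has_padding = padding > 0 && s.len() % 4 == 0;
    s.len() >= MIN_UNAMBIGUOUS_LEN || has_padding
}
```

As the commit message notes, a check like this can still produce false positives.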
Ed Page
bd5048def5
fix(parser): Allow backslashes after ignore items
...
To allow `\\` to start a token, we couldn't let it end a token. By
switching the terminator to a peek, we can now make it end a token
**and** start a token, allowing us to work better with Windows paths.
Fixes #481
2022-05-10 14:02:54 -05:00
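The difference between consuming a terminator and peeking at it can be sketched like this. The hex-run "item" and the function name are hypothetical stand-ins chosen for brevity; the real parser handles several kinds of ignored items.

```rust
/// Split a run of ASCII hex digits off the front of `input`, requiring a
/// terminator after it without consuming that terminator.
///
/// Returns `(item, rest)`; `rest` still begins with the terminator, so a
/// following token may start with it (for example a backslash).
pub fn take_hex_item(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_hexdigit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let (item, rest) = input.split_at(end);
    // Peek at the next character instead of consuming it.
    match rest.chars().next() {
        // End of input also terminates the item.
        None => Some((item, rest)),
        Some(c) if !c.is_alphanumeric() && c != '_' => Some((item, rest)),
        // Followed by more identifier characters: not a standalone item.
        Some(_) => None,
    }
}
```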
Ed Page
1720e7d65e
fix(parser): Ignore items at end of input
2022-05-10 13:38:03 -05:00
Ed Page
7e15afe81f
test(parser): Add reproduction of #481
2022-05-10 12:58:19 -05:00
Ed Page
4869764f7b
test(parser): Remove unclear test case
...
Unsure why this case is here and it causes difficulties
2022-05-10 12:58:13 -05:00
Ed Page
ad89736832
refactor(parser): Clarify precedence levels
2022-05-10 12:58:08 -05:00
Ed Page
dcc3c0b11e
chore: Release
2022-04-25 11:49:02 -05:00
SeongChan Lee
4e4f136ec6
Fix tokenizer for uppercase UUID
...
Microsoft toolchains usually emit UUID/GUID in UPPERCASE
2022-04-25 11:12:25 +09:00
Ed Page
7d3e9bb070
chore: Release
2022-04-18 09:39:53 -05:00
Ed Page
e63659c208
fix: Ignore CSS colors
...
Fixes #462
2022-04-18 09:19:44 -05:00
Ed Page
9c273c6cfb
Merge pull request #451 from crate-ci/dependabot/cargo/nom-7.1.1
...
chore(deps): Bump nom from 7.1.0 to 7.1.1
2022-04-01 09:34:31 -05:00
dependabot[bot]
0281c7023e
chore(deps): Bump nom from 7.1.0 to 7.1.1
...
Bumps [nom](https://github.com/Geal/nom) from 7.1.0 to 7.1.1.
- [Release notes](https://github.com/Geal/nom/releases)
- [Changelog](https://github.com/Geal/nom/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Geal/nom/compare/7.1.0...7.1.1)
---
updated-dependencies:
- dependency-name: nom
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-04-01 07:02:37 +00:00
dependabot[bot]
40080cb01e
chore(deps): Bump once_cell from 1.9.0 to 1.10.0
...
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.9.0...v1.10.0)
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-04-01 07:02:26 +00:00
Ed Page
1d16086495
chore: Release
2022-03-09 08:59:49 -06:00
dependabot[bot]
a58b735e5e
chore(deps): Bump unicode-segmentation from 1.8.0 to 1.9.0
...
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation) from 1.8.0 to 1.9.0.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases)
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits)
---
updated-dependencies:
- dependency-name: unicode-segmentation
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-03-01 07:03:33 +00:00
Ed Page
b686760935
chore: Release
2022-02-14 09:05:09 -06:00
Ed Page
c3bb4adfa1
fix(parser): Allow commas in urls
...
Got us closer to https://www.ietf.org/rfc/rfc3986.txt
Fixes #433
2022-02-14 08:49:55 -06:00
Ed Page
09203fd592
fix(parser): Recognize URLs with passwords
2022-02-14 08:21:56 -06:00
Ed Page
5b7fe620ec
chore: Release
2022-01-26 14:32:31 -06:00
Ed Page
a39074fc7f
fix(parser): Detect shorter base64 values
...
This is part of the way to #413. In that case, they aren't providing
padding though.
2022-01-26 14:18:01 -06:00
Ed Page
2c5f2ecedd
chore: Release
2022-01-26 10:01:15 -06:00
Ed Page
3c78d65462
fix(parser): Don't stop on almost-printfs
...
When we added support for printf interpolation, we had to adjust our
separator matching to not eat the start of printf interpolation.
When doing so, I overlooked the need to still eat it in the catch-all.
If we don't, we then try to read `%` as part of the identifier and bail
out early.
Fixes #411
2022-01-26 09:39:23 -06:00
Ed Page
4b2e66487c
chore: Release
2022-01-24 20:35:08 -06:00
Ed Page
0c49c3ea2b
fix(parser): Allow markdown formatting around ordinals
...
Fixes #409
2022-01-24 20:01:06 -06:00
Ed Page
71b53cb23e
chore: Release
2021-12-18 17:52:11 -06:00
Ed Page
5c83dec07b
style: Remove unused variable
2021-12-14 15:41:52 -06:00
Ed Page
469a9aedc2
chore: Release
2021-12-14 12:58:03 -06:00
Ed Page
c0e8a2c932
chore: Release
2021-11-16 07:46:33 -06:00
Neubauer, Sebastian
3fc6089660
fix: Fix multiple escape sequences
...
If escape sequences follow straight after each other, there is no
delimiter in-between.
In such a case, parsing previously stopped and did not find any
typos further in the file.
2021-11-15 11:31:53 +01:00
Ed Page
4f17586d08
chore: Update MSRV
2021-11-08 11:56:01 -06:00
Ed Page
a8ae8a5c26
chore: Update boilerplate
2021-11-08 10:11:02 -06:00
Ed Page
153f570ec9
chore: Release
2021-11-03 11:48:12 -05:00
Ed Page
efae838e5c
perf: Remove some function overhead
...
Unfortunately, almost all of this is for corrections.
2021-09-14 21:09:30 -05:00
Ed Page
3cd24f5cca
chore: Release
2021-09-14 10:03:34 -05:00
Ed Page
e20879dae1
fix: Reduce false positives from ordinals
...
Just ignoring them since our focus is on programmer typos and these
can't be identifiers. This is simpler and is less work at runtime.
Fixes #331
2021-09-14 08:53:31 -05:00
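A minimal sketch of recognizing an ordinal so it can be skipped, assuming English suffixes only; the function name is hypothetical and this is not the parser's own rule.

```rust
/// Return true for tokens like `1st`, `22nd`, `3rd` or `4th`.
/// Such tokens start with a digit, so they cannot be identifiers.
pub fn is_ordinal(token: &str) -> bool {
    let digits_end = token
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(token.len());
    let (digits, suffix) = token.split_at(digits_end);
    !digits.is_empty()
        && matches!(
            suffix.to_ascii_lowercase().as_str(),
            "st" | "nd" | "rd" | "th"
        )
}
```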
Ed Page
92e46848a3
chore: Update dependencies
2021-09-01 06:38:52 -05:00
Ed Page
dbea7ab1e0
chore: Release
2021-08-30 09:16:40 -05:00
Nick Mathewson
739d1a2f7c
Ignore hexadecimal "hashes" of length 32 or greater.
...
By experimentation (see ticket), it seems that same-case hexadecimal
strings of 32 characters or longer are almost never intended to hold
text. By treating such strings as ignored, we can resist a larger
category of false positives.
Closes #326.
2021-08-20 12:34:59 -04:00
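The rule described above (same-case hexadecimal, 32 characters or longer) can be sketched as a standalone check. The function name is hypothetical and the actual tokenizer expresses this differently.

```rust
/// Return true if `token` is a same-case hexadecimal string of at least
/// 32 characters, which is unlikely to be meant as text.
pub fn is_hex_hash(token: &str) -> bool {
    if token.len() < 32 {
        return false;
    }
    let all_lower = token
        .bytes()
        .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    let all_upper = token
        .bytes()
        .all(|b| matches!(b, b'0'..=b'9' | b'A'..=b'F'));
    // Mixed-case strings are rejected: they are more likely to be words.
    all_lower || all_upper
}
```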
Ed Page
613a0cba4b
chore: Iterate on release process
2021-08-16 11:23:25 -05:00
Ed Page
2dce866937
chore: Release
2021-08-02 09:55:25 -05:00
Ed Page
a5f0dd8ee9
fix(token): Continue parsing on c-escape
2021-08-02 09:29:10 -05:00
Ed Page
fdeba0e71b
fix(token): Continue parsing on c-escape
2021-08-02 09:11:54 -05:00
Ed Page
2304fc6735
chore: Release
2021-07-30 12:12:07 -05:00
Ed Page
2202b7f661
fix(parser): Handle c-escape/printf
...
Since our goal is 100% confidence in the results, it's better to not
check words than to correct the wrong words.
With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).
Fixes #3
2021-07-30 11:30:05 -05:00
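One way to picture "ignore words after what might be c-escape sequences or printf substitutions" is a word splitter that drops any word immediately preceded by `\` or `%`. This is a simplified sketch with a hypothetical function name; it does not reproduce the parser's actual grammar.

```rust
/// Collect the words in `text` that are safe to spell check, skipping any
/// word that directly follows a `\` or `%` (possible escape or printf spec).
pub fn checkable_words(text: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut prev: Option<char> = None;
    let mut start: Option<usize> = None;
    let mut skip = false;
    for (i, c) in text.char_indices() {
        let is_word = c.is_alphanumeric() || c == '_';
        match (start, is_word) {
            (None, true) => {
                // A word begins here; decide up front whether to skip it.
                start = Some(i);
                skip = matches!(prev, Some('\\') | Some('%'));
            }
            (Some(s), false) => {
                if !skip {
                    words.push(&text[s..i]);
                }
                start = None;
            }
            _ => {}
        }
        prev = Some(c);
    }
    // Flush a word that runs to the end of the input.
    if let Some(s) = start {
        if !skip {
            words.push(&text[s..]);
        }
    }
    words
}
```

The trade-off matches the commit message: `\nfoo` loses the check on `foo`, but `nfoo` is never reported as a misspelling.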
Ed Page
5b29113ec8
refactor(typos): Remove unused calculations
...
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
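For readers unfamiliar with the distinction, the following toy example shows why a `filter_map` whose closure always returns `Some` can be replaced by `map`. The functions are hypothetical and unrelated to the code touched by this commit.

```rust
/// Uses `filter_map` even though nothing is ever filtered out.
pub fn lengths_filter_map(words: &[&str]) -> Vec<usize> {
    words.iter().filter_map(|w| Some(w.len())).collect()
}

/// Equivalent version: with no filtering left, `map` says the same thing
/// without wrapping and unwrapping an `Option` per item.
pub fn lengths_map(words: &[&str]) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}
```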
Ed Page
9149c4765d
chore: Release
2021-06-29 15:05:18 -05:00
Ed Page
c83f655109
feat(parser): Ignore URLs
...
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146
fix(parser): Ensure we get full base64
...
We greedily matched separators, including ones that might be part of
base64. This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b
feat(parser): Ignore emails
...
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).
This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6
feat(parser): Ignore base64
...
For now, we hardcoded a min length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyways).
Fixes #287
2021-06-29 13:25:10 -05:00
Ed Page
23b6ad5796
feat(parser): Ignore SHA-1+
...
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b
fix(parser): Go ahead and do lower UUIDs
...
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1
feat(parser): Ignore UUIDs
...
We might be able to make this bail out earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase. RFC
4122 says that UUIDs must be generated lowercase, while input accepts
any case. The main issues are risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
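The trade-off discussed above (strict lowercase versus any-case hex digits) can be sketched with a small check over the 8-4-4-4-12 UUID layout. Both function names and the `lowercase_only` flag are hypothetical; they only illustrate what a custom lowercase hex-digit predicate would look like.

```rust
/// Custom predicate: ASCII hex digit, lowercase letters only.
fn is_lower_hex_digit(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'a'..=b'f')
}

/// Return true if `token` has the 8-4-4-4-12 UUID shape.
/// With `lowercase_only`, uppercase hex digits are rejected.
pub fn is_uuid(token: &str, lowercase_only: bool) -> bool {
    let bytes = token.as_bytes();
    if bytes.len() != 36 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &b)| match i {
        // Hyphens sit at fixed offsets.
        8 | 13 | 18 | 23 => b == b'-',
        _ if lowercase_only => is_lower_hex_digit(b),
        _ => b.is_ascii_hexdigit(),
    })
}
```

The strict mode would miss uppercase UUIDs, which is the "input" risk the message refers to.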
Ed Page
32f5e6c682
refactor(typos)!: Bake ignores into parser
...
This is prep for other items to be ignored
BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens. Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
ded90f2387
perf(parser): Auto-detect unicode
...
For smaller, ascii-only content, this seems to be taking ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
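Auto-detection of this kind can be sketched as a single up-front `is_ascii` check that selects a byte-oriented path for ASCII-only content. The `Mode` enum and both functions are hypothetical names for illustration, and this sketch makes no claim about reproducing the measured speedup.

```rust
/// Which code path to use for a given buffer.
#[derive(Debug, PartialEq)]
pub enum Mode {
    Ascii,
    Unicode,
}

/// Pick the cheaper ASCII path whenever the content allows it.
pub fn detect_mode(content: &[u8]) -> Mode {
    if content.is_ascii() {
        Mode::Ascii
    } else {
        Mode::Unicode
    }
}

/// Count identifier-like words, dispatching on the detected mode.
pub fn count_words(content: &str) -> usize {
    match detect_mode(content.as_bytes()) {
        // Byte-wise splitting: no UTF-8 decoding is needed.
        Mode::Ascii => content
            .as_bytes()
            .split(|b| !(b.is_ascii_alphanumeric() || *b == b'_'))
            .filter(|w| !w.is_empty())
            .count(),
        // Char-wise splitting with Unicode-aware classification.
        Mode::Unicode => content
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|w| !w.is_empty())
            .count(),
    }
}
```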
Ed Page
95417f3a41
refactor(parser): Consolidate utf8/ascii logic
2021-06-29 05:10:02 -05:00
Ed Page
3e66a99674
chore: Release
2021-05-21 20:41:02 -05:00
Ed Page
7c803681c4
chore: Release
2021-05-13 09:58:09 -05:00
Ed Page
f40ed5a328
style: Address clippy
2021-04-30 11:37:16 -05:00
Ed Page
517da7ecd2
perf(parser): Allow people to bypass unicode cost
2021-04-29 21:07:59 -05:00
Ed Page
09d2124d0f
perf(parser): Limit inner-loop asserts
2021-04-29 18:31:05 -05:00
Ed Page
287c4cbfe9
refactor(parser): Give more impl flexibility
2021-04-29 18:31:05 -05:00
Ed Page
9cbc7410a4
fix(parser)!: Defer to Unicode XID for identifiers
...
This saves us from having to have configuration for every detail. If
people need more control, we can offer it later.
Fixes #225
2021-04-29 18:30:57 -05:00
Ed Page
f15cc58f71
fix(parser): Flip leading digits to work correctly
2021-04-29 18:30:14 -05:00
Ed Page
4b94352b7a
perf(parser): Try hand-rolled number parsing
2021-04-29 18:30:14 -05:00
Ed Page
6b92e345cc
perf(parser): Speed up UTF-8 validation
2021-04-27 21:17:46 -05:00