Ed Page
2202b7f661
fix(parser): Handle c-escape/printf
...
Since our goal is 100% confidence in the results, its better to not
check words than to correct the wrong words.
With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).
Fixes #3
2021-07-30 11:30:05 -05:00
Ed Page
5b29113ec8
refactor(typos): Remove unused calculations
...
In #293 , we moved where we were filtering out results but never
switched from `filter_map` to map`, so this does that.
2021-07-06 11:08:05 -05:00
Ed Page
9149c4765d
chore: Release
2021-06-29 15:05:18 -05:00
Ed Page
c83f655109
feat(parser): Ignore URLs
...
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146
fix(parser): Ensure we get full base64
...
We greedily matched separators, including ones that might be part of
base64. This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b
feat(parser): Ignore emails
...
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).
This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces arround operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6
feat(parser): Ignore base64
...
For now, we hardcoded a min length of 90 bytes to ensure to avoid
ambiguity with math operations on variables (generally people use
whitespace anyways).
Fixes #287
2021-06-29 13:25:10 -05:00
Ed Page
23b6ad5796
feat(parser): Ignore SHA-1+
...
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b
fix(parser): Go ahead and do lower UUIDs
...
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1
feat(parser): Ignore UUIDs
...
We might be able to make this bail our earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase. RFC
4122 says that UUIDs must be generated lowecase, while input accepts
any case. The main issues are risk on the "input" part and the extra
annoyance of writing a custm `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
Ed Page
32f5e6c682
refactor(typos)!: Bake ignores into parser
...
This is prep for other items to be ignored
BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens. Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
ded90f2387
perf(parser): Auto-detect unicode
...
For smaller, ascii-only content, this seems to be taking ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
Ed Page
95417f3a41
refactor(parser): Consolidate utf8/ascii logic
2021-06-29 05:10:02 -05:00
Ed Page
3e66a99674
chore: Release
2021-05-21 20:41:02 -05:00
Ed Page
7c803681c4
chore: Release
2021-05-13 09:58:09 -05:00
Ed Page
f40ed5a328
style: Address clippy
2021-04-30 11:37:16 -05:00
Ed Page
517da7ecd2
perf(parser): Allow people to bypass unicode cost
2021-04-29 21:07:59 -05:00
Ed Page
09d2124d0f
perf(parser): Limit inner-loop assers
2021-04-29 18:31:05 -05:00
Ed Page
287c4cbfe9
refactor(parser): Give more impl flexibility
2021-04-29 18:31:05 -05:00
Ed Page
9cbc7410a4
fix(parser)!: Defer to Unicode XID for identifiers
...
This saves us from having to have configuration for every detail. If
people need more control, we can offer it later.
Fixes #225
2021-04-29 18:30:57 -05:00
Ed Page
f15cc58f71
fix(parser): Flip leading digits to work correctly
2021-04-29 18:30:14 -05:00
Ed Page
4b94352b7a
perf(parser): Try hand-rolled number parsing
2021-04-29 18:30:14 -05:00
Ed Page
6b92e345cc
perf(parser): Speed up UTF-8 validation
2021-04-27 21:17:46 -05:00
Ed Page
819702c82f
refactor(parser): Unify str/bytes code paths
...
The main goal is to support replacing the parser with `nom` where I need
access to `str` only functionality.
With crates like simdutf8, this might also offer up performance gains
since they see the biggest benefit when doing large blocks of
validation.
2021-04-27 21:17:43 -05:00
Ed Page
fce11d6c35
refactor(parser)!: Allow short-circuiting word splitting
...
This is prep for experiments with getting this information ahead of
time.
See #224
2021-04-27 21:17:38 -05:00
Ed Page
9bfb506c6d
fix(typos)!: Clarify Case::Upper
s name
...
`Scream` was referrin to `SCREAMING_CASE` but outside of that context, I
think `Upper` is more accurate.
2021-04-21 20:36:35 -05:00
Ed Page
1f4c587692
chore({{crate_name}}): Release {{version}}
2021-04-14 19:13:25 -05:00
Ed Page
b4459bef33
chore: Fix readme paths in Cargo.toml
2021-04-13 21:36:47 -05:00
Ed Page
d7978658d4
test(cli): Ensure we apply corrections
2021-04-10 19:13:48 -05:00
Ed Page
b5f606f201
refactor(typos): Simplify the top-level API
2021-03-01 11:50:23 -06:00
Ed Page
1010d2ffe5
refactor(tokenizer): Remove stale function
2021-03-01 11:50:23 -06:00
dependabot-preview[bot]
b8d3190ce9
chore(deps): bump itertools from 0.9.0 to 0.10.0
...
Bumps [itertools](https://github.com/bluss/rust-itertools ) from 0.9.0 to 0.10.0.
- [Release notes](https://github.com/bluss/rust-itertools/releases )
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md )
- [Commits](https://github.com/bluss/rust-itertools/compare/v0.9.0...v0.10.0 )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2021-01-03 03:40:45 +00:00
Ed Page
67222e9338
style: Address clippy
2021-01-02 13:49:28 -06:00
Ed Page
692f0ac095
refactor(typos): Focus API on primary use case
2021-01-02 13:10:40 -06:00
Ed Page
aba85df435
docs(typos): Clarify intent
2021-01-02 13:10:40 -06:00
Ed Page
48112a47e9
refactor(parser): Abstract over lifetimes
2021-01-02 13:10:30 -06:00
Ed Page
bc90bacff2
refactor(typos): Pull out file logic
2021-01-02 13:10:30 -06:00
Ed Page
e741f96de3
refactor(typos): Decouple parsing from checks
2021-01-02 13:10:22 -06:00
Ed Page
1e64080c05
refactor(typos): Open up the name Parser
2021-01-02 12:58:33 -06:00
Ed Page
7fdd0dee16
style(typos): Make parser ordering clearer
2021-01-02 12:58:33 -06:00
dependabot-preview[bot]
7fa5a9eadf
chore(deps): bump unicode-segmentation from 1.7.0 to 1.7.1
...
Bumps [unicode-segmentation](https://github.com/unicode-rs/unicode-segmentation ) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/unicode-rs/unicode-segmentation/releases )
- [Commits](https://github.com/unicode-rs/unicode-segmentation/commits )
Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-12-01 08:18:35 +00:00
Ed Page
d96de581f3
fix(report): Rendering issues with errors
...
- We aren't consistent in quoting words
- We used byte offsets rather than column counts
- We mixed styles between disallowed and corrections
Fixes #165
2020-11-24 18:52:24 -06:00
Ed Page
9b0cd5b5f0
fix(report): Show path for errors
2020-11-23 11:20:12 -06:00
Ed Page
869b916ca6
fix: Handle broken pipe
2020-11-21 21:57:12 -06:00
Ed Page
7a1fac7fab
refactor(report): Use native types
2020-11-11 18:44:27 -06:00
Ed Page
b7700fa214
refactor: Don't special case --files
2020-11-10 06:30:27 -06:00
Ed Page
628c011f77
fix(report): Ensure json output is clean
2020-11-10 06:30:27 -06:00
Ed Page
e12cd8ed55
refactor: Layer files/filenames on buffer processing
2020-11-10 06:30:27 -06:00
Ed Page
eb20ba9f11
refactor(report): Make Parse consistent with Typos
2020-11-10 06:30:27 -06:00
Ed Page
97f90da9bc
refactor: Move off of lazy_static
2020-11-10 06:30:27 -06:00