Commit graph

277 commits

Author SHA1 Message Date
Ed Page
2a7bd5b046 feat(config): Add new file types
Fixes #220
2021-04-06 21:18:18 -05:00
Ed Page
c71c9f4f84 refactor(config): Allow extending type matcher 2021-04-06 21:14:47 -05:00
Ed Page
6729bf9f7c refactor(config): Open door for other mutable operations 2021-04-06 21:14:09 -05:00
Ed Page
38a3007c56 fix(config): Properly layer type and override settings 2021-04-06 20:53:34 -05:00
Ed Page
aa21439502 style: Clippy 2021-04-06 10:30:02 -05:00
Ed Page
8f365ee155 feat(config): Show available type definitions 2021-04-05 21:15:41 -05:00
Ed Page
a101df95c2 feat(config): Per-file type settings
Fixes #14
2021-04-05 21:03:49 -05:00
Ed Page
3fd90b09f8 fix(cli): Allow CLI to override walking config 2021-04-05 07:34:05 -05:00
Ed Page
78330ba9c1 refactor(cli): Drop the traits from layering 2021-03-31 21:23:30 -05:00
Ed Page
13617fa9d0 refactor(cli): Decouple walk and engine policies 2021-03-31 20:19:52 -05:00
Ed Page
47eb554052 refactor(cli): Clarify role of file config 2021-03-31 20:06:33 -05:00
Ed Page
d51725b2a4 style: Address clippy 2021-03-30 21:33:39 -05:00
Ed Page
8365351dba perf(cli): Reuse configs across runs 2021-03-29 20:27:12 -05:00
Ed Page
a76ddd42ce refactor(cli): Pull out policy creation 2021-03-29 20:27:12 -05:00
Ed Page
f402d3ee77 refactor(config): Clarify config is not file-specific
This is prep for the config being reused in other contexts, like commit
messages.
2021-03-29 20:27:12 -05:00
Ed Page
4bbc59facf refactor(config)!: Detect when no dict config
In preparing for smarter handling of config, we need to be able to tell
what is present and what isn't.

BREAKING CHANGE: `--hex` was removed, the value didn't seem high enough.
2021-03-29 20:27:12 -05:00
Ed Page
8bcacf3ca6 refactor(cli): Break out config->policy
This is prep for having many policies floating around
2021-03-29 20:27:12 -05:00
Ed Page
b17f9c3a12 feat: Const some fns 2021-03-29 20:27:06 -05:00
Ed Page
75ba4ac535 perf(config): Get small-string optimization 2021-03-01 12:25:51 -06:00
Ed Page
b5827004a2 refactor(config): Simplify 2021-03-01 12:19:56 -06:00
Ed Page
0ea6de6019 refactor(cli): Clarify role of checks 2021-03-01 11:50:23 -06:00
Ed Page
b5f606f201 refactor(typos): Simplify the top-level API 2021-03-01 11:50:23 -06:00
Ed Page
e1e4ce8b61 refactor: Clarify roles 2021-03-01 11:50:23 -06:00
Ed Page
ddeee94cf8 refactor(checks): Make all state dynamic 2021-03-01 11:50:23 -06:00
Ed Page
1c3acd747a fix(config)!: Move binary to file
Seems like it would make sense to allow varying this by directory (when
supporting layering) and by file type.
2021-03-01 11:50:12 -06:00
Ed Page
dbac2eff4a feat(config): Use '-' to dump config to stdout 2021-01-04 19:29:43 -06:00
Ed Page
13a93ee8d1 fix(config): Provide all field defaults 2021-01-04 19:16:02 -06:00
Ed Page
ecb32a674a fix(config): Merge custom config over repo config
Custom config, like args, is more mutable, so it should be respected
more.
2021-01-04 19:16:02 -06:00
Ed Page
5db9a8e1c9 docs(config): Make config more discoverable 2021-01-04 19:16:02 -06:00
Ed Page
f27282fbc0 docs(args): Clarify what args are exclusive 2021-01-04 19:15:45 -06:00
Ed Page
70163fae61 docs(args): Clarify command line arguments 2021-01-04 14:54:50 -06:00
Ed Page
1c4d2ac32b feat: Support '-' for stdin
This helps with tool integration.

Fixes #195
2021-01-02 22:17:08 -06:00
Ed Page
998fad4390 feat: Check and replace UTF-16 files
We don't have good detection for non-UTF encodings and don't have
encoding support for UTF-32, so limiting it to just UTF-16.

Fixes #17
2021-01-02 19:25:00 -06:00
Ed Page
67222e9338 style: Address clippy 2021-01-02 13:49:28 -06:00
Ed Page
e6a4f49eb5 refactor: Clarify names 2021-01-02 13:10:40 -06:00
Ed Page
692f0ac095 refactor(typos): Focus API on primary use case 2021-01-02 13:10:40 -06:00
Ed Page
5f82dd6017 fix: Arg diff reports immediately 2021-01-02 13:10:36 -06:00
Ed Page
c900e48593 fix: Arg write-changes reports immediately 2021-01-02 13:10:30 -06:00
Ed Page
48112a47e9 refactor(parser): Abstract over lifetimes 2021-01-02 13:10:30 -06:00
Ed Page
663eb94d32 refactor: Switch Typos to check_file 2021-01-02 13:10:30 -06:00
Ed Page
6e53d7e719 refactor: Switch Words/Identifiers to check_file 2021-01-02 13:10:30 -06:00
Ed Page
d28174439b refactor: Switch FoundFiles to check_file 2021-01-02 13:10:30 -06:00
Ed Page
6c28376e50 refactor: Give checks full control 2021-01-02 13:10:30 -06:00
Ed Page
220a79ff30 refactor: Make room for parent function 2021-01-02 13:10:30 -06:00
Ed Page
bc90bacff2 refactor(typos): Pull out file logic 2021-01-02 13:10:30 -06:00
Ed Page
1e64080c05 refactor(typos): Open up the name Parser 2021-01-02 12:58:33 -06:00
Ed Page
e9b3378913 fix: Be friendlier with panics 2020-11-23 12:40:55 -06:00
Ed Page
b03df3aeae fix: Return more precise errors 2020-11-23 10:08:38 -06:00
Ed Page
869b916ca6 fix: Handle broken pipe 2020-11-21 21:57:12 -06:00
Ed Page
4ddbdcf5dd fix(cli): Define an error code policy
The main goal is to make spelling errors differentiated from other
errors.

Fixes #170
2020-11-14 21:17:29 -06:00
Ed Page
ce16d38cfd perf(dict): Skip checking numbers 2020-11-11 18:52:23 -06:00
Ed Page
d258e62f43 feat(report): Diff output mode 2020-11-11 18:52:23 -06:00
Ed Page
7a1fac7fab refactor(report): Use native types 2020-11-11 18:44:27 -06:00
Ed Page
482d320407 fix(dict): Ensure we fall through to built-in dict 2020-11-11 12:22:29 -06:00
Ed Page
6bdbd821e3 perf(dict): Avoid hashing unknown words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change
```
real    0m24.432s
user    0m32.492s
sys     0m4.190s
```
2020-11-10 20:57:04 -06:00
Ed Page
beaa0f4091 perf(dict): Avoid hashing unknown words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change:
```
real    0m24.060s
user    0m31.559s
sys     0m4.258s
```
2020-11-10 20:57:00 -06:00
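The length check described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `LenBoundedDict` type, its fields, and the `correct` method are all hypothetical names, and the real dictionary may track lengths differently.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary wrapper that remembers the shortest and
/// longest key so lookups outside that range can skip hashing.
struct LenBoundedDict {
    words: HashMap<&'static str, &'static str>,
    min_len: usize,
    max_len: usize,
}

impl LenBoundedDict {
    fn new(entries: &[(&'static str, &'static str)]) -> Self {
        let words: HashMap<_, _> = entries.iter().copied().collect();
        let min_len = words.keys().map(|w| w.len()).min().unwrap_or(0);
        let max_len = words.keys().map(|w| w.len()).max().unwrap_or(0);
        Self { words, min_len, max_len }
    }

    fn correct(&self, word: &str) -> Option<&'static str> {
        // `str::len` is a constant-time byte count, so this check is
        // much cheaper than hashing the whole word.
        if word.len() < self.min_len || word.len() > self.max_len {
            return None;
        }
        self.words.get(word).copied()
    }
}
```

Words whose length falls inside the range still pay for a hash, so the gain depends on how many tokens in the input are shorter or longer than every dictionary entry.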
Ed Page
18e31fa578 perf: Avoid hashing without custom dict
`HashMap::get` (at least hashbrown) hashes before getting and doesn't
check if dict is empty.  For the custom dict, a common use case will
have the dict be empty.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

Bypassing `HashMap::get`
```
real    0m16.415s
user    0m14.519s
sys     0m4.118s
```

On a moderately sized repo.
2020-11-10 20:56:54 -06:00
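A minimal sketch of the bypass described in the commit above, assuming the custom dictionary is a plain `HashMap<String, String>`. The `lookup` function name and signature are hypothetical and only illustrate the early return.

```rust
use std::collections::HashMap;

/// Hypothetical lookup helper; `custom` stands in for the user-provided
/// dictionary, which is often empty.
fn lookup<'d>(custom: &'d HashMap<String, String>, word: &str) -> Option<&'d str> {
    // Per the commit message, `HashMap::get` hashes the key before
    // probing, so skip the call entirely when there is nothing to find.
    if custom.is_empty() {
        return None;
    }
    custom.get(word).map(String::as_str)
}
```

The `is_empty` check is a single length comparison, so it costs little in the case where a custom dictionary is actually configured.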
Ed Page
150c5bfdc1 perf: Hash faster for custom dicts
If we have to hash for the custom dict, we might as well be fast about
it.  We do not need a cryptographically secure algorithm since the
content is fixed for the user.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With ahash:
```
real    0m23.993s
user    0m30.800s
sys     0m4.440s
```
2020-11-10 20:56:49 -06:00
Ed Page
b7700fa214 refactor: Don't special case --files 2020-11-10 06:30:27 -06:00
Ed Page
e12cd8ed55 refactor: Layer files/filenames on buffer processing 2020-11-10 06:30:27 -06:00
Ed Page
3bcd8a130e refactor(report): Merge the typos types 2020-11-10 06:30:23 -06:00
Ed Page
2ef1d02164 Revert "feat(ignore): Typos-specific ignores"
This reverts commit 0052617fcd.

The fix for #134 was backwards.  It turns out `overrides` is for
including rather than ignoring.  Will need to look at this further.
2020-11-03 19:55:45 -06:00
Ed Page
f0c24b0afa feat(config): Allow separating config from source 2020-10-30 08:33:43 -05:00
Ed Page
736db10708 fix(format): Clarify message types 2020-10-28 21:01:33 -05:00
Ed Page
2e6cd39781 fix(config): Respect file's defaults 2020-10-28 20:58:48 -05:00
Ed Page
78d76bcbc6 fix: Be friendlier about error messages 2020-10-28 20:47:16 -05:00
Ed Page
527b9837b4 feat: Custom dictionary support
Switching `valid-*` to just `*` where you map typo to correction, with
support for always-valid and never-valid.

Fixes #9
2020-10-27 21:15:25 -05:00
Ed Page
043692afe0 feat(dict): Override builtin dictionary
Sometimes you just have to live with a typo or it's done intentionally
(like weird company names).  With this commit, a user can now identify
blessed identifiers and words.

This is mostly what is needed for #9 but sometimes people will have
common typos that they'll want to provide corrections for.
2020-09-02 20:24:54 -05:00
Ed Page
0052617fcd feat(ignore): Typos-specific ignores
This is to help with cases like a monorepo with vendored dependencies.
A user might want to search (`.ignore`) them but not hold the code to
the same standards as first-party code.

Fixes #134
2020-08-25 21:09:42 -05:00
Ed Page
ab4a5bbdaf feat: Support English dialects
The goal is to be as accepting and unobtrusive to new code bases as
possible.  To this end, we correct typos into the closest English
dialect.

If someone wants to opt-in, they can have typos correct to a specific
English dialect.

Fixes #52
Fixes #22
2020-08-20 19:37:37 -05:00
Ed Page
5d7e91d214 fix(ci): Report more failures 2020-07-04 20:52:48 -05:00
Ed Page
bc1302f01b feat: Support multiple, valid corrections
Some of the other spell checkers already do this. While I've not checked
where we might need it for our dictionary, this will be important for
dialects.
2020-07-04 20:52:48 -05:00
Ed Page
a5ed18ee46 fix(replace): Don't error on successful replacement 2020-07-04 20:52:47 -05:00
Ed Page
d1be9c1944 feat: Replacement support
Now can fix typos!

Fixes #4
2020-07-04 20:52:46 -05:00
Ed Page
94ee49b068 refactor: Re-order main 2020-07-04 20:52:46 -05:00
Ed Page
5cfe913d03 refactor: Split out checks 2020-07-04 20:52:46 -05:00
Ed Page
79d9a4d801 refactor: Split out args 2020-07-04 20:52:46 -05:00
Ed Page
b7d412c20e refactor: Calculate threading where it is needed 2020-07-04 20:52:46 -05:00
Ed Page
2e1b95fec1 refactor: Collapse cases 2020-07-04 20:52:46 -05:00
Ed Page
8732d24f53 refactor: Use a single reporter instance 2020-07-04 20:52:46 -05:00
Ed Page
575971a5c5 refactor: Turn reports into a trait 2020-07-04 20:52:46 -05:00
Ed Page
8af7c47fe5 refactor: Simplify init 2020-03-21 14:33:51 -05:00
Ed Page
6b8047ee44 perf: Multi-threaded spell checking
Fixes #7
2020-03-21 14:22:53 -05:00
Ed Page
333762f55c refactor: Prepare for threads 2020-03-21 13:28:38 -05:00
Ed Page
b21db206d2 chore: Update env_logger 2019-12-02 09:50:06 -07:00
Ed Page
b74258a43c refactor: Consolidate paths 2019-11-15 07:48:07 -07:00
Ed Page
59baa36327 refactor!: Delay populating of Checks 2019-11-14 20:20:29 -07:00
Ed Page
107308a655 perf: Use standard identifier rules to avoid doing number checks 2019-11-02 19:40:06 -06:00
Ed Page
ed00f3ceae docs: Fix typo 2019-11-02 08:57:07 -06:00
Ed Page
cc4b53a1b4
Merge pull request #64 from epage/debug
feat: Dump files, identifiers, and words
2019-10-31 11:40:43 -06:00
Ed Page
ce365ae12e feat: Dump files, identifiers, and words
This will help people debug their configurations.

Fixes #41
2019-10-31 10:44:23 -06:00
Ed Page
a48a457cc3 fix: Improve the organization of --help 2019-10-30 11:02:02 -06:00
Ed Page
975dab8514 chore(benches): Fix compilation errors 2019-10-30 07:20:52 -06:00
Ed Page
06db6fc693 refactor!: Move off of failure 2019-10-29 11:36:50 -06:00
Ed Page
ce1ef2ca30 refactor!: Move dict implementation into CLI 2019-10-28 11:00:47 -06:00
Ed Page
0a2f865d0f refactor: Change error strategy for future thread use 2019-10-26 20:31:10 -06:00
Ed Page
5e6e4b9ad7 chore: Upgrade structopt 2019-10-17 20:49:26 -06:00
Ed Page
1bdd1c928a refactor: Split out typos-dict 2019-08-08 10:24:50 -05:00
Ed Page
164ee9cb84 refactor: Split bin/lib into separate crates 2019-08-08 10:04:51 -05:00
Ed Page
6fc61966cc feat(parser): Give control over identifier detection 2019-08-08 08:58:37 -05:00
Ed Page
709446821b refactor(cli): Remove dead code 2019-08-08 08:58:36 -05:00
Ed Page
8ea31b5e1d refactor(cli): Re-order code to make diffing easier 2019-08-08 08:58:36 -05:00
Ed Page
26787df50d refactor(checks): Implement traits for easier debugging 2019-08-08 08:58:36 -05:00
Ed Page
a2cf3b7cc9 feat(config): Configure checking logic
Later we can add the per-filetype checks

Fixes #37
2019-08-08 08:58:36 -05:00
Ed Page
29ff040fd1 feat(config): Expose binary in config file 2019-08-08 08:58:35 -05:00
Ed Page
77603daab5 refactor(cli): Rename Options struct 2019-08-08 08:58:35 -05:00
Ed Page
a923f93ec5 fix(config): Move file-based config into a table 2019-08-08 08:58:35 -05:00
Ed Page
f9a1600513 refactor( Push out options 2019-08-08 08:58:34 -05:00
Ed Page
87015d3522 feat(config): Find config for each path passed in 2019-08-08 08:58:34 -05:00
Ed Page
ad4c6dcd77 refactor(config): Centralize loading logic 2019-08-08 08:58:34 -05:00
Ed Page
3d4da686ad feat: Accept config on command-line 2019-08-08 08:58:34 -05:00
Ed Page
8d96a2ad1d refactor(cli): Prepare for merging in config file 2019-08-08 08:58:33 -05:00
Ed Page
f15191de14 refactor(report): Leverage derive_more, more 2019-08-07 08:25:55 -05:00
Ed Page
e90a89ef93 refactor(report): Leverage derive_more 2019-08-07 08:20:18 -05:00
Ed Page
a129fb3d65 refactor(report): Switch to serde derive feature 2019-08-07 08:16:22 -05:00
Ed Page
3419a8df85 feat(parse): Make identifier symbols configurable 2019-08-07 07:36:49 -05:00
Ed Page
e093135ac1 feat(parse): Make digits in identifier optional 2019-08-07 07:28:25 -05:00
Ed Page
50c89ef761 fix(parse): Change ignore_hex default 2019-08-07 07:24:54 -05:00
Ed Page
6ae42b4c1e refactor(parse): Explicit Default 2019-08-07 07:24:28 -05:00
Ed Page
750005e971 fix(parse): Don't skip binary files when explicitly requested
Fixes #35
2019-07-31 21:01:58 -06:00
Ed Page
adcbe68621 refactor(dict): Split out a trait 2019-07-27 19:50:36 -06:00
Ed Page
834b9f77f2 refactor(checks): Separate out the logic 2019-07-27 19:50:35 -06:00
Ed Page
3e678cca1e refactor(parser): Share a parser across calls 2019-07-27 19:50:34 -06:00
Ed Page
36fefc166e refactor(parser): Add more traits to builder 2019-07-27 19:50:34 -06:00
Ed Page
039664339d refactor(parser): Switch to by-ref builder
Since nothing is being moved into `Parser`, we don't get any performance
benefit with a moving builder, so switching to a by-ref builder.
2019-07-27 19:50:34 -06:00
Ed Page
3cf9d8672c refactor(parser): Move hex handling to parser 2019-07-27 19:50:33 -06:00
Ed Page
d0b9979c36 refactor(parser): Split out parser creation 2019-07-27 19:50:33 -06:00
Ed Page
8e4708dfdf refactor(parser): Split out into struct 2019-07-27 19:50:33 -06:00
Ed Page
81f20bb293 feat: Set exit code on typos being found
Fixes #45
2019-07-23 10:37:05 -06:00
Ed Page
8b90debfa5 fix: Remove threads flag
Don't give the user a false sense of hope.  It will be brought back in
as part of #7.
2019-07-20 08:05:54 -06:00
Ed Page
469ae14181 feat: Log debug information
Fixes #39
2019-07-19 21:45:51 -06:00
Ed Page
95c0aea484 feat: Give control over verifying file content 2019-07-19 07:28:17 -06:00
Ed Page
ec307dffdd feat: Check file names
Fixes #24
2019-07-19 07:28:17 -06:00
Ed Page
6da830572a refactor(parser): Rename bytes-parser 2019-07-19 07:28:16 -06:00
Ed Page
d247d68c37 fix: Report binary files to user
Fixes #38
2019-07-19 07:28:10 -06:00
Ed Page
da156e3f23 feat: Ignore binary files
Fixes #29
2019-07-13 22:41:31 -06:00
Ed Page
4ce7303fc2 refactor(parser): Switch to bstr for line splitting 2019-07-13 22:41:31 -06:00
Ed Page
92a2560c9a feat(parser): Support C++ hex literal separators 2019-07-13 20:15:23 -06:00
Ed Page
b6ab968478 feat(parser): Treat contractions as a word
This should be safe.  Rarely is `'` used as syntax in a language that
separates literals.

- `'` is used within hex literals in C++ but we want to treat them as
  one word
- `'` is used for lifetimes in Rust but there are other symbols on the
  left side.
2019-07-13 20:15:23 -06:00
Ed Page
006204e66a feat(parser): Ignore hex literals
Trying to avoid accidentally correcting something that looks like a word
inside a hex number, like `0xBEAF`.

Fixes #19
2019-07-13 20:15:22 -06:00
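One way to recognize a token like `0xBEAF` before it reaches the spell checker is sketched below. The `looks_like_hex_literal` function is a hypothetical illustration; the rules the real parser applies (for example, around separators or suffixes) may differ.

```rust
/// Hypothetical check: true when `token` is `0x`/`0X` followed by one or
/// more hexadecimal digits.
fn looks_like_hex_literal(token: &str) -> bool {
    let digits = match token
        .strip_prefix("0x")
        .or_else(|| token.strip_prefix("0X"))
    {
        Some(rest) => rest,
        None => return false,
    };
    // Require at least one digit so a bare "0x" is not treated as a literal.
    !digits.is_empty() && digits.chars().all(|c| c.is_ascii_hexdigit())
}
```

Tokens that pass this check would be skipped rather than split into words, so the `BEAF` part is never offered a correction.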
Ed Page
73054cca9e feat: VCS ignore flag 2019-07-12 21:43:18 -06:00
Ed Page
6bbf8390ff feat: Ignore parents flag 2019-07-12 21:39:38 -06:00
Ed Page
1bd4ca8288 feat: Git global flag 2019-07-12 21:36:32 -06:00
Ed Page
27edfc6e02 feat: Global ignore file flag 2019-07-11 21:56:27 -06:00
Ed Page
e6d29070fc feat: Expose control over .ignore 2019-07-10 20:12:14 -06:00
Ed Page
867c53043b feat: Give control over ignoring hidden files 2019-07-10 20:04:14 -06:00
Ed Page
166e2630c0 fix(parse): Don't assume boundary characters are one byte
This was inspired by heck.  They have an invariant to ensure this isn't
a problem (only accept `_` as boundary) while on the other hand we
accept a lot of things as boundaries.
2019-07-06 21:54:45 -06:00
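The pitfall fixed in the commit above can be illustrated with a small splitter that advances past each boundary by its encoded width instead of by one byte. The `split_words` function and its "any non-alphanumeric character is a boundary" rule are assumptions made for this sketch, not the project's actual parsing rules.

```rust
/// Hypothetical splitter: treats every non-alphanumeric character as a
/// word boundary, including multi-byte ones such as `→`.
fn split_words(text: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0;
    for (idx, c) in text.char_indices() {
        if !c.is_alphanumeric() {
            if start < idx {
                words.push(&text[start..idx]);
            }
            // Skip the boundary by its UTF-8 width; assuming one byte
            // here would slice in the middle of a code point.
            start = idx + c.len_utf8();
        }
    }
    if start < text.len() {
        words.push(&text[start..]);
    }
    words
}
```

With `start = idx + 1`, an input such as `foo→bar` would produce a slice index that is not on a character boundary, and the slicing would panic.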
Ed Page
377c911328 fix: Rename to typos 2019-07-03 19:22:36 -06:00
Ed Page
953064e7d1 fix(dict): Fix should match typo's case
Fixes #10
2019-06-26 07:22:59 -06:00
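A rough sketch of what "fix should match typo's case" could look like, assuming only three cases are distinguished: all caps, leading capital, and everything else. The `match_case` helper is hypothetical and the real implementation may handle more casing styles.

```rust
/// Hypothetical helper: re-case `correction` so it follows the casing of
/// the `typo` it replaces.
fn match_case(typo: &str, correction: &str) -> String {
    let has_letters = typo.chars().any(char::is_alphabetic);
    if has_letters && typo.chars().all(|c| !c.is_alphabetic() || c.is_uppercase()) {
        // Every letter in the typo is upper case: upper-case the fix.
        correction.to_uppercase()
    } else if typo.chars().next().map_or(false, char::is_uppercase) {
        // Only the first letter is upper case: capitalize the fix.
        let mut chars = correction.chars();
        match chars.next() {
            Some(first) => first.to_uppercase().chain(chars).collect(),
            None => String::new(),
        }
    } else {
        // Lower or mixed case: use the dictionary form unchanged.
        correction.to_string()
    }
}
```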
Ed Page
a5b8636bdb refactor(dict): Allow for owned corrections 2019-06-24 21:46:40 -06:00