Commit graph

274 commits

Author SHA1 Message Date
Ed Page
482d320407 fix(dict): Ensure we fall through to built-in dict 2020-11-11 12:22:29 -06:00
Ed Page
6bdbd821e3 perf(dict): Avoid hashing unknown words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change
```
real    0m24.432s
user    0m32.492s
sys     0m4.190s
```
2020-11-10 20:57:04 -06:00
Ed Page
beaa0f4091 perf(dict): Avoid hashing unknown words
Bypass hashing when we know (through str::len) that a word won't be in
the dict.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With this change:
```
real    0m24.060s
user    0m31.559s
sys     0m4.258s
```
2020-11-10 20:57:00 -06:00
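The length-based bypass described in the two commits above can be sketched roughly as follows. This is an illustration only: `LenBoundedDict`, `min_len`, `max_len`, and `correct` are hypothetical names, not the project's types, and the sketch assumes the dictionary records the shortest and longest key at construction time.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary wrapper (illustrative, not the project's actual type).
struct LenBoundedDict {
    words: HashMap<&'static str, &'static str>,
    min_len: usize,
    max_len: usize,
}

impl LenBoundedDict {
    fn new(entries: &[(&'static str, &'static str)]) -> Self {
        let words: HashMap<_, _> = entries.iter().copied().collect();
        // With no entries, min_len > max_len, so every lookup is rejected early.
        let min_len = words.keys().map(|w| w.len()).min().unwrap_or(usize::MAX);
        let max_len = words.keys().map(|w| w.len()).max().unwrap_or(0);
        Self { words, min_len, max_len }
    }

    fn correct(&self, word: &str) -> Option<&'static str> {
        // `str::len` is O(1); a word outside the key length range cannot be
        // in the map, so we return before any hashing happens.
        if word.len() < self.min_len || word.len() > self.max_len {
            return None;
        }
        self.words.get(word).copied()
    }
}
```

Words whose byte length falls inside the range still pay for a hash, so the gain depends on how many candidate words fall outside it.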
Ed Page
18e31fa578 perf: Avoid hashing without custom dict
`HashMap::get` (at least hashbrown) hashes before getting and doesn't
check if dict is empty.  For the custom dict, a common use case will
have the dict be empty.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

Bypassing `HashMap::get`
```
real    0m16.415s
user    0m14.519s
sys     0m4.118s
```

On a moderately sized repo.
2020-11-10 20:56:54 -06:00
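A minimal sketch of the empty-dict bypass described in the commit above, assuming the custom dictionary is a plain `HashMap<String, String>`; the function name `lookup_custom` is hypothetical and the project's actual code may differ.

```rust
use std::collections::HashMap;

/// Looks up `word` in a user-supplied dictionary, skipping the map entirely
/// when it is empty (the common case the commit message describes).
fn lookup_custom<'d>(custom: &'d HashMap<String, String>, word: &str) -> Option<&'d str> {
    // `is_empty` is a cheap length check, so no hash is computed here.
    if custom.is_empty() {
        return None;
    }
    custom.get(word).map(String::as_str)
}
```

The check is only worthwhile when the empty case is frequent, which the commit message says is expected for the custom dictionary.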
Ed Page
150c5bfdc1 perf: Hash faster for custom dicts
If we have to hash for the custom dict, we might as well be fast about
it.  We do not need a cryptographically secure algorithm since the
content is fixed for the user.

Master:
```
real    0m26.675s
user    0m33.683s
sys     0m4.535s
```

With ahash:
```
real    0m23.993s
user    0m30.800s
sys     0m4.440s
```
2020-11-10 20:56:49 -06:00
Ed Page
b7700fa214 refactor: Don't special case --files 2020-11-10 06:30:27 -06:00
Ed Page
e12cd8ed55 refactor: Layer files/filenames on buffer processing 2020-11-10 06:30:27 -06:00
Ed Page
3bcd8a130e refactor(report): Merge the typos types 2020-11-10 06:30:23 -06:00
Ed Page
2ef1d02164 Revert "feat(ignore): Typos-specific ignores"
This reverts commit 0052617fcd.

The fix for #134 was backwards.  It turns out `overrides` is for
including rather than ignoring.  Will need to look at this further.
2020-11-03 19:55:45 -06:00
Ed Page
f0c24b0afa feat(config): Allow separating config from source 2020-10-30 08:33:43 -05:00
Ed Page
736db10708 fix(format): Clarify message types 2020-10-28 21:01:33 -05:00
Ed Page
2e6cd39781 fix(config): Respect file's defaults 2020-10-28 20:58:48 -05:00
Ed Page
78d76bcbc6 fix: Be friendlier about error messages 2020-10-28 20:47:16 -05:00
Ed Page
527b9837b4 feat: Custom dictionary support
Switching `valid-*` to just `*` where you map typo to correction, with
support for always-valid and never-valid.

Fixes #9
2020-10-27 21:15:25 -05:00
Ed Page
043692afe0 feat(dict): Override builtin dictionary
Sometimes you just have to live with a typo or it's done intentionally
(like weird company names).  With this commit, a user can now identify
blessed identifiers and words.

This is mostly what is needed for #9 but sometimes people will have
common typos that they'll want to provide corrections for.
2020-09-02 20:24:54 -05:00
Ed Page
0052617fcd feat(ignore): Typos-specific ignores
This is to help with cases like a monorepo with vendored dependencies.
A user might want to search (`.ignore`) them but not hold the code to
the same standards as first-party code.

Fixes #134
2020-08-25 21:09:42 -05:00
Ed Page
ab4a5bbdaf feat: Support English dialects
The goal is to be as accepting and unobtrusive to new code bases as
possible.  To this end, we correct typos into the closest English
dialect.

If someone wants to opt-in, they can have typos correct to a specific
English dialect.

Fixes #52
Fixes #22
2020-08-20 19:37:37 -05:00
Ed Page
5d7e91d214 fix(ci): Report more failures 2020-07-04 20:52:48 -05:00
Ed Page
bc1302f01b feat: Support multiple, valid corrections
Some of the other spell checkers already do this. While I've not checked
where we might need it for our dictionary, this will be important for
dialects.
2020-07-04 20:52:48 -05:00
Ed Page
a5ed18ee46 fix(replace): Don't error on successful replacement 2020-07-04 20:52:47 -05:00
Ed Page
d1be9c1944 feat: Replacement support
Now can fix typos!

Fixes #4
2020-07-04 20:52:46 -05:00
Ed Page
94ee49b068 refactor: Re-order main 2020-07-04 20:52:46 -05:00
Ed Page
5cfe913d03 refactor: Split out checks 2020-07-04 20:52:46 -05:00
Ed Page
79d9a4d801 refactor: Split out args 2020-07-04 20:52:46 -05:00
Ed Page
b7d412c20e refactor: Calculate threading where it is needed 2020-07-04 20:52:46 -05:00
Ed Page
2e1b95fec1 refactor: Collapse cases 2020-07-04 20:52:46 -05:00
Ed Page
8732d24f53 refactor: Use a single reporter instance 2020-07-04 20:52:46 -05:00
Ed Page
575971a5c5 refactor: Turn reports into a trait 2020-07-04 20:52:46 -05:00
Ed Page
8af7c47fe5 refactor: Simplify init 2020-03-21 14:33:51 -05:00
Ed Page
6b8047ee44 perf: Multi-threaded spell checking
Fixes #7
2020-03-21 14:22:53 -05:00
Ed Page
333762f55c refactor: Prepare for threads 2020-03-21 13:28:38 -05:00
Ed Page
b21db206d2 chore: Update env_logger 2019-12-02 09:50:06 -07:00
Ed Page
b74258a43c refactor: Consolidate paths 2019-11-15 07:48:07 -07:00
Ed Page
59baa36327 refactor!: Delay populating of Checks 2019-11-14 20:20:29 -07:00
Ed Page
107308a655 perf: Use standard identifier rules to avoid doing number checks 2019-11-02 19:40:06 -06:00
Ed Page
ed00f3ceae docs: Fix typo 2019-11-02 08:57:07 -06:00
Ed Page
cc4b53a1b4
Merge pull request #64 from epage/debug
feat: Dump files, identifiers, and words
2019-10-31 11:40:43 -06:00
Ed Page
ce365ae12e feat: Dump files, identifiers, and words
This will help people debug their configurations.

Fixes #41
2019-10-31 10:44:23 -06:00
Ed Page
a48a457cc3 fix: Improve the organization of --help 2019-10-30 11:02:02 -06:00
Ed Page
975dab8514 chore(benches): Fix compilation errors 2019-10-30 07:20:52 -06:00
Ed Page
06db6fc693 refactor!: Move off of failure 2019-10-29 11:36:50 -06:00
Ed Page
ce1ef2ca30 refactor!: Move dict implementation into CLI 2019-10-28 11:00:47 -06:00
Ed Page
0a2f865d0f refactor: Change error strategy for future thread use 2019-10-26 20:31:10 -06:00
Ed Page
5e6e4b9ad7 chore: Upgrade structopt 2019-10-17 20:49:26 -06:00
Ed Page
1bdd1c928a refactor: Split out typos-dict 2019-08-08 10:24:50 -05:00
Ed Page
164ee9cb84 refactor: Split bin/lib into separate crates 2019-08-08 10:04:51 -05:00
Ed Page
6fc61966cc feat(parser): Give control over identifier detection 2019-08-08 08:58:37 -05:00
Ed Page
709446821b refactor(cli): Remove dead code 2019-08-08 08:58:36 -05:00
Ed Page
8ea31b5e1d refactor(cli): Re-order code to make diffing easier 2019-08-08 08:58:36 -05:00
Ed Page
26787df50d refactor(checks): Implement traits for easier debugging 2019-08-08 08:58:36 -05:00
Ed Page
a2cf3b7cc9 feat(config): Configure checking logic
Later we can add the per-filetype checks

Fixes #37
2019-08-08 08:58:36 -05:00
Ed Page
29ff040fd1 feat(config): Expose binary in config file 2019-08-08 08:58:35 -05:00
Ed Page
77603daab5 refactor(cli): Rename Options struct 2019-08-08 08:58:35 -05:00
Ed Page
a923f93ec5 fix(config): Move file-based config into a table 2019-08-08 08:58:35 -05:00
Ed Page
f9a1600513 refactor( Push out options 2019-08-08 08:58:34 -05:00
Ed Page
87015d3522 feat(config): Find config for each path passed in 2019-08-08 08:58:34 -05:00
Ed Page
ad4c6dcd77 refactor(config): Centralize loading logic 2019-08-08 08:58:34 -05:00
Ed Page
3d4da686ad feat: Accept config on command-line 2019-08-08 08:58:34 -05:00
Ed Page
8d96a2ad1d refactor(cli): Prepare for merging in config file 2019-08-08 08:58:33 -05:00
Ed Page
f15191de14 refactor(report): Leverage derive_more, more 2019-08-07 08:25:55 -05:00
Ed Page
e90a89ef93 refactor(report): Leverage derive_more 2019-08-07 08:20:18 -05:00
Ed Page
a129fb3d65 refactor(report): Switch to serde derive feature 2019-08-07 08:16:22 -05:00
Ed Page
3419a8df85 feat(parse): Make identifier symbols configurable 2019-08-07 07:36:49 -05:00
Ed Page
e093135ac1 feat(parse): Make digits in identifier optional 2019-08-07 07:28:25 -05:00
Ed Page
50c89ef761 fix(parse): Change ignore_hex default 2019-08-07 07:24:54 -05:00
Ed Page
6ae42b4c1e refactor(parse): Explicit Default 2019-08-07 07:24:28 -05:00
Ed Page
750005e971 fix(parse): Don't skip binary files when explicitly requested
Fixes #35
2019-07-31 21:01:58 -06:00
Ed Page
adcbe68621 refactor(dict): Split out a trait 2019-07-27 19:50:36 -06:00
Ed Page
834b9f77f2 refactor(checks): Separate out the logic 2019-07-27 19:50:35 -06:00
Ed Page
3e678cca1e refactor(parser): Share a parser across calls 2019-07-27 19:50:34 -06:00
Ed Page
36fefc166e refactor(parser): Add more traits to builder 2019-07-27 19:50:34 -06:00
Ed Page
039664339d refactor(parser): Switch to by-ref builder
Since nothing is being moved into `Parser`, we don't get any performance
benefit with a moving builder, so switching to a by-ref builder.
2019-07-27 19:50:34 -06:00
Ed Page
3cf9d8672c refactor(parser): Move hex handling to parser 2019-07-27 19:50:33 -06:00
Ed Page
d0b9979c36 refactor(parser): Split out parser creation 2019-07-27 19:50:33 -06:00
Ed Page
8e4708dfdf refactor(parser): Split out into struct 2019-07-27 19:50:33 -06:00
Ed Page
81f20bb293 feat: Set exit code on typos being found
Fixes #45
2019-07-23 10:37:05 -06:00
Ed Page
8b90debfa5 fix: Remove threads flag
Don't give the user a false sense of hope.  It will be brought back in
as part of #7.
2019-07-20 08:05:54 -06:00
Ed Page
469ae14181 feat: Log debug information
Fixes #39
2019-07-19 21:45:51 -06:00
Ed Page
95c0aea484 feat: Give control over verifying file content 2019-07-19 07:28:17 -06:00
Ed Page
ec307dffdd feat: Check file names
Fixes #24
2019-07-19 07:28:17 -06:00
Ed Page
6da830572a refactor(parser): Rename bytes-parser 2019-07-19 07:28:16 -06:00
Ed Page
d247d68c37 fix: Report binary files to user
Fixes #38
2019-07-19 07:28:10 -06:00
Ed Page
da156e3f23 feat: Ignore binary files
Fixes #29
2019-07-13 22:41:31 -06:00
Ed Page
4ce7303fc2 refactor(parser): Switch to bstr for line splitting 2019-07-13 22:41:31 -06:00
Ed Page
92a2560c9a feat(parser): Support C++ hex literal separators 2019-07-13 20:15:23 -06:00
Ed Page
b6ab968478 feat(parser): Treat contractions as a word
This should be safe.  Rarely is `'` used as syntax in a language that
separates literals.

- `'` is used within hex literals in C++ but we want to treat them as
  one word
- `'` is used for lifetimes in Rust but there are other symbols on the
  left side.
2019-07-13 20:15:23 -06:00
Ed Page
006204e66a feat(parser): Ignore hex literals
Trying to avoid accidentally correcting something that looks like a word
inside a hex number, like `0xBEAF`.

Fixes #19
2019-07-13 20:15:22 -06:00
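The hex-literal check described in the commit above might look something like this sketch. The function name `is_hex_literal` and the exact rule (a `0x`/`0X` prefix followed only by hex digits) are assumptions for illustration; the commit does not show how the parser actually detects them.

```rust
/// Returns true when `token` looks like a hex literal such as `0xBEAF`,
/// so a caller can skip spell checking it.
fn is_hex_literal(token: &str) -> bool {
    let digits = match token
        .strip_prefix("0x")
        .or_else(|| token.strip_prefix("0X"))
    {
        Some(d) => d,
        None => return false,
    };
    // Require at least one digit and nothing but hex digits after the prefix.
    !digits.is_empty() && digits.chars().all(|c| c.is_ascii_hexdigit())
}
```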
Ed Page
73054cca9e feat: VCS ignore flag 2019-07-12 21:43:18 -06:00
Ed Page
6bbf8390ff feat: Ignore parents flag 2019-07-12 21:39:38 -06:00
Ed Page
1bd4ca8288 feat: Git global flag 2019-07-12 21:36:32 -06:00
Ed Page
27edfc6e02 feat: Global ignore file flag 2019-07-11 21:56:27 -06:00
Ed Page
e6d29070fc feat: Expose control over .ignore 2019-07-10 20:12:14 -06:00
Ed Page
867c53043b feat: Give control over ignoring hidden files 2019-07-10 20:04:14 -06:00
Ed Page
166e2630c0 fix(parse): Don't assume boundary characters are one byte
This was inspired by heck.  They have an invariant to ensure this isn't
a problem (only accept `_` as boundary) while on the other hand we
accept a lot of things as boundaries.
2019-07-06 21:54:45 -06:00
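To illustrate the concern in the commit above: if a splitter steps over a boundary by adding 1 to a byte offset, a multi-byte boundary character leaves the next slice starting mid-character. One way to avoid that is to take offsets from `char_indices`, as in this sketch. `split_words` and its "non-alphanumeric is a boundary" rule are hypothetical, not the project's parser.

```rust
/// Splits `s` into words at non-alphanumeric boundary characters,
/// returning each word with its starting byte offset.
fn split_words(s: &str) -> Vec<(usize, &str)> {
    let mut words = Vec::new();
    let mut start: Option<usize> = None;
    // `char_indices` yields byte offsets that always sit on char boundaries,
    // regardless of how many bytes each boundary character occupies.
    for (i, c) in s.char_indices() {
        if c.is_alphanumeric() {
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(b) = start.take() {
            words.push((b, &s[b..i]));
        }
    }
    if let Some(b) = start {
        words.push((b, &s[b..]));
    }
    words
}
```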
Ed Page
377c911328 fix: Rename to typos 2019-07-03 19:22:36 -06:00
Ed Page
953064e7d1 fix(dict): Fix should match typo's case
Fixes #10
2019-06-26 07:22:59 -06:00
Ed Page
a5b8636bdb refactor(dict): Allow for owned corrections 2019-06-24 21:46:40 -06:00
Ed Page
b12e90c141 refactor(report): Rename source field 2019-06-24 21:46:39 -06:00
Ed Page
859769b835 refactor: Rename Symbol to Identifier
This is more descriptive
2019-06-24 21:46:39 -06:00
Ed Page
5bbd6f530a chore: Fix typo 2019-06-24 21:46:38 -06:00
Ed Page
881fce5114 feat(parse): Track the case of each word 2019-06-24 21:46:38 -06:00
Ed Page
80aeed1b43 fix(report): Align text when tabs are used
Ideally we would provide for more than a space per tab but this at least
gets us better alignment.

Fixes #11
2019-06-22 09:18:06 -06:00
Ed Page
a082207283 perf(report): Reduce grabbing of locks 2019-06-22 09:12:54 -06:00
Ed Page
3d1fb3b1ae feat(parse): Process words composing symbols 2019-06-15 22:21:40 -06:00
Ed Page
d78713dba1 fix: Improve the quality of symbols being reported 2019-06-14 15:57:41 -06:00
Ed Page
34c922509a chore(CI): Push the regex lint under a rug 2019-06-14 15:14:42 -06:00
Ed Page
905de9bd8d chore(CI): Fighting clippy 2019-06-14 14:53:34 -06:00
Ed Page
f1e3163ba2 fix: Clippy 2019-06-14 07:04:58 -06:00
Ed Page
9ccfc9c27d fix: Clippy 2019-06-14 06:51:22 -06:00
Ed Page
9f198c973d chore: Run cargo fmt 2019-06-14 06:45:39 -06:00
Ed Page
af66072272 feat(dict): Perform case-insensitive comparisons 2019-06-13 19:55:21 -06:00
Ed Page
719cc7d43b refactor: Restore str processing
This is in prep for using unicase.
2019-04-16 20:24:31 -06:00
Ed Page
5992ba110d refactor: Clarify intent of Token 2019-04-16 20:22:01 -06:00
Ed Page
f8d42116da refactor: Rename module 2019-04-16 20:16:31 -06:00
Ed Page
b6aabc9392 refactor: Switch to bytes for symbol lookup 2019-04-16 18:15:12 -06:00
Ed Page
779db94ecb chore: Document a case 2019-04-16 17:41:07 -06:00
Ed Page
af196a9561 fix: Settle on a name? 2019-04-16 17:27:55 -06:00
Ed Page
85ee5cfac9 fix(api): Split lib 2019-01-24 08:24:20 -07:00
Ed Page
d8ca9f9d5a fix: Limit words to just identifiers 2019-01-23 16:48:59 -07:00
Ed Page
c0c99ef3ad test: Basic tokenization testing 2019-01-23 16:48:59 -07:00
Ed Page
98b6c075f9 feat: Current dir is default 2019-01-23 16:48:58 -07:00
Ed Page
2ddd7d93df feat: Control over output format 2019-01-23 16:48:58 -07:00
Ed Page
e59d7817b4 fix(api): Clarify lifetimes 2019-01-22 16:59:49 -07:00
Ed Page
0cdd64c038 Initial commit 2019-01-22 15:01:33 -07:00