Commit graph

1442 commits

Author SHA1 Message Date
Ed Page
70fbd63b00 fix: Update dictionary 2021-07-27 14:21:00 -05:00
Ed Page
960471ae23 fix: Prevent old typos from coming back 2021-07-27 14:16:13 -05:00
Ed Page
4e99217896 test: Ensure words are stored lowercase 2021-07-27 14:16:12 -05:00
Ed Page
0008713395 test: Ensure words.csv stays sorted 2021-07-27 14:16:12 -05:00
Ed Page
41048d15b3 test: Prevent correcting corrections 2021-07-27 13:58:57 -05:00
Ed Page
fc4ec0e4a1 fix: Correcting to typos 2021-07-27 13:58:57 -05:00
Ed Page
18626e5d2d chore(ci): Lower server load on PR 2021-07-07 15:38:16 -05:00
Ed Page
a92ae9b6f9 chore(gh): Be friendlier to contributors 2021-07-07 15:37:10 -05:00
Ed Page
2441504881
Merge pull request #309 from epage/clean
refactor(typos): Remove unused calculations
2021-07-06 09:25:08 -07:00
Ed Page
5b29113ec8 refactor(typos): Remove unused calculations
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
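As a minimal illustration of the kind of change this commit describes (not the project's actual code), the sketch below uses two hypothetical helpers that compute word lengths. Once the closure can no longer return `None`, `filter_map` does no filtering and a plain `map` produces the same result.

```rust
/// Before (hypothetical): `filter_map` is still used even though the closure
/// always returns `Some`, so nothing is ever filtered out.
fn lengths_with_filter_map(words: &[&str]) -> Vec<usize> {
    words.iter().filter_map(|w| Some(w.len())).collect()
}

/// After (hypothetical): a plain `map` states the intent directly and skips
/// the wrapping and unwrapping of `Option`.
fn lengths_with_map(words: &[&str]) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}
```

Both helpers return the same vector for any input; the second simply drops the unused `Option` layer.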
Ed Page
1559bc74bf
Merge pull request #308 from epage/print
Improve behavior for diff mode
2021-07-06 07:43:26 -07:00
Ed Page
10a2486163 perf(diff): Don't lock on every line 2021-07-06 09:27:31 -05:00
Ed Page
1b3f1f6b46 fix(diff): Handle broken pipe 2021-07-06 09:26:08 -05:00
Ed Page
26ad06e961 chore(ci): Add missing committed workflow 2021-07-03 12:03:52 -05:00
Ed Page
6fc9eab101 chore(ci): Migrate completely to GH Actions 2021-07-02 14:04:23 -05:00
Ed Page
5c92dc6f8c chore(ci): Migrate post-release 2021-07-02 14:04:07 -05:00
Ed Page
cce1e2a538 chore: Remove stale file 2021-07-02 14:01:20 -05:00
Ed Page
2898cc6605 fix(docker): Ensure using latest version 2021-07-02 10:44:44 -05:00
Ed Page
56cf2e17b6
Merge pull request #306 from epage/dict
refactor(dict): Remove useless entries
2021-07-02 10:41:13 -05:00
Ed Page
a6ad5c0a0b chore(ci): Fix codegen verify 2021-07-02 10:28:31 -05:00
Ed Page
7a2a5042a1 refactor(dict): Remove useless entries 2021-07-02 10:24:59 -05:00
Ed Page
c917ed845a
Merge pull request #305 from epage/test
test: Only run tests relevant for features
2021-07-01 19:49:14 -05:00
Ed Page
fb31288607 test: Only run tests relevant for features 2021-07-01 19:33:32 -05:00
Ed Page
ca1d06bf02 chore(gh): Migrate codegen checks 2021-07-01 19:32:36 -05:00
Ed Page
28002901c4 chore(gh): Fix toolchain versions 2021-07-01 16:23:13 -05:00
Ed Page
7f9602fbc4 chore(gh): Fix MSRV 2021-07-01 15:49:34 -05:00
Ed Page
4254f47a79 chore(gh): Automate Github 2021-07-01 15:45:56 -05:00
Ed Page
fc05aa9633
Merge pull request #303 from epage/phf
feat(dict): Shared PHF support
2021-07-01 11:55:03 -05:00
Ed Page
4c2f2c434a feat(dict): Shared PHF support 2021-07-01 11:14:30 -05:00
Ed Page
3b43272724 refactor(dict): Separate dictgen concerns 2021-07-01 11:00:33 -05:00
Ed Page
97015b3a95
Merge pull request #302 from epage/trie
refactor(dict): Change typos-dict to trie
2021-07-01 10:59:59 -05:00
Ed Page
c8d1058a71 refactor(dict): Change typos-dict to trie
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
fa1119aa47
Merge pull request #295 from epage/trie
perf(dict): Switch varcon to a burst-trie
2021-06-30 19:21:39 -07:00
Ed Page
bbbf985777 perf(dict): Switch varcon to a burst-trie
This cuts varcon lookup times in half, but I suspect it is still slower than
phf.  Like with bsearch and unlike, the cost is consistent between hits
and misses.

At least this doesn't have the compile hit of PHF + unicase.  Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb refactor(dict): Be more cache conscious 2021-06-30 19:56:03 -05:00
Ed Page
f176055834 refactor(dict): Make room for trie logic 2021-06-30 19:56:03 -05:00
Ed Page
0e6d683ebe test(dict): Bench more varcon cases 2021-06-30 19:56:00 -05:00
Ed Page
0144f4521f
Merge pull request #294 from epage/codegen
refactor(dict): Pull out table-lookup logic
2021-06-30 08:32:15 -07:00
Ed Page
a1e95bc7c0 refactor(dict): Pull out table-lookup logic
Before, we only guaranteed that some dicts were pre-sorted.  Now, all are
guaranteed to be pre-sorted.

This also gives each dict the size-check to avoid lookup.

But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
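The table-lookup idea in this commit message can be sketched as follows. This is an assumption-laden illustration, not the generated code from the repository: the `WORDS` table, the `MIN_LEN`/`MAX_LEN` constants, and the `lookup` function are all hypothetical names. The sketch shows a pre-sorted table searched by binary search, with a length check that rejects words that cannot possibly be in the table before any search happens.

```rust
use std::cmp::Ordering;

/// Hypothetical table of (typo, correction) pairs.
/// It must stay sorted by the first element for binary search to be valid.
const WORDS: &[(&str, &str)] = &[
    ("abandonned", "abandoned"),
    ("recieve", "receive"),
    ("teh", "the"),
];

/// Shortest and longest key lengths in `WORDS` (hypothetical constants that a
/// code generator could emit alongside the table).
const MIN_LEN: usize = 3;
const MAX_LEN: usize = 10;

/// Looks up `word`, skipping the search when its length is out of range.
fn lookup(word: &str) -> Option<&'static str> {
    // Size check: a word shorter or longer than every key cannot match.
    if word.len() < MIN_LEN || word.len() > MAX_LEN {
        return None;
    }
    // Binary search over the pre-sorted keys.
    let found: Result<usize, usize> =
        WORDS.binary_search_by(|(typo, _)| -> Ordering { typo.cmp(&word) });
    found.ok().map(|idx| WORDS[idx].1)
}
```

Under these assumptions, `lookup("teh")` finds its correction, while a one-letter word is rejected by the size check without touching the table.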
Ed Page
bfa7888f82 chore: Skip more releases 2021-06-29 15:39:28 -05:00
Ed Page
8f3f5b90ad chore: Release 2021-06-29 15:34:25 -05:00
Ed Page
9149c4765d chore: Release 2021-06-29 15:05:18 -05:00
Ed Page
effc21ed10
Merge pull request #293 from epage/parse
Detect non-identifiers to ignore
2021-06-29 15:03:56 -05:00
Ed Page
9a0d754862 docs(parser): Note new features 2021-06-29 14:43:05 -05:00
Ed Page
c83f655109 feat(parser): Ignore URLs
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146 fix(parser): Ensure we get full base64
We greedily matched separators, including ones that might be part of
base64.  This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b feat(parser): Ignore emails
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).

This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6 feat(parser): Ignore base64
For now, we hardcode a minimum length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyway).

Fixes #287
2021-06-29 13:25:10 -05:00
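The minimum-length heuristic from this commit message can be sketched as below. Only the 90-byte threshold comes from the message; the function name `looks_like_base64`, the choice of the standard base64 alphabet, and the padding rule are assumptions made for illustration and may differ from what the parser actually does.

```rust
/// Minimum token length in bytes before a token is treated as base64
/// (the value quoted in the commit message).
const MIN_BASE64_LEN: usize = 90;

/// Hypothetical check: returns true if `token` is at least `MIN_BASE64_LEN`
/// bytes long and consists only of standard base64 characters, optionally
/// followed by one or two `=` padding characters.
fn looks_like_base64(token: &str) -> bool {
    // Short tokens such as `a+b/c` are far more likely to be expressions.
    if token.len() < MIN_BASE64_LEN {
        return false;
    }
    // Padding may only appear at the end, and at most twice.
    let body = token.trim_end_matches('=');
    if token.len() - body.len() > 2 {
        return false;
    }
    body.bytes()
        .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}
```

With this sketch, a short expression like `a+b/c` is never mistaken for base64, which is the ambiguity the length threshold is meant to avoid.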
Ed Page
23b6ad5796 feat(parser): Ignore SHA-1+
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b fix(parser): Go ahead and do lower UUIDs
I need this for hash support anyways
2021-06-29 12:13:21 -05:00