1289 commits

Author SHA1 Message Date
Ed Page
0008713395 test: Ensure words.csv stays sorted 2021-07-27 14:16:12 -05:00
Ed Page
41048d15b3 test: Prevent correcting corrections 2021-07-27 13:58:57 -05:00
Ed Page
fc4ec0e4a1 fix: Correcting to typos 2021-07-27 13:58:57 -05:00
Ed Page
18626e5d2d chore(ci): Lower server load on PR 2021-07-07 15:38:16 -05:00
Ed Page
a92ae9b6f9 chore(gh): Be friendlier to contributors 2021-07-07 15:37:10 -05:00
Ed Page
2441504881
Merge pull request #309 from epage/clean
refactor(typos): Remove unused calculations
2021-07-06 09:25:08 -07:00
Ed Page
5b29113ec8 refactor(typos): Remove unused calculations
In #293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.
2021-07-06 11:08:05 -05:00
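For readers unfamiliar with the pattern this commit describes, here is a minimal sketch of why `filter_map` becomes redundant once filtering happens elsewhere. The `double` helper and both function names are hypothetical stand-ins, not code from the typos repository.

```rust
/// Hypothetical per-item computation; a stand-in, not from the typos codebase.
fn double(x: u32) -> u32 {
    x * 2
}

/// Before: `filter_map` whose closure always returns `Some`, so nothing is
/// ever filtered out and the `Option` wrapping is wasted work.
fn with_filter_map(items: &[u32]) -> Vec<u32> {
    items.iter().filter_map(|&x| Some(double(x))).collect()
}

/// After: a plain `map` produces the same output without the `Option`.
fn with_map(items: &[u32]) -> Vec<u32> {
    items.iter().map(|&x| double(x)).collect()
}
```

Both functions return the same values for any input; the second simply states the intent more directly.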
Ed Page
1559bc74bf
Merge pull request #308 from epage/print
Improve behavior for diff mode
2021-07-06 07:43:26 -07:00
Ed Page
10a2486163 perf(diff): Don't lock on every line 2021-07-06 09:27:31 -05:00
Ed Page
1b3f1f6b46 fix(diff): Handle broken pipe 2021-07-06 09:26:08 -05:00
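The two diff-mode commits above (lock once, tolerate a broken pipe) can be illustrated with a short sketch. This is one common way to do both in Rust, assuming the output is a list of lines; the function names `write_lines` and `print_diff` are hypothetical and are not taken from the typos source.

```rust
use std::io::{self, Write};

/// Write every line to `out`, treating a broken pipe (for example, the
/// reader was `head` and has already exited) as a normal end of output.
fn write_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        match writeln!(out, "{}", line) {
            Ok(()) => {}
            // The consumer went away; stop quietly instead of failing.
            Err(e) if e.kind() == io::ErrorKind::BrokenPipe => return Ok(()),
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Take the stdout lock once for the whole batch rather than implicitly
/// re-locking on every `println!` call.
fn print_diff(lines: &[&str]) -> io::Result<()> {
    let stdout = io::stdout();
    let mut handle = stdout.lock();
    write_lines(&mut handle, lines)
}
```

Keeping the writer generic means the same loop can be exercised against an in-memory buffer in tests.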
Ed Page
26ad06e961 chore(ci): Add missing committed workflow 2021-07-03 12:03:52 -05:00
Ed Page
6fc9eab101 chore(ci): Migrate completely to GH Actions 2021-07-02 14:04:23 -05:00
Ed Page
5c92dc6f8c chore(ci): Migrate post-release 2021-07-02 14:04:07 -05:00
Ed Page
cce1e2a538 chore: Remove stale file 2021-07-02 14:01:20 -05:00
Ed Page
2898cc6605 fix(docker): Ensure using latest version 2021-07-02 10:44:44 -05:00
Ed Page
56cf2e17b6
Merge pull request #306 from epage/dict
refactor(dict): Remove useless entries
2021-07-02 10:41:13 -05:00
Ed Page
a6ad5c0a0b chore(ci): Fix codegen verify 2021-07-02 10:28:31 -05:00
Ed Page
7a2a5042a1 refactor(dict): Remove useless entries 2021-07-02 10:24:59 -05:00
Ed Page
c917ed845a
Merge pull request #305 from epage/test
test: Only run tests relevant for features
2021-07-01 19:49:14 -05:00
Ed Page
fb31288607 test: Only run tests relevant for features 2021-07-01 19:33:32 -05:00
Ed Page
ca1d06bf02 chore(gh): Migrate codegen checks 2021-07-01 19:32:36 -05:00
Ed Page
28002901c4 chore(gh): Fix toolchain versions 2021-07-01 16:23:13 -05:00
Ed Page
7f9602fbc4 chore(gh): Fix MSRV 2021-07-01 15:49:34 -05:00
Ed Page
4254f47a79 chore(gh): Automate GitHub 2021-07-01 15:45:56 -05:00
Ed Page
fc05aa9633
Merge pull request #303 from epage/phf
feat(dict): Shared PHF support
2021-07-01 11:55:03 -05:00
Ed Page
4c2f2c434a feat(dict): Shared PHF support 2021-07-01 11:14:30 -05:00
Ed Page
3b43272724 refactor(dict): Separate dictgen concerns 2021-07-01 11:00:33 -05:00
Ed Page
97015b3a95
Merge pull request #302 from epage/trie
refactor(dict): Change typos-dict to trie
2021-07-01 10:59:59 -05:00
Ed Page
c8d1058a71 refactor(dict): Change typos-dict to trie
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
fa1119aa47
Merge pull request #295 from epage/trie
perf(dict): Switch varcon to a burst-trie
2021-06-30 19:21:39 -07:00
Ed Page
bbbf985777 perf(dict): Switch varcon to a burst-trie
This cuts varcon lookup times in half, but I still suspect it is slower than
phf.  Like with bsearch and unlike, the cost is consistent between hits
and misses.

At least this doesn't have the compile hit of PHF + unicase.  Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb refactor(dict): Be more cache conscious 2021-06-30 19:56:03 -05:00
Ed Page
f176055834 refactor(dict): Make room for trie logic 2021-06-30 19:56:03 -05:00
Ed Page
0e6d683ebe test(dict): Bench more varcon cases 2021-06-30 19:56:00 -05:00
Ed Page
0144f4521f
Merge pull request #294 from epage/codegen
refactor(dict): Pull out table-lookup logic
2021-06-30 08:32:15 -07:00
Ed Page
a1e95bc7c0 refactor(dict): Pull out table-lookup logic
Before, we only guaranteed that some dicts were pre-sorted.  Now, all are
guaranteed to be pre-sorted.

This also gives each dict the size-check to avoid lookup.

But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
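As a rough illustration of the two ideas in this commit (a pre-sorted table and a size check that skips the lookup entirely), here is a minimal sketch. The `SortedTable` type, its fields, and the sample entries are assumptions made for illustration; the generated tables in the repository may look quite different.

```rust
/// Hypothetical lookup table; keys must be sorted in ascending order.
pub struct SortedTable {
    entries: &'static [(&'static str, &'static str)],
    min_len: usize,
    max_len: usize,
}

impl SortedTable {
    pub fn new(entries: &'static [(&'static str, &'static str)]) -> Self {
        // Binary search is only valid on sorted keys, so check in debug builds.
        debug_assert!(entries.windows(2).all(|w| w[0].0 < w[1].0));
        let min_len = entries.iter().map(|(k, _)| k.len()).min().unwrap_or(0);
        let max_len = entries.iter().map(|(k, _)| k.len()).max().unwrap_or(0);
        Self { entries, min_len, max_len }
    }

    pub fn find(&self, word: &str) -> Option<&'static str> {
        // Size check: a word shorter or longer than every key cannot match,
        // so the search is skipped.
        if word.len() < self.min_len || word.len() > self.max_len {
            return None;
        }
        self.entries
            .binary_search_by(|(k, _)| (*k).cmp(word))
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```

Putting the length check behind a single `find` method is what lets every table share it, whichever lookup strategy ends up behind that method.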
Ed Page
bfa7888f82 chore: Skip more releases 2021-06-29 15:39:28 -05:00
Ed Page
8f3f5b90ad chore: Release 2021-06-29 15:34:25 -05:00
Ed Page
9149c4765d chore: Release 2021-06-29 15:05:18 -05:00
Ed Page
effc21ed10
Merge pull request #293 from epage/parse
Detect non-identifiers to ignore
2021-06-29 15:03:56 -05:00
Ed Page
9a0d754862 docs(parser): Note new features 2021-06-29 14:43:05 -05:00
Ed Page
c83f655109 feat(parser): Ignore URLs
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146 fix(parser): Ensure we get full base64
We greedily matched separators, including ones that might be part of
base64.  This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b feat(parser): Ignore emails
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).

This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6 feat(parser): Ignore base64
For now, we hardcoded a min length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyways).

Fixes #287
2021-06-29 13:25:10 -05:00
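A minimal sketch of a length-gated base64 check, to make the 90-byte threshold concrete. Only the threshold comes from the commit message; the function names, the accepted alphabet (standard `+` and `/`), and the padding rule are assumptions, and the real parser is likely structured differently.

```rust
/// Minimum length from the commit message; everything else is assumed.
const MIN_BASE64_LEN: usize = 90;

/// Standard base64 alphabet, excluding the `=` padding character.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/'
}

/// Returns true when `token` is long enough and consists only of base64
/// characters followed by at most two `=` padding characters.
fn looks_like_base64(token: &str) -> bool {
    if token.len() < MIN_BASE64_LEN {
        // Short runs such as `a+b/c` are far more likely to be arithmetic.
        return false;
    }
    let body = token.trim_end_matches('=');
    let padding = token.len() - body.len();
    padding <= 2 && body.bytes().all(is_base64_byte)
}
```

The length gate is what keeps ordinary expressions from being swallowed: `+` and `/` are valid base64 characters, so without it any unspaced arithmetic on identifiers would qualify.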
Ed Page
23b6ad5796 feat(parser): Ignore SHA-1+
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b fix(parser): Go ahead and do lower UUIDs
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1 feat(parser): Ignore UUIDs
We might be able to make this bail out earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase.  RFC
4122 says that UUIDs must be generated lowercase, while input accepts
any case.  The main issues are risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
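The lowercase-only idea discussed in this commit could look roughly like the sketch below. Both function names are hypothetical, and this works on a whole string rather than inside a streaming parser, so treat it as an illustration of the check and not as the project's implementation.

```rust
/// Custom hex check that rejects uppercase `A-F`, unlike
/// `u8::is_ascii_hexdigit`, which accepts both cases.
fn is_lower_hex_digit(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'a'..=b'f')
}

/// Returns true for the canonical 8-4-4-4-12 layout with lowercase hex only.
fn is_lower_uuid(s: &str) -> bool {
    const GROUPS: [usize; 5] = [8, 4, 4, 4, 12];
    let mut parts = s.split('-');
    for &len in GROUPS.iter() {
        match parts.next() {
            Some(p) if p.len() == len && p.bytes().all(is_lower_hex_digit) => {}
            _ => return false,
        }
    }
    // Reject trailing groups beyond the fifth.
    parts.next().is_none()
}
```

Restricting to lowercase trades recall for precision: an uppercase UUID would no longer be ignored, which is the "input" risk the commit message mentions.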
Ed Page
32f5e6c682 refactor(typos)!: Bake ignores into parser
This is prep for other items to be ignored

BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens.  Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
a46cc76bae
Merge pull request #292 from epage/unicode
perf(parser): Auto-detect unicode
2021-06-29 03:46:20 -07:00