Ed Page
41048d15b3
test: Prevent correcting corrections
2021-07-27 13:58:57 -05:00
Ed Page
fc4ec0e4a1
fix: Correcting to typos
2021-07-27 13:58:57 -05:00
Ed Page
18626e5d2d
chore(ci): Lower server load on PR
2021-07-07 15:38:16 -05:00
Ed Page
a92ae9b6f9
chore(gh): Be friendlier to contributors
2021-07-07 15:37:10 -05:00
Ed Page
2441504881
Merge pull request #309 from epage/clean
...
refactor(typos): Remove unused calculations
2021-07-06 09:25:08 -07:00
Ed Page
5b29113ec8
refactor(typos): Remove unused calculations
...
In #293, we moved where we filter out results but never switched from
`filter_map` to `map`; this commit does that.
2021-07-06 11:08:05 -05:00
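A minimal sketch of the kind of cleanup 5b29113ec8 describes, not the actual typos code: once filtering happens elsewhere, a `filter_map` whose closure always returns `Some` does the same work as a plain `map`. The function names and the word-length computation are hypothetical, chosen only for illustration.

```rust
/// Hypothetical "before": the closure never returns `None`,
/// so `filter_map` filters nothing.
fn lengths_with_filter_map(words: &[&str]) -> Vec<usize> {
    words.iter().filter_map(|w| Some(w.len())).collect()
}

/// Hypothetical "after": the same result, stated directly with `map`.
fn lengths_with_map(words: &[&str]) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}
```

Both functions return the same vector for any input; the `map` version just avoids wrapping and unwrapping an `Option` per item.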
Ed Page
1559bc74bf
Merge pull request #308 from epage/print
...
Improve behavior for diff mode
2021-07-06 07:43:26 -07:00
Ed Page
10a2486163
perf(diff): Don't lock on every line
2021-07-06 09:27:31 -05:00
Ed Page
1b3f1f6b46
fix(diff): Handle broken pipe
2021-07-06 09:26:08 -05:00
Ed Page
26ad06e961
chore(ci): Add missing committed workflow
2021-07-03 12:03:52 -05:00
Ed Page
6fc9eab101
chore(ci): Migrate completely to GH Actions
2021-07-02 14:04:23 -05:00
Ed Page
5c92dc6f8c
chore(ci): Migrate post-release
2021-07-02 14:04:07 -05:00
Ed Page
cce1e2a538
chore: Remove stale file
2021-07-02 14:01:20 -05:00
Ed Page
2898cc6605
fix(docker): Ensure using latest version
2021-07-02 10:44:44 -05:00
Ed Page
56cf2e17b6
Merge pull request #306 from epage/dict
...
refactor(dict): Remove useless entries
2021-07-02 10:41:13 -05:00
Ed Page
a6ad5c0a0b
chore(ci): Fix codegen verify
2021-07-02 10:28:31 -05:00
Ed Page
7a2a5042a1
refactor(dict): Remove useless entries
2021-07-02 10:24:59 -05:00
Ed Page
c917ed845a
Merge pull request #305 from epage/test
...
test: Only run tests relevant for features
2021-07-01 19:49:14 -05:00
Ed Page
fb31288607
test: Only run tests relevant for features
2021-07-01 19:33:32 -05:00
Ed Page
ca1d06bf02
chore(gh): Migrate codegen checks
2021-07-01 19:32:36 -05:00
Ed Page
28002901c4
chore(gh): Fix toolchain versions
2021-07-01 16:23:13 -05:00
Ed Page
7f9602fbc4
chore(gh): Fix MSRV
2021-07-01 15:49:34 -05:00
Ed Page
4254f47a79
chore(gh): Automate Github
2021-07-01 15:45:56 -05:00
Ed Page
fc05aa9633
Merge pull request #303 from epage/phf
...
feat(dict): Shared PHF support
2021-07-01 11:55:03 -05:00
Ed Page
4c2f2c434a
feat(dict): Shared PHF support
2021-07-01 11:14:30 -05:00
Ed Page
3b43272724
refactor(dict): Separate dictgen concerns
2021-07-01 11:00:33 -05:00
Ed Page
97015b3a95
Merge pull request #302 from epage/trie
...
refactor(dict): Change typos-dict to trie
2021-07-01 10:59:59 -05:00
Ed Page
c8d1058a71
refactor(dict): Change typos-dict to trie
...
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
fa1119aa47
Merge pull request #295 from epage/trie
...
perf(dict): Switch varcon to a burst-trie
2021-06-30 19:21:39 -07:00
Ed Page
bbbf985777
perf(dict): Switch varcon to a burst-trie
...
This cuts varcon lookup times in half, but I still suspect it is slower
than phf. Like with bsearch and unlike phf, the cost is consistent
between hits and misses.
At least this doesn't have the compile hit of PHF + unicase. Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb
refactor(dict): Be more cache conscious
2021-06-30 19:56:03 -05:00
Ed Page
f176055834
refactor(dict): Make room for trie logic
2021-06-30 19:56:03 -05:00
Ed Page
0e6d683ebe
test(dict): Bench more varcon cases
2021-06-30 19:56:00 -05:00
Ed Page
0144f4521f
Merge pull request #294 from epage/codegen
...
refactor(dict): Pull out table-lookup logic
2021-06-30 08:32:15 -07:00
Ed Page
a1e95bc7c0
refactor(dict): Pull out table-lookup logic
...
Before, we guaranteed that only some dicts were pre-sorted. Now, all of
them are guaranteed to be pre-sorted.
This also gives each dict the size check that lets us skip the lookup.
But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
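A rough sketch of the idea in a1e95bc7c0, under stated assumptions: a pre-sorted table can be searched with a binary search, and a length range check can reject a word before any lookup happens. `SortedTable`, its fields, and the example entries are hypothetical and are not taken from the typos source.

```rust
/// Hypothetical pre-sorted lookup table (names are illustrative only).
struct SortedTable {
    /// Entries sorted by key so that binary search is valid.
    entries: &'static [(&'static str, &'static str)],
    /// Shortest and longest key lengths, in bytes.
    min_len: usize,
    max_len: usize,
}

impl SortedTable {
    fn find(&self, word: &str) -> Option<&'static str> {
        // Size check: a word outside the key length range cannot match,
        // so skip the lookup entirely.
        if word.len() < self.min_len || word.len() > self.max_len {
            return None;
        }
        self.entries
            .binary_search_by(|(key, _)| (*key).cmp(word))
            .ok()
            .map(|idx| self.entries[idx].1)
    }
}

/// Example data, already sorted by key.
const EXAMPLE: SortedTable = SortedTable {
    entries: &[
        ("abandonned", "abandoned"),
        ("recieve", "receive"),
        ("teh", "the"),
    ],
    min_len: 3,
    max_len: 10,
};
```

With this table, `EXAMPLE.find("teh")` yields `Some("the")`, while a two-byte word is rejected by the size check without touching the entries.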
Ed Page
bfa7888f82
chore: Skip more releases
2021-06-29 15:39:28 -05:00
Ed Page
8f3f5b90ad
chore: Release
2021-06-29 15:34:25 -05:00
Ed Page
9149c4765d
chore: Release
2021-06-29 15:05:18 -05:00
Ed Page
effc21ed10
Merge pull request #293 from epage/parse
...
Detect non-identifiers to ignore
2021-06-29 15:03:56 -05:00
Ed Page
9a0d754862
docs(parser): Note new features
2021-06-29 14:43:05 -05:00
Ed Page
c83f655109
feat(parser): Ignore URLs
...
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146
fix(parser): Ensure we get full base64
...
We greedily matched separators, including ones that might be part of
base64. This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b
feat(parser): Ignore emails
...
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).
This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6
feat(parser): Ignore base64
...
For now, we hardcode a minimum length of 90 bytes to avoid ambiguity
with math operations on variables (generally people use whitespace
anyway).
Fixes #287
2021-06-29 13:25:10 -05:00
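To illustrate the length threshold mentioned in 2a1e6ca0f6, here is a hedged sketch rather than the parser's real implementation: a token is treated as base64 only if it is at least 90 bytes long and consists solely of characters from the standard base64 alphabet. `MIN_BASE64_LEN`, `is_base64_byte`, and `looks_like_base64` are hypothetical names, and padding placement is deliberately not validated.

```rust
/// Minimum token length, in bytes, taken from the commit message.
const MIN_BASE64_LEN: usize = 90;

/// Standard base64 alphabet plus the `=` padding character.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'+' | b'/' | b'=')
}

/// Hypothetical check: long enough and made only of base64 characters.
/// Short expressions such as `a+b` fail the length test.
fn looks_like_base64(token: &str) -> bool {
    token.len() >= MIN_BASE64_LEN && token.bytes().all(is_base64_byte)
}
```

The length test is what keeps short arithmetic such as `a+b/c` from being skipped as if it were encoded data.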
Ed Page
23b6ad5796
feat(parser): Ignore SHA-1+
...
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b
fix(parser): Go ahead and do lower UUIDs
...
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1
feat(parser): Ignore UUIDs
...
We might be able to make this bail out earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase. RFC
4122 says that UUIDs must be generated lowercase, while input accepts
any case. The main issues are risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
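The lowercase-only check discussed in 85082cdbb1 could look roughly like the sketch below. It is an assumption-laden illustration, not the parser's code: `is_lower_hex_digit` and `looks_like_lower_uuid` are hypothetical names, and the check only covers the canonical 8-4-4-4-12 textual layout.

```rust
/// Hypothetical custom hex check that accepts lowercase digits only.
fn is_lower_hex_digit(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'a'..=b'f')
}

/// Returns true for the canonical 36-byte form
/// `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` with lowercase hex digits.
fn looks_like_lower_uuid(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 36 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &b)| match i {
        // Hyphens sit at fixed offsets in the canonical form.
        8 | 13 | 18 | 23 => b == b'-',
        _ => is_lower_hex_digit(b),
    })
}
```

Restricting the match to lowercase rejects uppercase UUIDs, which is the "input" risk the commit message mentions.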
Ed Page
32f5e6c682
refactor(typos)!: Bake ignores into parser
...
This is prep for other items to be ignored
BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens. Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
a46cc76bae
Merge pull request #292 from epage/unicode
...
perf(parser): Auto-detect unicode
2021-06-29 03:46:20 -07:00
Ed Page
ded90f2387
perf(parser): Auto-detect unicode
...
For smaller, ascii-only content, this seems to be taking ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
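One plausible shape for the auto-detection in ded90f2387, offered as a sketch under assumptions rather than the actual parser: check the content once with `str::is_ascii` and take a byte-oriented path when it succeeds, falling back to `char`-based splitting otherwise. `count_words` and its word definition (runs of alphanumeric characters) are hypothetical simplifications.

```rust
/// Hypothetical word counter that picks a code path per input.
fn count_words(content: &str) -> usize {
    if content.is_ascii() {
        // ASCII-only content: work on raw bytes, no UTF-8 decoding needed.
        content
            .as_bytes()
            .split(|b| !b.is_ascii_alphanumeric())
            .filter(|w| !w.is_empty())
            .count()
    } else {
        // Mixed content: fall back to Unicode-aware `char` handling.
        content
            .split(|c: char| !c.is_alphanumeric())
            .filter(|w| !w.is_empty())
            .count()
    }
}
```

The single up-front scan is cheap compared to decoding every character, which is one reason a byte path can be faster on small ASCII-only inputs.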