Ed Page
26ad06e961
chore(ci): Add missing committed workflow
2021-07-03 12:03:52 -05:00
Ed Page
6fc9eab101
chore(ci): Migrate completely to GH Actions
2021-07-02 14:04:23 -05:00
Ed Page
5c92dc6f8c
chore(ci): Migrate post-release
2021-07-02 14:04:07 -05:00
Ed Page
cce1e2a538
chore: Remove stale file
2021-07-02 14:01:20 -05:00
Ed Page
2898cc6605
fix(docker): Ensure using latest version
2021-07-02 10:44:44 -05:00
Ed Page
56cf2e17b6
Merge pull request #306 from epage/dict
...
refactor(dict): Remove useless entries
2021-07-02 10:41:13 -05:00
Ed Page
a6ad5c0a0b
chore(ci): Fix codegen verify
2021-07-02 10:28:31 -05:00
Ed Page
7a2a5042a1
refactor(dict): Remove useless entries
2021-07-02 10:24:59 -05:00
Ed Page
c917ed845a
Merge pull request #305 from epage/test
...
test: Only run tests relevant for features
2021-07-01 19:49:14 -05:00
Ed Page
fb31288607
test: Only run tests relevant for features
2021-07-01 19:33:32 -05:00
Ed Page
ca1d06bf02
chore(gh): Migrate codegen checks
2021-07-01 19:32:36 -05:00
Ed Page
28002901c4
chore(gh): Fix toolchain versions
2021-07-01 16:23:13 -05:00
Ed Page
7f9602fbc4
chore(gh): Fix MSRV
2021-07-01 15:49:34 -05:00
Ed Page
4254f47a79
chore(gh): Automate GitHub
2021-07-01 15:45:56 -05:00
Ed Page
fc05aa9633
Merge pull request #303 from epage/phf
...
feat(dict): Shared PHF support
2021-07-01 11:55:03 -05:00
Ed Page
4c2f2c434a
feat(dict): Shared PHF support
2021-07-01 11:14:30 -05:00
Ed Page
3b43272724
refactor(dict): Separate dictgen concerns
2021-07-01 11:00:33 -05:00
Ed Page
97015b3a95
Merge pull request #302 from epage/trie
...
refactor(dict): Change typos-dict to trie
2021-07-01 10:59:59 -05:00
Ed Page
c8d1058a71
refactor(dict): Change typos-dict to trie
...
This is +/- 15%, depending on the benchmark.
2021-07-01 10:41:56 -05:00
Ed Page
fa1119aa47
Merge pull request #295 from epage/trie
...
perf(dict): Switch varcon to a burst-trie
2021-06-30 19:21:39 -07:00
Ed Page
bbbf985777
perf(dict): Switch varcon to a burst-trie
...
This cuts varcon lookup times in half, but I still suspect it is slower than
phf. Like with bsearch and unlike, the cost is consistent between hits
and misses.
At least this doesn't have the compile hit of PHF + unicase. Maybe I
should experiment with integrating a non-const-fn variant of unicase
with PHF and give up on all of this extra complexity.
2021-06-30 21:03:57 -05:00
Ed Page
908f9d44eb
refactor(dict): Be more cache conscious
2021-06-30 19:56:03 -05:00
Ed Page
f176055834
refactor(dict): Make room for trie logic
2021-06-30 19:56:03 -05:00
Ed Page
0e6d683ebe
test(dict): Bench more varcon cases
2021-06-30 19:56:00 -05:00
Ed Page
0144f4521f
Merge pull request #294 from epage/codegen
...
refactor(dict): Pull out table-lookup logic
2021-06-30 08:32:15 -07:00
Ed Page
a1e95bc7c0
refactor(dict): Pull out table-lookup logic
...
Before, we only guaranteed that some dicts were pre-sorted. Now, all are
guaranteed to be pre-sorted.
This also gives each dict the size-check to avoid lookup.
But this is really about refactoring in prep for playing with other
lookup options, like tries.
2021-06-30 10:12:17 -05:00
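As a rough illustration of the two ideas in a1e95bc7c0 (a pre-sorted table plus a size check that skips the lookup entirely), here is a minimal sketch. The `SortedTable` type, its field names, and the sample entries are hypothetical and are not taken from the typos source.

```rust
/// Hypothetical table type: entries sorted by key, plus the range of key lengths.
pub struct SortedTable<V: 'static> {
    pub entries: &'static [(&'static str, V)],
    pub min_len: usize,
    pub max_len: usize,
}

impl<V: 'static> SortedTable<V> {
    pub fn find(&self, word: &str) -> Option<&V> {
        // Size check: a word shorter or longer than every key cannot match,
        // so the search is skipped.
        if word.len() < self.min_len || word.len() > self.max_len {
            return None;
        }
        // Binary search is only valid because the entries are pre-sorted by key.
        self.entries
            .binary_search_by(|(key, _)| (*key).cmp(word))
            .ok()
            .map(|i| &self.entries[i].1)
    }
}

// Illustrative data only; keys must be listed in sorted order.
static WORDS: SortedTable<&str> = SortedTable {
    entries: &[("abandonned", "abandoned"), ("teh", "the")],
    min_len: 3,
    max_len: 10,
};
```

With this sketch, `WORDS.find("teh")` goes through the binary search, while `WORDS.find("a")` returns `None` from the length check alone.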
Ed Page
bfa7888f82
chore: Skip more releases
2021-06-29 15:39:28 -05:00
Ed Page
8f3f5b90ad
chore: Release
2021-06-29 15:34:25 -05:00
Ed Page
9149c4765d
chore: Release
2021-06-29 15:05:18 -05:00
Ed Page
effc21ed10
Merge pull request #293 from epage/parse
...
Detect non-identifiers to ignore
2021-06-29 15:03:56 -05:00
Ed Page
9a0d754862
docs(parser): Note new features
2021-06-29 14:43:05 -05:00
Ed Page
c83f655109
feat(parser): Ignore URLs
...
Fixes #288
2021-06-29 14:14:58 -05:00
Ed Page
b673b81146
fix(parser): Ensure we get full base64
...
We greedily matched separators, including ones that might be part of
base64. This impacts the length calculation, so we want as much as
possible.
2021-06-29 13:55:46 -05:00
Ed Page
6915d85c0b
feat(parser): Ignore emails
...
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).
This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
2021-06-29 13:42:27 -05:00
Ed Page
2a1e6ca0f6
feat(parser): Ignore base64
...
For now, we hardcoded a min length of 90 bytes to avoid ambiguity with
math operations on variables (generally people use whitespace
anyways).
Fixes #287
2021-06-29 13:25:10 -05:00
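The minimum-length heuristic from 2a1e6ca0f6 can be sketched as follows. Only the 90-byte threshold comes from the commit message; the function name, the alphabet check, and the multiple-of-4 padding check are assumptions made for illustration, not the parser's actual rules.

```rust
/// Threshold quoted in the commit message.
const MIN_BASE64_LEN: usize = 90;

/// Hypothetical helper: does this token look like padded, standard-alphabet base64?
fn looks_like_base64(token: &str) -> bool {
    let body = token.trim_end_matches('=');
    let padding = token.len() - body.len();
    // Short tokens such as `a+b` or `x/y` are far more likely to be arithmetic
    // on variables, so they are rejected by the length check.
    token.len() >= MIN_BASE64_LEN
        // Assumption: padded base64 always has a length divisible by 4.
        && token.len() % 4 == 0
        && padding <= 2
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}
```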
Ed Page
23b6ad5796
feat(parser): Ignore SHA-1+
...
Fixes #270
2021-06-29 12:20:08 -05:00
Ed Page
8566b31f7b
fix(parser): Go ahead and do lower UUIDs
...
I need this for hash support anyways
2021-06-29 12:13:21 -05:00
Ed Page
85082cdbb1
feat(parser): Ignore UUIDs
...
We might be able to make this bail out earlier and not accidentally
detect the wrong thing by checking if the hex values are lowercase. RFC
4122 says that UUIDs must be generated lowercase, while input accepts
any case. The main issues are risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
2021-06-29 12:11:50 -05:00
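The lowercase-only check discussed in 85082cdbb1 could look roughly like the sketch below. Both function names are hypothetical, and this is a standalone check on a whole string rather than the parser combinator the project would actually use.

```rust
/// Hypothetical custom hex check that rejects uppercase `A-F`.
fn is_lower_hex_digit(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'a'..=b'f')
}

/// Hypothetical helper: true for the 8-4-4-4-12 textual UUID layout in lowercase hex.
fn is_lower_uuid(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 36 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &b)| match i {
        // Hyphens sit at fixed offsets in the textual form.
        8 | 13 | 18 | 23 => b == b'-',
        _ => is_lower_hex_digit(b),
    })
}
```

The trade-off named in the commit message shows up directly here: an uppercase UUID, which is valid as input under RFC 4122, is not recognized by this check.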
Ed Page
32f5e6c682
refactor(typos)!: Bake ignores into parser
...
This is prep for other items to be ignored
BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens. Related, we now ignore token-ignore config flags.
2021-06-29 11:41:25 -05:00
Ed Page
a46cc76bae
Merge pull request #292 from epage/unicode
...
perf(parser): Auto-detect unicode
2021-06-29 03:46:20 -07:00
Ed Page
ded90f2387
perf(parser): Auto-detect unicode
...
For smaller, ascii-only content, this seems to be taking ~30% less time
for parsing.
2021-06-29 05:28:17 -05:00
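The idea behind ded90f2387, checking once whether the content is ASCII and then taking a cheaper byte-oriented path, can be sketched like this. The `count_words` function and its word-splitting rule are hypothetical stand-ins for the real tokenizer; only the auto-detection step reflects the commit.

```rust
/// Hypothetical stand-in for tokenizing: counts runs of alphanumeric characters.
fn count_words(content: &str) -> usize {
    if content.is_ascii() {
        // ASCII-only content: work on raw bytes and skip UTF-8 decoding.
        content
            .as_bytes()
            .split(|b| !b.is_ascii_alphanumeric())
            .filter(|w| !w.is_empty())
            .count()
    } else {
        // Anything else: fall back to the Unicode-aware path.
        content
            .split(|c: char| !c.is_alphanumeric())
            .filter(|w| !w.is_empty())
            .count()
    }
}
```

Both branches give the same answer on ASCII input, so the detection only changes which code path does the work.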
Ed Page
21231bfc4d
Merge pull request #291 from epage/parse
...
refactor(parser): Consolidate utf8/ascii logic
2021-06-29 03:27:39 -07:00
Ed Page
95417f3a41
refactor(parser): Consolidate utf8/ascii logic
2021-06-29 05:10:02 -05:00
Ed Page
3e5787c0e2
chore: Release
2021-06-28 14:56:13 -05:00
Ed Page
6ff1f15f56
docs: Update changelog
2021-06-28 14:53:43 -05:00
Ed Page
8382d3c9ae
Merge pull request #290 from epage/codegen
...
fix(ci): Don't fail codegen checks
2021-06-28 12:31:24 -07:00
Ed Page
83b2804623
fix(ci): Don't fail codegen checks
2021-06-28 14:06:47 -05:00
Ed Page
4066d21790
style: Address clippy
2021-06-28 13:51:06 -05:00
Ed Page
e01a34ad08
Merge pull request #289 from scop/feat/pre-commit-py
...
feat(pre-commit): add binary based install
2021-06-28 11:09:14 -07:00
Ville Skyttä
ef76d20c6a
fix(pre-commit): Update version in setup.py with release
2021-06-28 20:54:48 +03:00