Previously all the dictionary cleanup logic was in the function:
fn generate<W: std::io::Write>(file: &mut W, dict: &[u8])
which parsed the provided buffer as CSV and also took care of writing
the processed dictionary back as CSV. This commit factors out the CSV
handling, leaving a `process` function behind so that it can be easily
tested in the following commit.
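
As a rough sketch of the split described above (not the actual code from this commit): `generate` keeps only the parsing and writing, and `process` becomes a pure function that a test can call directly. The cleanup rules shown here (lowercasing, dropping self-corrections), the `process` signature, and the naive comma split standing in for a real CSV reader are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;
use std::io::Write;

/// Hypothetical pure cleanup step: no CSV parsing or writing happens here,
/// so it can be unit tested with plain in-memory values.
/// The rules below are placeholders, not the real dictionary rules.
fn process(entries: Vec<(String, Vec<String>)>) -> BTreeMap<String, Vec<String>> {
    let mut out = BTreeMap::new();
    for (typo, corrections) in entries {
        let typo = typo.to_lowercase();
        let corrections: Vec<String> = corrections
            .into_iter()
            .map(|c| c.to_lowercase())
            // A "correction" identical to the typo is not a correction.
            .filter(|c| *c != typo)
            .collect();
        if !corrections.is_empty() {
            out.insert(typo, corrections);
        }
    }
    out
}

/// Thin I/O wrapper: parse the buffer, delegate to `process`, write back.
/// A naive comma split stands in for a real CSV reader (assumption).
fn generate<W: Write>(file: &mut W, dict: &[u8]) {
    let text = String::from_utf8_lossy(dict);
    let entries: Vec<(String, Vec<String>)> = text
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| {
            let mut fields = line.split(',').map(str::to_owned);
            let typo = fields.next().unwrap_or_default();
            (typo, fields.collect::<Vec<String>>())
        })
        .collect();
    for (typo, corrections) in process(entries) {
        writeln!(file, "{},{}", typo, corrections.join(",")).unwrap();
    }
}
```

With this shape, tests can feed `process` hand-built entries without ever constructing a CSV buffer.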
This doesn't use `extend-exclude`, which means that `typos typos.toml`
will still be skipped.
This doesn't just skip the currently loaded config but any file name
that looks like a config, which might be a bit aggressive but allows us
to do layered config in the future... We've been saying that for a
while.
Fixes #711
`.in` is typically used for build system template input files,
containing some placeholders to replace. In some cases, multiple rounds
of replacements are used, each with their own `.in`, so remove all
trailing instances of it before attempting a filename match.
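
A minimal sketch of that normalization, assuming a hypothetical helper name (`strip_template_suffixes`) that is not taken from the codebase; the guard against reducing a bare `.in` to an empty name is also an assumption:

```rust
/// Strip every trailing `.in` so that, for example, `Makefile.in.in`
/// is matched as `Makefile`. Hypothetical helper for illustration.
fn strip_template_suffixes(file_name: &str) -> &str {
    let mut name = file_name;
    while let Some(stripped) = name.strip_suffix(".in") {
        // Stop rather than reduce a name like `.in` to nothing.
        if stripped.is_empty() {
            break;
        }
        name = stripped;
    }
    name
}
```

The filename match then runs on the returned name instead of the original one.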
Closes https://github.com/crate-ci/typos/issues/727
Two problems:
- I thought we had a UTF-16 test but apparently we didn't
- I didn't read enough fine print in the `encoding_rs` API
These combined meant the last release completely broke UTF-16 support.
Typos primarily works off of identifiers and words. We have built-in
support to detect constructs that span identifiers that should not be
spell checked, like UUIDs, emails, domains, etc. This opens it up for
user-defined identifier-spanning constructs using regexes via
`extend-ignore-re`.
This works differently than any of the previous ways of ignoring things
because the regexes require extra parse passes. Under the assumption
that (1) actual typos are rare and (2) files relying on
`extend-ignore-re` are rare, we only do these extra parse passes when a
typo is found, causing almost no performance hit in the expected case.
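
The deferred pass can be sketched roughly as follows. This is an illustration under stated assumptions, not the real implementation: `IgnorePattern`, `Typo`, and `report_typos` are hypothetical names, and a boxed closure stands in for a compiled regex so the sketch only needs the standard library.

```rust
use std::ops::Range;

/// Stand-in for one compiled `extend-ignore-re` pattern (assumption):
/// given a line, return the byte ranges that should not be spell checked.
type IgnorePattern = Box<dyn Fn(&str) -> Vec<Range<usize>>>;

/// Hypothetical candidate typo, identified only by its byte span in the line.
struct Typo {
    span: Range<usize>,
}

/// Keep only the candidates that are not covered by an ignored range.
fn report_typos(line: &str, candidates: Vec<Typo>, patterns: &[IgnorePattern]) -> Vec<Typo> {
    // Filled in at most once, and only when the first candidate is inspected,
    // so lines without typos never pay for the extra pass.
    let mut ignored: Option<Vec<Range<usize>>> = None;
    candidates
        .into_iter()
        .filter(|typo| {
            let ignored = ignored
                .get_or_insert_with(|| patterns.iter().flat_map(|p| p(line)).collect());
            // Drop the typo if some ignored range fully contains it.
            !ignored
                .iter()
                .any(|r| r.start <= typo.span.start && typo.span.end <= r.end)
        })
        .collect()
}
```

When `candidates` is empty the patterns are never invoked, which is the expected case described above.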
While this could be used for more generic types of ignores, it isn't the
most maintainable because it is separate from the source files in
question. Ideally, we'd implement document settings / directives for
these cases (#316).
This opens the door for users to provide patterns for identifiers that
are always valid. The key limitation is "identifiers". Run `typos
--identifiers` to verify what you are trying to write the regex for.
Fixes #651
- `CLICOLOR=1` now works correctly
- `NO_COLOR=` now works correctly
- Auto-enable colors in CI
For running `typos` on the Linux kernel (176,210 typos to be printed), we went from 20.082s to
<20.450s. Where in that range is unclear due to jitter in my system.
```console
$ hyperfine -L typos ./typos-main,./typos-anstream "{typos} ../../../linux" -i
Benchmark 1: ./typos-main ../../../linux
Time (mean ± σ): 20.082 s ± 0.111 s [User: 39.668 s, System: 0.474 s]
Range (min … max): 19.961 s … 20.331 s 10 runs
Warning: Ignoring non-zero exit code.
Benchmark 2: ./typos-anstream ../../../linux
Time (mean ± σ): 20.426 s ± 0.104 s [User: 40.301 s, System: 0.523 s]
Range (min … max): 20.316 s … 20.661 s 10 runs
Warning: Ignoring non-zero exit code.
Summary
'./typos-main ../../../linux' ran
1.02 ± 0.01 times faster than './typos-anstream ../../../linux'
$ CLICOLOR_FORCE=1 hyperfine -L typos ./typos-anstream "{typos} ../../../linux" -i
Benchmark 1: ./typos-anstream ../../../linux
Time (mean ± σ): 20.262 s ± 0.075 s [User: 39.961 s, System: 0.542 s]
Range (min … max): 20.154 s … 20.420 s 10 runs
Warning: Ignoring non-zero exit code.
$ CLICOLOR=0 hyperfine -L typos ./typos-anstream "{typos} ../../../linux" -i
Benchmark 1: ./typos-anstream ../../../linux
Time (mean ± σ): 20.296 s ± 0.065 s [User: 40.003 s, System: 0.565 s]
Range (min … max): 20.169 s … 20.383 s 10 runs
Warning: Ignoring non-zero exit code.
```
Use a composite action instead of a Docker action to avoid building an
image, which can make the action faster.
If the `typos` command doesn't exist, download and extract it.