docs(ref): Further clarify identifiers and words

This supersedes #648
Ed Page 2023-01-03 07:06:02 -06:00
parent 6773b4caa2
commit 59c4713e8b
2 changed files with 20 additions and 2 deletions

@@ -32,3 +32,21 @@ Dictionary: A confidence rating is given for how close a word is to one in a dic
- Sensitive to false positives due to hex numbers and c-escapes
- Used in word processors and other traditional spell checking applications
- Good when there is a UI to let the user know and override any decisions
## Identifiers and Words

With a focus on spell checking source code, most text is in the form of
identifiers made up of words joined via `snake_case`, `CamelCase`,
etc. A word that looks like a typo on its own might be correct as part of
an identifier, so each identifier is checked first and, if it is not in a
dictionary, is then split into words that are checked individually.

Identifiers are defined using
[Unicode's `XID_Continue`](https://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers),
which includes `[a-zA-Z0-9_]`.

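
As a rough illustration of what counts as an identifier, the sketch below pulls identifier-like runs out of a line of source code. It is an approximation under stated assumptions, not how `typos` tokenizes: it covers only the ASCII subset `[a-zA-Z0-9_]` named above rather than the full `XID_Continue` class, and the name `find_identifiers` is hypothetical.

```python
import re

# ASCII-only approximation of XID_Continue (assumption: the real
# Unicode class is much larger than this subset).
IDENTIFIER_RE = re.compile(r"[a-zA-Z0-9_]+")


def find_identifiers(line: str) -> list[str]:
    """Return every maximal run of identifier characters in `line`."""
    return IDENTIFIER_RE.findall(line)


print(find_identifiers("let First10HTMLTokens = parse_input(buf);"))
# → ['let', 'First10HTMLTokens', 'parse_input', 'buf']
```

Punctuation and whitespace fall outside the character class, so they act only as separators between identifiers.
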
Words are split from identifiers on case changes as well as at any character
outside `[a-zA-Z]`, with a special case to handle acronyms. For example,
`First10HTMLTokens` would be split as `first`, `html`, `tokens`.

To see this in action, run `typos --identifiers` or `typos --words`.
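
The splitting rule can be approximated with a single regular expression. This is a sketch rather than the tool's own splitter: `split_words` is a hypothetical helper, it handles ASCII letters only, and it assumes the acronym rule means a run of capitals stops just before a capital that begins a new capitalized word.

```python
import re

# Either an acronym (capitals not followed by a lowercase letter)
# or an optionally capitalized run of lowercase letters.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_words(identifier: str) -> list[str]:
    """Split an identifier into lowercased words; non-letters only separate."""
    return [word.lower() for word in WORD_RE.findall(identifier)]


print(split_words("First10HTMLTokens"))  # → ['first', 'html', 'tokens']
print(split_words("snake_case"))         # → ['snake', 'case']
```

Digits and underscores never appear in the output because they fall outside `[a-zA-Z]` and therefore act only as word boundaries.
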

@@ -27,7 +27,7 @@ Configuration is read from the following (in precedence order)
| default.check-file | \- | bool | Verifying spelling in files. |
| default.unicode | --unicode | bool | Allow unicode characters in identifiers (and not just ASCII) |
| default.locale | --locale | en, en-us, en-gb, en-ca, en-au | English dialect to correct to. |
-| default.extend-identifiers | \- | table of strings | Corrections for identifiers (as defined by [unicode's `XID_Continue`](https://www.unicode.org/reports/tr31/)). When the correction is blank, the identifier is never valid. When the correction is the key, the identifier is always valid. |
-| default.extend-words | \- | table of strings | Corrections for words (split from identifiers). When the correction is blank, the word is never valid. When the correction is the key, the word is always valid. |
+| default.extend-identifiers | \- | table of strings | Corrections for [identifiers](./design.md#identifiers-and-words). When the correction is blank, the identifier is never valid. When the correction is the key, the identifier is always valid. |
+| default.extend-words | \- | table of strings | Corrections for [words](./design.md#identifiers-and-words). When the correction is blank, the word is never valid. When the correction is the key, the word is always valid. |
| type.\<name>.\<field> | \<varied> | \<varied> | See `default.` for child keys. Run with `--type-list` to see available `<name>`s |
| type.\<name>.extend_globs | \- | list of strings | File globs for matching `<name>` |