docs(ref): Further clarify identifiers and words

This supersedes #648
2024-11-22 09:01:04 -05:00 · 2023-01-03 07:06:02 -06:00 · 59c4713e8b
commit 59c4713e8b
parent 6773b4caa2
2 changed files with 20 additions and 2 deletions
--- a/docs/design.md
+++ b/docs/design.md
@@ -32,3 +32,21 @@ Dictionary: A confidence rating is given for how close a word is to one in a dic
 - Sensitive to false positives due to hex numbers and c-escapes
 - Used in word processors and other traditional spell checking applications
 - Good when there is a UI to let the user know and override any decisions
+
+## Identifiers and Words
+
+With a focus on spell checking source code, most text will be in the form of
+identifiers that are made up of words conjoined via `snake_case`, `CamelCase`,
+etc.  A typo at the word level might not be a typo as part of
+an identifier, so identifiers get checked and, if not in a dictionary, will
+then be split into words to be checked.
+
+Identifiers are defined using
+[unicode's `XID_Continue`](https://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers)
+which includes `[a-zA-Z0-9_]`.
+
+Words are split from identifiers on case changes as well as breaks in
+`[a-zA-Z]` with a special case to handle acronyms.  For example,
+`First10HTMLTokens` would be split as `first`, `html`, `tokens`.
+
+To see this in action, run `typos --identifiers` or `typos --words`.
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -27,7 +27,7 @@ Configuration is read from the following (in precedence order)
 | default.check-file     | \-                | bool   | Verifying spelling in files. |
 | default.unicode        | --unicode         | bool   | Allow unicode characters in identifiers (and not just ASCII) |
 | default.locale         | --locale          | en, en-us, en-gb, en-ca, en-au   | English dialect to correct to. |
-| default.extend-identifiers | \-            | table of strings | Corrections for identifiers (as defined by [unicode's `XID_Continue`](https://www.unicode.org/reports/tr31/)). When the correction is blank, the identifier is never valid. When the correction is the key, the identifier is always valid. |
+| default.extend-identifiers | \-            | table of strings | Corrections for [identifiers](./design.md#identifiers-and-words). When the correction is blank, the identifier is never valid. When the correction is the key, the identifier is always valid. |
-| default.extend-words       | \-            | table of strings | Corrections for words (split from identifiers). When the correction is blank, the word is never valid. When the correction is the key, the word is always valid. |
+| default.extend-words       | \-            | table of strings | Corrections for [words](./design.md#identifiers-and-words). When the correction is blank, the word is never valid. When the correction is the key, the word is always valid. |
 | type.\<name>.\<field>      | \<varied>     | \<varied>  | See `default.` for child keys.  Run with `--type-list` to see available `<name>`s |
 | type.\<name>.extend_globs  | \-            | list of strings  | File globs for matching `<name>` |
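
The word-splitting rule described in the `docs/design.md` hunk (split on case changes and on breaks in `[a-zA-Z]`, with acronyms kept together) can be approximated with a short sketch. This is not the implementation used by `typos`. The function name `split_words` and the regular expression are hypothetical, and the sketch assumes ASCII-only input rather than the full `XID_Continue` set.

```python
import re

# Assumption: ASCII-only approximation of the rule in docs/design.md,
# not the actual tokenizer used by the tool.
# First alternative: a run of capitals not followed by a lowercase letter,
# so an acronym such as "HTML" stays together and the capital that starts
# the next word ("T" in "Tokens") is left for the second alternative.
# Second alternative: an optional leading capital followed by lowercase letters.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_words(identifier: str) -> list[str]:
    """Split an identifier into lowercase words, skipping digits and underscores."""
    return [match.group(0).lower() for match in _WORD.finditer(identifier)]


if __name__ == "__main__":
    for ident in ("First10HTMLTokens", "snake_case", "CamelCase"):
        print(ident, "->", split_words(ident))
```

Under these assumptions, `First10HTMLTokens` yields `first`, `html`, `tokens`, matching the example in the design notes. Digits and underscores act only as separators and never appear in the output.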