From 59c4713e8bec05eed6f9dfce8ecf95a84d066a5b Mon Sep 17 00:00:00 2001
From: Ed Page
Date: Tue, 3 Jan 2023 07:06:02 -0600
Subject: [PATCH] docs(ref): Further clarify identifiers and words

This supersedes #648
---
 docs/design.md    | 18 ++++++++++++++++++
 docs/reference.md |  4 ++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/docs/design.md b/docs/design.md
index b85c5a1..8a25e77 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -32,3 +32,21 @@ Dictionary: A confidence rating is given for how close a word is to one in a dic
 - Sensitive to false positives due to hex numbers and c-escapes
 - Used in word processors and other traditional spell checking applications
 - Good when there is a UI to let the user know and override any decisions
+
+## Identifiers and Words
+
+With a focus on spell checking source code, most text will be in the form of
+identifiers that are made up of words conjoined via `snake_case`, `CamelCase`,
+etc. A typo at the word level might not be a typo as part of
+an identifier, so identifiers get checked and, if not in a dictionary, will
+then be split into words to be checked.
+
+Identifiers are defined using
+[unicode's `XID_Continue`](https://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers)
+which includes `[a-zA-Z0-9_]`.
+
+Words are split from identifiers on case changes as well as breaks in
+`[a-zA-Z]` with a special case to handle acronyms. For example,
+`First10HTMLTokens` would be split as `first`, `html`, `tokens`.
+
+To see this in action, run `typos --identifiers` or `typos --words`.
diff --git a/docs/reference.md b/docs/reference.md
index 84d1ea6..24e422a 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -27,7 +27,7 @@ Configuration is read from the following (in precedence order)
 | default.check-file | \- | bool | Verifying spelling in files. |
 | default.unicode | --unicode | bool | Allow unicode characters in identifiers (and not just ASCII) |
 | default.locale | --locale | en, en-us, en-gb, en-ca, en-au | English dialect to correct to. |
-| default.extend-identifiers | \- | table of strings | Corrections for identifiers (as defined by [unicode's `XID_Continue`](https://www.unicode.org/reports/tr31/)). When the correction is blank, the identifier is never valid. When the correction is the key, the identifier is always valid. |
-| default.extend-words | \- | table of strings | Corrections for words (split from identifiers). When the correction is blank, the word is never valid. When the correction is the key, the word is always valid. |
+| default.extend-identifiers | \- | table of strings | Corrections for [identifiers](./design.md#identifiers-and-words). When the correction is blank, the identifier is never valid. When the correction is the key, the identifier is always valid. |
+| default.extend-words | \- | table of strings | Corrections for [words](./design.md#identifiers-and-words). When the correction is blank, the word is never valid. When the correction is the key, the word is always valid. |
 | type.\.\ | \ | \ | See `default.` for child keys. Run with `--type-list` to see available ``s |
 | type.\.extend_globs | \- | list of strings | File globs for matching `` |
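
Note on the word-splitting rule added to `docs/design.md` in this patch: the sketch below is an illustrative approximation in Python, not the tokenizer that `typos` actually uses. The function name `split_words` and the regular expression are assumptions made for this example. It only mirrors the documented behavior for ASCII identifiers: split on case changes and on breaks in `[a-zA-Z]`, keeping a run of capitals together as an acronym.

```python
import re

# Hypothetical helper, not part of typos. A word is either an acronym run
# (capitals not followed by a lowercase letter) or an optionally capitalized
# run of lowercase letters.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_words(identifier: str) -> list[str]:
    """Split an ASCII identifier into lowercase words.

    Digits and underscores act as breaks because they fall outside [a-zA-Z].
    """
    return [m.group(0).lower() for m in _WORD.finditer(identifier)]


print(split_words("First10HTMLTokens"))  # ['first', 'html', 'tokens']
```

The negative lookahead is what handles the acronym case: in `HTMLTokens` the capital run backs off by one letter so that `T` starts `Tokens` instead of ending `HTML`. For the real behavior, including non-ASCII identifiers, `typos --words` remains the authority.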