superlint/lib/functions/experimental-batch-workers/README.md
Kin Fai Tse c3ac3aa5d9
Batched & parallel support for cfn-lint, eslint, gitleaks (#4088)
* faster linter for cfn-lint and eslint

* workaround shfmt error

* fix xargs interleave large outputs

* parallel gitleaks

* fix exec bit, shfmt, bash linter

* show parallel --citation

* refactor a common interface using named pipe

* add readme for the experimental impl

* fix readme format

* minimize change in worker.sh

* will cite, showed once

* remove junk comment

* explicitly set EXPERIMENTAL_BATCH_WORKER=false

* fix: errors from github/super-linter:v5
2023-10-31 00:13:33 +00:00

2.7 KiB

Parallel / Batched Workers

Running linters in parallel, and if possible in batch to speed up the linting process.

This is an experimental feature, and is not enabled by default, but it is really fast if you enable it and have the linter support implemented.

Since it is a parallel version, it might not be possible to reproduce a line-by-line match of serial output.

In order to maximize compatibility to programs using output of super-linter, the following are guarenteed:

  • Every linter error reported in serial version is reported by parallel version 1;
  • Super-linter log-level WARN and above that appears in serial version should appear in this version;
  • Super-linter log-level INFO and above do not interleave between linter output;
  • Failed file count logged at the end of super-linter matches serial version;

Motivation

Some linter might have a high startup cost, e.g.

  • eslint with some popular frontend framework plugins requires reading thousands of js files to init
  • cfn-lint which requires reading the whole cloudformation spec to run

A lot of linter supports linting multiple files per invocation, i.e. <linter-name> file1 file2 file3 ..., which can be leveraged to reduce the startup overhead.

Modern CI/CD might be on a multi-core machine, so running multiple linters in parallel can also speed up linting process, shorten the time taken from push to deploy.

Shift-left paradigm encourages running linters in the IDE, for example in .githooks/pre-commit, linting need to be fast for good Developer experience.

Supported linters

Linter Batch Parallel Notes
cfn-lint o o
ESLint o o
gitleaks o Batch unsupported by linter

Architecture

By setting EXPERIMENTAL_BATCH_WORKER=true, the following code path will be enabled:

# ../worker.sh
LintCodebase
  # ./${LinterName}.sh
  # TASK: Modify linter command for batch, parallelization and batching parameters suitable for the linter
  ParallelLintCodebase${LinterName}
    # ./base.sh
    # gnu parallel run
    ParallelLintCodebaseImpl
      # ./${LinterName}.sh
      # TASK: see ./base.sh
      LintCodebase${LinterName}StdoutParser
      # ./${LinterName}.sh
      # TASK: see ./base.sh
      LintCodebase${LinterName}StderrParser
      # ./base.sh if the default works for you
      LintCodebaseBaseStderrParser

  1. Statistics are almost impossible to reproduce, e.g. was always 1 file linted, K errors but now M files linted, K errors, I guess it is fine as the stat for linting 1 file produced by linter is not very useful. ↩︎