
Developers behind the widely-used GNU Awk text processing utility today released Gawk 5.4.
Gawk 5.4 now uses the new MinRX regular expression matcher as its default regexp engine; the older regex and DFA engines remain available. MinRX was written by Mike Haertel, the original developer of GNU grep. Unlike the existing GNU matchers, MinRX is fully POSIX compliant.
Separately, Gawk 5.4 should be faster when reading regular disk input files. Gawk no longer checks for timeouts on such files, and with large files this was found to be roughly 9% faster.
Gawk 5.4 also improves its MinGW Windows port to support UTF-8 encoded non-ASCII text. The Cygwin port of Gawk also now fully supports UTF-8.
Gawk 5.4 also changes how persistent memory is used, adds support for multi-byte characters in the ordchr extension, adjusts its handling for the POSIX 2024 spec, enables assertions in the C code, improves BSD support, and adds a “--enable-o3” build option to use -O3 compiler optimizations when building Gawk. This is also the first release of Gawk with Arabic translations.
The documentation has also been updated to explicitly forbid ad hominem attacks on the mailing lists and to strongly discourage discussion of proprietary software.
Lastly, Gawk 5.4 improves support for OpenVMS.

Downloads and more details on today’s Gawk 5.4 release are available via GNU.org.