The CloudWatch Logs API defines its limits in terms of bytes, but its
inputs in terms of UTF-8 encoded strings. Byte-sequences which are not
valid UTF-8 encodings are normalized to the Unicode replacement
character U+FFFD, which is a 3-byte sequence in UTF-8. This replacement
can cause the input to grow, exceeding the API limit and causing failed
API calls.
This commit adds logic for counting the effective byte length after
normalization and splitting input without splitting valid UTF-8
byte-sequences into two invalid byte-sequences.
Fixes https://github.com/moby/moby/issues/37747
Signed-off-by: Samuel Karp <skarp@amazon.com>
(cherry picked from commit 1e8ef386279e2e28aff199047e798fad660efbdd)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 757650e8dcca87f95ba083a80639769a0b6ca1cc
Component: engine