this is some seriously thorough benchmarking work. the way you isolated each storage pattern and measured the overhead so precisely is impressive. the benchmark #4 results are wild: mawk and nawk just completely fall apart on string concatenation while gawk barely breaks a sweat.
the "structure penalty" finding is the kind of thing you only learn from actually measuring it. that's 5x to 8x more memory just for splitting fields vs. storing raw lines, which is easy to overlook until it blows up in production.
good stuff.