Every function timed with a line like %timeit some_function(l) is measured incorrectly. A time of a few hundred nanoseconds to produce a list of 9 * 1000 numbers is nonsense, and you will see that the result does not depend on the number of items.
This is because calling a generator function only creates a generator object. Its body, including any code after a yield statement, is postponed until you iterate over the output values. If you put a slow sleep call after a yield line, you will see no difference, because it never runs. Even timing a generator expression with %timeit (x for x in range(1000000000)) is about as fast as timing the same expression over range(1).
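The laziness described above can be checked outside IPython with a small sketch. The name slow_numbers, the calls list, and the 0.01 s sleep are made up for illustration and are not part of the code in the question.

```python
import time

calls = []  # records each time the code after the yield actually runs

def slow_numbers(n):
    """Hypothetical generator with a deliberately slow step after each yield."""
    for i in range(n):
        yield i
        calls.append(i)   # runs only when the generator is resumed
        time.sleep(0.01)  # the "slow sleep" from the text

start = time.perf_counter()
gen = slow_numbers(5)     # creates a generator object, runs none of the body
creation_time = time.perf_counter() - start
ran_before_iteration = list(calls)  # snapshot: nothing has run yet

start = time.perf_counter()
values = list(gen)        # iterating is what executes the body
consume_time = time.perf_counter() - start

print(f"creation: {creation_time:.6f} s, consumption: {consume_time:.6f} s")
print(ran_before_iteration, values, calls)
```

Creating the generator should take a tiny fraction of the time needed to consume it, and the calls list stays empty until iteration starts.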
A correct measurement has to consume the output, for example %timeit list(some_function(l)) or %timeit for _ in some_function(l): pass
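The same comparison can be sketched with the standard timeit module instead of the %timeit magic. Here flatten_gen is a hypothetical stand-in for some_function, and the shape of l (3000 sublists of 3 items) is an assumption chosen to give 9 * 1000 numbers.

```python
import timeit

# Assumed test data: 3000 sublists holding 9 * 1000 numbers in total.
l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] * 1000

def flatten_gen(lst):
    """Hypothetical generator-based flattener standing in for some_function."""
    for sub in lst:
        for item in sub:
            yield item

runs = 200

# Wrong: only times the creation of the generator object.
creation_only = timeit.timeit(lambda: flatten_gen(l), number=runs)

# Right: list() forces the generator to produce every item.
consumed = timeit.timeit(lambda: list(flatten_gen(l)), number=runs)

print(f"creation only: {creation_only / runs * 1e9:.0f} ns per call")
print(f"fully consumed: {consumed / runs * 1e6:.0f} us per call")
```

The first figure should stay roughly constant if you change the size of l, while the second should grow with the number of items.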
The fastest code for flattening is flatten_chain(l), because the expression itertools.chain(*lst) uses only a library function call and no Python-level loops.
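For reference, a minimal sketch of what flatten_chain presumably looks like, reconstructed only from the expression itertools.chain(*lst) named above. The exact definition in the question may differ, and the sample data is made up.

```python
import itertools

def flatten_chain(lst):
    """Assumed definition: flatten one level of nesting with a single library call."""
    return itertools.chain(*lst)

sample = [[1, 2], [3], [4, 5, 6]]  # hypothetical input
flat = list(flatten_chain(sample))  # list() consumes the lazy chain object
print(flat)
```

Note that itertools.chain also returns a lazy iterator, so it has to be wrapped in list() (or iterated) when it is timed, just like the generator versions.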