Well, in theory they just track every event the user triggers. Once you have the starting event and the last tracked event, you have a timeline.
You then stream these events through a listener/WebSocket to your endpoint and record them there.
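
Here is a rough sketch of the tracking side, just to make the idea concrete. Everything in it is made up for illustration: the `TrackedEvent` shape, `createRecorder`, the batch size, and the `send` callback, which stands in for whatever transport you use (for example a WebSocket's `send`). I have no idea what these tools do internally, so treat it as one possible approach.

```typescript
// Hypothetical event shape; a real tool would capture much more
// (DOM changes, scrolling, input, and so on).
interface TrackedEvent {
  type: string; // e.g. "click" or "mousemove"
  x: number;
  y: number;
  t: number; // milliseconds since the recording started
}

// Stand-in for the transport, e.g. (msg) => socket.send(msg).
type Send = (payload: string) => void;

function createRecorder(
  send: Send,
  batchSize: number = 3,
  now: () => number = Date.now,
) {
  const startedAt = now(); // the starting event anchors the timeline
  let buffer: TrackedEvent[] = [];

  // Push everything buffered so far to the endpoint as one JSON message.
  function flush(): void {
    if (buffer.length === 0) return;
    send(JSON.stringify(buffer));
    buffer = [];
  }

  // Record one event with a timestamp relative to the start.
  function track(type: string, x: number, y: number): void {
    buffer.push({ type, x, y, t: now() - startedAt });
    if (buffer.length >= batchSize) flush();
  }

  return { track, flush };
}
```

In a browser you would call `track` from your event listeners and call `flush` once more when the page unloads, so the last tracked event is not lost.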
That would be one solution :) The interesting part is whether they store the event data and just replay it when someone watches, or actually render the videos right away.
Both are possible, but the first option needs less storage.
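
If they go the "store and replay" route, playback could be as simple as re-applying the stored events up to the moment you want to see. A minimal sketch under that assumption (the `StoredEvent` and `Frame` shapes and the `frameAt` function are hypothetical, not taken from any real product):

```typescript
// Hypothetical shape of an event as it was stored on the server.
interface StoredEvent {
  type: string;
  x: number;
  y: number;
  t: number; // milliseconds since the recording started
}

// Hypothetical viewer state: last pointer position plus a click counter.
interface Frame {
  x: number;
  y: number;
  clicks: number;
}

// Rebuild the state at `atMs` by applying every event up to that time.
function frameAt(events: StoredEvent[], atMs: number): Frame {
  const frame: Frame = { x: 0, y: 0, clicks: 0 };
  // Sort a copy so out-of-order arrival does not matter.
  const ordered = events.slice().sort((a, b) => a.t - b.t);
  for (const e of ordered) {
    if (e.t > atMs) break;
    frame.x = e.x;
    frame.y = e.y;
    if (e.type === "click") frame.clicks += 1;
  }
  return frame;
}
```

A player would call something like `frameAt` for each tick of the playback clock and draw the result, so no video file ever has to be stored.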