
IMO, the current list of output actions you use is sufficient. For the time being, I'd pay more attention to the inputs to the first layer, though, as those may have a crucial impact on how efficiently the network trains.
I haven't had time to look at your source code yet, but do you feed in things like the current unit's HP, the distance to each enemy unit, whether the unit is on high or low ground, etc.?
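To make the question concrete, here is a rough sketch of the kind of input vector I have in mind. It is not based on your code: the `Unit` class, `build_inputs`, the number of enemy slots, and the distance cap are all made-up assumptions, and everything is scaled to [0, 1] so that no single input dominates.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass
class Unit:
    # Hypothetical stand-in for whatever unit object the bot API exposes.
    x: float
    y: float
    hp: int
    max_hp: int
    on_high_ground: bool = False


def build_inputs(me: Unit, enemies: List[Unit],
                 max_enemies: int = 3, max_dist: float = 512.0) -> List[float]:
    """Flatten one unit's situation into a fixed-length vector of values in [0, 1]."""
    # Own state: HP fraction and a high-ground flag.
    features = [me.hp / me.max_hp, 1.0 if me.on_high_ground else 0.0]

    # Keep only the closest enemies so the vector length never changes.
    nearest = sorted(enemies, key=lambda e: hypot(e.x - me.x, e.y - me.y))[:max_enemies]
    for e in nearest:
        dist = min(hypot(e.x - me.x, e.y - me.y), max_dist) / max_dist
        features += [dist, e.hp / e.max_hp, 1.0 if e.on_high_ground else 0.0]

    # Pad empty enemy slots: maximum distance, no HP, low ground (an assumed convention).
    features += [1.0, 0.0, 0.0] * (max_enemies - len(nearest))
    return features
```

With `max_enemies=3` this always yields 11 inputs, so the first layer can have a fixed size regardless of how many enemies are visible.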
Wrt the expiration of each command, I think you're on the right track in considering how long each command takes. If I were you, I'd simply not touch the unit for X frames after issuing a command, where X varies per command type, e.g. 4 frames for AttackEnemyUnitInRange, 12 frames for retreat, etc.
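Something along these lines is what I mean by the per-command lock. It is only a sketch: the `CommandLock` class and the `DEFAULT_LOCK` fallback are my own assumptions, and the 4 and 12 are just the example values above, not tuned numbers.

```python
from typing import Dict

# Frames to leave a unit alone after each command type.
# 4 and 12 are the example values from this post, not measured ones.
LOCK_FRAMES: Dict[str, int] = {
    "AttackEnemyUnitInRange": 4,
    "Retreat": 12,
}
DEFAULT_LOCK = 8  # assumed fallback for command types not listed above


class CommandLock:
    """Tracks, per unit, the first frame at which a new command may be issued."""

    def __init__(self) -> None:
        self._free_at: Dict[int, int] = {}

    def can_issue(self, unit_id: int, frame: int) -> bool:
        # Units that never received a command are always free.
        return frame >= self._free_at.get(unit_id, 0)

    def record(self, unit_id: int, command: str, frame: int) -> None:
        # Lock the unit for X frames, where X depends on the command type.
        self._free_at[unit_id] = frame + LOCK_FRAMES.get(command, DEFAULT_LOCK)
```

In the per-frame loop you would call `can_issue` before asking the network for a new action and `record` right after sending one, so the network is only queried for units that are actually free.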
Also, I'm not sure whether you already do this, but it may make sense to train the network only once the whole battle has finished (or a timeout of Y frames has been reached). The fitness function could then consider things like the total frames elapsed (fewer is better if our side won, more is better if the enemy won), our units' remaining HP, the enemy units' remaining HP, and so on.
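As a rough illustration of such an end-of-battle fitness, here is a minimal sketch. The function name, the weights, and the choice to treat a timeout as "not won" are all assumptions on my part; the terms themselves are just the ones listed above.

```python
def battle_fitness(won: bool, frames_elapsed: int, timeout_frames: int,
                   own_hp_left: float, own_hp_total: float,
                   enemy_hp_left: float, enemy_hp_total: float,
                   w_time: float = 0.5, w_own: float = 1.0,
                   w_enemy: float = 1.0) -> float:
    """Score one finished (or timed-out) battle; higher is better.

    Assumption: a battle that hits the timeout is passed in with won=False.
    """
    # Fraction of the allowed time that was used, capped at 1.
    t = min(frames_elapsed, timeout_frames) / timeout_frames
    # Fast wins are good; when losing, surviving longer is good.
    time_term = (1.0 - t) if won else t
    # Fraction of our total HP that survived.
    own_term = own_hp_left / own_hp_total
    # Fraction of the enemy's total HP that we removed.
    enemy_term = 1.0 - enemy_hp_left / enemy_hp_total
    return w_time * time_term + w_own * own_term + w_enemy * enemy_term
```

For example, a win after 250 of 1000 allowed frames with half of our HP left and all enemies dead scores 0.5 * 0.75 + 0.5 + 1.0 = 1.875 with the default weights. The weights are the part you would most likely need to tune.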
Keep us updated on any progress. I'll be very interested to follow the project (you've already got the star on GitHub).
