You seem to have a basic misunderstanding of how normal Multiplayer games work and what in this instance is being referred to as “cloud”.
To simplify this somewhat let’s take Counter Strike and a bunch of players playing on a server.
There’s a Client and a Server and there are two basic steps that happen at the same time.
The Client does something, such as firing a gun, moving around, or changing state from standing to crouching. The Client then sends this data to the Server as basic updated x/y/z coordinates and state values.
At the same time the Server has received the last batch of updates from all connected Clients in the same way and proceeds to send it out to everyone. This also includes whether it has registered a weapon hit and how much damage was done. The latency is effectively only your ping time to the Server.
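To make that loop a bit more concrete, here is a minimal sketch of the Server side in Python. It is not how Counter Strike actually implements it; the names `PlayerState`, `Server`, `receive`, `register_hit` and `tick` are made up for illustration, and real net code adds compression, prediction and lag compensation on top.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    # The small per-player payload described above: position plus a state flag.
    x: float
    y: float
    z: float
    crouching: bool = False


class Server:
    """Hypothetical game server: stores the latest update per Client and
    hands out one combined snapshot per tick."""

    def __init__(self):
        self.states = {}  # player id -> latest PlayerState
        self.hits = []    # (attacker, victim, damage) registered since last tick

    def receive(self, player_id, state):
        # A Client update simply overwrites that player's previous state.
        self.states[player_id] = state

    def register_hit(self, attacker, victim, damage):
        self.hits.append((attacker, victim, damage))

    def tick(self):
        # Build the snapshot that would be sent to every connected Client,
        # then reset the hit list for the next tick.
        snapshot = {"states": dict(self.states), "hits": list(self.hits)}
        self.hits.clear()
        return snapshot


server = Server()
server.receive("a", PlayerState(1.0, 2.0, 3.0))
server.receive("b", PlayerState(4.0, 5.0, 6.0, crouching=True))
server.register_hit("a", "b", 27)
snapshot = server.tick()
```

The point is how little is in the snapshot: a few numbers per player and a short list of hits, which is why ping is effectively the only delay you notice.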
Most of the net code is also kept rather simple (player character positions and states, hit/no hit and in some rare instances also bullet trajectories and speed or impact points for decals). You will not see many games that try to transfer physics, debris, particles or anything like that to players (even though the final representation is computed Client-side and you see these things), since the amount of data and bandwidth required grows steeply with the number of players involved: every Client has to receive every other Client’s updates, so the total grows roughly with the square of the player count. At best you will see the position coordinates of an event that happened, while all the computations for debris and particles usually happen Client-side and don’t look exactly the same for every player (which leads to sync issues between players, but they are usually minor and don’t impact gameplay).
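A rough back-of-the-envelope model shows why the amount of data per player has to stay tiny. This is only a sketch: the function name and the figures of 32 bytes per player state and 64 updates per second are assumptions picked for illustration, not measured values from any game.

```python
def server_outgoing_bytes_per_second(players, bytes_per_state=32, tick_rate=64):
    """Simplified model: on every tick the Server sends each Client one
    snapshot containing the state of every player."""
    states_per_snapshot = players  # one entry per player
    snapshots_per_tick = players   # one snapshot per connected Client
    return states_per_snapshot * snapshots_per_tick * bytes_per_state * tick_rate


ten_players = server_outgoing_bytes_per_second(10)
twenty_players = server_outgoing_bytes_per_second(20)
```

In this simple model doubling the player count quadruples the Server's outgoing traffic, and anything that makes the per-player payload bigger (debris, particles, physics bodies) multiplies straight into that total.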
What would happen in this “cloud” situation as proposed by Microsoft is different though.
First you have a level being rendered locally on your machine/console, and let’s say you want to offload something like physics or lighting to said Server.
What happens here is that the Client first has to send specific data connected to a certain event to the Server, in considerably larger amounts than simple positions or state changes. For instance, it needs the meshes for the involved objects (if you want to dent a barrel with an axe, it needs the meshes for at least these two objects plus the collision point and angle). It would also require material details and possibly lighting data and other things like the various maps used on the object. All of this is sent to the Server, which then has to process the data, compute the event that just happened and send the results back to the Client. The Client basically has no other choice than to wait until the Server is done and has sent back its results in a similarly data-intensive form (like a deformed mesh of said object(s) and whatever map changes might be required) so it can update the rendering on your local machine.
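To get a feeling for the size difference, here is a small sketch comparing a position update with a mesh upload. The layouts are assumptions for illustration only: a position update of three floats plus one state byte, a vertex of eight floats (position, normal, UV), and a 5000-vertex barrel. Real engines use different and often compressed formats.

```python
import struct

# Hypothetical wire formats, little-endian without padding.
POSITION_UPDATE = struct.Struct("<3fB")  # x, y, z + one state byte
VERTEX = struct.Struct("<8f")            # position (3) + normal (3) + UV (2)


def mesh_bytes(vertex_count):
    # Size of the raw vertex data only; indices, materials and texture maps
    # would come on top of this.
    return vertex_count * VERTEX.size


position_bytes = POSITION_UPDATE.size
barrel_bytes = mesh_bytes(5000)      # assumed vertex count
round_trip_bytes = 2 * barrel_bytes  # mesh goes up, deformed mesh comes back
```

Even with these modest assumptions a single dented barrel moves several orders of magnitude more data than a normal position update, and that is before any maps or lighting data are added.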
At 60 frames per second the entire round trip (upload, Server computation and download) would have to fit within 16.7ms at most. Even at 30 FPS, with a budget of 33.3ms, this is still a stretch and, to be quite frank, not really possible.
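The arithmetic behind those numbers is simple enough to write down. In the sketch below the 40ms ping and 5ms of Server compute time are assumed example values, not measurements, and the function names are made up.

```python
def frame_budget_ms(fps):
    # Time available to produce one frame.
    return 1000.0 / fps


def remaining_ms(fps, round_trip_ms, server_compute_ms):
    # What is left of the frame budget after the network round trip and the
    # Server's own work; a negative value means the result arrives too late.
    return frame_budget_ms(fps) - round_trip_ms - server_compute_ms


budget_60 = round(frame_budget_ms(60), 1)
budget_30 = round(frame_budget_ms(30), 1)
left_at_60 = remaining_ms(60, round_trip_ms=40, server_compute_ms=5)
left_at_30 = remaining_ms(30, round_trip_ms=40, server_compute_ms=5)
```

With those assumed values both results come out negative, meaning the answer from the Server would arrive after the frame it was needed for, and that does not even count the time to transfer the larger payloads.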
The process would potentially be even more complicated than “OnLive”, since in that case you only send control commands from your controller to the server instead of any game data. The server does all the computing based on those commands, then a fast, lossy encoding step takes place and it sends the picture back to you.
At most Microsoft will offer "pretend tasks/jobs" outsourced to their servers, like in the already oft-mentioned SimCity or even Diablo III, to make it seem like what they are referring to as the "cloud" is needed, but offloading resource-, bandwidth- and latency-intensive rendering tasks to it isn't really feasible.