Perhaps you were expecting the factors that distinguish the different flavors of PUSH to be intuitive and free of branchy, case-by-case behavior.
Basically, the operand size of an instruction is determined per opcode, not per mnemonic, so it can be opcode-specific. The instruction reference pages, however, are organized either per mnemonic or by grouping several mnemonics with the same general processing capabilities. Quite often the operand size of an instruction follows the default operand-size attribute of the current code segment, but several nuances deviate from that general rule and make things a bit complicated.
If you look at the Operation section, you'll see a somewhat complicated if/else structure that illustrates the mode-specific behavior. There is a common pattern: the size of the data being pushed or popped either follows the default or switches to a different size through the operand-size override prefix 66H. So in a 64-bit segment you have 8 bytes (default) vs. 2 bytes; in a 32-bit segment, 4 bytes (default) vs. 2 bytes; in a 16-bit segment, 2 bytes (default) vs. 4 bytes.
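The default-versus-override pattern can be written down as a small lookup. This is only an illustrative Python sketch of the size rule described here, not the pseudocode from the reference manual; the function name `push_operand_size` and its parameters are made up for this example.

```python
def push_operand_size(segment_bits: int, has_66h_prefix: bool = False) -> int:
    """Return how many bytes PUSH/POP moves in a given code segment.

    segment_bits   -- 16, 32, or 64 (type of the current code segment)
    has_66h_prefix -- True if the instruction carries the 66H override prefix
    """
    # (default size, size with 66H prefix) in bytes for each segment type.
    sizes = {
        64: (8, 2),
        32: (4, 2),
        16: (2, 4),
    }
    if segment_bits not in sizes:
        raise ValueError("segment_bits must be 16, 32, or 64")
    default_size, override_size = sizes[segment_bits]
    return override_size if has_66h_prefix else default_size
```

For example, `push_operand_size(64)` gives 8, while `push_operand_size(64, has_66h_prefix=True)` gives 2. Note that the table has no 4-byte entry for a 64-bit segment.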
The immediate flavor of PUSH adds a small twist to this behavior: the operand size of the instruction can differ from the number of bytes encoded as the immediate. This lets PUSH and POP work together in 64-bit, 32-bit, or 16-bit segments. Since the number of bytes encoded in a PUSH instruction can be smaller than the data size that gets pushed onto the stack, sign extension is applied to make up the difference in size.
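To make the sign-extension step concrete, here is a rough Python sketch of how a short immediate could be widened to the operand size before it lands on the stack. The helper name `sign_extend` and its parameters are hypothetical and only model the arithmetic; they are not taken from the manual.

```python
def sign_extend(immediate: int, imm_bytes: int, operand_bytes: int) -> int:
    """Widen an encoded immediate to the operand size, preserving its sign.

    immediate     -- raw immediate bits as encoded in the instruction
    imm_bytes     -- number of immediate bytes in the encoding (e.g. 1 or 4)
    operand_bytes -- number of bytes actually pushed on the stack
    Returns the unsigned bit pattern that is pushed.
    """
    imm_bits = imm_bytes * 8
    operand_bits = operand_bytes * 8
    # Keep only the bits that are really encoded.
    value = immediate & ((1 << imm_bits) - 1)
    # If the top encoded bit is set, the immediate is negative.
    if value & (1 << (imm_bits - 1)):
        value -= 1 << imm_bits
    # Re-encode as an unsigned pattern of the operand width.
    return value & ((1 << operand_bits) - 1)
```

With a 1-byte immediate of 80H pushed as an 8-byte operand, `hex(sign_extend(0x80, 1, 8))` gives `0xffffffffffffff80`, whereas 7FH stays `0x7f` because its top bit is clear.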
sjkuo