/QxS specifies generation of code for Penryn family CPUs only. There are a few cases where code can be vectorized effectively with /QxS but not with /QxT; even so, I'm not surprised you didn't see a difference. /fast would set /O3 /Qipo /QxT. /QxT is unlikely to show any difference from /QxP or /QxO, which cover a wider range of CPUs.
The section on optimization in the Windows .chm help file recommends comparing performance among the default (/O2), /O1, /O3, and /fast. Depending on the application, any of these could prove best. In real applications, it may be necessary to set /Qprec-div /Qprec-sqrt /assume:protect_parens to avoid numerical errors; Penryn family processors are unlikely to lose performance with those options.
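To give a feel for the kind of errors meant here: without precise division, the optimizer may replace x/y with something like x*(1/y), and without parenthesis protection it may regroup a sum. Both rewrites can change rounded results. The sketch below is only an illustration, written in Python (which performs no such rewrites itself, so the "relaxed" forms are spelled out by hand); it does not model the code the compiler actually emits, and /Qprec-sqrt is not illustrated. The function names are made up for this example.

```python
# Illustration only: Python evaluates IEEE 754 double expressions exactly as
# written, so each "relaxed" form below is written out by hand to mimic a
# rewrite an optimizer might make when precise division or parenthesis
# protection is not in effect.


def division_vs_reciprocal(x, y):
    """Return (x / y, x * (1 / y)).

    The second form models replacing a division by multiplication with a
    reciprocal, which rounds twice instead of once.
    """
    return x / y, x * (1.0 / y)


def grouping_matters(a, b, c):
    """Return ((a + b) + c, a + (b + c)).

    Floating-point addition is not associative, so regrouping the terms
    (as an optimizer may do when parentheses are not protected) can change
    the rounded result.
    """
    return (a + b) + c, a + (b + c)


if __name__ == "__main__":
    exact, relaxed = division_vs_reciprocal(3.0, 10.0)
    print(f"3/10 = {exact!r}, 3*(1/10) = {relaxed!r}")

    left, right = grouping_matters(0.1, 0.2, 0.3)
    print(f"(0.1+0.2)+0.3 = {left!r}, 0.1+(0.2+0.3) = {right!r}")

    # With cancellation, regrouping can lose the small term entirely.
    left, right = grouping_matters(1e16, -1e16, 1.0)
    print(f"(1e16-1e16)+1 = {left!r}, 1e16+(-1e16+1) = {right!r}")
```

The differences in the first two cases are in the last bit, which is usually harmless; the cancellation case shows how a regrouped expression can lose the small term altogether, which is the sort of thing /assume:protect_parens guards against when the source was written with deliberate parentheses.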