Welcome to Vigges Developer Community: Open, Learning, Share
Welcome To Ask or Share your Answers For Others

assembly - Performance of x86 rep instructions on modern (pipelined/superscalar) processors

I've been writing x86 assembly lately (for fun) and was wondering whether rep-prefixed string instructions actually have a performance edge on modern processors, or whether they're only kept for backward compatibility.

I can understand why Intel would have originally implemented the rep instructions back when processors only ran one instruction at a time, but is there a benefit to using them now?

With a loop that compiles to more instructions, there is more to fill up the pipeline and/or be issued out-of-order. Are modern processors built to optimize for these rep-prefixed instructions, or are rep instructions used so rarely in modern code that they're not important to the manufacturers?


1 Answer


Both AMD's and Intel's optimization guides devote a lot of space to questions like this. Advice in this area has a "half-life": different CPU generations behave differently. For example:

The Intel Architecture Optimization Manual gives performance comparison figures for various block copy techniques (including rep stosd) in Table 7-2, "Relative Performance of Memory Copy Routines" (pg. 7-37f.), for different CPUs. Again, what is fastest on one CPU might not be fastest on others.

In many cases, recent x86 CPUs (those with the SSE4.2 "string" instructions) can perform string operations in the SIMD unit; see this investigation.

To follow up on all this (and/or keep yourself updated when things change again, inevitably), read Agner Fog's Optimization guides/blogs.
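
Since the answer depends on the CPU generation, the practical route is to measure on your own hardware. Below is a minimal sketch in C of the two variants the question contrasts: a `rep movsb` copy and a plain byte loop. It assumes GCC or Clang extended inline asm on x86 and falls back to `memcpy` elsewhere. The helper names `copy_rep_movsb` and `copy_byte_loop` are made up for illustration and do not come from any manual or library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes using the rep-prefixed string
   instruction. Assumes GCC/Clang inline asm on x86; on any other
   target it simply calls memcpy. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    /* rep movsb copies (E/R)CX bytes from [(E/R)SI] to [(E/R)DI].
       All three registers are updated by the instruction, so they are
       declared as read-write ("+") operands. The direction flag is
       expected to be clear, which the usual C calling conventions
       guarantee at function entry. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Hypothetical helper: the "explicit loop" alternative. A compiler may
   vectorize this loop or replace it with a memcpy call, so inspect the
   generated assembly before comparing timings. */
static void copy_byte_loop(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
}
```

Both functions produce the same result for non-overlapping buffers; only the timing differs. To compare them, time each one over a range of sizes (small, cache-resident, and larger than cache) on the CPU you care about, because the crossover points are what change between generations.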

