I'm curious about the theoretical difference in clock cycles consumed by a CMP + JE sequence versus a single MUL operation. I'm running an x86 processor, but I believe my question is pretty general.

In C pseudocode:

unsigned foo = 1; /* must be 0 or 1 */

Don't look too deeply into the pseudocode; it's purely there to illustrate the mathematical logic behind the two methods.

What I'm actually comparing are the following two sequences of instructions:

Method 1: CMP + JE
MOV EAX, 1 ; FOO = 1 here, but can be set to 0

Method 2: MUL
MOV EAX, 1 ; FOO = 1 here, but can be set to 0

It seems that a single MUL (4 total instructions) is more efficient than a CMP + JE (6 total instructions), but are clock cycles consumed equally by all instructions, i.e. does every instruction take the same number of clock cycles to complete? If the actual number of clock cycles consumed depends on the machine, is a single MUL typically faster than the branching approach on most processors, since it requires fewer total instructions?

Typically, on a modern x86 processor, both the CMP and the MUL instruction will occupy an integer execution unit for one cycle (CMP is essentially a SUB that throws away the result and only modifies the flags register). However, modern x86 processors are also pipelined, superscalar and out-of-order, which means that performance depends on more than this underlying cycle cost alone.

If the branch cannot be predicted well, the branch misprediction penalty will swamp other factors and the MUL version will perform significantly better. On the other hand, if the branch can be well predicted and you immediately use num in a subsequent calculation, then it's possible for the branching version to perform better in the average case. That's because when the processor correctly predicts the branch, it can start speculatively executing the next instruction using the predicted value of num before the result of the compare is available, whereas in the MUL case any subsequent use of num has a data dependency on the result of the MUL and can't execute until that result is available.

The machine code for the conditional jump (Jcond) instruction comprises:

Opcode: 11CCC010 (where CCC selects the condition, i.e. which flag is tested and for which state), 8 bits, and
Address for the jump: a 16-bit address.

First there is a fetch of the opcode, so it needs 1 machine cycle. Checking whether the condition is satisfied should not take any significant clock cycles, since it is determined from the state of the flag bits. If the condition is satisfied, the address is read; otherwise it is not.

If the condition is satisfied: number of machine cycles needed = 1 (for fetch) + 2 (for reading the 16-bit address) = 3.

If the condition is not satisfied: there should be no read cycles after the fetch cycle, so the number of machine cycles needed must be 1, i.e. the fetch cycle only.

But the material I am referring to while learning microprocessors says it will need 2 machine cycles, and doesn't say why, hence my confusion. Should it be 1 machine cycle or 2 machine cycles? In case there is some confusion over machine cycles versus clock cycles, please feel free to answer in either.

Although it's counterintuitive to those of us used to the 8080 or Z80, checking this documentation confirms your belief: JC takes two machine cycles and seven clock cycles if the condition isn't met, but three machine cycles and ten clock cycles if it is. Compare and contrast with the Z80, where it's always three machine cycles and ten clock cycles, whether taken or not.

However, I think your confusion is because you imagine that not reading the address is somehow free, but there's still the PC to increment. Both branches of the processor family have a baseline two-cycle cost. That'll account for decoding and making the decision and, because they're pipelined, the read of the first byte of the target address will have begun elsewhere.

At that point I'd imagine the 8085 is smart enough that if the branch isn't going to be taken it can decline to read the second byte of the target address and just increment the PC again. The 8080 and Z80 have probably allowed the second byte read to begin, and quite probably can't increment the PC without reading from it in general, so they follow through on that and then discard the entire target address.

So, in short: it doesn't wait for a decision to start reading the target address, but once the read of the low byte is in progress it lets that finish.
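The cycle counts in the answer can be reproduced with a simple tally of bus cycles. The C sketch below assumes a 4-T-state opcode fetch and a 3-T-state memory read per address byte, which gives the 7/10 T-state figures quoted for the 8085 and the fixed 10 for the Z80; the 8080 row (always 3 machine cycles, 10 T-states) is taken from its instruction timing tables rather than from the text above. The names `jcond_cost`, `print_jcond_table`, `enum cpu` and `struct jcond_timing` are hypothetical, and the model ignores everything except the bus cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified timing model (an assumption, not a datasheet excerpt):
   the opcode fetch of a conditional jump takes 4 T-states and each
   read of an address byte takes 3 T-states. */
enum { FETCH_T_STATES = 4, READ_T_STATES = 3 };

enum cpu { CPU_8080, CPU_8085, CPU_Z80 };

struct jcond_timing {
    int machine_cycles; /* opcode fetch plus address-byte reads */
    int t_states;       /* clock cycles */
};

/* Hypothetical helper: cost of a conditional jump such as JC. */
struct jcond_timing jcond_cost(enum cpu cpu, bool taken)
{
    assert(cpu == CPU_8080 || cpu == CPU_8085 || cpu == CPU_Z80);

    /* The low byte of the target address is always read. */
    int operand_reads = 1;

    /* The 8080 and Z80 always read the high byte as well; the 8085
       skips that read when the jump is not taken. */
    if (cpu != CPU_8085 || taken)
        operand_reads = 2;

    struct jcond_timing t = {
        1 + operand_reads,
        FETCH_T_STATES + operand_reads * READ_T_STATES,
    };
    return t;
}

/* Hypothetical helper: prints one line per CPU and branch outcome. */
void print_jcond_table(void)
{
    static const char *const names[] = { "8080", "8085", "Z80" };
    for (int cpu = CPU_8080; cpu <= CPU_Z80; cpu++) {
        for (int taken = 0; taken <= 1; taken++) {
            struct jcond_timing t = jcond_cost((enum cpu)cpu, taken);
            printf("%-4s %-9s %d machine cycles, %2d T-states\n",
                   names[cpu], taken ? "taken" : "not taken",
                   t.machine_cycles, t.t_states);
        }
    }
}
```

Calling `print_jcond_table()` from your own `main()` should list 2 machine cycles / 7 T-states for a not-taken 8085 jump and 3 / 10 for every other row, which is the asymmetry the answer describes: the 8085 still pays for the low-byte read, but not the high-byte one.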