BASM for beginners, lesson 1C
This is the third lesson in the BASM for Beginners series. The first two
lessons introduced a number of integer instructions, and this lesson
continues with more. The example function is this:
function StrLen1(const Str: PChar) : Cardinal;
begin
Result := 0;
while (Str[Result] <> #0) do
Inc(Result);
end;
Its functionality is the same as that of the RTL function StrLen. It
searches a PChar string for the zero terminator and returns the length of
the string. As usual we copy the assembler code generated by the compiler
from the CPU view.
function StrLen2(const Str: PChar) : Cardinal;
begin
Result := 0;
{
xor edx,edx
jmp +$01
}
while (Str[Result] <> #0) do
{
inc edx
cmp byte ptr [eax+edx],$00
jnz -$07
}
Inc(Result);
{
}
{
mov eax,edx
ret
}
end;
At first this code listing looks confusing. This is because the optimizer
has relocated some of the code. An example is the inc edx instruction that
increments the Result variable. It is not located under the Pascal line
Inc(Result), where we would expect it to be.
Let us go through the code line by line and see what it does. The first
line looks like this.
Result := 0;
{
xor edx,edx
It clears the edx register: xor of a register with itself sets all bits to
zero. Edx is allocated for the Result variable.
The result of a function is returned in the eax register if it is an
integer value. Therefore we would expect eax to be allocated for the Result
variable, but eax is in use for the input parameter Str. The compiler
temporarily allocated edx, which can be used freely, for Result, and just
before the function exits it is copied into eax by the lines
{
mov eax,edx
ret
}
end;
The second line of assembler is a jump instruction that jumps 1 byte
forward.
jmp +$01
}
This way the inc edx instruction that increments Result by one is
bypassed.
while (Str[Result] <> #0) do
{
inc edx
The inc edx instruction is the result of the Pascal code Inc(Result); and
it would have looked like this if the compiler had not relocated it.
Inc(Result);
{
inc edx
}
The while loop is compiled into three lines of assembler code of which
the inc edx line is the loop body and the two remaining lines are the loop
control code.
while (Str[Result] <> #0) do
{
inc edx
cmp byte ptr [eax+edx],$00
jnz -$07
}
The line
cmp byte ptr [eax+edx],$00
compares a byte of the PChar string with zero. The Pascal code
Str[Result] is generating this code
byte ptr [eax+edx]
Eax is a pointer to the beginning of the PChar and it is what the
function received as the Str parameter. Edx is the Result variable. In the
first loop iteration it is zero, so the first character of the string is
compared to the immediate value $00, which is just the hexadecimal notation
for 0. Because we only want to compare one character with zero at a time,
we must state that the address [eax+edx] is to be understood as a pointer
to a byte. The byte ptr size specifier does this. A compare instruction
sets the flags in the EFLAGS register according to the result of the
compare. The jump instruction
jnz -$07
}
tests the zero flag and jumps 7 bytes back if the flag is clear, that is,
if the compared byte was not zero. Jnz stands for Jump if Not Zero. As long
as [eax+edx] is not pointing at a zero terminator, the loop is iterated
once more.
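To make the relocated loop easier to follow, the same control flow can be sketched in plain Pascal with goto. This is only an illustration of the layout the compiler chose, not something it produces. The names StrLenGoto, LoopStart and LoopTest are made up for this sketch, and PAnsiChar is used on the assumption that one character is one byte, as the byte ptr compare requires.

```pascal
program StrLenGotoDemo;
{$APPTYPE CONSOLE}

// Hypothetical function: mirrors the compiled layout of StrLen2 in Pascal.
// PAnsiChar is an assumption so that one character occupies one byte.
function StrLenGoto(const Str: PAnsiChar): Cardinal;
label
  LoopStart, LoopTest;
begin
  Result := 0;                 // xor edx,edx
  goto LoopTest;               // jmp +$01 : skip the increment on entry
LoopStart:
  Inc(Result);                 // inc edx
LoopTest:
  if Str[Result] <> #0 then    // cmp byte ptr [eax+edx],$00
    goto LoopStart;            // jnz -$07
end;                           // mov eax,edx / ret

var
  S: AnsiString;
begin
  S := 'BASM';
  Writeln(StrLenGoto(PAnsiChar(S)));
end.
```

The loop test sits at the bottom, so each iteration needs only one conditional jump. The price is the single unconditional jump on entry that skips the increment.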
If we want to translate the function into a pure BASM function we have to
investigate where the two jumps are jumping to. This can be done by tracing
through the code with the CPU view open. We also saw earlier that the first
jump bypassed the one-byte instruction inc edx. Therefore we need a label
right after this line. Because my imagination was asleep that day, I
simply named it L1 for Label 1 ;-) It is also possible to use our
understanding of the code to realize that the last jump jumps to the start
of the loop, and the start of the loop is just before the single loop body
instruction inc edx. Then the function looks like this.
function StrLen3(const Str: PChar) : Cardinal;
asm
//Result := 0;
xor edx,edx
jmp @L1
//while (Str[Result] <> #0) do
//Inc(Result);
@LoopStart :
inc edx
@L1 :
cmp byte ptr [eax+edx],$00
jnz @LoopStart
mov eax,edx
//ret
end;
We can make it look a little nicer by writing the zero the simple way and
by removing the commented-out ret instruction.
function StrLen4(const Str: PChar) : Cardinal;
asm
//Result := 0;
xor edx,edx
jmp @L1
//while (Str[Result] <> #0) do
//Inc(Result);
@LoopStart :
inc edx
@L1 :
cmp byte ptr [eax+edx], 0
jnz @LoopStart
mov eax,edx
end;
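Before moving on it is worth checking that the hand-written version still returns the right lengths. The following is a minimal sketch of such a check, assuming a 32-bit x86 target with the default register calling convention. The program name and the test strings are invented for the illustration, and PAnsiChar is used instead of PChar on the assumption that characters must be one byte wide for the byte ptr compare to be correct.

```pascal
program StrLen4Check;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Same code as StrLen4 above; PAnsiChar is an assumption (1 byte per char).
// Assumes 32-bit x86: Str arrives in eax, the result is returned in eax.
function StrLen4(const Str: PAnsiChar): Cardinal;
asm
  xor edx,edx
  jmp @L1
@LoopStart:
  inc edx
@L1:
  cmp byte ptr [eax+edx], 0
  jnz @LoopStart
  mov eax,edx
end;

var
  Empty, Hello: AnsiString;
begin
  Empty := '';
  Hello := 'Hello';
  // PAnsiChar(Empty) points at a lone zero terminator, so the length is 0.
  Assert(StrLen4(PAnsiChar(Empty)) = 0);
  Assert(StrLen4(PAnsiChar(Hello)) = 5);
  Writeln('StrLen4 checks passed');
end.
```

The empty string is the interesting case: it only works because the first jump skips inc edx and goes straight to the compare.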
The compare instruction works on 1 byte of data at a time. We can
investigate this a little further by rewriting this line
cmp byte ptr [eax+edx], 0
as these two lines
mov cl, byte ptr [eax+edx]
cmp cl, 0
Step through the function with the CPU view open and watch how the lowest
byte of the ecx register holds the ASCII value of the character from the
string under inspection.
The line
cmp cl, 0
can be coded as
test cl, cl
This is the simplest form of peephole optimization: replacing one
instruction with another that performs the same logic.
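Put together, the two rewrites give a function like the sketch below. The name StrLenTest is made up for this illustration. As before, it assumes a 32-bit x86 target, the register calling convention and one-byte characters (PAnsiChar). Ecx may be overwritten freely under that convention, just like edx.

```pascal
program StrLenTestDemo;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Hypothetical variant of StrLen4 using mov + test instead of cmp with memory.
function StrLenTest(const Str: PAnsiChar): Cardinal;
asm
  xor edx,edx
  jmp @L1
@LoopStart:
  inc edx
@L1:
  mov cl, byte ptr [eax+edx]   // load one character into the low byte of ecx
  test cl, cl                  // cl AND cl: zero flag is set only when cl = 0
  jnz @LoopStart
  mov eax,edx
end;

var
  S: AnsiString;
begin
  S := 'BASM';
  Assert(StrLenTest(PAnsiChar(S)) = 4);
  Writeln('StrLenTest check passed');
end.
```

Test performs a logical AND of its operands and throws the result away, keeping only the flags. For the zero flag the outcome is the same as comparing cl with 0.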
Another one is this
xor edx,edx
"optimized" into
mov edx, 0
The preferred way of zeroing a register on P4 is the first one, xor, as
described on page 103 of the Intel Pentium 4 and Intel Xeon Processor
Optimization Reference Manual. This is also true for other processors.
What new instructions did we learn? Xor, jmp, inc, cmp, test and jnz. We
also learned how to implement a loop and how to work with one byte of data
at a time. The peephole optimization technique was also introduced.