BASM for Beginners

 


BASM for beginners, lesson 1C

This is the third lesson in the BASM for Beginners series. The first two lessons introduced a number of integer instructions, and this lesson continues doing that. The example function is this:

function StrLen1(const Str: PChar) : Cardinal;
begin
  Result := 0;
  while (Str[Result] <> #0) do
    Inc(Result);
end;
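As a quick check, a small console program along the following lines can exercise the function. This is only a sketch: the program name StrLenDemo and the test string are made up for illustration, and StrLen is the SysUtils routine used for comparison.

```pascal
program StrLenDemo;   // hypothetical program name, for illustration only
{$APPTYPE CONSOLE}

uses
  SysUtils;

function StrLen1(const Str: PChar): Cardinal;
begin
  Result := 0;
  while (Str[Result] <> #0) do
    Inc(Result);
end;

begin
  // Length of a four-character string
  WriteLn(StrLen1('BASM'));                    // 4
  // Compare against the RTL routine on the same input
  WriteLn(StrLen1('BASM') = StrLen('BASM'));
end.
```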

Its functionality is the same as that of the RTL function StrLen. It searches a PChar string for the zero terminator and returns the length of the string. As usual, we copy the assembler code generated by the compiler from the CPU view.

function StrLen2(const Str: PChar) : Cardinal;
begin
  Result := 0;
  {
  xor edx,edx
  jmp +$01
  }

  while (Str[Result] <> #0) do

  {
  inc edx
  cmp byte ptr [eax+edx],$00
  jnz -$07
  }

    Inc(Result);

  {
  }
  {
  mov eax,edx
  ret
  }
end;

At first this code listing looks confusing. This is because the optimizer relocated code. An example of this is the inc edx instruction that increments the Result variable. It is not located under the Pascal line Inc(Result) where we expected it to be.

Let us go through the code line by line and see what it does. The first line looks like this.

Result := 0;

{
xor edx,edx

It clears the edx register: xoring a register with itself sets all bits to zero. Edx is allocated for the Result variable.

An integer function result is returned in the eax register. We would therefore expect eax to be allocated for the Result variable, but eax is in use for the input parameter Str. The compiler instead allocated edx, which can be used freely, for Result, and just before the function exits the value is copied into eax by the lines

{
mov eax,edx
ret
}
end;

The second line of assembler is a jump instruction that jumps 1 byte forward.

jmp +$01
}

This way the inc edx instruction that increments Result by one is bypassed.

while (Str[Result] <> #0) do

{
inc edx

The inc edx instruction comes from the Pascal code Inc(Result); and it would have looked like this if the compiler had not relocated it.

Inc(Result);

{
inc edx
}

The while loop is compiled into three lines of assembler code of which the inc edx line is the loop body and the two remaining lines are the loop control code.

while (Str[Result] <> #0) do

{
inc edx
cmp byte ptr [eax+edx],$00
jnz -$07
}

The line

cmp byte ptr [eax+edx],$00

compares a byte of the PChar string with zero. The Pascal code Str[Result] generates this operand

byte ptr [eax+edx]

Eax is a pointer to the beginning of the PChar string; it is what the function received as the Str parameter. Edx is the Result variable. In the first loop iteration it is zero, so the first character of the string is compared to the immediate value $00, which is simply the hexadecimal way of writing 0. Because we only want to compare one character at a time, we must state that [eax+edx] is to be understood as a pointer to a byte. The byte ptr specifier does this. A compare instruction subtracts its second operand from the first without storing the difference, and sets the flags in the EFLAGS register according to the result; the zero flag is set when the two operands are equal. The jump instruction

jnz -$07
}

tests the zero flag and jumps 7 bytes back if the flag is clear, that is, if the compared byte was not zero. Jnz stands for Jump if Not Zero. If [eax+edx] is not pointing at a zero terminator, the loop is iterated once more.
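The address arithmetic behind byte ptr [eax+edx] can be spelled out in Pascal. The sketch below is not taken from the lesson: ByteAt and PByteValue are hypothetical names, and PAnsiChar is used on the assumption that the string holds one-byte characters.

```pascal
type
  PByteValue = ^Byte;   // local pointer-to-byte type, named for this sketch only

// Hypothetical helper: reads the single byte at address Str + Offset,
// which is what the operand byte ptr [eax+edx] designates.
function ByteAt(Str: PAnsiChar; Offset: Integer): Byte;
begin
  Result := PByteValue(Str + Offset)^;
end;
```

With the string 'AB', offset 0 yields 65 (the code of 'A') and offset 2 yields 0, the terminator the loop is looking for.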

If we want to translate the function into a pure BASM function, we have to find out where the two jumps are jumping to. This can be done by tracing through the code with the CPU view open. We saw earlier that the first jump bypasses the one-byte instruction inc edx, so we need a label right after that instruction. Because my imagination was asleep that day, I simply named it L1 for Label 1 ;-) We can also use our understanding of the code to see that the last jump goes to the start of the loop, which is just before the single loop body instruction inc edx. The function then looks like this.

function StrLen3(const Str: PChar) : Cardinal;
asm
  //Result := 0;
  xor edx,edx
  jmp @L1
  //while (Str[Result] <> #0) do
  //Inc(Result);
@LoopStart:
  inc edx
@L1:
  cmp byte ptr [eax+edx],$00
  jnz @LoopStart
  mov eax,edx
  //ret
end;
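The same jump structure can be mirrored in plain Pascal with goto, which may make the compiler's arrangement easier to see: the loop test sits at the bottom, and the initial jump enters the loop at the test. This is a sketch for illustration only; StrLenGoto is a hypothetical name and not part of the lesson's series of functions.

```pascal
// Hypothetical Pascal mirror of the compiler's loop layout.
function StrLenGoto(const Str: PChar): Cardinal;
label
  LoopStart, L1;
begin
  Result := 0;                 // xor edx,edx
  goto L1;                     // jmp @L1: skip the increment on entry
LoopStart:
  Inc(Result);                 // inc edx
L1:
  if Str[Result] <> #0 then    // cmp byte ptr [eax+edx],$00
    goto LoopStart;            // jnz @LoopStart
end;
```

Placing the test at the bottom means each iteration executes only one jump, the conditional one.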

We can make it look a little nicer by writing zero the simple way and by removing the commented-out ret instruction.

function StrLen4(const Str: PChar) : Cardinal;
asm
  //Result := 0;
  xor edx,edx
  jmp @L1
  //while (Str[Result] <> #0) do
  //Inc(Result);
@LoopStart:
  inc edx
@L1:
  cmp byte ptr [eax+edx], 0
  jnz @LoopStart
  mov eax,edx
end;
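The offsets +$01 and -$07 shown in the CPU view can be reconciled with the labels by counting instruction bytes. The listing below is a sketch under a few assumptions: 32-bit code, the assembler choosing the short two-byte jump forms, and PAnsiChar standing in for a one-byte-character PChar. StrLenAnnotated is a hypothetical name. Jump displacements are counted from the end of the jump instruction.

```pascal
// Hypothetical annotated copy of the loop; sizes assume 32-bit encodings.
function StrLenAnnotated(const Str: PAnsiChar): Cardinal;
asm
  xor edx, edx                 // 2 bytes
  jmp @L1                      // 2 bytes; skips 1 byte forward: +$01
@LoopStart:
  inc edx                      // 1 byte
@L1:
  cmp byte ptr [eax+edx], 0    // 4 bytes
  jnz @LoopStart               // 2 bytes; goes back 1 + 4 + 2 = 7 bytes: -$07
  mov eax, edx                 // 2 bytes
end;
```

The sizes can be checked in the CPU view, which shows the encoded bytes next to each instruction.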

The compare instruction works on one byte of data at a time. This can be investigated a little further by rewriting this line

cmp byte ptr [eax+edx], 0

into these two lines

mov cl, byte ptr [eax+edx]
cmp cl, 0

Step through the function with the CPU view open and watch how the lowest byte of the ecx register holds the ASCII value of the character from the string under inspection.
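For stepping through in the debugger, the complete function with the two-line form might look like the sketch below. StrLenViaCl is a hypothetical name, and PAnsiChar is assumed so that one character is one byte. Ecx may be overwritten freely here because, like eax and edx, it does not have to be preserved under the register calling convention.

```pascal
// Hypothetical variant: load the character into cl before comparing it.
function StrLenViaCl(const Str: PAnsiChar): Cardinal;
asm
  xor edx, edx                 // Result := 0
  jmp @L1
@LoopStart:
  inc edx                      // Inc(Result)
@L1:
  mov cl, byte ptr [eax+edx]   // cl := Str[Result]
  cmp cl, 0                    // is it the zero terminator?
  jnz @LoopStart
  mov eax, edx                 // return Result in eax
end;
```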

The line

cmp cl, 0

can be coded as

test cl, cl

This is the simplest form of peephole optimization: replacing one instruction with another that has the same effect. Here test cl, cl sets the zero flag exactly when cl is zero, just as cmp cl, 0 does.
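Why the two instructions agree on the zero flag can be modelled in plain Pascal: cmp sets the flag when the difference of its operands is zero, and test sets it when their bitwise AND is zero. The functions below are a hypothetical model written for this explanation; they only describe the zero flag, not the other flags.

```pascal
// Zero flag after "cmp Value, Operand": set when the 8-bit difference is zero.
function ZeroFlagAfterCmp(Value, Operand: Byte): Boolean;
begin
  Result := ((Value - Operand) and $FF) = 0;
end;

// Zero flag after "test A, B": set when the bitwise AND is zero.
function ZeroFlagAfterTest(A, B: Byte): Boolean;
begin
  Result := (A and B) = 0;
end;

// Returns True if "cmp v, 0" and "test v, v" agree for every byte value v.
function TestMatchesCmpZero: Boolean;
var
  V: Integer;
begin
  Result := True;
  for V := 0 to 255 do
    if ZeroFlagAfterCmp(V, 0) <> ZeroFlagAfterTest(V, V) then
      Result := False;
end;
```

Since a value ANDed with itself is the value unchanged, the AND is zero only when the value itself is zero.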

Another example is this

xor edx,edx

"optimized" into

mov edx, 0

The quotation marks are deliberate: the preferred way of zeroing a register on the P4 is the first one, as described on page 103 of the Intel Pentium 4 and Intel Xeon Processor Optimization Reference Manual. This is also true for other processors.
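One visible difference between the two forms is their size, which can be inspected in the CPU view with a throwaway routine such as the sketch below. ZeroingForms is a hypothetical name; the byte counts are those of the standard 32-bit encodings and are not quoted from the Intel manual.

```pascal
// Hypothetical routine that only exists to compare the two encodings.
procedure ZeroingForms;
asm
  xor edx, edx   // 2 bytes; also updates the flags
  mov edx, 0     // 5 bytes (opcode plus 32-bit immediate); flags unchanged
end;
```

Because xor changes the flags and mov does not, the two forms are interchangeable only where no later instruction depends on flags set earlier.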

What new instructions did we learn? Xor, jmp, inc, cmp, test and jnz. We also learned how to implement a loop and how to work with one byte of data at a time. The peephole optimization technique was also introduced.

 

 

Dennis Kjaer Christensen, Denmark