BASM for Beginners



BASM for beginners, lesson 1

This series of lessons is dedicated to teaching beginners BASM.

Instead of writing a book with chapters on instructions, calling conventions, code optimization and so on, we will discuss the issues as examples bring them into focus.

The first little example gets us started. It is a simple function in Pascal which multiplies an integer by the constant 2.

function MulInt2(I : Integer) : Integer;

begin
 Result := I * 2;
end;

Let us steal the BASM from the CPU view. I compiled with optimizations turned on.

function MulInt2_BASM(I : Integer) : Integer;

begin
 Result := I * 2;
 {
 add eax,eax
 ret
 }
end;

From this we see that I is passed to the function in eax and that the result is passed back to the caller in eax too. This follows the register calling convention, which is the default in Delphi. The actual code is very simple. Multiplication by 2 is obtained by adding I to itself: I + I = 2I. The ret instruction returns execution to the instruction after the one that called the function.

Let us code the function as a pure asm function.

function MulInt2_BASM2(I : Integer) : Integer;

asm
 //Result := I * 2;
 add eax,eax
 //ret
end;

Observe that the ret instruction is supplied by the inline assembler.
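A quick way to check that the pure asm version behaves like the Pascal version is a small console program that compares the two over a few inputs. This is only a minimal sketch, assuming a 32-bit Delphi compiler (the asm code relies on the 32-bit register calling convention). The program name and the sample values are made up for illustration.

```pascal
program CheckMulInt2;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Pascal reference implementation from the lesson.
function MulInt2(I : Integer) : Integer;
begin
  Result := I * 2;
end;

// Pure asm version: I arrives in eax and the result is returned in eax.
function MulInt2_BASM2(I : Integer) : Integer;
asm
  add eax,eax
end;

const
  // Arbitrary sample inputs, small enough that I * 2 does not overflow.
  Samples : array[0..4] of Integer = (-1000, -1, 0, 1, 12345);

var
  K : Integer;

begin
  for K := Low(Samples) to High(Samples) do
  begin
    if MulInt2(Samples[K]) <> MulInt2_BASM2(Samples[K]) then
      WriteLn('Mismatch for I = ', Samples[K])
    else
      WriteLn(Samples[K], ' * 2 = ', MulInt2_BASM2(Samples[K]));
  end;
end.
```

If the two functions ever disagree, the program prints a mismatch line instead of the product.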

Let us take a look at the calling code.

This is the Pascal code

procedure TForm1.Button1Click(Sender: TObject);

var

 I, J : Integer;

begin
 I := StrToInt(IEdit.Text);
 J := MulInt2_BASM2(I);
 JEdit.Text := IntToStr(J);
end;

The important line is

J := MulInt2_BASM2(I);

In the CPU view it appears as follows, together with the preceding call to StrToInt:

 call StrToInt
 call MulInt2_BASM2
 mov esi,eax

After the call to StrToInt, which belongs to the line before the one that calls our function, I is in eax (StrToInt also follows the register calling convention). MulInt2_BASM2 is then called and returns its result in eax, which is copied to esi by the next instruction.

Optimization issues: Multiplication by 2 can be done in two more ways: using the mul instruction or shifting left by one. The mul instruction is described on page 536 of the IA32 Intel Architecture Software Developer’s Manual Volume 2 – Instruction Set Reference. It multiplies the value in eax by the value held in another register and returns the result in the register pair edx:eax. A register pair is needed because multiplying two 32-bit numbers can produce a result of up to 64 bits, just as 9*9=81 shows that two one-digit numbers can produce a two-digit result.
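To make the edx:eax register pair visible, the following sketch returns the full 64-bit product. MulWide is a hypothetical helper that is not part of the lesson. It relies on two details of the 32-bit register calling convention that the text above does not cover: the second parameter arrives in edx, and an Int64 result is returned in edx:eax. Treat it as an illustration under those assumptions.

```pascal
program WideMul;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: full 64-bit product of two unsigned 32-bit values.
// Assumes the 32-bit register calling convention: A in eax, B in edx,
// Int64 result returned in the register pair edx:eax.
function MulWide(A, B : Cardinal) : Int64;
asm
  mul edx      // edx:eax := eax * edx (unsigned multiplication)
end;

begin
  // $FFFFFFFF * 2 = $1FFFFFFFE does not fit in 32 bits:
  // edx receives the high part (1) and eax the low part ($FFFFFFFE).
  WriteLn(IntToHex(MulWide($FFFFFFFF, 2), 16));
  // A small product fits in eax alone, so edx ends up as 0.
  WriteLn(MulWide(9, 9));
end.
```

Because the result type is the signed Int64, products of 2^63 or more would print as negative numbers. The bit pattern in edx:eax is still the unsigned product.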

Because edx comes into use, the question arises of which registers a function must preserve and which it can use freely. This is explained in the Delphi help.

"An asm statement must preserve the EDI, ESI, ESP, EBP, and EBX registers, but can freely modify the EAX, ECX, and EDX registers."

We can conclude that it is no problem that edx is modified by the mul instruction and our function can also be implemented like this.

function MulInt2_BASM3(I : Integer) : Integer;

asm
 //Result := I * 2;
 mov ecx, 2
 mul ecx
end;

ecx is also used, which is fine as well. As long as the result fits within the range of Integer, it is returned correctly in eax. If I is larger than half the maximum Integer value, overflow occurs and the result is incorrect.
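The overflow case can be seen with a short console program. This is a sketch, assuming a 32-bit Delphi compiler. The program name is made up, and the inputs are chosen to sit right at the boundary, using the MaxInt constant (2147483647).

```pascal
program MulOverflow;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// The mul based version from the lesson.
function MulInt2_BASM3(I : Integer) : Integer;
asm
  mov ecx, 2
  mul ecx
end;

begin
  // Largest input whose double still fits in an Integer:
  // 1073741823 * 2 = 2147483646.
  WriteLn(MulInt2_BASM3(MaxInt div 2));
  // One more and the true result, 2147483648, is outside the Integer range.
  // eax holds $80000000, which reads as -2147483648 when taken as an Integer.
  WriteLn(MulInt2_BASM3(MaxInt div 2 + 1));
end.
```

No error is raised in the second case. A pure asm function simply returns whatever bit pattern is left in eax.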

Implementation with shift

function MulInt2_BASM4(I : Integer) : Integer;

asm
 //Result := I * 2;
 shl eax,1
end;

Timing can reveal which implementation is fastest. We can also consult the Intel and AMD documents that contain latency and throughput tables. On P4, add and mov have a latency and throughput of 0.5 clock cycles, mul has a latency of 14-18 cycles and a throughput of 5 cycles, and shl has a latency of 4 clock cycles and a throughput of 1 cycle. The version chosen by Delphi is the most efficient on P4, and this will probably also be the case on Athlon and P3.
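A very rough timing harness could look like the sketch below. It is not a careful benchmark: TimeIt, TMulFunc and the Runs constant are hypothetical names introduced here, GetTickCount from the Windows unit only has a resolution of several milliseconds, and the cost of the call and the loop will probably dominate the single instruction being measured. It assumes a 32-bit Delphi compiler on Windows.

```pascal
program TimeMulInt2;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Hypothetical procedural type matching the MulInt2 variants.
  TMulFunc = function(I : Integer) : Integer;

const
  Runs = 100000000;  // arbitrary loop count, adjust to taste

function MulInt2_BASM2(I : Integer) : Integer;
asm
  add eax,eax
end;

function MulInt2_BASM3(I : Integer) : Integer;
asm
  mov ecx, 2
  mul ecx
end;

function MulInt2_BASM4(I : Integer) : Integer;
asm
  shl eax,1
end;

// Calls F Runs times and prints the elapsed milliseconds.
procedure TimeIt(const Name : string; F : TMulFunc);
var
  K, Sink : Integer;
  Start : Cardinal;
begin
  Sink := 0;
  Start := GetTickCount;
  for K := 1 to Runs do
    Sink := Sink xor F(K);  // combine the results so every call is used
  WriteLn(Name, ': ', GetTickCount - Start, ' ms (check value ', Sink, ')');
end;

begin
  TimeIt('add', MulInt2_BASM2);
  TimeIt('mul', MulInt2_BASM3);
  TimeIt('shl', MulInt2_BASM4);
end.
```

All three lines should print the same check value, since the functions compute the same results. Only the times are expected to differ, and the differences may be small on a given machine.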

Issues not covered: mul versus imul and range checking, other calling conventions, benchmarking, clock count on other processors, clock count for call + ret, location of return address for ret etc. Later lessons will introduce these subjects.

 

Dennis Kjaer Christensen, Denmark