The Fastcode Project

Year 2004 status


The 2004 competition

The 2004 competition has ended.

JOHN O'HARROW IS THE WINNER.

Place  Name                Points
1      John O'Harrow       853
2      Dennis Christensen  627
3      Aleksandr Sharahov  233

Complete winners list 2004

The 2004 New Year speech

I wish all fastcoders and followers a happy New Year. Thanks for working on the project, contributing comments or just lurking. I need you to keep me on the right track. I believe that in 2004 the Fastcode project matured and became generally known in the Delphi community.

This year's winner of the competition is, as last year, John O’Harrow. He wins a Delphi 2005 Professional sponsored by Borland. Congratulations to John and thanks to Borland. Second place went, like last year, to me. I win a Nexus memory manager. Congratulations to me ;-) and thanks to Nexus. Third place went, like last year, to Aleksandr Sharahov, who wins a TChart license sponsored by Steema. Congratulations to Aleksandr and thanks to Steema.

I am very pleased that three of our functions, CompareText, Int64Div and FillChar, made it into the Delphi 2005 RTL. This is the biggest success of the year and a big milestone for the project. I feel confident that we will see more Fastcode functions in the next major Delphi release.

Another commercial use of Fastcode functions, and in fact the first, is the inclusion of the Fastcode Move and FillChar in the Nexus memory manager.

We had 16 challenges in 2004, and one of them, AnsiStringReplace, was run in two parts: first as a blind challenge and then as a normal challenge. Eric Grange won the blind part very clearly, and John O’Harrow won the normal part, also very clearly. We managed to make the RTL function nearly 17 times faster, but we only support AnsiStrings. MBCS support is pending.

Hallvard Vassbotn's blog post about the AnsiStringReplace challenge generated a lot of extra hits on the homepage. Thanks, Hallvard.

Generally, this year’s challenges were too easy. Five of them were complex number functions, which are very simple. Trunc32 is also in the very easy category. LowerCase, GCD32, StrComp, RGBAToBGRA, RoundToEX, Int64Mul, Int64Div and CharPosEY are medium level.

AnsiStringReplace and AES are the most hardcore challenges, and AES has not reached its maximum speed yet. I expect that in 2005 we will make the fastest known AES core implementation by using MMX, SSE, SSE2 and SSE3 instructions.

I intend to raise the difficulty level of the challenges next year, but we will still need some easy ones, because the complex number function library has to be completed. This will bring us at least five more easy challenges. In 2005 we must make some challenges that are hard, some that are as useful to Delphi programmers as possible, and some that are fun. Hopefully we can make some challenges that are all three things at the same time.

The memory manager challenge has this potential. Pierre Le Riche took the initiative to start it, and it is the first challenge where I have not made the benchmark and validation tool. The challenge is in its very first, immature state, and we will have to put a lot more work into it in 2005. The validation part in particular is weak. The benchmark part is much better, and the memory manager entries are showing impressive results. I have a secret dream that the winner will make it into the next Delphi version. The present memory manager in Delphi is flawed and needs to be replaced: it is very slow and it scales poorly across multiple threads.

I have not managed the project well in 2004. There have been a lot of delays in the release of functions. I hope this will improve in 2005.

The weakest part of the project is the libraries. We are way behind in compiling winning functions into easy-to-use libraries. Early in the year Dennis Lauritzen made a lot of library units while he was working on the Fastcode PC Benchmark program. We have 22 units with flat function calls but only one with CPU ID based function selection. I hope we will catch up on library unit compilation in Q1 2005.
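To illustrate what CPU ID based function selection means, here is a minimal sketch in C. It is not the layout of the Fastcode library unit; the names fill_generic, fill_unrolled, cpu_has_sse2 and select_fill are hypothetical, and the "tuned" variant is only a stand-in for a hand-optimized routine. The idea is that a feature check runs once and the result picks which implementation a function pointer refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Common signature shared by all variants of the (hypothetical) fill routine. */
typedef void (*fill_fn)(unsigned char *dst, size_t n, unsigned char v);

/* Baseline variant: plain byte loop that runs on any CPU. */
static void fill_generic(unsigned char *dst, size_t n, unsigned char v)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = v;
}

/* Stand-in for a CPU-specific variant: the same result, unrolled four
   times. A real tuned routine would use instructions specific to the
   target CPU; that part is deliberately left out of this sketch. */
static void fill_unrolled(unsigned char *dst, size_t n, unsigned char v)
{
    size_t i = 0;
    assert(dst != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        dst[i] = v;
        dst[i + 1] = v;
        dst[i + 2] = v;
        dst[i + 3] = v;
    }
    for (; i < n; ++i)      /* remaining 0..3 bytes */
        dst[i] = v;
}

/* Feature check. On GCC and Clang for x86 this asks the compiler's
   CPUID helper; on any other compiler or target it reports 0, which
   selects the baseline variant. */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Pick the variant once; callers keep the returned pointer and call
   through it afterwards. */
static fill_fn select_fill(int has_sse2)
{
    return has_sse2 ? fill_unrolled : fill_generic;
}
```

A caller would typically do `fill_fn fill = select_fill(cpu_has_sse2());` at startup and then call `fill(buffer, size, value)` everywhere. A unit with flat function calls, by contrast, exposes each variant under its own name and leaves the choice to the programmer.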

Thanks go to Andriy Gerasika and John O’Harrow for making some unofficial libraries. They are only called unofficial because their layout does not conform to the official Fastcode standards.

The majority of Delphi users are not concerned about the performance of compiled code, but fastcoders are, and we are looking forward to having a native 64 bit compiler. The Quality Central report made by Will DeWitt Jr. got the highest number of votes of any report in the lifetime of QC. The report is open, which means that Borland is “working” on it, but I think that no real investigations have started yet. In Borland's view the market for native 64 bit applications is not yet big enough. Perhaps it is also true that the Borland compiler team is too small to handle a task as big as creating a new compiler. I would personally like to see more compiler optimizations in the Win32 compiler. The inlining feature in Delphi 2005 is a great thing, and I hope we will see more of this kind in the next major Delphi release. The FP code that the compiler emits performs especially badly, because it gets no optimizations at all except constant folding.

I wish that Borland would give us a way to plug a code optimizer module into the Delphi compiler. This way the Fastcode project could have a challenge on automated code optimization, and Borland could have a more aggressively optimizing compiler for free.

Fastcoders also have some wishes regarding new instructions in coming CPUs from AMD and Intel. We often find ourselves in a situation where the lack of proper instructions makes it necessary to write clumsy and slow code. The AES encryption/decryption core functions need access to the individual bytes within dwords, but the IA32 instruction set only allows us to access the individual bytes of the low word of a register. Of the 8 available registers only four have byte access: EAX, EBX, ECX and EDX allow byte access to their lowest 16 bits as AL, AH etc. If we could access the upper 16 bits without the need for a shift, then AES could be sped up dramatically. It should be mentioned that none of the MMX, SSE, SSE2 or SSE3 instruction sets has proper shuffle instructions for bytes either. Such a simple thing as swapping the R and B bytes in an RGB pixel is not possible with one instruction. We could mention many more examples of shortcomings and missing instructions in the Intel architecture, but we will end here with a wish for a CMOVcc al, bl instruction.
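The two byte-access problems described above can be sketched in portable C. This is only an illustration, not code from any Fastcode challenge entry: the function names are hypothetical, and the pixel layout (R in the lowest byte, then G, B and A) is an assumption. Extracting one of the upper two bytes of a dword needs a shift, and swapping R and B takes several mask and shift operations rather than a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return byte n of a 32-bit value, where n = 0 is the lowest byte.
   Bytes 0 and 1 correspond to the AL/AH style sub-registers on IA-32;
   bytes 2 and 3 have no such alias, so a shift is required first. */
static uint8_t byte_of(uint32_t x, unsigned n)
{
    assert(n < 4);
    return (uint8_t)(x >> (8u * n));
}

/* Swap the R and B bytes of one pixel.
   Assumed layout: bits 0..7 = R, 8..15 = G, 16..23 = B, 24..31 = A. */
static uint32_t swap_red_blue(uint32_t p)
{
    return (p & 0xFF00FF00u)            /* keep A and G in place   */
         | ((p & 0x000000FFu) << 16)    /* move R up to B's slot   */
         | ((p & 0x00FF0000u) >> 16);   /* move B down to R's slot */
}

/* Convert a buffer of pixels in place, one dword at a time. */
static void rgba_to_bgra(uint32_t *pixels, size_t count)
{
    assert(pixels != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        pixels[i] = swap_red_blue(pixels[i]);
}
```

Each pixel costs two masks, two shifts and two ORs in addition to the load and store, which is the kind of overhead a byte shuffle instruction would remove.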

I would say that 2004 was a good Fastcode year, but 2005 has the potential to be better. Thanks go to you all for an interesting year. I am looking forward to an even more interesting next year.

Happy New Year.

Regards
Dennis