Over the last few weeks and months I have seen a steep rise in people who want to give reverse engineering a try, so I figured I would help a few of them with a basic introduction to assembly language.
IMPORTANT! This is NOT a reverse engineering tutorial! These are just a few assembly tutorials; at the end I might show you some basic things you can do with OllyDBG. This series is intended to help you understand that nightmare of code which appears in Olly, so that you can then apply it to Conquer Online.
Update: (thanks to a friend of mine) Do you need an editor to work in 8086 assembly and don't like Notepad or Notepad++? Use Sublime Text. You will need to configure it yourself to make it work.
If you find any errors, and I'm sure you will, probably multiple times, please PM me or post here and I'll fix them. Thank you in advance.
This thread will be updated with more tutorials, of course.
So yeah, let's get started!
Lesson 0 - Compiling your first 8086 assembly program and setting up the dev environment
Okay, so the first thing you might ask: why 8086 assembly, and what is it? It is the assembly language of the Intel 8086, a very old 16-bit processor. This is not the 32-bit assembly you will find in your OllyDBG output, but it is easier for me to explain. If you understand 8086 assembly, I assure you that the transition to 32-bit x86 code written for MASM or NASM won't be hard at all.
In order to get started, we need an assembler (the "compiler" of the assembly world), a linker, and a debugger. I'm sure you're familiar with these terms if you are reading this; if not, please learn some basic C or C++, or even Pascal or something, first.
In this series we will use the Turbo Assembler (TASM), its linker (TLINK) and its debugger (TD) (the same ones used by Turbo Pascal and, if I recall it right, Borland C). This is ANCIENT, but it's much simpler to use than OllyDBG, and is more suited for our purposes here.
Please note: if you have a 64-bit operating system, you MUST use DOSBox to run these programs!
Okay, so please download the TASM.zip file attached to this thread. All done? Great. Create a folder and put all three .exe files in it, then create a new file in it called FIRST.ASM. Please note that this template will be used for almost all of our next tutorials.
The FIRST.ASM will be the simplest ASM program we can write currently :
PHP Code:
ASSUME CS:CODE, DS:DATA ; tells the assembler which segment holds our code and which one holds our variables
data SEGMENT ; here we will declare variables later
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables. We will use it later.
    mov ax, 4C00h ; We put 4Ch in the high byte of AX (AH), and 00h (the return value 0) in the low byte (AL).
    int 21h ; We ask DOS to run function 4Ch - exit program; we will talk about this later
code ENDS
end start
PHP Code:
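TASM /zi FIRST.ASM
TLINK /v FIRST.OBJ
TD FIRST.EXE
Okay, the three commands above assemble the source into FIRST.OBJ, link it into FIRST.EXE, and open it in Turbo Debugger. In TD you just press F7 to get to the next instruction; keep pressing it until your program ends.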
Congratulations! You just compiled your first assembly program which does absolutely nothing!
Lesson 1 - Basic arithmetic operations
In this tutorial, we will learn a bit about the CPU registers, and also about the basic arithmetic operations in 8086 assembly.
First of all, what are registers? Registers can be thought of as variables built into the CPU; there is a fixed number of them. On the 8086, the general-purpose registers are 16 bits long.
In the image above, we can see the CPU registers. Let`s explain what they are :
I will explain other registers as soon as we need to deal with them.
Okay, now we know what registers we are using, and we know that they are "variables" in the CPU. How can we use them? In this lesson, we will learn how to use the following operators : =, +=, -=, ++, -- from C or C++.
We have the following ASM commands to help us :
Okay! So let`s transcribe the following expression to be computed by our program :
(a+b)-(c+d)+e
We will also declare byte variables using the DB keyword. See the source!
In order to check if the program is correct, compile it like in Lesson 0. Then run it with Turbo Debugger (TD). When you're inside TD, go to View->Registers. This will open up the registers window, where you can keep track of your registers. Please note, though, that the registers' values are shown in HEXADECIMAL.
Congratulations, you just added numbers in ASM! Genius.
PHP Code:
AH (8 bits, the high BYTE of AX) and AL (8 bits, the low BYTE of AX) together form the AX (16 bits, WORD) register - the accumulator register. We will use this frequently; it's usually used with arithmetic operations and interrupts.
BH and BL form the BX register - see a pattern here? This is the base register. It is usually used as a pointer to the base of something, but we will use it for other stuff as well, of course.
CH and CL form the CX register - the counter register. Usually used as a counter for loops, or for operations like bit shifting. We will use it for other purposes as well.
DH and DL form the DX register - this is the data register. It is also often used as an extension of AX to 32 bits (DX:AX).
The FLAGS register: this is a 16 bit register in which the individual bits each represent a flag. We will use these later; I will only show you the Carry Flag and the Overflow Flag for now. The Carry Flag is set when an addition or subtraction produces a carry (or needs a borrow), i.e. the unsigned result does not fit (ex. we add 1 to 255 in the AL byte). The Overflow Flag is set when the signed result does not fit (ex. we add 1 to 127 in the AL byte).
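To make the AH / AL / AX relationship concrete, here is a minimal sketch built on the Lesson 0 template. The values 1234h and 0FFh are arbitrary picks for illustration, not anything special; step through it in TD with the registers window open and watch AX change.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov ax, 1234h   ; AX = 1234h, so AH = 12h and AL = 34h
    mov al, 0FFh    ; only the low byte changes: AX = 12FFh
    mov ah, 0       ; only the high byte changes: AX = 00FFh
    mov dx, ax      ; DX = 00FFh, AX is unchanged
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

The same pattern holds for BX, CX and DX with their high and low halves.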
PHP Code:
mov ax, bx ; copies the content of BX into AX. Note that the destination must be either a register or a memory location.
inc ax ; increments the value of ax by 1
dec ax ; decrements the value of ax by 1
add ax, bx ; performs the ax += bx operation. If the unsigned result doesn't fit in the first operand (this time ax), the Carry Flag is set to one; if the signed result doesn't fit, the Overflow Flag is set to one
sub ax, bx ; performs the ax -= bx operation. The Carry Flag is set to one if a borrow was needed (unsigned bx > ax); the Overflow Flag is set to one if the signed result doesn't fit.
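The difference between the Carry Flag and the Overflow Flag is easiest to see on byte-sized values. The following is only a sketch with hand-picked numbers (255, 127, 5 and 6 are my own choices); run it in TD and watch the flags after each step.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov al, 255
    add al, 1       ; AL = 00h: CF = 1 (unsigned 255+1 does not fit), OF = 0
    mov al, 127
    add al, 1       ; AL = 80h: OF = 1 (signed 127+1 does not fit), CF = 0
    mov al, 5
    sub al, 6       ; AL = FFh: CF = 1 (a borrow was needed), OF = 0
    mov al, 255
    inc al          ; AL = 00h, but INC and DEC leave CF untouched
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

Note that MOV never changes the flags, which is why the program in this lesson can safely execute a MOV between an ADD and the ADC that picks up its carry.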
PHP Code:
ASSUME CS:CODE, DS:DATA ; tells the assembler which segment holds our code and which one holds our variables
data SEGMENT ; here we declare our variables
    a db 5
    b db 6
    c db 8
    d db 9
    e db 15 ; we declare a,b,c,d,e as bytes
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables.
    mov al, a ; we put the value of a in AL
    add al, b ; we add b to al, AL = a+b
    ; The sum might not have fit in 8 bits! In this case it did, but just to be sure, let's widen AL to AX and add the carry digit into AH. If you don't have a clue what a carry digit is, please refer to textbooks from the 4th grade.
    mov ah, 0 ; unsigned conversion - ax = al (MOV does not change the flags)
    adc ah, 0 ; ADC - Add with carry - AH = AH + 0 + Carry Flag
    ; don't forget! AX = AH:AL, they form AX TOGETHER!
    mov bl, c
    add bl, d
    mov bh, 0
    adc bh, 0 ; bx = c+d
    sub ax, bx ; ax = ax-bx = (a+b)-(c+d)
    mov bl, e ; bl = e
    mov bh, 0 ; bx = e
    add ax, bx ; ax = (a+b) - (c+d) + e
    ; The result of the operation is in the AX register: (5+6)-(8+9)+15 = 9
    mov ax, 4C00h ; We put 4Ch in the high byte of AX (AH), and 00h (the return value 0) in the low byte (AL).
    int 21h ; We ask DOS to run function 4Ch - exit program; we will talk about this later
code ENDS
end start
Lesson 2 - Advanced arithmetic operations. Signed representations.
We've worked with unsigned values so far, although addition and subtraction work the same way on unsigned and signed values. In order to understand why this is so, we need to understand how numbers are represented in our registers.
Okay, so I think it`s pretty clear to you if you`re at this point that the computer is processing binary values, 1s and 0s. That immediately means that our CPU processes binary numbers. How are they represented? How to convert between bases?
Conversion between bases is really really easy. Take the value 13 for example. We will convert it to binary and hexadecimal.
Okay, now let`s try to convert from base 16 to base 10.
Okay! We see that numbers are represented in binary. So the value of the AL (8-bit) register for 13 (which is 1101 in binary) would be: 00001101. This works for positive numbers, but how do we deal with negative ones? We use their two's complement code. The two's complement code of -k on n bits is the binary representation of 2^n-k. So how would -3 look on 8 bits? It is the same bit pattern as 256-3, which is 253: 11111101, or FD in hexadecimal.
Okay, I`m sure you guessed what`s the problem with this. If 253 and -3 have the same representational value, how the hell do we know if it is -3 or 253? That`s our job! Do we want to use FD as -3? Use it as a signed value! Do we want to use 253? Use it as unsigned value! It sucks, I know, but it isn`t rocket science.
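Here is a small sketch of that idea, using only the instructions from Lesson 1. It produces -3 by computing 0 - 3 in AL and loads 253 into BL so the two can be compared in TD; the choice of registers is arbitrary.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov al, 0
    sub al, 3       ; AL = 0 - 3 = FDh (11111101b), CF = 1 because of the borrow
    mov bl, 253     ; BL = FDh as well: the same bits, read as unsigned
    add al, 3       ; FDh + 3 = 100h, which wraps around to AL = 00h in 8 bits
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

The last ADD is the reason addition works the same way for both readings: -3 + 3 = 0 and 253 + 3 = 256 leave the same 8 bits in AL.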
Division, multiplication, conversion from byte to word
Let's make this theory useful! Remember how I told you that the AX register is AH and AL in this order? That means that when we want to convert AL to AX, we just put 0 in AH. Seems logical. FOR UNSIGNED VALUES. The first (leftmost, most significant) bit of the register we're working with represents the sign of a signed number: 1 is negative, 0 is positive. If we set AH to 0, we set the first bit of AX (its sign bit) to 0 as well! We lose the sign!
Fortunately, there is a nice instruction in the assembly language which allows us to deal with this when using signed values.
Now that we know this, let`s learn how to use multiplication and division!
Example of usage :
Funny note: if the result of a byte division does not fit in AL (for example, the quotient would be 256 or more), you will get the same error as when dividing by zero. Why is that? The CPU raises the same divide error in both cases, because it has no way to store the quotient in the destination register.
Okay, you can use multiplication, division, and also learned something about binary and perhaps other bases. You`re technically ready to work at BitDefender.
PHP Code:
13 to binary
In order to convert into binary, we use repeated division by 2.
13 / 2 = 6, remainder : 1
6 / 2 = 3, remainder : 0
3 / 2 = 1, remainder : 1
1 / 2 = 0, remainder : 1
Now we take the remainders in REVERSE order. Our representation will be : 1101
13 to hexadecimal. (Hexadecimal digits : 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F)
13 / 16 = 0, remainder : 13
The digit for 13 in hexadecimal is D. So 13 in hexa is equal to D.
Let's see 234 in hexadecimal.
234 / 16 = 14, remainder : 10
14 / 16 = 0, remainder : 14
Now we take the remainders in reverse order, as hexadecimal digits. 14 is E, 10 is A. So 234 in hexadecimal is : EA.
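The repeated division above can also be done by the CPU itself. The sketch below is my own illustration, not part of the original attachments: the buffer name digits and the label nextDigit are made up, and it borrows DIV from later in this lesson, CMP/JNE from Lesson 3 and array indexing from Lesson 4, so feel free to come back to it afterwards.

```asm
ASSUME CS:CODE, DS:DATA
data SEGMENT
    digits db 8 dup(0)      ; hypothetical buffer for the remainders
data ENDS
code SEGMENT
start:
    mov ax, data
    mov ds, ax
    mov al, 13              ; the value to convert
    mov bl, 2               ; the base; put 16 here for hexadecimal digits
    mov si, 0
nextDigit:
    mov ah, 0               ; AX = AL, unsigned
    div bl                  ; AL = quotient, AH = remainder
    mov digits[si], ah      ; store the remainder
    inc si
    cmp al, 0
    jne nextDigit           ; repeat until the quotient is 0
    ; digits now holds 1, 0, 1, 1: read it backwards to get 1101
    mov ax, 4C00h           ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

In TD, View->Variables should show the digits buffer filling up one remainder per pass through the loop.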
PHP Code:
Let's convert AB3D to decimal.
AB3D (base 16) = D*(16^0) + 3*(16^1) + B*(16^2) + A*(16^3) = 13*1 + 3*16 + 11*256 + 10*4096 = 13 + 48 + 2816 + 40960 = 43837 (base 10)
PHP Code:
Convert byte to word : cbw
This instruction sign-extends AL into AX: AH is filled with copies of the sign bit of AL, so AX will have the same signed value as AL.
SIGNED CONVERSION.
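A quick side-by-side of the two conversions, as a sketch you can step through in TD. The byte FDh is the -3 / 253 example from earlier in this lesson.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov al, 0FDh    ; AL = FDh: 253 unsigned, -3 signed
    mov ah, 0       ; unsigned conversion: AX = 00FDh = 253
    mov al, 0FDh
    cbw             ; signed conversion: AX = FFFDh = -3
    mov al, 13
    cbw             ; the sign bit of AL is 0, so AX = 000Dh = 13
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

For non-negative bytes both conversions give the same result; they only differ when the top bit of AL is 1.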
PHP Code:
mul bl ; multiplies AL with the value of the BL register (or any other byte register or byte variable, but not a constant) and puts the result in AX
imul bl ; same thing, we use this when we work with SIGNED values
mul bx ; multiplies AX with the value in the BX register and puts the 32 bit result in DX:AX
imul bx ; same thing for signed
div bl ; divides AX by the value in BL, or any other byte register or variable will do. Puts the result in AL, and puts the remainder in AH
idiv bl ; same thing with signed representation
div bx ; divides DX:AX by the value in BX, or any other word register or variable. Puts the result in AX, and puts the remainder in DX
idiv bx ; same thing with signed representation
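The signed variants can be sketched like this. The operands -20 and 3 are just my example values; the important part is that AL is widened with cbw (not with mov ah, 0) before a signed byte division.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov al, -20     ; AL = ECh
    cbw             ; AX = FFECh = -20, ready to be divided by a byte
    mov bl, 3
    idiv bl         ; AL = FAh (-6), AH = FEh (-2): -20 = 3*(-6) + (-2)
    mov al, -20
    mov bl, 3
    imul bl         ; AX = AL * BL = FFC4h = -60
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

Notice that the quotient is rounded towards zero and the remainder takes the sign of the value being divided.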
PHP Code:
; We will compute : 15*100+20/10
mov ax, 15 ; AL = 15, AH = 0
mov bl, 100
mul bl ; now ax = al*bl = 15*100
mov cx, ax ; let's save the value of AX in CX for now
mov ax, 20
mov bl, 10
div bl
; Now AL = 20/10 and AH = 20%10
; we need only AL in this example
mov ah, 0 ; set ah to 0
add ax, cx ; The value of AX : (20/10)+(15*100) = 1502
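When the quotient might not fit in a byte, one way around the divide error is to use the word-sized division instead. This is a sketch with made-up numbers (1000 / 2 and 1001 / 10); the key step is clearing DX first, because div bx divides the whole DX:AX pair.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    ; 1000 / 2 = 500 does not fit in AL, so "div bl" would raise a divide error here
    mov ax, 1000
    mov dx, 0       ; DX:AX = 1000, high word cleared (unsigned)
    mov bx, 2
    div bx          ; AX = 500 (01F4h), DX = 0 (remainder)
    mov ax, 1001
    mov dx, 0
    mov bx, 10
    div bx          ; AX = 100 (0064h), DX = 1 (remainder)
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

Forgetting to set DX is a classic bug: whatever garbage is left in DX becomes the high 16 bits of the number being divided.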
Lesson 3 - Loops, logical statements
Okay, now that we can add, subtract, multiply and divide, we want to do these things multiple times since it is so awesome to do these repeatedly. In order to start with loops, we must understand comparisons first.
In the 8086 assembly language there are two operations which are specifically designed for such purposes, but you can use some other ones as well, you will see how. These two are :
PHP Code:
CMP AX, BX ; Compares AX to BX and sets the appropriate flags. Basically it is a nondestructive subtraction: it subtracts BX from AX without actually changing any of the registers.
TEST AX, BX ; ANDs AX with BX and sets the appropriate flags. It is a nondestructive bitwise AND operation (more on bitwise operations later), equivalent to the & binary operator in C++ (NOT &&!) with the result thrown away.
PHP Code:
ZF - zero flag - set to 1 if the last operation resulted in 0
SF - sign flag - a copy of the sign (highest) bit of the last result - meaningful when dealing with signed values only
OF - overflow flag - set to 1 if the last operation resulted in an overflow for signed numbers
CF - carry flag - set to 1 if the last operation resulted in an overflow for unsigned numbers
PF - parity flag - set to 1 if the low byte of the last result contains an even number of 1 bits
Okay, so what does this mean? For instance :
PHP Code:
mov ax, 5
cmp ax, 5 ; Compares AX to 5 - they are equal, so ZF is set to 1 and AX is still 5
sub ax, 5 ; ZF is set to 1 as well, since the result is 0, but this time AX really becomes 0
mov ah, 255
add ah, 1 ; CF is set to 1, since an unsigned overflow occurred
mov ah, 127
add ah, 1 ; OF is set to 1, since a signed overflow occurred (127+1 does not fit in a signed byte)
; et cetera
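The example above exercises CMP; TEST can be tried the same way. Here is a sketch that uses it for two typical jobs, checking whether a number is odd and checking whether its sign bit is set. The masks 1 and 80h are my picks for the lowest and highest bit of a byte.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov al, 6       ; 00000110b
    test al, 1      ; 6 AND 1 = 0, so ZF = 1 (even number); AL is still 6
    mov al, 7       ; 00000111b
    test al, 1      ; 7 AND 1 = 1, so ZF = 0 (odd number); AL is still 7
    mov al, 0FDh    ; 11111101b
    test al, 80h    ; the highest bit is set, so ZF = 0 and SF = 1
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

Because TEST throws the result away, the register can be tested against several masks in a row without reloading it.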
Okay, so we know how to compare stuff. This is useful: comparisons combined with jumps are the ifs and loops of assembly language.
What are jumps? Basically, a jump operation sets the IP register to the address of the target label. Sounds confusing? The IP register holds the address of the next instruction. A jump basically does what its name suggests: it jumps in the code. Let's see a very basic example:
PHP Code:
myLoop: ; this is a label. We use it to pin parts of the code
mov ax, 5
jmp myLoop ; jumps back to the label, so this little loop never ends
Okay, so there are LOTS of jump statements in assembly. Let`s list them here :
PHP Code:
JMP - unconditional jump
JZ - jumps if zero flag is set
JNZ - jumps if zero flag is not set
JE - jumps if the two things in the CMP are equal (the same instruction as JZ under another name)
JNE - almost the same, jumps if they are NOT equal (the same instruction as JNZ)
Jumps for dealing with unsigned interpretations :
JA - jumps if above, basically a > operator
JAE - jumps if above or equal, >=
JB - jumps if below, <
JBE - jumps if below or equal, <=
Jumps for dealing with signed interpretations :
JG - jumps if greater, >
JGE - jumps if greater or equal, >=
JL - jumps if less, <
JLE - jumps if less or equal, <=
Okay, so let`s see an example!
PHP Code:
ASSUME CS:CODE, DS:DATA ; tells the assembler which segment holds our code and which one holds our variables
data SEGMENT ; no variables needed this time
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables.
    mov ax, 100 ; AX = 100
MyLoop:
    cmp ax, 0
    je EndLoop ; when AX == 0, end the loop
    ; else do this
    sub ax, 10 ; AX -= 10
    jmp MyLoop ; back to the loop
EndLoop:
    mov ax, 4C00h ; We put 4Ch in the high byte of AX (AH), and 00h (the return value 0) in the low byte (AL).
    int 21h ; We ask DOS to run function 4Ch - exit program
code ENDS
end start
Okay! Feel free to try the other jumps.
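If you want a starting point for that, here is a sketch that shows why the signed and unsigned jumps are separate instructions. It compares the byte FDh with 5 twice; BL and BH are used as simple "was the jump taken" markers, and all the label names are made up for this example.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov bx, 0       ; BL and BH will record which jumps were taken
    mov al, 0FDh    ; 253 unsigned, -3 signed
    cmp al, 5
    ja isAbove      ; unsigned: 253 > 5, so this jump is taken
    jmp signedPart
isAbove:
    mov bl, 1       ; BL = 1: "above" held
signedPart:
    cmp al, 5
    jg isGreater    ; signed: -3 > 5 is false, so this jump is NOT taken
    jmp done
isGreater:
    mov bh, 1       ; never reached for this value
done:
    ; here BX = 0001h
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

Same bits, same CMP, different answer: which jump you pick decides whether FDh means 253 or -3.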
Now you can do logical statements and loops in assembly. You can basically do anything in the world, and conquer Mars.
Lesson 4 (really short one) - using arrays
Let's use arrays! This will be a really short example; let's dive right into the code because it's easier to understand.
Okay, so basically what the program in this lesson does is copy one array to another. To see if it works for you, use View->Variables in Turbo Debugger to look at the arrays.
What does byte ptr mean?
Byte ptr memoryAddress[5] means that we want the byte value stored at the address memoryAddress + 5. It's much like a byte pointer in C++ or even C#; please look up pointers if you don't understand this, as it is outside the scope of this series.
Using arrays fortunately is really easy. Now you can do all kinds of crazy stuff! Guess what: you know how to manipulate strings as well now.
PHP Code:
ASSUME CS:CODE, DS:DATA ; tells the assembler which segment holds our code and which one holds our variables
data SEGMENT ; here we declare our variables
    byteExample db 5 ; declare a byte named byteExample with the value 5
    byteArray db 1, 2, 3, 5, 6 ; declare a byte array [1,2,3,5,6]
    arrayLength EQU 5 ; let's save our length as well, as an assemble-time constant so DUP and CMP can use it
    newArray db arrayLength dup(?) ; sets up an empty byte array with length arrayLength. Every value in the array will be ?, undefined.
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables.
    mov cx, 0 ; set the counter to 0
    mov si, 0 ; source index = 0
    mov di, 0 ; destination index = 0
copyLoop:
    cmp cx, arrayLength
    jge endLoop ; stop once every element has been copied
    mov al, byte ptr byteArray[si]
    mov byte ptr newArray[di], al
    inc si
    inc di
    inc cx ; count the element we just copied
    jmp copyLoop
endLoop:
    mov ax, 4C00h ; We put 4Ch in the high byte of AX (AH), and 00h (the return value 0) in the low byte (AL).
    int 21h ; We ask DOS to run function 4Ch - exit program
code ENDS
end start
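As one more thing to try with arrays, here is a sketch that adds up the elements instead of copying them. It reuses the byteArray values from the program above; the labels sumLoop and sumDone are made up, and arrayLength is assumed to be an EQU constant.

```asm
ASSUME CS:CODE, DS:DATA
data SEGMENT
    byteArray db 1, 2, 3, 5, 6
    arrayLength EQU 5           ; assemble-time constant, assumed here
data ENDS
code SEGMENT
start:
    mov ax, data
    mov ds, ax
    mov ax, 0                   ; AX = running sum
    mov si, 0                   ; SI = index into the array
sumLoop:
    cmp si, arrayLength
    jae sumDone                 ; stop when SI >= arrayLength
    mov bl, byte ptr byteArray[si]
    mov bh, 0                   ; BX = current element, unsigned
    add ax, bx
    inc si
    jmp sumLoop
sumDone:
    ; here AX = 1 + 2 + 3 + 5 + 6 = 17 = 0011h
    mov ax, 4C00h               ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

The sum is kept in a word so that longer arrays do not overflow a single byte as quickly.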
Lesson 5 - some interrupts - print "Hello World"
Okay, so what are interrupts? Software interrupts are basically function calls into the BIOS or the operating system: you put a function number and its arguments into registers, execute INT, and get some functionality back, for example printing a character on the screen, opening a file, getting the current directory, et cetera. We already worked with interrupts in the previous examples, when we asked DOS to exit the program with function 4Ch of interrupt 21h and return code 0 (mov ax, 4C00h, int 21h). There are LOTS of interrupt functions, and it would be impossible for me to show you all of them. Thankfully, there's this ancient little application called Norton Guide which documents most of them. I will attach the program to the thread; you can download it and browse it yourself.
Okay, let's see an example. Let's use function 02h of the DOS interrupt 21h (int 21h) to put a single character on the screen.
The first program iterates through a string and prints it character by character. Then let's see another example.
The second program prints a whole string with a single call, using function 09h. How nice.
PHP Code:
ASSUME CS:CODE, DS:DATA
data SEGMENT
    helloString db 'Hello world!'
    helloLength EQU 12 ; number of characters, as an assemble-time constant
data ENDS
code SEGMENT
start:
    mov ax, data
    mov ds, ax ; remember, set up the data
    mov ah, 02h ; AH = 02h tells DOS we want function 02h of int 21h - print a character on the console
    mov si, 0 ; let's set the source index to 0
printLoop:
    cmp si, helloLength
    jge endLoop
    mov dl, byte ptr helloString[si] ; put a character from helloString into the DL register - this is where function 02h of int 21h expects it - check Norton Guide for further reference
    int 21h ; Tell DOS we want to print on the screen
    inc si ; move on to the next character
    jmp printLoop ; continue looping
endLoop:
    mov ax, 4C00h
    int 21h ; yay, another interrupt!
code ENDS
end start
PHP Code:
ASSUME CS:CODE, DS:DATA
data SEGMENT
    anotherString db 'Hello world again!$' ; this string needs to be ended with $
data ENDS
code SEGMENT
start:
    mov ax, data
    mov ds, ax
    lea dx, anotherString ; lea loads an address instead of a value: it puts the offset of anotherString (inside the data segment) into DX.
    mov ah, 09h ; Tell DOS we want to use function 09h of the int 21h interrupt - print a string (ending with $)
    int 21h ; Tell DOS to do stuff
    mov ax, 4C00h
    int 21h ; yay, another interrupt!
code ENDS
end start
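Function 02h also gives us a cheap way to show a small number on the screen, which ties the interrupts back to the arithmetic lessons. The sketch below is my own addition and only works for a single digit (0 to 9): adding the character '0' turns the value into its ASCII character.

```asm
ASSUME CS:CODE
code SEGMENT
start:
    mov dl, 7       ; the number to print, must be 0..9 for this trick
    add dl, '0'     ; '0' is 30h, so DL = 37h, the character '7'
    mov ah, 02h     ; DOS function 02h: print the character in DL
    int 21h
    mov ax, 4C00h   ; exit to DOS with return code 0
    int 21h
code ENDS
end start
```

For bigger numbers you would first split the value into digits with repeated division by 10, like in Lesson 2, and print the remainders in reverse order.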