[Guide] ASM 101 - Introduction to the Assembly language
Hello, fellow epvpers.
In the last few weeks/months I have seen a steep rise in people who want to give reverse engineering a try, so I figured I'd help a few people with a basic introduction to the assembly language.
IMPORTANT! This is NOT a reverse engineering tutorial!!! These will be just a few assembly tutorials, then at the end I might show you some basic things you can do with OllyDBG. This series is intended to help you understand that nightmare of code which appears in Olly, so that you can then apply this to Conquer Online.
Update : (Thanks to a friend of mine) Do you need an editor to work in 8086 assembly and don't like Notepad or Notepad++? Use Sublime_text. You will need to make it work.
If you find any errors, and I'm sure you will, probably multiple times, please PM me or post it here and I'll fix it. Thank you in advance.
This thread will be updated with more tutorials, of course.
So yeah, let`s get started!
Lesson 0 - Compiling your first 8086 assembly program and setting up the dev environment
Okay, so the first thing you might ask : why 8086 assembly and what is it? 8086 assembly is the language of the Intel 8086, a very old 16 bit processor. This is not the assembly you will find in your OllyDBG output (that one is 32 bit), but this is easier to explain for me. If you understand 8086 assembly I assure you that the transition to 32 bit assembly written for MASM or NASM won't be hard at all.
In order to get started, we need an assembler (the "compiler" of assembly code), a linker, and a debug environment. I'm sure you're familiar with these terms if you are reading this; if not, please learn some basic C or C++, or even Pascal or something first.
In this series we will use the Turbo Assembler (TASM) compiler, linker and debugger (it is the same one used by Turbo Pascal, and if I recall it right, Borland C). This is ANCIENT, but it's much simpler to use than OllyDBG, and is more suited for our purposes here.
Please note : if you have a 64 bit operating system, you MUST use DOSBOX to run these programs!
Okay, so please download the TASM.zip file attached to this thread. All done? Great. Create a folder and put all three .exe files in it, then create a new file in it called FIRST.ASM. Please note that this template will be used for almost all of our next tutorials.
The FIRST.ASM will be the simplest ASM program we can write currently :
PHP Code:
ASSUME CS:CODE, DS:DATA ; makes the compiler aware of where our code is written and where our variables will be declared
data SEGMENT ; here we will declare variables later
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables. We will use it later.
    mov ax, 4C00h ; We put 4C in the high bits of the AX register, and 00 (the return value 0) in the low bits of this.
    int 21h ; We tell DOS to process our call to the 4C function - exit program, we will talk about this later
code ENDS
end start
Save this, and open a command prompt in the given folder (or navigate to it with DOSBOX). Then type the following commands:
tasm /zi FIRST.ASM
tlink /v FIRST.OBJ
td FIRST.EXE
The /zi and /v switches tell our compiler and linker (tasm and tlink) to put debug information in the exe, so that the source code can be seen in the turbo debugger (td). If we don't use /zi and /v, the debugger can only show the disassembled machine code (not what we want for debugging).
Okay, in TD you just press F7 to get to the next instruction, keep pressing it until your program ends.
Congratulations! You just compiled your first assembly program which does absolutely nothing!
Lesson 1 - CPU registers and basic arithmetic operations
In this tutorial, we will learn a bit about the CPU registers, and also about the basic arithmetic operations in 8086 assembly.
First of all, what are registers? Registers can be thought of as variables built into the CPU, and there is a fixed number of them. On the 8086, the general purpose registers are 16 bits long.
In the image above, we can see the CPU registers. Let`s explain what they are :
PHP Code:
AH(8 bits high BYTE of AX) and AL(8 bits low BYTE of AX) form together the AX(16 bits WORD) register - the accumulator register. We will use this frequently, it`s usually used with arithmetic operations and interrupts.
BH and BL form the BX register - see a pattern here? This is the base register. It is usually used as a pointer to the base of something, but we will use it for other stuff as well, of course.
CH and CL form the CX register - the counter register. Usually used as a counter for loops, or operations like bit shifting. We will use it for other purposes as well.
DH and DL form the DX register - This is the data register. It is also often used as an extension of AX to 32 bits (written DX:AX).
The FLAGS register : This is a 16 bit register, where the bits each represent a flag. We will use these later, I will only show you the Carry Flag and the Overflow Flag for now. The Carry Flag is set when an addition or subtraction produces a carry (transport) digit, which means the unsigned result did not fit (ex. we try to add 1 to 255 in the AL byte). The Overflow Flag is set when the signed result did not fit (ex. we try to add 1 to 127 in the AL byte).
I will explain other registers as soon as we need to deal with them.
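If the high byte / low byte idea is hard to picture, here is a small sketch in Python (not assembly, and not part of the TASM toolchain) that treats a 16 bit register as a plain integer. The helper names high_byte, low_byte and join are made up for this illustration.

```python
# Model a 16 bit register as a plain integer and split it into its halves.
# high_byte, low_byte and join are hypothetical helper names.

def high_byte(reg16):
    """Return the upper 8 bits (what AH is to AX)."""
    return (reg16 >> 8) & 0xFF

def low_byte(reg16):
    """Return the lower 8 bits (what AL is to AX)."""
    return reg16 & 0xFF

def join(high, low):
    """Build the 16 bit value back from its two halves."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

ax = 0x4C00
ah, al = high_byte(ax), low_byte(ax)
print(f"AX={ax:04X}h AH={ah:02X}h AL={al:02X}h")
```

So writing 4C00h into AX is the same as writing 4Ch into AH and 00h into AL, which is exactly what the exit call in Lesson 0 relies on.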
Okay, now we know what registers we are using, and we know that they are "variables" in the CPU. How can we use them? In this lesson, we will learn how to use the following operators : =, +=, -=, ++, -- from C or C++.
We have the following ASM commands to help us :
PHP Code:
mov ax, bx ; copies the content of bx into ax. Note that the destination must either be a register or a memory location.
inc ax ; increments the value of ax by 1
dec ax ; decrements the value of ax by 1
add ax, bx ; performs the ax += bx operation. If the result doesn't fit in the first operand (this time ax), the Carry Flag is set to one for unsigned values and the Overflow Flag is set to one for signed values
sub ax, bx ; performs the ax -= bx operation. The Carry Flag is set to one if an unsigned borrow was needed, the Overflow Flag is set to one if the signed result doesn't fit in the first operand.
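To get a feeling for when the two flags fire, here is a rough Python model of an 8 bit add. It is only a sketch of the CPU behaviour, and add8 is a made-up name.

```python
# Sketch of an 8 bit add / adc and the two flags it produces.
# add8 is a hypothetical helper, not an instruction or a library call.

def add8(x, y, carry_in=0):
    """Return (result, CF, OF) for an 8 bit addition."""
    x &= 0xFF
    y &= 0xFF
    total = x + y + carry_in
    result = total & 0xFF
    # Carry Flag: the unsigned result needed a ninth bit
    cf = 1 if total > 0xFF else 0
    # Overflow Flag: both operands have the same sign bit, the result has another
    of = 1 if ((x ^ result) & (y ^ result) & 0x80) else 0
    return result, cf, of

print(add8(255, 1))   # unsigned overflow: result wraps to 0, CF = 1
print(add8(127, 1))   # signed overflow: result is 128 (-128 signed), OF = 1
print(add8(5, 6))     # fits both ways, no flag
```

Passing the carry of one addition as carry_in to the next one is what adc does.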
Okay! So let`s transcribe the following expression to be computed by our program :
(a+b)-(c+d)+e
We will also declare byte variables using the DB keyword. See the source!
PHP Code:
ASSUME CS:CODE, DS:DATA ; makes the compiler aware of where our code is written and where our variables will be declared
data SEGMENT ; here we declare our variables
    a db 5
    b db 6
    c db 8
    d db 9
    e db 15 ; we declare a,b,c,d,e as bytes
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables.
    mov al, a ; we put the value of a in AL
    add al, b ; we add b to al, AL = a+b
    ; an overflow could have happened! In this case there was none, but just to be sure, lets use AL as AX and add the transport digit to AX. If you don't have a clue what a transport digit is, please refer to textbooks from the 4th grade.
    mov ah, 0 ; unsigned conversion - ax = al (mov does not touch the Carry Flag)
    adc ah, 0 ; ADC - Add with carry - adds 0 plus the carry value to AH
    ; don't forget! AX = AH | AL, they form AX TOGETHER!
    mov bl, c
    add bl, d
    mov bh, 0
    adc bh, 0 ; bx = c+d
    sub ax, bx ; ax = ax-bx = (a+b)-(c+d)
    mov bl, e ; bl = e
    mov bh, 0 ; bx = e
    add ax, bx ; ax = (a+b) - (c+d) + e
    ; The result of the operation is in the AX register
    mov ax, 4C00h ; We put 4C in the high bits of the AX register, and 00 (the return value 0) in the low bits of this.
    int 21h ; We tell DOS to process our call to the 4C function - exit program, we will talk about this later
code ENDS
end start
In order to check if the program is correct, compile it like in Lesson 0. Then run it with Turbo Debugger (TD). When you're inside TD, go to View->Registers. This will open up the registers window, where you can keep track of your registers. Please note though that the registers' values are shown in HEXADECIMAL.
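If you want to know what to expect in the registers window, here is a quick Python sketch of the same computation with 16 bit wraparound. It assumes the values a=5, b=6, c=8, d=9, e=15 from the source above; to16 is a made-up helper.

```python
# Recompute (a+b)-(c+d)+e the way a 16 bit register would hold it.
a, b, c, d, e = 5, 6, 8, 9, 15

def to16(value):
    """Keep only the low 16 bits, like a 16 bit register does."""
    return value & 0xFFFF

ax = to16(a + b)    # 11
bx = to16(c + d)    # 17
ax = to16(ax - bx)  # -6 does not exist unsigned, it wraps around to FFFAh
ax = to16(ax + e)   # adding 15 wraps back around to 9
print(f"AX = {ax:04X}h")
```

The intermediate FFFAh is not a bug: it is -6 written on 16 bits, which Lesson 2 explains.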
Congratulations, you just added numbers in ASM! Genius.
Lesson 2 - Advanced arithmetic operations. Signed representations.
We've worked with unsigned values so far, although addition and subtraction work the same way on unsigned and signed values. In order to understand why this is so, we need to understand how numbers are represented in our registers.
Okay, so I think it`s pretty clear to you if you`re at this point that the computer is processing binary values, 1s and 0s. That immediately means that our CPU processes binary numbers. How are they represented? How to convert between bases?
Conversion between bases is really really easy. Take the value 13 for example. We will convert it to binary and hexadecimal.
PHP Code:
13 to binary
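In order to convert into binary, we use repeated division by 2 and write down the remainders: 13 / 2 = 6 remainder 1, 6 / 2 = 3 remainder 0, 3 / 2 = 1 remainder 1, 1 / 2 = 0 remainder 1. Reading the remainders from last to first gives 1101. Hexadecimal works the same way with division by 16: 13 / 16 = 0 remainder 13, and 13 is written as D.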
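The repeated division can be written down in a few lines of Python if you want to try other numbers. This is just a sketch, and to_base is a made-up name.

```python
# Convert a non-negative number to another base by repeated division.
# to_base is a hypothetical helper name.

def to_base(number, base):
    digits = "0123456789ABCDEF"
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, base)
        remainders.append(digits[remainder])
    # the first remainder is the LAST digit, so read them backwards
    return "".join(reversed(remainders))

print(to_base(13, 2))    # binary
print(to_base(13, 16))   # hexadecimal
```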
Okay! We see that numbers are represented in binary. So the value of the AL (8-bit) register for 13 (which is 1101 in binary) would be : 00001101. This works for positive numbers, but how do we deal with negative ones? We use their two's complement code. The two's complement code of -k on n bits is the binary representation of 2^n-k. So how would -3 look on 8 bits? It is stored the same way as 256-3, which is 253 : 11111101, or FD in hexadecimal.
Okay, I`m sure you guessed what`s the problem with this. If 253 and -3 have the same representational value, how the hell do we know if it is -3 or 253? That`s our job! Do we want to use FD as -3? Use it as a signed value! Do we want to use 253? Use it as unsigned value! It sucks, I know, but it isn`t rocket science.
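Here is a tiny Python sketch of the two readings of the same byte. The names encode and as_signed are made up for the example.

```python
# One bit pattern, two interpretations.
# encode and as_signed are hypothetical helper names.

def encode(value, bits=8):
    """Two's complement bit pattern of value on the given number of bits."""
    return value & ((1 << bits) - 1)

def as_signed(pattern, bits=8):
    """Read a bit pattern as a signed number."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern

pattern = encode(-3)
print(f"{pattern:08b} = {pattern:02X}h")           # the bits stored in the register
print("unsigned:", pattern, "signed:", as_signed(pattern))
```

The register only ever holds the pattern; which of the two numbers it "is" depends on the instructions you choose to use on it.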
Division, multiplication, conversion from byte to word
Let's make this theory useful! Remember how I told you that the AX register is AH and AL in this order? That means that when we want to convert AL to AX, we just put 0 in AH. Seems logical. FOR UNSIGNED VALUES. The highest (leftmost) bit of the register we're working with represents the sign of the number: 1 is negative, 0 is positive. If we set AH to 0, we set the highest bit of AX (the sign bit) to 0 as well! We lose the sign!
Fortunately, there is a nice command in the Assembly language which allows us to deal with this using signed values.
PHP Code:
Convert byte to word : cbw
This command converts AL to AX by copying the sign bit of AL into every bit of AH. AX will have the same signed value as AL.
SIGNED CONVERSION.
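To see the difference between "put 0 in AH" and cbw, here is a small Python model. It only imitates what the instruction does to the bits; zero_extend and cbw are made-up Python function names.

```python
# Widening AL to AX: unsigned way versus signed way.

def zero_extend(al):
    """mov ah, 0 : AH becomes 00h, good for unsigned values."""
    return al & 0xFF

def cbw(al):
    """cbw : AH becomes FFh if the sign bit of AL is set, else 00h."""
    al &= 0xFF
    ah = 0xFF if al & 0x80 else 0x00
    return (ah << 8) | al

al = 0xFD  # 253 unsigned, -3 signed
print(f"zero extended: {zero_extend(al):04X}h")  # still 253
print(f"sign extended: {cbw(al):04X}h")          # -3 on 16 bits
```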
Now that we know this, let`s learn how to use multiplication and division!
PHP Code:
mul bl ; multiplies AL by the value of the BL register (or any other byte register or variable, but not a constant) and puts the result in AX
imul bl ; same thing, we use this when we work with SIGNED values
mul bx ; multiplies AX by the value in the BX register (or any other word register or variable) and puts the result in DX:AX
imul bx ; same thing for signed
div bl ; divides AX by the value in BL, or any other byte value will do. Puts the result in AL, and puts the remainder in AH
idiv bl ; same thing with signed representation
div bx ; divides DX:AX by the value in BX, or any other word value. Puts the result in AX, and puts the remainder in DX
idiv bx ; same thing with signed representation
Example of usage :
PHP Code:
; We will compute : 15*100+20/10
mov ax, 15
mov bl, 100
mul bl ; now ax = 15*100
mov cx, ax ; let's save the value of AX in CX for now
mov ax, 20
mov bl, 10
div bl
; Now AL = 20/10 and AH = 20%10
; we need only AL in this example
mov ah, 0 ; set ah to 0
add ax, cx ; The value of AX : (20/10)+(15*100)
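Here is a Python sketch of the byte versions of mul and div, so you can check the example by hand. mul8, div8 and DivideError are made-up names, and the model only covers the unsigned instructions.

```python
# Sketch of the unsigned byte-sized mul and div.

class DivideError(Exception):
    """Stands in for the divide error the CPU raises."""

def mul8(al, operand):
    """mul with a byte operand: AX = AL * operand (always fits in 16 bits)."""
    return (al & 0xFF) * (operand & 0xFF)

def div8(ax, operand):
    """div with a byte operand: returns (AL, AH) = (quotient, remainder)."""
    operand &= 0xFF
    if operand == 0:
        raise DivideError("division by zero")
    quotient, remainder = divmod(ax & 0xFFFF, operand)
    if quotient > 0xFF:
        raise DivideError("quotient does not fit in AL")
    return quotient, remainder

cx = mul8(15, 100)      # 15*100
al, ah = div8(20, 10)   # 20/10 and 20%10
ax = al + cx
print(f"AX = {ax} = {ax:04X}h")
```

Note that div8(256, 1) raises the same error as a division by zero would, because the quotient 256 does not fit in a byte.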
Funny note : If the result of a division doesn't fit in its destination (for example, div bl would have to put 256 or more in AL), you will get a division by zero error. Why is that? The CPU raises the same divide error for a result that is too large as it does when dividing by zero.
Okay, you can use multiplication, division, and also learned something about binary and perhaps other bases. You`re technically ready to work at BitDefender.
Lesson 3 - Comparisons and jumps
Okay, now that we can add, subtract, multiply and divide, we want to do these things multiple times since it is so awesome to do these repeatedly. In order to start with loops, we must understand comparisons first.
In the 8086 assembly language there are two operations which are specifically designed for such purposes, but you can use some other ones as well, you will see how. These two are :
PHP Code:
CMP AX, BX ; Compares AX to BX and sets the appropriate flags. Basically it is a nondestructive subtraction: it subtracts BX from AX without actually changing any of the registers.
TEST AX, BX ; ANDs AX with BX and sets the appropriate flags without storing the result. It is a nondestructive bitwise AND operation (more on bitwise operations later). Equivalent to the & binary operator in C++ (NOT &&!)
Okay, so what flags are we talking about? Remember Lesson 1 about the registers? We talked about the flags register, which has some significant bits in it. The more important ones (for now) are:
PHP Code:
ZF - zero flag - set to 1 if the last operation resulted in 0
SF - sign flag - the sign bit of the last operation's result - used when dealing with signed values only
OF - overflow flag - set to 1 if the last operation resulted in an overflow for signed numbers
CF - carry flag - set to 1 if the last operation resulted in an overflow for unsigned numbers
PF - parity flag - set to 1 if the low byte of the last operation's result contains an even number of 1 bits
There are a few more flags, but we will only use these for now, we`ll see the rest later.
Okay, so what does this mean? For instance :
PHP Code:
mov ax, 5
cmp ax, 5 ; Compares AX to 5 - they are equal, so ZF is set to 1 and AX is not changed
sub ax, 5 ; ZF set to 1, since the result is 0
mov ah, 255
add ah, 1 ; CF is set to 1, since an unsigned overflow occurred
mov ah, 127
add ah, 1 ; OF is set to 1, since a signed overflow occurred (127+1 doesn't fit in a signed byte)
; et cetera
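If you want to predict the flags before stepping through TD, here is a rough Python model of what cmp does on bytes. It is a sketch of the CPU rules, not real code from any tool, and cmp8 is a made-up name.

```python
# Flags produced by an 8 bit cmp x, y (a subtraction that throws away the result).

def cmp8(x, y):
    x &= 0xFF
    y &= 0xFF
    result = (x - y) & 0xFF
    return {
        "ZF": int(result == 0),
        "SF": result >> 7,                                 # copy of the sign bit
        "CF": int(x < y),                                  # an unsigned borrow was needed
        "OF": int(bool((x ^ y) & (x ^ result) & 0x80)),    # signed result did not fit
        "PF": int(bin(result).count("1") % 2 == 0),        # even number of 1 bits
    }

print(cmp8(5, 5))      # equal: ZF = 1
print(cmp8(3, 5))      # below: CF = 1, and the result is negative so SF = 1
print(cmp8(0x80, 1))   # -128 - 1 does not fit in a signed byte: OF = 1
```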
You can see how this can be useful. NOTE : These flags are shown in the register panel in Turbo Debugger
Okay, so we know how to compare stuff. This is useful: comparisons combined with jumps are the ifs and loops of assembly language.
What are jumps? Basically, a jump operation sets the IP register to the address of the target label. Sounds confusing? The IP register holds the address of the next operation. A jump basically does what its name suggests : it jumps in the code. Let's see a very basic example :
PHP Code:
myLoop: ; this is a label. We use it to pin parts of the code
    mov ax, 5
    jmp myLoop
Run this in TD, and you will see that after jmp it jumps back to the myLoop label - an infinite loop. YES! That`s how loops are done in assembly!
Okay, so there are LOTS of jump statements in assembly. Let`s list them here :
PHP Code:
JMP - unconditional jump
JZ - jumps if zero flag is set
JNZ - jumps if zero flag is not set
JE - jumps if the two things in the CMP are equal (the same instruction as JZ)
JNE - jumps if they are NOT equal (the same instruction as JNZ)
Jumps for dealing with unsigned interpretations :
JA - jumps if above, basically a > operator
JAE - jumps if above or equal, >=
JB - jumps if below, <
JBE - jumps if below or equal, <=
Jumps for dealing with signed interpretations :
JG - jumps if greater, >
JGE - jumps if greater or equal, >=
JL - jumps if less, <
JLE - jumps if less or equal, <=
Please note that in order for the conditional jumps to work, the label must be within -128 to +127 bytes of the instruction after the jump (so don't put them far). The unconditional JMP can reach further, and there are far jumps as well, but we will deal with those later.
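Why two families of jumps? Because the same byte can mean two different numbers. Here is a Python sketch that derives JA and JG from the flags a cmp would set. The flag rules are the standard ones (JA: CF=0 and ZF=0, JG: ZF=0 and SF=OF); the function names are made up.

```python
# JA (unsigned) versus JG (signed) after an 8 bit cmp x, y.

def flags_after_cmp(x, y):
    x &= 0xFF
    y &= 0xFF
    r = (x - y) & 0xFF
    zf = int(r == 0)
    sf = r >> 7
    cf = int(x < y)
    of = int(bool((x ^ y) & (x ^ r) & 0x80))
    return zf, sf, cf, of

def ja_taken(x, y):
    zf, sf, cf, of = flags_after_cmp(x, y)
    return cf == 0 and zf == 0      # above: no borrow and not equal

def jg_taken(x, y):
    zf, sf, cf, of = flags_after_cmp(x, y)
    return zf == 0 and sf == of     # greater: not equal and sign agrees with overflow

# FDh is 253 unsigned but -3 signed
print("FDh above 3?  ", ja_taken(0xFD, 3))
print("FDh greater 3?", jg_taken(0xFD, 3))
```

Picking the wrong family is a classic bug: the comparison itself is identical, only the jump decides how the bytes are read.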
Okay, so let`s see an example!
PHP Code:
ASSUME CS:CODE, DS:DATA ; makes the compiler aware of where our code is written and where our variables will be declared
data SEGMENT ; here we will declare variables later
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables. We will use it later.
    mov ax, 100 ; AX = 100
MyLoop:
    cmp AX, 0
    je EndLoop ; when AX == 0, end the loop
    ; else do this
    sub AX, 10 ; AX -= 10
    jmp MyLoop ; back to the loop
EndLoop:
    mov ax, 4C00h ; We put 4C in the high bits of the AX register, and 00 (the return value 0) in the low bits of this.
    int 21h ; We tell DOS to process our call to the 4C function - exit program, we will talk about this later
code ENDS
end start
This little thing sets AX to 100 and loops. On every iteration it subtracts 10 from AX. When AX is 0, it exits the loop. Not really good for anything, but you get the idea.
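For comparison, this is roughly the same loop written in Python, with the matching assembly instruction next to each line. run_loop is a made-up name.

```python
# The cmp / je / sub / jmp loop, spelled out step by step.

def run_loop(start, step):
    ax = start & 0xFFFF
    iterations = 0
    while True:
        if ax == 0:                  # cmp ax, 0  +  je EndLoop
            break
        ax = (ax - step) & 0xFFFF    # sub ax, step (16 bit wraparound)
        iterations += 1              # jmp MyLoop
    return ax, iterations

print(run_loop(100, 10))
```

The loop only ends when AX hits exactly 0, because je tests for equality and nothing else.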
Okay! Feel free to try the other jumps.
Now you can do logical statements and loops in assembly. You can basically do anything in the world, and conquer Mars.
Lesson 4 - Arrays
Let`s use arrays! This will be a really short example, let`s dive right into the code because it`s easier to understand.
PHP Code:
ASSUME CS:CODE, DS:DATA ; makes the compiler aware of where our code is written and where our variables will be declared
data SEGMENT ; here we declare our variables
    byteExample db 5 ; declare a byte named byteExample with the value 5
    byteArray db 1, 2, 3, 5, 6 ; declare a byte array [1,2,3,5,6]
    arrayLength EQU 5 ; lets save our length as well - as a constant (EQU), so we can use it as a dup count and compare it with CX
    newArray db arrayLength dup(?) ; sets up an empty byte array with length arrayLength. Every value in the array will be ?, undefined.
data ENDS
code SEGMENT
start: ; this would be your int main() { return 0; } in C++
    mov ax, data
    mov ds, ax ; these two lines set up the DS register to point to our variables.
    mov cx, 0 ; set the counter to 0
    mov si, 0 ; source index = 0
    mov di, 0 ; destination index = 0
copyLoop:
    cmp cx, arrayLength
    jge endLoop
    mov al, byte ptr byteArray[si]
    mov byte ptr newArray[di], al
    inc si
    inc di
    inc cx ; count the copied element, otherwise the loop never ends
    jmp copyLoop
endLoop:
    mov ax, 4C00h ; We put 4C in the high bits of the AX register, and 00 (the return value 0) in the low bits of this.
    int 21h ; We tell DOS to process our call to the 4C function - exit program, we will talk about this later
code ENDS
end start
Okay, so basically what this does is copy one array to another. To see if it works for you, use View->Variables in Turbo Debugger to see the arrays.
What does byte ptr mean?
Byte ptr memoryAddress[5] means that we want the byte value stored at the address memoryAddress + 5. It's much like a byte pointer in C++ or even C#. Please look up pointers if you don't understand this, since they are outside the scope of this series.
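If the "address + index" idea is still fuzzy, here is a Python sketch where the data segment is just a bytearray and the variable names are offsets into it. The offsets and the layout are assumptions made up for this illustration (the length is treated as a constant that takes no memory).

```python
# The data segment as a flat block of bytes; names are only offsets.
memory = bytearray([5, 1, 2, 3, 5, 6]) + bytearray(5)

BYTE_EXAMPLE = 0   # hypothetical offset of byteExample
BYTE_ARRAY = 1     # hypothetical offset of byteArray
NEW_ARRAY = 6      # hypothetical offset of newArray
ARRAY_LENGTH = 5   # a constant, not stored in memory

si = di = cx = 0
while cx < ARRAY_LENGTH:
    al = memory[BYTE_ARRAY + si]    # mov al, byte ptr byteArray[si]
    memory[NEW_ARRAY + di] = al     # mov byte ptr newArray[di], al
    si += 1                         # inc si
    di += 1                         # inc di
    cx += 1                         # inc cx

print(list(memory[NEW_ARRAY:NEW_ARRAY + ARRAY_LENGTH]))
```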
Fortunately, using arrays is really easy. Now you can do all kinds of crazy stuff! Guess what : you know how to manipulate strings as well now.
Lesson 5 - Interrupts
Okay, so what are interrupts? The interrupts we use here are basically function calls to the BIOS or to the operating system: we put data in the registers and request some functionality, for example, print a character on the screen, open a file, get the current directory, et cetera. We already worked with interrupts in the previous examples, when we requested DOS to exit the program with function 4Ch of interrupt 21h and return code 0 (mov ax, 4C00h, int 21h). There are LOTS of interrupts, and it would be impossible for me to show you all of them. Thankfully, there's this ancient little application called Norton Guide which has most of them. I will attach the program to the thread, you can download it and browse it yourself.
Okay, let's see an example. Let's use DOS function 02h of interrupt 21h (int 21h) to put single characters on the screen.
PHP Code:
ASSUME CS:CODE, DS:DATA
data SEGMENT
    helloString db 'Hello world!'
    helloLength EQU 12 ; a constant, so we can compare it with SI
data ENDS
code SEGMENT
start:
    mov ax, data
    mov ds, ax ; remember, set up the data
    mov ah, 02h ; 02h is 02 in hexa, which is what we need to put in AH in order to tell DOS we want to use function 02h - print a character on the console
    mov si, 0 ; let's set the source index to 0
printLoop:
    cmp si, helloLength
    jge endLoop
    mov dl, byte ptr helloString[si] ; put a character from helloString into the DL register - we need to put it here, this is what function 02h of int 21h needs - check Norton Guide for further reference
    int 21h ; Tell DOS we want to print on the screen
    inc si ; move to the next character, otherwise the loop never ends
    jmp printLoop ; continue looping
endLoop:
    mov ax, 4C00h
    int 21h ; yay, another interrupt!
code ENDS
end start
Okay, we iterated through a string and printed it. Let`s see another example.
PHP Code:
ASSUME CS:CODE, DS:DATA
data SEGMENT
    anotherString db 'Hello world again!$' ; this string needs to be ended with $
data ENDS
code SEGMENT
start:
    mov ax, data
    mov ds, ax
    lea dx, anotherString ; lea is a kind of mov instruction. Instead of the value, it puts the memory address of anotherString to DX.
    mov ah, 09h ; Tell DOS we want to use the 09h function from the int 21h interrupt - print a string (ending with $)
    int 21h ; Tell DOS to do stuff
    mov ax, 4C00h
    int 21h ; yay, another interrupt!
code ENDS
end start
Okay, as you can see, we printed a string again! How nice.
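Why does the string need the $ at the end? Function 09h only gets an address, not a length, so it keeps printing bytes until it meets a $. Here is a Python sketch of that behaviour; dos_print_string is a made-up name and the bytes object stands in for the data segment.

```python
# Model of int 21h, AH=09h: read bytes from an address until '$'.

def dos_print_string(memory, offset):
    """Return the text DOS would print, starting at offset."""
    out = []
    while memory[offset] != ord("$"):   # the terminator itself is not printed
        out.append(chr(memory[offset]))
        offset += 1
    return "".join(out)

data = b"Hello world again!$"
print(dos_print_string(data, 0))
```

If you forget the $, DOS just keeps going through whatever comes next in memory until it finds one.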
(((((Not even close to being worse than lisp.)))))()()()()
Not a bad tutorial. I certainly wouldn't start off learning x86 as an introduction to assembly though. I'd start off with a RISC architecture first for learning general assembly concepts (which is why they usually teach MIPS in school), then go to x86.
@Lateralus : Yeah, it would be a viable choice, but I`m not really familiar with either of them, and 8086 seems closer to 32-bit assembly as well. Also thank you.
If I could recommend something, you can make and run assembly programs for windows using MASM and Textpad 7. I'll post a program that uses it, if I remember to get it off my college network drive tomorrow.
This code won`t run with a MASM compiler, nor will it run with a 8086 emulator. If someone wants to convert it to MASM though, it won`t be hard at all.
Right, which is why I was offering to post a program for it. Should I not then? I don't want to ruin your little tutorial.
Awesome! I'll do that tomorrow at the lab.
Btw, the lessons are looking really nice. Good job. Nothing I'll post will be at the level you're making lessons for, but hopefully it'll help for anyone interested in seeing it for Windows using MASM.
Alright. Sorry, I forgot yesterday. Here's the make file for my little program:
Code:
# Declare & Initialize Constants:
EXECUTABLE = Lab1.exe # The executable file.
ASSEMBLY = Lab1.asm # The assembly file.
LINKER_INPUT = Lab1.ilk # The linker input file.
PROJ_DEBUG = Lab1.pdb # The project debugger file.
OBJECT_FILE = Lab1.obj # The object file for assembly.
LIST_FILE = Lab1.lst # The list file.
ALL: $(EXECUTABLE)
CLEAN:
	-@erase $(EXECUTABLE)
	-@erase $(LINKER_INPUT)
	-@erase $(PROJ_DEBUG)
	-@erase $(OBJECT_FILE)
	-@erase $(LIST_FILE)
$(ASSEMBLY):
$(OBJECT_FILE): $(ASSEMBLY)
	ml /c /coff /Zi $(ASSEMBLY)
# If the object file, executable, kernel, or io object has changed,
# remake the executable file:
$(EXECUTABLE): $(OBJECT_FILE)
	link /debug /subsystem:console /out:$(EXECUTABLE) \
	/entry:start $(OBJECT_FILE) KERNEL32.LIB IO.OBJ
And here's the assembly file:
Code:
.386
; The memory model:
.MODEL FLAT
; We don't know where this prototype is, but it's in the address space
; (NEAR32), and we're passing in a dword parameter.
ExitProcess PROTO NEAR32 stdcall, dwExitCode:dword
include io.h
cr EQU 0dh; cr = carriage return
lf EQU 0ah; lf = line feed
; Remember, memory is broken up into four types:
; Stack, data, code, and heap.
.STACK 4096
.DATA
szPrompt1 BYTE "Enter first number: ",0
szPrompt2 BYTE "Enter second number: ",0
szLabel1 BYTE "The sum is:",0
dwNumber1 DWORD ? ; numbers to be added
dwNumber2 DWORD ?
szString BYTE 16 DUP(?) ; input string for numbers
szSum BYTE 12 DUP(0) ; sum in string form
szNewline BYTE cr,lf,0
.CODE
_start:
output szPrompt1 ; prompt for the first number
input szString, 16 ; input first number as ASCII
atod szString ; convert to integer (ASCII to decimal)
mov dwNumber1, eax ; and store in memory.
output szPrompt2 ; repeat for second number
input szString, 16 ; input second number as ASCII
atod szString ; convert to integer - always goes to eax
mov dwNumber2, eax ; and store in memory.
mov eax, dwNumber1
add eax, dwNumber2 ; add second number to first number
dtoa szSum, eax ; convert to ASCII
output szLabel1 ; output label and results
output szSum
output szNewline
INVOKE ExitProcess,0
PUBLIC _start
END