[Guide] ASM 101 - Introduction to the Assembly language

KraHen · 01/29/2014, 12:49

Hello, fellow epvpers.

In the last few weeks/months I have seen a steep rise in people wanting to give reverse engineering a try, so I figured I'd help a few people with a basic introduction to the assembly language.

IMPORTANT! This is NOT a reverse engineering tutorial! These will be just a few assembly tutorials; at the end I might show you some basic things you can do with OllyDBG. This series is intended to help you understand that nightmare of code which appears in Olly, so that you can then apply it to Conquer Online.

Update: (thanks to a friend of mine) Do you need an editor to work in 8086 assembly and don't like Notepad or Notepad++? Use Sublime Text. You will have to set it up for assembly yourself, though.

If you find any errors, and I'm sure you will, probably multiple times, please PM me or post here and I'll fix them. Thank you in advance.

This thread will be updated with more tutorials, of course.

So yeah, let's get started!

Lesson 0 - Compiling your first 8086 assembly program and setting up the dev environment


Okay, so the first thing you might ask: why 8086 assembly, and what is it? 8086 assembly is the language of Intel's old 16 bit 8086 processor. This is not the assembly you will find in your OllyDBG output (that one is 32 bit x86), but it is easier for me to explain. If you understand 8086 assembly, I assure you that the transition to 32 bit assembly with MASM or NASM won't be hard at all.

In order to get started, we need an assembler (the compiler of the assembly world), a linker, and a debug environment. I'm sure you're familiar with these terms if you are reading this; if not, please learn some basic C or C++, or even Pascal, first.

In this series we will use Turbo Assembler (TASM) together with its linker (TLINK) and debugger (TD), the same ones used by Turbo Pascal and, if I recall correctly, Borland C. This is ANCIENT, but it's much simpler to use than OllyDBG, and better suited for our purposes here.

Please note: if you have a 64 bit operating system, you MUST use DOSBox to run these programs, because 64 bit Windows cannot run 16 bit DOS programs directly!

Okay, so please download the TASM.zip file attached to this thread. All done? Great. Create a folder and put all three .exe files in it, then create a new file in it called FIRST.ASM. Please note that this template will be used for almost all of our next tutorials.

FIRST.ASM will be the simplest assembly program we can write for now:

Code:

ASSUME CS:CODE, DS:DATA ; tells the assembler which segment holds our code and which one holds our variables

data SEGMENT ; here we will declare variables later

data ENDS

code SEGMENT
start: ; this would be your int main() { return 0; } in C++
mov ax, data
mov ds, ax     ; these two lines set up the DS register to point to our variables. We will use it later.

mov ax, 4C00h ; We put 4Ch in the high byte of the AX register (AH), and 00 (the return value 0) in its low byte (AL).
int 21h ; We ask DOS to run function 4Ch (exit program); we will talk about this later

code ENDS
end start

Save this, and open a command prompt in the given folder (or navigate to it with DOSBox). Then type the following commands:

Code:

TASM /zi FIRST.ASM
TLINK /v FIRST.OBJ
TD FIRST.EXE

This tells our assembler and linker (tasm and tlink) to put debug information into the exe, so that the source code can be seen in Turbo Debugger. If we didn't use /zi and /v, the debugger could only show us the raw disassembly of the machine code (not what we want for debugging).

Okay, in TD you just press F7 to get to the next instruction, keep pressing it until your program ends.

Congratulations! You just compiled your first assembly program which does absolutely nothing!
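A side note on the mov ax, 4C00h line in FIRST.ASM: AX is a 16 bit register made of two 8 bit halves, AH (high byte) and AL (low byte). The sketch below is a hypothetical variant of FIRST.ASM that loads the two halves separately, just to show that it ends up with the same value in AX. It has no data segment because it declares no variables, and it assumes the same TASM/TLINK setup as above.

```asm
ASSUME CS:CODE

code SEGMENT
start:
mov ah, 4Ch ; high byte of AX: the DOS function number (4Ch = exit program)
mov al, 00h ; low byte of AX: the return value 0, so AX now holds 4C00h
int 21h     ; ask DOS to run the function selected in AH

code ENDS
end start
```

If you step through it in TD with F7, you should see AX change to 4C00h after the second mov, exactly as it does after the single mov ax, 4C00h in FIRST.ASM.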

Lesson 1 - Basic arithmetic operations


Lesson 2 - Advanced arithmetic operations. Signed representations.


Lesson 3 - Loops, logical statements


Lesson 4 (really short one) - using arrays


Let's use arrays! This will be a really short example, so let's dive right into the code because it's easier to understand.

Code:

ASSUME CS:CODE, DS:DATA ; tells the assembler which segment holds our code and which one holds our variables

data SEGMENT ; our variables live here
byteExample db 5 ; declare a byte named byteExample with the value 5
byteArray db 1, 2, 3, 5, 6 ; declare a byte array [1,2,3,5,6]
arrayLength dw 5 ; let's save our length as well (as a word, so it can be compared with the 16 bit CX)

newArray db 5 dup(?) ; reserves an uninitialized byte array of length 5. The dup count must be a constant, so we cannot use the arrayLength variable here.
data ENDS

code SEGMENT
start: ; this would be your int main() { return 0; } in C++
mov ax, data
mov ds, ax     ; these two lines set up the DS register to point to our variables.

mov cx, 0 ; set the counter to 0

mov si, 0  ; source index = 0
mov di, 0  ; destination index = 0

copyLoop:
cmp cx, arrayLength ; did we copy every element already?
jge endLoop
mov al, byte ptr byteArray[si] ; read one byte from the source array
mov byte ptr newArray[di], al  ; and write it to the destination array
inc si
inc di
inc cx ; count the copied element, otherwise the loop would never end
jmp copyLoop

endLoop:

mov ax, 4C00h ; We put 4Ch in the high byte of the AX register (AH), and 00 (the return value 0) in its low byte (AL).
int 21h ; We ask DOS to run function 4Ch (exit program)

code ENDS
end start

Okay, so basically what this does is copy one array to another. To see if it works for you, use View->Variables in Turbo Debugger to look at the arrays.

What does byte ptr mean?

byte ptr memoryAddress[5] means that we want the byte value stored at the address memoryAddress + 5. It's much like a byte pointer in C++ or even C#; please look up pointers if you don't understand this, as they are outside the scope of this series.
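To make the size part of byte ptr more concrete, here is a minimal sketch that reads from the same address once as a byte and once as a word. The variable name values and the numbers in it are made up for this illustration, and word ptr (the 16 bit counterpart of byte ptr) is not used elsewhere in these lessons.

```asm
ASSUME CS:CODE, DS:DATA

data SEGMENT
values db 12h, 34h, 56h, 78h ; four bytes stored one after another
data ENDS

code SEGMENT
start:
mov ax, data
mov ds, ax

mov si, 1
mov al, byte ptr values[si] ; one byte from address values + 1, so AL = 34h
mov bx, word ptr values[si] ; two bytes from address values + 1, so BX = 5634h (the low byte is stored first)

mov ax, 4C00h
int 21h
code ENDS
end start
```

Stepping through this in TD shows that the address is the same in both reads; byte ptr and word ptr only decide how many bytes are taken from it.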

Using arrays fortunately is really easy. Now you can do all kinds of crazy stuff! Guess what: you know how to manipulate strings as well now.

Lesson 5 - some interrupts - print "Hello World"


Okay, so what are interrupts? For our purposes, interrupts are basically functions provided by the BIOS or the operating system: you put the function number and its arguments into registers, then use the int instruction to request some functionality, for example printing a character on the screen, opening a file, getting the current directory, et cetera. We already worked with an interrupt in the previous examples, when we asked DOS to exit the program with function 4Ch of interrupt 21h and return code 0 (mov ax, 4C00h, int 21h). There are LOTS of interrupt functions, and it would be impossible for me to show you all of them. Thankfully, there's this ancient little application called Norton Guide which has most of them. I will attach the program to the thread, so you can download it and browse it yourself.
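To make the calling pattern concrete, here is a minimal sketch that uses two int 21h functions in a row: function 01h waits for a key press and returns the character in AL, and function 02h prints the character found in DL. The choice of these two functions is mine; the program assumes the same TASM setup as the earlier lessons and needs no data segment.

```asm
ASSUME CS:CODE

code SEGMENT
start:
mov ah, 01h ; function 01h of int 21h: wait for a key press (DOS echoes it), the character comes back in AL
int 21h

mov dl, al  ; function 02h expects the character to print in DL
mov ah, 02h ; function 02h of int 21h: print the character in DL
int 21h

mov ax, 4C00h ; function 4Ch of int 21h: exit with return code 0
int 21h
code ENDS
end start
```

The pattern is always the same: function number in AH, arguments in the registers that function expects, then int 21h.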

Okay, let's see an example. Let's use a DOS function from interrupt 21h (int 21h) to put the characters of a string on the screen, one at a time.

Code:

ASSUME CS:CODE, DS:DATA

data SEGMENT
helloString db 'Hello world!'
helloLength dw 12 ; a word, so it can be compared with the 16 bit SI
data ENDS

code SEGMENT
start:
mov ax, data
mov ds, ax ; remember, set up the data

mov ah, 02h ; 02h is 02 in hexa, which is what we need to put in AH in order to tell DOS we want function 02h of int 21h - print a character on the console

mov si, 0 ; let's set the source index to 0

printLoop:
cmp si, helloLength
jge endLoop
mov dl, byte ptr helloString[si] ; put a character from helloString into the DL register - this is where function 02h of int 21h expects it - check Norton Guide for further reference
int 21h ; Tell DOS we want to print on the screen
inc si ; move on to the next character, otherwise the loop would never end
jmp printLoop ; continue looping
endLoop:

mov ax, 4C00h
int 21h ; yay, another interrupt!
code ENDS
end start

Okay, we iterated through a string and printed it. Let's see another example.

Code:

ASSUME CS:CODE, DS:DATA

data SEGMENT
anotherString db 'Hello world again!$' ; this string needs to be ended with $
data ENDS

code SEGMENT
start:
mov ax, data
mov ds, ax

lea dx, anotherString ; lea is a kind of mov instruction. Instead of the value, it puts the memory address of anotherString to DX.
mov ah, 09h ; Tell DOS we want to use the 09h function from the int 21h interrupt - print a string (ending with $)
int 21h ; Tell DOS to do stuff

mov ax, 4C00h
int 21h ; yay, another interrupt!
code ENDS
end start

Okay, as you can see, we printed a string again! How nice.
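A side note on the lea line: for a plain variable like this one, the same address can also be loaded with mov and the offset operator. The sketch below is the same program with that one change; the name offsetString and its text are made up for this illustration.

```asm
ASSUME CS:CODE, DS:DATA

data SEGMENT
offsetString db 'Same address, different instruction$' ; still has to end with $
data ENDS

code SEGMENT
start:
mov ax, data
mov ds, ax

mov dx, offset offsetString ; puts the address of offsetString into DX, like lea dx, offsetString would
mov ah, 09h ; function 09h of int 21h - print a string (ending with $)
int 21h

mov ax, 4C00h
int 21h
code ENDS
end start
```

Both forms leave the address of the string in DX, which is all that function 09h needs.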

~~Y u k i~~ · 01/29/2014, 14:05

My brains :S This is worse than lisp.

KraHen · 01/29/2014, 14:14

Try Haskell.

Lateralus · 01/29/2014, 14:51

(((((Not even close to being worse than lisp.)))))()()()()

Not a bad tutorial. I certainly wouldn't start off learning x86 as an introduction to assembly though. I'd start off with a RISC architecture first for learning general assembly concepts (which is why they usually teach MIPS in school), then go to x86.

KraHen · 01/29/2014, 19:18

@Lateralus: Yeah, it would be a viable choice, but I'm not really familiar with either of them, and 8086 seems closer to 32-bit assembly as well. Also thank you.

Added lesson 2!

Spirited · 01/29/2014, 22:55

If I could recommend something, you can make and run assembly programs for windows using MASM and Textpad 7. I'll post a program that uses it, if I remember to get it off my college network drive tomorrow.

KraHen · 01/29/2014, 22:57

This code won't run with a MASM compiler, nor will it run with an 8086 emulator. If someone wants to convert it to MASM though, it won't be hard at all.

Spirited · 01/29/2014, 22:58

Quote:

Originally Posted by KraHen

This code won't run with a MASM compiler, nor will it run with an 8086 emulator. If someone wants to convert it to MASM though, it won't be hard at all.

Right, which is why I was offering to post a program for it. Should I not then? I don't want to ruin your little tutorial.

KraHen · 01/29/2014, 23:23

Quote:

Originally Posted by Spirited Fang

Right, which is why I was offering to post a program for it. Should I not then? I don't want to ruin your little tutorial.

You absolutely should, and you're welcome to do so.

Also added lesson 3 in the meantime!

Spirited · 01/30/2014, 00:20

Awesome! I'll do that tomorrow at the lab.
Btw, the lessons are looking really nice. Good job. Nothing I'll post will be at the level you're making lessons for, but hopefully it'll help for anyone interested in seeing it for Windows using MASM.

KraHen · 01/30/2014, 14:08

I'm eager to see it as well.

In the meantime, added a few more guides!

~~OverKill.~~ · 01/31/2014, 17:10

keep it up, nice tutorials buddy

Spirited · 01/31/2014, 17:37

Alright. Sorry, I forgot yesterday. Here's the make file for my little program:

Code:

# Declare & Initialize Constants:
EXECUTABLE = Lab1.exe		# The executable file.
ASSEMBLY = Lab1.asm		# The assembly file.
LINKER_INPUT = Lab1.ilk		# The linker input file.
PROJ_DEBUG = Lab1.pdb		# The project debugger file.
OBJECT_FILE = Lab1.obj		# The object file for assembly.
LIST_FILE = Lab1.lst		# The list file.


ALL: $(EXECUTABLE)


CLEAN:
	-@erase $(EXECUTABLE)
	-@erase $(LINKER_INPUT)
	-@erase $(PROJ_DEBUG)
	-@erase $(OBJECT_FILE)
	-@erase $(LIST_FILE)

	
$(ASSEMBLY):


$(OBJECT_FILE): $(ASSEMBLY)
	ml /c /coff /Zi $(ASSEMBLY)

		
# If the object file, executable, kernel, or io object has changed, 
# remake the executable file:
$(EXECUTABLE): $(OBJECT_FILE) 
	link /debug /subsystem:console /out:$(EXECUTABLE) \
		/entry:start $(OBJECT_FILE) KERNEL32.LIB IO.OBJ

And here's the assembly file:

Code:

.386

; The memory model:
.MODEL FLAT

; We don't know where this prototype is, but it's in the address space
; (NEAR32), and we're passing in a dword parameter.
ExitProcess PROTO NEAR32 stdcall, dwExitCode:dword

include io.h

cr EQU 0dh; cr = carriage return
lf EQU 0ah; lf = line feed

; Remember, memory is broken up into four types:
; Stack, data, code, and heap.
.STACK 4096

.DATA
szPrompt1 BYTE "Enter first number: ",0
szPrompt2 BYTE "Enter second number: ",0
szLabel1 BYTE "The sum is:",0
dwNumber1 DWORD ? 		; numbers to be added
dwNumber2 DWORD ?
szString BYTE 16 DUP(?) 	; input string for numbers
szSum BYTE 12 DUP(0) 		; sum in string form
szNewline BYTE cr,lf,0

.CODE
_start:
	output szPrompt1 	; prompt for the first number
	input szString, 16 	; input first number as ASCII
	atod szString 		; convert to integer (ASCII to decimal)
	mov dwNumber1, eax 	; and store in memory.
	output szPrompt2 	; repeat for second number
	input szString, 16 	; input second number as ASCII
	atod szString 		; convert to integer - always goes to eax
	mov dwNumber2, eax 	; and store in memory.
	mov eax, dwNumber1
	add eax, dwNumber2 	; add second number to first number
	dtoa szSum, eax 	; convert to ASCII
	output szLabel1 	; output label and results
	output szSum
	output szNewline
	INVOKE ExitProcess,0
	
PUBLIC _start
END

KraHen · 01/31/2014, 19:49

Quote:

include io.h

I wish we had this in my lab as well lol.

Spirited · 01/31/2014, 20:17

Quote:

Originally Posted by KraHen

I wish we had this in my lab as well lol.

The magic of the MASM linker.