[Poll] Reading data from text file or database?

Discussion on [Poll] Reading data from text file or database? within the SRO Coding Corner forum part of the Silkroad Online category.

View Poll Results: Reading data from text file or database
Text file 12 35.29%
Database 22 64.71%
Voters: 34.

Old   #1
 
 
elite*gold: 0
Join Date: Jan 2010
Posts: 1,484
Received Thanks: 809
[Poll] Reading data from text file or database?

Hi,

I'm currently facing a little choice about how I should read Silkroad's data, so I hope I can get a clear answer with this thread/poll.

As you might know, I'm using C++, and currently I read the data into memory as one big list of, for example, NPC position objects.

Yesterday I created a little test: I inserted all the NPC positions into a MySQL database (version 5.1) and used 10 different NPC IDs to fetch the positions. I ran this test 1000 times, and the average time was 16 milliseconds for those 10 IDs.

For the list of NPC objects I used a linear search to keep it simple, and the result was 15 milliseconds for 10 IDs.

I could use a faster search (e.g. binary search), but the IDs would still be a bit of a problem, because they aren't unique.

So technically the text file should be faster than the database. But it would be easier to just use the database. And for something like an emulator, you want speed.
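The in-memory linear search described above might look roughly like this (the `NpcPos` struct and `find_positions` are my assumptions for illustration, not the actual test code):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical NPC position record; the real layout isn't shown in the post.
struct NpcPos {
    std::uint32_t id;   // not unique: one NPC id can appear at several spots
    float x, y, z;
};

// Linear search: collect every position matching `id`, since ids repeat.
std::vector<NpcPos> find_positions(const std::vector<NpcPos>& npcs,
                                   std::uint32_t id) {
    std::vector<NpcPos> hits;
    for (const NpcPos& n : npcs)
        if (n.id == id)
            hits.push_back(n);
    return hits;
}
```

Because ids aren't unique, the scan has to walk the whole list even after a hit, which is what makes the linear version O(n) per lookup.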

Well what's your opinion?
kevin_owner is offline  
Old 03/31/2011, 15:13   #2
 
 
elite*gold: 0
Join Date: Dec 2007
Posts: 2,400
Received Thanks: 1,517
It doesn't really matter.
Just load all the IDs into a List (or array) when you start the server.
It will make the loading time slightly longer, but it's a huge performance win compared to searching a database/file every time some function needs a value (like NPC chat IDs).
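The one-time load at startup that lesderid suggests could be sketched like this (the line format, `NpcChat`, and `load_chat_table` are assumptions for illustration):

```cpp
#include <sstream>
#include <string>
#include <vector>

struct NpcChat {
    int id;
    std::string text;
};

// Parse "id text..." lines once at startup; afterwards every lookup
// touches only this in-memory vector, never the file or database again.
std::vector<NpcChat> load_chat_table(std::istream& in) {
    std::vector<NpcChat> table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        NpcChat row;
        if (fields >> row.id && std::getline(fields >> std::ws, row.text))
            table.push_back(row);
    }
    return table;
}
```

The parse cost is paid once; per-request code then only indexes into the vector.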
lesderid is offline  
Old 03/31/2011, 15:24   #3
 
elite*gold: 0
Join Date: Nov 2007
Posts: 959
Received Thanks: 602
Well, if I use C#, I prefer txt files because they're easier to handle (for me).
Dunno about C++, but I think you shouldn't load it into a database ^^
vorosmihaly is offline  
Old 03/31/2011, 15:53   #4
 
 
elite*gold: 0
Join Date: Mar 2009
Posts: 443
Received Thanks: 597
Quote:
Originally Posted by kevin_owner View Post
Well what's your opinion?
Use a database; it makes your life much easier. And the little speed difference is not important.
Shadowz75 is offline  
Old 03/31/2011, 18:22   #5
 
 
elite*gold: 0
Join Date: Jan 2010
Posts: 1,484
Received Thanks: 809
Thank you for your responses.

So can I assume that everybody agrees with the following?
Text files are faster but a little harder to use than a database.
So: text file = more speed, database = easier.

By the way, interesting result in the poll, since 5 people voted for database and only 1 for text file.
kevin_owner is offline  
Old 03/31/2011, 20:03   #6
 
 
elite*gold: 0
Join Date: Jun 2009
Posts: 76
Received Thanks: 147
Well, I prefer reading from text files; it's better in this case. There's the Unicode issue, but parsing the text files is pretty easy.
npcdoom is offline  
Old 03/31/2011, 20:29   #7
 
 
elite*gold: 0
Join Date: Sep 2010
Posts: 134
Received Thanks: 41
Text files are tricky. Why don't you convert the data to something more convenient? I feel better reading a struct directly from a binary file on disk.

For example, NPC positions: why should they be in a database? That data isn't going to change, and string parsers waste CPU cycles.

You could try what I did (it isn't the best option, granted, but sharing improves your own stuff). When I have huge amounts of data to process without killing the computer, I create 2 files: one holding IDs plus offsets into the second, and another with all the data together.
At runtime, I just load the one that holds IDs and offsets; when I need something for a specific ID, I read from the second file at that specific offset.

To generate those 2 files I write a tool (nowadays I mostly use Python for tools).
I hope you can picture this kind of "system"; it's the basis for most of the file systems game developers use.
bootdisk is offline  
Old 04/01/2011, 08:56   #8
 
 
elite*gold: 0
Join Date: Jan 2010
Posts: 1,484
Received Thanks: 809
Thank you for your answers.

What I did with my text files is convert them to ASCII. (I hope this doesn't change too much, since only the Korean chars are now "??".)

I also stripped them down a bit to remove the unused columns.

@bootdisk: that's a nice approach which I'd never thought of, thanks. It also saves memory, since you don't load everything into memory and you skip the entries you never use.

By the way, I made a really stupid mistake in my test: I ran it in debug mode. Once I changed it to release, the results were a lot different. I ran the same test again and it still took 16 milliseconds for the database, but 0 milliseconds for the text files, even with an inefficient loop. So I changed the text file test to search for 1000 times more NPCs, and it took only 81 milliseconds.

So I guess text files are way faster than a database.
kevin_owner is offline  
Old 04/01/2011, 09:38   #9
 
 
elite*gold: 20
Join Date: Mar 2007
Posts: 4,277
Received Thanks: 2,990
Use Media.pk2 for the internal game data, just like sro_client does.
InvincibleNoOB is offline  
Old 04/01/2011, 16:28   #10

 
elite*gold: 260
Join Date: Aug 2008
Posts: 560
Received Thanks: 3,780
Quote:
Originally Posted by kevin_owner View Post
So I guess that text files are way faster than a database.
As a programmer, you have to learn to be more scientific with your research and analysis and come to more realistic conclusions rather than just take what you see on the surface and apply it to the entire subject.

Right now, it's like you are using a fork to cut a stick of butter in half and coming to the conclusion that a fork can cut as fast as a knife, simply because of one test. Cutting a stick of butter with a fork is as fast as using a knife simply because of how small sticks of butter are. However, you know that logic is flawed because if you had to cut a cake with a fork, it'd not be as fast or as clean as using a knife.

The same is true of comparing flat files to a database. You can't take one test's results and then come to a broad conclusion. You can only come to a conclusion in context of what you tested. This is because the test might have been flawed (as yours currently is) so you come to the wrong general conclusion and then create a system based on wrong results and that comes back to hurt you later on.

Now, to get to why your current test is flawed.

The first stage of the test would be opening the data for access. For a flat file, that means calling fopen or the open member function of an input stream. For a database, that's calling the correct connect function. You can benchmark these if you want, but the results are only important if you frequently perform that logic in your design. I.e., if it takes a couple of seconds to connect to a database, but you only need to connect once at startup, it's irrelevant compared to a flat file taking no time.

The second stage of the test would be loading the data to memory, since that's what you seem to want to be doing here. With a flat file, you just read the data into your linear array so it is ready for searching. However, for a database, you incorrectly benchmarked a different task. The concept of querying a database for data vs searching a linear array for it has nothing to do with flat file access vs. database access. It has to do with in memory access vs. database access. Obviously, with such a small set of data, memory searching is significantly faster than database searching simply by design. The correct thing to do would be to query the entire database table once and then load that into an array.

As you can see, if you actually do that, there is nothing to benchmark really because the solution to the problem is the same for both methods. That is, you load data from one storage medium (flat files / database) into memory and then search it. The time it takes to open/connect and then load/query can be measured, but the searching itself will be the same.

In the context of your original test, you can't really compare flat file searching to database searching because of how Windows' file caching works. You'd always get skewed results, since the file contents would always be in memory, so access times are a lot faster than if the cache had to be flushed and the file reloaded each search. I came across that problem when writing one of my PK2 APIs using memory mapped files, for example. It greatly changed the results of my tests!

Another reason you can't accurately benchmark loading a flat file into memory and compare it to a database, generally speaking, is that you will hit physical memory limitations with flat files that you won't with a database. Let's say you had 10 GB of database data and 10 GB of flat file data. Unless you actually had a 64-bit system and a lot of RAM, you could not run your current benchmark of loading a flat file to memory and searching it linearly vs. querying a database. Even if you could, which do you think would win then? I'd be willing to put my money on the DB.

This is why you have to be more scientific in your benchmarks. You can't say "flat files are faster than a database" because that statement makes no sense. You can say, "given a small set of data, accessing data in memory is faster than querying a database each time" and that would be generally true according to your test results. However, just because you come to those results doesn't mean they are necessarily always true. The settings of the database do matter as well as the system running the tests.

Lastly, generally you use "flat files" to describe free standing files in a system. Text files are a specific type of flat files in which data is represented as, well, text. There are other formats you can use. Text files are not that efficient either when it comes to the type of data you are working with. Binary files would make for a better choice since you process it once, dump it to a file, then can load it straight to memory without any additional processing overhead like there is with text files. Note that there will be additional overhead with database and type conversions as well in some cases! You have to factor that in to a benchmark as well according to how the database access semantics you are using work.

For your project, if all you need to do is load static data into memory and then search by a 'key', then preprocessed binary flat files are the way to go. They can be loaded from disk to memory the fastest and require no additional processing. You do not have to do a linear search on them, though. You can simply load into a vector or a list and then store iterators in a map that maps the id to the object iterator in the list/vector. That way, you have fast lookup times at only the small overhead cost of maintaining the map, a tradeoff well worth it to avoid performing a linear search each time.
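The lookup structure pushedx describes could be sketched as follows (the `Npc`/`NpcStore` names are assumptions). Since std::list iterators stay valid across insertions, the map can safely hold them:

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>

struct Npc {
    std::uint32_t id;
    float x, y, z;
};

class NpcStore {
public:
    // Keep the object in a list (push_back never invalidates iterators)
    // and index it by id for O(1) average lookup instead of a linear scan.
    void add(const Npc& npc) {
        npcs_.push_back(npc);
        by_id_.emplace(npc.id, std::prev(npcs_.end()));
    }

    // Returns nullptr when the id is unknown.
    const Npc* find(std::uint32_t id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &*it->second;
    }

private:
    std::list<Npc> npcs_;
    std::unordered_map<std::uint32_t, std::list<Npc>::const_iterator> by_id_;
};
```

Note that because the OP's ids aren't unique, a real version might use unordered_multimap instead; this sketch keeps only the first entry per id.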

A database is only useful in your case if you wanted to search by saying "I want a list of entity ids who are N units away from point X,Y,Z" or "I want a list of entity ids who are between a height of A and B". If you have no need for such data context specific queries, and the data is not going to change at run time, and you are working with a small set of data, there's no real benefit to using a database to load the data in this particular case.

Going back to what was said earlier about startup times, if all you do is load this data once, then the overhead of using any method is irrelevant really. Loading it from the PK2 as was mentioned is a viable solution as well. However, I'd prefer custom tools to process from the PK2 into your own useful formats simply because of the flexibility it grants you.

Anyways, keep these things in mind with anything you do. Don't focus solely on raw performance, as it is often meaningless in most contexts. People who get obsessed with performance in C++ end up costing themselves a lot of productivity, and their projects suffer as a result. "The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet." Get something working that you are comfortable with and profile later. Don't worry about having to recode something because of a design decision you made, because it will happen regardless. That's just the way it works when you code first without a design.

Good luck with your project!
pushedx is offline  
Thanks: 8 Users
Old 04/01/2011, 20:38   #11
 
 
elite*gold: 844
Join Date: Oct 2010
Posts: 839
Received Thanks: 192
Quote:
Originally Posted by pushedx View Post
As a programmer, you have to learn to be more scientific with your research and analysis and come to more realistic conclusions rather than just take what you see on the surface and apply it to the entire subject.

I have quoted your whole explanation; if you have any objection to that, I will trim the quote down.

Please answer me, and thanks a lot for your explanation.
Keyeight is offline  
Old 04/01/2011, 23:53   #12
 
elite*gold: 0
Join Date: Nov 2008
Posts: 31
Received Thanks: 29
I can only add an excerpt from the Unix rules (Eric S. Raymond: The Art of Unix Programming). I have selected a few that apply to your situation:
  • Rule of Clarity: Clarity is better than cleverness.
  • Rule of Simplicity: Design for simplicity; add complexity only where you must.
  • Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
  • Rule of Diversity: Distrust all claims for "one true way".
dracek is offline  