Socket System & Threading Models

08/26/2014 08:18 Spirited#1
Hey everyone,

Recently, I was asked about optimal threading and socket system models for a game server; optimal with respect to efficiency and player load (or scale). Before I start, I should clarify that there are many threading and socket models that I will not address in this thread; I will only be addressing the fundamental ones. First, let's talk about threading. Recall that, before threading, a CPU processes all program instructions in sequence, one instruction at a time. If we visualized this process, we could imagine it as a single lane of traffic on a highway, where each car represents one instruction and the whole line of cars represents the sequence of instructions, or thread.

Now, as you know, most highways don't have just a single lane of traffic; they have two or more lanes for cars to travel in. A CPU works in a similar way: it can process multiple threads (sequences of instructions) at the same time. This is called multi-threading. Just as a highway usually has more than one lane, a CPU usually has more than one core processing instructions. And just as more lanes allow for more traffic, the number of cores the CPU has determines the number of threads that can be processed at the same time (one core = one thread being processed, assuming no hyper-threading).

So, let's go into an example: let's say you have a single-core processor (1 core, where only one thread can be processed at once). Let's say you're rendering a video for YouTube, and your CPU is near 100% usage. You might notice that though your computer slows down, you can still do multiple things at once. How does this work? Well, let's go back to the highway example. Let's say you have a one-lane highway that can only accept one line of cars at a time. As the first line of cars travels along the road, a second line of cars on an on-ramp requests access to the highway. Since the highway can only accept one line of cars at a time, we need to cut the two lines up and merge them. Assuming the drivers are polite, a few cars at a time from each line (alternating) will merge into one lane of traffic that can travel on the highway. The operating system's scheduler uses the same technique to run multiple threads on one core: each thread's sequence of instructions is cut up into slices, and the slices take turns on the core. As you may imagine, this slows down the execution of each thread (just as merging slows traffic on a highway).

Now, let's talk about threading with regard to applications. Let's say you're reading in data from hundreds of files. You can speed up this process using multi-threading, where each thread is given a group of files to read in. Using multiple threads, you can read the files in roughly 2 to 8 times faster. So we should create as many threads as we can to process files, right? Well, let's think back to the traffic example. The more on-ramps (or lines of cars) leading to the same highway, the more cars on the road and the slower traffic becomes. So to answer the question, more threads than the CPU can process is bad. So, how many threads are too many? It depends on the type of work, but for I/O and database processes (like socket systems and game servers), no more than four times the number of cores available is best. Twice the number of cores for packet receiving seems to be the best strategy (which leaves the remaining threads for server maintenance and operating system threads).
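
To make the sizing rule concrete, here is a minimal sketch in Python (chosen only because it is short and runs anywhere; treat it as pseudocode that happens to execute). It caps a thread pool at a multiple of the core count and spreads file reads across it. The names worker_count, read_file and read_all are made up for this example, and the 4x multiplier is just the rule of thumb from above, not a measured optimum.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(multiplier=4):
    """Threads to create: `multiplier` per core (rule of thumb, an assumption)."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return cores * multiplier


def read_file(path):
    """Read one file and return how many bytes it contained."""
    with open(path, "rb") as f:
        return len(f.read())


def read_all(paths, multiplier=4):
    """Read every file on a bounded pool and return the total bytes read."""
    with ThreadPoolExecutor(max_workers=worker_count(multiplier)) as pool:
        # map() hands the paths to idle workers; no thread is created per file.
        return sum(pool.map(read_file, paths))
```

Calling read_all(list_of_paths, multiplier=2) would use the "twice the cores" variant. The real gain depends on the disk and the file sizes, so measure before settling on a number.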

Phew. Now let's talk about socket systems. Obviously, a one-thread socket system isn't optimal. With a model like that, only one person can be connected at a time (blocking the socket system up). This type of model is a blocking socket system. Now, let's add a thread for accepting new players and a thread per player to receive data. At first, this seems like a great solution that won't block the socket system up (a non-blocking socket system), but after a few players, this server model will slow down due to the number of threads requesting processing. When will it slow down? It depends on the number of data packets requesting processing. If we're talking about Conquer Online, maybe around 25 - 75 active players (depending on the system, of course).
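
For illustration, the thread-per-player model might look roughly like the Python sketch below. Everything in it is an assumption for the sake of the example: the function names are invented, and echoing the data back stands in for real packet handling. Note that each recv call still blocks its own thread; the only thing the extra threads buy is that one slow player does not stall the others.

```python
import socket
import threading


def handle_client(conn):
    """Receive loop for one player; runs on that player's own thread."""
    with conn:
        conn.setblocking(True)  # make the blocking behaviour explicit
        while True:
            data = conn.recv(4096)  # blocks this thread until data or disconnect
            if not data:
                break
            conn.sendall(data)  # echo stands in for packet handling


def serve(listener, stop):
    """Accept loop: spawns one new thread for every connection."""
    listener.settimeout(0.2)  # wake up periodically to check the stop flag
    while not stop.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()


def start_server(host="127.0.0.1", port=0):
    """Start the accept thread; port 0 lets the OS pick a free port."""
    listener = socket.create_server((host, port))
    stop = threading.Event()
    thread = threading.Thread(target=serve, args=(listener, stop), daemon=True)
    thread.start()
    return listener, stop, thread
```

With a few hundred players this creates a few hundred threads, which is exactly the scaling problem described above.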

The next option, which is popular on these forums, is an asynchronous model built on the built-in asynchronous calls in .NET. Depending on the configuration of the setup, an asynchronous model can make a strong, scalable socket system. The model first allows clients to have their own threads, then shares threads between players as traffic increases. The main disadvantage of this model is that as the volume of data processing increases, the overhead caused by asynchronous calls through Windows and managed threading through .NET also increases (creating and destroying threads). It's far from the worst socket model, but is still far below the final option that I will discuss. With regard to Conquer Online, this model can support around 35 - 300 players (again, depending on the system).
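
As a rough analogue in another runtime (not the .NET API itself), here is what an asynchronous receive loop looks like with Python's asyncio. The point it shows is that waiting for data suspends a coroutine rather than parking a dedicated thread, so many players can share a few threads. The names handle_client and echo_once are invented for the example. On Windows, asyncio's default event loop (Python 3.8 and later) is built on I/O completion ports; on Linux it typically sits on epoll.

```python
import asyncio


async def handle_client(reader, writer):
    """One coroutine per player; awaiting does not tie up a thread."""
    try:
        while True:
            data = await reader.read(4096)  # suspends until data or disconnect
            if not data:
                break
            writer.write(data)  # echo stands in for packet handling
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


async def echo_once(message, host="127.0.0.1"):
    """Start a server on a free port, send one message, return the reply."""
    server = await asyncio.start_server(handle_client, host, 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readexactly(len(message))
        writer.close()
        await writer.wait_closed()
    return reply
```

Running asyncio.run(echo_once(b"hello")) drives the whole exchange on a single thread.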

Finally, let's talk about overlapped I/O with I/O completion ports (IOCP). With this threading model, a fixed number of worker threads is created, all waiting on a completion port. This eliminates the need to create and terminate threads upon connect and disconnect. All I/O (data packets) is queued to the completion port. When a packet is queued to the completion port, the last associated thread is released and the packet is processed almost immediately. This type of model doesn't use Windows events, which speeds up packet processing significantly. With regard to Conquer Online, this model can support around 50 - 1000+ players (again, depending on the system).
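
The shape of that model (a fixed set of workers all blocked on one queue of completed I/O) can be sketched without the Win32 API. The Python below is only an analogy: a queue.Queue plays the role of the completion port, and the worker, start_workers and stop_workers names are invented. It is not IOCP and says nothing about IOCP's performance; it only shows that no thread is created or destroyed per player.

```python
import queue
import threading


def worker(completions, results, lock):
    """Sleep on the shared queue and handle each completed packet."""
    while True:
        item = completions.get()  # blocks until something is queued
        if item is None:  # sentinel: shut this worker down
            completions.task_done()
            break
        client_id, packet = item
        with lock:
            # upper() is a stand-in for real packet processing
            results.append((client_id, packet.upper()))
        completions.task_done()


def start_workers(completions, results, count=4):
    """Create the fixed pool once, up front."""
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(completions, results, lock), daemon=True)
        for _ in range(count)
    ]
    for t in threads:
        t.start()
    return threads


def stop_workers(completions, threads):
    """Queue one sentinel per worker, then wait for all of them to exit."""
    for _ in threads:
        completions.put(None)
    for t in threads:
        t.join()
```

A receive path would call completions.put((client_id, packet)) for each packet; whichever worker is idle picks it up.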

Keep in mind, the player counts are rough estimates based on the socket system's threading implementation. I hope this answers the original question that was asked. If you have any further questions on threading and socket systems, send a comment or private message my way. Excuse any grammatical errors (I didn't proofread this).

Cheers,
Spirited
08/26/2014 09:51 KraHen#2
You tend to forget that this doesn't apply to sockets only. Sockets are just another thing to perform I/O operations on, that's it; non-blocking I/O processing should be talked about separately IMO. These approaches work well for Windows, but if you have a UNIX-based system, for instance, you might want to consider other options.

Nonetheless, great article, enjoyed reading. :)
08/26/2014 12:35 Best Coder 2014#3
Quote:
Originally Posted by Spirited View Post
Now, let's add a thread for accepting new players and a thread per player to receive data. At first, this seems like a great solution that won't block the socket system up (a non-blocking socket system), but after a few players, this server model will slow down due to the number of threads requesting processing.
Just using multiple threads doesn't make it non-blocking, though. The operations on each socket are still blocking.
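
The distinction can be shown in a few lines. This is a minimal Python sketch (the helper names try_recv and demo are made up): a blocking socket makes recv wait for data no matter how many threads the program has, while a socket switched to non-blocking mode returns straight away when nothing is available.

```python
import select
import socket


def try_recv(sock, size=4096):
    """Return data if some is ready, or None instead of waiting."""
    try:
        return sock.recv(size)
    except BlockingIOError:
        return None  # non-blocking socket with nothing to read


def demo():
    """Read from a non-blocking socket before and after data is sent."""
    a, b = socket.socketpair()
    with a, b:
        b.setblocking(False)  # recv on b will no longer wait
        before = try_recv(b)  # nothing has been sent yet
        a.sendall(b"hi")
        select.select([b], [], [], 1.0)  # give the data time to arrive
        after = try_recv(b)
    return before, after
```

Here demo() returns (None, b"hi"): the first call comes back empty-handed instead of blocking.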
Quote:
Originally Posted by Spirited View Post
The next option, which is popular on these forums, is an asynchronous model built on the built-in asynchronous calls in .NET. Depending on the configuration of the setup, an asynchronous model can make a strong, scalable socket system. The model first allows clients to have their own threads, then shares threads between players as traffic increases. The main disadvantage of this model is that as the volume of data processing increases, the overhead caused by asynchronous calls through Windows and managed threading through .NET also increases (creating and destroying threads). It's far from the worst socket model, but is still far below the final option that I will discuss. With regard to Conquer Online, this model can support around 35 - 300 players (again, depending on the system).

Finally, let's talk about overlapped I/O with I/O completion ports (IOCP). With this threading model, a fixed number of worker threads is created, all waiting on a completion port. This eliminates the need to create and terminate threads upon connect and disconnect. All I/O (data packets) is queued to the completion port. When a packet is queued to the completion port, the last associated thread is released and the packet is processed almost immediately. This type of model doesn't use Windows events, which speeds up packet processing significantly. With regard to Conquer Online, this model can support around 50 - 1000+ players (again, depending on the system).
These "built-in asynchronous calls in .NET" already use IOCP, assuming you're talking about Beginxxx/xxxAsync.
08/26/2014 14:23 KraHen#4
Quote:
Originally Posted by Best Coder 2014 View Post
These "built-in asynchronous calls in .NET" already use IOCP, assuming you're talking about Beginxxx/xxxAsync.
And an epoll-based approach on Mono AFAIK.
08/26/2014 14:30 Super Aids#5
I don't think you really understand asynchronous sockets.
A prime example is this [link] from [link].

@The poll discussion
.NET Sockets doesn't use poll() on Windows, though; it uses select().
08/26/2014 15:26 Best Coder 2014#6
Quote:
Originally Posted by Super Aids View Post
@The poll discussion
.NET Sockets doesn't use poll() on Windows, though; it uses select().
That's not really what we were talking about, though. The asynchronous socket operations (BeginReceive, ReceiveAsync, etc.) use IOCP on Windows, while the Mono implementation uses epoll on Linux-based operating systems for these asynchronous operations.
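
The "different mechanism per platform" idea is visible in other runtimes too. As a small Python sketch (not .NET or Mono code), selectors.DefaultSelector picks the most efficient readiness API the OS offers: epoll on Linux, kqueue on BSD and macOS, and plain select as the fallback, including on Windows. The wait_readable helper is a hypothetical name for this example.

```python
import selectors
import socket


def wait_readable(socks, timeout=1.0):
    """Return the sockets that have data waiting, without one thread each."""
    with selectors.DefaultSelector() as sel:  # epoll, kqueue, select, ...
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        # select() returns (key, events) pairs for the ready sockets only.
        return [key.fileobj for key, _events in sel.select(timeout)]
```

One thread can watch thousands of sockets this way and only touch the ones that actually have data.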
08/26/2014 15:52 Super Aids#7
Quote:
Originally Posted by Best Coder 2014 View Post
That's not really what we were talking about, though. The asynchronous socket operations (BeginReceive, ReceiveAsync, etc.) use IOCP on Windows, while the Mono implementation uses epoll on Linux-based operating systems for these asynchronous operations.
I know, I know. I was just pointing out that you would usually use poll() or epoll() on POSIX, but .NET Sockets doesn't.
08/26/2014 17:28 Spirited#8
How do I "tend to forget" about other forms of I/O? This thread is about sockets. I'm answering a question, not writing a book. Also, I never associated non-blocking with multithreaded. I associated non-blocking with a non-blocking model. The only thing I neglected to mention was the word "Synchronous", but oh well.

PS: I'm more than aware that IOCP is Windows only. Who here is developing for Linux or Mac besides CptSky? The keyword I'm going for in replying is relevance.
08/26/2014 18:07 Best Coder 2014#9
Quote:
Originally Posted by Spirited View Post
PS: I'm more than aware that IOCP is Windows only. Who here is developing for Linux or Mac besides CptSky? The keyword I'm going for in replying is relevance.
No one is arguing whether IOCP is Windows-only or not ... you were talking about the "built in asynchronous calls in .NET" and IOCP as if they were two completely separate things, when in fact these methods are basically just wrappers around the IOCP API.
08/27/2014 05:01 Spirited#10
Quote:
Originally Posted by Best Coder 2014 View Post
No one is arguing whether IOCP is Windows-only or not ... you were talking about the "built in asynchronous calls in .NET" and IOCP as if they were two completely separate things, when in fact these methods are basically just wrappers around the IOCP API.
You're right. I just reflected it, and it does use some overlapped I/O. I somewhat expected that when you said it does, but I didn't think about it until you brought it up. It still doesn't use the traditional IOCP model I discussed, though. It uses .NET's managed thread pool, which still creates and destroys threads when processing events. You are right about me being wrong, though: they both use IOCP, just one uses a managed thread pool. Arg... I suppose I'll have to edit the thread now. I'll do it some time later this week.
08/27/2014 10:38 KraHen#11
Quote:
Originally Posted by Spirited View Post
You're right. I just reflected it, and it does use some overlapped I/O. I somewhat expected that when you said it does, but I didn't think about it until you brought it up. It still doesn't use the traditional IOCP model I discussed, though. It uses .NET's managed thread pool, which still creates and destroys threads when processing events. You are right about me being wrong, though: they both use IOCP, just one uses a managed thread pool. Arg... I suppose I'll have to edit the thread now. I'll do it some time later this week.
You see, this is why I still come back to epvp after all these years: I (and I'm sure I'm not alone) can still learn a few things in threads like this. :D