Drew's SilkroadSecurityAPI is great at what it does but was probably never intended to be used in a scenario where it becomes part of the horrible choke point we know as filters today.
X-Filter is using regular SilkroadSecurityAPI with a few QoL changes so I wouldn't really call it a different API but there are probably also people who have written another one from scratch.
In order to write a better API you'll first have to understand

and then where the weaknesses are in SilkroadSecurityAPI and how it connects to C# in particular.
There are some who ported the API to C++ for performance reasons, but in reality they're just offsetting the weaknesses instead of working on them, creating more headroom for mistakes made in the higher-level application code.
Before we analyze some of the weaknesses of SilkroadSecurityAPI we have to address why C# plays a big role in this.
C#, or rather the CLR (Common Language Runtime), uses a garbage collector (GC) that is responsible for cleaning up all the objects (Sessions, Packets, Strings) you've created when they're no longer needed. So every time you allocate something on the heap, it needs to be collected later. This magic does not come for free, so we'll have to design around the GC to write high-performance code in C#.
We can achieve this by:
- Pooling Memory
- Pooling Objects
- Avoiding unnecessary copying
- Allocating on the stack instead of heap
- Using value types when appropriate
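To make two of these points a bit more concrete, here is a minimal sketch of stack allocation for small buffers and pooling for large ones. The class name, the 256-byte cut-off and the XOR "work" are all made up for illustration; only stackalloc, Span<T> and ArrayPool<T>.Shared are real .NET features.
Code:
```csharp
using System;
using System.Buffers;

public static class ScratchBuffers
{
    private const int StackLimit = 256; // hypothetical cut-off for stack allocation

    // Copies payload into a scratch buffer, XORs every byte with 0x5A
    // (a placeholder for real work) and returns the sum of the result.
    public static int MaskedSum(ReadOnlySpan<byte> payload)
    {
        if (payload.Length <= StackLimit)
        {
            // Small input: the scratch space lives on the stack, the GC never sees it.
            Span<byte> scratch = stackalloc byte[StackLimit];
            return Process(payload, scratch);
        }

        // Large input: borrow an array from the shared pool instead of allocating.
        byte[] rented = ArrayPool<byte>.Shared.Rent(payload.Length);
        try
        {
            return Process(payload, rented);
        }
        finally
        {
            // Always hand the array back, even if Process throws.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    private static int Process(ReadOnlySpan<byte> payload, Span<byte> scratch)
    {
        int sum = 0;
        for (int i = 0; i < payload.Length; i++)
        {
            scratch[i] = (byte)(payload[i] ^ 0x5A);
            sum += scratch[i];
        }
        return sum;
    }
}
```
Neither path creates garbage per call: the stack buffer disappears when the method returns and the rented array goes back to the pool.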
1. Blowfish
Code:
public byte[] Encode(byte[] stream, int offset, int length)
{
    if (length == 0)
        return null;
    var workspace = new byte[GetOutputLength(length)];
    Buffer.BlockCopy(stream, offset, workspace, 0, length);
    for (int x = length; x < workspace.Length; ++x)
        workspace[x] = 0;
    for (int x = 0; x < workspace.Length; x += 8)
    {
        uint l = BitConverter.ToUInt32(workspace, x + 0);
        uint r = BitConverter.ToUInt32(workspace, x + 4);
        Blowfish_encipher(ref r, ref l);
        Buffer.BlockCopy(BitConverter.GetBytes(r), 0, workspace, x + 0, 4);
        Buffer.BlockCopy(BitConverter.GetBytes(l), 0, workspace, x + 4, 4);
    }
    return workspace;
}
If we look at the Encode method of the Blowfish class, there are several issues:
1. Allocates a new buffer for the output (workspace).
2. Copies the input (stream) into the output (workspace).
3. Clears the part of the array where no data has been copied over (unnecessary, as arrays are always default-initialized in C#).
4. BitConverter.GetBytes allocates a result array, which is then copied back into the output (workspace). This happens twice per 8-byte block.
This could be improved by using unsafe pointers, but .NET Core has brought us Span<T>, so we can avoid a lot of these issues while still writing safe code.
Span<T> lets us view the same memory as a sequence of any blittable data type with little to no overhead, so we can encipher without converting between parts of the byte[] and uints and copying the results back.
Code:
internal int Transform(TransformMode transformMode, in ReadOnlySpan<byte> input, in Span<byte> output, int length)
{
    // Reinterpret both buffers as uints; nothing is copied or allocated.
    var input32 = MemoryMarshal.Cast<byte, uint>(input);
    var output32 = MemoryMarshal.Cast<byte, uint>(output);
    int blockIndex = 0;
    for (int i = 0; i < length / 8; i++)
    {
        var left = input32[blockIndex + 0];
        var right = input32[blockIndex + 1];
        if (transformMode == TransformMode.Encode)
            this.Encipher(ref left, ref right);
        else if (transformMode == TransformMode.Decode)
            this.Decipher(ref left, ref right);
        output32[blockIndex + 0] = left;
        output32[blockIndex + 1] = right;
        blockIndex += 2;
    }
    // Final block transforming omitted for simplicity.
    // Assumption: the return value is the number of bytes written so far.
    return blockIndex * sizeof(uint);
}
With this method you can also easily encrypt into the same buffer, which gets rid of the output (workspace) allocation and of copying the input to the output, since the transform writes the output anyway.
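As a rough sketch of the in-place idea described above, the snippet below transforms a buffer into itself through MemoryMarshal.Cast. ToyEncipher is a made-up stand-in for the real Blowfish round function (it is NOT encryption), and the class and method names are hypothetical.
Code:
```csharp
using System;
using System.Runtime.InteropServices;

public static class InPlaceTransformDemo
{
    // Hypothetical placeholder for Blowfish's Encipher: a toy mix, not real encryption.
    private static void ToyEncipher(ref uint left, ref uint right)
    {
        left ^= 0xDEADBEEF;
        right ^= left;
    }

    // Transforms every complete 8-byte block of the buffer in place
    // and returns the number of bytes that were transformed.
    public static int EncodeInPlace(Span<byte> buffer)
    {
        // View the same memory as uints; no copy and no allocation happens here.
        Span<uint> words = MemoryMarshal.Cast<byte, uint>(buffer);
        int blocks = buffer.Length / 8;
        for (int i = 0; i < blocks; i++)
        {
            // The span indexer returns a ref, so the words are modified directly.
            ToyEncipher(ref words[i * 2], ref words[i * 2 + 1]);
        }
        return blocks * 8;
    }
}
```
Because input and output are the same memory, there is no workspace array and nothing to copy back. Padding of a trailing partial block is left out here, just like in the snippet above.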
2. Packet
The Packet class is a nice general-purpose class which you can use to read and write your packet data, no matter if massive or not, but it's not without flaws.
I don't have any fancy tables to prove my point, but some of the write and read methods are rather slow. Furthermore, the underlying MemoryStream in PacketWriter has a default buffer size of 256 and, when running out of space, will grow by allocating a new buffer that is double the size of the old one. The old buffer is also copied over to the new buffer. Spawn packets can easily break this threshold.
Thus it's probably not a good idea to pool a MemoryStream, as you can't shrink its growing buffer. With enough time, every pooled Packet instance could have been a spawn packet once and will hold a rather large buffer, wasting memory when the majority of packets are small. But you can try to use the

.
I'd suggest creating a Packet class that works on a fixed 4096-byte buffer which has been allocated from a pool (ArrayPool<T> / MemoryPool<T>) and that lives in an object pool as well. But since it can't grow, there is no way to create a massive packet anymore. For this you'll need to use multiple regular packets and read and write across them, so a massive packet simply contains a list of packets.
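A very reduced sketch of such a packet could look like the following. PooledPacket, Rent, Return and TryWrite are names I made up for this example, and the ConcurrentBag is just the simplest object pool I could think of; a real implementation would need read methods, more write overloads and the massive packet wrapper.
Code:
```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Collections.Concurrent;

public sealed class PooledPacket
{
    public const int BufferSize = 4096;

    // Simplistic object pool for the packet instances themselves (an assumption).
    private static readonly ConcurrentBag<PooledPacket> Pool = new ConcurrentBag<PooledPacket>();

    private byte[] _buffer = Array.Empty<byte>();
    private int _position;

    private PooledPacket() { }

    public int Length => _position;
    public ReadOnlySpan<byte> Written => new ReadOnlySpan<byte>(_buffer, 0, _position);

    public static PooledPacket Rent()
    {
        if (!Pool.TryTake(out var packet))
            packet = new PooledPacket();
        // The pool may hand out a larger array, so BufferSize stays the logical limit.
        packet._buffer = ArrayPool<byte>.Shared.Rent(BufferSize);
        packet._position = 0;
        return packet;
    }

    public void Return()
    {
        ArrayPool<byte>.Shared.Return(_buffer);
        _buffer = Array.Empty<byte>();
        _position = 0;
        Pool.Add(this);
    }

    // Returns false instead of growing when the fixed buffer is full.
    public bool TryWrite(ushort value)
    {
        if (_position + sizeof(ushort) > BufferSize)
            return false;
        BinaryPrimitives.WriteUInt16LittleEndian(_buffer.AsSpan(_position), value);
        _position += sizeof(ushort);
        return true;
    }
}
```
The important part is that TryWrite never allocates or grows, so every pooled instance costs the same amount of memory no matter what it carried before.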
3. Security
I intended to write this section in more detail, but it turned out more exhausting than expected.
In essence it comes down to a full rewrite as you'll need to carefully avoid copying and allocating where possible.
- Don't allocate a new list every time you call TransferOutgoing/TransferIncoming.
With a fixed 4096-byte message you can:
- Receive directly into the fixed message buffer.
- Encrypt/decrypt into itself, avoiding the allocations and copying found in Recv and FormatPacket.
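The first point above (not allocating a new list on every transfer) can be sketched roughly like this. OutgoingQueue and DrainInto are hypothetical names and not part of SilkroadSecurityAPI; the idea is only that the caller owns one list and reuses it for every call.
Code:
```csharp
using System;
using System.Collections.Generic;

public sealed class OutgoingQueue
{
    private readonly List<byte[]> _pending = new List<byte[]>();

    public void Enqueue(byte[] packet) => _pending.Add(packet);

    // Moves all pending packets into a list owned by the caller.
    // Clear() keeps the list's internal array, so after a short warm-up
    // neither list has to allocate anymore.
    public void DrainInto(List<byte[]> destination)
    {
        destination.Clear();
        destination.AddRange(_pending);
        _pending.Clear();
    }
}
```
The caller keeps a single List<byte[]> per session and passes it in on every tick instead of receiving a fresh list each time.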
Pointing out every small issue would make an obscenely long list, and it's not all purely performance focused.
The SecurityFlags class for example can be easily swapped with a [Flags] enum to improve readability and reduce complexity.
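Just to illustrate the idea, a [Flags] enum could look like the sketch below. The member names and especially the bit positions are placeholders I picked for the example; check them against the original SecurityFlags class before using anything like this.
Code:
```csharp
using System;

[Flags]
public enum SecurityFlag : byte
{
    None = 0,
    Blowfish = 1 << 0,          // placeholder bit position
    SecurityBytes = 1 << 1,     // placeholder bit position
    Handshake = 1 << 2,         // placeholder bit position
    HandshakeResponse = 1 << 3, // placeholder bit position
}

public static class SecurityFlagDemo
{
    // A single bitwise test replaces checking separate byte fields.
    public static bool UsesBlowfish(SecurityFlag flags) =>
        (flags & SecurityFlag.Blowfish) != 0;
}
```
Since the enum is backed by a byte, converting to and from the flag byte in the packet is a plain cast in both directions.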
4. Other
I'm not saying you should all now go and try to rebuild SilkroadSecurityAPI, but if you truly care about performance, by all means give it a try.
But it won't magically solve all your issues, as the network and threading design are equally important.
Another problem with filters I've seen is that nobody seems to think about which data structures to use.
Things like List<T> for white/blacklists are easy to avoid where a Dictionary<TKey, TValue> or HashSet<T> is appropriate. Please educate yourself about the

so you understand how to judge the

and choose accordingly.
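As a small example of the data structure point, here is a blacklist backed by a HashSet<T> instead of a List<T>. The class name and the opcode values are made up for illustration.
Code:
```csharp
using System;
using System.Collections.Generic;

public static class OpcodeBlacklist
{
    // Hypothetical opcodes; fill this from your own configuration.
    private static readonly HashSet<ushort> Blocked = new HashSet<ushort> { 0x7001, 0x7021 };

    // HashSet<T>.Contains is a hash lookup (O(1) on average), whereas
    // List<T>.Contains walks the list element by element (O(n)).
    public static bool IsBlocked(ushort opcode) => Blocked.Contains(opcode);
}
```
For a handful of entries the difference is small, but this check runs for every single packet, so it adds up quickly once the list grows.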