What you explained about encoding is correct, but that is not where the issue lies.
First, advice is obviously welcome, but the problem is not the lack of a solution; it is the lack of time to code it.
To solve the problem properly, for all Latin-alphabet (non-English) languages (the last change does work, but only if the system codepage and the sdata codepage are the same) and for Asian languages, which will otherwise keep failing randomly, I just have to recode every process that involves strings (Windows messages, displays, ...) in Unicode. As simple as that.
Currently I decode the MBS coded in the sdata codepage to an MBS coded in the system codepage (actually to Unicode, but the result is handled as an MBS, so the system default codepage applies). It appears to work in some cases, but it is not reliable.
Let's review cash.sdata to explain some points.
The plain cash.sdata (without the encryption header) is:
Code:
20010000 => 0120h = 288 records
1st record:
rowID 01000000
type 01000000
icon 35000000
cost 20030000 => 0320h = 800
itemID[24] and itemCnt[24]
1E870100 0A (100.126 x 10)
00000000 00 x 23
name:
len 14000000
val D6C7BEABD2A9BCC120B5C8BCB633283130290000
code:
len 0E000000
val 42315F4170737450303331300000 = "B1_ApstP0310\0\0" in plain ASCII
description:
len 44000000
val 3130B7D6D6D3C4DAD6C7C1A6BACDBEAB
C9F1B8F7C9FD31322CCBC0CDF6CAB1D0
A7B9FBCFFBCAA72EBDF6CFDECAB9D3C3
BDC7C9AB3CCFDED6C6BDBBD2D7B5C0BE
DF3E0000
and so on for next items
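The layout above (4-byte little-endian integers, and strings stored as a 4-byte length followed by that many raw bytes) can be read with something like the sketch below. This is only an illustration in portable C++, not shStudio code; `read_u32_le` and `read_field` are hypothetical names, and the field layout is assumed from the dump alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Read a 32-bit little-endian integer at `pos` and advance `pos` by 4.
// Example from the dump: bytes 20 01 00 00 give 0x00000120.
std::uint32_t read_u32_le(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // at() throws std::out_of_range if the buffer is too short.
        value |= static_cast<std::uint32_t>(buf.at(pos + i)) << (8 * i);
    }
    pos += 4;
    return value;
}

// Read a length-prefixed field (name, code, description): a 4-byte length,
// then `len` raw bytes. The bytes are returned untouched, trailing NULs
// included; no codepage conversion happens here.
std::string read_field(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
    const std::uint32_t len = read_u32_le(buf, pos);
    if (len > buf.size() - pos) {
        throw std::out_of_range("field length exceeds buffer");
    }
    std::string raw(buf.begin() + static_cast<std::ptrdiff_t>(pos),
                    buf.begin() + static_cast<std::ptrdiff_t>(pos + len));
    pos += len;
    return raw;
}
```

Applied to the dump, `read_u32_le` gives 288 for 20010000 and 800 for 20030000, and `read_field` on the code field returns 14 bytes, the last two being NUL.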
In the dump above, D6C7BEABD2A9BCC120B5C8BCB633283130290000
and
3130B7D6D6D3C4DAD6C7C1A6BACDBEAB
C9F1B8F7C9FD31322CCBC0CDF6CAB1D0
A7B9FBCFFBCAA72EBDF6CFDECAB9D3C3
BDC7C9AB3CCFDED6C6BDBBD2D7B5C0BE
DF3E0000
are multi-byte characters encoded with a "given system".
"Multi-byte" means that each individual character or glyph (ideogram) is coded with one or several bytes: '3130' is "10" (two single-byte characters), and the bytes that follow encode glyphs on 2 bytes each.
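To make the 1-or-2-bytes rule concrete, here is a minimal sketch that counts characters in such a byte string. It assumes the usual double-byte convention of codepage 936, where a byte in the range 0x81 to 0xFE starts a 2-byte character and anything else is a single byte; `count_dbcs_chars` is a hypothetical helper, and it does not validate the trail byte.

```cpp
#include <cstddef>
#include <string>

// Count characters in a double-byte (DBCS) string.
// Assumption: a lead byte in 0x81..0xFE is followed by one trail byte;
// every other byte is a complete single-byte character.
std::size_t count_dbcs_chars(const std::string& bytes)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        const bool is_lead = (b >= 0x81 && b <= 0xFE);
        // A lead byte at the very end of the buffer is counted as one
        // (broken) character instead of reading past the end.
        i += (is_lead && i + 1 < bytes.size()) ? 2 : 1;
        ++count;
    }
    return count;
}
```

For the 18 bytes of the name (without the two trailing NULs) this gives 12 characters: six 2-byte glyphs plus the six single bytes 20, 33, 28, 31, 30 and 29.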
The encoding used can be identified with Notepad++.
If one creates a text file, without a BOM, containing the bytes
Code:
D6 C7 BE AB D2 A9 BC C1 20 B5 C8 BC B6 33 28 31 30 29 0D 0A
31 30 B7 D6 D6 D3 C4 DA D6 C7 C1 A6 BA CD BE AB C9 F1 B8 F7
C9 FD 31 32 2C CB C0 CD F6 CA B1 D0 A7 B9 FB CF FB CA A7 2E
BD F6 CF DE CA B9 D3 C3 BD C7 C9 AB 3C CF DE D6 C6 BD BB D2
D7 B5 C0 BE DF 3E 0D 0A
and drops it onto Notepad++, it displays:
智精药剂 等级3(10)
10分钟内智力和精神各升12,死亡时效果消失.仅限使用角色<限制交易道具>
and indicates that the text is encoded using "GB2312", which is the name of the standard defining the official character set of the People's Republic of China.
Windows (since Windows 95) uses a codepage named "936" that contains the full GB2312 character set plus some additional glyphs.
The current code performs the following (where name is a "buffer with string facilities" that contains the data read from the .sdata without any change):
// name = D6C7 BEAB D2A9 BCC1 20 B5C8 BCB6 33 28 31 30 29
wchar_t temp[256] = { 0 };
::MultiByteToWideChar(936, MB_PRECOMPOSED, (LPCCH) name, name.length(), temp, 256);
and obtains (for the name):
667A 7CBE 836F 5242 0020 7B49 7EA7 0033 0028 0031 0030 0029
which displays: 智精药剂 等级3(10).
The issue is that the decoded strings ("智精药剂 等级" here) are not used as wchar_t[] but still as an MBS. It appears that Windows "sometimes" understands them as Unicode (wide strings), possibly on permissive systems where a lot of software does not care about Unicode, and almost always fails on systems like yours, where all applications must be fully Unicode compliant.
One last time: shStudio is not fully Unicode-based (because the Shaiya files were not Unicode-encoded) and a recoding (not some pseudo-magic fix) is required.
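One way to see why a wide string handled as an MBS cannot be reliable: in UTF-16, every ASCII character carries a 0x00 byte, and a byte-oriented routine takes that as the end of the string. The sketch below is portable C++ for illustration only, not shStudio code; `utf16le_bytes` and `mbs_visible_length` are hypothetical names, and it assumes the little-endian layout Windows uses for wchar_t.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Lay out UTF-16 code units as raw bytes, low byte first (little-endian).
std::string utf16le_bytes(const std::u16string& wide)
{
    std::string raw;
    raw.reserve(wide.size() * 2);
    for (char16_t unit : wide) {
        raw.push_back(static_cast<char>(unit & 0xFF));
        raw.push_back(static_cast<char>((unit >> 8) & 0xFF));
    }
    return raw;
}

// Number of bytes a NUL-terminated (MBS) routine sees before it stops.
std::size_t mbs_visible_length(const std::string& raw)
{
    return std::strlen(raw.c_str());
}
```

With the 12 code units obtained above for the name, the raw view is 24 bytes long, but a NUL-terminated reader stops after 9 of them: the space 0020 is stored as 20 00. Whatever is displayed from those 9 bytes then depends on the system codepage, which fits the "works sometimes" behaviour. The reliable path is to keep the result as a wide string and pass it only to the wide (W) variants of the Windows functions.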
Quote:
Originally Posted by LibPor22
Castor, I did not tell you in the upper reply that I already own an episode 8 server stable version [...]
Several fixes to the editing of BDxxx.sdata files are also planned: saving as a DB file is corrupted for several of them (including items), and some still use the old list view without support for the tailored editors.
The global change to Unicode and the support of the DB files (the DBxxxTextxx files turn out to be Unicode-encoded) will occur in the same phase.
Regarding a test server, I have not had my own server for years; a "ready to use" package would help me significantly. ("Ready to use" means for testing purposes; any weakness it would have as a production server is not relevant.)
Please drop me a private mail if you want to help me on this point.