Editing Utility:Accent Removal
The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision | Your text | ||
Line 3: | Line 3: | ||
Some tile sets use the accented characters for additional graphical symbols. This can make racial language text difficult to read. You can remove the accented characters and symbols from the data files. This works on existing worlds and saved games. | Some tile sets use the accented characters for additional graphical symbols. This can make racial language text difficult to read. You can remove the accented characters and symbols from the data files. This works on existing worlds and saved games. | ||
− | Since the structure of language files might change, it is safest if you remove the problem characters from the files yourself. Here are | + | Since the structure of language files might change, it is safest if you remove the problem characters from the files yourself. Here are two methods to do just that. The first (Jackard's) only works on Windows, but is probably the easiest for novice users. The second (frobnic8's) will work anywhere Python does (i.e. just about anywhere), but requires using the command line a little. |
==[[User:Jackard|Jackard]]'s [http://www.inforapid.de/html/searchreplace.htm InfoRapid] Script== | ==[[User:Jackard|Jackard]]'s [http://www.inforapid.de/html/searchreplace.htm InfoRapid] Script== | ||
Line 123: | Line 123: | ||
<ol> | <ol> | ||
− | <li>Ensure you have [http://www.python.org Python] installed.</li> | + | <li>Ensure you have [http://www.python.org Python] installed. (If you have Python 3.x installed, you will need to remove the <code>unicode()</code> calls on lines 100 and 104, and change the print statements to function calls.)</li> |
− | <li>Copy and paste | + | <li>Copy and paste (this modified version of) "The Unicode Hammer" into a file named <code>unicode_hammer.py</code> in the <code>raw/objects</code> sub-directory of your Dwarf Fortress directory. (The Unicode Hammer: Is that a name worthy of Dwarf Fortress, or what?)<p><pre>
#!/usr/bin/env python | #!/usr/bin/env python | ||
− | """ | + | """ |
− | + | latin1_to_ascii -- The UNICODE Hammer -- AKA "The Stupid American" | |
− | + | ||
+ | This takes a UNICODE string and replaces Latin-1 characters with | ||
+ | something equivalent in 7-bit ASCII. This returns a plain ASCII string. | ||
+ | This function makes a best effort to convert Latin-1 characters into | ||
+ | ASCII equivalents. It does not just strip out the Latin1 characters. | ||
+ | All characters in the standard 7-bit ASCII range are preserved. | ||
+ | In the 8th bit range all the Latin-1 accented letters are converted to | ||
+ | unaccented equivalents. Most symbol characters are converted to | ||
+ | something meaningful. Anything not converted is deleted. | ||
+ | |||
+ | Background: | ||
− | + | One of my clients gets address data from Europe, but most of their systems | |
+ | cannot handle Latin-1 characters. With all due respect to the umlaut, | ||
+ | scharfes s, cedilla, and all the other fine accented characters of Europe, | ||
+ | all I needed to do was to prepare addresses for a shipping system. | ||
+ | After getting headaches trying to deal with this problem using Python's | ||
+ | built-in UNICODE support I gave up and decided to use some brute force. | ||
+ | This function converts all accented letters to their unaccented equivalents. | ||
+ | I realize this is dirty, but for my purposes the mail gets delivered. | ||
+ | |||
+ | Noah Spurrier noah at noah.org | ||
+ | License free and public domain | ||
""" | """ | ||
+ | """This version has had its translation table abused to produce | ||
+ | better results for the language files of the game Dwarf Fortress by | ||
+ | frobnic8. The original translation table is commented out. | ||
− | + | There are arguably better ways to do this using things like: | |
− | |||
− | def latin1_to_ascii(unicrap): | + | for line in codecs.open(source, encoding='latin-1'): |
+ | print unicodedata.normalize('NFKD', line).encode('ASCII', 'ignore') | ||
+ | """ | ||
+ | |||
+ | def latin1_to_ascii (unicrap): | ||
"""This takes a UNICODE string and replaces Latin-1 characters with | """This takes a UNICODE string and replaces Latin-1 characters with | ||
something equivalent in 7-bit ASCII. It returns a plain ASCII string. | something equivalent in 7-bit ASCII. It returns a plain ASCII string. | ||
Line 147: | Line 173: | ||
something meaningful. Anything not converted is deleted. | something meaningful. Anything not converted is deleted. | ||
""" | """ | ||
− | xlate = { | + | xlate={0xc0:'A', 0xc1:'A', 0xc2:'A', 0xc3:'A', 0xc4:'A', 0xc5:'A', |
− | + | 0xc6:'Ae', 0xc7:'C', | |
− | + | 0xc8:'E', 0xc9:'E', 0xca:'E', 0xcb:'E', | |
− | + | 0xcc:'I', 0xcd:'I', 0xce:'I', 0xcf:'I', | |
− | + | 0xd0:'Th', 0xd1:'N', | |
− | + | 0xd2:'O', 0xd3:'O', 0xd4:'O', 0xd5:'O', 0xd6:'O', 0xd8:'O', | |
− | + | 0xd9:'U', 0xda:'U', 0xdb:'U', 0xdc:'U', | |
− | + | 0xdd:'Y', 0xde:'th', 0xdf:'ss', | |
− | + | 0xe0:'a', 0xe1:'a', 0xe2:'a', 0xe3:'a', 0xe4:'a', 0xe5:'a', | |
− | + | 0xe6:'ae', 0xe7:'c', | |
− | + | 0xe8:'e', 0xe9:'e', 0xea:'e', 0xeb:'e', | |
− | + | 0xec:'i', 0xed:'i', 0xee:'i', 0xef:'i', | |
− | + | 0xf0:'th', 0xf1:'n', | |
− | + | 0xf2:'o', 0xf3:'o', 0xf4:'o', 0xf5:'o', 0xf6:'o', 0xf8:'o', | |
− | + | 0xf9:'u', 0xfa:'u', 0xfb:'u', 0xfc:'u', | |
− | + | 0xfd:'y', 0xfe:'th', 0xff:'y', | |
− | + | 0xa1:'aa', 0xa2:'cz', 0xa3:'ii', 0xa4:'tz', | |
− | + | 0xa5:'yy', 0xa6:'|', 0xa7:'zz', 0xa8:'"', | |
− | + | 0xa9:'CC', 0xaa:'aa', 0xab:'<<', 0xac:'not', | |
− | + | 0xad:'-', 0xae:'{R}', 0xaf:'_', 0xb0:'o', | |
− | + | 0xb1:'+/-', 0xb2:'^2', 0xb3:'^3', 0xb4:"'", | |
− | + | 0xb5:'uu', 0xb6:'PP', 0xb7:'*', 0xb8:',,', | |
− | + | 0xb9:'^1', 0xba:'^o', 0xbb:'>>', | |
− | + | 0xbc:'1/4', 0xbd:'1/2', 0xbe:'3/4', 0xbf:'?', | |
− | + | 0xd7:'*', 0xf7:'/' | |
− | + | } | |
− | + | """ Originals below; the table above is hacked for Dwarf Fortress languages.
− | + | 0xa1:'!', 0xa2:'{cent}', 0xa3:'{pound}', 0xa4:'{currency}', | |
− | + | 0xa5:'{yen}', 0xa6:'|', 0xa7:'{section}', 0xa8:'{umlaut}', | |
− | + | 0xa9:'{C}', 0xaa:'{^a}', 0xab:'<<', 0xac:'{not}', | |
− | + | 0xad:'-', 0xae:'{R}', 0xaf:'_', 0xb0:'{degrees}', | |
− | + | 0xb1:'{+/-}', 0xb2:'{^2}', 0xb3:'{^3}', 0xb4:"'", | |
− | + | 0xb5:'{micro}', 0xb6:'{paragraph}', 0xb7:'*', 0xb8:'{cedilla}', | |
− | + | 0xb9:'{^1}', 0xba:'{^o}', 0xbb:'>>', | |
− | + | 0xbc:'{1/4}', 0xbd:'{1/2}', 0xbe:'{3/4}', 0xbf:'?', | |
− | + | 0xd7:'*', 0xf7:'/' | |
− | + | } | |
− | + | """ | |
r = '' | r = '' | ||
for i in unicrap: | for i in unicrap: | ||
Line 240: | Line 222: | ||
if __name__ == '__main__': | if __name__ == '__main__': | ||
− | + | import sys | |
− | + | input = sys.stdin | |
− | + | output = sys.stdout | |
− | for | + | if len(sys.argv) == 1 or (len(sys.argv) == 2 and \ |
− | + | sys.argv[1] in ('-h', '-H', '-?', '--help', '/?', '/H', '/h')): | |
− | + | print 'unicode_hammer.py [infile [outfile]]\n' | |
− | + | #for python 3.x, change the following line to s = ''
− | + | s = unicode('','latin-1') | |
− | + | for c in range(32, 256): | |
+ | if c != 0x7f: | ||
+ | #for python 3.x, change the following line to s += str(chr(c)) | ||
+ | s += unicode(chr(c), 'latin-1') | ||
+ | plain_ascii = latin1_to_ascii(s) | ||
+ | |||
+ | #for python 3.x, change all of the following print statements to functions (wrap the entire statement in parentheses) | ||
+ | print 'INPUT type:', type(s) | ||
+ | print 'INPUT:' | ||
+ | print s.encode('latin-1') | ||
+ | print | ||
+ | print 'OUTPUT type:', type(plain_ascii) | ||
+ | print 'OUTPUT:' | ||
+ | print plain_ascii | ||
+ | sys.exit() | ||
+ | |||
+ | if len(sys.argv) > 1: | ||
+ | input = open(sys.argv[1]) | ||
+ | if len(sys.argv) > 2: | ||
+ | output = open(sys.argv[2], 'w') | ||
+ | for line in input: | ||
+ | output.write(latin1_to_ascii(line)) | ||
</pre></p></li> | </pre></p></li> | ||
− | <li> | + | <li>Open a command prompt and change directory to your <code>raw/objects</code> directory.</li> |
+ | <li>Rename the four language files, adding '.orig' to the end of their names:<p><pre> | ||
+ | mv language_DWARF.txt language_DWARF.txt.orig | ||
+ | mv language_ELF.txt language_ELF.txt.orig | ||
+ | mv language_GOBLIN.txt language_GOBLIN.txt.orig | ||
+ | mv language_HUMAN.txt language_HUMAN.txt.orig | ||
+ | </pre></p></li> | ||
+ | <li>Apply the hammer to each of the four language files as follows:<p><pre> | ||
+ | python unicode_hammer.py language_DWARF.txt.orig language_DWARF.txt | ||
+ | python unicode_hammer.py language_ELF.txt.orig language_ELF.txt | ||
+ | python unicode_hammer.py language_GOBLIN.txt.orig language_GOBLIN.txt | ||
+ | python unicode_hammer.py language_HUMAN.txt.orig language_HUMAN.txt | ||
+ | </pre></p></li> | ||
<li>Enjoy!</li> | <li>Enjoy!</li> | ||
</ol> | </ol> | ||
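The script's second docstring notes that <code>unicodedata.normalize</code> could arguably do the same job. A minimal Python 3 sketch of that alternative follows; it is not the Unicode Hammer itself. The names <code>strip_accents</code>, <code>convert_file</code> and <code>EXTRA</code> are hypothetical, and the sketch assumes the language files are Latin-1 encoded, as the script above does. The entries in <code>EXTRA</code> are copied from the hammer's translation table for letters that normalization cannot decompose.

```python
import unicodedata

# Letters that NFKD cannot split into "base letter + accent".
# Replacements are copied from the translation table in the script above.
EXTRA = {
    0xc6: 'Ae', 0xd0: 'Th', 0xd8: 'O', 0xde: 'th', 0xdf: 'ss',
    0xe6: 'ae', 0xf0: 'th', 0xf8: 'o', 0xfe: 'th',
}


def strip_accents(text):
    """Return text with accented letters reduced to plain ASCII."""
    # Handle the special letters first, then decompose everything else.
    text = text.translate(EXTRA)
    decomposed = unicodedata.normalize('NFKD', text)
    # Dropping non-ASCII removes the combining accents left by NFKD.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def convert_file(src, dst, encoding='latin-1'):
    """Copy src to dst line by line, stripping accents on the way.

    The encoding default is an assumption; newline='' keeps the
    original line endings untouched.
    """
    with open(src, encoding=encoding, newline='') as fin, \
         open(dst, 'w', encoding='ascii', newline='') as fout:
        for line in fin:
            fout.write(strip_accents(line))
```

With the files renamed as in the steps above, a call such as <code>convert_file('language_DWARF.txt.orig', 'language_DWARF.txt')</code> would stand in for one <code>unicode_hammer.py</code> invocation. The results will not match the hammer exactly: symbols in the 0xa1 to 0xbf range, which the hammer maps to letter pairs such as 'aa' or 'zz', are dropped or reduced differently by this approach.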
Line 273: | Line 288: | ||
For Windows users there is this small [http://dffd.wimbli.com/file.php?id=2088 application] that replaces accented characters in a file when you drag and drop the file onto the application icon. | For Windows users there is this small [http://dffd.wimbli.com/file.php?id=2088 application] that replaces accented characters in a file when you drag and drop the file onto the application icon. | ||