Kodi Community Forum
utf-8 file names and content - Printable Version


utf-8 file names and content - fbacher - 2021-12-31

If you need to access a file whose path contains utf-8 (non-ASCII) characters, then you need to explicitly encode the path. If the file's content is utf-8 text, then you also need to specify the encoding as utf-8:

io.open(filename.encode('utf-8'), mode='rt', encoding='utf-8')
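For anyone who wants to try this outside of an add-on, here is a minimal sketch of the same call wrapped in a helper. The names read_utf8_text and demo, and the sample file name, are made up for illustration and are not part of any Kodi API.

```python
import io
import os
import shutil
import tempfile


def read_utf8_text(path):
    """Return the text of a UTF-8 file whose name may be non-ASCII."""
    # Passing bytes means Python does not apply its own filesystem
    # encoding to the name (reported above as ASCII in affected Kodi builds).
    raw_path = path.encode('utf-8') if isinstance(path, str) else path
    with io.open(raw_path, mode='rt', encoding='utf-8') as f:
        return f.read()


def demo():
    """Write and read back a file with a non-ASCII name (hypothetical data)."""
    tmp_dir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmp_dir, 'türkçe_ışık.txt')
        # Encode the path for writing as well, for the same reason.
        with io.open(path.encode('utf-8'), mode='wt', encoding='utf-8') as f:
            f.write('İstanbul')
        return read_utf8_text(path)
    finally:
        shutil.rmtree(tmp_dir)
```

Calling demo() should return the text that was written. In Python 3, io.open is the same function as the built-in open, so either spelling works.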

Normally, Python discovers the filesystem encoding (used for filenames) and sets it automatically. However, due to a patch introduced in Kodi 19.2 (https://github.com/xbmc/xbmc/issues/19883) to work around what looks like a nasty Kodi string handling problem with Turkish (and other languages), the filename encoding is 'ASCII' instead of 'utf-8' (at least on Linux). This means that you have to explicitly encode the path yourself (at least until the other bug is fixed).
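To see which filename encoding your add-on is actually running with, something like the following should work. This is only a sketch: canonical_codec_name and filesystem_encoding_is_utf8 are hypothetical helper names, and the value reported will depend on the platform and Kodi version.

```python
import codecs
import sys


def canonical_codec_name(name):
    """Normalize an encoding name, e.g. 'UTF8' -> 'utf-8'."""
    return codecs.lookup(name).name


def filesystem_encoding_is_utf8():
    """Return True if Python will encode filenames as UTF-8."""
    # sys.getfilesystemencoding() reports the encoding Python uses to
    # convert str paths to bytes for the operating system.
    return canonical_codec_name(sys.getfilesystemencoding()) == 'utf-8'
```

If filesystem_encoding_is_utf8() returns False inside Kodi, that is a hint that str paths with non-ASCII characters need the explicit .encode('utf-8') described above.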

I'm not sure how utf-8 filenames behave on different Windows versions or on operating systems that don't support utf-8 filenames. Most modern systems support utf-8 paths.

Failing to specify filename.encode('utf-8') can cause errors about characters being out of the ASCII range when the filename contains non-ASCII characters.
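The kind of error meant here can be reproduced in plain Python by encoding a name with the ASCII codec by hand. This is an illustration only, under the assumption that it approximates what Python does with a str path when the filesystem encoding is ASCII; the helper names and file names are made up.

```python
def encode_like_ascii_filesystem(filename):
    """Encode a name the way an ASCII filesystem encoding would."""
    # Assumption: on POSIX, Python encodes str paths with the filesystem
    # encoding and the 'surrogateescape' error handler.
    return filename.encode('ascii', 'surrogateescape')


def try_encode(filename):
    """Return (encoded_bytes, None) on success or (None, reason) on failure."""
    try:
        return encode_like_ascii_filesystem(filename), None
    except UnicodeEncodeError as exc:
        return None, exc.reason
```

An ASCII-only name such as 'movie.txt' encodes fine, while try_encode('türkçe.txt') returns an error reason instead of bytes. The same name encodes without trouble using 'türkçe.txt'.encode('utf-8'), which is why the explicit encode avoids the problem.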

Issue 19883 is a cautionary tale about the subtleties of character comparison, etc. in different languages. Languages don't always obey the rules that we expect.
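As a small Python illustration of how Turkish letters can break case-insensitive comparison (this is not taken from the issue itself, and it says nothing about how Kodi's own code compares strings), consider the dotted capital İ and the dotless small ı. The function name below is hypothetical.

```python
DOTTED_CAPITAL_I = '\u0130'  # İ
DOTLESS_SMALL_I = '\u0131'   # ı


def naive_case_insensitive_equal(a, b):
    """Compare two strings the 'obvious' way, by lowercasing both."""
    return a.lower() == b.lower()


# Lowercasing İ gives 'i' plus a combining dot above: two code points.
lowered = DOTTED_CAPITAL_I.lower()

# Uppercasing ı gives a plain ASCII 'I', which lowercases to 'i', not 'ı',
# so an upper/lower round trip does not return the original letter.
round_trip = DOTLESS_SMALL_I.upper().lower()
```

Here lowered has length 2, round_trip is 'i' rather than 'ı', and naive_case_insensitive_equal(DOTTED_CAPITAL_I, 'i') is False, even though a Turkish reader would consider İ and i to be the same letter in different cases.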