Normal Topic import data from files in UTF-8 format wrong Chars (Read 942 times)
assistent
Member
*
Offline



Posts: 11
Joined: Feb 11th, 2008
import data from files in UTF-8 format wrong Chars
Mar 6th, 2015 at 3:33pm
I have files coming from the mail server which I want to import on a daily basis. These files contain all the data from the web form on our website.

These files are in EML format, and I can import them.
I also succeeded in converting the multipart Base64 sections in the file (if present).
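
For readers who handle the EML step outside Sesame, the Base64 decoding could be sketched roughly as below. This is only an illustration using Python's standard `email` package, not the code used in this thread; the function name `extract_text_parts` is hypothetical.

```python
from email import message_from_bytes, policy


def extract_text_parts(raw_eml: bytes) -> list:
    """Return the decoded text of every text/* part in an EML message."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    parts = []
    for part in msg.walk():
        # Multipart containers have maintype "multipart" and are skipped.
        if part.get_content_maintype() == "text":
            # get_content() undoes the transfer encoding (e.g. Base64)
            # and decodes the bytes using the part's declared charset.
            parts.append(part.get_content())
    return parts
```

The result is a list of Python strings, so the charset question is settled at this point and the text can be written out in whatever encoding the next program expects.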

The problem is with the encoding. These files are in UTF-8 format and sometimes contain international characters, like ë.
When importing the file with @Insert or the FileOpen method, it shows "Ã«" instead of ë.

So Sesame assumes it is reading ISO 8859-1, but the file is in fact UTF-8 (which is more international and widely accepted).
Is there any way to force Sesame to read the file as UTF-8?
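
The effect can be reproduced outside Sesame. A minimal Python sketch (illustration only; the sample word is made up) that decodes UTF-8 bytes as if they were ISO 8859-1:

```python
# Illustration only: reproduce the wrong characters by decoding
# UTF-8 bytes as if they were ISO 8859-1.
text = "Zoë"  # hypothetical sample value from a web form

utf8_bytes = text.encode("utf-8")          # ë becomes two bytes: 0xC3 0xAB
misread = utf8_bytes.decode("iso-8859-1")  # each byte is read as one character

print(utf8_bytes)  # b'Zo\xc3\xab'
print(misread)     # ZoÃ«
```

The two bytes of ë are each shown as a separate ISO 8859-1 character, which is why one accented letter turns into two wrong ones.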

(In XLS files, <?xml version="1.0" encoding="UTF-8"?> can be set as the first part of the data to declare the encoding.)

  
 
The Cow
YaBB Administrator
*****
Offline



Posts: 2530
Joined: Nov 22nd, 2002
Re: import data from files in UTF-8 format wrong Chars
Reply #1 - Mar 7th, 2015 at 3:55am
Sesame can read and work with UTF-8, but cannot display the multi-byte characters as intended. If Sesame is the end program in the chain, you might consider viewing that field using an external display program.

During early development, the UTF-8 versus Unicode debate was still in progress, and the GUI library on which Sesame relies was holding off until the dust settled. It has since integrated UTF-8 support, but because of multiple modifications, we are still linked to the older version. Adding UTF-8 support to Sesame is on my to-do list.
  

Mark Lasersohn
Programmer
Lantica Software, LLC
 
assistent
Re: import data from files in UTF-8 format wrong Chars
Reply #2 - Mar 13th, 2015 at 9:22am
Thank you for your explanation. I understand the difficulties and the horror of character sets.
Maybe consider an option in the Sesame ini file that lets the end user decide which character set is supported.
  
 
assistent
Re: import data from files in UTF-8 format wrong Chars
Reply #3 - Mar 13th, 2015 at 1:08pm
I have now resolved this by converting the UTF-8 data to ISO-8859-1 with iconv for Windows.
See: http://dbaportal.eu/2012/10/24/iconv-for-windows/

Just copy the bin directory of iconv to a directory (like c:\sesame2\Iconv).


function Conv_UTF(datastr as String) as String
var ISOData as String
var UTFFile as String = "a file in my sesame directory" // Placeholder: path of a temporary file
     // iconv needs a file to convert
     FileOverWrite(UTFFile, datastr) // Save the UTF-8 data to the file
     // Convert the file; iconv writes the result to its output, which @RedirectProcess returns into ISOData
     ISOData = @RedirectProcess("K:\QA_Sesame\Sesame2_SERVER_APP\Iconv\bin\iconv -f utf-8 -t iso-8859-1 " & UTFFile ,  "")
return ISOData
end function
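
For comparison, roughly the same re-encoding can be sketched in Python without an external tool. This is an illustration, not part of the Sesame solution above; the function name `utf8_to_latin1` is hypothetical, and replacing unconvertible characters with "?" is an assumption made here (iconv, called as above, reports an error for them instead).

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 bytes."""
    text = data.decode("utf-8")
    # Assumption for this sketch: characters that have no ISO-8859-1
    # equivalent are replaced by "?" rather than raising an error.
    return text.encode("iso-8859-1", errors="replace")


converted = utf8_to_latin1(b"Zo\xc3\xab")
print(converted)  # b'Zo\xeb'
```

The two-byte UTF-8 sequence for ë comes out as the single byte 0xEB, which is what a program reading ISO-8859-1 expects.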
  
 
The Cow
Re: import data from files in UTF-8 format wrong Chars
Reply #4 - Mar 13th, 2015 at 1:14pm
I may be able to implement something like that using ISO code pages as a stopgap until UTF-8 can be implemented. But, unfortunately, that won't do you a lot of good, in that you have UTF-8 encoded data coming in from another source.

If you know the character set likely to be used in your UTF-8 data, and that data is in a variation of the Roman character set (western European as opposed to Japanese or Arabic), it may be possible to convert it to 8-bit ISO code page values.
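
Whether a given batch of data fits an 8-bit code page can be checked before converting. A minimal Python sketch of such a check (illustration only; the function name `unconvertible_chars` and the default code page are assumptions, not anything Sesame provides):

```python
def unconvertible_chars(text: str, codepage: str = "iso-8859-1") -> list:
    """Return the distinct characters of `text` that `codepage` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            # Record each problem character once, in order of appearance.
            if ch not in missing:
                missing.append(ch)
    return missing
```

An empty list means the text converts to that code page without loss; western European text normally passes, while Japanese or Arabic text does not.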
  

Mark Lasersohn
Programmer
Lantica Software, LLC