iconv() : Detected an illegal character in input string
$str = iconv('UTF-8', 'GBK//IGNORE', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
If you append the string //TRANSLIT to out_charset, transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similar-looking characters.
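In local testing, //IGNORE skips the characters it does not recognize, carries on converting the rest, and raises no error. //TRANSLIT, by contrast, cuts off the unrecognized character together with everything after it, and raises an error. //IGNORE is what I need.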
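The behaviour described above can be checked with a small sketch like the following. The helper name try_iconv and the sample string are assumptions made for illustration. The result depends on the iconv implementation (glibc, libiconv, and so on) and on the PHP version, so no expected output is shown.

```php
<?php
// Hypothetical helper: run iconv() with an optional suffix and report
// failure as false instead of letting the notice through.
function try_iconv(string $input, string $from, string $to, string $suffix = '')
{
    // @ suppresses the "Detected an illegal character" notice;
    // iconv() returns false when the conversion fails.
    return @iconv($from, $to . $suffix, $input);
}

// "abc", then U+1F600 (an emoji that GBK cannot represent), then "def".
$sample = "abc\xF0\x9F\x98\x80def";

foreach (['', '//IGNORE', '//TRANSLIT'] as $suffix) {
    $result = try_iconv($sample, 'UTF-8', 'GBK', $suffix);
    echo str_pad($suffix === '' ? '(none)' : $suffix, 12), ' => ';
    echo $result === false ? 'false (conversion failed)' : bin2hex($result), "\n";
}
```

Printing the result with bin2hex makes it easy to see exactly which bytes survived the conversion on a given system.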
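1. I found that iconv fails when converting the character "-" to gb2312. Without the ignore parameter, none of the string after that character can be saved. Either way, this "-" can never be converted successfully and cannot be output. mb_convert_encoding does not have this bug.
2. mb_convert_encoding can be given several candidate input encodings and will pick one automatically based on the content, but it runs much slower than iconv. For example: $str = mb_convert_encoding($str, "euc-jp", "ASCII,JIS,EUC-JP,SJIS,UTF-8"); The order of "ASCII,JIS,EUC-JP,SJIS,UTF-8" also affects the result.
3. In general, use iconv. Use mb_convert_encoding only when the source encoding cannot be determined, or when the output of iconv does not display correctly.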
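Point 2 can be sketched as below. This is one possible approach rather than the only one: the helper name guess_and_convert is hypothetical, and it uses mb_detect_encoding in strict mode before converting, which is a choice made here and not something the text above prescribes.

```php
<?php
// Hypothetical helper: convert $input to UTF-8, guessing the source
// encoding from a list of candidate encodings.
function guess_and_convert(string $input, array $candidates)
{
    if (!extension_loaded('mbstring')) {
        return false; // the mb_* functions need the mbstring extension
    }
    // Strict mode (third argument true) rejects candidates for which the
    // input contains invalid byte sequences; the order of the list can
    // influence which of the remaining candidates is chosen.
    $detected = mb_detect_encoding($input, $candidates, true);
    if ($detected === false) {
        return false; // none of the candidates matched
    }
    return mb_convert_encoding($input, 'UTF-8', $detected);
}

// Two GBK-encoded characters; the bytes are not valid UTF-8.
$gbkBytes = "\xD6\xD0\xCE\xC4";
var_dump(guess_and_convert($gbkBytes, ['UTF-8', 'GBK']));
```

Putting the stricter encodings (such as UTF-8) first tends to reduce false matches, since looser multibyte encodings accept many byte sequences.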
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
The mbstring extension must be enabled first.
from_encoding specifies the character encoding name(s) before conversion. It can be an array or a comma-separated string list. If it is not specified, the internal encoding will be used.
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Examples:
$content = iconv("GBK", "UTF-8//IGNORE", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
To sum up: iconv("GBK", "UTF-8//IGNORE", $content) still raises an error, while mb_convert_encoding($content, "UTF-8", "GBK") does not.
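One way to act on this conclusion is to try iconv first and fall back to mb_convert_encoding when it fails. The following is a minimal sketch under that assumption; the function name gbk_to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: GBK to UTF-8, preferring iconv and falling back
// to mbstring when iconv reports a failure.
function gbk_to_utf8(string $content)
{
    // Try iconv first; @ hides the notice and false signals failure.
    $converted = function_exists('iconv')
        ? @iconv('GBK', 'UTF-8//IGNORE', $content)
        : false;
    if ($converted !== false) {
        return $converted;
    }
    // Fall back to mbstring if the extension is available.
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($content, 'UTF-8', 'GBK');
    }
    return false; // neither conversion was possible
}

// Two GBK-encoded characters as sample input.
$utf8 = gbk_to_utf8("\xD6\xD0\xCE\xC4");
echo $utf8 === false ? "conversion failed\n" : $utf8 . "\n";
```

Checking the return value against false with !== matters here, because an empty string is a legitimate conversion result.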