Category: Basic Keywords: utf8 encoding gb2312
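- Questions:
- Convert a gb2312-encoded file to utf8
- Display a gb2312 data file on a page served as utf8
- How to generate a utf8 data file directly
- No file reading involved: print "中文"; inside the script file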
use Encode qw/encode decode/;
my $utf_data = encode("utf8", decode("gb2312", $data));
# $data is gb2312-encoded; $utf_data is utf8-encoded
Equivalent code:
use Encode qw/from_to/;
from_to($data, "gb2312", "utf8"); # converts $data in place from gb2312 to utf8
The reverse conversion, from utf8 to gb2312, also works: swap the encoding arguments of encode and decode.
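The two approaches above, and the reverse direction, can be checked with a small round trip. This is a minimal sketch: the variable names are illustrative, and the sample string is written as hex escapes (the gb2312 bytes of "中文") so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw/encode decode from_to/;

# "中文" as raw gb2312 bytes (assumed sample data, not from the original note)
my $gb_bytes = "\xD6\xD0\xCE\xC4";

# gb2312 bytes -> Perl character string -> utf8 bytes
my $utf_bytes = encode("utf8", decode("gb2312", $gb_bytes));

# Reverse direction: swap the two encoding names
my $back = encode("gb2312", decode("utf8", $utf_bytes));

# from_to converts its first argument in place and returns the new
# length in bytes, or undef on failure; work on a copy to keep $gb_bytes
my $copy = $gb_bytes;
my $len  = from_to($copy, "gb2312", "utf8");

print "utf8 bytes (hex): ", unpack("H*", $utf_bytes), "\n";
print "round trip ok\n"                  if $back eq $gb_bytes;
print "from_to matches encode/decode\n"  if $copy eq $utf_bytes;
```

Because from_to overwrites its argument, copy the data first whenever the original bytes are still needed.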
Example and code:
gb.dat is a gb2312-encoded data file. With -charset=>'utf-8' its contents display as garbled text; with gb2312 they display correctly.
#!/usr/bin/perl
use strict;
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:standard/;
use Encode qw/encode decode from_to/;
my $cgi = CGI->new;
# charset utf8
print $cgi->header(-type=>'text/html',-charset=>'utf-8');
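# open the gb2312 file and read its whole contents, not just the first line
open(my $in, '<', "gb.dat") or die "Cannot open gb.dat: $!";
my $data = do { local $/; <$in> };
close($in);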
# convert gb2312 to utf8
my $utf_data = encode("utf8", decode("gb2312", $data));
# produce the utf8 file
open(my $out, '>', "utf8.dat") or die "Cannot write utf8.dat: $!";
print $out $utf_data;
close($out);
# this literal holds gb2312 bytes only if the script file itself is saved as gb2312
my $word = "我是中国人";
from_to($word, "gb2312", "utf8");
print "$utf_data, $word";
After this conversion, utf8.dat can be read and printed directly under -charset=>'utf-8' without another decode/encode step.
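As an alternative to calling encode/decode by hand, the same file conversion can be sketched with PerlIO encoding layers, which transcode while reading and writing. The helper name convert_file and the sample file names are hypothetical, chosen so the sketch does not overwrite gb.dat or utf8.dat.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src to $dst, decoding from $from on input
# and encoding to $to on output via PerlIO layers.
sub convert_file {
    my ($src, $dst, $from, $to) = @_;
    open(my $in,  "<:encoding($from)", $src) or die "Cannot open $src: $!";
    open(my $out, ">:encoding($to)",   $dst) or die "Cannot write $dst: $!";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close($in);
    close($out) or die "Cannot close $dst: $!";
    return;
}

# Write a small gb2312 sample ("中文" plus newline) so the sketch runs on its own
open(my $fh, '>:raw', "sample_gb.dat") or die "Cannot write sample_gb.dat: $!";
print {$fh} "\xD6\xD0\xCE\xC4\n";
close($fh);

convert_file("sample_gb.dat", "sample_utf8.dat", "gb2312", "UTF-8");
```

With this approach the strings inside the program are Perl character strings, and the byte encodings only exist at the file boundaries.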
No file reading involved: print "中文"; inside the script file
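In script.pl, add use encoding "euc-cn", STDOUT => "utf8"; so that string literals in a script saved as euc-cn (gb2312) are written to STDOUT as utf8. Note that the encoding pragma has been deprecated since Perl 5.18.
use CGI qw/:standard/;
use encoding "euc-cn", STDOUT => "utf8";
my $cgi = CGI->new;
print $cgi->header(-type=>'text/html',-charset=>'utf-8');
print "我是中国人,我爱野文";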
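Where the script file can be saved as UTF-8 instead of euc-cn, a different technique avoids the encoding pragma: the utf8 pragma plus an explicit encode on output. The following is only a sketch under that assumption. It prints the HTTP header by hand rather than through CGI so that it runs with core modules only.

```perl
use strict;
use warnings;
use utf8;                  # assumes this script file is saved as UTF-8
use Encode qw/encode/;

# With `use utf8`, the literal is a character string of 5 characters
my $text = "我是中国人";

# Encode explicitly to UTF-8 bytes before sending to the browser
my $body = encode("UTF-8", $text);

print "Content-Type: text/html; charset=utf-8\r\n\r\n";
print $body;
```

The explicit encode keeps the output bytes the same regardless of any layers set on STDOUT elsewhere, as long as no encoding layer is applied on top of it.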