Journal(2005) | Blog(2006) | RandomLink | WhoAmI | LiveBookmark | HomePage

<<Previous: Maypole  >>Next: WWW Security FAQ: CGI Scripts

utf8与gb2312编码

Category: Basic   Keywords: utf8 编码 gb2312

    问题:
  1. 将gb2312数据格式的文件转为utf8格式
  2. 在utf8编码下显示gb2312数据文件
  3. 如何直接生成utf8格式的数据文件
  4. 不涉及文件读取,script文件里print "中文";
其实最主要的一句为:
use Encode qw/encode decode/;
my $utf_data = encode("utf8", decode("gb2312", $data));
# $data为gb2312格式, $utf_data为utf8格式
等同的代码还有:
use Encode qw/from_to/;
from_to($data, "gb2312", "utf8");
# $data从gb2312格式转为utf8格式
相反从utf8转为gb2312也成。encode,decode里的参数互换下。

例子与代码:
gb.dat是gb2312数据格式的文件。在-charset=>'utf-8'时显示乱码,gb2312时正常。


#!/usr/bin/perl
use strict;
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:standard/;
use Encode qw/encode decode from_to/;

my $cgi = new CGI;
# charset utf8
print $cgi->header(-type=>'text/html',-charset=>'utf-8');

# open the gb2312 file
open(FH, "gb.dat");
my $data = <FH>;
close(FH);

# convert gb2312 to utf8
my $utf_data = encode("utf8", decode("gb2312", $data));

# produce the utf8 file
open(FH, ">utf8.dat");
print FH $utf_data;
close(FH);

my $word = "我是中国人";
from_to($word, "gb2312", "utf8");

print "$utf_data, $word";
经过转换后,以后在-charset=>'utf-8'下直接读取utf8.dat而不用再次decode/encode.

不涉及文件读取,script文件里print "中文";

在script.pl里use encoding "euc-cn", STDOUT => "utf8";
use CGI qw/:standard/;
use encoding "euc-cn", STDOUT => "utf8";

my $cgi = new CGI;
print $cgi->header(-type=>'text/html',-charset=>'utf-8');

print "我是中国人,我爱野文";

Refer

<<Previous: Maypole  >>Next: WWW Security FAQ: CGI Scripts

Options: +Del.icio.us

Related items Created on 2004-11-09 12:42:33, Last modified on 2004-11-10 23:24:17
Copyright 2004-2005 All Rights Reserved. Powered by Eplanet && Catalyst 5.62.