Journal(2005) | Blog(2006) | RandomLink | WhoAmI | LiveBookmark | HomePage

<<Previous: fail to embed parrot in pugs(Resolved)  >>Next: my subtype Examples16 of Perl6 where /^kissu$/

Google Sitemap

Category: Script   Keywords: google sitemap

Google 最近推出了新业务,Google Sitemap.
主要是给 webmaster 在自己的网站上建一个 XML 文档,然后 Google Sprider 来抓取这个文档查看网站更新了什么。
我不明白为什么不直接抓取 RSS/Atom, 这样就可以省去重新写一个文档了。

或许 RSS/Atom 的冗长资料太多了,Google Sitemap 所要求的 XML 标签很少。详细的要求请参阅 https://www.google.com/webmasters/sitemaps/docs/en/protocol.html
我在自己的 Eplanet 上简单 hack 了下。
因为我自己的要求很少(比如不用过滤 XML 的特殊标签),所以代码很简单(没有使用任何 XML:: 模块)。
代码很简单,看完所要求的格式就可以了。

# declare the vars
my $data; # the data to save in

# normal settings
my $file = $c->config->{build_root} . "/sitemaps.xml";
my $site_prefix = 'http://www.fayland.org/journal';
    
# got the data from database
my @topics = Eplanet::M::CDBI::Cms->retrieve_from_sql(qq{
    1=1 ORDER BY cms_id DESC LIMIT 0, 20
});
    
# the data format see https://www.google.com/webmasters/sitemaps/docs/en/protocol.html
$data = '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">'."\n";

# for every topic
foreach my $topic (@topics) {
    my $topic_name = $topic->get('cms_file');
    my $date = $topic->{'cms_mod_data'} || $topic->{'cms_cre_data'};
    # format the date
    $date = &format_date($date);
    
    # add to the data
    $data .= "<url>
    <loc>$site_prefix/${topic_name}.html</loc>
    <lastmod>$date</lastmod>
</url>\n";
}
    
# finish the data format
$data .= '</urlset>';

# save to the file
open(FH, ">$file");
flock(FH, 2);
print FH $data;
close(FH);

完毕后就提交 URL. Done!

<<Previous: fail to embed parrot in pugs(Resolved)  >>Next: my subtype Examples16 of Perl6 where /^kissu$/

Options: +Del.icio.us

Related items Created on 2005-06-04 22:52:44, Last modified on 2005-06-04 22:53:45
Copyright 2004-2005 All Rights Reserved. Powered by Eplanet && Catalyst 5.62.