DBIメソッドの返値一覧 
Tuesday, January 24, 2012, 08:54 AM - Programing, Programing / Perl
Posted by Administrator
なんか必要に駆られたので
全部SELECT

メソッド正常終了10件正常終了0件エラー
do100E0undef
selectall_arrayrefARRAY[10]ARRAY[0]undef
selectall_hashrefHASH{10}HASH{0}undef
selectrow_array1undefundef
selectrow_arrayrefARRAY[7]undefundef
selectrow_hashrefHASH{7}undefundef
selectcol_arrayrefARRAY[10]ARRAY[0]undef
fetchARRAY[7]undefundef
fetchrow_arrayrefARRAY[7]undefundef
fetchrow_array1undefundef
fetchrow_hashrefHASH{7}undefundef
fetchall_arrayrefARRAY[10]ARRAY[0]ARRAY[0]
fetchall_hashrefHASH{10}HASH{0}undef
execute100E0
row100-1

add comment ( 1007 views )   |  permalink
PerlとKakasi/ChaSen/MeCab 
Sunday, January 30, 2011, 12:51 AM - Programing, Programing / Perl
Posted by Administrator
簡易なインストール~検証まで

Kakasi+Text::Kakasi
# wget -q -O - http://kakasi.namazu.org/stable/kakasi-2.3.4.tar.gz|tar zx
# cd kakasi-2.3.4
# ./configure --prefix=/opt/kakasi;make install

# export C_INCLUDE_PATH=/opt/kakasi/include
# export LD_LIBRARY_PATH=/opt/kakasi/lib
# ln -s `ls /opt/kakasi/lib/libkakasi*` /usr/local/lib # *1
# cpan Text::Kakasi
# rm -fr `ls -d /usr/local/lib/libkakasi*` # *2

*1 libkakasi*が/usr/local/lib以下で見付けられないと正しくリンクされないようなので
*2 一度Text::Kakasiをインストールしてしまえばlibkakasi*は/usr/local/lib以下である必要ない

ChaSen+Text::ChaSen
# wget -q -O - http://chasen.org/~taku/software/darts/src/darts-0.32.tar.gz|tar zx # *3
# cd darts-0.32
# ./configure --prefix=/opt/chasen;make install
# export CPLUS_INCLUDE_PATH=/opt/chasen/include

# wget -q -O - 'http://sourceforge.jp/frs/redir.php?m=iij&f=%2Fchasen-legacy%2F32224%2Fchasen-2.4.4.tar.gz'|tar zx
# cd chasen-2.4.4
# ./configure --prefix=/opt/chasen;make install

# wget -q -O - 'http://sourceforge.jp/frs/redir.php?m=jaist&f=%2Fipadic%2F24435%2Fipadic-2.7.0.tar.gz'|tar zx
# cd ipadic-2.7.0
# export PATH=$PATH:/opt/chasen/bin
# for I in `ls *.cha *.dic chasenrc*`;do iconv -f euc-jp -t utf-8 -o $I~ $I;mv -f $I~ $I;done # *4
# `chasen-config --mkchadic`/makemat -i w
# `chasen-config --mkchadic`/makeda -i w chadic *.dic
# mkdir -p /opt/chasen/etc
# ./configure --prefix=/opt/chasen;make install

# wget -q -O - http://search.cpan.org/CPAN/authors/id/K/KN/KNOK/Text-ChaSen-1.04.tar.gz|tar zx # *5
# cd Text-ChaSen-1.04
# perl Makefile.PL LIBS='-L/usr/local/lib -lchasen'
# export LD_LIBRARY_PATH=/opt/chasen/lib
# ln -s `ls /opt/chasen/lib/libchasen*` /usr/local/lib
# make install
# rm -fr `ls -d /usr/local/lib/libchasen*`

*3 ChaSenのmakeでdarts.hが必要
*4 辞書をUTF-8へコンバートする必要がある
*5 Makefile.PLへLIBSパラメータを与える必要があるのでcpanが使えない、よって手動

MeCab+Text::MeCab
# wget -q -O - http://sourceforge.net/projects/mecab/files/mecab/0.98/mecab-0.98.tar.gz/download|tar zx
# cd mecab-0.98
# ./configure --prefix=/opt/mecab --with-charset=utf8;make install
# wget -q -O - http://sourceforge.net/projects/mecab/files/mecab-ipadic/2.7.0-20070801/mecab-ipadic-2.7.0-20070801.tar.gz/download|tar zx
# cd mecab-ipadic-2.7.0-20070801
# export PATH=$PATH:/opt/mecab/bin
# ./configure --prefix=/opt/mecab --with-charset=utf8;make install

# cpan Text::MeCab

# wget -q -O - http://sourceforge.net/projects/mecab/files/mecab-perl/0.98/mecab-perl-0.98.tar.gz/download|tar zx # *6
# cd mecab-perl-0.98
# perl Makefile.PL
# make install

*6 本家MeCab.pm

さて、検証コードはこちら
#!/usr/bin/perl
use 5.10.0;
use Benchmark;
use Data::Dumper;
use Encode;
use Text::Kakasi;
use Text::ChaSen;
use Text::MeCab;
use MeCab;

sub kakasiA
{
my($s) = @_;

$s = Text::Kakasi->new(qw(-iutf8 -outf8 -HK -KK -JK))->get($s);
utf8::encode($s);

return($s);
}

sub kakasiB
{
my($s) = @_;

Text::Kakasi::getopt_argv(qw(kakasi -ieuc -oeuc -HK -KK -JK));
Encode::from_to($s,"utf-8","euc-jp");
$s = Text::Kakasi::do_kakasi($s);
Encode::from_to($s,"euc-jp","utf-8");
Text::Kakasi::close_kanwadic;

return($s);
}

sub chasen
{
my($s) = @_;

Text::ChaSen::getopt_argv(qw(chasen -i w -j -F %y));
$s = Text::ChaSen::sparse_tostr($s);
#$s = Text::ChaSen::sparse_tostr_long($s);

return($s);
}

sub mecabA
{
my($s) = @_;

my $s_;
my $mecab = Text::MeCab->new(qw(node_format %f[7] unk_format %M eos_format),"");
for(my $node = $mecab->parse($s);$node;$node = $node->next()){
$s_ .= $node->format($mecab);
}

return($s_);
}

sub mecabB
{
my($s) = @_;

my $mecab = MeCab::Tagger->new(q(--node-format=%f[7] --unk-format=%M --eos-format=));
$s = $mecab->parse($s);

return($s);
}

$s = "俺の巫女萌えぶろぐちゃん !!";
#say &kakasiA($s);
#say &kakasiB($s);
#say &chasen($s);
#say &mecabA($s);
#say &mecabB($s);

if($#ARGV != -1){
timethese(
$ARGV[0],
{
"Kakasi(->get)" =>sub(){&kakasiA($s)},
"Kakasi(do_kakasi)" =>sub(){&kakasiB($s)},
"Chasen" =>sub(){&chasen($s)},
"MeCab(Text::MeCab)" =>sub(){&mecabA($s)},
"MeCab(MeCab.pm)" =>sub(){&mecabB($s)}
}
);
}

__END__


実行時LD_LIBRARY_PATHを必要とする
# export LD_LIBRARY_PATH=/opt/chasen/lib:/opt/kakasi/lib:/opt/mecab/lib
結果
> Benchmark: timing 500 iterations of ChaSen, Kakasi(->get), Kakasi(do_kakasi), MeCab(MeCab.pm), MeCab(Text::MeCab)...
> ChaSen: 0 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 25000.00/s (n=500)
> Kakasi(->get): 0 wallclock secs ( 0.02 usr + 0.15 sys = 0.17 CPU) @ 2941.18/s (n=500)
> Kakasi(do_kakasi): 0 wallclock secs ( 0.01 usr + 0.10 sys = 0.11 CPU) @ 4545.45/s (n=500)
> MeCab(MeCab.pm): 1 wallclock secs ( 0.50 usr + 0.12 sys = 0.62 CPU) @ 806.45/s (n=500)
> MeCab(Text::MeCab): 1 wallclock secs ( 0.54 usr + 0.12 sys = 0.66 CPU) @ 757.58/s (n=500)
ChaSenが異常な程ぶっ飛んで高速……だが、これは恐らくキャッシュだろう

また試行回数を増やすとKakasiが
> /opt/kakasi/share/kakasi/kanwadict: Too many open files
で死んでしまう
Text::Kakasi::getopt_argvを外に出すことで回避はできるがBだけの話であって、Aはやっぱり死んでしまう

しかし
> Benchmark: timing 10000 iterations of ChaSen, Kakasi(do_kakasi), MeCab(MeCab.pm), MeCab(Text::MeCab)...
> ChaSen: 0 wallclock secs ( 0.25 usr + 0.04 sys = 0.29 CPU) @ 34482.76/s (n=10000)
> Kakasi(do_kakasi): 0 wallclock secs ( 0.27 usr + 0.01 sys = 0.28 CPU) @ 35714.29/s (n=10000)
> MeCab(MeCab.pm): 14 wallclock secs ( 9.89 usr + 3.65 sys = 13.54 CPU) @ 738.55/s (n=10000)
> MeCab(Text::MeCab): 14 wallclock secs (10.97 usr + 3.30 sys = 14.27 CPU) @ 700.77/s (n=10000)
これは……
add comment ( 3139 views )   |  permalink
UTF-8フラグを落とすのはどれが早いのか 
Wednesday, January 26, 2011, 07:49 AM - Programing, Programing / Perl
Posted by Administrator
Perl 5.8以降導入されたUTF-8フラグを落とすのはどれが早いのか

#!/usr/bin/perl
use Benchmark;
use Encode;

while(<STDIN>){
$gplv2 .= $_;
}
utf8::decode($gplv2);

timethese(
1000000,
{
"utf8::encode" =>sub(){
my $s = $gplv2;
utf8::encode($s);
return();
},
"Encode::encode" =>sub(){
my $s = Encode::encode("utf8",$gplv2);
return();
},
"Encode::encode_utf8" =>sub(){
my $s = Encode::encode_utf8($gplv2);
return();
}
}
);

STDINへぶち込むのはGPLv2 日本語訳全文
> Benchmark: timing 1000000 iterations of Encode::encode, Encode::encode_utf8, utf8::encode...
> Encode::encode: 14 wallclock secs (11.13 usr + 0.01 sys = 11.14 CPU) @ 89766.61/s (n=1000000)
> Encode::encode_utf8: 7 wallclock secs ( 5.56 usr + 0.01 sys = 5.57 CPU) @ 179533.21/s (n=1000000)
> utf8::encode: 4 wallclock secs ( 2.41 usr + 0.00 sys = 2.41 CPU) @ 414937.76/s (n=1000000)
圧勝過ぎワロタ
1 comment ( 1402 views )   |  permalink
DBI->connect_cached(...)は早いの? 
Tuesday, January 18, 2011, 06:12 PM - Programing, Programing / Perl
Posted by Administrator
DBIのpodにも書いてあるけど、->*_cachedは遅い
多少内部処理がお粗末になる覚悟でそれっぽいモジュールや後述するplan Aのような単純な実装を選択する方が望ましいだろう

検証用コード
#!/usr/bin/perl
use Benchmark;
use DBI;

sub plan_a
{
$sth1->bind_param(1,$ARGV[1],DBI::SQL_INTEGER);
$sth1->execute();
while($sth1->fetch()){
}
$sth1->finish();

return();
}

sub plan_b
{
my $dbh2 = DBI->connect(qw(dbi:mysql:database=a;host=172.16.2.205;port=3306 a));
my $sth2 = $dbh2->prepare("SELECT * FROM `a` WHERE 1 LIMIT 0,?");
$sth2->bind_param(1,$ARGV[1],DBI::SQL_INTEGER);
$sth2->execute();
while($sth2->fetch()){
}
$sth2->finish();
$dbh2->disconnect();

return();
}

sub plan_c
{
my $dbh3 = DBI->connect_cached(qw(dbi:mysql:database=a;host=172.16.2.205;port=3306 a));
my $sth3 = $dbh3->prepare_cached("SELECT * FROM `a` WHERE 1 LIMIT 0,?");
$sth3->bind_param(1,$ARGV[1],DBI::SQL_INTEGER);
$sth3->execute();
while($sth3->fetch()){
}

return();
}

printf("=== %d records, %d repeats ===\n",reverse(@ARGV));

$dbh1 = DBI->connect(qw(dbi:mysql:database=a;host=172.16.2.205;port=3306 a));
$sth1 = $dbh1->prepare("SELECT * FROM `a` WHERE 1 LIMIT 0,?");

sqrt($_) for(1..10000);
timethese(
$ARGV[0],
{
"A" =>\&plan_a,
"B" =>\&plan_b,
"C" =>\&plan_c,
}
);

__END__

の結果
> === 1 records, 10000 repeats ===
> Benchmark: timing 10000 iterations of A, B, C...
> A: 6 wallclock secs ( 1.23 usr + 0.34 sys = 1.57 CPU) @ 6369.43/s (n=10000)
> B: 41 wallclock secs ( 8.22 usr + 2.34 sys = 10.56 CPU) @ 946.97/s (n=10000)
> C: 17 wallclock secs ( 4.84 usr + 0.74 sys = 5.58 CPU) @ 1792.11/s (n=10000)
これは少々酷い
接続切断を繰り返すplan Bと比べれば2倍だろうが、plan Aから見れば大きな変化とは言えない

大体こういうふざけたモジュールが存在していたとして
package CachedDBI4;
use DBI;
use Data::Dumper;
use vars qw(%db_vars);
use vars qw(%st_vars);

sub connect_cached
{
my($f,@r) = @_;

return $CachedDBI4::db_vars{Dumper(@r)} ||= $f->connect(@r);
}


sub prepare_cached
{
my($f,@r) = @_;

return $CachedDBI4::st_vars{Dumper(@r)} ||= $f->prepare(@r);
}

*DBI::connect_cached = \&connect_cached;
*DBI::db::prepare_cached = \&prepare_cached;

__PACKAGE__

これを使ったスコアが
> === 1 records, 10000 repeats ===
> Benchmark: timing 10000 iterations of D...
> D: 9 wallclock secs ( 3.94 usr + 0.49 sys = 4.43 CPU) @ 2257.34/s (n=10000)
B, C, Dとどんぐりだが、こんなコードが最も早いとは思わなかったわけで

podに書いている時点で知れてる事だけど
add comment ( 694 views )   |  permalink
BlackCurtain::DBI::Cache::Memcachedって早いの? (2) 
Thursday, January 13, 2011, 12:10 PM - Programing, Programing / Perl
Posted by Administrator
実のところBlackCurtain::DBI::Cache::Memcachedが遅い、加えてその検証結果も予定調和だったりして

検証用コードは前回検証の付録と同一(timetheseの直前でCool'n'Quiteの影響を緩和するため糞ループを追加している)
sub BlackCurtainDBI
{
$sth2->bind_param(1,$ARGV[1],DBI::SQL_INTEGER);
$sth2->execute();
while($sth2->fetch()){
}
$sth2->finish();

return();
}

結果
> === 1 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 14 wallclock secs ( 7.73 usr + 2.12 sys = 9.85 CPU) @ 10152.28/s (n=100000)
> === 10 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 19 wallclock secs (13.48 usr + 1.76 sys = 15.24 CPU) @ 6561.68/s (n=100000)
> === 100 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 71 wallclock secs (62.88 usr + 3.10 sys = 65.98 CPU) @ 1515.61/s (n=100000)
この傾向は前回の検証でも十分観測できる

と、いうことはBlackCurtain::DBI::Cache::Memcached::stのsub FETCHが高い確率で悪
邪道だが動作そのものに問題ないので削除

そして結果
> === 1 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 9 wallclock secs ( 7.51 usr + 0.87 sys = 8.38 CPU) @ 11933.17/s (n=100000)
> === 10 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 16 wallclock secs (10.18 usr + 1.83 sys = 12.01 CPU) @ 8326.39/s (n=100000)
> === 100 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 58 wallclock secs (49.93 usr + 1.05 sys = 50.98 CPU) @ 1961.55/s (n=100000)
正直コード削れば早くなるのは実に当たり前の話だがな
ともかく今回に限らずsub FETCH{ ... }へ割り込むのは余り好ましくないっていうのははっきり見える結果

そしてこれを踏襲し
・ FETCH/STOREは最小限にしろ
・ サブクラスから->SUPERするな
・ コードを書くな
の3点を重点に置いて考えるならば

->execute(...)で勝手に全部キャッシュに突っ込んで->fetch(...)で->SUPER::fetch(...)しない
こんな感じのコードが好ましい
sub execute
{
my($f,@r) = @_;

my $memcached = \$f->{Database}->{Memcached_};
my $cache = \$f->{Memcached_cache};
(my $q = $f->{Statement}) =~s/\?/$f->{ParamValues}->{my $i++}/go;
if(!($$cache = $$memcached->get($q))){
$f->SUPER::execute();
push(@{$$cache},$row) while(my $row = $f->SUPER::fetch());
$$memcached->set($q,\@$$cache);

}
return($#{$$cache});
}

sub fetch
{
my($f,@r) = @_;

return(shift(@{$f->{Memcached_cache}}));
}

結果(5回試行して最高と最低を記載、どうしてか揺れる)
> === 1 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 7 wallclock secs ( 4.65 usr + 0.84 sys = 5.49 CPU) @ 18214.94/s (n=100000)
> === 1 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 11 wallclock secs ( 5.28 usr + 2.16 sys = 7.44 CPU) @ 13440.86/s (n=100000)
> === 10 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 7 wallclock secs ( 4.74 usr + 0.86 sys = 5.60 CPU) @ 17857.14/s (n=100000)
> === 10 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 12 wallclock secs ( 5.41 usr + 2.25 sys = 7.66 CPU) @ 13054.83/s (n=100000)
> === 100 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 8 wallclock secs ( 5.41 usr + 0.89 sys = 6.30 CPU) @ 15873.02/s (n=100000)
> === 100 records, 100000 repeats ===
> Benchmark: timing 100000 iterations of BCDBI(a)...
> BCDBI(a): 10 wallclock secs ( 5.54 usr + 1.67 sys = 7.21 CPU) @ 13869.63/s (n=100000)
1.0bと比べると理想的なカーブでパフォーマンスが向上している

pDBIが1, 10, 100 recordsで7739.94/s, 4095.00/s, 880.98/sである事を考えると13000/s以上の速度は上々といったところか
add comment ( 1541 views )   |  permalink

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Next> Last>>