行有餘力則以學文: August 2014

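Wednesday, August 20, 2014

A comparison of AnyEvent, POE, and IO::Async

https://blog.afoolishmanifesto.com/posts/concurrency-and-async-in-perl/

This article is fairly recent. Its author compares these three event-driven frameworks and comes to the following conclusion:

AnyEvent is recommended for beginners, but IO::Async or POE is recommended for real work. The author is especially enthusiastic about IO::Async, which may be partly because it is the newest of the three (perhaps?).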

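Tuesday, August 19, 2014

A wicked cool reverse proxy: a bash one-liner

Reference: http://www.frameloss.org/2013/12/14/wicked-cool-reverse-proxy-with-bash-and-netcat/

Process substitution is the most important concept in it: it lets a pipeline forward data in both directions at once.

File descriptors are also a fairly fresh concept to me. Because of them, files and sockets can be treated as equivalent, so what nc/netcat does can essentially be achieved with cat.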
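As a rough illustration of that equivalence, here is a minimal bash sketch. The helper name cat_via_fd is made up for this note, and the socket case relies on bash's /dev/tcp/HOST/PORT pseudo-path, which is a bash feature rather than a real file.

```shell
#!/usr/bin/env bash
# cat_via_fd is a hypothetical helper, not taken from the referenced article.
# It opens TARGET on file descriptor 3 and lets plain cat do the reading.
# TARGET may be an ordinary file or, in bash, a /dev/tcp/HOST/PORT pseudo-path.
cat_via_fd() {
  local target=$1
  exec 3<"$target" || return 1   # attach fd 3 to the file or socket
  cat <&3                        # cat neither knows nor cares which one it is
  exec 3<&-                      # close fd 3 again
}
```

For example, `cat_via_fd /etc/hostname` reads a file, while `cat_via_fd /dev/tcp/localhost/9999` would read from whatever happens to be listening on local port 9999 (an assumed port), much like running nc as a client with no input.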

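Perhaps many programs only need to be written against stdin / stdout at first; the concepts above can then be used to bring in the socket / network parameters at the shell level.

To see why this trick works, first think about what a proxy and nc really are. A proxy is a server and a client at the same time: the server side accepts incoming connections, and the client side forwards the data to the real server. Depending on the arguments it is started with, nc can be either a client or a server, and it relays the data through its own stdin / stdout. So in essence, starting one nc server and one nc client and connecting their stdin / stdout to each other is equivalent to a proxy.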
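The idea can be sketched with a single named pipe for the return path. This is only a sketch under assumptions: the function names, the default pipe path and the argument order are mine, and the `-l -p PORT` form belongs to traditional netcat (the OpenBSD variant expects `nc -l PORT`).

```shell
#!/usr/bin/env bash
# ensure_fifo PATH: create the named pipe only if it does not exist yet.
ensure_fifo() {
  [ -p "$1" ] || mkfifo "$1"
}

# simple_proxy LISTEN_PORT TARGET_HOST TARGET_PORT [FIFO]
# Hypothetical helper: one nc acts as the server, one as the client.
simple_proxy() {
  local listen_port=$1 target_host=$2 target_port=$3 back=${4:-./backpipe}
  ensure_fifo "$back" || return 1
  # Requests: server stdout -> pipe -> client stdin.
  # Responses: client stdout -> named pipe -> server stdin.
  nc -l -p "$listen_port" <"$back" | nc "$target_host" "$target_port" >"$back"
}
```

Something like `simple_proxy 9090 127.0.0.1 80` would then accept one connection on port 9090 and relay it to a web server on the same machine.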

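Extension: a bidirectional filter

Ideally, a bidirectional filter could be inserted between the nc server and the nc client, but this may require bash version 4 or later (bash 4.0 was released in February 2009). Check the version first: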

$ echo $BASH_VERSION
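If the check has to happen inside a script, bash's built-in BASH_VERSINFO array can be compared directly. A minimal sketch; the function name bash_at_least is my own:

```shell
#!/usr/bin/env bash
# bash_at_least MAJOR: succeed if the running bash is at least that major version.
# BASH_VERSINFO is bash's built-in version array; element 0 is the major number.
bash_at_least() {
  [ "${BASH_VERSINFO[0]:-0}" -ge "$1" ]
}

if bash_at_least 4; then
  echo "bash $BASH_VERSION: version 4 or later"
else
  echo "bash ${BASH_VERSION:-unknown}: older than version 4" >&2
fi
```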

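Use mkfifo to create four named pipes a, b, c, and d. With the < and > operators, assign a (output) and b (input) to the I/O of the nc server and c (output) and d (input) to the nc client, then ~~pass a (input), b (output), c (input), and d (output) as arguments and~~ start the filter. If you write your own filter in Perl, see the documentation on how to open pipes with open.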
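Here is how I read that layout, as a sketch rather than a tested proxy. The assumptions are mine: the server writes to a and reads from b, the client writes to c and reads from d, and the "filter" is a pair of line-oriented relays that only log what passes through. The function names relay and filtered_proxy are hypothetical.

```shell
#!/usr/bin/env bash
# relay SRC DST: copy lines from fifo SRC to fifo DST, logging each to stderr.
# A real filter would modify the line before the second printf.
relay() {
  local line
  while IFS= read -r line; do
    printf '%s\n' "$line" >&2   # log what passes through
    printf '%s\n' "$line"       # forward it to the other side
  done <"$1" >"$2"
}

# filtered_proxy LISTEN_PORT TARGET_HOST TARGET_PORT
filtered_proxy() {
  local listen_port=$1 target_host=$2 target_port=$3 dir
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/a" "$dir/b" "$dir/c" "$dir/d"
  nc -l -p "$listen_port" >"$dir/a" <"$dir/b" &            # server: out -> a, in <- b
  nc "$target_host" "$target_port" >"$dir/c" <"$dir/d" &   # client: out -> c, in <- d
  relay "$dir/a" "$dir/d" &   # request path:  server -> client
  relay "$dir/c" "$dir/b" &   # response path: client -> server
  wait
  rm -rf "$dir"
}
```

Because relay reads line by line, this only suits text protocols; binary traffic would need a byte-oriented filter.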

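If reads and writes do not need to be coordinated, a filter of this form is simpler: each direction gets its own filter in an ordinary pipeline, so the four named pipes are unnecessary and a single named pipe for the return path is enough. Each Perl one-liner below stands in for a real filter: it logs every line to STDERR and also prints it to STDOUT, otherwise nothing would reach the next stage. The commands look like this:

[ -p pipeback ] || mkfifo pipeback  # only needs to run once

[ -p pipeback ] || mkfifo pipeback; nc -l -p 9090 <pipeback | perl -e '$|=1; while(<>){print STDERR; print;}' | nc 127.0.0.2 9999 | perl -e '$|=1; while(<>){print STDERR; print;}' > pipeback

To show the traffic on screen without touching the data path, tee can be used instead:

[ -p pipeback ] || mkfifo pipeback; nc -l -p 9090 <pipeback | tee >(perl -e "while(<>){print STDERR;}") | nc 127.0.0.2 9999 | tee >(perl -e "while(<>){print STDERR;}") > pipeback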
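The tee plus process substitution part can be tried on its own, without nc. A small sketch, where tap is a name I made up and the label is only there to tell the two directions apart:

```shell
#!/usr/bin/env bash
# tap LABEL: pass stdin through to stdout unchanged, and send a labelled
# copy of every line to stderr via process substitution.
tap() {
  local label=$1
  tee >(sed "s/^/[$label] /" >&2)
}

# Example: the copy goes to stderr, the data continues down the pipeline.
printf 'GET / HTTP/1.0\n' | tap request | tr 'A-Z' 'a-z'
```

In the proxy command above, `tap request` and `tap response` could take the place of the two tee stages.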

Update 2017-02-02: strictly speaking, ssh is the safer approach; see http://chimerhapsody.blogspot.tw/2015/09/ssh.html

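Saturday, August 16, 2014

Surprise! PHP can also be used to write applications!?

Building Command-Line Applications with PHP

According to the documentation, this feature has been supported since 2003. I am embarrassed to say I had never heard of it... 冏rz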

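Statistics on popular GitHub projects

Popular projects on GitHub (GitHub 上的熱門專案)

Quite interesting; the ranking seems to be based on how actively each project is discussed. When I have time I really must take a good look...

Perl has been almost completely wiped out... 冏rz

It looks like JavaScript / CSS really, really are hot...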

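How to write a module and upload it to CPAN

Basic structure and requirements of a module (模組的基本結構和要求)
How to write a module and upload it to CPAN (怎么样写一个模块上传到 CPAN)

Extremely well written...

A few additions:

The module skeleton can be created with Module::Starter, or with the classic Module::Starter::PBP; see "Using Module::Starter to create a module skeleton" (使用 Module::Starter 建立模組框架)
Uploading can be done with CPAN::Uploader
For module coding style, see PBP (Perl Best Practices)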

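Some Perl resources related to web integration

1. A module that plays the role of the browser's JavaScript console / developer tools. A perfect match for one-liners:

Web::Scraper, a superb module for analyzing web pages (网页分析处理的极品模块Web::Scraper)

2. Perl implementations of offline browsers / site rippers (such as Teleport Pro and HTTrack).

A Mojo version of a Perl crawler (Mojo 版本的 Perl 爬虫)

What are good approaches to whole-site scraping in Perl (Perl 整站采集有什么好方案)

A multi-threaded Perl crawler (多线程的 Perl 爬虫)

These also mention the Bloom filter algorithm for detecting duplicate text, which is indeed pretty cool.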

Friday, August 15, 2014

GitHub Pages: web hosting with built-in version control

Overview: https://help.github.com/articles/what-are-github-pages

Manual setup: https://help.github.com/articles/creating-project-pages-manually

Easy setup: https://dl.dropboxusercontent.com/u/3813488/train/gitapp.pdf

API: https://developer.github.com/v3/repos/pages/

Speech recognition libraries written in JavaScript

annyang: has demos you can try right away; can handle Chinese; https://github.com/TalAter/annyang
Pocketsphinx.js: has demos you can try right away; can handle Chinese; https://github.com/syl22-00/pocketsphinx.js

Other notes:

Update 2018-07-08: Firefox now has this built in

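JavaScript implementations of a git client and their potential use in collaboration platforms

I have found two so far:

github.js: it apparently rewrites the locally run git client in JavaScript, and it depends on node.js
js-git: it does not depend on the local file system, so it has a chance of running on its own inside a web page (git-browser) without depending on node.js

git itself can in fact serve as a data / file storage tool. js-git clones a repo, keeps it in memory for processing, and pushes back when something changes. Building on this idea, its author wrote tedit, which makes it convenient to edit documents on a tablet.

git is itself a multi-user collaborative version control system. Conversely, what collaboration needs is to some extent exactly what git provides: clone, commit, branching, reconcile, push, and so on.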
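To make the "git as storage" idea concrete, here is a small sketch that uses the ordinary git command line instead of js-git (whose API I do not show here). The function name, the file name note.txt and the throwaway identity are all assumptions for illustration; the "remote" is just a bare repository in a temporary directory.

```shell
#!/usr/bin/env bash
# git_roundtrip TEXT: store TEXT in a repository via clone / commit / push,
# then clone the repository again and print what comes back.
git_roundtrip() {
  local note=$1 work
  work=$(mktemp -d) || return 1
  git init -q --bare "$work/store.git"                 # stands in for the remote
  git clone -q "$work/store.git" "$work/edit" 2>/dev/null
  printf '%s\n' "$note" >"$work/edit/note.txt"         # the local change
  git -C "$work/edit" add note.txt
  git -C "$work/edit" -c user.name=demo -c user.email=demo@example.invalid \
      commit -q -m 'update note'
  git -C "$work/edit" push -q origin HEAD              # push the change back
  git clone -q "$work/store.git" "$work/verify"        # a second collaborator
  cat "$work/verify/note.txt"
  rm -rf "$work"
}
```

Calling `git_roundtrip 'hello'` should print the text back after it has travelled through the repository, which is the same clone, modify, push cycle that js-git performs in memory.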

The following is excerpted from the author's description:

Feature Goals

I don't intent to make a 100% clone of all the features of the official git program. That would be insane and require a lot more money than I'm asking for. My main goal is to enable the 90% case of interesting stuff:

Clone remote repositories to local storage over http, git, or ssh.
Make and commit local changes offline.
Manage tags and branches offline.
Push changes back up to remote repositories.
Serve git repositories over http, git, or ssh.
Be very modular so bits can be used by any software that needs them.

Potential Products

Some example products that would be enabled by this are:

ChromeOS IDE for developing on Chromebooks.
Node.JS blog engine with git as the database.
Custom Git hosting using custom storage back-ends.
GIT CLI for restricted environments.
Standalone GIT GUI desktop app.
Git based deployment tools.
JavaScript package management for server and client.
Whatever else you come up with.

Presentations on the web: an application of HTML5

reveal.js

html5slides

deck.js

impress.js

CSSS

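Of these, impress.js is the one people have been praising lately. If someone could write a program that automatically turns a hefty book or paper into slides, it would probably help a lot of people...

Thursday, August 14, 2014

Setting up mozrepl + emacs

http://blog.binchen.org/posts/how-to-do-htmljavascript-repl-read-eval-print-loop-with-no-server-set-up-2.html

As for connecting directly with telnet and issuing commands, this page explains it in detail and is worth reading.

A perl one-liner that queries Firefox through mozrepl: the table on a btkitty page as an example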

perl -MNet::Telnet -e "$t=new Net::Telnet();$t->open(Host=>'localhost', Port=>4242);$t-> print('content.document.body.innerHTML');while(1){my $data=$t->get(Timeout=>1);print $data;}" | perl -MData::Dumper -MHTML::TreeBuilder::XPath -MHTML::Element -e "my $crlf=\"\n\";$string = do { local($/); <> }; $tree= HTML::TreeBuilder::XPath-> new_from_content($string);my $pages=$tree->findnodes( '//div[@class=\'pagination\']/span') -> [0]->as_text;$pages=~s/[^0-9]//g;my $table=$tree->findnodes( '//div[@class=\'list\']')->[0]; foreach my $row($table->findnodes('//dl[not(@class=\'banner\')]')){my @cells=$row-> content_list();print join('',map{$_->as_text.'|||' if ref($_)} $cells[1]-> content_list()),$crlf,$cells[1]->dump;my @dec=$cells[1]->descendants();print $dec[0] ,$crlf;print join('|||',map{$_->as_text if ref($_)}$cells[0]->content_list()),$crlf;}print $crlf;" >btkitty.txt

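Save the output to btkitty.txt and open it afterwards: the original page is encoded in UTF-8, so printing straight to the screen produces garbled characters, and transcoding still causes problems on systems whose locale is not Traditional Chinese.

doskey on Windows

http://forum.twbts.com/thread-10210-1-1.html

The one I find most useful is doskey /history>abc.txt

This command saves the commands typed so far into abc.txt. Whether you turn them into a batch file or paste them onto a web page, it is very handy.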

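Wednesday, August 13, 2014

Interesting speech recognition projects on SourceForge

http://cmusphinx.sourceforge.net/

This one is developed by CMU.

http://julius.sourceforge.jp/en_index.php

This one is developed by Kyoto University.

There is no Chinese project yet, but with so many videos and subtitles on the web nowadays, using them to train these recognition engines should be quite convenient...

Interesting Python projects on GitHub

Taken from https://github.com/trending?l=python

1. scrapy. According to its documentation it is a web-crawler-like project. Interesting, yes, but the sheer amount of documentation is intimidating...

2. hardseed. A program from mainland China for grabbing "Bulbasaur seeds" (妙蛙種子, slang for torrents)~~; the point is that it can use a proxy to get around the Golden Shield (the Great Firewall), so it is hugely popular over there...~~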

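A perl one-liner that queries Firefox through mozrepl: the table on a Pirate Bay page as an example

Here the problem is broken into three parts:

Fetch the page source. The result is a single file; generic
Extract the table source. The result is row-oriented records with the fields separated by a specific delimiter; site specific
Post-process the table source to get the wanted data, arranged in the required format; requirement specific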
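As a minimal sketch of the hand-off between the second and third parts, assume the extractor has already produced rows whose fields are separated by `|||` (the delimiter used in the btkitty one-liner). The sample rows and both function names are invented for illustration.

```shell
#!/usr/bin/env bash
# Stand-in for part 2: pretend the extractor emitted "name|||size" rows.
sample_rows() {
  printf '%s\n' 'alpha|||700 MiB' 'beta|||1.4 GiB'
}

# Part 3: split each row on the ||| delimiter and print "size<TAB>name".
# A multi-character -F value is treated by awk as a regular expression.
format_rows() {
  awk -F'[|][|][|]' '{ printf "%s\t%s\n", $2, $1 }'
}

sample_rows | format_rows
```

Keeping the third part as a separate filter means the site-specific extractor can change without touching the formatting, and the other way round.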

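1. Fetch the page source. Using mozrepl to get the source has many advantages: login is no longer an issue, there is no decompression to deal with, JavaScript is handled, and so on.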

perl -MNet::Telnet -e "$t=new Net::Telnet();$t->open(Host=>'localhost', Port=>4242); $t-> print ('content.document.body.innerHTML');while(1){my $data=$t->get(Timeout=>1);print $data;}"

This snippet fetches the source of the page currently displayed.
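On a machine with bash, roughly the same conversation can be held without Net::Telnet by using bash's /dev/tcp pseudo-path. This is an untested sketch under assumptions: mozrepl_eval is a name I made up, the host and port default to the localhost:4242 used above, and the one-second read timeout mirrors the Timeout=>1 in the Perl version.

```shell
#!/usr/bin/env bash
# mozrepl_eval EXPR [HOST] [PORT]: send one JavaScript expression to a
# listening MozRepl and print whatever comes back until the line goes quiet.
mozrepl_eval() {
  local expr=$1 host=${2:-localhost} port=${3:-4242} line
  { exec 3<>"/dev/tcp/$host/$port"; } 2>/dev/null || return 1  # open socket on fd 3
  printf '%s\n' "$expr" >&3                                    # send the expression
  while IFS= read -r -t 1 -u 3 line; do                        # stop after 1 s of silence
    printf '%s\n' "$line"
  done
  exec 3<&-                                                    # close the socket
}

# Usage (needs Firefox with MozRepl listening):
#   mozrepl_eval 'content.document.body.innerHTML' > page.html
```

The greeting that MozRepl prints on connect ends up in the output as well, so the result may need trimming before it is parsed.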

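2. Extract the table source. The rows can be enumerated with map or with foreach; both are shown here for reference. The example page is http://thepiratebay.se/top/all, which lists the top 100 resources of the day. For easier inspection, the result is saved to top100.txt.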

perl -MNet::Telnet -e "$t=new Net::Telnet();$t->open(Host=>'localhost', Port=>4242);$t->print('content.document.body.innerHTML');while(1){my $data=$t->get(Timeout=>1);print $data;}"

| perl -MData::Dumper -MHTML::TreeBuilder::XPath -MHTML::Element -e "$string = do { local($/); <> }; $tree= HTML::TreeBuilder::XPath-> new_from_content($string);my @results=$tree->findnodes( '/html/body/div[@id=\"content\"]/div[@id=\"main-content\"]/table[@id=\"searchResult\"]/tbody') ;foreach my $table(@results){foreach my $row($table->findnodes('.//tr')){my @cells=$row-> findnodes ('.//td'); print join(\"\n\", map{ $_->as_HTML if ref($_)} $cells[1]-> content_list() ), \"\n\";foreach $acell($cells[1]->content_list()){print $acell->as_text.\"\n\" if ref($acell) ;};print $cells[0]->string_value,\"\n\";}print \"\n\";}" > top100.txt

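The first half of this snippet is the page-source fetcher from step 1; a pipe passes its output to the second half for filtering. When running it, enter both parts as a single command line.

3. to be continued...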

Tuesday, August 12, 2014

A perl one-liner that queries Firefox through mozrepl: the page title as an example

perl -MNet::Telnet -MEncode -e "$t=new Net::Telnet(Dump_Log=>\*STDOUT);$t->open (Host=>'localhost', Port=>4242);$t->print('document.title');while(1){my $data=$t->get (Timeout=>1);print encode('big5',decode('utf8',$data));}"

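mozrepl must be installed first
This example was tested successfully on Windows 8
Adjust the encode part to your environment; on Ubuntu it can be dropped entirely
You might ask why not just use WWW::Mechanize::Firefox. The problem is that nobody has ported it to Windows XD
Let's try porting WWW::Mechanize::Firefox. After downloading and unpacking it, running perl Makefile.PL prints the following warnings:
Warning: prerequisite HTML::Selector::XPath 0 not found.
Warning: prerequisite MozRepl::RemoteObject 0.31 not found.
Warning: prerequisite Object::Import 0 not found.
Open ppm and install the three packages above.
MozRepl::RemoteObject may not be installable through ppm. In that case download the package's tar.gz, unpack it, change into the resulting directory and run perl Makefile.PL, then copy everything in the lib subdirectory to C:\Perl64\site\lib (depending on where perl is installed)
One more note: a package that is missing from ppm and does not need a C compiler can be installed with cpan, for example cpan WWW::Mechanize::Firefox

Using mozrepl from putty/pietty and from PHP

http://www.codediesel.com/tools/peeking-inside-firefox-using-mozrepl/

Strangely, when I use pietty the connection is cut as soon as it is established. I may need to look into pietty's own settings, because connecting with a hand-written perl script does not get cut off...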

Thursday, August 7, 2014

A perl one-liner for querying web data: downloading every subtitle on a yyets page

perl -MLWP::Simple -e "getprint('http://www.yyets.com/search/index?keyword=%E7%A1%85%E8%B0%B7&type=tv');" |perl -e "while(<>){print \"start http://www.yyets.com/subtitle/index/download?id=$1\n\" if m/\"http.+?subtitle\/(.+?)\"/;}" > abc.bat

Then just run the generated abc.bat.
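The filtering half of that pipeline can also be sketched with sed. This assumes each link sits on its own line and looks like the made-up sample below; the function name is hypothetical, and like the Perl version it only picks up the first match on a line.

```shell
#!/usr/bin/env bash
# subtitle_links_to_bat: read HTML on stdin and print one "start <url>" line
# for every line containing a quoted http...subtitle/ID link.
subtitle_links_to_bat() {
  sed -n 's|.*"http[^"]*subtitle/\([^"][^"]*\)".*|start http://www.yyets.com/subtitle/index/download?id=\1|p'
}

# Made-up sample input; a real run would pipe the fetched search page in.
printf '%s\n' '<a href="http://www.yyets.com/subtitle/12345">sample</a>' \
  | subtitle_links_to_bat
```

With a downloader available, something like `curl -s 'http://www.yyets.com/search/index?keyword=%E7%A1%85%E8%B0%B7&type=tv' | subtitle_links_to_bat > abc.bat` would play the role of the two Perl commands.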

Wednesday, August 6, 2014

A perl one-liner for querying web data: the 2014 (ROC year 103) university entrance exam results as an example

perl -MLWP::Simple -e "for $i(21011601..21011842){getprint('http://fast.uac.edu.tw/'.$i);}" | perl -MHTML::Entities -e "while (<>){print decode_entities( \"$1\n\" )if m/(准考證號 :.*?)
<\/BODY/;}"

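This is the form that runs on Windows 8; other platforms may need some changes
The LWP and HTML modules are already installed
A pipe feeds the output of the first program into the second, which can then read it line by line with while(<>)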
