With Energy to Spare, Study Letters (行有餘力則以學文)

Wednesday, July 29, 2015

pdftotext

For usage, see the man page: http://linux.die.net/man/1/pdftotext

Options

-f number: Specifies the first page to convert.
-l number: Specifies the last page to convert.
-r number: Specifies the resolution, in DPI. The default is 72 DPI.
-x number: Specifies the x-coordinate of the crop area top left corner.
-y number: Specifies the y-coordinate of the crop area top left corner.
-W number: Specifies the width of crop area in pixels (default is 0).
-H number: Specifies the height of crop area in pixels (default is 0).
-layout: Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.
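A minimal Python sketch of driving pdftotext with the options above. `build_pdftotext_cmd` and `pdf_to_text` are hypothetical helper names of my own; the code only assembles the documented flags, and it assumes pdftotext is installed and on PATH and that passing `-` as the output file sends the text to stdout.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def build_pdftotext_cmd(
    pdf_path: str,
    out_path: str = "-",                 # "-" sends the text to stdout
    first: Optional[int] = None,         # -f: first page to convert
    last: Optional[int] = None,          # -l: last page to convert
    layout: bool = False,                # -layout: keep the physical layout
    crop: Optional[Tuple[int, int, int, int]] = None,  # (x, y, W, H) in pixels
    resolution: Optional[int] = None,    # -r: DPI (pdftotext default is 72)
) -> List[str]:
    """Assemble an argument list for the pdftotext CLI (hypothetical helper)."""
    cmd = ["pdftotext"]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    if resolution is not None:
        cmd += ["-r", str(resolution)]
    if crop is not None:
        x, y, w, h = crop
        cmd += ["-x", str(x), "-y", str(y), "-W", str(w), "-H", str(h)]
    if layout:
        cmd.append("-layout")
    cmd += [pdf_path, out_path]
    return cmd


def pdf_to_text(pdf_path: str, **options) -> str:
    """Run pdftotext and return the extracted text (requires pdftotext on PATH)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext is not installed")
    cmd = build_pdftotext_cmd(pdf_path, "-", **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `pdf_to_text("paper.pdf", first=1, last=3, layout=True)` (where `paper.pdf` is a placeholder) would extract pages 1 to 3 with the layout preserved. Keeping the argument list in a separate function makes it easy to inspect the command without running it.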

Friday, July 24, 2015

fulltext: an R interface for accessing the full text of journal articles

https://github.com/ropensci/fulltext

I sincerely hope the publishers' collective monopoly is broken soon.

Monday, July 20, 2015

The "methodology" of "big data"

http://www.36dsj.com/archives/247 This article is good: balanced and solid.

http://www.36dsj.com/archives/8911 This one approaches the topic from a machine learning (ML) perspective, so it leans toward AI.

For discovering rules there are two approaches: one starts from statistics (Association Rules), the other from NLP (Grammar Induction).

Association Rules (AR)

https://zh.wikipedia.org/wiki/%E5%85%B3%E8%81%94%E5%BC%8F%E8%A7%84%E5%88%99

http://blog.csdn.net/gjwang1983/article/details/45015203

http://blog.csdn.net/qq_25684755/article/details/46584805

https://github.com/dengyishuo/Top10Algorithm/blob/master/apriori/arules.md
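To make the linked Apriori material concrete, here is a minimal level-wise Apriori sketch in Python (not the R `arules` implementation). Support is the fraction of transactions containing an itemset, and the confidence of a rule X -> Y is support(X ∪ Y) / support(X). The function names `apriori` and `rules` and the toy shopping-basket data are my own illustrations.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support >= min_support (level-wise Apriori)."""
    n = len(transactions)
    # Level 1: the candidate itemsets are the single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets.
        k += 1
        keys = list(level)
        joined = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent: Dict[FrozenSet[str], float], min_confidence: float) -> List[Rule]:
    """Derive rules X -> Y, with confidence = support(X | Y) / support(X)."""
    out: List[Rule] = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out
```

On the classic five-basket example (bread, milk, diaper, beer, ...) with `min_support=0.6` and `min_confidence=0.9`, the only surviving rule is {beer} -> {diaper}. The prune step is what makes Apriori cheaper than brute force: a candidate is dropped without scanning the data if any of its subsets is already known to be infrequent.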

Applications to text mining include the following:

http://blog.csdn.net/u013946794/article/details/44246569

http://qxde01.blog.163.com/blog/static/6733574420132915952828/

http://qxde01.blog.163.com/blog/static/673357442013355192638/
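In text mining the usual move is to treat each document (or sentence) as a transaction and its words as items. The sketch below, built around a hypothetical `word_pair_rules` helper, mines only two-word rules A -> B and assumes tokenization with the regex `\w+`. Chinese text would first need a word segmenter, since `\w+` treats an unbroken run of Chinese characters as a single token.

```python
import re
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def word_pair_rules(
    docs: List[str], min_support: float, min_confidence: float
) -> List[Tuple[str, str, float, float]]:
    """Mine rules A -> B between word pairs, one transaction per document."""
    # Each document becomes a set of lower-cased words (its "basket").
    transactions = [set(re.findall(r"\w+", d.lower())) for d in docs]
    n = len(transactions)
    word_count = Counter(w for t in transactions for w in t)
    pair_count = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    out = []
    for (a, b), cnt in pair_count.items():
        support = cnt / n  # fraction of documents containing both words
        if support < min_support:
            continue
        # A pair yields two candidate rules, one in each direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = cnt / word_count[lhs]
            if confidence >= min_confidence:
                out.append((lhs, rhs, support, confidence))
    return sorted(out)


docs = [
    "big data needs statistics",
    "big data and machine learning",
    "statistics and machine learning",
    "big data tools",
]
for lhs, rhs, s, c in word_pair_rules(docs, min_support=0.5, min_confidence=1.0):
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```

Restricting to pairs keeps the sketch short; for longer itemsets one would run a full Apriori over the same word-set transactions.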

Sunday, July 19, 2015

Comparing syntax across programming languages: RosettaCode.org

Take Perl's frequently used map / grep as an example:

http://rosettacode.org/wiki/List_comprehensions#Perl

A Chinese-language introduction explains the site's purpose:

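What exactly is the point of this site? Say you know language A and want to see what something would look like written in language B. By comparing the solutions to many problems, you should be able to learn B's syntax and how it relates to A, the language you already know. You will then notice the language's design patterns and behavior, that is, its idioms. You will also see which habits you already have and which habits are holding you back.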
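For readers coming from Perl, here is a rough Python counterpart of `map` / `grep`, plus the Pythagorean-triples comprehension that, as far as I recall, is what the RosettaCode "List comprehensions" task asks for. This is my own illustration, not a copy of any RosettaCode entry.

```python
from typing import List, Tuple

numbers = list(range(1, 11))

# Perl: my @doubled = map { $_ * 2 } @numbers;
doubled = [n * 2 for n in numbers]

# Perl: my @odd = grep { $_ % 2 } @numbers;
odd = [n for n in numbers if n % 2]


def pythagorean_triples(limit: int) -> List[Tuple[int, int, int]]:
    """All (x, y, z) with x <= y <= z <= limit and x^2 + y^2 == z^2."""
    return [
        (x, y, z)
        for x in range(1, limit + 1)      # nested for-clauses play the role
        for y in range(x, limit + 1)      # of Perl's nested map { ... }
        for z in range(y, limit + 1)
        if x * x + y * y == z * z         # the if-clause plays the role of grep
    ]


print(pythagorean_triples(20))
# → [(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
```

Python also has built-in `map` and `filter`, but a comprehension expresses both in one construct, which is the idiom RosettaCode-style comparisons tend to highlight.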

Saturday, July 18, 2015

The Ubuntu command equivalent to UltraEdit's "Find in Files"

http://edsionte.com/techblog/archives/3164

I used to know only one trick: combining find with grep and xargs. But the syntax is hard to remember and the command is long.

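The command introduced in this article, grep -R -w -n "keyword" "directory", is very handy: -R searches the directory recursively, -w matches only whole words, and -n prints the line number of each match.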
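To see what the three flags do, here is a rough Python analogue of `grep -R -w -n keyword directory`; `grep_rwn` is a hypothetical name. It is only a sketch: unlike grep, it silently skips files that are not valid UTF-8 instead of reporting binary matches, and it does not follow symlinked directories (which `-R` does).

```python
import os
import re
from typing import Iterator, Tuple


def grep_rwn(keyword: str, directory: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for lines containing `keyword` as a whole word."""
    # -w: the match must not be preceded or followed by a word character.
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)")
    # -R: walk the tree recursively (symlinked directories are NOT followed here).
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    # -n: report 1-based line numbers.
                    for lineno, line in enumerate(fh, start=1):
                        if pattern.search(line):
                            yield path, lineno, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                continue  # unreadable or non-UTF-8 file: skip it
```

Printing each hit as `f"{path}:{lineno}:{line}"` reproduces grep's `path:line:text` output format. Note that with `-w`, searching for `alpha` matches "alpha beta" but not "alphabet".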

Now I can hardly find any reason to write code on Windows anymore...

Friday, July 17, 2015

Just how complex can a heatmap get?

For heatmap syntax, see:

http://bioconductor.org/packages/devel/bioc/vignettes/ComplexHeatmap/inst/doc/ComplexHeatmap.html

The Weibo accounts mentioned (Weibo being mainland China's knockoff of Twitter) are the following:

http://www.weibo.com/fly51fly

http://www.weibo.com/haoawesome

https://github.com/PuddingNnn/Rweibo/blob/master/topic/topicmodel.R