Tuesday, January 25, 2011

Extracting images from PDF files

If you want ready-made software, the following tools are available:

http://opensecrets.pixnet.net/blog/post/27841494

http://azo-freeware.blogspot.com/2008/08/some-pdf-image-extract-14.html

For Linux/Perl, refer to the following links:


If you write your own program and the images are JPEGs, the job is simple, as this Python script shows:

http://nedbatchelder.com/blog/200712/extracting_jpgs_from_pdfs.html
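The idea is that a stream using the /DCTDecode filter already holds a complete JPEG file, which starts with the bytes FF D8 and ends with FF D9, so those bytes can be copied out unchanged. Below is a minimal Python 3 sketch of that heuristic, not the linked script itself; `extract_jpegs` and the `jpgN.jpg` output names are hypothetical.

```python
from __future__ import annotations

import sys

# A JPEG file starts with the SOI marker FF D8 and ends with the EOI marker FF D9.
JPEG_START = b"\xff\xd8"
JPEG_END = b"\xff\xd9"


def extract_jpegs(pdf: bytes) -> list[bytes]:
    """Return the JPEG byte strings found in the PDF's streams.

    Heuristic: a stream whose data begins with FF D8 is treated as a JPEG,
    and the JPEG ends at the last FF D9 before the 'endstream' keyword.
    """
    jpegs = []
    pos = 0
    while True:
        istream = pdf.find(b"stream", pos)
        if istream < 0:
            break
        # Stream data begins right after "stream" and its end-of-line marker,
        # so only look a few bytes ahead for the JPEG start marker.
        istart = pdf.find(JPEG_START, istream, istream + 20)
        if istart < 0:
            pos = istream + len(b"stream")
            continue
        iendstream = pdf.find(b"endstream", istart)
        if iendstream < 0:
            break  # truncated file
        iend = pdf.rfind(JPEG_END, istart, iendstream)
        if iend < 0:
            pos = iendstream
            continue
        jpegs.append(pdf[istart:iend + len(JPEG_END)])
        pos = iendstream + len(b"endstream")
    return jpegs


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for n, jpg in enumerate(extract_jpegs(data)):
        with open(f"jpg{n}.jpg", "wb") as out:
            out.write(jpg)
        print(f"wrote jpg{n}.jpg ({len(jpg)} bytes)")
```

Run it as `python extract_jpegs.py file.pdf`. Because it only pattern-matches bytes, it can be fooled by compressed data that happens to start with FF D8, and it finds nothing in encrypted PDFs.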

Otherwise, you will need to dig into some references:

http://stackoverflow.com/questions/2693820/extract-images-from-pdf-without-resampling-in-python

http://www.jpedal.org/PDFblog/2010/04/understanding-the-pdf-file-format-how-are-images-stored/

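These confirm that unless the images are plain JPEGs, extracting images from a PDF can be a lot of work: other images are stored as raw (usually compressed) pixel data that has to be decoded and re-encoded into an ordinary image file.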
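For a non-JPEG image, the image XObject's dictionary (/Width, /Height, /ColorSpace, /BitsPerComponent, /Filter) describes the raw samples. As a minimal sketch, assuming a /FlateDecode stream with 8 bits per component, a DeviceGray or DeviceRGB color space, and no /DecodeParms predictor, the inflated bytes can simply be prefixed with a Netpbm header. `flate_image_to_netpbm` is a hypothetical helper, not a library function.

```python
from __future__ import annotations

import zlib

# Netpbm variants for the two simplest PDF color spaces:
# P5 = grayscale (PGM, 1 sample per pixel), P6 = color (PPM, 3 samples).
_NETPBM = {"/DeviceGray": (1, b"P5"), "/DeviceRGB": (3, b"P6")}


def flate_image_to_netpbm(stream_data: bytes, width: int, height: int,
                          color_space: str = "/DeviceRGB") -> bytes:
    """Wrap the pixels of a /FlateDecode image XObject in a Netpbm header.

    Assumes /BitsPerComponent 8 and no /DecodeParms predictor, so the
    inflated data is plain rows of 8-bit samples.
    """
    if color_space not in _NETPBM:
        raise ValueError(f"unsupported color space: {color_space}")
    channels, magic = _NETPBM[color_space]
    pixels = zlib.decompress(stream_data)
    expected = width * height * channels
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} bytes of samples, got {len(pixels)}")
    header = magic + f"\n{width} {height}\n255\n".encode("ascii")
    return header + pixels
```

Given an image dictionary such as `<< /Width 640 /Height 480 /ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /FlateDecode >>`, writing `flate_image_to_netpbm(data, 640, 480)` to a `.ppm` file yields a viewable image. Real files often use PNG predictors, indexed or ICC-based color spaces, or soft masks (/SMask), and each of those needs extra decoding this sketch leaves out.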

If Chinese text is involved, see the following link:

http://ccckmit.wikidot.com/pdf:streamcoding
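Text in a PDF, Chinese included, sits in content streams that are usually /FlateDecode-compressed, so the first step is to inflate them. With CID-keyed fonts (for example the /Identity-H encoding) the text then shows up as hex strings of 2-byte codes, which still have to be mapped through the font's /ToUnicode CMap to become readable characters. The following is a rough sketch that inflates every stream zlib accepts; `inflate_streams` is a hypothetical name, and it ignores /Filter, /Length, and encryption.

```python
from __future__ import annotations

import re
import zlib

# "stream" (but not the tail of "endstream") followed by CRLF or LF,
# then the data up to the next "endstream".
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf: bytes) -> list[bytes]:
    """Return the decompressed contents of every stream that inflates cleanly."""
    results = []
    for m in _STREAM_RE.finditer(pdf):
        d = zlib.decompressobj()
        try:
            data = d.decompress(m.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. a DCTDecode JPEG or an uncompressed stream)
        if d.eof:  # a complete zlib stream; the trailing EOL lands in d.unused_data
            results.append(data)
    return results
```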

The original PDF specification (PDF Reference, version 1.6):

http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf

A concise introduction to the PDF file format:

http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/

Key excerpt from that article:


b  closepath, fill, and stroke path.
B  fill and stroke path.
b*  closepath, eofill, and stroke path.
B*  eofill and stroke path.
BI  begin image.
BMC  begin marked content.
BT  begin text object.
BX  begin section allowing undefined operators.
c  curveto.
cm  concat. Concatenates the matrix to the current transform.
cs  setcolorspace for fill.
CS  setcolorspace for stroke.
d  setdash.
Do  execute the named XObject.
DP  mark a place in the content stream, with a dictionary.
EI  end image.
EMC  end marked content.
ET  end text object.
EX  end section that allows undefined operators.
f  fill path.
f*  eofill (even/odd fill path).
g  setgray (fill).
G  setgray (stroke).
gs  set parameters in the extended graphics state.
h  closepath.
i  setflat.
ID  begin image data.
j  setlinejoin.
J  setlinecap.
k  setcmykcolor (fill).
K  setcmykcolor (stroke).
l  lineto.
m  moveto.
M  setmiterlimit.
n  end path without fill or stroke.
q  save graphics state.
Q  restore graphics state.
re  rectangle.
rg  setrgbcolor (fill).
RG  setrgbcolor (stroke).
s  closepath and stroke path.
S  stroke path.
sc  setcolor (fill).
SC  setcolor (stroke).
sh  shfill (shaded fill).
Tc  set character spacing.
Td  move text current point.
TD  move text current point and set leading.
Tf  set font name and size.
Tj  show text.
TJ  show text, allowing individual character positioning.
TL  set leading.
Tm  set text matrix.
Tr  set text rendering mode.
Ts  set super/subscripting text rise.
Tw  set word spacing.
Tz  set horizontal scaling.
T*  move to start of next line.
v  curveto (current point used as the first control point).
w  setlinewidth.
W  clip.
y  curveto (end point used as the second control point).

TABLE 1: PDF Page Markup Operators
(Note: in the original article, equivalent PostScript operators are shown in boldface.)
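The operators above are what a page's content stream is made of: operands come first and the operator follows, as in PostScript. To make that concrete, here is a minimal Python tokenizer sketch (not from the MacTech article; `parse_content` and `_read_literal` are hypothetical names). It expects an already-decompressed stream and does not handle inline images (BI ... ID ... EI), #xx escapes in names, or decoding of backslash escapes inside strings.

```python
from __future__ import annotations

import re

# Whitespace/comments, hex strings, [ and << openers, ] and >> closers, names,
# numbers, and bare words (operators or true/false/null). Literal "(...)"
# strings can nest, so _read_literal handles them outside the regex.
_TOKEN_RE = re.compile(
    rb"""
      (?P<ws>[\x00\t\n\x0c\r ]+|%[^\r\n]*)
    | (?P<hex><[0-9A-Fa-f\x00\t\n\x0c\r ]*>)
    | (?P<open><<|\[)
    | (?P<close>>>|\])
    | (?P<name>/[^\x00\t\n\x0c\r ()<>\[\]{}/%]*)
    | (?P<num>[+-]?(?:\d+\.?\d*|\.\d+))
    | (?P<word>[^\x00\t\n\x0c\r ()<>\[\]{}/%]+)
    """,
    re.VERBOSE,
)
_KEYWORDS = {b"true": True, b"false": False, b"null": None}


def _read_literal(data: bytes, i: int) -> tuple[bytes, int]:
    """Return the raw body of the (...) string opening at data[i] and the index after it.

    Parentheses nest; a backslash skips the byte after it (escapes stay undecoded).
    """
    depth = 0
    j = i
    while j < len(data):
        c = data[j]
        if c == 0x5C:  # backslash
            j += 2
            continue
        if c == 0x28:  # "("
            depth += 1
        elif c == 0x29:  # ")"
            depth -= 1
            if depth == 0:
                return data[i + 1:j], j + 1
        j += 1
    raise ValueError("unterminated literal string")


def parse_content(data: bytes) -> list[tuple[str, list]]:
    """Split a decoded content stream into (operator, operands) pairs."""
    ops = []
    operands: list = []
    stack: list = []  # saved operand lists while inside [ ] or << >>
    i = 0
    while i < len(data):
        if data[i] == 0x28:  # "(" starts a literal string
            s, i = _read_literal(data, i)
            operands.append(s)
            continue
        m = _TOKEN_RE.match(data, i)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {i}")
        i = m.end()
        kind, tok = m.lastgroup, m.group()
        if kind == "ws":
            continue
        if kind == "hex":
            digits = re.sub(rb"[^0-9A-Fa-f]", b"", tok)
            if len(digits) % 2:
                digits += b"0"  # an odd final digit is padded with 0
            operands.append(bytes.fromhex(digits.decode("ascii")))
        elif kind == "open":
            stack.append(operands)
            operands = []
        elif kind == "close":
            if not stack:
                raise ValueError(f"unbalanced {tok!r} at offset {m.start()}")
            inner, operands = operands, stack.pop()
            operands.append(inner)
        elif kind == "name":
            operands.append(tok.decode("latin-1"))
        elif kind == "num":
            text = tok.decode("ascii")
            operands.append(float(text) if "." in text else int(text))
        elif tok in _KEYWORDS:
            operands.append(_KEYWORDS[tok])
        else:  # an operator consumes the operands collected since the last one
            ops.append((tok.decode("latin-1"), operands))
            operands = []
    return ops
```

For example, `[args[0] for op, args in parse_content(stream) if op == "Do"]` lists the XObject names (images among them) that a page paints, which connects back to image extraction.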

Monday, January 24, 2011

Sunday, January 23, 2011

Converting ASCII to UTF-8 in Perl

This seems to come up every so often. See http://perldoc.perl.org/utf8.html



    use strict;
    use warnings;
    use utf8;   # declares that the source file itself is written in UTF-8
    no utf8;    # ...and switches that declaration off again
    my $string = "\x{100}";
    # Convert the internal representation of a Perl scalar to/from UTF-8.
    my $num_octets = utf8::upgrade($string);
    my $success    = utf8::downgrade($string, 1);  # FAIL_OK = 1: return false instead of dying
    # Change each character of a Perl scalar to/from a series of
    # characters that represent the UTF-8 bytes of each original character.
    utf8::encode($string);  # "\x{100}"  becomes "\xc4\x80"
    utf8::decode($string);  # "\xc4\x80" becomes "\x{100}"
    my $flag  = utf8::is_utf8($string);  # since Perl 5.8.1
    my $valid = utf8::valid($string);
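For comparison, Python 3 keeps apart the same two things that utf8::encode and utf8::decode convert between: `str` holds characters and `bytes` holds encoded octets. A minimal analogue follows (the function names are hypothetical; unlike Perl's utf8::decode, which returns false on malformed input, Python's `decode` raises `UnicodeDecodeError`). It also shows that plain ASCII text needs no real conversion, because ASCII bytes are already valid UTF-8.

```python
def to_utf8_bytes(text: str) -> bytes:
    """Characters -> UTF-8 octets, like Perl's utf8::encode."""
    return text.encode("utf-8")


def from_utf8_bytes(data: bytes) -> str:
    """UTF-8 octets -> characters, like Perl's utf8::decode (raises on bad input)."""
    return data.decode("utf-8")


if __name__ == "__main__":
    print(to_utf8_bytes("\u0100"))       # Perl's "\x{100}"
    print(from_utf8_bytes(b"\xc4\x80"))
    # ASCII decoded and re-encoded as UTF-8 gives back the very same bytes.
    print(to_utf8_bytes(b"plain ascii".decode("ascii")) == b"plain ascii")
```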

Saturday, January 22, 2011

Anatomy quizzes

This site has a collection of anatomy quizzes, and it seems fairly comprehensive:

http://msjensen.cehd.umn.edu/WEBANATOMY/

I first found it through the page below; whether its study advice is useful is a matter of opinion:

http://www.squidoo.com/how-to-study-anatomy

Friday, January 21, 2011

Medical abbreviations

For American-style abbreviations, see this page, although its list is incomplete:

http://sagemb.com/info-resources/medical-billing-reference/medical-abbreviations

However, many abbreviations are Latin, so the following documents may also be needed:

194.8.8.217/library/lat/NTL/MED/slowar_s-x.txt_Piece40.61

http://dic.academic.ru/dic.nsf/enc_medicine/24855/%D0%9F%D1%80%D0%B8%D0%BB%D0%BE%D0%B6%D0%B5%D0%BD%D0%B8%D0%B5

Commonly used abbreviations:


a. — arteria
aa. — arteriae
ant. — anterior
b. — bursa
Bac. — Bacillus
Bact. — Bacterium
bb. — bursae
dext. — dexter
ext. — externus
f. — fascia
ff. — fasciae
inf. — inferior
int. — internus
lat. — lateralis
lig. — ligamentum
ligg. — ligamenta
m. — musculus
med. — medialis
mm. — musculi
n. — nervus
nn. — nervi
post. — posterior
r. — ramus
rr. — rami
sin. — sinister
sup. — superior
v. — vena
vag. — vagina
vagg. — vaginae
vv. — venae