Installing and practicing with Nokogiri


Installing Nokogiri

Nothing gets started until it is installed, so let's install it first.
Searching for "Nokogiri install" suggests that it may not be entirely straightforward.

I forgot to keep a record the first time, so I'll remove everything and simply do the installation again.

Environment

Shell

➜  nokogiriprac echo $SHELL
/usr/local/bin/zsh
➜  nokogiriprac /usr/local/bin/zsh --version
zsh 5.4.1 (x86_64-apple-darwin14.5.0)
➜  nokogiriprac

OS

OS X Yosemite, version 10.10.5

Homebrew

➜  nokogiriprac brew --version
Homebrew 1.3.1
Homebrew/homebrew-core (git revision e097; last commit 2017-08-28)
➜  nokogiriprac

gem

➜  nokogiriprac gem --version
2.6.11
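You can also print the same version details from Ruby itself, which is handy when attaching environment information to a question or bug report. Here is a minimal sketch that uses only built-in constants. The method name `environment_info` is just an illustrative choice.

```ruby
# Print roughly the same environment details as the commands above,
# using constants that Ruby and RubyGems always define.
require 'rubygems'

def environment_info
  {
    shell:    ENV["SHELL"],   # nil if the variable is not set
    ruby:     RUBY_VERSION,   # interpreter version, e.g. 2.4.1
    platform: RUBY_PLATFORM,  # e.g. x86_64-darwin14
    rubygems: Gem::VERSION    # the same value `gem --version` prints
  }
end

environment_info.each do |key, value|
  puts "#{key}: #{value.inspect}"
end
```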

Removing the previous installation

➜  nokogiriprac gem uninstall nokogiri
Remove executables:
    nokogiri

in addition to the gem? [Yn]  y
Removing nokogiri
Successfully uninstalled nokogiri-1.8.0
➜  nokogiriprac gem list

*** LOCAL GEMS ***

bigdecimal (default: 1.3.0)
did_you_mean (1.1.0)
io-console (default: 0.4.6)
json (default: 2.0.2)
mini_portile2 (2.2.0)
minitest (5.10.1)
net-telnet (0.1.1)
openssl (default: 2.0.3)
power_assert (0.4.1)
psych (default: 2.2.2)
rake (12.0.0)
rdoc (default: 5.0.0)
test-unit (3.2.3)
xmlrpc (0.2.1)
➜  nokogiriprac
➜  nokogiriprac brew uninstal libxml2
Error: Refusing to uninstall /usr/local/Cellar/libxml2/2.9.4_4
because it is required by libxslt, which is currently installed.
You can override this and force removal with:
  brew uninstall --ignore-dependencies libxml2
➜  nokogiriprac brew uninstal libxslt
Uninstalling /usr/local/Cellar/libxslt/1.1.29... (147 files, 3MB)
➜  nokogiriprac brew uninstall libxml2
Uninstalling /usr/local/Cellar/libxml2/2.9.4_4... (281 files, 10.5MB)
➜  nokogiriprac brew list
autoconf    mecab-ipadic    python      sqlite      zsh-completions
gdbm        openssl     readline    swi-prolog
gmp     openssl@1.1 ruby        vim
libyaml     pcre        scheme48    wget
mecab       perl        sl      zsh
➜  nokogiriprac
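You can also confirm from Ruby that the uninstall removed every version of the gem. Here is a minimal sketch using RubyGems' `Gem::Specification.find_all_by_name`. The helper name `installed_versions` is hypothetical.

```ruby
require 'rubygems'

# Return the version strings of every installed copy of the given gem.
# Gem::Specification.find_all_by_name returns an empty Array when none exist.
def installed_versions(gem_name)
  Gem::Specification.find_all_by_name(gem_name).map { |spec| spec.version.to_s }
end

versions = installed_versions("nokogiri")
if versions.empty?
  puts "nokogiri is not installed"
else
  puts "nokogiri is still installed: #{versions.join(', ')}"
end
```

`gem uninstall` does not remove dependencies, which is why `mini_portile2` (the helper Nokogiri uses to build its bundled libraries) still appears in the `gem list` output above.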

Let's get started

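The first thing to read is this.

It starts with an overview, so the overall picture is easy to grasp.
Apparently Nokogiri needs two libraries called libxml2 and libxslt.
As the names suggest, they deal with XML: libxml2 is the C library that parses XML and HTML, and libxslt applies XSLT stylesheet transformations on top of it.
So, for now, I install those.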

Installing libxml2 and libxslt
➜  nokogiriprac brew install libxml2 libxslt
==> Downloading https://homebrew.bintray.com/bottles/libxml2-2.9.4_4.yosemite.bottle.tar.gz
Already downloaded: /Users/taka/Library/Caches/Homebrew/libxml2-2.9.4_4.yosemite.bottle.tar.gz
==> Pouring libxml2-2.9.4_4.yosemite.bottle.tar.gz
==> Caveats
This formula is keg-only, which means it was not symlinked into /usr/local,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.

If you need to have this software first in your PATH run:
  echo 'export PATH="/usr/local/opt/libxml2/bin:$PATH"' >> ~/.zshrc

For compilers to find this software you may need to set:
    LDFLAGS:  -L/usr/local/opt/libxml2/lib
    CPPFLAGS: -I/usr/local/opt/libxml2/include


If you need Python to find bindings for this keg-only formula, run:
  echo /usr/local/opt/libxml2/lib/python2.7/site-packages >> /usr/local/lib/python2.7/site-packages/libxml2.pth
  mkdir -p /Users/taka/Library/Python/2.7/lib/python/site-packages
  echo 'import site; site.addsitedir("/usr/local/lib/python2.7/site-packages")' >> /Users/taka/Library/Python/2.7/lib/python/site-packages/homebrew.pth
==> Summary
🍺  /usr/local/Cellar/libxml2/2.9.4_4: 281 files, 10.5MB
==> Downloading https://homebrew.bintray.com/bottles/libxslt-1.1.29.yosemite.bottle.tar.gz
Already downloaded: /Users/taka/Library/Caches/Homebrew/libxslt-1.1.29.yosemite.bottle.tar.gz
==> Pouring libxslt-1.1.29.yosemite.bottle.tar.gz
==> Caveats
To allow the nokogiri gem to link against this libxslt run:
  gem install nokogiri -- --with-xslt-dir=/usr/local/opt/libxslt

This formula is keg-only, which means it was not symlinked into /usr/local,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.

If you need to have this software first in your PATH run:
  echo 'export PATH="/usr/local/opt/libxslt/bin:$PATH"' >> ~/.zshrc

For compilers to find this software you may need to set:
    LDFLAGS:  -L/usr/local/opt/libxslt/lib
    CPPFLAGS: -I/usr/local/opt/libxslt/include


If you need Python to find bindings for this keg-only formula, run:
  echo /usr/local/opt/libxslt/lib/python2.7/site-packages >> /usr/local/lib/python2.7/site-packages/libxslt.pth
  mkdir -p /Users/taka/Library/Python/2.7/lib/python/site-packages
  echo 'import site; site.addsitedir("/usr/local/lib/python2.7/site-packages")' >> /Users/taka/Library/Python/2.7/lib/python/site-packages/homebrew.pth
==> Summary
🍺  /usr/local/Cellar/libxslt/1.1.29: 147 files, 3MB
➜  nokogiriprac brew link --force libxml2
Linking /usr/local/Cellar/libxml2/2.9.4_4... 21 symlinks created

If you need to have this software first in your PATH instead consider running:
  echo 'export PATH="/usr/local/opt/libxml2/bin:$PATH"' >> ~/.zshrc
➜  nokogiriprac brew link --force libxslt
Linking /usr/local/Cellar/libxslt/1.1.29... 22 symlinks created

If you need to have this software first in your PATH instead consider running:
  echo 'export PATH="/usr/local/opt/libxslt/bin:$PATH"' >> ~/.zshrc
➜  nokogiriprac

Installing Nokogiri
➜  nokogiriprac gem install nokogiri
Fetching: nokogiri-1.8.0.gem (100%)
Building native extensions.  This could take a while...
Successfully installed nokogiri-1.8.0
Parsing documentation for nokogiri-1.8.0
Installing ri documentation for nokogiri-1.8.0
Done installing documentation for nokogiri after 14 seconds
1 gem installed
➜  nokogiriprac gem list

Verification
➜  nokogiriprac nokogiri -v
# Nokogiri (1.8.0)
    ---
    warnings: []
    nokogiri: 1.8.0
    ruby:
      version: 2.4.1
      platform: x86_64-darwin14
      description: ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin14]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/usr/local/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.0/ports/x86_64-apple-darwin14.5.0/libxml2/2.9.4"
      libxslt_path: "/usr/local/lib/ruby/gems/2.4.0/gems/nokogiri-1.8.0/ports/x86_64-apple-darwin14.5.0/libxslt/1.1.29"
      libxml2_patches:
      - 0001-Fix-comparison-with-root-node-in-xmlXPathCmpNodes.patch
      - 0002-Fix-XPointer-paths-beginning-with-range-to.patch
      - 0003-Disallow-namespace-nodes-in-XPointer-ranges.patch
      libxslt_patches:
      - 0001-Fix-heap-overread-in-xsltFormatNumberConversion.patch
      - 0002-Check-for-integer-overflow-in-xsltAddTextString.patch
      compiled: 2.9.4
      loaded: 2.9.4
➜  nokogiriprac

Trying it out

As a test, I'll fetch the names of the racers from a race result page on the BOAT RACE site.

boatrace result

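require 'open-uri'
require 'nokogiri'

url = "https://www.boatrace.jp/owpc/pc/race/raceresult?rno=12&jcd=01&hd=20170827"

# Ruby 2.4 (used here) lets open-uri handle open(url). From Ruby 3.0,
# Kernel#open no longer opens URLs; use URI.open(url) instead (Ruby 2.5+).
doc = Nokogiri::HTML(open(url))

p doc.title

# The racer names are in <span class="is-fs18 is-fBold"> elements.
nodes = doc.xpath("//span[@class='is-fs18 is-fBold']")

nodes.each do |node|
  # [[:space:]] also matches the no-break space (U+00A0) produced by &nbsp;.
  # tr! is avoided because it returns nil when nothing is replaced.
  p node.text.gsub(/[[:space:]]/, "")
end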

Output

➜  nokogiriprac ruby xpath.rb
"結果|BOAT RACE オフィシャルウェブサイト"
"萩原秀人"
"古澤光紀"
"河村了"
"川上聡介"
"荒井輝年"
"青木玄太"
➜  nokogiriprac

That's roughly how I gave it a first try.
Notes

The space that won't go away: 0xC2A0

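Here is a quick note on something that puzzled me for a while. In HTML, a space is sometimes written as "&nbsp;".
It is not the ordinary half-width space 0x20. It is the no-break space U+00A0, which UTF-8 encodes as the two bytes 0xC2 0xA0, so code that only removes 0x20 leaves it in place.
For details, try searching for "c2a0" or "&nbsp;".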
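You can see the difference directly in Ruby. Here is a small sketch using a placeholder name with a no-break space in the middle. It shows the bytes of both characters and which removal methods actually catch U+00A0.

```ruby
NBSP  = "\u00A0" # NO-BREAK SPACE, what &nbsp; becomes after parsing
SPACE = " "      # ordinary half-width space, U+0020

# UTF-8 bytes: U+00A0 is two bytes (0xC2 0xA0), U+0020 is one (0x20).
p NBSP.bytes.map { |b| format("0x%02X", b) }
p SPACE.bytes.map { |b| format("0x%02X", b) }

name = "山田#{NBSP}太郎" # placeholder name

# tr(" ", "") only removes U+0020, so the no-break space survives.
p name.tr(" ", "")
# \s in a Ruby regexp matches ASCII whitespace only, so U+00A0 survives here too.
p name.gsub(/\s/, "")
# [[:space:]] matches Unicode whitespace, including U+00A0 and the
# full-width space U+3000, so this one removes it.
p name.gsub(/[[:space:]]/, "")

# Related pitfall: the bang method tr! returns nil when nothing was replaced,
# so `p str.tr!(...)` can print nil even though str itself is fine.
p "山田太郎".tr!(" ", "")
```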