# pup

`pup` is a command-line tool for processing HTML. It reads from stdin, prints to stdout, and lets the user filter parts of the page using [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).

Inspired by [`jq`](http://stedolan.github.io/jq/), `pup` aims to be a fast and flexible way of exploring HTML from the terminal.

## Install

    go get github.com/ericchiang/pup

## Examples

Download a webpage with `wget`.

_Please exercise restraint when using any automated request tool._

```bash
$ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html
```

### Clean and indent

By default, `pup` fills in missing tags and indents the page properly.

```bash
$ cat robots.html
# nasty looking html
$ cat robots.html | pup
# cleaned and indented html
```

### Filter by tag

```
$ pup < robots.html title
Robots exclusion standard - Wikipedia, the free encyclopedia
```

### Filter by id

```
$ pup < robots.html span#See_also
See also
```

### Chain selectors together

The following two commands are equivalent.

```
$ pup < robots.html table.navbox ul a | tail
```

```
$ pup < robots.html table.navbox | pup ul | pup a | tail
```

Both produce the output:

```
Stop words
Poison words
Content farm
```

### How many nodes are selected by a filter?

```
$ pup < robots.html a -n
283
```

### Limit print level

```
$ pup < robots.html table -l 2
...
...
...
```

## TODO:

* Attribute CSS selectors.
* Print attribute value rather than HTML ({href})
* Print result as JSON (--json)
* Print colorfully
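
## Checking that chained selectors are equivalent

The "Chain selectors together" example says that passing several selectors to one `pup` call matches piping them through separate `pup` calls. The sketch below is one way to check that claim on a given file. The function name `check_chain` and its arguments are hypothetical helpers, not part of `pup`. It assumes `pup` is on your `PATH`, that exactly three selectors are given, and that the two forms print byte-identical output when they agree.

```bash
# Hypothetical helper (not part of pup): compare one pup call with three
# selectors against the same selectors piped through three pup calls.
# Usage: check_chain FILE SELECTOR1 SELECTOR2 SELECTOR3
check_chain() {
  chain_file=$1
  chain_s1=$2
  chain_s2=$3
  chain_s3=$4

  # Single invocation with all selectors.
  chain_combined=$(pup "$chain_s1" "$chain_s2" "$chain_s3" < "$chain_file")

  # Same selectors, one pup process per selector.
  chain_piped=$(pup "$chain_s1" < "$chain_file" | pup "$chain_s2" | pup "$chain_s3")

  if [ "$chain_combined" = "$chain_piped" ]; then
    echo "equivalent"
  else
    echo "different"
    return 1
  fi
}
```

With `robots.html` downloaded as in the examples, it could be run as `check_chain robots.html table.navbox ul a`.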