mirror of https://github.com/ericchiang/pup synced 2024-11-24 08:58:08 +00:00

Update README.md

Eric Chiang 2014-09-13 08:56:00 -04:00
parent fb2219a584
commit f00f4b450a


@ -1,16 +1,32 @@
# pup # pup
`pup` is a command line tool for processing HTML. It reads from stdin, pup is a command line tool for processing HTML. It reads from stdin,
prints to stdout, and allows the user to filter parts of the page using prints to stdout, and allows the user to filter parts of the page using
[CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp). [CSS selectors](http://www.w3schools.com/cssref/css_selectors.asp).
Inspired by [`jq`](http://stedolan.github.io/jq/), `pup` aims to be a Inspired by [jq](http://stedolan.github.io/jq/), pup aims to be a
fast and flexible way of exploring HTML from the terminal. fast and flexible way of exploring HTML from the terminal.
Looking for feature requests and argument design, feel free to open an
issue if you'd like to comment.
## Install ## Install
go get github.com/ericchiang/pup go get github.com/ericchiang/pup
## Quick start
```bash
$ curl http://www.pro-football-reference.com/years/2013/games.htm
```
Ew, HTML. Let's run that through some pup selectors:
```bash
$ curl http://www.pro-football-reference.com/years/2013/games.htm | \
pup table#games a[href*=boxscores] attr{href}
```
## Basic Usage ## Basic Usage
```bash ```bash
@ -25,7 +41,7 @@ $ pup < index.html [selectors and flags]
## Examples ## Examples
Download a webpage with `wget`. Download a webpage with wget.
```bash ```bash
$ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html $ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html
@ -33,7 +49,7 @@ $ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html
####Clean and indent ####Clean and indent
By default `pup` will fill in missing tags and properly indent the page. By default pup will fill in missing tags and properly indent the page.
```bash ```bash
$ cat robots.html $ cat robots.html
@ -60,8 +76,7 @@ $ pup < robots.html span#See_also
####Chain selectors together ####Chain selectors together
The following two commands are equivalent. (NOTE: pipes do not work with the The following two commands are (somewhat) equivalent.
`--color` flag)
```bash ```bash
$ pup < robots.html table.navbox ul a | tail $ pup < robots.html table.navbox ul a | tail
@ -86,12 +101,9 @@ Both produce the output:
</a> </a>
``` ```
####How many nodes are selected by a filter? Because pup reconstructs the HTML parse tree, funny things can
happen when piping two commands together. I'd recommend chaining
```bash commands rather than pipes.
$ pup < robots.html a -n
283
```
####Limit print level ####Limit print level
@ -197,4 +209,3 @@ $ pup < robots.html a attr{href} | head
## TODO: ## TODO:
* Print as json function `json{}` * Print as json function `json{}`
* Switch `-n` from a flag to a function