@@ -57,7 +57,7 @@ Download a webpage with wget.
 $ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html
 ```
 
-####Clean and indent
+#### Clean and indent
 
 By default pup will fill in missing tags and properly indent the page.
 
@@ -68,7 +68,7 @@ $ cat robots.html | pup --color
 # cleaned, indented, and colorful HTML
 ```
 
-####Filter by tag
+#### Filter by tag
 
 ```bash
 $ cat robots.html | pup 'title'
@@ -77,7 +77,7 @@ $ cat robots.html | pup 'title'
 </title>
 ```
 
-####Filter by id
+#### Filter by id
 
 ```bash
 $ cat robots.html | pup 'span#See_also'
@@ -86,7 +86,7 @@ $ cat robots.html | pup 'span#See_also'
 </span>
 ```
 
-####Filter by attribute
+#### Filter by attribute
 
 ```bash
 $ cat robots.html | pup 'th[scope="row"]'
@@ -113,7 +113,7 @@ $ cat robots.html | pup 'th[scope="row"]'
 </th>
 ```
 
-####Pseudo Classes
+#### Pseudo Classes
 
 CSS selectors have a group of specifiers called ["pseudo classes"](
 https://developer.mozilla.org/en-US/docs/Web/CSS/Pseudo-classes) which are pretty
@@ -150,7 +150,7 @@ For a complete list, view the [implemented selectors](#implemented-selectors)
 section.
 
 
-####`+`, `>`, and `,`
+#### `+`, `>`, and `,`
 
 These are intermediate characters that declare special instructions. For
 instance, a comma `,` allows pup to specify multiple groups of selectors.
@@ -165,7 +165,7 @@ $ cat robots.html | pup 'title, h1 span[dir="auto"]'
 </span>
 ```
 
-####Chain selectors together
+#### Chain selectors together
 
 When combining selectors, the HTML nodes selected by the previous selector will
 be passed to the next ones.