@@ -272,7 +272,7 @@ $ cat robots.html | pup 'div#p-namespaces a'
 <a href="/wiki/Robots_exclusion_standard" title="View the content page [c]" accesskey="c">
  Article
 </a>
-<a href="/wiki/Talk:Robots_exclusion_standard" title="Discussion about the content page [t]" accesskey="t">
+<a href="/wiki/Talk:Robots_exclusion_standard" rel="discussion" title="Discussion about the content page [t]" accesskey="t">
  Talk
 </a>
 ```
@@ -282,16 +282,25 @@ $ cat robots.html | pup 'div#p-namespaces a json{}'
 [
  {
   "accesskey": "c",
+  "children": [
+   {
+    "text": "Article"
+   }
+  ],
   "href": "/wiki/Robots_exclusion_standard",
   "tag": "a",
-  "text": "Article",
   "title": "View the content page [c]"
  },
  {
   "accesskey": "t",
+  "children": [
+   {
+    "text": "Talk"
+   }
+  ],
   "href": "/wiki/Talk:Robots_exclusion_standard",
+  "rel": "discussion",
   "tag": "a",
-  "text": "Talk",
   "title": "Discussion about the content page [t]"
  }
 ]
@@ -304,32 +313,30 @@ $ cat robots.html | pup -i 4 'div#p-namespaces a json{}'
 [
     {
         "accesskey": "c",
+        "children": [
+            {
+                "text": "Article"
+            }
+        ],
         "href": "/wiki/Robots_exclusion_standard",
         "tag": "a",
-        "text": "Article",
         "title": "View the content page [c]"
     },
     {
         "accesskey": "t",
+        "children": [
+            {
+                "text": "Talk"
+            }
+        ],
         "href": "/wiki/Talk:Robots_exclusion_standard",
+        "rel": "discussion",
         "tag": "a",
-        "text": "Talk",
         "title": "Discussion about the content page [t]"
     }
 ]
 ```
 
-If the selectors only return one element the results will be printed as a JSON
-object, not a list.
-
-```bash
-$ cat robots.html | pup --indent 4 'title json{}'
-{
-    "tag": "title",
-    "text": "Robots exclusion standard - Wikipedia, the free encyclopedia"
-}
-```
-
 Because there is no universal standard for converting HTML/XML to JSON, a
 method has been chosen which hopefully fits. The goal is simply to get the
 output of pup into a more consumable format.
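A script that reads pup's `json{}` output may encounter either shape that appears in this diff. Before the change, an element's text sits directly under `"text"`, and the removed README paragraph says a single match prints as a bare object. After the change, text is nested as `{"text": ...}` entries under `"children"`.

The following is a minimal Python sketch that normalizes both shapes, based only on the samples shown here. The helper names `as_list`, `iter_text` and `element_texts` are hypothetical and not part of pup. Joining nested text with single spaces, and emitting an element's own text before its children's, are both arbitrary assumptions.

```python
import json
from typing import Any, Iterator, List


def as_list(doc: Any) -> List[Any]:
    # The pre-change README says a single matching element is printed as a
    # bare JSON object rather than a list; wrap it so callers can always iterate.
    return doc if isinstance(doc, list) else [doc]


def iter_text(node: Any) -> Iterator[str]:
    # Yield a node's own "text" (pre-change shape), then recurse into its
    # "children" (post-change shape). Assumption: own text precedes children.
    if not isinstance(node, dict):
        return
    if "text" in node:
        yield node["text"]
    for child in node.get("children", []):
        yield from iter_text(child)


def element_texts(raw: str) -> List[str]:
    # One string per top-level element, nested text joined with single spaces.
    return [" ".join(iter_text(el)) for el in as_list(json.loads(raw))]


if __name__ == "__main__":
    # Hand-written sample shaped like the post-change output in the diff.
    new_style = """[
      {"accesskey": "c", "children": [{"text": "Article"}],
       "href": "/wiki/Robots_exclusion_standard", "tag": "a",
       "title": "View the content page [c]"},
      {"accesskey": "t", "children": [{"text": "Talk"}],
       "href": "/wiki/Talk:Robots_exclusion_standard", "rel": "discussion",
       "tag": "a", "title": "Discussion about the content page [t]"}
    ]"""
    # Hand-written sample shaped like the removed single-element example.
    old_style = '{"tag": "title", "text": "Robots exclusion standard - Wikipedia, the free encyclopedia"}'
    print(element_texts(new_style))  # ['Article', 'Talk']
    print(element_texts(old_style))  # ['Robots exclusion standard - Wikipedia, the free encyclopedia']
```

In practice `raw` would come from piping pup's output into the script, for example by reading `sys.stdin`. Accepting both shapes keeps the consumer working whichever side of this change produced the output.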