override HTML title parsing with a data-title="..." attribute

If you mark up your post's title with an <h2> instead of an <h1>,
it is no longer possible to reliably detect the page's title.

E.g. you have a single page with only one <h1> and that's the
*real* title of that page. On the other hand, it is also possible
that the <h1> tag is just your website's name and the actual post
title is marked up in an <h2>.
Martin Zimmermann 2013-11-02 15:07:42 +01:00
parent 46d5ccc38f
commit 0d07515c18
2 changed files with 51 additions and 3 deletions

@ -88,6 +88,41 @@ current comment count.
This functionality is already included when you embed `embed.min.js`, do
*not* mix `embed.min.js` and `count.min.js` in a single document.
### Client Configuration
You can configure the client (the JS part) via `data-` attributes:
* data-title
When you start a new thread (= first comment on a page), Isso sends
a GET request to that page to a) check that it exists and b) parse the
page's heading (currently used as the subject in emails).
Isso assumes that the title is inside an `h1` tag near the isso thread:
```html
<html>
<body>
<h1>Website Title</h1>
<article>
<header>
<h1>Post Title</h1>
<section id="isso-thread">
...
```
In this example, the detected title is `Post Title` as expected, but some
older sites may only use a single `h1` as their website's main title, and
an `h2` for the post title. Unfortunately this is ambiguous and you have
to tell Isso what the actual post title is:
```html
<section data-title="Post Title" id="isso-thread">
```
Make sure to escape the attribute value.
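As a rough sketch of what escaping could look like, assuming that "escape"
here means percent-encoding (the server side passes the attribute through
`unquote`), the attribute can be generated like this. The helper name
`data_title_attribute` is hypothetical and not part of Isso:

```python
from html import escape
from urllib.parse import quote, unquote


def data_title_attribute(title):
    """Build a data-title="..." attribute for a post title (hypothetical helper)."""
    # Percent-encode everything except unreserved characters and spaces,
    # so that unquote() on the server side restores the original title.
    encoded = quote(title, safe=" ")
    # HTML-escape as well, so the value cannot break out of the attribute.
    return 'data-title="{0}"'.format(escape(encoded, quote=True))


attribute = data_title_attribute("No way!")
print(attribute)
# Round trip: strip the attribute wrapper and decode the value again.
print(unquote(attribute[len('data-title="'):-1]))
```

Keeping spaces unencoded is only a readability choice; `unquote` leaves
literal spaces untouched.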
### Webserver configuration
* nginx configuration to run Isso on `/isso`:

@ -7,9 +7,10 @@ import datetime
from itertools import chain
 try:
+    from urllib import unquote
     from urlparse import urlparse
 except ImportError:
-    from urllib.parse import urlparse
+    from urllib.parse import urlparse, unquote
import html5lib
@ -81,7 +82,7 @@ def title(data, default=u"Untitled."):
which is the nearest H1 node in context to an element with the `isso-thread` id.
     >>> title("asdf") # doctest: +IGNORE_UNICODE
-    u'Untitled.'
+    'Untitled.'
>>> title('''
... <html>
... <head>
@ -101,7 +102,14 @@ def title(data, default=u"Untitled."):
... </article>
... </body>
... </html>''') # doctest: +IGNORE_UNICODE
-    u'Can you find me?'
+    'Can you find me?'
+    >>> title('''
+    ... <html>
+    ... <body>
+    ... <h1>I'm the real title!1
+    ... <section data-title="No way%21" id="isso-thread">
+    ... ''') # doctest: +IGNORE_UNICODE
+    'No way!'
"""
html = html5lib.parse(data, treebuilder="dom")
@ -137,6 +145,11 @@ def title(data, default=u"Untitled."):
for item in gettext(child):
yield item
+    try:
+        return unquote(el.attributes["data-title"].value)
+    except (KeyError, AttributeError):
+        pass
     while el is not None:  # el.parentNode is None in the very end
         visited.append(el)
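
The override added in the last hunk can be tried in isolation. The following is a minimal sketch, not the code from this commit: it swaps `html5lib` for the standard library's `xml.dom.minidom` (so the input must be well-formed XML), and the function name `title_override` is hypothetical.

```python
from urllib.parse import unquote
from xml.dom import minidom


def title_override(markup, default=None):
    """Return the decoded data-title of the isso-thread element, if any."""
    doc = minidom.parseString(markup)
    # Like the original lookup, only consider <div> and <section> tags.
    for tag in ("div", "section"):
        for el in doc.getElementsByTagName(tag):
            if el.getAttribute("id") != "isso-thread":
                continue
            try:
                # A missing attribute raises KeyError on the attribute map.
                return unquote(el.attributes["data-title"].value)
            except (KeyError, AttributeError):
                return default
    return default


with_override = (
    '<body><h1>Site</h1>'
    '<section data-title="No way%21" id="isso-thread"></section></body>'
)
without_override = '<body><h1>Site</h1><section id="isso-thread"></section></body>'

print(title_override(with_override))
print(title_override(without_override))
```

In the commit itself a missing attribute does not end the lookup; the function falls through to the existing search for the nearest `h1`.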