
Commit b50db2e

fixed indexing of external posts (#2983)
This should fix several issues with indexing external posts, including #1828. In short, I found that the index builder was receiving empty documents. To fix that, I now set the document content to the post content retrieved from the RSS feed, or to the text extracted from the external page. I've tested with various blog sources and indexing seems to work as expected now.
1 parent 15fc779 commit b50db2e
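
To illustrate why an empty document is invisible to search, here is a minimal sketch of a toy index builder. It is not the index builder used by the site: `ToyDoc` and `build_toy_index` are hypothetical names introduced only for this example, and the tokenizer is deliberately simplistic.

```ruby
# Toy stand-ins for illustration; none of these names come from the plugin or from Jekyll.
ToyDoc = Struct.new(:title, :content)

# Build a term => [titles] inverted index from document bodies only.
def build_toy_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |doc|
    # A nil or empty body yields no terms, so the document never enters the index.
    doc.content.to_s.downcase.scan(/[a-z0-9]+/).uniq.each do |term|
      index[term] << doc.title
    end
  end
  index
end

empty_doc  = ToyDoc.new('External post', '')                       # content never set
filled_doc = ToyDoc.new('External post', 'Notes on static sites')  # content copied from the feed

puts build_toy_index([empty_doc]).key?('static')    # false
puts build_toy_index([filled_doc])['static'].inspect # ["External post"]
```

Under these assumptions, a document whose body is empty contributes no terms at all, which matches the symptom described in the commit message.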

1 file changed, 7 additions and 2 deletions


_plugins/external-posts.rb

@@ -62,6 +62,7 @@ def create_document(site, source_name, url, content)
 doc.data['description'] = content[:summary]
 doc.data['date'] = content[:published]
 doc.data['redirect'] = url
+doc.content = content[:content]
 site.collections['posts'].docs << doc
 end
 
@@ -90,8 +91,12 @@ def fetch_content_from_url(url)
 parsed_html = Nokogiri::HTML(html)
 
 title = parsed_html.at('head title')&.text.strip || ''
-description = parsed_html.at('head meta[name="description"]')&.attr('content') || ''
-body_content = parsed_html.at('body')&.inner_html || ''
+description = parsed_html.at('head meta[name="description"]')&.attr('content')
+description ||= parsed_html.at('head meta[name="og:description"]')&.attr('content')
+description ||= parsed_html.at('head meta[property="og:description"]')&.attr('content')
+
+body_content = parsed_html.search('p').map { |e| e.text }
+body_content = body_content.join() || ''
 
 {
 title: title,
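
Two details of the second hunk may be worth a closer look. In the lines shown, `description` no longer defaults to `''`, so it can be `nil` when none of the three selectors match (code outside the hunk may handle this; it is not visible here). In addition, `||=` does not fall through on an empty `content=""` attribute, because an empty string is truthy in Ruby, and `join()` without a separator runs adjacent paragraphs together. The sketch below shows one possible way to handle both cases in plain Ruby. `first_present` and `join_paragraphs` are hypothetical helpers, not part of `_plugins/external-posts.rb`, and they take already-extracted strings so that the example runs without Nokogiri.

```ruby
# Hypothetical helpers for illustration; neither exists in _plugins/external-posts.rb.

# Return the first candidate that is neither nil nor blank, or '' if none qualifies.
def first_present(*candidates)
  candidates.each do |candidate|
    text = candidate.to_s.strip
    return text unless text.empty?
  end
  ''
end

# Join paragraph texts with a single space, dropping blank paragraphs,
# so that words at paragraph boundaries are not fused together.
def join_paragraphs(paragraphs)
  paragraphs.map { |text| text.to_s.strip }.reject(&:empty?).join(' ')
end

# Candidates stand in for the three meta lookups: missing, empty, and present.
puts first_present(nil, '', 'Open Graph summary').inspect
# "Open Graph summary"

puts join_paragraphs(['First paragraph.', '  ', 'Second paragraph.']).inspect
# "First paragraph. Second paragraph."
```

In the plugin, the candidates would be the results of the three `parsed_html.at(...)&.attr('content')` lookups, and the paragraph list would come from `parsed_html.search('p').map { |e| e.text }`. Whether a space is the right separator depends on how the index builder tokenizes content.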
