The wikibase-dump-filter module exposes the functions used internally by the CLI to parse, map, filter, and serialize entities from a Wikibase JSON dump. This should let you write more custom filters without having to start from scratch.
The all-in-one helper
const { parseEntitiesStream } = require('wikibase-dump-filter')
const options = {
  type: 'item',
  keep: [ 'labels', 'claims' ],
  simplified: true,
  languages: [ 'zh', 'fr' ]
}
parseEntitiesStream(process.stdin, options)
  .pipe(process.stdout)
The same behavior can be implemented by using the underlying helpers:
const { getEntitiesStream, buildFilter, buildFormatter, serialize } = require('wikibase-dump-filter')
// Build a filter from options documented above
const customFilter = buildFilter({
  type: 'item',
  claim: 'P31:Q571&P300',
  sitelink: 'zhwiki&frwiki'
})
// Build a formatter from options documented above
const customFormatter = buildFormatter({
  simplified: true,
  keep: [ 'labels', 'claims' ],
  languages: [ 'zh', 'fr' ]
})
// Get a stream of entities with `map`, `filter`, `filterAndMap`, and `tap` methods
getEntitiesStream(process.stdin)
  .filter(customFilter)
  .map(customFormatter)
  .map(serialize)
  .pipe(process.stdout)
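To make the three stages concrete, here is a stand-alone sketch that runs the same filter, format, serialize sequence on a small in-memory array instead of a dump stream. It doesn't use wikibase-dump-filter at all: `isItem`, `keepLabels`, and `toLine` are hypothetical stand-ins for what `buildFilter`, `buildFormatter`, and `serialize` provide, the sample entities are made up, and the one-JSON-object-per-line output is an assumption about the serialized format.

```javascript
// Made-up sample entities, shaped like heavily trimmed dump entries
const entities = [
  { id: 'Q1', type: 'item', labels: { fr: { language: 'fr', value: 'univers' } }, claims: {} },
  { id: 'P31', type: 'property', labels: {}, claims: {} },
  { id: 'Q2', type: 'item', labels: {}, claims: {} }
]

// Hypothetical stand-in for a built filter: keep items only
const isItem = entity => entity.type === 'item'

// Hypothetical stand-in for a built formatter: keep only the id and labels
const keepLabels = ({ id, labels }) => ({ id, labels })

// Hypothetical stand-in for serialize: one JSON object per line (assumed format)
const toLine = entity => JSON.stringify(entity) + '\n'

const output = entities
  .filter(isItem)
  .map(keepLabels)
  .map(toLine)
  .join('')

process.stdout.write(output)
```

Each stage only sees what the previous one let through, so the property `P31` never reaches the formatter.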
Or, in a more condensed way:
const { getEntitiesStream, filterFormatAndSerialize } = require('wikibase-dump-filter')
const options = {
  type: 'item',
  keep: [ 'labels', 'claims' ],
  simplified: true,
  languages: [ 'zh', 'fr' ]
}
getEntitiesStream(process.stdin)
  .filterAndMap(filterFormatAndSerialize(options))
  .pipe(process.stdout)
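A function passed to `filterAndMap` does both jobs in one pass. The sketch below hand-writes such a function, assuming (this isn't spelled out above) that returning `null` drops the entity and that any other return value is passed downstream. `frenchLabelLine` is a hypothetical name, the sample entities are made up, and the function is exercised on plain objects rather than on a real stream.

```javascript
// Hypothetical combined filter-and-map function:
// returns a line of text for items with a French label, null otherwise
const frenchLabelLine = entity => {
  if (entity.type !== 'item') return null
  const label = entity.labels && entity.labels.fr
  if (!label) return null
  return `${entity.id}\t${label.value}\n`
}

// Made-up sample entities
const samples = [
  { id: 'Q1', type: 'item', labels: { fr: { language: 'fr', value: 'univers' } } },
  { id: 'Q2', type: 'item', labels: {} },
  { id: 'P31', type: 'property', labels: { fr: { language: 'fr', value: 'nature' } } }
]

// Emulate the assumed filterAndMap semantics on an array
const lines = samples.map(frenchLabelLine).filter(line => line !== null)
process.stdout.write(lines.join(''))
```

If the assumption about `null` holds, the same function could be passed directly to `.filterAndMap()` on an entities stream.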
Even more customized behaviors can be implemented by writing your own filter and map functions:
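const { getEntitiesStream } = require('wikibase-dump-filter')
// Keep entities whose numeric id is odd (Q1, Q3, P31...)
const entityIdIsOdd = entity => parseInt(entity.id.slice(1), 10) % 2 === 1
const getClaims = entity => entity.claims
const oddEntitiesClaimsStream = getEntitiesStream(process.stdin)
  .filter(entityIdIsOdd)
  .map(getClaims)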
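Since a full dump takes a long time to process, it can be worth checking custom functions against a few hand-made entities first. The sketch below does that for the two functions above. The `and` and `hasClaims` helpers are hypothetical (not part of wikibase-dump-filter), and the sample entities and their claim values are made up.

```javascript
const entityIdIsOdd = entity => parseInt(entity.id.slice(1), 10) % 2 === 1
const getClaims = entity => entity.claims

// Hypothetical helper: true only if every predicate accepts the entity
const and = (...predicates) => entity => predicates.every(fn => fn(entity))

// Hypothetical predicate: the entity has at least one claim
const hasClaims = entity => Object.keys(entity.claims || {}).length > 0

const oddWithClaims = and(entityIdIsOdd, hasClaims)

// Made-up entities; claim values are placeholders
const samples = [
  { id: 'Q1', claims: { P31: [ 'Q5' ] } },
  { id: 'Q2', claims: { P31: [ 'Q5' ] } },
  { id: 'Q3', claims: {} }
]

const kept = samples.filter(oddWithClaims)
console.log(kept.map(entity => entity.id)) // only Q1 passes both checks
console.log(kept.map(getClaims))
```

Note that `getClaims` returns objects, so a stream built this way would likely need a final serialization step (for example `.map(serialize)`) before being piped to `process.stdout`.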