Module

Summary

The wikibase-dump-filter module exposes the functions used internally by the CLI to parse, map, filter, and serialize entities from a Wikibase JSON dump. This lets you write more custom filters without having to start from scratch.

parseEntitiesStream

The all-in-one helper

const { parseEntitiesStream } = require('wikibase-dump-filter')
const options = {
  type: 'item',
  keep: [ 'labels', 'claims' ],
  simplified: true,
  languages: [ 'zh', 'fr' ]
}
parseEntitiesStream(process.stdin, options)
.pipe(process.stdout)

custom parsers

The same behavior can be implemented by using the underlying helpers:

const { getEntitiesStream, buildFilter, buildFormatter, serialize } = require('wikibase-dump-filter')

// Build a filter from options documented above
const customFilter = buildFilter({
  type: 'item',
  claim: 'P31:Q571&P300',
  sitelink: 'zhwiki&frwiki'
})

// Build a formatter from options documented above
const customFormatter = buildFormatter({
  simplified: true,
  keep: [ 'labels', 'claims' ],
  languages: [ 'zh', 'fr' ]
})

// Get a stream of entities with `map`, `filter`, `filterAndMap`, and `tap` methods
getEntitiesStream(process.stdin)
.filter(customFilter)
.map(customFormatter)
.map(serialize)
.pipe(process.stdout)

Or, in a more condensed way:

const { getEntitiesStream, filterFormatAndSerialize } = require('wikibase-dump-filter')
const options = {
  type: 'item',
  keep: [ 'labels', 'claims' ],
  simplified: true,
  languages: [ 'zh', 'fr' ]
}
getEntitiesStream(process.stdin)
.filterAndMap(filterFormatAndSerialize(options))
.pipe(process.stdout)

even more custom parsers

Even more customized behaviors can be implemented by writing your own filter and map functions:

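const { getEntitiesStream } = require('wikibase-dump-filter')

// Keep entities whose numeric id is odd (e.g. Q1, P31)
const entityIdIsOdd = entity => parseInt(entity.id.slice(1), 10) % 2 === 1
// Map each entity to its claims object
const getClaims = entity => entity.claims

const oddEntitiesClaimsStream = getEntitiesStream(process.stdin)
  .filter(entityIdIsOdd)
  .map(getClaims)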
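
Custom filter and map functions are ordinary functions of an entity, so they can be tried out on a few sample objects before being plugged into a stream. The sketch below uses made-up, heavily trimmed entities, and `Array#filter` / `Array#map` stand in for the stream's `filter` and `map` methods; `hasLabelIn` and `toIdAndLabel` are hypothetical helpers, not part of the module.

```javascript
// Sample entities shaped like Wikibase JSON dump entries (trimmed, illustrative data)
const sampleEntities = [
  { id: 'Q1', type: 'item', labels: { fr: { language: 'fr', value: 'univers' } }, claims: {} },
  { id: 'Q2', type: 'item', labels: { en: { language: 'en', value: 'Earth' } }, claims: {} },
  { id: 'P31', type: 'property', labels: { fr: { language: 'fr', value: 'nature' } }, claims: {} }
]

// Filter: keep only items that have a label in the given language
const hasLabelIn = lang => entity =>
  entity.type === 'item' && entity.labels != null && entity.labels[lang] != null

// Map: reduce an entity to its id and its label value in the given language
const toIdAndLabel = lang => entity => ({ id: entity.id, label: entity.labels[lang].value })

// Array#filter and Array#map stand in for the stream's `filter` and `map`
const result = sampleEntities
  .filter(hasLabelIn('fr'))
  .map(toIdAndLabel('fr'))

console.log(result)
```

Assuming the stream methods accept functions of this shape, as the example above suggests, the same `hasLabelIn('fr')` and `toIdAndLabel('fr')` could then be passed to `.filter` and `.map` on the entities stream.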