This repository was archived by the owner on Oct 18, 2019. It is now read-only.

How to improve IOOS Catalog searches

robragsdale edited this page Nov 17, 2014 · 16 revisions

Introduction

Searching for datasets in a catalog is based on knowing the Who, What, When and Where: the 4 W's. Currently, several search mechanisms built into the US IOOS Catalog take advantage of underlying metadata records to find datasets by regional association (Who) and geographic bounding box (Where). For example, the US IOOS Catalog Map lets the user see the bounding boxes of all available datasets for the AOOS Region and then select one to drill down and view some or all of its metadata to learn more about it.

To provide search for the What of the datasets, we need a capability to search which variables are available in the datasets and what each dataset is about in context. Currently, to find datasets with specific variables, a user of the US IOOS Catalog has to rely on the title and short description to provide enough context about what types of variables are in the dataset, or visit each dataset to get its list of variable names.

One step in building on this search capability would be to provide a similar search capability for variable names. For example, based on the available variable names, the geographic map could show only the datasets containing the specific parameter the user is after. This may seem simple enough, but given the issues discussed below, there will be limitations on how well this works, owing to the current translations, the abilities of the endpoints, and how people search. The following documentation outlines these issues and concludes with some options for building the capability to search for the What into the US IOOS Catalog.
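As a rough illustration of the variable-name filter described above, the following sketch filters a handful of in-memory dataset records by variable name and, optionally, by a bounding box. The Dataset class, the sample records, and the function names are all hypothetical and are not taken from the Catalog's code; the exact-match comparison also sidesteps the naming inconsistencies discussed below.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical, simplified stand-in for a Catalog dataset record."""
    name: str
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat)
    variables: set = field(default_factory=set)


def bbox_intersects(a, b):
    """Return True when two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_datasets(datasets, variable, region_bbox=None):
    """Return names of datasets holding `variable`, optionally within a region."""
    wanted = variable.strip().lower()
    hits = []
    for ds in datasets:
        # Case-insensitive exact match on the variable name.
        if wanted not in {v.lower() for v in ds.variables}:
            continue
        if region_bbox is not None and not bbox_intersects(ds.bbox, region_bbox):
            continue
        hits.append(ds.name)
    return hits


# Made-up sample records, for illustration only.
DATASETS = [
    Dataset("buoy_a", (-150.0, 58.0, -148.0, 60.0),
            {"sea_water_temperature", "wind_speed"}),
    Dataset("glider_b", (-75.0, 35.0, -73.0, 37.0),
            {"sea_water_temperature", "sea_water_salinity"}),
    Dataset("model_c", (-152.0, 55.0, -140.0, 62.0),
            {"sea_water_salinity"}),
]

if __name__ == "__main__":
    print(find_datasets(DATASETS, "sea_water_temperature"))
    print(find_datasets(DATASETS, "sea_water_temperature",
                        region_bbox=(-155.0, 54.0, -140.0, 62.0)))
```

The map would then draw only the bounding boxes of the returned datasets. A filter like this is only as good as the harvested variable names, which is why the inconsistencies discussed below matter.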

Questions

Questions to consider in defining the issues involved in enhancing the search capabilities of the Catalog:

  • What is extracted from each service (DAP, SOS, WMS, ERDDAP) to provide labels, variable names, etc.?
  • Are there limitations or inconsistencies in the variable names harvested by the Catalog?
  • What underlying metadata records are available to the Catalog that can be used to search context and variable names?

High Level

Use cases (wouldn't it be nice if) https://github.com/ioos/catalog/wiki/Use-Cases-for-Vocabulary-Search-Capability

Under the hood

To begin addressing the questions above, let's take a look under the hood.

The Catalog obtains all service URLs from the metadata managed via the IOOS Service Registry and accessed through the ESRI Geoportal Server hosted by NGDC. The registry documentation outlines instructions for getting dataset metadata into the Service Registry. The web-based catalog of IOOS services and datasets is available at http://catalog.ioos.us. Figure 1 shows the steps of the registration process.

Figure 1. IOOS service metadata registration steps

Metadata fields from each of the accepted service types (THREDDS, SOS, ERDDAP, WMS, and WAF) are harvested and translated nightly into ISO 19115-2 metadata records. The ISO metadata records are then published into EMMA and provided to the Geoportal Server.

Daily, the IOOS Catalog accesses the records on the Geoportal Server through the NGDC CSW Service Endpoint. The Catalog gathers all the service URLs and then creates its own harvest record by independently querying each of the services found in the registry, harvesting and indexing their metadata. This provides an opportunity to gather the rich information available directly from these distributed services rather than relying on the ISO metadata translations.
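The first half of that daily step, gathering service URLs from CSW records, might look roughly like the sketch below. It parses a CSW 2.0.2 Dublin Core style response with the Python standard library. The sample response, its scheme values, and the example.org URLs are invented for illustration, and the function name service_urls is hypothetical; the records actually returned by the NGDC endpoint may be shaped differently.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for CSW 2.0.2 records in Dublin Core form.
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}

# Invented sample response; real records and scheme values will differ.
SAMPLE_RESPONSE = """
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/">
  <csw:SearchResults numberOfRecordsMatched="2" numberOfRecordsReturned="2">
    <csw:Record>
      <dc:identifier>record-1</dc:identifier>
      <dct:references scheme="opendap">http://example.org/thredds/dodsC/a.nc</dct:references>
    </csw:Record>
    <csw:Record>
      <dc:identifier>record-2</dc:identifier>
      <dct:references scheme="sos">http://example.org/sos</dct:references>
      <dct:references scheme="wms">http://example.org/wms</dct:references>
    </csw:Record>
  </csw:SearchResults>
</csw:GetRecordsResponse>
"""


def service_urls(csw_xml):
    """Return (record id, scheme, URL) tuples found in a CSW response."""
    root = ET.fromstring(csw_xml)
    found = []
    for record in root.iterfind(".//csw:Record", NS):
        rec_id = record.findtext("dc:identifier", default="", namespaces=NS)
        for ref in record.iterfind("dct:references", NS):
            url = (ref.text or "").strip()
            if url:
                found.append((rec_id, ref.get("scheme", ""), url))
    return found


if __name__ == "__main__":
    for rec_id, scheme, url in service_urls(SAMPLE_RESPONSE):
        # A harvester would now query each URL with a client for its scheme.
        print(rec_id, scheme, url)
```

Each returned URL would then be handed to a service-specific harvester that queries the endpoint directly, which is the step that bypasses the ISO translations.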

Variable names field

One example DapHarvester

(Need more technical detail here from ASA_?

keywords field

Examples of which variables are translated and exposed by the IOOS Catalog

There are fields missing from the Catalog's current data model that should be added in the future:

  • Keywords
  • Standard_name
  • Units
  • Geospatial Awareness

This is related to a move to a PostGIS relational database. See Migrate to Post GIS #178.

How will these be searched if the user doesn't know what a variable is called or how it is spelled?
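One possible partial answer to the spelling problem is approximate string matching against the harvested names. The sketch below uses difflib from the Python standard library; the KNOWN_NAMES list and the suggest function are hypothetical, and the cutoff of 0.6 is simply difflib's default rather than a tuned value. It does not help when the user's term is a synonym rather than a misspelling.

```python
import difflib

# Hypothetical sample of harvested variable names.
KNOWN_NAMES = [
    "sea_water_temperature",
    "sea_water_salinity",
    "wind_speed",
    "air_temperature",
]


def suggest(term, vocabulary, limit=3, cutoff=0.6):
    """Return up to `limit` names from `vocabulary` that resemble `term`."""
    # Normalize free text toward the underscore style of the stored names.
    normalized = term.strip().lower().replace(" ", "_")
    return difflib.get_close_matches(normalized, vocabulary,
                                     n=limit, cutoff=cutoff)


if __name__ == "__main__":
    print(suggest("sea water temprature", KNOWN_NAMES))
```

With the sample list, the misspelled query "sea water temprature" ranks sea_water_temperature first. Terms that share no spelling with any known name return nothing, which is the gap the vocabulary and ontology tools below are meant to fill.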

Examples of how this looks on the Catalog.

Beyond keyword variable name search to understand context for search

Concept and Topic Search Tools available

  • Outside Vocabulary/Ontology Services
  • Vocabularies on MMI-ORR
  • SPARQL Endpoint
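To make the SPARQL option more concrete, here is a minimal sketch of how a label lookup could be issued against a vocabulary service and how its results could be read. The endpoint URL is a placeholder, the function names are hypothetical, and the assumption that terms carry an rdfs:label is mine; a real vocabulary may use other properties such as skos:prefLabel. The sketch only builds the request and parses a sample result, without contacting any server.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint, not a real service.
ENDPOINT = "https://example.org/sparql"


def build_query(text):
    """Build a SPARQL 1.1 query for terms whose label contains `text`."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?term ?label WHERE {\n"
        "  ?term rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        "LIMIT 25"
    )


def request_url(text, endpoint=ENDPOINT):
    """Return a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": build_query(text)})


def parse_bindings(results_json):
    """Extract (term URI, label) pairs from SPARQL JSON results."""
    data = json.loads(results_json)
    return [(b["term"]["value"], b["label"]["value"])
            for b in data["results"]["bindings"]]


# Invented sample in the SPARQL JSON results layout.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["term", "label"]},
    "results": {"bindings": [
        {"term": {"type": "uri", "value": "http://example.org/vocab/temp"},
         "label": {"type": "literal", "value": "sea water temperature"}},
    ]},
})

if __name__ == "__main__":
    print(request_url("temperature"))
    print(parse_bindings(SAMPLE_RESULTS))
```

The term URIs returned by such a lookup could then be matched against mapped variable names in the Catalog, which is where vocabulary mappings would come in.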

How to provide semantic search capability in IOOS Catalog?

What’s the next step? More vocab mappings? One ontology to rule them all?