Use Cases for Vocabulary Search Capability
Use Cases for Implementation of Vocabulary Search Concepts into the IOOS Catalog [#72]
The IOOS Catalog (http://catalog.ioos.us) is the primary client of the IOOS Service Registry. The Registry harvests metadata from several types of registered services. It supports translation of THREDDS, ERDDAP, SOS and WAF services into ISO-compliant metadata records, which are stored in an ESRI Geoportal Catalog maintained by the National Geophysical Data Center (NGDC), called the NGDC Geoportal. The interface between the IOOS Registry and the IOOS Catalog is the CSW API provided by the NGDC Geoportal.
Data provided by the services are labeled by the Data Author or a Regional Data Manager, who finds and uses CF Standard Names or other terms with resolvable URIs, such as those in the IOOS Parameter Vocabulary or another registered vocabulary, to represent the underlying data. Standard Names are indexed as keywords by the IOOS Registry (based on the ncISO translation), but ordinary users are unlikely to search by Standard Name.
The following use cases aim to show how refined searches on keywords or domain topics can be useful. External ontologies can provide context for similar or related topics, keywords, or even specific data labels that the user can choose. This can significantly enhance a user's effectiveness in navigating and finding metadata records harvested by the IOOS Catalog.
The ESRI Geoportal product provides some useful parametric search capabilities but may not meet all users' needs. The IOOS Catalog may be further limited by search parameters or functions that are not exposed through the API. Additionally, queries may rely on information that is present not in the ESRI Geoportal Catalog but in external sources.
These use cases concentrate on vocabulary search concepts limited to measured or observed properties of the atmosphere and ocean, such as the terms found in the CF Standard Name Table or the IOOS Parameter Vocabulary.
The external resource is the MMI Ontology Registry and Repository, where the CF Standard Name Table and the IOOS Parameter Vocabulary are registered and mapped. This resource provides more context and relationships for these terms than if the vocabularies were kept separate.
These initial constraints limit the sprawl and complexity of too many domains or too many external resources. However, the ideas could be expanded to include sensor and platform types or geographic names and organizations. They could also be expanded to a federation of ontologies.
- Users need help finding data in the catalog that is labeled with a certain identifier in metadata, like temperature or pressure or sea_surface_temperature or water_temperature or sea_water_temperature, but don't know exactly how to spell a parameter name such as clostridium_perfringens.
- Actor: domain-expert user
- Actions:
a. Type free-text (auto-complete list)
b. Type free-text (list of narrowed entries of similar or related terms)
c. Select (hierarchical) vocabulary set as a tree
- Users need help finding data about a general topic like "storm water runoff" or "offshore wind energy", but may not know what parameters are measured or modeled, or what exists at all, and want to start with a best guess.
- Actor: general user
- Actions:
a. Type Free-text topic (auto-complete list)
b. Select (hierarchical) vocabulary set as a tree
- Data providers also need to know what labels they can use to accurately describe their data. The labels are usually found in a vocabulary list, with definitions provided in a standard ontology or controlled vocabulary such as CF standard_name. Like the catalog user, however, the data provider may not know exactly which label to use.
- Actor: Data Author or Regional Data Manager
- Actions:
a. Create metadata record variable field
b. Find standard_name (list to choose from)
c. Verify standard_name (context)
d. Assign standard_name (uri identifier)
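The free-text actions in the use cases above (an auto-complete list, and a narrowed list of similar terms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VOCABULARY` list is a small hand-picked sample of terms (some named in the text above, some CF Standard Names), and the function names `autocomplete` and `similar_terms` are hypothetical. A real implementation would load terms from the registered vocabularies rather than a hard-coded list.

```python
import difflib

# Illustrative sample only; a real list would come from the CF Standard Name
# Table or the IOOS Parameter Vocabulary.
VOCABULARY = [
    "air_pressure",
    "air_temperature",
    "clostridium_perfringens",
    "sea_surface_temperature",
    "sea_water_pressure",
    "sea_water_salinity",
    "sea_water_temperature",
    "water_temperature",
]


def normalize(text):
    """Lower-case free text and join words with underscores, as in the sample terms."""
    return "_".join(text.strip().lower().split())


def autocomplete(text, terms=VOCABULARY, limit=5):
    """Auto-complete list: terms starting with the typed text first, then terms containing it."""
    needle = normalize(text)
    starts = [t for t in terms if t.startswith(needle)]
    contains = [t for t in terms if needle in t and not t.startswith(needle)]
    return (starts + contains)[:limit]


def similar_terms(text, terms=VOCABULARY, limit=3, cutoff=0.6):
    """Similar-term list: closest spellings for a possibly misspelled term."""
    return difflib.get_close_matches(normalize(text), terms, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    print(autocomplete("sea water"))
    print(similar_terms("clostridum perfringens"))
```

String similarity only helps with spelling. Finding related terms (for example, that water_temperature and sea_water_temperature describe similar quantities) would still depend on mappings held in an external ontology.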
Ways to Discover Through Geoportal
- Simple keyword search
- Spatial search
- Lucene syntax
- Searching other repositories from your geoportal
- Browse
- Ontology service
- Search geoportal from client applications
Spatial Queries
- Anywhere
- Intersecting
- Fully Within
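To make the three spatial options concrete, the following is a minimal geometric sketch of how they differ for rectangular extents. It assumes plain longitude/latitude bounding boxes that do not cross the antimeridian. The `BBox` type and the function names are hypothetical and do not describe how the Geoportal itself evaluates these relations.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Rectangular extent in degrees (assumed not to cross the antimeridian)."""
    west: float
    south: float
    east: float
    north: float


def intersects(record, query):
    """True when the record extent overlaps or touches the query extent."""
    return (record.west <= query.east and record.east >= query.west
            and record.south <= query.north and record.north >= query.south)


def fully_within(record, query):
    """True when the record extent lies entirely inside the query extent."""
    return (record.west >= query.west and record.east <= query.east
            and record.south >= query.south and record.north <= query.north)


def matches(record, query, relation="anywhere"):
    """Apply one of the three spatial options to a record extent."""
    if relation == "anywhere":
        return True  # no spatial filtering at all
    if relation == "intersecting":
        return intersects(record, query)
    if relation == "fully_within":
        return fully_within(record, query)
    raise ValueError(f"unknown spatial relation: {relation}")


if __name__ == "__main__":
    query = BBox(-80.0, 30.0, -70.0, 40.0)
    record = BBox(-75.0, 35.0, -65.0, 45.0)  # straddles the query's edge
    print(matches(record, query, "intersecting"))
    print(matches(record, query, "fully_within"))
```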
Lucene Query Syntax ‐ Query Language Provided through Query Parser
- Fields – e.g., search by ‘title’ or ‘abstract’
- Fuzzy – e.g., ‘air~’ will find items containing terms like air and airplane, but also aid
- Proximity Searches – e.g., to search for "air" and "quality" within 10 words of each other in a document, use the search: "air quality"~10
- Boosting a Term – e.g., if you are searching for air quality and you want the term "air" to be more relevant, boost it using the ^ symbol along with the boost factor next to the term. You would type: air^2 quality
- Boolean Operators – e.g., AND, "+", OR, NOT and "-"
- Grouping – e.g., (air OR water) AND quality will find documents containing the words air and quality or the words water and quality or both.
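A client that builds these queries for the user could assemble the Lucene strings with small helper functions, for example after the user picks a term from a vocabulary list. The sketch below is illustrative: the helper names are hypothetical, and it only formats query strings using the syntax listed above without sending them anywhere. It also does not escape Lucene special characters in user input, which a real client would need to do.

```python
def field(name, value):
    """Field search, e.g. title:air"""
    return f"{name}:{value}"


def fuzzy(term):
    """Fuzzy search, e.g. air~"""
    return f"{term}~"


def proximity(phrase, distance):
    """Proximity search, e.g. "air quality"~10"""
    return f'"{phrase}"~{distance}'


def boost(term, factor):
    """Boost a term's relevance, e.g. air^2"""
    return f"{term}^{factor}"


def any_of(*terms):
    """Grouped OR clause, e.g. (air OR water)"""
    return "(" + " OR ".join(terms) + ")"


def all_of(*clauses):
    """AND together several clauses."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(proximity("air quality", 10))
    print(boost("air", 2) + " quality")
    print(all_of(any_of("air", "water"), "quality"))
    print(field("title", fuzzy("temperature")))
```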
REST API Search Parameters
- bbox - by extent specified as two pairs of coordinates (west-south and east-north)
- spatialRel – Used with bounding box (within, overlaps)
- searchText - Keyword search
- contentType - Esri content types (liveData, applications, clearinghouses)
- dataCategory - ISO 19115 data theme keywords (elevation, boundaries)
- f - Output format of results (GeoRSS, HTML, fragment, KML)
- style - Associate style sheet for results formatting
- rid - Id associated with the repository
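As a rough sketch of how a client might combine these parameters, the following builds a search URL with Python's standard library. The parameter names come from the list above. The base URL is a placeholder, the endpoint path `rest/find/document` is an assumption about the Geoportal deployment, and the exact spelling of values such as `georss` should be checked against the Geoportal documentation. The function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host; the path is an assumed Geoportal REST search endpoint.
BASE_URL = "http://example.org/geoportal/rest/find/document"


def build_search_url(search_text=None, bbox=None, spatial_rel=None,
                     output_format=None, base_url=BASE_URL):
    """Build a REST search URL from the parameters listed above.

    bbox is a (west, south, east, north) tuple, i.e. the west-south and
    east-north coordinate pairs joined into one comma-separated value.
    """
    params = {}
    if search_text:
        params["searchText"] = search_text
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if spatial_rel:
        params["spatialRel"] = spatial_rel
    if output_format:
        params["f"] = output_format
    # urlencode percent-encodes reserved characters such as commas and spaces.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(
        search_text="sea_water_temperature",
        bbox=(-75, 35, -70, 40),
        spatial_rel="within",
        output_format="georss",  # assumed spelling of the GeoRSS format value
    ))
```

Keeping the URL construction in one function makes it easy to add the remaining parameters (contentType, dataCategory, style, rid) in the same way once their accepted values are confirmed.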