presentations/2012-HPDE-Feb-TSDS

TSDS ("HPDE DAPTS") update

Contents

  1. Objectives
  2. People
  3. API (non-SPASE)
  4. API (SPASE)
  5. Connecting to a data service
  6. Connecting to a data service - Catalog
  7. Connecting to a data service - NcML
  8. Connecting to a data service - IOSP
  9. Example use: Browser
  10. Example use: IDL
  11. Example use: Autoplot
  12. A look ahead

1. Objectives

  1. develop a standard API for time series-like data,
  2. develop a software package, TSDS (Time Series Data Server), that implements this API and provides server-side super-setting, sub-setting, filtering, and uniform gridding of time series-like data,
  3. make the data holdings from several key data providers in the heliophysics environment accessible through the TSDS API, and
  4. develop client-side software for standard data analysis packages (IDL, MATLAB, Java, Python, and Excel) that will allow access to a TSDS-enabled server.

2. People

  • Bob Weigel (GMU): PI
  • Doug Lindholm (LASP): TSDS lead developer
  • Ribao Wei (GMU): Catalog metadata
  • Sheng Li (GMU): Catalog metadata
  • Victoir Veibell (GMU): IOSPs
  • Jeremy Faden (Cottage Systems): IOSPs
  • Joey Mukherjee (Southwest): VSEO Catalog and IOSP
  • Robert Redmon (NGDC/SPIDR): SPIDR catalog
  • Peter Elespuru (NGDC/SPIDR): SPIDR catalog
  • Bobby Candey (SPDF): CDAWeb catalog and design feedback
  • Jon Vandegriff (JHU/APL): Design feedback
  • Nand Lal (SPDF): Design feedback
  • Aaron Roberts (GSFC): Design feedback
  • Tom Narock (UMBC): Design feedback

3. API (non-SPASE)

The base-line API builds on OPeNDAP-compliant URL requests of the form:

http://host/servletname/dataset.suffix?parameters&constraint&filter

Examples:

where

  • host: name of the computer hosting the TSDS servlet;
  • servletname: name of the servlet;
  • dataset: name of a dataset containing time series parameters;
  • suffix: type or format of the output;
  • parameters: list of parameters to return with optional hyperslab (index subset) definitions (default all);
  • constraint: constraints on the values of the parameters; and
  • filter: filters applied to the parameter values after the constraints have been applied.
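As a minimal sketch of how the pieces of this URL fit together, the following Python helper assembles a request from its parts. The function name build_tsds_url and its arguments are hypothetical and not part of TSDS; the percent-encoding of > and < follows the browser examples later on this page, and the leading & kept when no parameters are listed is an assumption based on the usual OPeNDAP convention.

```python
from urllib.parse import quote


def build_tsds_url(host, servlet, dataset, suffix,
                   parameters=(), constraints=(), filters=()):
    """Assemble http://host/servletname/dataset.suffix?parameters&constraint&filter."""
    base = f"http://{host}/{servlet}/{dataset}.{suffix}"
    # The parameter list is comma-separated and comes first in the query.
    query = ",".join(parameters)
    # Each constraint or filter is appended as its own '&'-separated clause.
    # quote() percent-encodes '>' as %3E and '<' as %3C; '=', ',', ':' and
    # parentheses are left as-is.
    for clause in list(constraints) + list(filters):
        query += "&" + quote(clause, safe="(),=:")
    return base + ("?" + query if query else "")


if __name__ == "__main__":
    print(build_tsds_url("lasp.colorado.edu", "lisird/tss", "historical_tsi", "csv",
                         parameters=["time", "Irradiance"],
                         constraints=["time>2003-02-25", "time<2009-03-27"]))
```

With no parameters, constraints, or filters, the helper returns the bare dataset URL, which per the list above requests all parameters.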

filter options include

  • replace(a,b) replace any occurrence of the value a with b,
  • replace_missing(a) replace missing values with the value a,
  • exclude_missing() exclude any time sample that has a missing value,
  • format_time(format) format ASCII time output, see Java's SimpleDateFormat [1] (the time variable must be explicitly requested to use this filter),
  • stride(n) return every nth time sample, and
  • thin(n) apply a stride to return about n time samples.
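To make the filter semantics concrete, here is a small Python emulation of the filters listed above, applied to an in-memory list of (time, value) samples. This is only a sketch and not the TSDS server code: representing missing values as NaN and choosing the thin(n) stride as ceil(N/n) are assumptions made for illustration.

```python
import math


def is_missing(value):
    # Assumption: a missing value is represented as NaN.
    return isinstance(value, float) and math.isnan(value)


def replace(samples, a, b):
    """Replace any occurrence of the value a with b."""
    return [(t, b if v == a else v) for t, v in samples]


def replace_missing(samples, a):
    """Replace missing values with the value a."""
    return [(t, a if is_missing(v) else v) for t, v in samples]


def exclude_missing(samples):
    """Drop every time sample that has a missing value."""
    return [(t, v) for t, v in samples if not is_missing(v)]


def stride(samples, n):
    """Return every nth time sample, starting with the first."""
    return samples[::n]


def thin(samples, n):
    """Apply a stride chosen so that about n samples are returned."""
    step = max(1, math.ceil(len(samples) / n))
    return samples[::step]
```

For ten samples, thin(samples, 4) uses a stride of 3 and returns four samples, the same result as stride(samples, 3).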

constraint options include

  • >, <, >=, <=, and =.
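The effect of a constraint can be sketched in Python as follows. The helper names are hypothetical, the sample rows are made-up values used only to exercise the code, and only numeric comparisons are handled (time strings would need a date parser).

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, "=": operator.eq}
# Two-character operators are listed first in the alternation so that ">="
# is not read as ">" followed by a value starting with "=".
_PATTERN = re.compile(r"^(\w+)(>=|<=|>|<|=)(.+)$")


def parse_constraint(text):
    """Split e.g. 'Irradiance>1361.6' into (name, comparison function, value)."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a constraint: {text!r}")
    name, op, value = match.groups()
    return name, _OPS[op], float(value)


def apply_constraints(rows, constraints):
    """Keep only the rows (dicts keyed by parameter name) that satisfy every constraint."""
    parsed = [parse_constraint(c) for c in constraints]
    return [row for row in rows
            if all(op(row[name], value) for name, op, value in parsed)]


if __name__ == "__main__":
    rows = [{"time": 2000.5, "Irradiance": 1361.5},   # illustrative values only
            {"time": 2001.5, "Irradiance": 1361.7},
            {"time": 2002.5, "Irradiance": 1361.9}]
    print(apply_constraints(rows, ["Irradiance>1361.6", "Irradiance<1361.8"]))
```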

The suffix options include

  • csv: comma separated values,
  • dat: tabular ASCII format,
  • bin: a flat binary table,
  • nc: Network Common Data Form (NetCDF) file (to be implemented),
  • cdf: Common Data Format (CDF) file (to be implemented),
  • h5: Hierarchical Data Format (HDF) version 5 (to be implemented),
  • json: JavaScript Object Notation (JSON),
  • xml: an XML representation of the data (to be implemented; structure to be determined),
  • info: information about the dataset and parameters,
  • html: HTML view of dataset information and a form for requesting data,
  • dds: Dataset Descriptor Structure (ASCII),
  • das: Dataset Attribute Structure (ASCII),
  • dods: dataset as defined by the Data Access Protocol (DAP), and
  • asc: dataset represented as ASCII.

4. API (SPASE)

  • non-SPASE-enabled: http://tsds.net/tsdsdev/cdaweb/AC_H1_MFI.asc?time,BX_GSE
  • SPASE-enabled (proposed, not implemented): http://tsds.net/tsdsdev/NumericalData/SPASE_ID.csv?time,BX_GSE
  • To implement this, we need a SPASE ID + ParameterKey and a mapping to the Product Name + Parameter Name that each data service uses internally (e.g., AC_H1_MFI/BX_GSE).
  • This is not possible with SPASE Numerical Data records, as implemented.
    • AC_H1_MFI is found in [2]. This Product Name is used to get a list of Parameters; e.g., BX_GSE is returned by [3].
    • It is not found, however, in the SPASE record, which has the identifier spase://VSPO/NumericalData/P_ACE_HDR_MAG_SWEPAM_4M_MGD.
  • I don't think this information should be hard-coded into SPASE Numerical Data records. I think that it should come from a service (see also presentations/2012-HPDE-Feb-SPASE).
  • Should this information be hard-coded in a SPASE record (it is now): "Data are presently ~5 months delayed"? What happens when ACE stops returning data? Will someone receive an email alert that says "Update SPASE record"?
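The mapping discussed above can be pictured as a lookup from (SPASE ID, ParameterKey) to (Product Name, Parameter Name). The Python sketch below uses a dictionary as a stand-in for whatever service would supply the mapping; the single entry pairs the identifiers mentioned in the bullets above for illustration only, and the ParameterKey value and the function name are hypothetical.

```python
# Hypothetical lookup: (SPASE ID, ParameterKey) -> (Product Name, Parameter Name).
# In practice this would be returned by a service rather than hard-coded.
SPASE_TO_SERVICE = {
    ("spase://VSPO/NumericalData/P_ACE_HDR_MAG_SWEPAM_4M_MGD", "BX_GSE"):
        ("AC_H1_MFI", "BX_GSE"),
}


def to_tsds_url(spase_id, parameter_key,
                base="http://tsds.net/tsdsdev/cdaweb", suffix="asc"):
    """Translate a SPASE-style request into the non-SPASE TSDS URL form."""
    product, parameter = SPASE_TO_SERVICE[(spase_id, parameter_key)]
    return f"{base}/{product}.{suffix}?time,{parameter}"


if __name__ == "__main__":
    print(to_tsds_url("spase://VSPO/NumericalData/P_ACE_HDR_MAG_SWEPAM_4M_MGD",
                      "BX_GSE"))
```

A request for an unmapped SPASE ID raises KeyError, which is the case that SPASE Numerical Data records, as implemented, cannot resolve.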

5. Connecting to a data service

  • Serving data through the TSDS API from a data service requires two key pieces of information and possibly some additional code.
  • This information is generated by TSDS developers based on the data service's API documentation.


  1. A catalog listing containing all information required to form a data request. At the very least, this is a list of parameter IDs and start dates for each data server. Ideally, additional information is given, including stop date, units, and a link to documentation.
    • In working on this, I realized that most of the SPASE Numerical Data records would require significant edits if they were to be used instead of the ad hoc approach taken (discussed later). The reason is discussed in the Use 4. section of presentations/2012-HPDE-Feb-SPASE.
  2. A template NcML file that is used by TSDS to form a data request and interpret the result.
  3. An IOSP (Input/Output Service Provider) - Usually Java code that maps the response from a service to the internal TSDS data structure.

6. Connecting to a data service - Catalog

1. A catalog listing containing all information required to form a data request. At the very least, this is a list of parameter IDs and start dates for each data server. Ideally, additional information is given, including stop date, units, and a link to documentation.

Example data requests:

  • CDAWeb (parameter ID = AC_H1_MFI Magnitude): [4]
  • SPIDR (parameter ID = index_ssn): [5]
  • SuperMAG (parameter ID = BOU): [6]
  • VSEO (parameter ID = DE::DE-1::HAPI::HAPI::D1HE): [7]

Example catalogs:


<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        name="SPIDR Data">
  <dataset name="spidr">
    <!-- The URL http://tsds.net/tss/spidr/ssn is the base URL
            for forming a data request -->
    <access serviceName="tss"
        urlPath="http://tsds.net/tss/spidr/ssn" />
    <!-- The URL http://tsds.net/meta/tsds?catalog=spidr&parameter=ssn
             returns NcML -->
    <access serviceName="ncml"
        urlPath="http://tsds.net/meta/tsds?catalog=spidr&amp;parameter=ssn"/>
    <documentation
       xlink:href="http://spidr.ngdc.noaa.gov/spidr/servlet/GetData?describe&amp;param=ssn"
       xlink:title="Metadata" />
    <timeCoverage>
      <Start>19320101</Start>
       <End>20120201</End>
    </timeCoverage>
   </dataset>
</catalog>
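A client could pull the request URL, the NcML URL, and the time coverage out of such a catalog with a few lines of Python. This is a sketch using the standard-library XML parser; the function name read_catalog is hypothetical, and the embedded catalog is a trimmed copy of the SPIDR example with the & in the NcML URL written as &amp;amp;, as XML requires.

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="SPIDR Data">
  <dataset name="spidr">
    <access serviceName="tss" urlPath="http://tsds.net/tss/spidr/ssn" />
    <access serviceName="ncml"
        urlPath="http://tsds.net/meta/tsds?catalog=spidr&amp;parameter=ssn" />
    <timeCoverage>
      <Start>19320101</Start>
      <End>20120201</End>
    </timeCoverage>
  </dataset>
</catalog>"""


def read_catalog(xml_text):
    """Return one dict per dataset with its access URLs and time coverage."""
    root = ET.fromstring(xml_text)
    entries = []
    for dataset in root.findall("t:dataset", NS):
        # Map each serviceName ("tss", "ncml") to its urlPath.
        access = {a.get("serviceName"): a.get("urlPath")
                  for a in dataset.findall("t:access", NS)}
        entries.append({
            "name": dataset.get("name"),
            "access": access,
            "start": dataset.findtext("t:timeCoverage/t:Start", namespaces=NS),
            "end": dataset.findtext("t:timeCoverage/t:End", namespaces=NS),
        })
    return entries


if __name__ == "__main__":
    for entry in read_catalog(CATALOG):
        print(entry["name"], entry["access"]["tss"], entry["start"], entry["end"])
```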

7. Connecting to a data service - NcML

2. A template NcML file that is used by TSDS to form a data request.

  • The template file given below is modified based on a request of the form

http://tsds.net/tsdsdev/spidr/DATA_SET_ID.asc?time,PARAMETER1_SHORT_NAME,PARAMETER2_SHORT_NAME&time>STARTDATE&time<STOPDATE

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="file:/dev/null" iosp="lasp.tss.iosp.ColumnarAsciiReader"
        commentCharacter="#"
        columns="1,2;3"
        url="http://spidr.ngdc.noaa.gov/spidr/servlet/GetData?
          format=csv&amp;dateFrom=STARTDATE&amp;dateTo=STOPDATE&amp;
          param=PARAMETER_SHORT_NAME">
 
   <attribute name="title" value="DATA_SET_LONG_TITLE" />
   <dimension name="time" isUnlimited="true" />
   <variable name="time" shape="time" type="String">
      <attribute name="units" value="yyyy-MM-dd HH:mm" />
   </variable>
   <variable name="PARAMETER1_SHORT_NAME" shape="time" type="double">
     <attribute name="long_name" value="PARAMETER1_LONG_NAME" />
     <attribute name="units" value="PARAMETER1_UNITS" />
     <attribute name="precision" value="PARAMETER1_PRECISION" />
     <attribute name="_FillValue" type="double" value="PARAMETER1_FILL" />
   </variable>
   <variable name="PARAMETER2_SHORT_NAME" shape="time" type="double">
     <attribute name="long_name" value="PARAMETER2_LONG_NAME" />
     <attribute name="units" value="PARAMETER2_UNITS" />
     <attribute name="precision" value="PARAMETER2_PRECISION" />
     <attribute name="_FillValue" type="double" value="PARAMETER2_FILL" />
   </variable>
</netcdf>
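The substitution step can be sketched in Python: parse the incoming TSDS request, then fill the STARTDATE, STOPDATE, and PARAMETER_SHORT_NAME placeholders of the service URL from the template. This is an illustration, not the TSDS implementation; the function names are hypothetical, issuing one service request per parameter is an assumption, and dates are passed through unchanged.

```python
from urllib.parse import unquote, urlsplit

# Service URL from the template above, written on one line.
SERVICE_URL = ("http://spidr.ngdc.noaa.gov/spidr/servlet/GetData?"
               "format=csv&dateFrom=STARTDATE&dateTo=STOPDATE&"
               "param=PARAMETER_SHORT_NAME")


def parse_request(url):
    """Extract dataset, suffix, parameters, and time range from a TSDS request URL."""
    parts = urlsplit(url)
    dataset, _, suffix = parts.path.rsplit("/", 1)[-1].rpartition(".")
    clauses = unquote(parts.query).split("&")
    request = {"dataset": dataset, "suffix": suffix,
               "parameters": [p for p in clauses[0].split(",") if p != "time"],
               "start": None, "stop": None}
    for clause in clauses[1:]:
        if clause.startswith("time>"):
            request["start"] = clause[len("time>"):].lstrip("=")
        elif clause.startswith("time<"):
            request["stop"] = clause[len("time<"):].lstrip("=")
    return request


def service_urls(request, template=SERVICE_URL):
    """Fill the template placeholders; both start and stop must be present."""
    return [template.replace("STARTDATE", request["start"])
                    .replace("STOPDATE", request["stop"])
                    .replace("PARAMETER_SHORT_NAME", name)
            for name in request["parameters"]]


if __name__ == "__main__":
    # The dates here are arbitrary example values.
    req = parse_request("http://tsds.net/tsdsdev/spidr/ssn.asc"
                        "?time,ssn&time>2000-01-01&time<2000-02-01")
    print(service_urls(req))
```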

8. Connecting to a data service - IOSP

3. An IOSP (Input/Output Service Provider) - Usually Java code that maps the response from a service to the internal TSDS data structure (CDM).

IOSPs exist for:

  • Columnar remote or local data files.
  • Data piped from the command line.
  • Data in a text file that is pre-processed by a regex.
  • Data from web services: CDAWeb, SPIDR, LISIRD, ViRBO, SuperMAG, VSEO (in development).
  • (Many of the IOSPs use Java code from Autoplot).
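To illustrate what a columnar IOSP does, the Python sketch below turns a columnar ASCII response into one list of values per variable. The actual IOSPs are Java classes; this is not a port of lasp.tss.iosp.ColumnarAsciiReader. The reading of columns="1,2;3" used here (semicolons separate variables, commas list the 1-based columns joined to form one variable) and the comma-or-whitespace field delimiter are assumptions.

```python
import re


def read_columnar(text, columns="1,2;3", comment="#"):
    """Map delimited text to one list of string values per variable."""
    # "1,2;3" -> [[0, 1], [2]]: first variable from columns 1 and 2, second from 3.
    groups = [[int(c) - 1 for c in group.split(",")]
              for group in columns.split(";")]
    variables = [[] for _ in groups]
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(comment):
            continue  # skip blank lines and comment lines
        fields = [f for f in re.split(r"[,\s]+", line) if f]
        for values, group in zip(variables, groups):
            # Columns belonging to the same variable are joined with a space.
            values.append(" ".join(fields[i] for i in group))
    return variables


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = "# time, value\n2000-01-01 00:00,5.0\n2000-01-02 00:00,6.5\n"
    time, value = read_columnar(sample)
    print(time, value)
```

Values are left as strings; converting them to the types declared in the NcML (String for time, double for the parameters) would be a separate step.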

9. Example use: Browser

View ASCII data from a web browser:

http://lasp.colorado.edu/lisird/tss/historical_tsi.csv

Return time and Irradiance in a time range:

http://lasp.colorado.edu/lisird/tss/historical_tsi.csv?time,Irradiance&time%3E2003-02-25&time%3C2009-03-27

Return time and Irradiance when Irradiance was greater than 1361.6 and less than 1361.8:

http://lasp.colorado.edu/lisird/tss/historical_tsi.csv?time,Irradiance&Irradiance%3E1361.6&Irradiance%3C1361.8

10. Example use: IDL

Import data into IDL. The following would be the response to a request with the suffix pro instead of csv, e.g., http://lasp.colorado.edu/lisird/tss/historical_tsi.pro (the following is the new style of output, which differs from what this link currently returns).

; Copy the following on to the IDL command line
oUrl = OBJ_NEW('IDLnetUrl')
fn   = oUrl->Get(filename='tss_reader__define.pro', $
  url='http://tsds.net/idl/tss_reader__define.pro')
tss  = OBJ_NEW('tss_reader',baseurl='http://lasp.colorado.edu/lisird/tss/')
data = tss->read_data(dataset='historical_tsi')
OBJ_DESTROY,tss
print, 'For more information,'
print, 'see http://lasp.colorado.edu/lisird/tss/historical_tsi.html'
plot,data[*].(0),data[*].(1), $
 yrange=[1360,1362],/xstyle,/ystyle, $
 xtitle='Year',ytitle='Irradiance (W/m^2)', $
 title='TSI Reconstruction from Wang, Lean, Sheeley (ApJ, 2005)'

11. Example use: Autoplot

http://autoplot.org/autoplot.jnlp?open=http://lasp.colorado.edu/lisird/tss/historical_tsi.csv

12. A look ahead

  • Test every parameter
  • Set up alert system for when data provider site goes down
  • Add links to metadata (which metadata?)
  • Continue work on aggregation