package thingfulx

import (
	"context"
	"time"
)

// IndexerBuilder is the type definition for the constructor function that
// each new Indexer must provide.
type IndexerBuilder func() (Indexer, error)

// URLData is the struct used to send data back from the URLS method. Data is
// sent back asynchronously via a channel, so the data we are actually
// interested in (i.e. the slice of strings) is packaged together with any
// error that might occur.
type URLData struct {
	// URLS is a slice of URL strings representing a chunk of candidate resource
	// URLs for the target infrastructure.
	URLS []string

	// Err allows the process to signal an error back to the caller. It is
	// packaged with the data so that an error can be signalled at any point.
	Err error
}

// Indexer is the main interface for things that know how to index and fetch
// resources from external data infrastructures. Implementations are
// responsible for handling any authentication requirements, parsing the
// returned data, and returning normalized representations of IoT things to the
// caller. The interface attempts to be protocol agnostic, so an implementing
// type is free to perform whatever steps are required to get the data.
type Indexer interface {
	// Provider returns data for the current provider.
	Provider() *Provider

	// URLS returns the smallest set of URLs required to completely index the
	// upstream data provider. The returned values might be a single URL for
	// hosts that publish relatively few things, all available via that one URL,
	// or hundreds of thousands of values for hosts that publish data entirely on
	// individual URLs.
	//
	// This function takes as parameters an instance of the Context interface for
	// request-scoped values and cancellation, a Client implementation which the
	// Indexer must use to make any outgoing requests, and a delay Duration which
	// defines the minimum interval between consecutive requests to the
	// infrastructure. In addition, the caller must pass in a channel on which the
	// function can return data to the caller. This allows the indexer
	// implementation to return chunks of data as it goes, in order to support
	// infrastructures with hundreds of thousands of unique URLs to return.
	URLS(ctx context.Context, client Client, delay time.Duration, out chan<- URLData)

	// Fetch knows how to obtain raw data for an infrastructure, and is therefore
	// responsible for negotiating any authentication or protocol quirks involved
	// in fetching data. It takes as parameters an instance of the Context
	// interface for request-scoped values and cancellation, the URL we want to
	// get data from, a Client object that the indexer must use to actually make
	// the request, and a Clock to allow testing of time-related functions. It
	// returns the raw data from the infrastructure as a slice of bytes for
	// further parsing.
	Fetch(ctx context.Context, urlStr string, client Client, clock Clock) ([]byte, error)

	// Parse returns a slice of Channel objects extracted from the given data.
	// It takes as parameters a slice of bytes representing the data collected
	// from the upstream data provider, the URL of the Channel being indexed, and
	// a Clock instance used to record the indexing time. The Clock is passed in
	// to allow for easier testing. Parsing is separated from fetching because
	// some systems provide data to us without it having to be fetched from a
	// remote HTTP source.
	Parse(rawData []byte, urlStr string, clock Clock) ([]*Channel, error)
}