Data Fields
Meta-metadata Wrappers place extracted metadata in one of three types of fields. Scalars hold a single, scalar value. Composites hold a single value as defined by a metadata-wrapper. Collections hold either multiple scalars or multiple composites.
Beyond these basic fields, there are some basic metadata types you should be familiar with if you want to incorporate images, videos, and other web pages into your wrapper. See below for details.
In addition to the attributes listed below, each field can also have styles applied to it. For information on other operations see here.
Scalars often have the following basic attributes:
- name - A unique (within the object) name for the field.
- scalar_type - The type of data the field holds. This is usually "String", but other types such as "Int" and "ParsedURL" are also available.
- comment - A helpful comment.
<scalar name="title" scalar_type="String" comment="The document name">
<xpath>//div/h3/text()</xpath>
</scalar>
The available scalar types are:
- String
- Date
- Int
- Double
- Float
- ParsedURL
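As a rough illustration of the non-String types, the sketch below declares one Int field and one ParsedURL field. The field names, comments, and xpaths are hypothetical and not taken from any real wrapper. Only the name, scalar_type, and comment attributes described above are used.

```xml
<!-- Sketch only: field names and xpaths are hypothetical. -->

<!-- An integer value, such as a page count shown on the page. -->
<scalar name="page_count" scalar_type="Int" comment="Number of pages in the document">
  <xpath>//span[@class='pages']/text()</xpath>
</scalar>

<!-- A URL value, taken from a link's href attribute. -->
<scalar name="homepage" scalar_type="ParsedURL" comment="Link to the author's homepage">
  <xpath>//a[@class='homepage']/@href</xpath>
</scalar>
```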
Scalars also accept the following less common attributes:
- field_parser_key - See Wrapper operations for details.
- tag - Can be set to any string value. TBD.
- context_node - Can be set to any variable defined in a wrapper. This lets you write an xpath relative to the context_node. If you have many scalars that are extracted from nearby nodes, this can save a lot of computation.
<def_var name="rating_summary" type="node">
<xpath>//div[@class='CustomerRatings']</xpath>
</def_var>
<scalar name="overall_rating" context_node="rating_summary">
<xpath>.//div[@class='BVRROverallRatingContainer']//img/@alt</xpath>
</scalar>
<scalar name="num_reviews" context_node="rating_summary">
<xpath>.//div[@class='BVRRRatingSummaryLinks']//span[@class='BVRRNumber']</xpath>
</scalar>
<scalar name="reviews_location" context_node="rating_summary">
<xpath>.//span[@class='BVRRRatingSummaryLinkReadWithCountID']/a/@href</xpath>
</scalar>
- navigate_to - TBD; likely a typo.
- schema_org_itemprop - Associates a metadata value with a field from schema.org.
- hint - tbd
- as_composite_scalar - tbd
- ignore_in_term_vector - tbd
- extract_as_html - Instructs the service to take the HTML from a node instead of its text.
<scalar name="description" hide_label="true" extract_as_html="true">
<xpath>./p[@class='js-tweet-text tweet-text']</xpath>
<xpath>.//p[@class='ProfileTweet-text js-tweet-text u-dir']</xpath>
</scalar>
Most metadata wrappers extend the type document in some way. The following three fields are used in most wrappers:
- title - The name of your document.
- description - A synopsis, overview, etc. of the current document.
- location - The document's URL.
Composites must be of an already-existing type. Any time a composite includes a scalar field named 'location', it will only extract 'title' and 'location' from the original document. In MICE, a composite with a location can easily be expanded to show metadata from the page it links to.
Composites typically have the following attributes:
- name - A unique (to that object) name.
- type - The type of the field, defined by a wrapper.
- comment - A helpful comment. This attribute is not required.
The xpath for a composite's field can be relative to the xpath for the whole composite, as in the following example.
<composite name="related_page" type="compound_document" comment="A related web page">
<xpath>//div[@id='related']</xpath>
<scalar name="title">
<xpath>./text()</xpath>
</scalar>
<scalar name="location">
<xpath>./a/@href</xpath>
</scalar>
</composite>
Some commonly used composite types are:
- compound_document - This type is commonly used as a base type for other metadata types. If you want to link to a webpage and aren't sure what type it is, use compound_document.
- image - a .jpeg, .gif, .png, or .bmp image. Make sure to include a location scalar so that the service knows where to find the source image.
- video - a .mp4, .ogg, .flv, .m4v, .mov, or .wmv video. Make sure to include a location scalar so that the service knows where to find the source video.
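To make the advice about the location scalar concrete, here is a minimal sketch of an image composite. It follows the same shape as the related_page example above. The field name, comment, and xpaths are hypothetical.

```xml
<!-- Sketch only: the field name and xpaths are hypothetical. -->
<composite name="cover_image" type="image" comment="The cover image shown on the page">
  <xpath>//div[@id='cover']</xpath>
  <!-- The location scalar tells the service where to find the source image. -->
  <scalar name="location">
    <xpath>./img/@src</xpath>
  </scalar>
</composite>
```

A video composite would presumably look the same with type="video" and an xpath that selects the video's source URL.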
Collections are groupings of multiple elements, either scalars or composites. They typically have the following attributes, and they must have either a child_type or a child_scalar_type.
- name - A unique (within the object) name for the field.
- child_type - The type of the composites contained within the collection.
- child_scalar_type - The type of the scalars contained within the collection.
- comment - A helpful comment. This attribute is not required.
The xpath for a collection's field can be relative to the xpath for the whole collection, as in the following example.
<collection name="related_pages" child_type="compound_document" comment="A related web page">
<scalar name="title"/>
<scalar name="location"/>
</collection>
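The sketch below shows the two kinds of collection side by side: one built from scalars with child_scalar_type, and one built from composites with child_type and relative xpaths. It assumes that a collection takes an xpath child in the same way the composite example does. All field names and xpaths here are hypothetical, so check them against a working wrapper before relying on them.

```xml
<!-- Sketch only: field names and xpaths are hypothetical. -->

<!-- A collection of plain scalars, declared with child_scalar_type. -->
<collection name="keywords" child_scalar_type="String" comment="Keywords listed on the page">
  <xpath>//ul[@id='keywords']/li</xpath>
</collection>

<!-- A collection of composites, declared with child_type.
     The scalar xpaths are written relative to each node matched by the collection's xpath. -->
<collection name="related_pages" child_type="compound_document" comment="Related web pages">
  <xpath>//div[@id='related']/ul/li</xpath>
  <scalar name="title">
    <xpath>./a/text()</xpath>
  </scalar>
  <scalar name="location">
    <xpath>./a/@href</xpath>
  </scalar>
</collection>
```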