#LyX 2.2 created this file. For more info see http://www.lyx.org/
\lyxformat 508
\begin_document
\begin_header
\save_transient_properties true
\origin unavailable
\textclass article
\begin_preamble
\usepackage{url} 
\end_preamble
\use_default_options false
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding utf8
\fontencoding global
\font_roman "times" "default"
\font_sans "helvet" "default"
\font_typewriter "courier" "default"
\font_math "auto" "auto"
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100 100
\font_tt_scale 100 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref true
\pdf_bookmarks true
\pdf_bookmarksnumbered false
\pdf_bookmarksopen false
\pdf_bookmarksopenlevel 1
\pdf_breaklinks true
\pdf_pdfborder true
\pdf_colorlinks true
\pdf_backref false
\pdf_pdfusetitle true
\papersize default
\use_geometry false
\use_package amsmath 2
\use_package amssymb 2
\use_package cancel 1
\use_package esint 0
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 0
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 0
\index Index
\shortcut idx
\color #008000
\end_index
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header

\begin_body

\begin_layout Title
Stitching Together Vector Spaces
\end_layout

\begin_layout Author
Linas Vepštas
\end_layout

\begin_layout Date
Draft of 17 June 2018 - First Draft
\end_layout

\begin_layout Abstract
Applying machine learning to linguistics to extract both syntax and semantics,
 ideally via unsupervised algorithms, is the goal of an increasingly popular
 quest.
 At this time, vector-space based approaches seem to be the best suited
 for this, and are thus increasingly commonplace.
 Exemplars include systems such as word2vec and GloVe.
 These systems fall short, however, as they fail to expose the syntactic
 structure of natural language.
 The reason for this, and how to move beyond this state of affairs, is discussed
 in this text.
\end_layout

\begin_layout Abstract
The key observation made here is that there is more than just one vector
 space onto which words can be mapped.
 The different vector spaces are necessarily related to one another through
 syntactic relations.
 When a set of words is clustered into a grammatical category, this clustering
 must be made in a consistent fashion, for each of the vector spaces.
 The vector spaces must be 
\begin_inset Quotes eld
\end_inset

stitched together
\begin_inset Quotes erd
\end_inset

 in a consistent fashion; this consistency condition is exactly the syntax
 of the grammar.
 The stitching obeys a certain set of axioms, which are the sheaf axioms.
 The best possible stitching together can be found by minimizing an overall
 cost function.
\end_layout

\begin_layout Section*
Introduction
\end_layout

\begin_layout Standard
Not written.
 See next section.
\end_layout

\begin_layout Standard
Also, the last half of this text is not written.
\end_layout

\begin_layout Section*
Meaning
\end_layout

\begin_layout Standard
So here's one approach to meaning.
 It is already clear that disjuncts are correlated with meaning, so one
 provisional approach might be to assign each disjunct a unique meaning.
 Alternatively, this can be used as a doorway to the intensional meaning of
 a word.
 
\end_layout

\begin_layout Standard
Consider the phrases 
\begin_inset Quotes eld
\end_inset

the big balloon
\begin_inset Quotes erd
\end_inset

, 
\begin_inset Quotes eld
\end_inset

the red balloon
\begin_inset Quotes erd
\end_inset

, 
\begin_inset Quotes eld
\end_inset

the small balloon
\begin_inset Quotes erd
\end_inset

...
 The pseudo-disjuncts on balloon in these three cases would be 
\begin_inset Quotes eld
\end_inset

the- big-
\begin_inset Quotes erd
\end_inset

 
\begin_inset Quotes eld
\end_inset

the- red-
\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset

the- small-
\begin_inset Quotes erd
\end_inset

 (plus an additional connector to the verb).
 Examining this connector-by-connector, we expect the mutual information
 (MI) for the word-pair (the, balloon) to be small, and the MI for the word-pairs
 (big, balloon), (red, balloon) and (small, balloon) to be large(r).
 It's thus tempting to identify the set {big, red, small} as the set of
 intensional attributes associated with 
\begin_inset Quotes eld
\end_inset

balloon
\begin_inset Quotes erd
\end_inset

.
 The strength of the MI values to each of the connectors might be taken
 as a judgement of how much that attribute is prototypical of the object
 (see other section on 
\begin_inset Quotes eld
\end_inset

prototype theory
\begin_inset Quotes erd
\end_inset

).
\end_layout

\begin_layout Standard
The disjuncts associated with 
\begin_inset Quotes eld
\end_inset

balloon
\begin_inset Quotes erd
\end_inset

 will also connect to a verb.
 These verb connectors may be taken as another set of intensional attributes,
 for example {floats, drifts, rose, popped}.
 It should be possible to distinguish these as an orthogonal set of attributes,
 in that one might observe 
\begin_inset Quotes eld
\end_inset

the- red- floats+
\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset

the- red- drifts+
\begin_inset Quotes erd
\end_inset

 but never observe 
\begin_inset Quotes eld
\end_inset

floats- drifts+
\begin_inset Quotes erd
\end_inset

.
\end_layout

\begin_layout Standard
Meaning bibliography: 
\end_layout

\begin_layout Itemize
\begin_inset Quotes eld
\end_inset

The Molecular Level of Lexical Semantics
\begin_inset Quotes erd
\end_inset

, EA Nida, (1997) International Journal of Lexicography, 10(4): 265–274.
 https://www.academia.edu/36534355/The_Molecular_Level_of_Lexical_Semantics_by_EA_
Nida
\end_layout

\begin_layout Section*
Vector Algebra
\end_layout

\begin_layout Standard
Here's the deal ...
 vectors.
\end_layout

\begin_layout Subsection*
Word vectors
\end_layout

\begin_layout Standard
The last section introduces the idea of decomposable basis vectors.
 Let's try to make this more precise (a cleaned-up version of this and the
 above belongs in the unfinished sheaf paper).
 First, review the notion of a vector, and then show how to make vectors
 from word-disjuncts, N-grams, and skip-grams.
 Then review the idea of tensoring.
\end_layout

\begin_layout Standard
So, the textbook standard definition of a vector 
\begin_inset Formula $\vec{v}$
\end_inset

 is 
\begin_inset Formula 
\[
\vec{v}=a_{1}\widehat{e}_{1}+a_{2}\widehat{e}_{2}+\cdots+a_{n}\widehat{e}_{n}
\]

\end_inset

The 
\begin_inset Formula $a_{k}$
\end_inset

 are conventionally taken to be numbers; here, real numbers.
 The 
\begin_inset Formula $\widehat{e}_{k}$
\end_inset

 are called 
\begin_inset Quotes eld
\end_inset

basis vectors
\begin_inset Quotes erd
\end_inset

.
 They have the property of having unit length; that is 
\begin_inset Formula $\left\Vert \widehat{e}_{k}\right\Vert =1$
\end_inset

.
 For every word 
\begin_inset Formula $w$
\end_inset

, one can create several kinds of vectors.
 A widely-known vector is the 
\begin_inset Quotes eld
\end_inset

N-gram
\begin_inset Quotes erd
\end_inset

, where each 
\begin_inset Formula $\widehat{e}_{k}$
\end_inset

 corresponds to the N-word context in which the word 
\begin_inset Formula $w$
\end_inset

 was observed, and 
\begin_inset Formula $a_{k}$
\end_inset

 is the count of the number of observations.
 Thus, for example, given the toy corpus: "
\emph on
John knew the bird was there.
 John heard the bird.
 John saw the bird.
 Susan saw the bird too.
 Mary saw it also
\emph default
".
 Setting 
\begin_inset Formula $w=bird$
\end_inset

 and N=3, and the requirement that the word be in the middle of the N-gram,
 one has
\begin_inset Formula 
\begin{align*}
\widehat{e}_{1} & =\left[the\;*\;was\right]\\
\widehat{e}_{2} & =\left[the\;*\;.\right]\\
\widehat{e}_{3} & =\left[the\;*\;too\right]
\end{align*}

\end_inset

and observation counts 
\begin_inset Formula $a_{1}=1$
\end_inset

, 
\begin_inset Formula $a_{2}=2$
\end_inset

, 
\begin_inset Formula $a_{3}=1$
\end_inset

.
 Here, 
\begin_inset Formula $a_{2}=2$
\end_inset

 because there are two sentences containing the word-sequence 
\begin_inset Quotes eld
\end_inset

the bird.
\begin_inset Quotes erd
\end_inset

 In more compact form, one may write 
\begin_inset Formula 
\[
\overrightarrow{bird}=1\left[the\;*\;was\right]+2\left[the\;*\;.\right]+1\left[the\;*\;too\right]
\]

\end_inset

This can be extended in an obvious way to a larger corpus and to a larger
 N, and one can drop the requirement that the word be in the middle.
 Skip-grams are similar, but allow some words to be skipped, and have a
 complex algorithm to determine when and how words can be skipped.
\end_layout

\begin_layout Standard
Disjuncts behave in a very similar manner.
 The word-pair counting and the MST pipeline create disjuncts and their
 associated counts.
 For the above corpus, the disjuncts might possibly resemble these:
\begin_inset Formula 
\begin{align*}
\widehat{e}_{1} & =the\negthinspace-\;\&\;was\negthinspace+\\
\widehat{e}_{2} & =the\negthinspace-\;\&\;.\negthinspace+\\
\widehat{e}_{3} & =the\negthinspace-\;\&\;too\negthinspace+
\end{align*}

\end_inset

These disjuncts are essentially identical to the N-gram example above; only
 the notation is different.
 One might hope that the disjuncts coming out from the MST pipeline might
 actually be of higher quality.
 The support for this hope is several decades of research results on MST
 parsing of natural language.
 The point here is that N-grams, skip-grams and word-disjunct vectors are
 quite similar to one another, and that, perhaps, word-disjunct vectors
 are just a tiny bit more linguistically accurate.
\end_layout

\begin_layout Standard
What does one do with vectors? Well, one can apply any vector-based algorithm
 one can wrap one's mind around.
 Some classics are classification and clustering: SVM or K-means.
 Here, one commonly starts with a vector dot-product, leading to a cosine-distance
 metric.
 Given two vectors 
\begin_inset Formula $\vec{a}=a_{1}\widehat{e}_{1}+a_{2}\widehat{e}_{2}+\cdots+a_{n}\widehat{e}_{n}$
\end_inset

 and 
\begin_inset Formula $\vec{b}=b_{1}\widehat{e}_{1}+b_{2}\widehat{e}_{2}+\cdots+b_{n}\widehat{e}_{n}$
\end_inset

, the dot-product is given by 
\begin_inset Formula 
\[
\vec{a}\cdot\vec{b}=a_{1}b_{1}+a_{2}b_{2}+\cdots+a_{n}b_{n}
\]

\end_inset

The cosine of the angle between these is given by
\begin_inset Formula 
\[
\left\Vert \vec{a}\right\Vert \left\Vert \vec{b}\right\Vert \cos\theta=\vec{a}\cdot\vec{b}
\]

\end_inset

where, as usual, the 
\begin_inset Formula $l_{2}$
\end_inset

 Hilbert-space norm is used: 
\begin_inset Formula $\left\Vert \vec{a}\right\Vert =\sqrt{\vec{a}\cdot\vec{a}}$
\end_inset

.
 The closer that 
\begin_inset Formula $\theta$
\end_inset

 is to zero, the more parallel the two vectors are.
 There are plenty of other interesting ways of measuring the similarity
 of vectors.
 The point here is that if the corpus included the sentences 
\begin_inset Quotes eld
\end_inset


\emph on
John heard the crow.
 John saw the crow.
\emph default

\begin_inset Quotes erd
\end_inset

 and one obtained a vector for 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

, one might typically discover that 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

 is similar to 
\begin_inset Formula $\overrightarrow{bird}$
\end_inset

: the cosine of the angle gives a concrete technique for measuring similarity.
\end_layout
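
\begin_layout Standard
As a minimal worked check of the claim above, take the toy 
\begin_inset Formula $\overrightarrow{bird}$
\end_inset

 vector written out earlier, and assume that the two crow sentences contribute
 only the context 
\begin_inset Formula $\left[the\;*\;.\right]$
\end_inset

, so that 
\begin_inset Formula $\overrightarrow{crow}=2\left[the\;*\;.\right]$
\end_inset

.
 Under that assumption, 
\begin_inset Formula 
\[
\cos\theta=\frac{\overrightarrow{bird}\cdot\overrightarrow{crow}}{\left\Vert \overrightarrow{bird}\right\Vert \left\Vert \overrightarrow{crow}\right\Vert }=\frac{1\cdot0+2\cdot2+1\cdot0}{\sqrt{1^{2}+2^{2}+1^{2}}\,\sqrt{2^{2}}}=\frac{4}{2\sqrt{6}}\approx0.82
\]

\end_inset

so that 
\begin_inset Formula $\theta\approx35^{\circ}$
\end_inset

.
 Whether a value of this size counts as 
\begin_inset Quotes eld
\end_inset

similar enough
\begin_inset Quotes erd
\end_inset

 depends on a threshold that is not fixed here.
\end_layout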

\begin_layout Standard
Suppose one has a large number of vectors for a large number of words.
 Similarity metrics that assign a number to the informal idea of similarity
 then open the door to automatically classifying words into bins, according
 to their similarity.
 A common favorite is the K-means clustering algorithm, which can take a
 large number of vectors, and assign each to one of K different bins, with
 all words in the same bin being quite 'similar'.
 One can then make general hand-waving arguments that these clusters correspond
 to 'grammatical classes'.
 The actual data, the actual results make this obvious: a quick skim of
 the clusters makes it clear that some clusters are dominated by nouns, and
 others by verbs.
 
\end_layout

\begin_layout Standard
There are other approaches, too.
 Popular ones are the various deep-learning approaches built on neural nets.
 Word2Vec is a particularly famous one; it's now a TensorFlow tutorial.
 The algorithmic details are interesting, and sophisticated.
 However, conceptually, it's not really all that different from what is described
 above: one obtains a collection of vectors, and then one pumps the vectors
 through a neural-net classifier, to obtain word-classes.
\end_layout

\begin_layout Standard
The entire point of this section is to drive home the idea that, whatever
 one can do with N-grams or with skip-grams, one can also do with word-disjunct
 vectors.
 At the appropriate conceptual level, they are really very, very similar
 to one another.
\end_layout

\begin_layout Standard
As a minor side-point, one can hold out some hope that perhaps word-disjunct
 vectors are of (slightly) higher quality than skip-grams.
 It seems like they could be or should be, as reinforced by decades of MST-lingu
istics results.
 If one takes some stock, off-the-shelf linguistics-K-means algorithm, or
 some stock, off-the-shelf Word2vec algorithm, and instead plugs in word-disjunc
t vectors wherever the skip-grams or N-grams might go, then what happens?
 Maybe, possibly, one might get results that are an itsy-bitsy teensy-weensy
 bit better.
 Maybe.
 This is not known: there are no published results comparing these.
 It is reasonable (to me) to expect that the results would be quite similar,
 with perhaps word-disjunct vectors being just a smidgen higher-quality.
\end_layout

\begin_layout Standard
Due diligence suggests that such head-to-head comparisons should be performed
 and published.
 
\end_layout

\begin_layout Standard
A later section describes how one can do much better than naive clustering,
 whether Word2vec or K-means.
 But first, some additional basic issues need to be reviewed.
\end_layout

\begin_layout Subsection*
Merging vectors
\end_layout

\begin_layout Standard
In the above example, it seems obvious that 
\begin_inset Quotes eld
\end_inset


\emph on
bird
\emph default

\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset


\emph on
crow
\emph default

\begin_inset Quotes erd
\end_inset

 should be merged: they are grammatically similar, given their vectors.
 So let's merge them into a class of 
\begin_inset Quotes eld
\end_inset


\emph on
THINGS
\emph default

\begin_inset Quotes erd
\end_inset

.
 How should that class be created, formally? This is a non-trivial question,
 since additional merges affect the class.
 Suppose, for example, the corpus contained the sentence 
\begin_inset Quotes eld
\end_inset


\emph on
John saw the book
\emph default

\begin_inset Quotes erd
\end_inset

.
 Is it plausible to merge 
\begin_inset Quotes eld
\end_inset

book
\begin_inset Quotes erd
\end_inset

 into 
\begin_inset Quotes eld
\end_inset

THINGS
\begin_inset Quotes erd
\end_inset

? If so, then how? One way is to compute the cosine similarity 
\begin_inset Formula $\cos\left(\overrightarrow{book},\overrightarrow{bird}\right)$
\end_inset

 and 
\begin_inset Formula $\cos\left(\overrightarrow{book},\overrightarrow{crow}\right)$
\end_inset

 and merge only if both are large enough.
 The other is to compute 
\begin_inset Formula $\cos\left(\overrightarrow{book},\overrightarrow{THINGS}\right)$
\end_inset

 but this requires the vector 
\begin_inset Formula $\overrightarrow{THINGS}$
\end_inset

.
 
\end_layout

\begin_layout Standard
There are several ways to create this vector.
 One way is by explicit vector addition 
\begin_inset Formula 
\[
\overrightarrow{THINGS}_{sum}=\overrightarrow{bird}+\overrightarrow{crow}
\]

\end_inset

This seems to be an adequate way of creating this grammatical class.
 Recall that
\begin_inset Formula 
\[
\overrightarrow{bird}=1\left|the\negthinspace-\;\&\;was\negthinspace+\right\rangle +2\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle +1\left|the\negthinspace-\;\&\;too\negthinspace+\right\rangle 
\]

\end_inset

where we switched to disjunct notation.
 The vector for 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

 is
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\overrightarrow{crow}=2\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle 
\]

\end_inset

Comparing these two, directly like this, suggests that other merge strategies
 are possible.
 For example, since 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

 has neither 
\begin_inset Formula $\left|the-\&was+\right\rangle $
\end_inset

 nor 
\begin_inset Formula $\left|the-\&too+\right\rangle $
\end_inset

 on it, maybe these don't belong on the merged class: perhaps an intersection
 of the basis vectors should be used, so that the combined vector is non-zero
 only on the intersection of the basis elements.
 Thus, perhaps a better definition would be
\begin_inset Formula 
\[
\overrightarrow{THINGS}_{intersect}=4\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle 
\]

\end_inset

This narrow definition is nice, because we have that 
\begin_inset Formula 
\[
\overrightarrow{book}=1\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle 
\]

\end_inset

and so clearly the cosine between 
\begin_inset Formula $\overrightarrow{THINGS}_{intersect}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{book}$
\end_inset

 is one, and they're mergeable.
 By contrast, the cosine between 
\begin_inset Formula $\overrightarrow{THINGS}_{sum}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{book}$
\end_inset

 is less than one, which also makes sense, because books are not very
 bird-like.
\end_layout
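
\begin_layout Standard
The two cosines just mentioned can be worked out explicitly from the toy
 vectors above.
 This is only a sketch for this tiny corpus, with the basis elements taken
 in the order 
\begin_inset Formula $was$
\end_inset

, 
\begin_inset Formula $.$
\end_inset

, 
\begin_inset Formula $too$
\end_inset

:
\begin_inset Formula 
\begin{align*}
\cos\left(\overrightarrow{THINGS}_{intersect},\overrightarrow{book}\right) & =\frac{4\cdot1}{\sqrt{4^{2}}\,\sqrt{1^{2}}}=1\\
\cos\left(\overrightarrow{THINGS}_{sum},\overrightarrow{book}\right) & =\frac{1\cdot0+4\cdot1+1\cdot0}{\sqrt{1^{2}+4^{2}+1^{2}}\,\sqrt{1^{2}}}=\frac{4}{\sqrt{18}}\approx0.94
\end{align*}

\end_inset

In so small a corpus the gap between the two is modest; it widens as the
 summed vector picks up more weight on disjuncts that 
\begin_inset Formula $\overrightarrow{book}$
\end_inset

 does not share.
\end_layout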

\begin_layout Standard
Another possibility is to combine some fraction 
\begin_inset Formula $0\le\alpha\le1$
\end_inset

 of the basis elements that don't overlap.
 This suggests a grammatical class of 
\begin_inset Formula 
\[
\overrightarrow{THINGS}_{\alpha}=\alpha\left|the\negthinspace-\;\&\;was\negthinspace+\right\rangle +4\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle +\alpha\left|the\negthinspace-\;\&\;too\negthinspace+\right\rangle 
\]

\end_inset

so that 
\begin_inset Formula 
\[
\overrightarrow{THINGS}_{\alpha=0}=\overrightarrow{THINGS}_{intersect}
\]

\end_inset

while
\begin_inset Formula 
\[
\overrightarrow{THINGS}_{\alpha=1}=\overrightarrow{THINGS}_{sum}
\]

\end_inset

That is, for 
\begin_inset Formula $\alpha=1$
\end_inset

, 
\begin_inset Formula $\overrightarrow{THINGS}_{\alpha}$
\end_inset

 is rather bird-like, whereas for 
\begin_inset Formula $\alpha=0$
\end_inset

, it's more generic.
 
\end_layout
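
\begin_layout Standard
To see how 
\begin_inset Formula $\alpha$
\end_inset

 interpolates between the two extremes, one can compute, as a rough illustration
 on the same toy vectors, the cosine against 
\begin_inset Formula $\overrightarrow{book}$
\end_inset

 as a function of 
\begin_inset Formula $\alpha$
\end_inset

:
\begin_inset Formula 
\[
\cos\left(\overrightarrow{THINGS}_{\alpha},\overrightarrow{book}\right)=\frac{4}{\sqrt{\alpha^{2}+4^{2}+\alpha^{2}}}=\frac{4}{\sqrt{16+2\alpha^{2}}}
\]

\end_inset

This equals 1 at 
\begin_inset Formula $\alpha=0$
\end_inset

, is roughly 0.98 at 
\begin_inset Formula $\alpha=1/2$
\end_inset

, and falls to 
\begin_inset Formula $4/\sqrt{18}\approx0.94$
\end_inset

 at 
\begin_inset Formula $\alpha=1$
\end_inset

, decreasing steadily in between.
\end_layout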

\begin_layout Standard
Note that the merge producing 
\begin_inset Formula $\overrightarrow{THINGS}_{\alpha}$
\end_inset

 is not a linear function of 
\begin_inset Formula $\overrightarrow{bird}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

: the weight given to each component depends on whether that basis element
 lies in the common support.
 The merge is non-linear, and, depending on the actual corpus, could be highly
 non-linear.
 
\end_layout

\begin_layout Standard
The choice of merge strategy alters the contents of the grammatical clusters.
 This is not a surprise; the goal here is to illustrate that a number of
 different merge strategies are possible.
 It is not 
\emph on
a priori
\emph default
 obvious which is the best.
\end_layout

\begin_layout Subsection*
Merge Strategies and Semantics
\end_layout

\begin_layout Standard
The different strategies for merging vectors lead in a natural way to the
 automated discovery of word-meaning.
 This can be illustrated by example; although a different example from the
 above is needed.
\end_layout

\begin_layout Standard
Consider the word 
\begin_inset Quotes eld
\end_inset


\emph on
saw
\emph default

\begin_inset Quotes erd
\end_inset

.
 After observing a sufficient amount of text, one will find a vector 
\begin_inset Formula $\overrightarrow{saw}$
\end_inset

 that contains disjuncts appropriate for usages as a "cutting tool",
 "the verb cut", and "the past tense of to see".
 Suppose that there is an existing class 
\begin_inset Formula $\overrightarrow{TOOLS}$
\end_inset

, consisting of nouns, and that the similarity measure judges that 
\begin_inset Formula $\overrightarrow{saw}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{TOOLS}$
\end_inset

 are similar.
 Clearly, a merge strategy of expanding an existing cluster by linear addition
 is incorrect.
 The expanded class
\begin_inset Formula 
\[
\overrightarrow{TOOLS}_{linear-expand}=\overrightarrow{TOOLS}+\overrightarrow{saw}
\]

\end_inset

is incorrect, or at least, not very good, as it now contains disjuncts for
 "the verb cut" and "the past tense of to see".
 Two bad things happen: (a) the noun cluster is polluted with verb-vector
 components, and (b) the vector has not been factorized, and so 
\begin_inset Quotes eld
\end_inset


\emph on
saw
\emph default

\begin_inset Quotes erd
\end_inset

 cannot also be placed into other clusters as well.
 One loses the ability to distinguish different semantic meanings based
 on grammatical usage.
\end_layout

\begin_layout Standard
We know that, 
\emph on
a priori
\emph default
, the correct way of thinking about 
\begin_inset Formula $\overrightarrow{saw}$
\end_inset

 is that it has the components
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\overrightarrow{saw}=\vec{v}_{tool}+\vec{v}_{past-tense}+\vec{v}_{cutting}
\]

\end_inset

However, we do not know (
\emph on
a priori
\emph default
) what the components 
\begin_inset Formula $\vec{v}_{tool}$
\end_inset

,
\begin_inset Formula $\vec{v}_{past-tense}$
\end_inset

 and 
\begin_inset Formula $\vec{v}_{cutting}$
\end_inset

 are.
 A good merge strategy would be able to factorize them out, to discover
 these automatically.
 
\end_layout

\begin_layout Standard
The best merge strategy is hardly obvious.
 The three vectors 
\begin_inset Formula $\vec{v}_{tool}$
\end_inset

,
\begin_inset Formula $\vec{v}_{past-tense}$
\end_inset

 and 
\begin_inset Formula $\vec{v}_{cutting}$
\end_inset

 are not orthogonal (or rather, in general, won't be), so even if there
 were pre-existing grammatical classes for these three, an orthogonal decomposition
 of 
\begin_inset Formula $\overrightarrow{saw}$
\end_inset

 into components is not unique.
 What's more, in early stages, a class for 
\begin_inset Formula $\overrightarrow{TOOLS}$
\end_inset

 might exist, but not yet one for 
\begin_inset Formula $\overrightarrow{CUTTING}$
\end_inset

; it's not even clear that a class of the form 
\begin_inset Formula $\overrightarrow{CUTTING}$
\end_inset

 might even be formable.
\end_layout

\begin_layout Standard
Several non-linear merge strategies are possible.
 One is given in the next section.
 However, the point here is not to propose the best-possible merge strategy,
 but rather to point out that this is where word-sense disambiguation comes
 from, and that this is how it can be done.
 Different word-senses are already encoded in how words are used in sentences.
 Word-senses can be distinguished by paying attention to the disjuncts attached
 to a word.
 Picking the correct merge strategy picks out the word-senses.
 Picking the wrong strategy blurs them all together.
\end_layout

\begin_layout Subsection*
An example non-linear merge strategy
\end_layout

\begin_layout Standard
The earlier section illustrated a non-linear merge, using 
\begin_inset Formula $\overrightarrow{THINGS}_{\alpha}$
\end_inset

 as an example.
 This section formalizes that example.
 The result is a maybe-OK merge strategy, but not an obviously great one.
 It's worth writing down only because it's simple.
\end_layout

\begin_layout Standard
The idea starts with the computation of the intersection and the union of
 the support for the vectors to be merged, and then taking a fraction 
\begin_inset Formula $0\le\alpha\le1$
\end_inset

 on the union of the support when merging.
 This merge strategy seems to at least partly overcome the problem of the
 erasure of word-senses noted above.
\end_layout

\begin_layout Standard
Suppose 
\begin_inset Formula $\overrightarrow{CLASS}$
\end_inset

 is some existing word-class, and 
\begin_inset Formula $\overrightarrow{word}$
\end_inset

 is a word that was judged 
\begin_inset Quotes eld
\end_inset

mergeable
\begin_inset Quotes erd
\end_inset

 into 
\begin_inset Formula $\overrightarrow{CLASS}$
\end_inset

.
 Define the support of a vector 
\begin_inset Formula $\vec{v}$
\end_inset

 as
\begin_inset Formula 
\[
\mathcal{I}_{\vec{v}}=\left\{ \widehat{e}_{k}\mbox{ such that }\vec{v}=a_{1}\widehat{e}_{1}+\cdots+a_{n}\widehat{e}_{n}\mbox{ and }a_{k}\ne0\right\} 
\]

\end_inset

The common support for 
\begin_inset Formula $\overrightarrow{CLASS}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{word}$
\end_inset

 would then be
\begin_inset Formula 
\[
\mathcal{I}_{\overrightarrow{CLASS}}\,\cap\,\mathcal{I}_{\overrightarrow{word}}
\]

\end_inset

while the union would be
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\mathcal{I}_{\overrightarrow{CLASS}}\,\cup\,\mathcal{I}_{\overrightarrow{word}}
\]

\end_inset

with 
\begin_inset Formula $\cap$
\end_inset

 and 
\begin_inset Formula $\cup$
\end_inset

 denoting set-intersection and set-union, respectively.
 A plausible merge strategy for 
\begin_inset Formula $\vec{v}=a_{1}\widehat{e}_{1}+\cdots+a_{n}\widehat{e}_{n}$
\end_inset

 and 
\begin_inset Formula $\vec{w}=b_{1}\widehat{e}_{1}+\cdots+b_{n}\widehat{e}_{n}$
\end_inset

 might be
\begin_inset Formula 
\[
\mbox{merge}\left(\vec{v},\vec{w},\alpha\right)=c_{1}\widehat{e}_{1}+\cdots+c_{n}\widehat{e}_{n}
\]

\end_inset

with 
\begin_inset Formula 
\[
c_{k}=\begin{cases}
a_{k}+b_{k} & \mbox{if }\widehat{e}_{k}\in\mathcal{I}_{\overrightarrow{v}}\\
\alpha b_{k} & \mbox{if }\widehat{e}_{k}\notin\mathcal{I}_{\overrightarrow{v}}
\end{cases}
\]

\end_inset

This is designed so that, if 
\begin_inset Formula $\vec{v}=\overrightarrow{CLASS}$
\end_inset

 and 
\begin_inset Formula $\vec{w}=\overrightarrow{word}$
\end_inset

 then those disjuncts in 
\begin_inset Formula $\overrightarrow{word}$
\end_inset

 that are already in 
\begin_inset Formula $\overrightarrow{CLASS}$
\end_inset

 are folded in, in full, while disjuncts from 
\begin_inset Formula $\overrightarrow{word}$
\end_inset

 that are not yet in 
\begin_inset Formula $\overrightarrow{CLASS}$
\end_inset

 are only folded in a little bit, thus expanding the support for 
\begin_inset Formula $\overrightarrow{CLASS}$
\end_inset

, but not putting a lot of weight into the expanded support.
 
\end_layout
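
\begin_layout Standard
As a sketch of how this rule behaves, apply it to the toy vectors from the
 earlier section, treating a single word as a one-element class (an assumption
 made here only for illustration).
 With 
\begin_inset Formula $\vec{v}=\overrightarrow{crow}$
\end_inset

 and 
\begin_inset Formula $\vec{w}=\overrightarrow{bird}$
\end_inset

, only 
\begin_inset Formula $\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle $
\end_inset

 lies in 
\begin_inset Formula $\mathcal{I}_{\vec{v}}$
\end_inset

, and so
\begin_inset Formula 
\[
\mbox{merge}\left(\overrightarrow{crow},\overrightarrow{bird},\alpha\right)=\alpha\left|the\negthinspace-\;\&\;was\negthinspace+\right\rangle +\left(2+2\right)\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle +\alpha\left|the\negthinspace-\;\&\;too\negthinspace+\right\rangle 
\]

\end_inset

which reproduces 
\begin_inset Formula $\overrightarrow{THINGS}_{\alpha}$
\end_inset

.
 Swapping the roles gives
\begin_inset Formula 
\[
\mbox{merge}\left(\overrightarrow{bird},\overrightarrow{crow},\alpha\right)=1\left|the\negthinspace-\;\&\;was\negthinspace+\right\rangle +4\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle +1\left|the\negthinspace-\;\&\;too\negthinspace+\right\rangle 
\]

\end_inset

for every 
\begin_inset Formula $\alpha$
\end_inset

, because the support of 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

 is already contained in that of 
\begin_inset Formula $\overrightarrow{bird}$
\end_inset

.
 The rule is thus not symmetric in its two vector arguments: the support
 of the first argument decides what is damped.
\end_layout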

\begin_layout Standard
If 
\begin_inset Formula $\vec{v}$
\end_inset

 is a word, and not a class, and one wishes to form a class for the first
 time, out of two words, one might instead choose
\begin_inset Formula 
\[
c_{k}=\begin{cases}
a_{k}+b_{k} & \mbox{if }\widehat{e}_{k}\in\mathcal{I}_{\overrightarrow{v}}\cap\mathcal{I}_{\vec{w}}\\
\alpha\left(a_{k}+b_{k}\right) & \mbox{otherwise}
\end{cases}
\]

\end_inset

What should the 'constant' 
\begin_inset Formula $\alpha$
\end_inset

 be? Certainly, a small 
\begin_inset Formula $\alpha$
\end_inset

 is very conservative, not expanding the meaning very much beyond the overlappin
g set.
 Perhaps a variable 
\begin_inset Formula $\alpha$
\end_inset

 would be better; for example, taking 
\begin_inset Formula $\alpha=\left(\cos\left(\vec{v},\vec{w}\right)-\beta\right)/\left(1-\beta\right)$
\end_inset

 where 
\begin_inset Formula $\beta$
\end_inset

 is the smallest cosine that allows merging.
 This makes the merges more conservative, the more different the words are,
 but is very accepting when the words are similar.
\end_layout
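
\begin_layout Standard
A rough numerical illustration of the variable 
\begin_inset Formula $\alpha$
\end_inset

, using the toy vectors 
\begin_inset Formula $\overrightarrow{bird}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

 from the earlier section: their cosine is 
\begin_inset Formula $4/\left(2\sqrt{6}\right)=2/\sqrt{6}\approx0.82$
\end_inset

.
 Taking 
\begin_inset Formula $\beta=1/2$
\end_inset

, a hypothetical threshold chosen here only to make the arithmetic concrete
 and not a value proposed in this text, one finds
\begin_inset Formula 
\[
\alpha=\frac{\cos\left(\overrightarrow{bird},\overrightarrow{crow}\right)-\beta}{1-\beta}=\frac{2/\sqrt{6}-1/2}{1-1/2}\approx0.63
\]

\end_inset

so that the symmetric rule above would give, for this pair,
\begin_inset Formula 
\[
0.63\left|the\negthinspace-\;\&\;was\negthinspace+\right\rangle +4\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle +0.63\left|the\negthinspace-\;\&\;too\negthinspace+\right\rangle 
\]

\end_inset

At the two ends of the range, a pair whose cosine is exactly 
\begin_inset Formula $\beta$
\end_inset

 gets 
\begin_inset Formula $\alpha=0$
\end_inset

, and a pair of parallel vectors gets 
\begin_inset Formula $\alpha=1$
\end_inset

.
\end_layout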

\begin_layout Standard
To avoid expanding the support in unwise ways, perhaps it would be best
 to assemble a set of mergeable words, first, and only then merge them all
 together, in one shot, rather than merging pair-wise.
\end_layout

\begin_layout Standard
The merge strategy here seems OK, but probably not great.
 A better merge strategy is proposed below.
 The point here is that merge strategies that do perform word-sense disambiguati
on (that is, semantic extraction) are in principle possible.
\end_layout

\begin_layout Subsection*
Syntactic Broadening and Generalization
\end_layout

\begin_layout Standard
The act of merging together also broadens or generalizes the syntactic structure
 of the language.
 It extracts grammatical generalities in addition to semantic particulars.
 This is again best illustrated by example.
\end_layout

\begin_layout Standard
Take up the previous example, of a nature scene with birds, and consider
 the verbs 
\begin_inset Quotes eld
\end_inset


\emph on
saw
\emph default

\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset


\emph on
heard
\emph default

\begin_inset Quotes erd
\end_inset

 – John, Susan and others are seeing and hearing birds and crows.
 The word-vectors for 
\begin_inset Quotes eld
\end_inset


\emph on
saw
\emph default

\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset


\emph on
heard
\emph default

\begin_inset Quotes erd
\end_inset

 will include the disjuncts:
\begin_inset Formula 
\begin{align*}
\widehat{e}_{4} & =\left|John\negthinspace-\;\&\;bird\negthinspace+\right\rangle \\
\widehat{e}_{5} & =\left|Susan\negthinspace-\;\&\;bird\negthinspace+\right\rangle \\
\widehat{e}_{6} & =\left|John\negthinspace-\;\&\;crow\negthinspace+\right\rangle 
\end{align*}

\end_inset

Working explicitly from this example, one would have the vector
\begin_inset Formula 
\[
\overrightarrow{saw}=1\left|John\negthinspace-\;\&\;bird\negthinspace+\right\rangle +1\left|Susan\negthinspace-\;\&\;bird\negthinspace+\right\rangle +1\left|John\negthinspace-\;\&\;crow\negthinspace+\right\rangle 
\]

\end_inset


\end_layout

\begin_layout Standard
If 
\begin_inset Quotes eld
\end_inset


\emph on
bird
\emph default

\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset


\emph on
crow
\emph default

\begin_inset Quotes erd
\end_inset

 are merged into the grammatical class 
\begin_inset Quotes eld
\end_inset


\emph on
THINGS
\emph default

\begin_inset Quotes erd
\end_inset

, then none of the basis vectors 
\begin_inset Formula $\widehat{e}_{4}$
\end_inset

, 
\begin_inset Formula $\widehat{e}_{5}$
\end_inset

 or 
\begin_inset Formula $\widehat{e}_{6}$
\end_inset

 are entirely appropriate any more.
 They should be replaced by basis vectors made of the combined class; possibly,
 for example,
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\overrightarrow{saw}=1\left|John\negthinspace-\;\&\;THINGS\negthinspace+\right\rangle +1\left|Susan\negthinspace-\;\&\;THINGS\negthinspace+\right\rangle 
\]

\end_inset

But there's something odd in the above: Susan never saw a crow, in the sample
 corpus; so the above has certainly enlarged or generalized the class of
 things that are see-able, at least with respect to Susan.
 
\end_layout
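
\begin_layout Standard
The coefficients after such a replacement depend on how counts are carried
 over, which is left open above.
 As one possible sketch, assume that observation counts are simply added
 whenever two basis vectors collapse into one.
 Then 
\begin_inset Formula $\widehat{e}_{4}$
\end_inset

 and 
\begin_inset Formula $\widehat{e}_{6}$
\end_inset

 both map to 
\begin_inset Formula $\left|John\negthinspace-\;\&\;THINGS\negthinspace+\right\rangle $
\end_inset

, and the three original observations become
\begin_inset Formula 
\[
\overrightarrow{saw}=\left(1+1\right)\left|John\negthinspace-\;\&\;THINGS\negthinspace+\right\rangle +1\left|Susan\negthinspace-\;\&\;THINGS\negthinspace+\right\rangle 
\]

\end_inset

Under this assumption the total count of 3 is preserved; what is lost is
 the record of which member of the class each subject was actually observed
 with.
\end_layout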

\begin_layout Standard
That is, given specific examples, the act of deducing grammatical categories
 enlarges the generative size of the grammar.
 This is a technical statement, and needs some explanation.
 Formally, a 'grammar' is a collection of rules that define how sentences
 can be parsed.
 A 'language' is the set of all possible sentences that the grammar allows.
 The original corpus is a sampling of the language; deduction of the class
 of 
\begin_inset Quotes eld
\end_inset


\emph on
THINGS
\emph default

\begin_inset Quotes erd
\end_inset

 expands the language, because, in this expanded grammar, the sentence 
\begin_inset Quotes eld
\end_inset


\emph on
Susan saw the crow
\emph default

\begin_inset Quotes erd
\end_inset

 is now allowed, is part of the language, whereas previously it was not.
\end_layout
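
\begin_layout Standard
As a small worked illustration (a sketch that uses only the three disjuncts
 listed above for this verb, and that assumes the merged grammar lets every
 observed subject combine with every member of the class), the permitted
 subject-object pairs grow from the three that were observed to the full
 product of two subjects and two objects:
\begin_inset Formula 
\[
\left|\left\{ John,\,Susan\right\} \times\left\{ bird,\,crow\right\} \right|=2\cdot2=4>3
\]

\end_inset

The one extra pair, Susan together with crow, is exactly the sentence that
 the merge has added to the language.
\end_layout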

\begin_layout Subsection*
Parse Ranking
\end_layout

\begin_layout Standard
In the above, the expansion of the grammar from 
\begin_inset Formula 
\[
saw:\,Susan\negthinspace-\;\&\;bird\negthinspace+;
\]

\end_inset

to 
\begin_inset Formula 
\[
saw:\,Susan\negthinspace-\;\&\;THINGS\negthinspace+;
\]

\end_inset

was implied to be an all-or-nothing act.
 In fact, one can apply a likelihood or probability to this expansion, with
 the intent of keeping undesirable expansion at bay; of minimizing the risk
 of incorrect expansion.
 This is done by replacing the lexical set that is the grammatical class
 with a probabilistically-weighted set.
 That is, instead of writing
\begin_inset Formula 
\[
THINGS:\;\left\{ bird,\,crow\right\} 
\]

\end_inset

as a named set, one could introduce a probability or cost:
\begin_inset Formula 
\[
THINGS:\;\left\{ \left(bird,h_{bird}\right),\,\left(crow,h_{crow}\right)\right\} 
\]

\end_inset

with 
\begin_inset Formula $h_{crow}=-\log_{2}p\left(crow\right)$
\end_inset

 and likewise for 
\begin_inset Formula $h_{bird}$
\end_inset

.
 The logarithm is used to keep the costs additive, instead of multiplicative;
 this is a detail that tends to simplify things.
 The value 
\begin_inset Formula $p\left(crow\right)$
\end_inset

 is a pseudo-probability; there are several plausible ways to define it.
 One might consider, for example, setting 
\begin_inset Formula 
\[
p\left(crow\right)=\cos\left(\overrightarrow{THINGS},\overrightarrow{crow}\right)
\]

\end_inset

which indicates how well 
\begin_inset Quotes eld
\end_inset


\emph on
crow
\emph default

\begin_inset Quotes erd
\end_inset

 aligns with the class of 
\begin_inset Quotes eld
\end_inset


\emph on
THINGS
\emph default

\begin_inset Quotes erd
\end_inset

.
 
\end_layout
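
\begin_layout Standard
To make the additivity concrete, here is a sketch with purely hypothetical
 values (they are not derived from the sample corpus): suppose 
\begin_inset Formula $p\left(bird\right)=1/2$
\end_inset

 and 
\begin_inset Formula $p\left(crow\right)=1/4$
\end_inset

.
 Then
\begin_inset Formula 
\[
h_{bird}=-\log_{2}\frac{1}{2}=1,\qquad h_{crow}=-\log_{2}\frac{1}{4}=2,\qquad h_{bird}+h_{crow}=3=-\log_{2}\left(\frac{1}{2}\cdot\frac{1}{4}\right)
\]

\end_inset

so that summing costs corresponds to multiplying the pseudo-probabilities,
 and the less likely class member carries the larger cost.
\end_layout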

\begin_layout Standard
But this is naive – too naive: the cosine product 
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none

\begin_inset Formula $\cos\left(\overrightarrow{THINGS},\overrightarrow{crow}\right)$
\end_inset

 was derived from vectors that used the grammatical context of 
\begin_inset Quotes eld
\end_inset


\family default
\series default
\shape default
\size default
\emph on
\bar default
\strikeout default
\uuline default
\uwave default
\noun default
\color inherit
crow
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none

\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset


\family default
\series default
\shape default
\size default
\emph on
\bar default
\strikeout default
\uuline default
\uwave default
\noun default
\color inherit
bird
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none

\begin_inset Quotes erd
\end_inset

.
 It did NOT take into account that, in the corpus, Susan had not yet seen
 any crows.
 It's as if the vector for 
\begin_inset Formula $\overrightarrow{crow}$
\end_inset

 is incorrect – it should be taking Susan into account, in some way, but
 it's not, because the word 
\begin_inset Quotes eld
\end_inset

Susan
\begin_inset Quotes erd
\end_inset

 is too far away in the sentence.
 For the disjuncts, 
\begin_inset Quotes eld
\end_inset


\family default
\series default
\shape default
\size default
\emph on
\bar default
\strikeout default
\uuline default
\uwave default
\noun default
\color inherit
Susan
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none

\begin_inset Quotes erd
\end_inset

 is the subject of a verb, 
\begin_inset Quotes eld
\end_inset


\family default
\series default
\shape default
\size default
\emph on
\bar default
\strikeout default
\uuline default
\uwave default
\noun default
\color inherit
crow
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none

\begin_inset Quotes erd
\end_inset

 is the object, and there is no direct linkage of subject and object.
 For N-grams, 
\begin_inset Quotes eld
\end_inset


\family default
\series default
\shape default
\size default
\emph on
\bar default
\strikeout default
\uuline default
\uwave default
\noun default
\color inherit
Susan
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none

\begin_inset Quotes erd
\end_inset

 is simply too far away from the object, unless N is raised to an obscenely
 large value.
 Skip-grams do not materially improve on this issue.
 In all cases, the subject is too far removed.
 
\end_layout

\begin_layout Standard

\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none
Can this be remedied? Yes, it can.
 The key observation will be that there are other (kinds of) vectors, besides
 the vectors given above.
 All of the vectors discussed so far are best considered to be 'naive' vectors.
 They are 'naive' because they ignore the structure of the basis elements
 themselves.
 Acknowledging this structure allows one to move beyond the narrow constraints
 of linear algebra.
 Acknowledging this structure leads to the concept of a 
\begin_inset Quotes eld
\end_inset

sheaf on a graph
\begin_inset Quotes erd
\end_inset

 as a linguistically appropriate generalization of a vector space; it will
 replace the vector spaces, and the linear algebra, by a more general concept.
 This is, roughly, a collection of vector spaces, sewn together at
 the edges.
 The word 
\begin_inset Quotes eld
\end_inset

sheaf
\begin_inset Quotes erd
\end_inset

 comes from the idea that the axioms defining how the sewing-together is
 to be performed are the same axioms as those of 
\begin_inset Quotes eld
\end_inset

sheaf theory
\begin_inset Quotes erd
\end_inset

.
 But first, before we get to that, some more examples need to be developed.
\end_layout

\begin_layout Section*
Disjuncts are Tensors 
\end_layout

\begin_layout Standard
The issue with 
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\strikeout off
\uuline off
\uwave off
\noun off
\color none
'naive' vectors is that they bring the story to a close.
 
\family default
\series default
\shape default
\size default
\emph default
\bar default
\strikeout default
\uuline default
\uwave default
\noun default
\color inherit
In ordinary linear algebra, the basis vectors 
\begin_inset Formula $\widehat{e}_{k}$
\end_inset

 are indecomposable, structure-less, atomic.
 Nothing more can be said about them.
 However, in linguistics, the basis vectors 
\begin_inset Formula $\widehat{e}_{k}$
\end_inset

 are not structure-less; they are made out of words! And that changes everything.
 
\end_layout

\begin_layout Standard
The concept of word-vectors, such as N-grams, skip-grams or word-disjunct
 vectors, is just fine, but needs to be recognized as a flawed oversimplificatio
n for the structure of language.
 They're reasonably OK concepts, for a first pass, giving OK results.
 Clearly, word2vec and its cousins have made a big impression on the industry.
 That counts for something and should not be dismissed.
 But (I believe that) one can do better by not ignoring the structure of
 the basis vectors.
 These ideas are developed next.
\end_layout

\begin_layout Standard
An earlier section wrote down the basis vector
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\widehat{e}_{6}=\left|John\negthinspace-\;\&\;crow\negthinspace+\right\rangle 
\]

\end_inset

The notation used in writing this was modelled on the usual Link Grammar
 notation for disjuncts.
 A tensor-style notation would be to write
\begin_inset Formula 
\[
\widehat{e}_{6}=\left|John\negthinspace-\right\rangle \otimes\left|crow\negthinspace+\right\rangle 
\]

\end_inset

which makes it perhaps a bit more clear that 
\begin_inset Formula $\widehat{e}_{6}$
\end_inset

 consists of several parts.
 When the 'naive' word-vectors were constructed, these parts were ignored.
 In particular, their contribution to the similarity was ignored.
 This can be remedied by defining a different kind of vector, the cross-connector
 vector.
\end_layout

\begin_layout Subsection*
Cross-connector vectors
\end_layout

\begin_layout Standard
Returning to the nature-watching example, the corpus of seven sentences
 produces a dataset of observed disjuncts that includes the following:
\begin_inset Formula 
\begin{align*}
knew: & John\negthinspace-\;\&\;bird\negthinspace+;\\
heard: & John\negthinspace-\;\&\;bird\negthinspace+;\\
saw: & John\negthinspace-\;\&\;bird\negthinspace+;\\
saw: & Susan\negthinspace-\;\&\;bird\negthinspace+;\\
heard: & John\negthinspace-\;\&\;crow\negthinspace+;\\
saw: & John\negthinspace-\;\&\;crow\negthinspace+;
\end{align*}

\end_inset

All of the above are observed exactly once, each.
 This is in addition to the observations
\begin_inset Formula 
\begin{align*}
bird: & the\negthinspace-\;\&\;was\negthinspace+;\\
bird: & the\negthinspace-\;\&\;.\negthinspace+;\qquad\mbox{ (seen twice)}\\
bird: & the\negthinspace-\;\&\;too\negthinspace+;\\
crow: & the\negthinspace-\;\&\;.\negthinspace+;\qquad\mbox{ (seen twice)}
\end{align*}

\end_inset


\end_layout

\begin_layout Standard
We take up the question, again, can the words 
\begin_inset Quotes eld
\end_inset


\emph on
bird
\emph default

\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset


\emph on
crow
\emph default

\begin_inset Quotes erd
\end_inset

 be merged into a single common grammatical class? Previously, this determinatio
n was made, using the 'naive' word-disjunct vectors
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\overrightarrow{bird}_{naive}=1\left|the\negthinspace-\;\&\;was\negthinspace+\right\rangle +2\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle +1\left|the\negthinspace-\;\&\;too\negthinspace+\right\rangle 
\]

\end_inset

and 
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\overrightarrow{crow}_{naive}=2\left|the\negthinspace-\;\&\;.\negthinspace+\right\rangle 
\]

\end_inset

The verb constructions allow a different kind of vector, that 
\begin_inset Quotes eld
\end_inset

crosses over
\begin_inset Quotes erd
\end_inset

:
\begin_inset Formula 
\begin{align*}
\overrightarrow{bird}_{obj}= & 1\left|knew:John\negthinspace-\;\&\;*\negthinspace+\right\rangle +1\left|heard:John\negthinspace-\;\&\;*\negthinspace+\right\rangle \\
 & \quad+1\left|saw:John\negthinspace-\;\&\;*\negthinspace+\right\rangle +1\left|saw:Susan\negthinspace-\;\&\;*\negthinspace+\right\rangle 
\end{align*}

\end_inset

The wild-card * is an explicit place-holder for the word in the disjunct.
 The corresponding vector for 
\begin_inset Quotes eld
\end_inset


\emph on
crow
\emph default

\begin_inset Quotes erd
\end_inset

 would be
\begin_inset Formula 
\[
\overrightarrow{crow}_{obj}=1\left|heard:John\negthinspace-\;\&\;*\negthinspace+\right\rangle +1\left|saw:John\negthinspace-\;\&\;*\negthinspace+\right\rangle 
\]

\end_inset

Both of these are again vectors, but very different in form and shape from
 before.
 
\end_layout

\begin_layout Standard
The mergability decision is going to be different, as a result.
 Consider, for example, the cosines.
 Computing explicitly:
\begin_inset Formula 
\[
\cos\left(\overrightarrow{bird}_{naive},\overrightarrow{crow}_{naive}\right)=\frac{4}{\sqrt{6\cdot4}}=\frac{2}{\sqrt{6}}\approx0.8165
\]

\end_inset

while
\begin_inset Formula 
\[
\cos\left(\overrightarrow{bird}_{obj},\overrightarrow{crow}_{obj}\right)=\frac{2}{\sqrt{4\cdot2}}=\frac{1}{\sqrt{2}}\approx0.7071
\]

\end_inset

So how similar are crows and birds? What should the similarity measure be
 now? Should one take the average of these? Do something else? Perhaps one
 might consider a total vector:
\begin_inset Formula 
\[
\overrightarrow{bird}_{total}=\overrightarrow{bird}_{naive}+\overrightarrow{bird}_{obj}
\]

\end_inset

This is appealing, but for one important property: the subspaces 
\begin_inset Formula $\overrightarrow{word}_{naive}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{word}_{obj}$
\end_inset

 are always orthogonal to one-another, always, for any word.
 These are really distinct vector spaces; they don't mix.
 Also, there are more than just these two.
 Consider the sentence 
\begin_inset Quotes eld
\end_inset


\emph on
The bird flew away.
\emph default

\begin_inset Quotes erd
\end_inset

 This suggests a vector
\begin_inset Formula 
\[
\overrightarrow{bird}_{subj}=1\left|flew:*\negthinspace-\;\&\;away\negthinspace+\right\rangle 
\]

\end_inset

where the wild-card is now in the first position, not the second.
 Clearly 
\begin_inset Formula $\overrightarrow{word}_{subj}$
\end_inset

 is always orthogonal to 
\begin_inset Formula $\overrightarrow{word}_{naive}$
\end_inset

 and 
\begin_inset Formula $\overrightarrow{word}_{obj}$
\end_inset

, for any word.
 For any disjunct of length 
\begin_inset Formula $N$
\end_inset

, there are at least 
\begin_inset Formula $N$
\end_inset

 distinct, orthogonal vector spaces, because the wild-card can occur in
 any one of 
\begin_inset Formula $N$
\end_inset

 distinct locations in the disjunct.
 The wild-card can also occur with a + attachment, or a - attachment, so
 there are at least 
\begin_inset Formula $2N$
\end_inset

 distinct vector spaces.
 Finally, if the disjunct has 
\begin_inset Formula $k$
\end_inset

 attachments that are -, and 
\begin_inset Formula $N-k$
\end_inset

 that are +, then the 
\begin_inset Formula $*\negthinspace-$
\end_inset

 can occur in any of 
\begin_inset Formula $k$
\end_inset

 locations, while 
\begin_inset Formula $*\negthinspace+$
\end_inset

 can occur in any of 
\begin_inset Formula $N-k$
\end_inset

 locations.
 Adding up these possibilities, disjuncts of length 
\begin_inset Formula $N$
\end_inset

 span a total of 
\begin_inset Formula $N\left(N+1\right)$
\end_inset

 mutually pair-wise orthogonal subspaces.
 That's a lot of different subspaces to consider.
\end_layout
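
\begin_layout Standard
As a check of this count, take the smallest interesting case, 
\begin_inset Formula $N=2$
\end_inset

, with 
\begin_inset Formula $a$
\end_inset

 and 
\begin_inset Formula $b$
\end_inset

 standing for arbitrary connector words (placeholder names, used only for
 this illustration).
 Enumerating by the number 
\begin_inset Formula $k$
\end_inset

 of attachments that are - gives
\begin_inset Formula 
\begin{align*}
k=0: & \quad\left|*\negthinspace+\;\&\;b\negthinspace+\right\rangle ,\;\left|a\negthinspace+\;\&\;*\negthinspace+\right\rangle \\
k=1: & \quad\left|*\negthinspace-\;\&\;b\negthinspace+\right\rangle ,\;\left|a\negthinspace-\;\&\;*\negthinspace+\right\rangle \\
k=2: & \quad\left|*\negthinspace-\;\&\;b\negthinspace-\right\rangle ,\;\left|a\negthinspace-\;\&\;*\negthinspace-\right\rangle 
\end{align*}

\end_inset

for a total of 
\begin_inset Formula $2\cdot3=6=N\left(N+1\right)$
\end_inset

 wild-card patterns.
 The two patterns in the middle row have the same shape as the subject and
 object vectors written above.
\end_layout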

\begin_layout Standard
Despite this, one expects an overall consistency in the grammatical classificati
on of a word: if one decides that birds are like crows, then the unified
 grammatical class of THINGS that birds and crows belong to must behave
 properly, when placed in any particular grammatical context.
 Before, John heard and saw birds; now John can hear and see THINGS, and
 this needs to hold true for all of the various possible grammatical relations:
\begin_inset Formula 
\begin{align*}
heard: & John\negthinspace-\;\&\;THINGS\negthinspace+;\\
THINGS: & the\negthinspace-\;\&\;was\negthinspace+;
\end{align*}

\end_inset

It must necessarily be the same word-class 
\begin_inset Quotes eld
\end_inset


\emph on
THINGS
\emph default

\begin_inset Quotes erd
\end_inset

 in both of these locations.
 It is grammatically inconsistent to have these being distinct from one-another.
\end_layout

\begin_layout Standard
Thus one concludes: (a) there is more than one vector space available, over
 which similarity comparisons can be made; (b) the decision to merge must
 be made consistently over all available vector spaces; (c) the merge itself
 must still be non-linear, in order to differentiate between different word-sens
es attached to the same word.
 How this may be accomplished is written up in the next section.
 
\end_layout

\begin_layout Standard
The important constraint here is that of (b) – that the resulting grammatical
 classes must be consistent, in all of the syntactic roles that they can
 occur in.
 The various syntactic vector spaces are not independent of one-another,
 but stitch together.
\end_layout

\begin_layout Subsection*
Merge decisions, redux
\end_layout

\begin_layout Standard
Should one take the average of these?
\end_layout

\begin_layout Subsection*
Replacing Cosines by Entropy?
\end_layout

\begin_layout Standard
Cosines are not additive.
 And that's a big problem when trying to add together contributions from
 cross-connectors.
\end_layout
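
\begin_layout Standard
The numbers from the bird and crow example illustrate this.
 As a sketch, assume that 
\begin_inset Formula $\overrightarrow{crow}_{total}$
\end_inset

 is defined in the same way as 
\begin_inset Formula $\overrightarrow{bird}_{total}$
\end_inset

.
 Because the naive and object subspaces are orthogonal, the dot products
 and the squared lengths add, but the cosines do not:
\begin_inset Formula 
\[
\cos\left(\overrightarrow{bird}_{total},\overrightarrow{crow}_{total}\right)=\frac{4+2}{\sqrt{\left(6+4\right)\left(4+2\right)}}=\frac{6}{\sqrt{60}}\approx0.7746
\]

\end_inset

This is neither the sum of the two separate cosines, 
\begin_inset Formula $0.8165+0.7071\approx1.5236$
\end_inset

, nor their average, 
\begin_inset Formula $\approx0.7618$
\end_inset

.
\end_layout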

\begin_layout Standard
Consider instead a different word-similarity measure.
 As always, let 
\begin_inset Formula $N(w,d)$
\end_inset

 be the count of the number of times the disjunct 
\begin_inset Formula $d$
\end_inset

 was observed on word 
\begin_inset Formula $w$
\end_inset

.
 Define the right-product
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
f(w,u)=\sum_{d}N(w,d)N(u,d)
\]

\end_inset

with the sum ranging over all disjuncts shared in common between the two
 words.
 This is just the dot-product for the two vectors 
\begin_inset Formula $\vec{w}$
\end_inset

 and 
\begin_inset Formula $\vec{u}$
\end_inset

 – that is, 
\begin_inset Formula $f(w,u)=\vec{w}\cdot\vec{u}$
\end_inset

 and so the cosine similarity of two word-disjunct vectors is just
\begin_inset Formula 
\[
\cos\left(\vec{w},\vec{u}\right)=\frac{f\left(w,u\right)}{\sqrt{f\left(w,w\right)f\left(u,u\right)}}
\]

\end_inset

Consider instead a similar quantity
\begin_inset Formula 
\[
p\left(w,u\right)=\frac{f\left(w,u\right)}{f\left(*,*\right)}
\]

\end_inset

where 
\begin_inset Formula $f\left(*,*\right)=\sum_{w,u}f\left(w,u\right)$
\end_inset

 is a normalization, a total count.
 The quantity 
\begin_inset Formula $p(w,u)$
\end_inset

 can be interpreted as a probability: it clearly sums to one.
 It is symmetric: 
\begin_inset Formula $p(w,u)=p(u,w)$
\end_inset

 and one can thus have traditional marginal probabilities: 
\begin_inset Formula 
\[
p\left(w\right)=p\left(w,*\right)=\sum_{u}p\left(w,u\right)
\]

\end_inset

This suggests a natural form for the mutual information between words:
\begin_inset Formula 
\[
MI_{d}\left(w,u\right)=\log_{2}\frac{p\left(w,u\right)}{p\left(w\right)p\left(u\right)}
\]

\end_inset

The subscript 
\begin_inset Formula $d$
\end_inset

 on 
\begin_inset Formula $MI_{d}$
\end_inset

 serves as a reminder that this variant of mutual information is derived from
 the disjunct product, and not from word-pair observations.
 Unlike the word-pair observations, this value of MI is symmetric under
 word-interchange: the word-order does not matter.
\end_layout
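
\begin_layout Standard
A toy calculation may help fix the definitions.
 As an assumption made only for this illustration, restrict the vocabulary
 to the two words bird and crow, and use their naive word-disjunct vectors
 from the earlier example.
 Then 
\begin_inset Formula $f\left(bird,bird\right)=6$
\end_inset

, 
\begin_inset Formula $f\left(bird,crow\right)=f\left(crow,bird\right)=4$
\end_inset

 and 
\begin_inset Formula $f\left(crow,crow\right)=4$
\end_inset

, so that 
\begin_inset Formula $f\left(*,*\right)=18$
\end_inset

 and
\begin_inset Formula 
\[
p\left(bird,crow\right)=\frac{4}{18},\qquad p\left(bird\right)=\frac{6+4}{18}=\frac{10}{18},\qquad p\left(crow\right)=\frac{4+4}{18}=\frac{8}{18}
\]

\end_inset

which gives
\begin_inset Formula 
\[
MI_{d}\left(bird,crow\right)=\log_{2}\frac{4/18}{\left(10/18\right)\left(8/18\right)}=\log_{2}\frac{9}{10}\approx-0.152
\]

\end_inset

The particular value means little with a vocabulary this small; the point
 is only to show how the marginals and the normalization enter.
\end_layout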

\begin_layout Subsection*
Replacing Cosines by Surprisingness?
\end_layout

\begin_layout Standard
Can this work?
\end_layout

\begin_layout Section*
Conclusion
\end_layout

\begin_layout Standard
Not yet written.
\end_layout

\begin_layout Section*
The End
\end_layout

\begin_layout Standard
\begin_inset CommandInset bibtex
LatexCommand bibtex
bibfiles "lang"
options "alpha"

\end_inset


\end_layout

\end_body
\end_document