Work-in-Progress on PL/Java refactoring, API modernization #399

Open · wants to merge 120 commits into base: REL1_7_STABLE

Conversation

@jcflack (Contributor) commented Jan 24, 2022

As a work-in-progress pull request, this is not expected to be imminently merged, but is here to document the objectives and progress of the ongoing work.

Why needed

A great advantage promised by a PL based on the JVM is the large ecosystem of languages other than Java that can be supported on the same infrastructure, whether through the Java Scripting (JSR 223) API, or through the polyglot facilities of GraalVM, or simply via separate compilation to the class file format and loading as jars.

However, PL/Java, with its origins in 2004 predating most of those developments, has architectural limitations that stand in the way.

JDBC

One of the limitations is the centrality of the JDBC API. To be sure, it is a standard in the Java world for access to a database, and for PL/Java to conform to ISO SQL/JRT, the JDBC API must be available. But it is not necessarily a preferred or natural database API for other JVM or GraalVM languages, and its design goal is to abstract away from the specifics of an underlying database, which ends up complicating or even preventing access to advanced PostgreSQL capabilities that could be prime drivers for running server-side code in the first place.

The problem is not that JDBC is an available API in PL/Java, but that it is the fundamental API in PL/Java, with its tentacles reaching right into the native C language portion of PL/Java's implementation. That has made alternative interface options impractical, and multiplied the maintenance burden of even simple tasks like adding support for new datatype mappings or fixing simple bugs. There are significant portions of JDBC 4 that remain unimplemented in PL/Java.

Experience building an implementation of ISO SQL/XML XMLQUERY showed that certain requirements of the spec were simply unsatisfiable atop JDBC, either because of inherent JDBC limitations or limits in PL/Java's implementation of it. An example of each kind:

  • The INTERVAL data type cannot be mapped as SQL/XML requires, because the only ResultSetMetaData methods JDBC defines for access to a type modifier are precision and scale, which apply to numeric values; the API defines no standard way to learn what the modifier of an INTERVAL says about whether months or days are present.
  • The DECIMAL type cannot be mapped as SQL/XML requires; for that case, the fault is not with JDBC (which defines the precision and scale methods), but with their incomplete implementation in PL/Java.

Those cases also illustrate that mapping some PostgreSQL data types to those of another language can be complex. An arbitrary PostgreSQL INTERVAL is representable as neither a java.time.Period nor a java.time.Duration alone (though a pair of the two can be used, a type that PGJDBC-NG offers). One or the other can suffice if the type modifier is known and limits the fields present. A PostgreSQL NUMERIC value has not-a-number and signed infinity values that some candidate language-library type might not, and an internal precision that its text representation does not reveal, which might need to be preserved for a mathematically demanding task. The details of converting it to another language's similar type need to be knowable or controllable by an application.
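
To make the INTERVAL point concrete, here is a minimal sketch of a lossless pair representation along the lines of the one PGJDBC-NG offers (the IntervalValue class is hypothetical, not part of this PR):

    import java.time.Duration;
    import java.time.Period;

    // Hypothetical pair type: an arbitrary PostgreSQL INTERVAL needs both
    // a Period (the months/days fields) and a Duration (the sub-day time
    // fields); neither alone suffices unless a type modifier limits the
    // fields present.
    final class IntervalValue
    {
        final Period period;
        final Duration duration;
        IntervalValue(Period period, Duration duration)
        {
            this.period = period;
            this.duration = duration;
        }
    }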

It is a goal of this work to give PL/Java an API that does not obscure or abstract away PostgreSQL details, but makes them accessible in a natural Java idiom, and to make that "natural PostgreSQL" API adequate for building a JDBC layer in pure Java above it. (The work of building such a JDBC layer is not in the scope of this pull request.)

Parameter and return-value mapping

PL/Java uses a simple, Java-centric approach where a Java method is declared naturally, giving ordinary Java types for its parameters and return, and the mappings from these to the PostgreSQL parameter and return types are chosen by PL/Java and applied transparently (and much of that happens deep in PL/Java's C code).

While convenient, that approach isn't easily adapted to other JVM languages that may offer other selections of types. Even for Java, it stands in the way of doing certain things possible in PostgreSQL, like declaring VARIADIC "any" functions.

In a modernized API, it needs to be possible to declare a function whose parameter represents the PostgreSQL FunctionCallInfo, so that the parameters and their types can be examined and converted in Java. That will make it possible to write language handlers in Java, whether for other JVM languages or for the existing PL/Java calling conventions that at present are tangled in C.

Elements of new API

Identification of data types

A PostgreSQL-specific API must be able to refer unambiguously to any type known to the database, so it cannot rely on any fixed set of generic types such as JDBCType. To interoperate with a JDBC layer, though, the identifier for types should implement JDBC's SQLType interface.

The API should support retrieving enough metadata about the type for a JDBC layer implemented above it to be able to report complete ResultSetMetaData information.

The new class serving this purpose is RegType.

As RegType implements the java.sql.SQLType interface, an aliasing issue arises for a JDBC layer. Such a layer should accept JDBCType.VARCHAR as an alias for RegType.VARCHAR, for example. JDBC itself has no methods that return an SQLType instance, so the question of whether it should return the generic JDBC type or the true RegType does not arise. A PL/Java-specific API is needed for retrieving the type identifier in any case.

The details of which JDBC types are considered aliases of which RegTypes will naturally belong in a JDBC API layer. At the level of this underlying API, a RegType is what identifies a PostgreSQL type.

While RegType includes convenience final fields for a number of common types, those by no means limit the RegTypes available. There is a RegType that can be obtained for every type known to the database, whether built in, extension-supplied, or user-defined.
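
For example, a JDBC layer above this API might normalize the aliasing like so (a sketch: the resolve helper is hypothetical, while RegType.VARCHAR and the SQLType relationship are as described above):

    import java.sql.JDBCType;
    import java.sql.SQLType;
    import org.postgresql.pljava.model.RegType;

    final class TypeAliases
    {
        // Hypothetical helper: accept the generic JDBCType as an alias
        // and normalize it to the true PostgreSQL RegType.
        static RegType resolve(SQLType t)
        {
            if ( t instanceof RegType )
                return (RegType)t;      // already a PostgreSQL type
            if ( JDBCType.VARCHAR == t )
                return RegType.VARCHAR; // one alias pairing the layer might choose
            throw new UnsupportedOperationException("no alias for " + t);
        }
    }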

Other PostgreSQL catalog objects and key abstractions

RegType is one among the types of PostgreSQL catalog objects modeled in the org.postgresql.pljava.model package.

Along with a number of catalog object types, the package also contains:

  • TupleDescriptor and TupleTableSlot, the key abstractions for fetching and storing database values. TupleTableSlot in PostgreSQL is already a useful abstraction over a few different representations; in PL/Java it is further abstracted, and can present with the same API other collections of typed, possibly named, items, such as arrays, the arguments in a function call, etc.
  • MemoryContext and ResourceOwner, both subtypes of Lifespan, usable to guard Java objects that have native state whose validity is bounded in time
  • CharsetEncoding

Mapping PostgreSQL data types to what a PL supports

The Adapter class

A mapping between a PostgreSQL data type and a suitable PL data type is an instance of the Adapter class, and more specifically of the reference-returning Adapter.As<T,U> or one of the primitive-returning Adapter.AsInt<U>, Adapter.AsFloat<U>, and so on (one for each Java primitive type). The Java type produced is T for the As case, and implicit in the class name for the AsFoo cases.

The basic method for fetching a value from a TupleTableSlot is get(Attribute att, Adapter adp), and naturally is overloaded and generic so that get with an As<T,?> adapter returns a T, get with an AsInt<?> adapter returns an int, and so on. (But see a later comment below for a better API than this item-at-a-time approach.) (The U type parameter of an adapter plays a role when adapters are combined by composition, as discussed below, and is otherwise usually uninteresting to client code, which may wildcard it, as seen above.)
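
In use, the overloading might look like this (a sketch: TEXT_AS_STRING and INT4_AS_INT stand for some available As<String,?> and AsInt<?> adapter instances, and the exact exceptions thrown are glossed over):

    import org.postgresql.pljava.model.Attribute;
    import org.postgresql.pljava.model.TupleTableSlot;

    class GetDemo
    {
        // Sketch: the static type of each get() follows the adapter used.
        static void show(TupleTableSlot slot, Attribute name, Attribute count)
        throws Exception
        {
            String s = slot.get(name, TEXT_AS_STRING); // As<String,?> gives String
            int n = slot.get(count, INT4_AS_INT);      // AsInt<?> gives int
            System.out.println(s + ": " + n);
        }
    }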

A manager class for adapters

Natural use of this idiom presumes there will be some adapter-manager API that allows client code to request an adapter for some PostgreSQL type by specifying a Java witness class Class<T> or some form of super type token, and returns the adapter with the expected compile-time parameterized type.

That manager hasn't been built yet, but the requirements are straightforward and no thorny bits are foreseen. (Within the org.postgresql.pljava.internal module itself, things are simpler; no manager is needed, and code refers directly to static final INSTANCE fields of existing adapters.)
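
What a request to such a manager might look like (entirely hypothetical: neither the AdapterManager class nor its method exists yet, and RegType.DATE is assumed to be among the convenience fields):

    // Hypothetical: ask for an adapter mapping PostgreSQL date to
    // java.time.LocalDate, passing a witness class so the returned
    // adapter carries the expected compile-time parameterized type.
    Adapter.As<java.time.LocalDate,?> dateAdapter =
        AdapterManager.adapterFor(RegType.DATE, java.time.LocalDate.class);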

Extensibility

PL/Java has historically supported user-defined types implemented in Java, a special class of data types whose Java representations must implement a certain JDBC interface and import and export values through a matching JDBC API. In contrast, PL/Java's first-class PostgreSQL data type support—the mappings it supplies between PostgreSQL and ordinary Java types that don't involve the specialized JDBC user-defined type APIs—has been hardcoded in C using Java Native Interface (JNI) calls, and not straightforward to extend. That's a pain point for several situations:

  • A mapping for another PostgreSQL data type (either a type newly added to PostgreSQL, or simply one that PL/Java does not yet have a mapping for) is not easily added for an application that needs it, but generally must be added in PL/Java's C/JNI internals and made available in a new PL/Java build.
  • A mapping of an existing PostgreSQL data type to a new or different Java type—same story. When Java 8 introduced the java.time package, developers wishing to have PL/Java map PostgreSQL's date and time types to the improved Java types instead of the older java.sql ones had to open issues requesting that ability and wait for a PL/Java release to include it.
  • Not every PostgreSQL data type has a single best PL type to be mapped to. One application using the geometric types might want them mapped to the Java types in the PGJDBC library, while another might prefer the 2D classes supplied by some Java geometry library. One application might want a PostgreSQL array mapped to a flat Java List, another to a multi-dimensioned Java array, another to a matrix class from a scientific computation library. The choices multiply when considering the data types not only of Java but of other JVM languages. C coding and rebuilding of PL/Java should not be needed to tailor these mappings.

Adapters implementable in pure Java

With this PR, code external to PL/Java's implementation can supply adapters, built against the service-provider API exposed in org.postgresql.pljava.adt.spi.

Leaf adapters

A "leaf" adapter is one that directly knows the PostgreSQL datum format of its data type, and maps that to a suitable PL type. Only a leaf adapter gets access to PostgreSQL datums, which it should not leak to other code. Code that defines leaf adapters must be granted a permission in pljava.policy.

Composing adapters

A composing, or non-leaf, adapter is one meant to be composed over another adapter. An example would be an adapter that composes over an adapter returning type T (possibly null) to form an adapter returning Optional<T>. With a selection of common composing adapters (there aren't any in this pull request, yet), it isn't necessary to provide leaf adapters covering all the ways application code might want data to be presented. No special permission is needed to create a composing adapter.
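
The essence of such a composing adapter is ordinary Java with no datum access at all, which is why no special permission is needed. A sketch of the transformation at its core (the surrounding Adapter subclass boilerplate is omitted, as its constructor details are not covered here):

    import java.util.Optional;

    final class OptionalComposer
    {
        // A composing adapter sees only the Java value the underlying
        // As<T,?> adapter has already produced; it never touches
        // a PostgreSQL datum.
        static <T> Optional<T> compose(T valueFromUnderlyingAdapter)
        {
            return Optional.ofNullable(valueFromUnderlyingAdapter);
        }
    }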

Java's generic types are erased at run time, but the Java compiler records the generic type information in the class file, where it remains accessible through Java reflection. As adapters are composed, the Adapter class tracks the type relationships so that, for example, an Adapter<Optional<T>,T> composed over an Adapter<String,Void> is known to produce Optional<String>.

It is that information that will allow an adapter manager to satisfy a request to map a given PostgreSQL type to some PL type, by finding and composing available adapters.

Contract-based adapters

For a PostgreSQL data type that doesn't have one obvious best mapping to a PL type (perhaps because there are multiple choices with different advantages, or because there is no suitable type in the PL's base library, and any application will want the type mapped to something in a chosen third-party library), a contract-based adapter may be best. An Adapter.Contract is a functional interface with parameters that define the semantically-important components of the PostgreSQL type, and a generic return type, so an implementation can return any desired representation for the type.

A contract-based adapter is a leaf adapter class with a constructor that accepts a Contract, producing an adapter between the PostgreSQL type and whatever PL type the contract maps it to. The adapter encapsulates the internal details of how a PostgreSQL datum encodes the value, and the contract exposes the semantic details needed to faithfully map the type. Contracts for many existing PostgreSQL types are provided in the org.postgresql.pljava.adt package.
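
Using the INTERVAL example detailed in the commit notes below (semantic components: a 64-bit microseconds count, a 32-bit day count, a 32-bit month count), a contract and one conforming lambda might look like this (the interface shown is illustrative; the real contracts live in org.postgresql.pljava.adt):

    import java.time.Duration;
    import java.time.temporal.ChronoUnit;

    // Illustrative contract: exposes INTERVAL's semantic components and
    // leaves the choice of Java representation to the implementation.
    @FunctionalInterface
    interface IntervalContract<T>
    {
        T construct(long microseconds, int days, int months);
    }

    class IntervalContracts
    {
        // One possible implementation: map intervals having no day or
        // month component to java.time.Duration.
        static final IntervalContract<Duration> AS_DURATION =
            (micros, days, months) ->
            {
                if ( days != 0 || months != 0 )
                    throw new IllegalArgumentException(
                        "interval has day/month fields");
                return Duration.of(micros, ChronoUnit.MICROS);
            };
    }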

ArrayAdapter

The one supplied ArrayAdapter is contract-based. While a Contract.Array has a single abstract method, and therefore could serve as a functional interface, in practice it is not directly implementable by a lambda; there must be a subclass or subinterface (possibly anonymous) whose type parameterization the Java compiler can record. (A lambda may then be used to instantiate that.) An instance of ArrayAdapter is constructed by supplying an adapter for the array's element type along with an array contract targeting some kind of collection of the mapped type. As with a composing adapter, the Adapter class substitutes the element adapter's target Java type through the type parameters of the array contract, to arrive at the actual parameterized type of the resulting array or collection.

PostgreSQL arrays can be multidimensional, and are regular (not "jagged"; all sub-arrays at a given dimension match in size). They can have null elements, which are tracked in a bitmap, offering a simple way to save some space for arrays that are sparse; there are no other, more specialized sparse-array provisions.

Array indices need not be 0- or 1-based; the base index as well as the index range can be given independently for each dimension. PostgreSQL creates 1-based arrays by default. This information is stored with the array value, not with the array type, so a column declared with an array type could conceivably have values of different cardinalities or even dimensionalities.

The adapter is contract-based because there are many ways application code could want a PostgreSQL array to be presented: as a List or single Java array (flattening multiple dimensions, if present, to one, and disregarding the base index), as a Java array-of-arrays, as a JDBC Array object (which does not officially contemplate more than one array dimension, but PostgreSQL's JDBC drivers have used it to represent multidimensioned arrays), as the matrix type offered by some scientific computation library, and so on.

For now, one predefined contract is supplied, AsFlatList, and a static method, nullsIncludedCopy, that can be used (via method reference) as one implementation of that contract.
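
Putting the pieces together might look roughly like this (a heavily hedged sketch: the constructor shape and the TEXT_AS_STRING element adapter are assumptions; what the text above establishes is only that nullsIncludedCopy can serve, via method reference, as an implementation of the flat-list contract):

    import java.util.List;

    // Sketch: implement the AsFlatList contract by method reference,
    // then combine it with an element adapter for text to present
    // a PostgreSQL text[] as a flat List<String>.
    AsFlatList<String> flat = AsFlatList::nullsIncludedCopy;
    ArrayAdapter<List<String>> textArrayAsList =
        new ArrayAdapter<>(flat, TEXT_AS_STRING);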

Java array-of-arrays

While perhaps not an extremely efficient way to represent multidimensional arrays, the Java array-of-arrays approach is familiar, and benefits from a bit of dedicated support in Adapter. If you have an Adapter a that renders a PostgreSQL type Foo as Java type Bar, you can use, for example, a.a2().build() to obtain an Adapter from the PostgreSQL array type Foo[] to the Java type Bar[][]. Such an adapter requires the PostgreSQL array to have two dimensions, allows each value to have different sizes along those dimensions, and disregards the PostgreSQL array's start indices (all Java arrays start at 0).

Because PostgreSQL stores the dimension information with each value and does not enforce it for a column as a whole, it could be possible for a column of array values to include values with other numbers of dimensions, which an adapter constructed this way will reject. On the other hand, the sizes along each dimension are also allowed by PostgreSQL to vary from one value to the next, and this adapter accommodates that, as long as the number of dimensions doesn't change.

The existing contract-based ArrayAdapter is used behind the scenes, but build() takes care of generating the contract. Examples are provided.
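
Concretely, with the names used above (a adapts PostgreSQL Foo to Java Bar; the parameterized type shown for the result is an assumption):

    // From the description above: a.a2().build() yields an adapter from
    // two-dimensional Foo[] to Bar[][]. Sizes along each dimension may
    // vary from value to value; start indices are disregarded (Java
    // arrays are 0-based); values with any other number of dimensions
    // are rejected.
    Adapter.As<Bar[][],?> twoD = a.a2().build();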

Adapter maintainability

Providing pure-Java adapters that know the internal layouts of PostgreSQL data types, without relying on JNI calls and the PostgreSQL native support routines, entails a parallel-implementation maintenance responsibility roughly comparable to that of PostgreSQL client drivers that support binary send and receive. (The risk is slightly higher because the backend internal layouts are less committed than the send/receive representations. Because they are used for data on disk, though, historically they have not changed often or capriciously.)

The engineering judgment is that the resulting burden will be manageable, and the benefits in clarity and maintainability of the pure-Java implementations, compared to the brittle legacy Java+C+JNI approach, will predominate. The process of developing clear contracts for PostgreSQL types already has led to discovery of one bug (#390) that could be fixed in the legacy conversions.

For the adapters supplied in the org.postgresql.pljava.internal module, it is possible to use ModelConstants.java/ModelConstants.c to ensure that key constants (offsets, flags, etc.) stay synchronized with their counterparts in the PostgreSQL C code.

Adapter is a class in the API module, with the express intent that other adapters can be developed, and found by the adapter manager through a ServiceLoader API, without being internal to PL/Java. Those might not have the same opportunity for build-time checking against PostgreSQL header files, and will have to rely more heavily on regression tests for key data values, much as binary-supporting client drivers must. The same can be true even for PL/Java internal adapters for a few PostgreSQL data types whose C implementations are so strongly encapsulated (numeric comes to mind) that necessary layouts and constants do not appear in .h files.

Known open items

In no well-defined order ....

  • The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor. These all have API and implementation for getting PostgreSQL values and presenting them in Java. Now the other direction is needed.
  • Provide API and implementation for a unified list-of-slots representation for a variety of list-of-tuple representations used in PostgreSQL, by:
    • factoring out the list-of-TupleTableSlot classes currently found as preliminary scaffolding in TupleTableSlot.java
    • providing such a representation for SPITupleTable ...
    • CatCList ...
    • Tuplestore? ...
    • ...?
  • Implement some form of offset memoization so fetching attributes from a heap TupleTableSlot stays subquadratic
  • Finish the unimplemented grants methods of RegRole and the unimplemented unary one of CatalogObject.AccessControlled. (Needs the CatCList support, for pg_auth_members searches.)
  • A NullableDatum flavor of TupleTableSlot. One of the last prerequisites to enable pure-Java language-handler implementations, to which the function arguments will appear as a TupleTableSlot.
  • Complete the implementation of isSubtype with the rules from Java Language Specification 4.10. (At present it is a stub that only checks erased subtyping, enough to get things initially going.)
  • The adapter manager described above. (Requires isSubtype.)
  • Adapters for PostgreSQL types that don't have them yet (starting, perhaps, with the ones that already have contracts defined in org.postgresql.pljava.adt).
  • TextAdapter does not yet support the type modifiers for CHAR and VARCHAR. It needs a contract-based flavor that does.
  • ArrayAdapter (or Contract.Array) should supply at least one convenience method, taking a dimsAndBounds array parameter and generating an indexing function (a MethodHandle?) that has nDims integer parameters and returns an integer flat index. Other related operations? An index enumerator, etc.?
  • A useful initial set of composing adapters, such as:
    • one of the form As<Optional<T>,T>
      • implement in an example class
      • integrate into PL/Java proper
    • one extending As<T,T> that returns null for null and values unchanged
      • why? because with adapter autoboxing, it can be composed over any primitive-returning adapter to enable it to handle null, by returning its boxed form
      • implement in an example class
      • integrate into PL/Java proper
    • a set composing over primitive adapters to use a specified value in the primitive's value space to represent null.
      • implement in an example class
      • complete the set and integrate into PL/Java proper
  • More work on CatalogObject invalidation. RegClass and RegType are already invalidated selectively; probably RegProcedure should be also. PostgreSQL has a limited number of callback slots, so it would be antisocial to grab them for all the supported classes: less critical ones just depend on the global switchpoint; come up with a good story for invalidating those. Also for how TupleDescriptor should behave upon invalidation of its RegClass. See commit comments for 5adf2c8.
  • Better define and implement the DualState behavior of TupleTableSlot.
  • Reduce the C-centricity of VarlenaWrapper. Goal: DatumUtils.mapVarlena doing more in Java, less in C.
    • more of VarlenaWrapper's functionality moved to DatumImpl
    • client code no longer casting Datum.Input to VarlenaWrapper to use it.
  • Adapter should have control over the park/fetch/decompress/lifespan decisions for VarlenaWrapper; currently the behavior is hardcoded for top-transaction lifespan, lazy detoasting, appropriate for SQLXML, which was the first VarlenaWrapper client.
  • Add MBeans with statistics for the new caches

And then

  • Choose some interesting JVM language foo and implement a simple PL/foo in pure Java, using these facilities.
  • Reimplement PL/Java's own language handler the same way.

Commit messages

Tweak invocation.c so the stack-allocated space provided by the caller
is used to save the prior state rather than to construct the new state.
This way, the current state can have a fixed address (currentInvocation
is a constant pointer) and can be covered by a single static
ByteBuffer that Invocation.java can read/write through without relying
on JNI methods.

As Invocation isn't a JDBC-specific concept or class, it has never
made much sense to have it in the .jdbc package. Move it to .internal.

Both values have just been stashed by stashCallContext.
Both will be restored 14 lines later by _closeIteration.
And nothing in those 14 lines cares about them.

After surveying the code for where function return values can
be constructed, add one switchToUpperContext() around the construction
of non-composite SRF return values, where it was missing, so such values
can be returned correctly after SPI_finish(), and so the former,
very hacky, cross-invocation retention of SPI contexts can be sent
to pasture.

For the record, these are the notes from that survey of the code:

Function results, non-set-returning:
 Type_invoke:
  the inherited _Type_invoke calls ->coerceObject, within sTUC.
  sub"class"es that override it:
   Boolean,Byte,Double,Float,Integer,Long,Short,Void:
   - overridden in order to use appropriately-typed JNI invoke method
   - Double,Float,Long have _asDatum that does sTUC;
     . historical artifact; those types were !byval before PG 8.4
   - the rest do not sTUC; should be ok, all byval
   Coerce: does sTUC
   Composite: does sTUC around _getTupleAndClear
 Arrays:
  createArrayType (extern, in Array.c) does sTUC. So far so good.
  What about !byval elements stored into the array?
   the non-primitive/any types don't override _Array_coerceObject,
   which is where Type_coerceObject on each element, and construct_md_array
   are called. With no sTUC. Around construct_md_array is really where it's
   needed.
   But then, _Array_coerceObject is still being called within sTUC
   of _Type_invoke. All good.
   Hmm: !byval elements of values[] are leaked when pfree(values) happens.
   They should be pfree'd unconditionally; construct_md_array copies them.
 What about UDTs?
  They don't override _Type_invoke.
  So they inherit the one that calls ->coerceObject, within sTUC.
  That ought to be enough. UDT.c's coerceScalarObject itself also sTUCs,
  inconsistently, for fixed-length and varlena types but not NUL-terminated.
  That should be ok, and merely redundant. In coerceTupleObject, no sTUC
  appears. Again, by inheritance of coerceObject, that should be ok.
  Absent that, sTUC around the SQLOutputToTuple_getTuple should be adequate;
  only if that could produce a tuple with TOAST pointers would it also be
  necessary around the HeapTupleGetDatum.


Function results, set-returning:
 _datumFromSRF is applied to each row result
 The inherited _datumFromSRF calls Type_coerceObject, NOT within sTUC
  XXX this, at least, definitely needs a sTUC added.
 sub"class"es that override it:
  only Composite: calls _getTupleAndClear, NOT within sTUC. But it
  works out, just because TupleDesc.java's native _formTuple method uses
  JavaMemoryContext. Spooky action at a distance?


Results from triggers:
 Function.c's invokeTrigger does sTUC around the getTriggerReturnTuple.

In passing, fix a long-standing thinko in Invocation_popInvocation:
the memory context that was current on entry is stored in upperContext
of *this* Invocation, but popInvocation was 'restoring' the one that was
saved in the *previous* Invocation.

Also in passing, move the cleanEnqueuedInstances step later in the
pop sequence, improving its chance of seeing instances that could become
unreachable through the release of SPI contexts or the JNI local frame.
This can reveal issues with the nesting of SPI 'connections' or
management of their associated memory contexts.

Without the special treatment, the instance of the Java class
Invocation, if any, that corresponds to the C Invocation, has its
lifetime simply bounded to that of the C Invocation, rather than
artificially extended across a sequence of SRF value-per-call
invocations. It is simpler, does not break any existing tests, and
is less likely to be violating PostgreSQL assumptions on correct
behavior.

The commits merged here into this branch simplify PL/Java's management
of the PostgreSQL-to-PL/Java-function invocation stack, and especially
simplify the handling of SPI (PostgreSQL's Server Programming Interface)
and set-returning functions.

SPI includes "connect" and "finish" operations normally used in a simple
pattern: connect before using SPI functions, finish when done and before
returning to the caller, and if anything allocated while "connected" is
to be returned to the caller, be sure to allocate that in the "upper
executor" memory context (that is, the context that was current before
SPI_connect).

PL/Java has long diverged from that approach, especially for the case
of set-returning functions using the value-per-call protocol (the only
one PL/Java currently supports). If SPI was connected during one call
in the sequence, PL/Java has sought to save and reuse that connection
and its memory contexts over later calls (where a simpler, "by the book"
implementation would simply SPI_connect and SPI_finish within the
individual calls as needed).

It never seemed altogether clear that was a good idea, but at the same
time there weren't field reports of failure. It turns out, though, that
it is not hard to construct tests showing the apparent success was all
luck.

It has not been much trouble to reorganize that code so that SPI is used
in the much simpler, by-the-book fashion. b2094ba fixes one place where
a needed switchToUpperContext was missing but the error was masked
by the former SPI juggling, and with that fixed, all the tests in
the CI script promptly passed, with SPI used in the purely nested way
that it expects.

One other piece of complexity that has been removed was the handling of
Java Invocation objects during set-returning functions. Although
the stack-allocated C invocation struct naturally lasts only through one
actual call, PL/Java's SRF code took pains to keep its Java counterpart
alive, as if the one instance represented the entire sequence of actual
calls while returning a set. Eliminating that behavior has simplified
the code and shown no adverse effect in the available tests.

As these are changes of some significance that might possibly alter
some behavior not tested here, they have not been made in the 1.6 or
1.5 branches. But the simplification seems to make a less brittle base
for the development going forward on this branch.

CacheMap is a generic class useful for (possibly weak or soft)
canonicalizing caches of things that are identified by one or more
primitive values. (Writing the key values into a ByteBuffer avoids
the allocation involved in boxing them; however, the API as it
currently stands might be exceeding that cost with instantiation
of lambdas. It should eventually be profiled, and possibly revised
into a less tidy, but more efficient, form.)

SwitchPointCache is intended for lazily caching numerous values
of diverse types, groups of which can be associated with a single
SwitchPoint for purposes of invalidation.

As currently structured, the SwitchPoints (and their dependent
GuardWithTest nodes) do not get stored in static final fields;
this may limit HotSpot's ability to optimize them as fully as
it could if they did.

Adapter is the abstract ancestor of all classes that implement
PostgreSQL datatypes for PL/Java, and the adt.spi package contains
classes that will be of use to datatype-implementing code:
in particular, Datum. PostgreSQL datums are only exposed
to Adapters, and the Adapter's job is to reliably convert between
the PostgreSQL type and some appropriate Java representation.

For some datatypes, there is a single or obvious appropriate Java
representation, and an Adapter may be provided that simply produces
that. For other datatypes, there may be no single obvious choice
of Java representation, either because there is no good match or
because there are several; an application might want to map types
to specialized classes available in some domain-specific library.
To serve those cases, Adapters can be defined in terms of
Adapter.Contract subinterfaces, which are simply functional interfaces
that document and expose the semantic components of the PostgreSQL
type. For example, a contract for PostgreSQL INTERVAL would expose
a 64-bit microseconds component, a 32-bit day count, and a 32-bit
month count. The division of responsibility is that the Adapter
encapsulates how to extract those components given a PostgreSQL
datum, but the contract fixes the semantics of what the components
are. It is then simple to use the Adapter, with any lambda that
conforms to the contract, to produce any desired Java representation
of the type.

Dummy versions of Attribute, RegClass, RegType, TupleDescriptor,
and TupleTableSlot break ground here on the model package, which
will consist of a set of classes modeling key PostgreSQL abstractions
and a useful subset of the PostgreSQL system catalogs.

RegType also implements java.sql.SQLType, making it usable in
(a suitable implementation of) JDBC to specify PostgreSQL types
precisely.

adt.spi.AbstractType needs the specialization() method that was
earlier added to internal.Function in anticipation of needing it
someday.

The org.postgresql.pljava.adt package contains 'contracts'
(subinterfaces of Adapter.Contract.Scalar or Adapter.Contract.Array),
which are functional interfaces that document and expose the exact
semantic components of PostgreSQL data types.

Adapters are responsible for the internal details of PostgreSQL's
representation that aren't semantically important, and code that
simply needs to construct some semantically faithful representation
of the type only needs to be concerned with the contract.

CharsetEncoding is not really a catalog object (the available
encodings in PostgreSQL are hardcoded) but is exposed here as
a similar kind of object with useful operations, including
encoding and decoding using the corresponding Java codec when
known.

CatalogObject is, of course, the superinterface of all things
that really are catalog objects (identified by a classId, an objectId,
and rarely a subId). This commit brings in RegNamespace and RegRole
as needed for CatalogObject.Namespaced and CatalogObject.Owned.
RolePrincipal is a bridge between a RegRole and Java's Principal
interface.

CatalogObject.Factory is a service interface 'used' by the API
module, and will be 'provided' by the internals module to supply
the implementations of these things.

And convert other code to use CharsetEncoding.SERVER_ENCODING
where earlier hacks were used, like the implServerCharset()
added to Session in 1.5.1.

In passing, fix a bit of overlooked java7ification in SQLXMLImpl.

The new CharsetEncodings example provides two functions:

SELECT * FROM javatest.charsets();

returns a table of the available PostgreSQL encodings, and what Java
encodings they could be matched up with.

SELECT * FROM javatest.java_charsets(try_aliases);

returns the table of all available Java charsets and the PostgreSQL ones
they could be matched up with, where the boolean try_aliases indicates
whether to try Java's known aliases for a charset when nothing in
PostgreSQL matched its canonical name. False matches happen when
try_aliases is true, so that's not a great idea.

These PostgreSQL notions will have to be available to Java code
for two reasons.

First, even code that has no business poking at them can still need
to know which one is current, to set an appropriate lifetime on
a Java object that corresponds to something in PostgreSQL allocated
in that context or registered to that owner. For that purpose, they
both will be exposed as subtypes of Lifespan, and the existing
PL/Java DualState class will be reworked to accept any Lifespan to
bound the validity of the native state.

Second, Adapter code could very well need to poke at such objects
(MemoryContexts, anyway): either to make a selected one current for
when allocating some object, or even to create and manage one.
Methods for that will not be exposed on MemoryContext or ResourceOwner
proper, but could be protected methods of Adapter, so that only
an Adapter can use them.

In addition to MemoryContextImpl and ResourceOwnerImpl proper, this step
will require reworking DualState so state lives are bounded by Lifespan
instances instead of arbitrary pointer values. Invocation will be made
into yet another subtype of Lifespan, appropriate for the life of an
object passed by PostgreSQL in a call and presumed good while the call
is in progress.

The DualState change will have to be rototilled through all of its
clients. That will take the next several commits.

The DualState.Key requirement that was introduced in 1.5.1 as a way to
force DualState-guarded objects to be constructed only in upcalls from C
(as a hedge against Java code inadvertently doing it on the wrong
thread) will go away. We *want* Adapters to be able to easily construct
things without leaving Java. Just don't do it on the wrong thread.

Though never very well publicized upstream, the examples of plpgsql,
plperl, and plpython show that, when using BeginInternalSubTransaction,
there is a certain pattern of saving and restoring the memory context
and resource owner that PL/Java has not been doing.

Now it is easy to implement that.

https://www.postgresql.org/message-id/619EA06D.9070806%40anastigmatix.net

The current invocation can be the right Lifespan to specify for
a DualState that's guarding some object PostgreSQL passed in to
the call, which is expected to be good for as long as the call
is in progress.

In other, but related, news, Invocation can now return the
"upper executor" memory context: that is, whatever context was
current at entry, even if a later use of SPI changes the context
that is current.

It can appear tempting to eliminate the special treatment of PgSavepoint
in Invocation, and simply make it another DualState client, but because
of the strict nesting imposed on savepoints, keeping just the one
reference to the first one set suffices, and is more efficient.

Simplify these: their C callers were passing unconditional null
as the ResourceOwner before, which their Java constructors passed
along unchanged. Now just have the Java constructor pass null
as the Lifespan.

These DualState clients were previously passing the address of
the current invocation struct as their "resource owner", again from
the C code, passed along by the Java constructor. Again simplify
to call Invocation.current() right in the Java constructor and use
that as the Lifespan.

On a side note, the legacy Relation class included here (and its
legacy Tuple and TupleDesc) will naturally be among the first
candidates for retirement when this new model API is ready.
This legacy Portal class is called from C and passed the address
of the PostgreSQL ResourceOwner associated with the Portal itself.

This is only an intermediate refactoring of VarlenaWrapper.
Construction of one is still set in motion from C. Ultimately,
it should implement Datum and be something that a Datum.Accessor
can construct with a minimum of fuss.

Originally a hedge against coding mistakes during the introduction
of DualState for 1.5.1 (which had to support Java < 9), it is less
necessary now that the internals are behind JPMS encapsulation, and
the former checks for the cookie can be replaced with assertions that
the action is happening on the right thread. The CI tests run with
assertions enabled, so this should be adequate.

The commits grouped under this merge add API to expose in Java
the PostgreSQL notions of MemoryContext and ResourceOwner, and then
rework PL/Java's DualState class (which manages objects that combine
some Java state and some native state, and may need specified actions
to occur if the Java state becomes unreachable or explicitly released
or if a lifespan bounding the native state expires). A DualState now
accepts a Lifespan, of which MemoryContext and ResourceOwner are both
subtypes. So is Invocation, an obvious lifespan for things PostgreSQL
passes in that are expected to be valid for the duration of the call.

The remaining commits in this group propagate the changes through
the affected legacy code.

Fitting it into the new scheme is not entirely completed here;
for example, newReadable takes a Datum.Input parameter, but still
casts it internally to VarlenaWrapper.Input. Making it interoperate
with any Datum.Input may be a bit more work.

Likewise, newReadable with synthetic=true still encapsulates all
the knowledge of what datatypes there is synthetic-XML coverage
for and selecting the right VarlenaXMLRenderer for it (there's
that varlena-specificity again!). More of that should be moved
out of here and into an Adapter.

In passing, fix a couple typos in toString() methods, and add
a serviceable, if brute-force, getString() method to Synthetic.
It would be better for SyntheticXMLReader to gain the ability to
produce character-stream output efficiently, but until that
happens, there needs to be something for those moments when you
just want a string to look at and shouldn't have to fuss to get it.

For now, VarlenaWrapper.Input and .Stream still extend, and add small
features like toString(Object) to, DatumImpl. Later work can probably
migrate those bits so VarlenaWrapper will only contain logic specific
to varlenas.

An adt.spi interface Verifier is added, though Datum doesn't yet
expose any way to use it; in this commit, only one method accepting
Verifier.OfStream is added in DatumImpl.Input.Stream, the minimal
change needed to get things working.

As before, JNI methods for this 'model' framework continue to
be grouped together in ModelUtils.c; their total number and
complexity is expected to be low enough for that to be practical,
and then they can all be seen in one place.

RegClassImpl and RegTypeImpl acquire m_tupDescHolder arrays in
this commit, without much explanation; that will come a few commits
later.

There are two flavors so far, Deformed and Heap. Deformed works
with whatever a real PostgreSQL TupleTableSlot can work with,
relying on the PostgreSQL implementation to 'deform' it into
separate datum and isnull arrays. (That doesn't have to be a
PostgreSQL 'virtual' TupleTableSlot; it can do the deforming
independently of the type of slot. When the time comes to
implement the reverse direction and produce tuples, a virtual
slot will be the way to go for that, using the PostgreSQL C code
to 'form' it once populated.)

The Heap flavor knows enough about that PostgreSQL tuple format
to 'deform' it in Java without the JNI calls (except where some
out-of-line value has to be mapped, or for varlena values until
VarlenaWrapper sheds more of its remaining JNI-centricity). The
Heap implementation does not yet do anything clever to memoize
the offsets into the tuple, which makes the retrieval of all
the tuple's values an O(n^2) proposition; there is a
low-hanging-fruit optimization opportunity there. For now, it gets
the job done.

It might be interesting to see how the two flavors compare on
typical heap tuples: Deformed, making more JNI calls but relying
on PostgreSQL's fast native deforming, or Heap, which can avoid
more JNI calls, and also avoids deforming something into a fresh
native memory allocation if the only thing it will be used for is
to immediately construct some Java object.

The Heap flavor can do one thing the Deformed flavor definitely
cannot: it can operate on heap-tuple-formatted contents of an
arbitrary Java byte buffer, which in theory might not even be
backed by native memory. (Again, for now, this is slightly science
fiction where varlena values are concerned, because VarlenaWrapper
retains a lot of its native dependencies. A ByteBuffer "heap tuple"
with varlenas in it will have to be native-backed for now.) The
selection of the DualState guard by heapTupleGetLightSlot() is
currently more hardcoded than that would suggest; it assumes the
buffer is mapping memory that can be heap_free_tuple'd.

The 'light' in heapTupleGetLightSlot really means that there isn't
an underlying PostgreSQL TupleTableSlot constructed.

The whole business of how to apply and use DualState guards on these
things still needs more attention.

There is also Heap.Indexed, which is the thing needed for arrays.
When the element type is fixed-length, it achieves O(1) access
(plus null-bitmap processing if there are nulls). It uses a "count
preceding null bits ahead of time" strategy that could also easily
be adopted in Heap.

A NullableDatum flavor is also needed, which would be the thing for
mapping (as one prominent example) function-call arguments.

The HeapTuples8 and HeapTuples4 classes at the end are scaffolding
and ought to be factored out into something with a decent API, as
hinted at in the comment preceding them.

A Heap instance still inherits the values/nulls array fields used
in the deformed case, without (at present) making any use of them.
It is possible some use could be made (as, again, an underlying PG
TupleTableSlot could be used in deforming a heap tuple), but it's
also possible that won't ever be needed, and the class could be
refactored to a simpler form.

Here's how this is going to work.

The "exists because mentioned" aspect of a CatalogObject is
a lightweight operation, just caching/returning a singleton with
the mentioned values of classId/objId/(subId?).

For a bare CatalogObject (objId unaccompanied by classId), that's
all there is. But for any CatalogObject.Addressed subtype, the
classId and objId together identify a tuple in a particular system
catalog (or, that is, identify a tuple that could exist in that
catalog). And the methods on the Java class that return information
about the object get the information by fetching attributes from
that tuple, then constructing whatever the Java representation
will be.

Not to duplicate the work of fetching (the tuple itself, and then
an attribute from the tuple) and constructing the Java result, an
instance will have an array of SwitchPointCache-managed "slots"
that will cache, lazily, the constructed results. Five of those
slots have their indices standardized right here in CatalogObjectImpl,
to account for the name, namespace, owner, and ACL of objects that
have those things. Slot 0 is for the tuple itself.

When an uncached value is requested, the "computation method" set up
for that slot will execute (always on the PG thread, so it can
interact with PostgreSQL with no extra ceremony). Most computation
methods will begin by calling cacheTuple() to obtain the tuple
itself from slot 0, and then will fetch the wanted attribute from it
and construct the result. The computation method for cacheTuple(),
in turn, will obtain the tuple if that hasn't happened yet, usually
from the PostgreSQL syscache. We copy it to a long-lived memory
context where we can keep it until its invalidation.

The most common way the cacheTuple is fetched is by a one-argument
syscache search by the object's Oid. When that is all that is needed,
the Java class need only implement cacheId() to return the number
of the PostgreSQL syscache to search in. For exceptional cases
(attributes, for example, require a two-argument syscache search),
a class should just provide its own cacheTuple computation method.

The slots for an object are associated with a Java SwitchPoint,
and the mapping from the object to its associated SwitchPoint
is a function supplied to the SwitchPointCache.Builder. Some
classes, such as RegClass and RegType, will allocate a SwitchPoint
per object, and can be selectively invalidated. Otherwise, by
default, the s_globalPoint declared here can be used, which will
invalidate all values of all slots depending on it.

RegClass and RegType are the two CatalogObjects with tupleDescriptor()
methods.

You can get strictly more tuple descriptors by asking RegType;
a RegType.Blessed can give you a tuple descriptor that has been
interned in the PostgreSQL typcache and corresponds to nothing
in the system catalogs. But whenever a RegType t is an ordinary
cataloged composite type or the row type of a cataloged relation,
then there is a RegClass c such that c == t.relation() and
t == c.type(), and you will get the same tuple descriptor from
the tupleDescriptor() method of either c or t.
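
In code form, the relationship reads (a sketch, assuming t is an
ordinary cataloged composite type):

    RegClass c = t.relation();
    assert t == c.type();        // the handshake caches both directions
    assert c == t.relation();
    // and tupleDescriptor() on either c or t yields the same descriptor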

In all but one such case, c delegates to c.type().tupleDescriptor()
and lets the RegType do the work, obtaining the descriptor from
the PG typcache.

The one exception is when the tuple descriptor for pg_class itself
is wanted, in which case the RegClass does the work, obtaining the
descriptor from the PG relcache, and RegType delegates to it for
that one exceptional case. The reason is that RegClass will see
the first request for the pg_class tuple descriptor, and before that
is available, c.type() can't be evaluated.

In either case, whichever class looked it up, a cataloged tuple
descriptor is always stored on the RegClass instance, and RegClass
will be responsible for its invalidation if the relation is altered.
(A RegType.Blessed has its own field for its tuple descriptor,
because there is no corresponding RegClass for one of those.)

Because of this close connection between RegClass and RegType,
the methods RegClass.type() and RegType.relation() use a handshake
protocol to ensure that, whenever either method is called, not only
does it cache the result, but its counterpart for that result instance
caches the reverse result, so the connection can later be traversed
in either direction with no need for a lookup by oid.

In the static initializer pattern introduced here, the handful of
SwitchPointCache slots that are predefined in CatalogObject.Addressed
are added to, by starting an int index at Addressed.NSLOTS,
incrementing it to initialize additional slot index constants, then
using its final value to define a new NSLOTS that shadows the original.
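
A sketch of that pattern as a subclass might apply it (the slot names
here are illustrative):

    // Extend the slots predefined in CatalogObjectImpl.Addressed.
    static final int SLOT_SOMETHING;
    static final int SLOT_OTHERTHING;
    static final int NSLOTS; // shadows Addressed.NSLOTS with the new total
    static
    {
        int i = CatalogObjectImpl.Addressed.NSLOTS; // start past predefined
        SLOT_SOMETHING  = i++;
        SLOT_OTHERTHING = i++;
        NSLOTS = i;
    }
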
An Attribute is most often obtained from a TupleDescriptor
(in this API, that's how it's done), and the TupleDescriptor
can supply a version of Attribute's tuple directly; no need
to look it up anywhere else. That copy, however, cuts off
at ATTRIBUTE_FIXED_PART_SIZE bytes. The most commonly needed
attributes of Attribute are found there, but for others beyond
that cutoff, the full tuple has to be fetched from the syscache.

So AttributeImpl has the normal SLOT_TUPLE slot, used for the
rarely-needed full tuple, and also its own SLOT_PARTIALTUPLE,
for the truncated version obtained from the containing tuple
descriptor. Most computation methods will fetch from the partial
one, with the full one referred to only by the ones that need it.

It doesn't end there. A few critical Attribute properties, byValue,
alignment, length, and type/typmod, are needed to successfully fetch
values from a TupleTableSlotImpl.Heap. So Attribute cannot use that
API to fetch those values. For those, it must hardcode their actual
offsets and sizes in the raw ByteBuffer that the containing tuple
descriptor supplies, and fetch them directly. So there is also
a SLOT_RAWBUFFER.

This may sound more costly in space than it is. The raw buffer,
of course, is just a ByteBuffer sliced off and sharing the larger
one in the TupleDescriptor, and the partial tuple is just a
TupleTableSlot instance built over that. The full tuple is another
complete copy, but only fetched when those less-commonly-needed
attributes are requested.

With those key values obtained from the raw buffer, the Attribute's
name does not require any such contortions, and can be fetched using
the civilized TupleTableSlot API, except it can't be done by name,
so the attribute number is used for that one.

An AttributeImpl.Transient holds a direct reference to
the TupleDescriptor it came from, which its containingTupleDescriptor()
method returns. An AttributeImpl.Cataloged does not, and instead holds
a reference to the RegClass for which it is defined in the system
catalogs, and containingTupleDescriptor() delegates to tupleDescriptor()
on that. If the relation has been altered, that could return an updated
new tuple descriptor.

RegClass is an easy choice, because those invalidations are also
the invalidations of TupleDescriptors, and because it has a nice
API; we are passed the oid of the relation to invalidate, so we
acquire the target in O(1).

(Note in passing: AttributeImpl is built on SwitchPointCache in
the pattern that's emerged for CatalogObjects in general, and an
AttributeImpl.Cataloged uses the SwitchPoint of the RegClass, so
it's clear that all the attributes of the associated tuple
descriptor will do the right thing upon invalidation. In contrast,
TupleDescriptorImpl itself isn't quite built that way, and the
question of just how a TupleDescriptor itself should act after
invalidation hasn't been fully nailed down yet.)

RegType is probably also worth invalidating selectively, as is
probably RegProcedure (procedures are mainly what we're about
in PL/Java, right?), though only RegType is done here.

That API is less convenient; we are passed not the oid but a hash
of the oid, and not the hash that Java uses. The solution here is
brute force, to get an initial working implementation. There are
plenty of opportunities for optimization.

One idea would be to use a subclass of SwitchPoint that would set
a flag, or invoke a Runnable, the first time its guardWithTest
method is called. If that hasn't happened, there is nothing to
invalidate. The Runnable could add the containing object into some
data structure more easily searched by the supplied hash. Transitions
of the data structure between empty and not-empty could be propagated
to a boolean in native memory, where the C callback code could avoid
the Java upcall entirely if there is nothing to do. This commit
contains none of those optimizations.

Factory.invalidateType might be misnamed; it could be syscacheInvalidate
and take the syscache id as another parameter, and then dispatch to
invalidating a RegType or RegProcedure or what have you, as the case
may be.

At least, that would be a more concise implementation than providing
separate Java methods and having the C callback decide which to call.
But if some later optimization is tracking anything-to-invalidate?
separately for them, then the C code might be the efficient place
for the check to be done.

PostgreSQL has a limited number of slots for invalidation callbacks,
and requires a separate registration (using another slot) for each
syscache id for which callbacks are wanted (even though you get
the affected syscache id in the callback?!). It would be antisocial
to grab one for every sort of CatalogObject supported here, so we
will have many relying on CatalogObject.Addressed.s_globalPoint
and some strategy for zapping that every so often. That is not
included in this commit. (The globalPoint exists, but there is
not yet anything that ever zaps it.)

Some imperfect strategy that isn't guaranteed conservative might
be necessary, and might be tolerable (PL/Java has existed for years
with less attention to invalidation). An early idea was to zap the
globalPoint on every transaction or subtransaction boundary, or when
the command counter has been incremented; those are times when
PostgreSQL processes invalidations. However, invalidations are also
processed any time locks are acquired, and that doesn't sound as if
it would be practical to intercept (or as if the resulting behavior
would be practical, even if it could be done).

Another solution approach would just be to expose a zapGlobalPoint
knob as API; if some code wants to be sure it is not seeing something
stale (in any CatalogObject we aren't doing selective invalidation for),
it can just say so before fetching it.

By allowing occasional gaps in the otherwise-consecutive
IDX_... values, new constants can be added as needed, and
kept in coherent groupings, with a smaller blast radius in
version control (and fewer merge conflicts for other branches
or forks), by avoiding extensive renumbering of otherwise
untouched members.

Given support for gaps in the ModelConstants IDX_... values,
renumber and slightly regroup the constants, with an eye
toward reducing the blast radius of future additions when
needed.

In backporting, sometimes the git history shows that something
has always had the same type, but the type was plain int rather
than an explicit-width type. So, for such things, there is no
need for a plethora of SIZEOF_FOO constants, but SIZEOF_INT may
be generally useful to detect if a platform has a surprising
value for that width.

The name andIf could be misread as suggesting some kind of
boolean dependency on what went before, when really each
alsoIf only cares about its own predicate.

The legacy dispatcher has needed to look up the language in
the catalog anyway, and so will the new dispatcher, and the
only use formerly being made of the 'trusted' value gleaned
from the entry point, and assiduously passed along, was in an
assertion that it wasn't different from what we found in the
catalog. (That, and bifurcating the "save the first oid that
refers to PL/Java" logic into two cases, which were everywhere
used in the pattern "nothing saved in this one? ok try that one".)

There still are the paired entry points, they just don't do
anything different. Changing the SQL declarations can be for
another day.

A handful of functions to get information about the Function
pointed to by the current Invocation were all in Function.c,
but for one living all alone in Invocation.c.

Centralize them all in Function, to simplify adding a new
dispatcher that won't be using C Function structs for its
dispatched routines.

This commit introduces a ByteBuffer[] _window method in Backend
that can be used to window various miscellaneous PG globals that
don't obviously belong someplace else. (The technique is already
used in MemoryContextImpl and ResourceOwnerImpl, for example, for
several globals in obvious groupings, but there may as well just
be one place consolidating uses for less closely-related things.)

The first such global is check_function_bodies. There may be
several existing JNI downcalls from Backend that could use this
technique instead, as a future opportunity for tidying.
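A sketch of the Java-side read, assuming window is the direct
ByteBuffer obtained (hypothetically, via _window) over
check_function_bodies:

import java.nio.ByteBuffer;

class Sketch
{
	// read the C bool in place; no JNI downcall needed per access
	static boolean checkFunctionBodies(ByteBuffer window)
	{
		return 0 != window.get(0);
	}
}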
Add a few casts to CatalogObjectImpl in key places to avoid trying to
interoperate with outside implementations of CatalogObject interfaces,
and improve an exception message that was very preliminary.

The non-private-ness of one CatalogObjectImpl.Addressed constructor was
a crutch while some subclass implementations were only partially
complete.

Narrow an overbroad SuppressWarnings("unchecked").

Have CatalogObjectImpl.toString report the name of the API interface,
not the internal class.

Fix a thinko in AttributeImpl.Transient.equals, and watch for edge cases
in RegType/RegClass tupleDescriptor methods (for example, any regular
relation has a row type, but an index or toast relation does not).

TRIGGER will be a useful type to have around, and the equality of
SIZEOF_Oid and Integer.BYTES ought to be asserted at least somewhere.
In interfacing with the single-threaded PostgreSQL backend,
there are many uses for a class with the behavior of List but
that does not invite unintended parallelism through the stream API.
A Spliterator is allowed to never report that it can split, so
an AbstractNoSplitList simply returns such a spliterator.
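A minimal sketch of the idea (not the actual implementation): extend
AbstractList, and hand back a Spliterator whose trySplit always
declines.

import java.util.AbstractList;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.function.Consumer;

abstract class AbstractNoSplitList<E> extends AbstractList<E>
{
	@Override
	public Spliterator<E> spliterator()
	{
		return new Spliterator<E>()
		{
			private final Iterator<E> it = iterator();
			private long remaining = size();

			@Override
			public boolean tryAdvance(Consumer<? super E> action)
			{
				if ( ! it.hasNext() )
					return false;
				action.accept(it.next());
				--remaining;
				return true;
			}

			@Override
			public Spliterator<E> trySplit()
			{
				return null; // never offer to split: no surprise parallelism
			}

			@Override
			public long estimateSize()
			{
				return remaining;
			}

			@Override
			public int characteristics()
			{
				return ORDERED | SIZED;
			}
		};
	}
}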
The PostgreSQL catalogs can contain empty strings in some
contexts where a name might not be provided (for example, when
pg_proc.proargnames is nonnull because some parameters have names,
but not all of them do). So let Identifier.Simple.fromCatalog
(but not other methods) accept an empty string, returning
the None instance.
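Illustratively:

// an empty name from, e.g., pg_proc.proargnames yields the None
// instance rather than an exception
Identifier.Simple id = Identifier.Simple.fromCatalog("");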
Adds a RegType.needsResolution() method, true for the various
PostgreSQL polymorphic pseudotypes, evaluated cheaply without
having to materialize anything from the catalog.
For a modern dispatcher to PL/Java-based languages, a consistent
representation for the number, names, and types of incoming
parameters and expected results will be wanted. TupleDescriptor
fits the bill. For incoming parameters, the catalogs identify
the names and types (no typmods), and this method will serve
to gin up an ephemeral TupleDescriptor based on those.

PostgreSQL already offers some funcapi methods for getting a
tuple descriptor for the expected outputs, but only when the
routine is really expected to return a composite. For functions
returning a non-composite PostgreSQL type, we will still want
to synthesize a one-attribute TupleDescriptor of that expected
type, so that Java code will always consistently produce a result
by storing something into a TupleTableSlot. (It will be the
dispatcher's job, then, in the non-composite case, to grab the
value from the one-column TupleTableSlot and return it.)
While adding T_Bitmapset to ModelConstants, also add the other node
tags expected to crop up in routine invocation.
This is the flavor of TupleTableSlot that will enable a routine
to access its incoming parameters (number, name, and type) using
the same API as for query results.
RegProcedure and ProceduralLanguage are two more object
classes we'd like to cache as persistently as practical,
so it is worth using two more callback slots to be able to
invalidate those selectively.

Renames Factory.invalidateType to syscacheInvalidate with a
cacheId parameter, as suggested in 5adf2c8. Many other
optimization opportunities suggested in that commit are still
left on the table.

The precedent set back in 2e74a6b of final SwitchPoint[] foo
= new SwitchPoint[] { new SwitchPoint() }, and then ignored
in 5adf2c8, seems to be the right one after all. After the
initial construction, it is rather tidy that compute methods
run on the PG thread, invalidations come from the PG thread,
and SwitchPoints impose some order on who sees what when.
It's that initial instantiation, meant as a cheap 'whenever
mentioned' operation, where the final-field semantics are
of help. RegType is therefore also fixed to match.
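Sketched, the idiom looks like this (field and method names here are
not the real ones):

import java.lang.invoke.SwitchPoint;

class Holder
{
	/*
	 * A final one-element array: the final field gives safe publication
	 * of the initial SwitchPoint no matter which thread first mentions
	 * the object, while the PG thread remains free to swap element 0.
	 */
	private final SwitchPoint[] m_sp = { new SwitchPoint() };

	SwitchPoint current()
	{
		return m_sp[0]; // compute methods bind their results to this
	}

	void invalidate() // PG thread only
	{
		SwitchPoint old = m_sp[0];
		m_sp[0] = new SwitchPoint();
		SwitchPoint.invalidateAll(new SwitchPoint[] { old });
	}
}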

Also add tests in the CatalogObjects example for invalidation
of the four object classes now expected to support it.

TupleDescImpl now also has a notion of invalidation, which can only
happen to the Cataloged flavor, and happens upon invalidation of the
corresponding RegClass. Unlike catalog objects, a TupleDescriptor
after invalidation won't magically have updated values; it will just
throw an informative IllegalStateException.
The vague early idea that a RegProcedure ought to carry a memo still
seems useful, and just how it should be useful is becoming clearer.

There may be no need for user code to apply any memos; remove the
apply() method from the API for now. Also, while RegProcedure declares
the Memo interface, the subinterface PLJavaBased seems more at home in
ProceduralLanguage, where Handler, InlineHandler, and Validator already
are.

The two memo subtypes Validator and PLJavaBased will clearly be of use.
When a RegProcedure p is determined to be a validator, it can be given a
Validator memo with a direct reference to the ProceduralLanguage it is
the validator for (contrast p.language() which is the ProceduralLanguage
that p is implemented in, and in this design will have to be PL/Java's
handler 'language'). Likewise, a PLJavaBased memo can be attached to a
RegProcedure p when p is determined to be implemented in some PL/Java-
based language pl. The memo does not need to hold a reference to pl--
p.language() is perfectly suited to this case--but it can factor out
some complexity of invalidation, and also play a foreseeable API role,
with methods exposing PL/Java-specific information useful to a handler
implementation that goes beyond what every RegProcedure exposes from the
catalogs.

Because CatalogObjects are weakly cached, they can go away when you're
not looking, even without an invalidation notice from PostgreSQL. To be
useful for caching related information, objects that are of interest
ought to be kept live. A static set s_plJavaHandlers can serve as a
root, holding references to instances of the "PL/Java handler language"
(only one such instance is envisioned, but of course PL/Java supports
the idea of language aliases). Each "handler language" instance hl can
have a LanguageSet holding references to those procedural languages
whose validators are implemented in hl. And each of those language
instances can have a RoutineSet holding references to RegProcedure
instances dependent on that language. Thus are the RegProcedures and
ProceduralLanguages of interest kept live.

Those links have to be culled when invalidations happen. An invalidated
"handler language" should remove itself from s_plJavaHandlers and
propagate the invalidation to its dependent languages; an invalidated
ordinary language should remove itself from the dependent-languages set
of its associated "handler language" (for an ordinary language l, that
is l.validator().language()) and propagate the invalidation to its
dependent RegProcedures. An invalidated ordinary PL/Java-based
RegProcedure should remove itself from its ProceduralLanguage's
RoutineSet, while an invalidated RegProcedure that is a validator of
some language needs to invalidate that language.

The different invalidation behavior for a RegProcedure, determined by
whether it is a validator or an ordinary PL/Java-based routine, can be
handled neatly by delegating to whatever memo (Validator or PLJavaBased)
it carries. And of course every memo, on invalidation, removes itself
from its carrying RegProcedure.

To be clear, all of this is determined lazily: no RegProcedure is known
to be implemented in a PL/Java-based language, or to be a validator,
until it is encountered in that role while PL/Java is dispatching a
call. The attaching of memos
and entering into sets happens then. The two actions should be regarded
as duals; the reference held in some set to a dependent object and the
memo on that object should both exist, or not.
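A structural sketch of that keep-alive and invalidation scheme (class
and field names here are illustrative stand-ins, not the real
internals):

import java.util.HashSet;
import java.util.Set;

class HandlerLanguage
{
	static final Set<HandlerLanguage> s_plJavaHandlers = new HashSet<>();

	final Set<BasedLanguage> languageSet = new HashSet<>();

	void invalidate()
	{
		s_plJavaHandlers.remove(this);
		Set<BasedLanguage> ls = new HashSet<>(languageSet); // copy: the
		languageSet.clear(); // dependents will try to remove themselves
		ls.forEach(BasedLanguage::invalidate);
	}
}

class BasedLanguage
{
	HandlerLanguage handler; // l.validator().language() in the real scheme

	final Set<Object> routineSet = new HashSet<>(); // dependent routines

	void invalidate()
	{
		handler.languageSet.remove(this);
		routineSet.clear(); // each routine would also shed its memo
	}
}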

ProceduralLanguageImpl adds methods isPLJavaBased and isPLJavaHandler
that a dispatcher will be able to call when it needs to make the
(rather fiddly) sanity checks that a language instance is set up in the
expected way. These methods, when returning true, take care of adding
the language into the appropriate parent set (s_plJavaHandlers for a
handler language, or the parent handler language's LanguageSet for an
ordinary language). However, it will be the dispatcher's job, when these
methods return true, to make any links involving a RegProcedure and
attach an appropriate memo to it.

Here decreed in these isPLJavaBased / isPLJavaHandler methods is
that the C entry points for the new dispatcher will be named
pljavaDispatchRoutine, pljavaDispatchInline, and
pljavaDispatchValidator. A "PL/Java handler" language will be one
declared with no inline handler, and with both its call handler
and its validator handler pointed at (different SQL function
overloads with) the C entry point pljavaDispatchValidator.
So that TupleTableSlot may be used uniformly as the API for
Java <-> PostgreSQL data type conversions, let every type except
unmodified RECORD or VOID have a 'notional' TupleDescriptor.

For a cataloged or interned row type, or a domain over a cataloged row
type, it is that type's tupleDescriptor() (or that of the transitive
base type, in the case of a domain). Such a descriptor will be of type
TupleDescriptor.Interned. Otherwise, it is a TupleDescriptor.Ephemeral
whose one, unnamed, attribute has this type.

The idea is that every language handler will see a TupleTableSlot into
which a routine's results should be stored, even if just one column is
there. It will be up to the common dispatcher code to grok the specific
PostgreSQL rules, "a scalar gets returned, but OUT parameters make
a composite result, unless it's just one OUT parameter and that's treated
just like a scalar, but a polymorphic type later resolved to a one-column
composite isn't" and so on, and return what is in the TupleTableSlot to
PostgreSQL in the proper way.

This notional descriptor is not exposed in RegType API, but only on
RegTypeImpl for internal use. It will be exposed through a method on
a RegProcedure's PLJavaBased memo to get the routine's outputsTemplate.
Template, because what is computed here depends only on catalog
information, and may include polymorphic types needing resolution at
actual call sites.
Using the memo on a RegProcedure<PLJavaBased>, get a TupleDescriptor
describing the incoming parameters, the notional one describing the
expected results, and (because these are 'templates' that may include
polymorphic types that need later resolution), a BitSet for each,
indicating at which positions type resolution is needed. An empty
BitSet indicates the template will be exactly the descriptor seen
later at call sites.

Internally, these methods live on RegProcedureImpl, relying on its
SwitchPointCache slots for caching. The memo implementation simply
delegates to those. Alternatively, the implementations could be
moved there.

Naturally, these methods will not be usable until a dispatcher is
implemented that can know a PLJavaBased RegProcedure when it sees
one, and attach this memo to it.
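Purely as a shape sketch (outputsTemplate was named in the previous
commit; the other method names below are guesses):

import java.util.BitSet;

interface PLJavaBasedSketch // illustrative; not the real interface
{
	Object inputsTemplate();    // really a TupleDescriptor
	Object outputsTemplate();   // really the notional TupleDescriptor
	BitSet unresolvedInputs();  // empty: template is exact at call sites
	BitSet unresolvedOutputs();
}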

New interfaces on RegProcedure are Lookup (roughly corresponding to the
PostgreSQL per-call-site struct, FmgrInfo, usually passed around as
flinfo), and Call (like the PostgreSQL per-call struct usually passed
around as fcinfo and I'm not going to type that long struct name here).

Where the memo on a RegProcedure can give you 'template' tuple
descriptors based only on the catalog and possibly with some unresolved
types, on Lookup you find methods returning the same descriptors with
types all resolved according to the types at the call site. From Call,
of course, you can get the argument and result TupleTableSlots and fetch
the argument values (and store the results someday, when that direction
gets implemented for TupleTableSlot).

These interfaces, too, await a dispatcher that will supply instances of
them to your code.
Here is the interface PLJavaBasedLanguage, with its two subinterfaces
InlineBlocks and Routines (any PLJavaBasedLanguage must additionally
implement one or both of those). PostgreSQL's CREATE LANGUAGE
allows the inline handler to be optional, while the handler for
routines is mandatory. PL/Java won't mind, though, if a language
only implements InlineBlocks; CREATE LANGUAGE will still have to
mention pljavaDispatchRoutine, but only inline blocks will really
be allowed.

A staged-programming idiom is the approach for Routines. The prepare
method is invoked passing only the target RegProcedure, and should
return a Template that depends only on information available from
that RegProcedure and its PLJavaBased memo. The Template will be
cached with the RegProcedure itself.

At a new call site, the Template's specialize method will be applied
to the call site's Lookup instance, where it can refer to
call-site-specific information like the fully resolved argument and
result types, returning a Routine to be cached for as long as PostgreSQL
has not freed that call site. (Naturally, when there is nothing
polymorphic and no need to specialize, the Routine can be constructed
all at prepare() time, and the Template can just unconditionally return
it.)

For every call through a given call site, its cached Routine's call
method is applied to a Call instance that will supply the arguments
and (one day) accept the results.
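A skeletal Routines implementation under that model might look like
the following sketch. The interfaces here are hand-rolled stand-ins
(the real Template, Routine, Lookup, and Call signatures may differ),
and the source text is passed in directly rather than fetched from a
RegProcedure:

interface Lookup { }
interface Call { }
interface Routine { void call(Call fcinfo) throws Exception; }
interface Template { Routine specialize(Lookup flinfo) throws Exception; }

class TrivialRoutines // would implement PLJavaBasedLanguage.Routines
{
	public Template prepare(String src)
	{
		// stage 1: rely only on catalog-derived information
		return flinfo ->
		{
			// stage 2: call-site types are resolved in flinfo; nothing
			// polymorphic here, so no per-site variation is needed
			return fcinfo ->
			{
				// stage 3: per-call work; read arguments, store results
				System.out.println("would execute: " + src);
			};
		};
	}
}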
The caching of a Lookup object at a PostgreSQL call site (in
flinfo->fn_extra) will call for a flavor of DualState that can
delete a JNI global reference when the call site's memory context
is reset or deleted.

This seems to be the first DualState flavor whose nativeStateReleased
method actually does anything, which requires a little refinement of
some only-when-assertions-enabled checking that had never had to
deal with that before.
STILL UNIMPLEMENTED AT THIS STAGE: anything to do with context
classloader management or access control contexts. A language's
trusted bit or name doesn't matter yet; the code will run with
whatever is granted to the intersection of its and PL/Java's
codebases without any Principal-based grants. Also,
Function_currentLoader, Function_currentTypeMap, and
Function_isCurrentReadOnly won't yet work when the call didn't
go through the legacy dispatcher, which may well cause errors
or crashes in attempting to use JDBC or SPI.

An internal-only field is added to ProceduralLanguageImpl's
RoutineSet as a place to memoize the implementing class instance,
and one to the PLJavaMemo as a place to memoize the Template
generated for a RegProcedure.

The Call methods context() and resultinfo() will return objects
if PostgreSQL supplied corresponding nodes, but those classes are,
so far, only stubs with no useful methods.

Call.result() is unimplemented for now and throws an exception.
A writable TupleTableSlot is still future work.

Call.isNull(boolean) works, to determine whether the current
invocation will return null or void. For now, though, that's
unconditionally done by the dispatcher after the custom handler
returns, based on the type expected by PostgreSQL: null is
returned for any non-byValue type, to avoid an immediate crash,
and void is returned otherwise (some callers expecting void are
unexpectedly cranky if the void they get back is the null kind).
This corner of the implementation is still stopgap until writable
TupleTableSlot happens.
A comment on InlineBlocks.execute promises that PL/Java itself
will handle propagating the atomic/non-atomic status along to
the SPI layer, but that's still on the to-do list; it doesn't
happen yet.
Adds a new dispatcher to make possible multiple JVM languages
implemented atop PL/Java.
Serves me right for thinking to save a bit of time by putting
the new handler function declarations into an example for now,
and deferring the work of adding them to InstallHelper and
declaring a new schema version.

But of course a hardcoded library name in the example won't
pass CI when Mac and Windows spell library names differently.
So the work is still done in an example for now, but in a
plpgsql inline block not really any tidier than doing it in Java.
@jcflack (Contributor, Author) commented Oct 9, 2023

Dispatcher for multiple PLs implemented atop PL/Java

I had thought to continue ticking more of the other open-items boxes before doing the dispatcher, but for a change of scenery, here is the new dispatcher.

The first brand-new PL/Java-based procedural language is Glot64. It will probably never grow to rival Python or JavaScript in popularity, either because it can't do anything but write messages to standard output, or because you write your functions/procedures in base 64 :). So, here is a Glot64 function that writes Hello, world! on the server's standard output when called:

CREATE OR REPLACE FUNCTION javatest.hello()
 RETURNS void
 LANGUAGE glot64
AS 'SGVsbG8sIHdvcmxkIQo=';

The impatient may see Hello, world! immediately, using an inline code block:

DO LANGUAGE glot64 'SGVsbG8sIHdvcmxkIQo=';

The output won't be visible at all if the server's standard output is going to /dev/null or the like. But a test instance run in PL/Java's test harness, for example, will have its standard output going to the terminal.

In addition to the base-64-decoded source string, you will see other output from the glot64 language handler, which is really the point, for a demonstration example. The base-64 string is just for fun.

Glot64, like any PL/Java-based language, needs a language handler: namely, a class that implements the PLJavaBasedLanguage interface. Various methods on that interface are used for validating functions/procedures, compiling, specializing, and calling functions/procedures, and executing inline blocks (for a language that supports those).

After installing a jar containing the class that implements the language, use the name of that class to declare a validator function, using the language pljavahandler:

CREATE OR REPLACE FUNCTION javatest.glot64_validator(oid)
 RETURNS void
 LANGUAGE pljavahandler
AS 'org.postgresql.pljava.example.polyglot.Glot64'; -- class name

followed by CREATE LANGUAGE using that new function as the validator, along with PL/Java's existing routine and inline dispatcher functions as the other two handlers:

CREATE LANGUAGE glot64
 HANDLER sqlj.pljavaDispatchRoutine
 INLINE  sqlj.pljavaDispatchInline
 VALIDATOR javatest.glot64_validator;

Bear in mind that the very first still-unticked "open items" box at the top of this pull request is still:

The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor.

and that's why no PL/Java-based function or procedure can return any results yet. Returning results will be done by storing the value (or values) into the Call.result() TupleTableSlot, and the store direction doesn't work yet; hence Glot64 is limited to writing messages on standard output.

On the other hand, fetching from a TupleTableSlot is indeed working already, so a language handler can fetch values from the Call.arguments() TupleTableSlot using whatever Adapter is appropriate to each argument's type. The Glot64 language ignores passed arguments, but that's not a necessary limitation.
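In sketch form (names other than Call.arguments() are guesses, and the
Adapter must be one suited to the argument's type):

TupleTableSlot args = call.arguments();
Attribute first = args.descriptor().attributes().get(0);
int value = args.get(first, adapter);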

Also, of course, all the other unticked boxes in that open-items list are still unticked, so plenty of work remains. But the dispatcher is here, and the PLJavaBasedLanguage interface, enough to begin experimenting with the development of language handlers for languages of interest.
