Work-in-Progress on PL/Java refactoring, API modernization #399

Open · wants to merge 120 commits into base: REL1_7_STABLE

Conversation

@jcflack (Contributor) commented Jan 24, 2022

As a work-in-progress pull request, this is not expected to be imminently merged, but is here to document the objectives and progress of the ongoing work.

Why needed

A great advantage promised by a PL based on the JVM is the large ecosystem of languages other than Java that can be supported on the same infrastructure, whether through the Java Scripting (JSR 223) API, or through the polyglot facilities of GraalVM, or simply via separate compilation to the class file format and loading as jars.

However, PL/Java, with its origins in 2004 predating most of those developments, has architectural limitations that stand in the way.

JDBC

One of the limitations is the centrality of the JDBC API. To be sure, it is a standard in the Java world for access to a database, and for PL/Java to conform to ISO SQL/JRT, the JDBC API must be available. But it is not necessarily a preferred or natural database API for other JVM or GraalVM languages, and its design goal is to abstract away from the specifics of an underlying database, which ends up complicating or even preventing access to advanced PostgreSQL capabilities that could be prime drivers for running server-side code in the first place.

The problem is not that JDBC is an available API in PL/Java, but that it is the fundamental API in PL/Java, with its tentacles reaching right into the native C language portion of PL/Java's implementation. That has made alternative interface options impractical, and multiplied the maintenance burden of even simple tasks like adding support for new datatype mappings or fixing simple bugs. There are significant portions of JDBC 4 that remain unimplemented in PL/Java.

Experience building an implementation of ISO SQL/XML XMLQUERY showed that certain requirements of the spec were simply unsatisfiable atop JDBC, either because of inherent JDBC limitations or limits in PL/Java's implementation of it. An example of each kind:

  • The INTERVAL data type cannot be mapped as SQL/XML requires, because the only ResultSetMetaData methods JDBC defines for access to a type modifier are precision and scale, which apply to numeric values; the API defines no standard way to learn what the modifier of an INTERVAL says about whether months or days are present.
  • The DECIMAL type cannot be mapped as SQL/XML requires; for that case, the fault is not with JDBC (which defines the precision and scale methods), but with their incomplete implementation in PL/Java.

Those cases also illustrate that mapping some PostgreSQL data types to those of another language can be complex. An arbitrary PostgreSQL INTERVAL is representable as neither a java.time.Period nor a java.time.Duration alone (though a pair of the two can be used, a type that PGJDBC-NG offers). One or the other can suffice if the type modifier is known and limits the fields present. A PostgreSQL NUMERIC value has not-a-number and signed infinity values that some candidate language-library type might not, and an internal precision that its text representation does not reveal, which might need to be preserved for a mathematically demanding task. The details of converting it to another language's similar type need to be knowable or controllable by an application.
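
To make the INTERVAL point concrete, here is a minimal sketch of a lossless pair representation along the lines of the one PGJDBC-NG offers (the IntervalValue class is hypothetical, not part of this PR):

    import java.time.Duration;
    import java.time.Period;

    // Hypothetical pair type: an arbitrary PostgreSQL INTERVAL needs both
    // a Period (the months/days fields) and a Duration (the sub-day time
    // fields); neither alone suffices unless a type modifier limits the
    // fields present.
    final class IntervalValue
    {
        final Period period;
        final Duration duration;
        IntervalValue(Period period, Duration duration)
        {
            this.period = period;
            this.duration = duration;
        }
    }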

It is a goal of this work to give PL/Java an API that does not obscure or abstract away PostgreSQL details, but makes them accessible in a natural Java idiom, and to make that "natural PostgreSQL" API adequate for building a JDBC layer in pure Java above it. (The work of building such a JDBC layer is not in the scope of this pull request.)

Parameter and return-value mapping

PL/Java uses a simple, Java-centric approach where a Java method is declared naturally, giving ordinary Java types for its parameters and return, and the mappings from these to the PostgreSQL parameter and return types are chosen by PL/Java and applied transparently (and much of that happens deep in PL/Java's C code).

While convenient, that approach isn't easily adapted to other JVM languages that may offer other selections of types. Even for Java, it stands in the way of doing certain things possible in PostgreSQL, like declaring VARIADIC "any" functions.

In a modernized API, it needs to be possible to declare a function whose parameter represents the PostgreSQL FunctionCallInfo, so that the parameters and their types can be examined and converted in Java. That will make it possible to write language handlers in Java, whether for other JVM languages or for the existing PL/Java calling conventions that at present are tangled in C.

Elements of new API

Identification of data types

A PostgreSQL-specific API must be able to refer unambiguously to any type known to the database, so it cannot rely on any fixed set of generic types such as JDBCType. To interoperate with a JDBC layer, though, the identifier for types should implement JDBC's SQLType interface.

The API should support retrieving enough metadata about the type for a JDBC layer implemented above it to be able to report complete ResultSetMetaData information.

The new class serving this purpose is RegType.

As RegType implements the java.sql.SQLType interface, an aliasing issue arises for a JDBC layer. Such a layer should accept JDBCType.VARCHAR as an alias for RegType.VARCHAR, for example. JDBC itself has no methods that return an SQLType instance, so the question of whether it should return the generic JDBC type or the true RegType does not arise. A PL/Java-specific API is needed for retrieving the type identifier in any case.

The details of which JDBC types are considered aliases of which RegTypes will naturally belong in a JDBC API layer. At the level of this underlying API, a RegType is what identifies a PostgreSQL type.

While RegType includes convenience final fields for a number of common types, those by no means limit the RegTypes available. There is a RegType that can be obtained for every type known to the database, whether built in, extension-supplied, or user-defined.
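
For example, a JDBC layer above this API might normalize the aliasing like so (a sketch: the resolve helper is hypothetical, while RegType.VARCHAR and the SQLType relationship are as described above):

    import java.sql.JDBCType;
    import java.sql.SQLType;
    import org.postgresql.pljava.model.RegType;

    final class TypeAliases
    {
        // Hypothetical helper: accept the generic JDBCType as an alias
        // and normalize it to the true PostgreSQL RegType.
        static RegType resolve(SQLType t)
        {
            if ( t instanceof RegType )
                return (RegType)t;      // already a PostgreSQL type
            if ( JDBCType.VARCHAR == t )
                return RegType.VARCHAR; // one alias pairing the layer might choose
            throw new UnsupportedOperationException("no alias for " + t);
        }
    }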

Other PostgreSQL catalog objects and key abstractions

RegType is one among the types of PostgreSQL catalog objects modeled in the org.postgresql.pljava.model package.

Along with a number of catalog object types, the package also contains:

  • TupleDescriptor and TupleTableSlot, the key abstractions for fetching and storing database values. TupleTableSlot in PostgreSQL is already a useful abstraction over a few different representations; in PL/Java it is further abstracted, and can present with the same API other collections of typed, possibly named, items, such as arrays, the arguments in a function call, etc.
  • MemoryContext and ResourceOwner, both subtypes of Lifespan, usable to guard Java objects that have native state whose validity is bounded in time
  • CharsetEncoding

Mapping PostgreSQL data types to what a PL supports

The Adapter class

A mapping between a PostgreSQL data type and a suitable PL data type is an instance of the Adapter class, and more specifically of the reference-returning Adapter.As<T,U> or one of the primitive-returning Adapter.AsInt<U>, Adapter.AsFloat<U>, and so on (one for each Java primitive type). The Java type produced is T for the As case, and implicit in the class name for the AsFoo cases.

The basic method for fetching a value from a TupleTableSlot is get(Attribute att, Adapter adp), and naturally is overloaded and generic so that get with an As<T,?> adapter returns a T, get with an AsInt<?> adapter returns an int, and so on. (But see a later comment below for a better API than this item-at-a-time approach.) (The U type parameter of an adapter plays a role when adapters are combined by composition, as discussed below, and is otherwise usually uninteresting to client code, which may wildcard it, as seen above.)
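
In use, the overloading might look like this (a sketch: TEXT_AS_STRING and INT4_AS_INT stand for some available As<String,?> and AsInt<?> adapter instances, and the exact exceptions thrown are glossed over):

    import org.postgresql.pljava.model.Attribute;
    import org.postgresql.pljava.model.TupleTableSlot;

    class GetDemo
    {
        // Sketch: the static type of each get() follows the adapter used.
        static void show(TupleTableSlot slot, Attribute name, Attribute count)
        throws Exception
        {
            String s = slot.get(name, TEXT_AS_STRING); // As<String,?> gives String
            int n = slot.get(count, INT4_AS_INT);      // AsInt<?> gives int
            System.out.println(s + ": " + n);
        }
    }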

A manager class for adapters

Natural use of this idiom presumes there will be some adapter-manager API that allows client code to request an adapter for some PostgreSQL type by specifying a Java witness class Class<T> or some form of super type token, and returns the adapter with the expected compile-time parameterized type.

That manager hasn't been built yet, but the requirements are straightforward and no thorny bits are foreseen. (Within the org.postgresql.pljava.internal module itself, things are simpler; no manager is needed, and code refers directly to static final INSTANCE fields of existing adapters.)
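
What a request to such a manager might look like (entirely hypothetical: neither the AdapterManager class nor its method exists yet, and RegType.DATE is assumed to be among the convenience fields):

    // Hypothetical: ask for an adapter mapping PostgreSQL date to
    // java.time.LocalDate, passing a witness class so the returned
    // adapter carries the expected compile-time parameterized type.
    Adapter.As<java.time.LocalDate,?> dateAdapter =
        AdapterManager.adapterFor(RegType.DATE, java.time.LocalDate.class);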

Extensibility

PL/Java has historically supported user-defined types implemented in Java, a special class of data types whose Java representations must implement a certain JDBC interface and import and export values through a matching JDBC API. In contrast, PL/Java's first-class PostgreSQL data type support—the mappings it supplies between PostgreSQL and ordinary Java types that don't involve the specialized JDBC user-defined type APIs—has been hardcoded in C using Java Native Interface (JNI) calls, and not straightforward to extend. That's a pain point for several situations:

  • A mapping for another PostgreSQL data type (either a type newly added to PostgreSQL, or simply one that PL/Java does not yet have a mapping for) is not easily added for an application that needs it, but generally must be added in PL/Java's C/JNI internals and made available in a new PL/Java build.
  • A mapping of an existing PostgreSQL data type to a new or different Java type—same story. When Java 8 introduced the java.time package, developers wishing to have PL/Java map PostgreSQL's date and time types to the improved Java types instead of the older java.sql ones had to open issues requesting that ability and wait for a PL/Java release to include it.
  • Not every PostgreSQL data type has a single best PL type to be mapped to. One application using the geometric types might want them mapped to the Java types in the PGJDBC library, while another might prefer the 2D classes supplied by some Java geometry library. One application might want a PostgreSQL array mapped to a flat Java List, another to a multi-dimensioned Java array, another to a matrix class from a scientific computation library. The choices multiply when considering the data types not only of Java but of other JVM languages. C coding and rebuilding of PL/Java should not be needed to tailor these mappings.

Adapters implementable in pure Java

With this PR, code external to PL/Java's implementation can supply adapters, built against the service-provider API exposed in org.postgresql.pljava.adt.spi.

Leaf adapters

A "leaf" adapter is one that directly knows the PostgreSQL datum format of its data type, and maps that to a suitable PL type. Only a leaf adapter gets access to PostgreSQL datums, which it should not leak to other code. Code that defines leaf adapters must be granted a permission in pljava.policy.

Composing adapters

A composing, or non-leaf, adapter is one meant to be composed over another adapter. An example would be an adapter that composes over an adapter returning type T (possibly null) to form an adapter returning Optional<T>. With a selection of common composing adapters (there aren't any in this pull request, yet), it isn't necessary to provide leaf adapters covering all the ways application code might want data to be presented. No special permission is needed to create a composing adapter.
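
The essence of such a composing adapter is ordinary Java with no datum access at all, which is why no special permission is needed. A sketch of the transformation at its core (the surrounding Adapter subclass boilerplate is omitted, as its constructor details are not covered here):

    import java.util.Optional;

    final class OptionalComposer
    {
        // A composing adapter sees only the Java value the underlying
        // As<T,?> adapter has already produced; it never touches
        // a PostgreSQL datum.
        static <T> Optional<T> compose(T valueFromUnderlyingAdapter)
        {
            return Optional.ofNullable(valueFromUnderlyingAdapter);
        }
    }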

Java's generic types are erased at run time, but the Java compiler records the generic type information in the class file, where it remains accessible through Java reflection. As adapters are composed, the Adapter class tracks the type relationships so that, for example, an Adapter<Optional<T>,T> composed over an Adapter<String,Void> is known to produce Optional<String>.

It is that information that will allow an adapter manager to satisfy a request to map a given PostgreSQL type to some PL type, by finding and composing available adapters.

Contract-based adapters

For a PostgreSQL data type that doesn't have one obvious best mapping to a PL type (perhaps because there are multiple choices with different advantages, or because there is no suitable type in the PL's base library, and any application will want the type mapped to something in a chosen third-party library), a contract-based adapter may be best. An Adapter.Contract is a functional interface with parameters that define the semantically-important components of the PostgreSQL type, and a generic return type, so an implementation can return any desired representation for the type.

A contract-based adapter is a leaf adapter class with a constructor that accepts a Contract, producing an adapter between the PostgreSQL type and whatever PL type the contract maps it to. The adapter encapsulates the internal details of how a PostgreSQL datum encodes the value, and the contract exposes the semantic details needed to faithfully map the type. Contracts for many existing PostgreSQL types are provided in the org.postgresql.pljava.adt package.
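
Using the INTERVAL example detailed in the commit notes below (semantic components: a 64-bit microseconds count, a 32-bit day count, a 32-bit month count), a contract and one conforming lambda might look like this (the interface shown is illustrative; the real contracts live in org.postgresql.pljava.adt):

    import java.time.Duration;
    import java.time.temporal.ChronoUnit;

    // Illustrative contract: exposes INTERVAL's semantic components and
    // leaves the choice of Java representation to the implementation.
    @FunctionalInterface
    interface IntervalContract<T>
    {
        T construct(long microseconds, int days, int months);
    }

    class IntervalContracts
    {
        // One possible implementation: map intervals having no day or
        // month component to java.time.Duration.
        static final IntervalContract<Duration> AS_DURATION =
            (micros, days, months) ->
            {
                if ( days != 0 || months != 0 )
                    throw new IllegalArgumentException(
                        "interval has day/month fields");
                return Duration.of(micros, ChronoUnit.MICROS);
            };
    }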

ArrayAdapter

The one supplied ArrayAdapter is contract-based. While a Contract.Array has a single abstract method, and therefore could serve as a functional interface, in practice it is not directly implementable by a lambda; there must be a subclass or subinterface (possibly anonymous) whose type parameterization the Java compiler can record. (A lambda may then be used to instantiate that.) An instance of ArrayAdapter is constructed by supplying an adapter for the array's element type along with an array contract targeting some kind of collection of the mapped type. As with a composing adapter, the Adapter class substitutes the element adapter's target Java type through the type parameters of the array contract, to arrive at the actual parameterized type of the resulting array or collection.

PostgreSQL arrays can be multidimensional, and are regular (not "jagged"; all sub-arrays at a given dimension match in size). They can have null elements, which are tracked in a bitmap, offering a simple way to save some space for arrays that are sparse; there are no other, more specialized sparse-array provisions.

Array indices need not be 0- or 1-based; the base index as well as the index range can be given independently for each dimension. PostgreSQL creates 1-based arrays by default. This information is stored with the array value, not with the array type, so a column declared with an array type could conceivably have values of different cardinalities or even dimensionalities.

The adapter is contract-based because there are many ways application code could want a PostgreSQL array to be presented: as a List or single Java array (flattening multiple dimensions, if present, to one, and disregarding the base index), as a Java array-of-arrays, as a JDBC Array object (which does not officially contemplate more than one array dimension, but PostgreSQL's JDBC drivers have used it to represent multidimensioned arrays), as the matrix type offered by some scientific computation library, and so on.

For now, one predefined contract is supplied, AsFlatList, and a static method, nullsIncludedCopy, that can be used (via method reference) as one implementation of that contract.
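
Putting the pieces together might look roughly like this (a heavily hedged sketch: the constructor shape and the TEXT_AS_STRING element adapter are assumptions; what the text above establishes is only that nullsIncludedCopy can serve, via method reference, as an implementation of the flat-list contract):

    import java.util.List;

    // Sketch: implement the AsFlatList contract by method reference,
    // then combine it with an element adapter for text to present
    // a PostgreSQL text[] as a flat List<String>.
    AsFlatList<String> flat = AsFlatList::nullsIncludedCopy;
    ArrayAdapter<List<String>> textArrayAsList =
        new ArrayAdapter<>(flat, TEXT_AS_STRING);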

Java array-of-arrays

While perhaps not an extremely efficient way to represent multidimensional arrays, the Java array-of-arrays approach is familiar, and benefits from a bit of dedicated support in Adapter. If you have an Adapter a that renders a PostgreSQL type Foo as Java type Bar, you can use, for example, a.a2().build() to obtain an Adapter from the PostgreSQL array type Foo[] to the Java type Bar[][]. Such an adapter requires the PostgreSQL array to have two dimensions, allows each value to have different sizes along those dimensions, and disregards the PostgreSQL array's start indices (all Java arrays start at 0).

Because PostgreSQL stores the dimension information with each value and does not enforce it for a column as a whole, it could be possible for a column of array values to include values with other numbers of dimensions, which an adapter constructed this way will reject. On the other hand, the sizes along each dimension are also allowed by PostgreSQL to vary from one value to the next, and this adapter accommodates that, as long as the number of dimensions doesn't change.

The existing contract-based ArrayAdapter is used behind the scenes, but build() takes care of generating the contract. Examples are provided.
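
Concretely, with the names used above (a adapts PostgreSQL Foo to Java Bar; the parameterized type shown for the result is an assumption):

    // From the description above: a.a2().build() yields an adapter from
    // two-dimensional Foo[] to Bar[][]. Sizes along each dimension may
    // vary from value to value; start indices are disregarded (Java
    // arrays are 0-based); values with any other number of dimensions
    // are rejected.
    Adapter.As<Bar[][],?> twoD = a.a2().build();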

Adapter maintainability

Providing pure-Java adapters that know the internal layouts of PostgreSQL data types, without relying on JNI calls and the PostgreSQL native support routines, entails a parallel-implementation maintenance responsibility roughly comparable to that of PostgreSQL client drivers that support binary send and receive. (The risk is slightly higher because the backend internal layouts are less committed than the send/receive representations. Because they are used for data on disk, though, historically they have not changed often or capriciously.)

The engineering judgment is that the resulting burden will be manageable, and the benefits in clarity and maintainability of the pure-Java implementations, compared to the brittle legacy Java+C+JNI approach, will predominate. The process of developing clear contracts for PostgreSQL types already has led to discovery of one bug (#390) that could be fixed in the legacy conversions.

For the adapters supplied in the org.postgresql.pljava.internal module, it is possible to use ModelConstants.java/ModelConstants.c to ensure that key constants (offsets, flags, etc.) stay synchronized with their counterparts in the PostgreSQL C code.

Adapter is a class in the API module, with the express intent that other adapters can be developed, and found by the adapter manager through a ServiceLoader API, without being internal to PL/Java. Those might not have the same opportunity for build-time checking against PostgreSQL header files, and will have to rely more heavily on regression tests for key data values, much as binary-supporting client drivers must. The same can be true even for PL/Java internal adapters for a few PostgreSQL data types whose C implementations are so strongly encapsulated (numeric comes to mind) that necessary layouts and constants do not appear in .h files.

Known open items

In no well-defined order ....

  • The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor. These all have API and implementation for getting PostgreSQL values and presenting them in Java. Now the other direction is needed.
  • Provide API and implementation for a unified list-of-slots representation for a variety of list-of-tuple representations used in PostgreSQL, by:
    • factoring out the list-of-TupleTableSlot classes currently found as preliminary scaffolding in TupleTableSlot.java
    • providing such a representation for SPITupleTable ...
    • CatCList ...
    • Tuplestore? ...
    • ...?
  • Implement some form of offset memoization so fetching attributes from a heap TupleTableSlot stays subquadratic
  • Finish the unimplemented grants methods of RegRole and the unimplemented unary one of CatalogObject.AccessControlled. (Needs the CatCList support, for pg_auth_members searches.)
  • A NullableDatum flavor of TupleTableSlot. One of the last prerequisites to enable pure-Java language-handler implementations, to which the function arguments will appear as a TupleTableSlot.
  • Complete the implementation of isSubtype with the rules from Java Language Specification 4.10. (At present it is a stub that only checks erased subtyping, enough to get things initially going.)
  • The adapter manager described above. (Requires isSubtype.)
  • Adapters for PostgreSQL types that don't have them yet (starting, perhaps, with the ones that already have contracts defined in org.postgresql.pljava.adt).
  • TextAdapter does not yet support the type modifiers for CHAR and VARCHAR. It needs a contract-based flavor that does.
  • ArrayAdapter (or Contract.Array) should supply at least one convenience method, taking a dimsAndBounds array parameter and generating an indexing function (a MethodHandle?) that has nDims integer parameters and returns an integer flat index. Other related operations? An index enumerator, etc.?
  • A useful initial set of composing adapters, such as:
    • one of the form As<Optional<T>,T>
      • implement in an example class
      • integrate into PL/Java proper
    • one extending As<T,T> that returns null for null and values unchanged
      • why? because with adapter autoboxing, it can be composed over any primitive-returning adapter to enable it to handle null, by returning its boxed form
      • implement in an example class
      • integrate into PL/Java proper
    • a set composing over primitive adapters to use a specified value in the primitive's value space to represent null.
      • implement in an example class
      • complete the set and integrate into PL/Java proper
  • More work on CatalogObject invalidation. RegClass and RegType are already invalidated selectively; probably RegProcedure should be also. PostgreSQL has a limited number of callback slots, so it would be antisocial to grab them for all the supported classes: less critical ones just depend on the global switchpoint; come up with a good story for invalidating those. Also for how TupleDescriptor should behave upon invalidation of its RegClass. See commit comments for 5adf2c8.
  • Better define and implement the DualState behavior of TupleTableSlot.
  • Reduce the C-centricity of VarlenaWrapper. Goal: DatumUtils.mapVarlena doing more in Java, less in C.
    • more of VarlenaWrapper's functionality moved to DatumImpl
    • client code no longer casting Datum.Input to VarlenaWrapper to use it.
  • Adapter should have control over the park/fetch/decompress/lifespan decisions for VarlenaWrapper; currently the behavior is hardcoded for top-transaction lifespan, lazy detoasting, appropriate for SQLXML, which was the first VarlenaWrapper client.
  • Add MBeans with statistics for the new caches

And then

  • Choose some interesting JVM language foo and implement a simple PL/foo in pure Java, using these facilities.
  • Reimplement PL/Java's own language handler the same way.

Commit messages

Tweak invocation.c so the stack-allocated space provided by the caller
is used to save the prior state rather than to construct the new state.
This way, the current state can have a fixed address (currentInvocation
is a constant pointer) and can be covered by a single static
ByteBuffer that Invocation.java can read/write through without relying
on JNI methods.

As Invocation isn't a JDBC-specific concept or class, it has never
made much sense to have it in the .jdbc package. Move it to .internal.

Both values have just been stashed by stashCallContext.
Both will be restored 14 lines later by _closeIteration.
And nothing in those 14 lines cares about them.

After surveying the code for where function return values can
be constructed, add one switchToUpperContext() around the construction
of non-composite SRF return values, where it was missing, so such values
can be returned correctly after SPI_finish(), and so the former,
very hacky, cross-invocation retention of SPI contexts can be sent
to pasture.

For the record, these are the notes from that survey of the code:

Function results, non-set-returning:
 Type_invoke:
  the inherited _Type_invoke calls ->coerceObject, within sTUC.
  sub"class"es that override it:
   Boolean,Byte,Double,Float,Integer,Long,Short,Void:
   - overridden in order to use appropriately-typed JNI invoke method
   - Double,Float,Long have _asDatum that does sTUC;
     . historical artifact; those types were !byval before PG 8.4
   - the rest do not sTUC; should be ok, all byval
   Coerce: does sTUC
   Composite: does sTUC around _getTupleAndClear
 Arrays:
  createArrayType (extern, in Array.c) does sTUC. So far so good.
  What about !byval elements stored into the array?
   the non-primitive/any types don't override _Array_coerceObject,
   which is where Type_coerceObject on each element, and construct_md_array
   are called. With no sTUC. Around construct_md_array is really where it's
   needed.
   But then, _Array_coerceObject is still being called within sTUC
   of _Type_invoke. All good.
   Hmm: !byval elements of values[] are leaked when pfree(values) happens.
   They should be pfree'd unconditionally; construct_md_array copies them.
 What about UDTs?
  They don't override _Type_invoke.
  So they inherit the one that calls ->coerceObject, within sTUC.
  That ought to be enough. UDT.c's coerceScalarObject itself also sTUCs,
  inconsistently, for fixed-length and varlena types but not NUL-terminated.
  That should be ok, and merely redundant. In coerceTupleObject, no sTUC
  appears. Again, by inheritance of coerceObject, that should be ok.
  Absent that, sTUC around the SQLOutputToTuple_getTuple should be adequate;
  only if that could produce a tuple with TOAST pointers would it also be
  necessary around the HeapTupleGetDatum.


Function results, set-returning:
 _datumFromSRF is applied to each row result
 The inherited _datumFromSRF calls Type_coerceObject, NOT within sTUC
  XXX this, at least, definitely needs a sTUC added.
 sub"class"es that override it:
  only Composite: calls _getTupleAndClear, NOT within sTUC. But it
  works out, just because TupleDesc.java's native _formTuple method uses
  JavaMemoryContext. Spooky action at a distance?


Results from triggers:
 Function.c's invokeTrigger does sTUC around the getTriggerReturnTuple.

In passing, fix a long-standing thinko in Invocation_popInvocation:
the memory context that was current on entry is stored in upperContext
of *this* Invocation, but popInvocation was 'restoring' the one that was
saved in the *previous* Invocation.

Also in passing, move the cleanEnqueuedInstances step later in the
pop sequence, improving its chance of seeing instances that could become
unreachable through the release of SPI contexts or the JNI local frame.
This can reveal issues with the nesting of SPI 'connections' or
management of their associated memory contexts.

Without the special treatment, the instance of the Java class
Invocation, if any, that corresponds to the C Invocation, has its
lifetime simply bounded to that of the C Invocation, rather than
artificially extended across a sequence of SRF value-per-call
invocations. It is simpler, does not break any existing tests, and
is less likely to be violating PostgreSQL assumptions on correct
behavior.

The commits merged here into this branch simplify PL/Java's management
of the PostgreSQL-to-PL/Java-function invocation stack, and especially
simplify the handling of SPI (PostgreSQL's Server Programming Interface)
and set-returning functions.

SPI includes "connect" and "finish" operations normally used in a simple
pattern: connect before using SPI functions, finish when done and before
returning to the caller, and if anything allocated while "connected" is
to be returned to the caller, be sure to allocate that in the "upper
executor" memory context (that is, the context that was current before
SPI_connect).

PL/Java has long diverged from that approach, especially for the case
of set-returning functions using the value-per-call protocol (the only
one PL/Java currently supports). If SPI was connected during one call
in the sequence, PL/Java has sought to save and reuse that connection
and its memory contexts over later calls (where a simpler, "by the book"
implementation would simply SPI_connect and SPI_finish within the
individual calls as needed).

It never seemed altogether clear that was a good idea, but at the same
time there weren't field reports of failure. It turns out, though, that
it is not hard to construct tests showing the apparent success was all
luck.

It has not been much trouble to reorganize that code so that SPI is used
in the much simpler, by-the-book fashion. b2094ba fixes one place where
a needed switchToUpperContext was missing but the error was masked
by the former SPI juggling, and with that fixed, all the tests in
the CI script promptly passed, with SPI used in the purely nested way
that it expects.

One other piece of complexity that has been removed was the handling of
Java Invocation objects during set-returning functions. Although
the stack-allocated C invocation struct naturally lasts only through one
actual call, PL/Java's SRF code took pains to keep its Java counterpart
alive, as if the one instance represented the entire sequence of actual
calls while returning a set. Eliminating that behavior has simplified
the code and shown no adverse effect in the available tests.

As these are changes of some significance that might possibly alter
some behavior not tested here, they have not been made in the 1.6 or
1.5 branches. But the simplification seems to make a less brittle base
for the development going forward on this branch.

CacheMap is a generic class useful for (possibly weak or soft)
canonicalizing caches of things that are identified by one or more
primitive values. (Writing the key values into a ByteBuffer avoids
the allocation involved in boxing them; however, the API as it
currently stands might be exceeding that cost with instantiation
of lambdas. It should eventually be profiled, and possibly revised
into a less tidy, but more efficient, form.)

SwitchPointCache is intended for lazily caching numerous values
of diverse types, groups of which can be associated with a single
SwitchPoint for purposes of invalidation.

As currently structured, the SwitchPoints (and their dependent
GuardWithTest nodes) do not get stored in static final fields;
this may limit HotSpot's ability to optimize them as fully as
it could if they did.

Adapter is the abstract ancestor of all classes that implement
PostgreSQL datatypes for PL/Java, and the adt.spi package contains
classes that will be of use to datatype-implementing code:
in particular, Datum. PostgreSQL datums are only exposed
to Adapters, and the Adapter's job is to reliably convert between
the PostgreSQL type and some appropriate Java representation.

For some datatypes, there is a single or obvious appropriate Java
representation, and an Adapter may be provided that simply produces
that. For other datatypes, there may be no single obvious choice
of Java representation, either because there is no good match or
because there are several; an application might want to map types
to specialized classes available in some domain-specific library.
To serve those cases, Adapters can be defined in terms of
Adapter.Contract subinterfaces, which are simply functional interfaces
that document and expose the semantic components of the PostgreSQL
type. For example, a contract for PostgreSQL INTERVAL would expose
a 64-bit microseconds component, a 32-bit day count, and a 32-bit
month count. The division of responsibility is that the Adapter
encapsulates how to extract those components given a PostgreSQL
datum, but the contract fixes the semantics of what the components
are. It is then simple to use the Adapter, with any lambda that
conforms to the contract, to produce any desired Java representation
of the type.

Dummy versions of Attribute, RegClass, RegType, TupleDescriptor,
and TupleTableSlot break ground here on the model package, which
will consist of a set of classes modeling key PostgreSQL abstractions
and a useful subset of the PostgreSQL system catalogs.

RegType also implements java.sql.SQLType, making it usable in
(a suitable implementation of) JDBC to specify PostgreSQL types
precisely.

adt.spi.AbstractType needs the specialization() method that was
earlier added to internal.Function in anticipation of needing it
someday.

The org.postgresql.pljava.adt package contains 'contracts'
(subinterfaces of Adapter.Contract.Scalar or Adapter.Contract.Array),
which are functional interfaces that document and expose the exact
semantic components of PostgreSQL data types.

Adapters are responsible for the internal details of PostgreSQL's
representation that aren't semantically important, and code that
simply needs to construct some semantically faithful representation
of the type only needs to be concerned with the contract.

CharsetEncoding is not really a catalog object (the available
encodings in PostgreSQL are hardcoded) but is exposed here as
a similar kind of object with useful operations, including
encoding and decoding using the corresponding Java codec when
known.

CatalogObject is, of course, the superinterface of all things
that really are catalog objects (identified by a classId, an objectId,
and rarely a subId). This commit brings in RegNamespace and RegRole
as needed for CatalogObject.Namespaced and CatalogObject.Owned.
RolePrincipal is a bridge between a RegRole and Java's Principal
interface.

CatalogObject.Factory is a service interface 'used' by the API
module, and will be 'provided' by the internals module to supply
the implementations of these things.

And convert other code to use CharsetEncoding.SERVER_ENCODING
where earlier hacks were used, like the implServerCharset()
added to Session in 1.5.1.

In passing, fix a bit of overlooked java7ification in SQLXMLImpl.

The new CharsetEncodings example provides two functions:

SELECT * FROM javatest.charsets();

returns a table of the available PostgreSQL encodings, and what Java
encodings they could be matched up with.

SELECT * FROM javatest.java_charsets(try_aliases);

returns the table of all available Java charsets and the PostgreSQL ones
they could be matched up with, where the boolean try_aliases indicates
whether to try Java's known aliases for a charset when nothing in
PostgreSQL matched its canonical name. False matches happen when
try_aliases is true, so that's not a great idea.

These PostgreSQL notions will have to be available to Java code
for two reasons.

First, even code that has no business poking at them can still need
to know which one is current, to set an appropriate lifetime on
a Java object that corresponds to something in PostgreSQL allocated
in that context or registered to that owner. For that purpose, they
both will be exposed as subtypes of Lifespan, and the existing
PL/Java DualState class will be reworked to accept any Lifespan to
bound the validity of the native state.

Second, Adapter code could very well need to poke at such objects
(MemoryContexts, anyway): either to make a selected one current for
when allocating some object, or even to create and manage one.
Methods for that will not be exposed on MemoryContext or ResourceOwner
proper, but could be protected methods of Adapter, so that only
an Adapter can use them.

In addition to MemoryContextImpl and ResourceOwnerImpl proper, this step
will require reworking DualState so state lives are bounded by Lifespan
instances instead of arbitrary pointer values. Invocation will be made
into yet another subtype of Lifespan, appropriate for the life of an
object passed by PostgreSQL in a call and presumed good while the call
is in progress.

The DualState change will have to be rototilled through all of its
clients. That will take the next several commits.

The DualState.Key requirement that was introduced in 1.5.1 as a way to
force DualState-guarded objects to be constructed only in upcalls from C
(as a hedge against Java code inadvertently doing it on the wrong
thread) will go away. We *want* Adapters to be able to easily construct
things without leaving Java. Just don't do it on the wrong thread.

Though never very well publicized upstream, the examples of plpgsql,
plperl, and plpython show that, when using BeginInternalSubTransaction,
there is a certain pattern of saving and restoring the memory context
and resource owner that PL/Java has not been doing.

Now it is easy to implement that.

https://www.postgresql.org/message-id/619EA06D.9070806%40anastigmatix.net

The current invocation can be the right Lifespan to specify for
a DualState that's guarding some object PostgreSQL passed in to
the call, which is expected to be good for as long as the call
is in progress.

In other, but related, news, Invocation can now return the
"upper executor" memory context: that is, whatever context was
current at entry, even if a later use of SPI changes the context
that is current.

It can appear tempting to eliminate the special treatment of PgSavepoint
in Invocation, and simply make it another DualState client, but because
of the strict nesting imposed on savepoints, keeping just the one
reference to the first one set suffices, and is more efficient.

Simplify these: their C callers were passing unconditional null
as the ResourceOwner before, which their Java constructors passed
along unchanged. Now just have the Java constructor pass null
as the Lifespan.

These DualState clients were previously passing the address of
the current invocation struct as their "resource owner", again from
the C code, passed along by the Java constructor. Again simplify
to call Invocation.current() right in the Java constructor and use
that as the Lifespan.

On a side note, the legacy Relation class included here (and its
legacy Tuple and TupleDesc) will naturally be among the first
candidates for retirement when this new model API is ready.
This legacy Portal class is called from C and passed the address
of the PostgreSQL ResourceOwner associated with the Portal itself.

This is only an intermediate refactoring of VarlenaWrapper.
Construction of one is still set in motion from C. Ultimately,
it should implement Datum and be something that a Datum.Accessor
can construct with a minimum of fuss.

Originally a hedge against coding mistakes during the introduction
of DualState for 1.5.1 (which had to support Java < 9), it is less
necessary now that the internals are behind JPMS encapsulation, and
the former checks for the cookie can be replaced with assertions that
the action is happening on the right thread. The CI tests run with
assertions enabled, so this should be adequate.

The commits grouped under this merge add API to expose in Java
the PostgreSQL notions of MemoryContext and ResourceOwner, and then
rework PL/Java's DualState class (which manages objects that combine
some Java state and some native state, and may need specified actions
to occur if the Java state becomes unreachable or explicitly released
or if a lifespan bounding the native state expires). A DualState now
accepts a Lifespan, of which MemoryContext and ResourceOwner are both
subtypes. So is Invocation, an obvious lifespan for things PostgreSQL
passes in that are expected to be valid for the duration of the call.

The remaining commits in this group propagate the changes through
the affected legacy code.

Fitting it into the new scheme is not entirely completed here;
for example, newReadable takes a Datum.Input parameter, but still
casts it internally to VarlenaWrapper.Input. Making it interoperate
with any Datum.Input may be a bit more work.

Likewise, newReadable with synthetic=true still encapsulates all
the knowledge of what datatypes there is synthetic-XML coverage
for and selecting the right VarlenaXMLRenderer for it (there's
that varlena-specificity again!). More of that should be moved
out of here and into an Adapter.

In passing, fix a couple typos in toString() methods, and add
a serviceable, if brute-force, getString() method to Synthetic.
It would be better for SyntheticXMLReader to gain the ability to
produce character-stream output efficiently, but until that
happens, there needs to be something for those moments when you
just want a string to look at and shouldn't have to fuss to get it.

For now, VarlenaWrapper.Input and .Stream still extend, and add small
features like toString(Object) to, DatumImpl. Later work can probably
migrate those bits so VarlenaWrapper will only contain logic specific
to varlenas.

An adt.spi interface Verifier is added, though Datum doesn't yet
expose any way to use it; in this commit, only one method accepting
Verifier.OfStream is added in DatumImpl.Input.Stream, the minimal
change needed to get things working.

As before, JNI methods for this 'model' framework continue to
be grouped together in ModelUtils.c; their total number and
complexity is expected to be low enough for that to be practical,
and then they can all be seen in one place.

RegClassImpl and RegTypeImpl acquire m_tupDescHolder arrays in
this commit, without much explanation; that will come a few commits
later.

There are two flavors so far, Deformed and Heap. Deformed works
with whatever a real PostgreSQL TupleTableSlot can work with,
relying on the PostgreSQL implementation to 'deform' it into
separate datum and isnull arrays. (That doesn't have to be a
PostgreSQL 'virtual' TupleTableSlot; it can do the deforming
independently of the type of slot. When the time comes to
implement the reverse direction and produce tuples, a virtual
slot will be the way to go for that, using the PostgreSQL C code
to 'form' it once populated.)

The Heap flavor knows enough about that PostgreSQL tuple format
to 'deform' it in Java without the JNI calls (except where some
out-of-line value has to be mapped, or for varlena values until
VarlenaWrapper sheds more of its remaining JNI-centricity). The
Heap implementation does not yet do anything clever to memoize
the offsets into the tuple, which makes the retrieval of all
the tuple's values an O(n^2) proposition; there is a
low-hanging-fruit optimization opportunity there. For now, it gets
the job done.

It might be interesting to see how the two flavors compare on
typical heap tuples: Deformed, making more JNI calls but relying
on PostgreSQL's fast native deforming, or Heap, which can avoid
more JNI calls, and also avoids deforming something into a fresh
native memory allocation if the only thing it will be used for is
to immediately construct some Java object.

The Heap flavor can do one thing the Deformed flavor definitely
cannot: it can operate on heap-tuple-formatted contents of an
arbitrary Java byte buffer, which in theory might not even be
backed by native memory. (Again, for now, this is slightly science
fiction where varlena values are concerned, because VarlenaWrapper
retains a lot of its native dependencies. A ByteBuffer "heap tuple"
with varlenas in it will have to be native-backed for now.) The
selection of the DualState guard by heapTupleGetLightSlot() is
currently more hardcoded than that would suggest; it assumes the
buffer is mapping memory that can be heap_free_tuple'd.

The 'light' in heapTupleGetLightSlot really means that there isn't
an underlying PostgreSQL TupleTableSlot constructed.

The whole business of how to apply and use DualState guards on these
things still needs more attention.

There is also Heap.Indexed, which is the thing needed for arrays.
When the element type is fixed-length, it achieves O(1) access
(plus null-bitmap processing if there are nulls). It uses a "count
preceding null bits ahead of time" strategy that could also easily
be adopted in Heap.

A NullableDatum flavor is also needed, which would be the thing for
mapping (as one prominent example) function-call arguments.

The HeapTuples8 and HeapTuples4 classes at the end are scaffolding
and ought to be factored out into something with a decent API, as
hinted at in the comment preceding them.

A Heap instance still inherits the values/nulls array fields used
in the deformed case, without (at present) making any use of them.
It is possible some use could be made (as, again, an underlying PG
TupleTableSlot could be used in deforming a heap tuple), but it's
also possible that won't ever be needed, and the class could be
refactored to a simpler form.

Here's how this is going to work.

The "exists because mentioned" aspect of a CatalogObject is
a lightweight operation, just caching/returning a singleton with
the mentioned values of classId/objId/(subId?).

For a bare CatalogObject (objId unaccompanied by classId), that's
all there is. But for any CatalogObject.Addressed subtype, the
classId and objId together identify a tuple in a particular system
catalog (or, that is, identify a tuple that could exist in that
catalog). And the methods on the Java class that return information
about the object get the information by fetching attributes from
that tuple, then constructing whatever the Java representation
will be.

Not to duplicate the work of fetching (the tuple itself, and then
an attribute from the tuple) and constructing the Java result, an
instance will have an array of SwitchPointCache-managed "slots"
that will cache, lazily, the constructed results. Five of those
slots have their indices standardized right here in CatalogObjectImpl,
to account for the name, namespace, owner, and ACL of objects that
have those things. Slot 0 is for the tuple itself.

When an uncached value is requested, the "computation method" set up
for that slot will execute (always on the PG thread, so it can
interact with PostgreSQL with no extra ceremony). Most computation
methods will begin by calling cacheTuple() to obtain the tuple
itself from slot 0, and then will fetch the wanted attribute from it
and construct the result. The computation method for cacheTuple(),
in turn, will obtain the tuple if that hasn't happened yet, usually
from the PostgreSQL syscache. We copy it to a long-lived memory
context where we can keep it until its invalidation.

The most common way the cacheTuple is fetched is by a one-argument
syscache search by the object's Oid. When that is all that is needed,
the Java class need only implement cacheId() to return the number
of the PostgreSQL syscache to search in. For exceptional cases
(attributes, for example, require a two-argument syscache search),
a class should just provide its own cacheTuple computation method.

The slots for an object are associated with a Java SwitchPoint,
and the mapping from the object to its associated SwitchPoint
is a function supplied to the SwitchPointCache.Builder. Some
classes, such as RegClass and RegType, will allocate a SwitchPoint
per object, and can be selectively invalidated. Otherwise, by
default, the s_globalPoint declared here can be used, which will
invalidate all values of all slots depending on it.

RegClass and RegType are the two CatalogObjects with tupleDescriptor()
methods.

You can get strictly more tuple descriptors by asking RegType;
a RegType.Blessed can give you a tuple descriptor that has been
interned in the PostgreSQL typcache and corresponds to nothing
in the system catalogs. But whenever a RegType t is an ordinary
cataloged composite type or the row type of a cataloged relation,
then there is a RegClass c such that c == t.relation() and
t == c.type(), and you will get the same tuple descriptor from
the tupleDescriptor() method of either c or t.
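
In code form, the relationship reads (a sketch, assuming t is an
ordinary cataloged composite type):

    RegClass c = t.relation();
    assert t == c.type();        // the handshake caches both directions
    assert c == t.relation();
    // and tupleDescriptor() on either c or t yields the same descriptor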

In all but one such case, c delegates to c.type().tupleDescriptor()
and lets the RegType do the work, obtaining the descriptor from
the PG typcache.

The one exception is when the tuple descriptor for pg_class itself
is wanted, in which case the RegClass does the work, obtaining the
descriptor from the PG relcache, and RegType delegates to it for
that one exceptional case. The reason is that RegClass will see
the first request for the pg_class tuple descriptor, and before that
is available, c.type() can't be evaluated.

In either case, whichever class looked it up, a cataloged tuple
descriptor is always stored on the RegClass instance, and RegClass
will be responsible for its invalidation if the relation is altered.
(A RegType.Blessed has its own field for its tuple descriptor,
because there is no corresponding RegClass for one of those.)

Because of this close connection between RegClass and RegType,
the methods RegClass.type() and RegType.relation() use a handshake
protocol to ensure that, whenever either method is called, not only
does it cache the result, but its counterpart for that result instance
caches the reverse result, so the connection can later be traversed
in either direction with no need for a lookup by oid.

In the static initializer pattern introduced here, the handful of
SwitchPointCache slots that are predefined in CatalogObject.Addressed
are added to, by starting an int index at Addressed.NSLOTS,
incrementing it to initialize additional slot index constants, then
using its final value to define a new NSLOTS that shadows the original.
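
A sketch of that pattern as a subclass might apply it (the slot names
here are illustrative):

    // Extend the slots predefined in CatalogObjectImpl.Addressed.
    static final int SLOT_SOMETHING;
    static final int SLOT_OTHERTHING;
    static final int NSLOTS; // shadows Addressed.NSLOTS with the new total
    static
    {
        int i = CatalogObjectImpl.Addressed.NSLOTS; // start past predefined
        SLOT_SOMETHING  = i++;
        SLOT_OTHERTHING = i++;
        NSLOTS = i;
    }
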
An Attribute is most often obtained from a TupleDescriptor
(in this API, that's how it's done), and the TupleDescriptor
can supply a version of Attribute's tuple directly; no need
to look it up anywhere else. That copy, however, cuts off
at ATTRIBUTE_FIXED_PART_SIZE bytes. The most commonly needed
attributes of Attribute are found there, but for others beyond
that cutoff, the full tuple has to be fetched from the syscache.

So AttributeImpl has the normal SLOT_TUPLE slot, used for the
rarely-needed full tuple, and also its own SLOT_PARTIALTUPLE,
for the truncated version obtained from the containing tuple
descriptor. Most computation methods will fetch from the partial
one, with the full one referred to only by the ones that need it.

It doesn't end there. A few critical Attribute properties, byValue,
alignment, length, and type/typmod, are needed to successfully fetch
values from a TupleTableSlotImpl.Heap. So Attribute cannot use that
API to fetch those values. For those, it must hardcode their actual
offsets and sizes in the raw ByteBuffer that the containing tuple
descriptor supplies, and fetch them directly. So there is also
a SLOT_RAWBUFFER.

This may sound more costly in space than it is. The raw buffer,
of course, is just a ByteBuffer sliced off and sharing the larger
one in the TupleDescriptor, and the partial tuple is just a
TupleTableSlot instance built over that. The full tuple is another
complete copy, but only fetched when those less-commonly-needed
attributes are requested.

With those key values obtained from the raw buffer, the Attribute's
name does not require any such contortions, and can be fetched using
the civilized TupleTableSlot API, except it can't be done by name,
so the attribute number is used for that one.

An AttributeImpl.Transient holds a direct reference to
the TupleDescriptor it came from, which its containingTupleDescriptor()
method returns. An AttributeImpl.Cataloged does not, and instead holds
a reference to the RegClass for which it is defined in the system
catalogs, and containingTupleDescriptor() delegates to tupleDescriptor()
on that. If the relation has been altered, that could return an updated
new tuple descriptor.

RegClass is an easy choice, because those invalidations are also
the invalidations of TupleDescriptors, and because it has a nice
API; we are passed the oid of the relation to invalidate, so we
acquire the target in O(1).

(Note in passing: AttributeImpl is built on SwitchPointCache in
the pattern that's emerged for CatalogObjects in general, and an
AttributeImpl.Cataloged uses the SwitchPoint of the RegClass, so
it's clear that all the attributes of the associated tuple
descriptor will do the right thing upon invalidation. In contrast,
TupleDescriptorImpl itself isn't quite built that way, and the
question of just how a TupleDescriptor itself should act after
invalidation hasn't been fully nailed down yet.)

RegType is probably also worth invalidating selectively, as is
probably RegProcedure (procedures are mainly what we're about
in PL/Java, right?), though only RegType is done here.

That API is less convenient; we are passed not the oid but a hash
of the oid, and not the hash that Java uses. The solution here is
brute force, to get an initial working implementation. There are
plenty of opportunities for optimization.

One idea would be to use a subclass of SwitchPoint that would set
a flag, or invoke a Runnable, the first time its guardWithTest
method is called. If that hasn't happened, there is nothing to
invalidate. The Runnable could add the containing object into some
data structure more easily searched by the supplied hash. Transitions
of the data structure between empty and not-empty could be propagated
to a boolean in native memory, where the C callback code could avoid
the Java upcall entirely if there is nothing to do. This commit
contains none of those optimizations.

Factory.invalidateType might be misnamed; it could be syscacheInvalidate
and take the syscache id as another parameter, and then dispatch to
invalidating a RegType or RegProcedure or what have you, as the case
may be.

At least, that would be a more concise implementation than providing
separate Java methods and having the C callback decide which to call.
But if some later optimization is tracking anything-to-invalidate?
separately for them, then the C code might be the efficient place
for the check to be done.

PostgreSQL has a limited number of slots for invalidation callbacks,
and requires a separate registration (using another slot) for each
syscache id for which callbacks are wanted (even though you get
the affected syscache id in the callback?!). It would be antisocial
to grab one for every sort of CatalogObject supported here, so we
will have many relying on CatalogObject.Addressed.s_globalPoint
and some strategy for zapping that every so often. That is not
included in this commit. (The globalPoint exists, but there is
not yet anything that ever zaps it.)

Some imperfect strategy that isn't guaranteed conservative might
be necessary, and might be tolerable (PL/Java has existed for years
with less attention to invalidation). An early idea was to zap the
globalPoint on every transaction or subtransaction boundary, or when
the command counter has been incremented; those are times when
PostgreSQL processes invalidations. However, invalidations are also
processed any time locks are acquired, and that doesn't sound as if
it would be practical to intercept (or as if the resulting behavior
would be practical, even if it could be done).

Another solution approach would just be to expose a zapGlobalPoint
knob as API; if some code wants to be sure it is not seeing something
stale (in any CatalogObject we aren't doing selective invalidation for),
it can just say so before fetching it.

By allowing occasional gaps in the otherwise-consecutive
IDX_... values, new constants can be added as needed, and
kept in coherent groupings, with a smaller blast radius in
version control (and fewer merge conflicts for other branches
or forks), by avoiding extensive renumbering of otherwise
untouched members.

Given support for gaps in the ModelConstants IDX_... values,
renumber and slightly regroup the constants, with an eye
toward reducing the blast radius of future additions when
needed.

In backporting, sometimes the git history shows that something
has always had the same type, but the type was plain int rather
than an explicit-width type. So, for such things, there is no
need for a plethora of SIZEOF_FOO constants, but SIZEOF_INT may
be generally useful to detect if a platform has a surprising
value for that width.

The name andIf could be misread as suggesting some kind of
boolean dependency on what went before, when really each
alsoIf only cares about its own predicate.

The legacy dispatcher has needed to look up the language in
the catalog anyway, and so will the new dispatcher, and the
only use formerly being made of the 'trusted' value gleaned
from the entry point, and assiduously passed along, was in an
assertion that it wasn't different from what we found in the
catalog. (That, and bifurcating the "save the first oid that
refers to PL/Java" logic into two cases, which were everywhere
used in the pattern "nothing saved in this one? ok try that one".)

There still are the paired entry points, they just don't do
anything different. Changing the SQL declarations can be for
another day.

A handful of functions to get information about the Function
pointed to by the current Invocation were all in Function.c,
but for one living all alone in Invocation.c.

Centralize them all in Function, to simplify adding a new
dispatcher that won't be using C Function structs for its
dispatched routines.

This commit introduces a ByteBuffer[] _window method in Backend
that can be used to window various miscellaneous PG globals that
don't obviously belong someplace else. (The technique is already
used in MemoryContextImpl and ResourceOwnerImpl, for example, for
several globals in obvious groupings, but there may as well just
be one place consolidating uses for less closely-related things.)

The first such global is check_function_bodies. There may be
several existing JNI downcalls from Backend that could use this
technique instead, as a future opportunity for tidying.
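A sketch of the Java-side read, assuming window is the direct
ByteBuffer obtained (hypothetically, via _window) over
check_function_bodies:

import java.nio.ByteBuffer;

class Sketch
{
	// read the C bool in place; no JNI downcall needed per access
	static boolean checkFunctionBodies(ByteBuffer window)
	{
		return 0 != window.get(0);
	}
}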
Add a few casts to CatalogObjectImpl in key places to avoid trying to
interoperate with outside implementations of CatalogObject interfaces,
and improve an exception message that was very preliminary.

The non-private-ness of one CatalogObjectImpl.Addressed constructor was
a crutch while some subclass implementations were only partially
complete.

Narrow an overbroad SuppressWarnings("unchecked").

Have CatalogObjectImpl.toString report the name of the API interface,
not the internal class.

Fix a thinko in AttributeImpl.Transient.equals, and watch for edge cases
in RegType/RegClass tupleDescriptor methods (for example, any regular
relation has a row type, but an index or toast relation does not).

TRIGGER will be a useful type to have around, and the equality of
SIZEOF_Oid and Integer.BYTES ought to be asserted at least somewhere.
In interfacing with the single-threaded PostgreSQL backend,
there are many uses for a class with the behavior of List but
that does not invite unintended parallelism through the stream API.
A Spliterator is allowed to never report that it can split, so
an AbstractNoSplitList simply returns such a spliterator.
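A minimal sketch of the idea (not the actual implementation): extend
AbstractList, and hand back a Spliterator whose trySplit always
declines.

import java.util.AbstractList;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.function.Consumer;

abstract class AbstractNoSplitList<E> extends AbstractList<E>
{
	@Override
	public Spliterator<E> spliterator()
	{
		return new Spliterator<E>()
		{
			private final Iterator<E> it = iterator();
			private long remaining = size();

			@Override
			public boolean tryAdvance(Consumer<? super E> action)
			{
				if ( ! it.hasNext() )
					return false;
				action.accept(it.next());
				--remaining;
				return true;
			}

			@Override
			public Spliterator<E> trySplit()
			{
				return null; // never offer to split: no surprise parallelism
			}

			@Override
			public long estimateSize()
			{
				return remaining;
			}

			@Override
			public int characteristics()
			{
				return ORDERED | SIZED;
			}
		};
	}
}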
The PostgreSQL catalogs can contain empty strings in some
contexts where a name might not be provided (for example, when
pg_proc.proargnames is nonnull because some parameters have names,
but not all of them do). So let Identifier.Simple.fromCatalog
(but not other methods) accept an empty string, returning
the None instance.
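Illustratively:

// an empty name from, e.g., pg_proc.proargnames yields the None
// instance rather than an exception
Identifier.Simple id = Identifier.Simple.fromCatalog("");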
Adds a RegType.needsResolution() method, true for the various
PostgreSQL polymorphic pseudotypes, evaluated cheaply without
having to materialize anything from the catalog.
For a modern dispatcher to PL/Java-based languages, a consistent
representation for the number, names, and types of incoming
parameters and expected results will be wanted. TupleDescriptor
fits the bill. For incoming parameters, the catalogs identify
the names and types (no typmods), and this method will serve
to gin up an ephemeral TupleDescriptor based on those.

PostgreSQL already offers some funcapi methods for getting a
tuple descriptor for the expected outputs, but only when the
routine is really expected to return a composite. For functions
returning a non-composite PostgreSQL type, we will still want
to synthesize a one-attribute TupleDescriptor of that expected
type, so that Java code will always consistently produce a result
by storing something into a TupleTableSlot. (It will be the
dispatcher's job, then, in the non-composite case, to grab the
value from the one-column TupleTableSlot and return it.)
While adding T_Bitmapset to ModelConstants, also add the other node
tags expected to crop up in routine invocation.
This is the flavor of TupleTableSlot that will enable a routine
to access its incoming parameters (number, name, and type) using
the same API as for query results.
RegProcedure and ProceduralLanguage are two more object
classes we'd like to cache as persistently as practical,
so it is worth using two more callback slots to be able to
invalidate those selectively.

Renames Factory.invalidateType to syscacheInvalidate with a
cacheId parameter, as suggested in 5adf2c8. Many other
optimization opportunities suggested in that commit are still
left on the table.

The precedent set back in 2e74a6b of final SwitchPoint[] foo
= new SwitchPoint[] { new SwitchPoint() }, and then ignored
in 5adf2c8, seems to be the right one after all. After the
initial construction, it is rather tidy that compute methods
run on the PG thread, invalidations come from the PG thread,
and SwitchPoints impose some order on who sees what when.
It's that initial instantiation, meant as a cheap 'whenever
mentioned' operation, where the final-field semantics are
of help. RegType is therefore also fixed to match.
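Sketched, the idiom looks like this (field and method names here are
not the real ones):

import java.lang.invoke.SwitchPoint;

class Holder
{
	/*
	 * A final one-element array: the final field gives safe publication
	 * of the initial SwitchPoint no matter which thread first mentions
	 * the object, while the PG thread remains free to swap element 0.
	 */
	private final SwitchPoint[] m_sp = { new SwitchPoint() };

	SwitchPoint current()
	{
		return m_sp[0]; // compute methods bind their results to this
	}

	void invalidate() // PG thread only
	{
		SwitchPoint old = m_sp[0];
		m_sp[0] = new SwitchPoint();
		SwitchPoint.invalidateAll(new SwitchPoint[] { old });
	}
}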

Also add tests in the CatalogObjects example for invalidation
of the four object classes now expected to support it.

TupleDescImpl now also has a notion of invalidation, which can only
happen to the Cataloged flavor, and happens upon invalidation of the
corresponding RegClass. Unlike catalog objects, a TupleDescriptor
after invalidation won't magically have updated values; it will just
throw an informative IllegalStateException.
The vague early idea that a RegProcedure ought to carry a memo still
seems useful, and just how it should be useful is becoming clearer.

There may be no need for user code to apply any memos; remove the
apply() method from the API for now. Also, while RegProcedure declares
the Memo interface, the subinterface PLJavaBased seems more at home in
ProceduralLanguage, where Handler, InlineHandler, and Validator already
are.

The two memo subtypes Validator and PLJavaBased will clearly be of use.
When a RegProcedure p is determined to be a validator, it can be given a
Validator memo with a direct reference to the ProceduralLanguage it is
the validator for (contrast p.language() which is the ProceduralLanguage
that p is implemented in, and in this design will have to be PL/Java's
handler 'language'). Likewise, a PLJavaBased memo can be attached to a
RegProcedure p when p is determined to be implemented in some PL/Java-
based language pl. The memo does not need to hold a reference to pl--
p.language() is perfectly suited to this case--but it can factor out
some complexity of invalidation, and also play a foreseeable API role,
with methods exposing PL/Java-specific information useful to a handler
implementation that goes beyond what every RegProcedure exposes from the
catalogs.

Because CatalogObjects are weakly cached, they can go away when you're
not looking, even without an invalidation notice from PostgreSQL. To be
useful for caching related information, objects that are of interest
ought to be kept live. A static set s_plJavaHandlers can serve as a
root, holding references to instances of the "PL/Java handler language"
(only one such instance is envisioned, but of course PL/Java supports
the idea of language aliases). Each "handler language" instance hl can
have a LanguageSet holding references to those procedural languages
whose validators are implemented in hl. And each of those language
instances can have a RoutineSet holding references to RegProcedure
instances dependent on that language. Thus are the RegProcedures and
ProceduralLanguages of interest kept live.

Those links have to be culled when invalidations happen. An invalidated
"handler language" should remove itself from s_plJavaHandlers and
propagate the invalidation to its dependent languages; an invalidated
ordinary language should remove itself from the dependent-languages set
of its associated "handler language" (for an ordinary language l, that
is l.validator().language()) and propagate the invalidation to its
dependent RegProcedures. An invalidated ordinary PL/Java-based
RegProcedure should remove itself from its ProceduralLanguage's
RoutineSet, while an invalidated RegProcedure that is a validator of
some language needs to invalidate that language.

The different invalidation behavior for a RegProcedure, determined by
whether it is a validator or an ordinary PL/Java-based routine, can be
handled neatly by delegating to whatever memo (Validator or PLJavaBased)
it carries. And of course every memo, on invalidation, removes itself
from its carrying RegProcedure.

To be clear, all of this is determined lazily: no RegProcedure is known
to be implemented in a PL/Java-based language, or to be a validator,
until it is encountered in that role while PL/Java is dispatching a
call. The attaching of memos
and entering into sets happens then. The two actions should be regarded
as duals; the reference held in some set to a dependent object and the
memo on that object should both exist, or not.
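A structural sketch of that keep-alive and invalidation scheme (class
and field names here are illustrative stand-ins, not the real
internals):

import java.util.HashSet;
import java.util.Set;

class HandlerLanguage
{
	static final Set<HandlerLanguage> s_plJavaHandlers = new HashSet<>();

	final Set<BasedLanguage> languageSet = new HashSet<>();

	void invalidate()
	{
		s_plJavaHandlers.remove(this);
		Set<BasedLanguage> ls = new HashSet<>(languageSet); // copy: the
		languageSet.clear(); // dependents will try to remove themselves
		ls.forEach(BasedLanguage::invalidate);
	}
}

class BasedLanguage
{
	HandlerLanguage handler; // l.validator().language() in the real scheme

	final Set<Object> routineSet = new HashSet<>(); // dependent routines

	void invalidate()
	{
		handler.languageSet.remove(this);
		routineSet.clear(); // each routine would also shed its memo
	}
}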

ProceduralLanguageImpl adds methods isPLJavaBased and isPLJavaHandler
that a dispatcher will be able to call when it needs to make the
(rather fiddly) sanity checks that a language instance is set up in the
expected way. These methods, when returning true, take care of adding
the language into the appropriate parent set (s_plJavaHandlers for a
handler language, or the parent handler language's LanguageSet for an
ordinary language). However, it will be the dispatcher's job, when these
methods return true, to make any links involving a RegProcedure and
attach an appropriate memo to it.

Here decreed in these isPLJavaBased / isPLJavaHandler methods is
that the C entry points for the new dispatcher will be named
pljavaDispatchRoutine, pljavaDispatchInline, and
pljavaDispatchValidator. A "PL/Java handler" language will be one
declared with no inline handler, and with both its call handler
and its validator handler pointed at (different SQL function
overloads with) the C entry point pljavaDispatchValidator.
So that TupleTableSlot may be used uniformly as the API for
Java <-> PostgreSQL data type conversions, let every type except
unmodified RECORD or VOID have a 'notional' TupleDescriptor.

For a cataloged or interned row type, or a domain over a cataloged row
type, it is that type's tupleDescriptor() (or that of the transitive
base type, in the case of a domain). Such a descriptor will be of type
TupleDescriptor.Interned. Otherwise, it is a TupleDescriptor.Ephemeral
whose one, unnamed, attribute has this type.

The idea is that every language handler will see a TupleTableSlot into
which a routine's results should be stored, even if just one column is
there. It will be up to the common dispatcher code to grok the specific
PostgreSQL rules, "a scalar gets returned, but OUT parameters make
a composite result, unless it's just one OUT parameter and that's treated
just like a scalar, but a polymorphic type later resolved to a one-column
composite isn't" and so on, and return what is in the TupleTableSlot to
PostgreSQL in the proper way.

This notional descriptor is not exposed in RegType API, but only on
RegTypeImpl for internal use. It will be exposed through a method on
a RegProcedure's PLJavaBased memo to get the routine's outputsTemplate.
Template, because what is computed here depends only on catalog
information, and may include polymorphic types needing resolution at
actual call sites.
Using the memo on a RegProcedure<PLJavaBased>, get a TupleDescriptor
describing the incoming parameters, the notional one describing the
expected results, and (because these are 'templates' that may include
polymorphic types that need later resolution), a BitSet for each,
indicating at which positions type resolution is needed. An empty
BitSet indicates the template will be exactly the descriptor seen
later at call sites.

Internally, these methods live on RegProcedureImpl, relying on its
SwitchPointCache slots for caching. The memo implementation simply
delegates to those. Alternatively, the implementations could be
moved there.

Naturally, these methods will not be usable until a dispatcher is
implemented that can know a PLJavaBased RegProcedure when it sees
one, and attach this memo to it.
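Purely as a shape sketch (outputsTemplate was named in the previous
commit; the other method names below are guesses):

import java.util.BitSet;

interface PLJavaBasedSketch // illustrative; not the real interface
{
	Object inputsTemplate();    // really a TupleDescriptor
	Object outputsTemplate();   // really the notional TupleDescriptor
	BitSet unresolvedInputs();  // empty: template is exact at call sites
	BitSet unresolvedOutputs();
}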

New interfaces on RegProcedure are Lookup (roughly corresponding to the
PostgreSQL per-call-site struct, FmgrInfo, usually passed around as
flinfo), and Call (like the PostgreSQL per-call struct usually passed
around as fcinfo and I'm not going to type that long struct name here).

Where the memo on a RegProcedure can give you 'template' tuple
descriptors based only on the catalog and possibly with some unresolved
types, on Lookup you find methods returning the same descriptors with
types all resolved according to the types at the call site. From Call,
of course, you can get the argument and result TupleTableSlots and fetch
the argument values (and store the results someday, when that direction
gets implemented for TupleTableSlot).

These interfaces, too, await a dispatcher that will supply instances of
them to your code.
Here is the interface PLJavaBasedLanguage, with its two subinterfaces
InlineBlocks and Routines (any PLJavaBasedLanguage must additionally
implement one or both of those). PostgreSQL's CREATE LANGUAGE
allows the inline handler to be optional, while the handler for
routines is mandatory. PL/Java won't mind, though, if a language
only implements InlineBlocks; CREATE LANGUAGE will still have to
mention pljavaDispatchRoutine, but only inline blocks will really
be allowed.

A staged-programming idiom is the approach for Routines. The prepare
method is invoked passing only the target RegProcedure, and should
return a Template that depends only on information available from
that RegProcedure and its PLJavaBased memo. The Template will be
cached with the RegProcedure itself.

At a new call site, the Template's specialize method will be applied
to the call site's Lookup instance, where it can refer to
call-site-specific information like the fully resolved argument and
result types, returning a Routine to be cached for as long as PostgreSQL
has not freed that call site. (Naturally, when there is nothing
polymorphic and no need to specialize, the Routine can be constructed
all at prepare() time, and the Template can just unconditionally return
it.)

For every call through a given call site, its cached Routine's call
method is applied to a Call instance that will supply the arguments
and (one day) accept the results.
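A skeletal Routines implementation under that model might look like
the following sketch. The interfaces here are hand-rolled stand-ins
(the real Template, Routine, Lookup, and Call signatures may differ),
and the source text is passed in directly rather than fetched from a
RegProcedure:

interface Lookup { }
interface Call { }
interface Routine { void call(Call fcinfo) throws Exception; }
interface Template { Routine specialize(Lookup flinfo) throws Exception; }

class TrivialRoutines // would implement PLJavaBasedLanguage.Routines
{
	public Template prepare(String src)
	{
		// stage 1: rely only on catalog-derived information
		return flinfo ->
		{
			// stage 2: call-site types are resolved in flinfo; nothing
			// polymorphic here, so no per-site variation is needed
			return fcinfo ->
			{
				// stage 3: per-call work; read arguments, store results
				System.out.println("would execute: " + src);
			};
		};
	}
}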
The caching of a Lookup object at a PostgreSQL call site (in
flinfo->fn_extra) will call for a flavor of DualState that can
delete a JNI global reference when the call site's memory context
is reset or deleted.

This seems to be the first DualState flavor whose nativeStateReleased
method actually does anything, which requires a little refinement of
some only-when-assertions-enabled checking that had never had to
deal with that before.
STILL UNIMPLEMENTED AT THIS STAGE: anything to do with context
classloader management or access control contexts. A language's
trusted bit or name doesn't matter yet; the code will run with
whatever is granted to the intersection of its and PL/Java's
codebases without any Principal-based grants. Also,
Function_currentLoader, Function_currentTypeMap, and
Function_isCurrentReadOnly won't yet work when the call didn't
go through the legacy dispatcher, which may well cause errors
or crashes in attempting to use JDBC or SPI.

An internal-only field is added to ProceduralLanguageImpl's
RoutineSet as a place to memoize the implementing class instance,
and one to the PLJavaMemo as a place to memoize the Template
generated for a RegProcedure.

The Call methods context() and resultinfo() will return objects
if PostgreSQL supplied corresponding nodes, but those classes are,
so far, only stubs with no useful methods.

Call.result() is unimplemented for now and throws an exception.
A writable TupleTableSlot is still future work.

Call.isNull(boolean) works, to determine whether the current
invocation will return null or void. For now, though, that's
unconditionally done by the dispatcher after the custom handler
returns, based on the type expected by PostgreSQL: null is
returned for any non-byValue type, to avoid an immediate crash,
and void is returned otherwise (some callers expecting void are
unexpectedly cranky if the void they get back is the null kind).
This corner of the implementation is still stopgap until writable
TupleTableSlot happens.
A comment on InlineBlocks.execute promises that PL/Java itself
will handle propagating the atomic/non-atomic status along to
the SPI layer, but that's still on the to-do list; it doesn't
happen yet.
Adds a new dispatcher to make possible multiple JVM languages
implemented atop PL/Java.
Serves me right for thinking to save a bit of time by putting
the new handler function declarations into an example for now,
and deferring the work of adding them to InstallHelper and
declaring a new schema version.

But of course a hardcoded library name in the example won't
pass CI when Mac and Windows spell library names differently.
So the work is still done in an example for now, but in a
plpgsql inline block not really any tidier than doing it in Java.
@jcflack (Contributor, Author) commented Oct 9, 2023

Dispatcher for multiple PLs implemented atop PL/Java

I had thought to continue ticking more of the other open-items boxes before doing the dispatcher, but for a change of scenery, here is the new dispatcher.

The first brand-new PL/Java-based procedural language is Glot64. It will probably never grow to rival Python or JavaScript in popularity, either because it can't do anything but write messages to standard output, or because you write your functions/procedures in base 64 :). So, here is a Glot64 function that writes Hello, world! on the server's standard output when called:

CREATE OR REPLACE FUNCTION javatest.hello()
 RETURNS void
 LANGUAGE glot64
AS 'SGVsbG8sIHdvcmxkIQo=';

The impatient may see Hello, world! immediately, using an inline code block:

DO LANGUAGE glot64 'SGVsbG8sIHdvcmxkIQo=';

The output won't be visible at all if the server's standard output is going to /dev/null or the like. But a test instance run in PL/Java's test harness, for example, will have its standard output going to the terminal.

In addition to the base-64-decoded source string, you will see other output from the glot64 language handler, which is really the point, for a demonstration example. The base-64 string is just for fun.

Glot64, like any PL/Java-based language, needs a language handler: namely, a class that implements the PLJavaBasedLanguage interface. Various methods on that interface are used for validating functions/procedures, compiling, specializing, and calling functions/procedures, and executing inline blocks (for a language that supports those).

After installing a jar containing the class that implements the language, use the name of that class to declare a validator function, using the language pljavahandler:

CREATE OR REPLACE FUNCTION javatest.glot64_validator(oid)
 RETURNS void
 LANGUAGE pljavahandler
AS 'org.postgresql.pljava.example.polyglot.Glot64'; -- class name

followed by CREATE LANGUAGE using that new function as the validator, along with PL/Java's existing routine and inline dispatcher functions as the other two handlers:

CREATE LANGUAGE glot64
 HANDLER sqlj.pljavaDispatchRoutine
 INLINE  sqlj.pljavaDispatchInline
 VALIDATOR javatest.glot64_validator;

Bear in mind that the very first still-unticked "open items" box at the top of this pull request is still:

The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor.

and that's why no PL/Java-based function or procedure can return any results yet. Returning results will be done by storing the value (or values) into the Call.result() TupleTableSlot, and the store direction doesn't work yet; hence Glot64 is limited to writing messages on standard output.

On the other hand, fetching from a TupleTableSlot is indeed working already, so a language handler can fetch values from the Call.arguments() TupleTableSlot using whatever Adapter is appropriate to each argument's type. The Glot64 language ignores passed arguments, but that's not a necessary limitation.
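In sketch form (names other than Call.arguments() are guesses, and the
Adapter must be one suited to the argument's type):

TupleTableSlot args = call.arguments();
Attribute first = args.descriptor().attributes().get(0);
int value = args.get(first, adapter);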

Also, of course, all the other unticked boxes in that open-items list are still unticked, so plenty of work remains. But the dispatcher is here, and the PLJavaBasedLanguage interface, enough to begin experimenting with the development of language handlers for languages of interest.
