
Fix to SNAP-2620. #434

Open · wants to merge 82 commits into base: snappy/master
Conversation

@kneeraj kneeraj commented Oct 11, 2018

Changes proposed in this pull request

The fix is to replace the wrapper, which indicates delete done by the same
transaction, by the gfxd tx entry. During rollback however we would have to
reinstate the committed region entry which was there before this transaction.
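The replace-and-reinstate flow described above can be sketched roughly as follows. This is a minimal sketch under assumptions: the maps and method names are hypothetical simplifications, not the actual GemFireXD classes; only the wrapper-replacement idea mirrors the description.

```java
// Hedged sketch of the replace-and-reinstate idea above; map fields and
// method names are hypothetical, not the actual GemFireXD code.
import java.util.HashMap;
import java.util.Map;

public class UniqueIndexTxSketch {
    // stands in for the wrapper (WrapperRowLocationForTxn-like): marks a delete
    // done by this tx while remembering the committed entry it replaced
    static final class Wrapper {
        final Object committedEntry;
        Wrapper(Object committedEntry) { this.committedEntry = committedEntry; }
    }

    // unique index: key -> committed entry, Wrapper, or tx entry
    static final Map<String, Object> index = new HashMap<>();
    // committed entries displaced by a same-tx delete+insert, kept for rollback
    static final Map<String, Object> toBeReinstated = new HashMap<>();

    static void insertInSameTx(String key, Object txEntry) {
        Object old = index.get(key);
        if (old instanceof Wrapper) {
            // delete done by the same tx: replace the wrapper with the tx entry,
            // but remember the committed entry for a possible rollback
            toBeReinstated.put(key, ((Wrapper) old).committedEntry);
        }
        index.put(key, txEntry);
    }

    static void rollback(String key) {
        Object committed = toBeReinstated.remove(key);
        if (committed != null) {
            index.put(key, committed); // reinstate the pre-transaction entry
        } else {
            index.remove(key);
        }
    }
}
```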

Patch testing

Added dunit for the same

Is precheckin with -Pstore clean?

Running

ReleaseNotes changes

None

Other PRs

No

@sumwale sumwale left a comment


Added a few comments. Not sure the current approach is clean; we should try to fix it the proper way by doing the replacement in the insert operation.

  conglomerate.getGemFireContainer(), Integer.toHexString(openMode),
  scanColumnList, ArrayUtils.objectString(startKeyValue),
  startSearchOperator, ArrayUtils.objectString(stopKeyValue),
- stopSearchOperator, ArrayUtils.objectString(qualifier));
+ stopSearchOperator, ArrayUtils.objectString(qualifier),
+ tran.getActiveTXState());
  }

why this logging change?

Author

Noted a task to look into this. The existing logging here seems wrong: the active tx state is set properly later, but at this point it reports:
unexpected change in TXState expected: " + myTX
+ ", thread-local: " + TXManagerImpl.getCurrentTXState();

deleted = SortedMap2IndexInsertOperation.doMe(null, null, indexContainer,
indexKey, (RowLocation) wrapper.getUnderlyingRegionEntry(), true,
null, false);
}

Does replacing during insert not work for some reason? This looks like a hackish way.

Author

For delete followed by insert in a txn, an error used to come up for unique indexes. So the change done is to replace the wrapperRowLocation put by the delete with the GfxdTxEntryState. Check the SortedMap2IndexInsertOperation changes. So if you ask me, this is not hackish. It only indicates that the wrapper which was supposed to be there is not there anymore, as for unique indexes this sequence can create that situation. Please note it is not done for the normal index type.

// doing a replace and then insert in GfxdTXStateProxy.cleanupIndexEntryForDestroy
// Check SNAP-2620 in that class.
// TXRegionState txrs = tx.readRegion(((GfxdTXEntryState)value).getDataRegion());
// txrs.getToBeReinstatedIndexMap().removeKeyPair(value, oldValue);

This looks like the proper fix. Is it missing a call to addUniqIdxInfosForReplacedEntries?

Author

Yes, as commented, I wanted to do it like this but it failed for some reason; that's why I marked a TODO. BTW, it looks like more of a fix is required: I extended some testing and am seeing at least one more failure.

Neeraj Kumar and others added 5 commits October 13, 2018 00:29
SQLState.GFXD_OPERATION_CONFLICT, ce, ce.getMessage());
} else {
if (oldValue instanceof WrapperRowLocationForTxn) {
GfxdTXEntryState gfxdtxentry = (GfxdTXEntryState)value;


In what situation will the gfxdtxentry not contain the committed region entry?

Neeraj Kumar and others added 18 commits October 17, 2018 11:01
- for routed queries, the schema in ResultSetMetadata was hard-coded to "APP"
- also added a SQLWarning field to pass through any warnings from the source
- correct SQLWarning handling in GemFireResultSet and ij display
- force output directory of IDEA to be same as gradle output so that it can
  pick up generated java/resource files among others
- generated resource files are now copied to classes output directory rather
  than resources since IDEA does not seem to include resources in CLASSPATH
  when its output directory has been customized like above
don't attempt to clean up region initialization in case of a fatal error like OOME,
since that cleanup itself can get stuck
#441)

* Checking if the entry causing the constraint violation has been destroyed or removed in a separate thread, with sync taken on the region entry, to avoid deadlock.
- skip init to set history max-size else it invokes load() in constructor
  that truncates the file to default 500 lines
- update jline to 2.14.6 for this new constructor (jline/jline2#277)
As suggested by Asif:
- :dunit replaced with :dunitTest
- :gemfirexd-tools replaced with :snappy-store:snappydata-store-tools
…444)

For cases having a large number of regions (like 200 or more), SampleCollector runs in a hot loop
trying to look up every statistic in every monitor. The latter is created for every metric/gauge in every
statistic instance and thus runs into the thousands (e.g. ~10K with 500 tables), so overall lookups
become something like 5 million in every run.

While the design of having a monitor for every metric/gauge of every statistic instance, and looking
it up on any update, is itself inefficient, the changes below bring the performance within an
acceptable range for tables of the order of 1000 or so, which is sufficient for practical purposes.

## Changes proposed in this pull request

- Skip statistics that have seen no updates when checking. This reduces the number of statistics
  checked from 500 to single digits and improves the performance by two orders of magnitude or more.
  The actual number of statistics with updates in a second is expected to be small on average.
- avoid lookup in hive metastore for tableType column in SYSTABLES scan and instead use
  PartitionedRegion.needsBatching()
- avoid lookup of hive metastore for isRowTable() since calling points already have the HiveTable
- renamed "needsBatching()" to "isRowBuffer()" and cleaned up checks for column/row buffers to
  consistently use LocalRegion.isInternalColumnTable() and isRowBuffer()
- add methods to return all PartitionedRegions and clear PartitionedRegion.isRowBuffer since it may
  have been cached incorrectly when column store for a table was not yet created
- add routing for DESCRIBE EXTENDED
- reset isRowBuffer flag after column store initialization to set correctly after recovery
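The first optimization above (skip statistics that saw no updates) can be illustrated with a small sketch. All names here are hypothetical simplifications, not the actual SampleCollector/monitor API: each statistic keeps a modification counter, and the sampler only visits statistics whose counter advanced since the last pass.

```java
// Hedged sketch of "skip unchanged statistics"; Stat and sampleChanged are
// hypothetical names, not the real GemFire statistics classes.
import java.util.ArrayList;
import java.util.List;

public class SamplerSketch {
    static final class Stat {
        long modCount;      // bumped on every update to the statistic
        long lastSampled;   // modCount observed at the previous sampling pass
        long value;
        void update(long v) { value = v; modCount++; }
    }

    // returns only the stats that actually changed since the last pass,
    // so monitor lookups happen for single digits of stats, not all of them
    static List<Stat> sampleChanged(List<Stat> all) {
        List<Stat> changed = new ArrayList<>();
        for (Stat s : all) {
            if (s.modCount != s.lastSampled) {  // skip untouched stats
                changed.add(s);
                s.lastSampled = s.modCount;
            }
        }
        return changed;
    }
}
```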
Some ThreadLocals used by the code are member variables that hold a reference to the object,
causing a memory leak. Changes to remove those ThreadLocals cause them to be removed
from the ThreadLocalMap instead of just being set to null/invalid when done.
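The leak pattern this commit fixes comes down to `ThreadLocal.remove()` versus `ThreadLocal.set(null)`: `remove()` drops the entry from the thread's ThreadLocalMap, while `set(null)` leaves a stale entry that keeps the ThreadLocal (and anything it references) reachable. A minimal illustration (not the actual Gfxd code):

```java
// ThreadLocal cleanup: set(null) leaves a ThreadLocalMap entry alive,
// remove() actually deletes it; withInitial() re-creates the value lazily
// on the next get() after a remove().
public class ThreadLocalCleanup {
    private final ThreadLocal<int[]> buffer =
        ThreadLocal.withInitial(() -> new int[1024]);

    int use(int i) {
        return buffer.get()[0] + i;
    }

    void doneLeaky() {
        buffer.set(null);   // entry stays in the ThreadLocalMap -> leak
    }

    void doneClean() {
        buffer.remove();    // entry removed from the ThreadLocalMap
    }
}
```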
* adding jvmkill.c and also merged it into libgemfirexd64.so and libgemfirexd.so

* changes related to review comments and enhances

* replacing realpath command with absPath()

* added generation of libgemfirexd*.dylib for Mac

* adding gemfirexd(64).dylib files

* recompiled libgemfirexd so files on centos with logging in jvmkill

* minor fixes

* logging heap histogram on critical event

* adding libgemfirexd dylib files

* adding .so files compiled on centos

* incorporating review suggestions

* implementing review suggestions
- consistently allow for both Log4J format and older GemFire format log-level names
- add "snappydata.numServersOnHost" property to scale down default memory-size
  allocated to a server proportionately
- updated some dependencies in build.gradles to avoid unnecessary recompilations
ahshahid and others added 25 commits July 3, 2019 16:29
* changes for getting LDAP groups for a user. The implementation first checks if the "memberOf" attribute is present in the user's information (as in the case of AD groups). If the information is available it is used to get the names of the LDAP groups; else the code recurses through the LDAP group tree.
* Check the ControlConnection and if it on a server then disable load-balance by default (and close
  the ControlConnection)
* added test for load-balance=false
* updated tests to deal with thrift and added checks that default on server
  connections is load-balance=false
* route "DROP DATABASE" DDL
* Change the default auto-reconnect to false; for the locator it is still set to true
* Changes related to oldEntryMap and snapshot to store NonLocalRegionEntry instead of BlockingQueue

* Changes for fast cleaning of oldEntryMap Also reduces the overhead by first storing just the NonLocalRegionEntry and later a collection

* Change the old entry cleaner interval
Fixing the issue of reservoir region for sample being skipped during ddl replay. Using a postProcess flag to delay the execution of  GfxdSystemProcedureMessages with postProcess flag as true & replaying those messages , once the boot of the data base has happened.
explicitly use DRDA connections in some tests that don't boot local VM so don't
set thrift-default=false in GemXD mode
As the default in the product has changed, the test was failing
* We were returning the RegionEntry to the iterator, which could have its value changed by a different writer before the iterator could consume it
* The NonLocalRegionEntry had oldDiskId and ColumnFormatValue used to read from the DiskEntry instead of the value in memory. Setting that to null now, which means the value in memory will be read
 * Used CustomEntryConcurrentHashMap to make sure we remove entry only if the same entry is present in the map
* Similarly, made sure that before a queue is removed, its size is 0 and anytime an entry is added to queue  it is put again in the map for atomicity
* While checking whether an entry in the region exists in the snapshot of a tx, avoided the entry that was being changed, so that a valid entry doesn't get removed from the oldEntryMap due to an uncommitted entry
* Added tests where bulk puts and bulk reads are checked for atomicity
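The "remove entry only if the same entry is present" guarantee above maps directly onto the two-argument `remove(key, value)` of a concurrent map. A minimal illustration with the standard `ConcurrentHashMap` (CustomEntryConcurrentHashMap itself is internal to the store, so this is an analogy, not the actual code):

```java
// Atomic conditional removal: remove(key, value) succeeds only if the map
// still holds the exact value we observed, so a concurrent replacement by
// another writer cannot be silently discarded.
import java.util.concurrent.ConcurrentHashMap;

public class AtomicRemoveSketch {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Object> oldEntryMap = new ConcurrentHashMap<>();
        Object entry1 = new Object();
        Object entry2 = new Object();

        oldEntryMap.put("key", entry1);
        // simulate another thread replacing the entry in the meantime
        oldEntryMap.put("key", entry2);

        // removal is refused because the map no longer holds entry1
        System.out.println(oldEntryMap.remove("key", entry1)); // false
        // removal of the current entry succeeds
        System.out.println(oldEntryMap.remove("key", entry2)); // true
    }
}
```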
* Don't account for SerializeDiskBuffer in off-heap
* For ObjectiStore use different getExecRow.
The problem with using newEntryWithoutFaultin when creating a NonLocalRegionEntry is that it leads to an extra reference count. But the higher layer cannot do an extra release in all cases, because when the entry is read from oldEntriesMap the extra release would be incorrect. Also, the extra reference count cannot be released immediately after this call, because if the value has been read from disk without fault-in then it will have only one refCount and would get released immediately, before even being read by the iterator.

The change to create NLRE in all cases leads to a performance problem when data is partially on disk since it will be read in essentially random order. This will be fixed in a subsequent PR.

## Changes proposed in this pull request

Use NLRE.newEntry instead of newEntryWithoutFaultin to create a snapshot of entry.
- adjust changes that change SnappyData to use lower-case identifiers
- change quoted identifiers to upper-case:
  for consistency, always use upper-case identifier names because Spark side
  will do case-insensitive analysis by default
- also fixed handling of double backtick conversion to single
- add support for OpenJDK in hydra framework
- fix schema name handling in SnappyRegionStatsCollectorFunction
* Fixes for SNAP-2707

* Added query routing code for put into values of column table

* changes for SNAP-2707
- Modified exception message.
- Added routeQuery parameter while checking for routing a query.

* Minor change

* Added formatting changes

* Added implementation for returning count of put into query on column table
Incorporated review comments

* Incorporated review comments
…o remove data (#481)

* Fix for SNAP-3007: Delete without parameters using PreparedStatement fails to remove data