
Fix to SNAP-2620. #434

Open · wants to merge 82 commits into base: snappy/master
Conversation

@kneeraj kneeraj commented Oct 11, 2018

Changes proposed in this pull request

The fix is to replace the wrapper, which indicates delete done by the same
transaction, by the gfxd tx entry. During rollback however we would have to
reinstate the committed region entry which was there before this transaction.
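The replace-and-reinstate flow described above can be sketched roughly as follows. This is a minimal sketch under assumptions: the maps and method names are hypothetical simplifications, not the actual GemFireXD classes; only the wrapper-replacement idea mirrors the description.

```java
// Hedged sketch of the replace-and-reinstate idea above; map fields and
// method names are hypothetical, not the actual GemFireXD code.
import java.util.HashMap;
import java.util.Map;

public class UniqueIndexTxSketch {
    // stands in for the wrapper (WrapperRowLocationForTxn-like): marks a delete
    // done by this tx while remembering the committed entry it replaced
    static final class Wrapper {
        final Object committedEntry;
        Wrapper(Object committedEntry) { this.committedEntry = committedEntry; }
    }

    // unique index: key -> committed entry, Wrapper, or tx entry
    static final Map<String, Object> index = new HashMap<>();
    // committed entries displaced by a same-tx delete+insert, kept for rollback
    static final Map<String, Object> toBeReinstated = new HashMap<>();

    static void insertInSameTx(String key, Object txEntry) {
        Object old = index.get(key);
        if (old instanceof Wrapper) {
            // delete done by the same tx: replace the wrapper with the tx entry,
            // but remember the committed entry for a possible rollback
            toBeReinstated.put(key, ((Wrapper) old).committedEntry);
        }
        index.put(key, txEntry);
    }

    static void rollback(String key) {
        Object committed = toBeReinstated.remove(key);
        if (committed != null) {
            index.put(key, committed); // reinstate the pre-transaction entry
        } else {
            index.remove(key);
        }
    }
}
```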

Patch testing

Added dunit for the same

Is precheckin with -Pstore clean?

Running

ReleaseNotes changes

None

Other PRs

No

@sumwale sumwale left a comment


Added a few comments. Not sure the current approach is clean; we should try to fix it the proper way by doing the replacement in the insert operation.

  conglomerate.getGemFireContainer(), Integer.toHexString(openMode),
  scanColumnList, ArrayUtils.objectString(startKeyValue),
  startSearchOperator, ArrayUtils.objectString(stopKeyValue),
- stopSearchOperator, ArrayUtils.objectString(qualifier));
+ stopSearchOperator, ArrayUtils.objectString(qualifier),
+ tran.getActiveTXState());
  }

why this logging change?

Author

Noted a task to look into this. The existing logging here seems wrong: the active tx state is set properly later, but at this point it reports:
unexpected change in TXState expected: " + myTX
+ ", thread-local: " + TXManagerImpl.getCurrentTXState();

deleted = SortedMap2IndexInsertOperation.doMe(null, null, indexContainer,
indexKey, (RowLocation) wrapper.getUnderlyingRegionEntry(), true,
null, false);
}

Does replacing during insert not work for some reason? This looks like a hackish way.

Author

For delete followed by insert in a txn, an error used to come up for unique indexes. So the change done is to replace the wrapperRowLocation put by the delete with the GfxdTxEntryState. Check the SortedMap2IndexInsertOperation changes. So if you ask me, this is not hackish. It only indicates that the wrapper which was supposed to be there is not there anymore, as for unique indexes this sequence can create that situation. Please note it is not done for the normal index type.

// doing a replace and then insert in GfxdTXStateProxy.cleanupIndexEntryForDestroy
// Check SNAP-2620 in that class.
// TXRegionState txrs = tx.readRegion(((GfxdTXEntryState)value).getDataRegion());
// txrs.getToBeReinstatedIndexMap().removeKeyPair(value, oldValue);

This looks like the proper fix. Is it missing a call to addUniqIdxInfosForReplacedEntries?

Author

Yes, as commented, I wanted to do it like this but it failed for some reason; that's why I marked a TODO. BTW, it looks like more of a fix is required: I extended some testing and am seeing at least one more failure.

Neeraj Kumar and others added 5 commits October 13, 2018 00:29
SQLState.GFXD_OPERATION_CONFLICT, ce, ce.getMessage());
} else {
if (oldValue instanceof WrapperRowLocationForTxn) {
GfxdTXEntryState gfxdtxentry = (GfxdTXEntryState)value;


In what situation will the gfxdtxentry not contain the committed region entry?

Neeraj Kumar and others added 18 commits October 17, 2018 11:01
- for routed queries, the schema in ResultSetMetadata was hard-coded to "APP"
- also added a SQLWarning field to pass through any warnings from the source
- correct SQLWarning handling in GemFireResultSet and ij display
- force output directory of IDEA to be same as gradle output so that it can
  pick up generated java/resource files among others
- generated resource files are now copied to classes output directory rather
  than resources since IDEA does not seem to include resources in CLASSPATH
  when its output directory has been customized like above
don't attempt to clean up region initialization in case of a fatal error like OOME,
since that cleanup itself can get stuck
#441)

* Checking if the entry causing the constraint violation has been destroyed or removed in a separate thread, with sync taken on the region entry, to avoid deadlock.
- skip init to set history max-size else it invokes load() in constructor
  that truncates the file to default 500 lines
- update jline to 2.14.6 for this new constructor (jline/jline2#277)
As suggested by Asif:
- :dunit replaced with :dunitTest
- :gemfirexd-tools replaced with :snappy-store:snappydata-store-tools
…444)

For cases having a large number of regions (like 200 or more), SampleCollector runs in a hot loop
trying to look up every statistic in every monitor. The latter is created for every metric/gauge in every
statistic instance and thus runs into the thousands (e.g. ~10K with 500 tables), so overall lookups
become something like 5 million in every run.

While the design of having a monitor for every metric/gauge of every statistic instance, and looking
it up on any update, is itself inefficient, the changes below bring the performance within an
acceptable range for tables of the order of 1000 or so, which is sufficient for practical purposes.

## Changes proposed in this pull request

- Skip statistics that have seen no updates when checking. This reduces the number of statistics
  checked from 500 to single digits and improves the performance by two orders of magnitude or more.
  The actual number of statistics with updates in a second is expected to be small on average.
- avoid lookup in hive metastore for tableType column in SYSTABLES scan and instead use
  PartitionedRegion.needsBatching()
- avoid lookup of hive metastore for isRowTable() since calling points already have the HiveTable
- renamed "needsBatching()" to "isRowBuffer()" and cleaned up checks for column/row buffers to
  consistently use LocalRegion.isInternalColumnTable() and isRowBuffer()
- add methods to return all PartitionedRegions and clear PartitionedRegion.isRowBuffer since it may
  have been cached incorrectly when column store for a table was not yet created
- add routing for DESCRIBE EXTENDED
- reset isRowBuffer flag after column store initialization to set correctly after recovery
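The first optimization above (skip statistics that saw no updates) can be illustrated with a small sketch. All names here are hypothetical simplifications, not the actual SampleCollector/monitor API: each statistic keeps a modification counter, and the sampler only visits statistics whose counter advanced since the last pass.

```java
// Hedged sketch of "skip unchanged statistics"; Stat and sampleChanged are
// hypothetical names, not the real GemFire statistics classes.
import java.util.ArrayList;
import java.util.List;

public class SamplerSketch {
    static final class Stat {
        long modCount;      // bumped on every update to the statistic
        long lastSampled;   // modCount observed at the previous sampling pass
        long value;
        void update(long v) { value = v; modCount++; }
    }

    // returns only the stats that actually changed since the last pass,
    // so monitor lookups happen for single digits of stats, not all of them
    static List<Stat> sampleChanged(List<Stat> all) {
        List<Stat> changed = new ArrayList<>();
        for (Stat s : all) {
            if (s.modCount != s.lastSampled) {  // skip untouched stats
                changed.add(s);
                s.lastSampled = s.modCount;
            }
        }
        return changed;
    }
}
```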
Some ThreadLocals used by the code are member variables that hold a reference to the object,
causing a memory leak. Changes to remove those ThreadLocals cause them to be removed
from the ThreadLocalMap instead of just being set to null/invalid when done.
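The leak pattern this commit fixes comes down to `ThreadLocal.remove()` versus `ThreadLocal.set(null)`: `remove()` drops the entry from the thread's ThreadLocalMap, while `set(null)` leaves a stale entry that keeps the ThreadLocal (and anything it references) reachable. A minimal illustration (not the actual Gfxd code):

```java
// ThreadLocal cleanup: set(null) leaves a ThreadLocalMap entry alive,
// remove() actually deletes it; withInitial() re-creates the value lazily
// on the next get() after a remove().
public class ThreadLocalCleanup {
    private final ThreadLocal<int[]> buffer =
        ThreadLocal.withInitial(() -> new int[1024]);

    int use(int i) {
        return buffer.get()[0] + i;
    }

    void doneLeaky() {
        buffer.set(null);   // entry stays in the ThreadLocalMap -> leak
    }

    void doneClean() {
        buffer.remove();    // entry removed from the ThreadLocalMap
    }
}
```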
* adding jvmkill.c and also merged it into libgemfirexd64.so and libgemfirexd.so

* changes related to review comments and enhances

* replacing realpath command with absPath()

* added generation of libgemfirexd*.dylib for Mac

* adding gemfirexd(64).dylib files

* recompiled libgemfirexd so files on centos with logging in jvmkill

* minor fixes

* logging heap histogram on critical event

* adding libgemfirexd dylib files

* adding .so files compiled on centos

* incorporating review suggestions

* implementing review suggestions
- consistently allow for both Log4J format and older GemFire format log-level names
- add "snappydata.numServersOnHost" property to scale down default memory-size
  allocated to a server proportionately
- updated some dependencies in build.gradles to avoid unnecessary recompilations
ahshahid and others added 25 commits July 3, 2019 16:29
* changes for getting LDAP groups for a user. The implementation first checks if the "memberOf" attribute is present in the user's information (as in the case of AD groups). If the information is available it is used to get the names of the LDAP groups; else the code recurses through the LDAP group tree.
* Check the ControlConnection and if it on a server then disable load-balance by default (and close
  the ControlConnection)
* added test for load-balance=false
* updated tests to deal with thrift and added checks that default on server
  connections is load-balance=false
* route "DROP DATABASE" DDL
* Change the default auto-reconnect to false; for the locator it is still set to true
* Changes related to oldEntryMap and snapshot to store NonLocalRegionEntry instead of BlockingQueue

* Changes for fast cleaning of oldEntryMap Also reduces the overhead by first storing just the NonLocalRegionEntry and later a collection

* Change the old entry cleaner interval
Fixing the issue of reservoir region for sample being skipped during ddl replay. Using a postProcess flag to delay the execution of  GfxdSystemProcedureMessages with postProcess flag as true & replaying those messages , once the boot of the data base has happened.
explicitly use DRDA connections in some tests that don't boot local VM so don't
set thrift-default=false in GemXD mode
As the default in the product has changed, the test was failing
* We were returning the RegionEntry to the iterator, which could have its value changed by a different writer before the iterator could consume it
* The NonLocalRegionEntry had oldDiskId and ColumnFormatValue used to read from the DiskEntry instead of the value in memory. Setting that to null now, which means the value in memory will be read
 * Used CustomEntryConcurrentHashMap to make sure we remove entry only if the same entry is present in the map
* Similarly, made sure that before a queue is removed, its size is 0 and anytime an entry is added to queue  it is put again in the map for atomicity
* While checking whether an entry in the region exists in the snapshot of a tx, avoided the entry that was being changed, so that a valid entry doesn't get removed from the oldEntryMap due to an uncommitted entry
* Added tests where bulk puts and bulk reads are checked for atomicity
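The "remove entry only if the same entry is present" guarantee above maps directly onto the two-argument `remove(key, value)` of a concurrent map. A minimal illustration with the standard `ConcurrentHashMap` (CustomEntryConcurrentHashMap itself is internal to the store, so this is an analogy, not the actual code):

```java
// Atomic conditional removal: remove(key, value) succeeds only if the map
// still holds the exact value we observed, so a concurrent replacement by
// another writer cannot be silently discarded.
import java.util.concurrent.ConcurrentHashMap;

public class AtomicRemoveSketch {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Object> oldEntryMap = new ConcurrentHashMap<>();
        Object entry1 = new Object();
        Object entry2 = new Object();

        oldEntryMap.put("key", entry1);
        // simulate another thread replacing the entry in the meantime
        oldEntryMap.put("key", entry2);

        // removal is refused because the map no longer holds entry1
        System.out.println(oldEntryMap.remove("key", entry1)); // false
        // removal of the current entry succeeds
        System.out.println(oldEntryMap.remove("key", entry2)); // true
    }
}
```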
* Don't account for SerializeDiskBuffer in off-heap
* For ObjectiStore use different getExecRow.
The problem with using newEntryWithoutFaultin when creating a NonLocalRegionEntry is that it leads to an extra reference count. But the higher layer cannot do an extra release in all cases, because when the entry is read from oldEntriesMap the extra release would be incorrect. Also, the extra reference count cannot be released immediately after this call, because if the value has been read from disk without fault-in then it will have only one refCount and would get released immediately, before even being read by the iterator.

The change to create NLRE in all cases leads to a performance problem when data is partially on disk since it will be read in essentially random order. This will be fixed in a subsequent PR.

## Changes proposed in this pull request

Use NLRE.newEntry instead of newEntryWithoutFaultin to create a snapshot of entry.
- adjust changes that change SnappyData to use lower-case identifiers
- change quoted identifiers to upper-case:
  for consistency, always use upper-case identifier names because Spark side
  will do case-insensitive analysis by default
- also fixed handling of double backtick conversion to single
- add support for OpenJDK in hydra framework
- fix schema name handling in SnappyRegionStatsCollectorFunction
* Fixes for SNAP-2707

* Added query routing code for put into values of column table

* changes for SNAP-2707
- Modified exception message.
- Added routeQuery parameter while checking for routing a query.

* Minor change

* Added formatting changes

* Added implementation for returning count of put into query on column table
Incorporated review comments

* Incorporated review comments
…o remove data (#481)

* Fix for SNAP-3007: Delete without parameters using PreparedStatement fails to remove data