
Conversation

@jiangphcn
Contributor

@jiangphcn jiangphcn commented Sep 6, 2017

Overview

Before this change, the geo index directories and files were not deleted when a database was deleted, leaving orphaned geo index files and directories behind. In detail, when a database was deleted, gen_server:cast(?MODULE, {cleanup, DbName}) was called in
https://github.com/cloudant-labs/hastings/blob/master/src/hastings_vacuum.erl#L99. Later, {ok, JsonDDocs} = get_ddocs(DbName) was called in clean_db/1. Because the database had already been deleted, a {'DOWN', Ref, _, _, {error, database_does_not_exist}} message was received in https://github.com/cloudant-labs/hastings/blob/master/src/hastings_vacuum.erl#L149, so cleanup(DbName, ActiveSigs) was never called in https://github.com/cloudant-labs/hastings/blob/master/src/hastings_vacuum.erl#L127.
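
For illustration, here is a rough sketch of the pre-change flow just described (simplified; active_sigs/1 is a hypothetical helper name and the real clean_db/1 in hastings_vacuum differs in detail):

clean_db(DbName) ->
    %% get_ddocs/1 monitors a fabric call; when the database is already
    %% gone, the 'DOWN' message carries {error, database_does_not_exist},
    %% so {ok, JsonDDocs} never comes back and cleanup/2 below is never
    %% reached, leaving the geo index directories orphaned on disk
    {ok, JsonDDocs} = get_ddocs(DbName),
    ActiveSigs = active_sigs(JsonDDocs),
    cleanup(DbName, ActiveSigs).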

This PR addresses the issue above by taking action on the geo indexes when a database is deleted.

The first, direct approach is to delete the geo index files and directories when the database is deleted; see https://github.com/cloudant-labs/hastings/pull/3/files#diff-7ac6be6388d90e131766d8c5824cb226R259

The second approach is to rename the geo index in place, e.g. from /srv/geo_index/shards/60000000-7fffffff/<dbname.ts>/e66df316792ab411705e2741bba44371 to /srv/geo_index/shards/60000000-7fffffff/<dbname.YYMMDD.HHMMSS.deleted.ts>/e66df316792ab411705e2741bba44371, when the corresponding database is deleted and the "enable_database_recovery" configuration option is set to true. This allows the geo index files to be re-used if the database is recovered.
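
As a sketch of how the two approaches could be selected at deletion time (the helpers rename_index_dirs/1 and delete_index_dirs/1 are hypothetical names, and the option is assumed to live in the [couchdb] section; only "enable_database_recovery" itself comes from the description above):

handle_db_deleted(DbName) ->
    case config:get_boolean("couchdb", "enable_database_recovery", false) of
        true ->
            %% second approach: keep the files under a
            %% <dbname>.YYMMDD.HHMMSS.deleted.<ts> directory so they can
            %% be re-used if the database is recovered
            rename_index_dirs(DbName);
        false ->
            %% first approach: remove the files outright
            delete_index_dirs(DbName)
    end.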

Testing recommendations

make check apps=hastings skip_deps+=couch_epi tests=hastings_delete_db_test_
    Running test function(s):
     ======================== EUnit ========================
Hastings Delete Database Tests
  hastings_delete_db_test:58: should_delete_index_after_deleting_database...[0.402 s] ok
  hastings_delete_db_test:93: should_rename_index_after_deleting_database...[0.158 s] ok
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
  [done in 1.075 s]
=======================================================
  2 tests passed.

GitHub issue number

Bugzid: 86318

Related Pull Requests

#3 was superseded by this PR
https://github.com/cloudant/chef-repo/pull/7719


Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

@jiangphcn jiangphcn force-pushed the 86318-rename-geo-indexfiles-when-dbdeleted branch from 7e34516 to 6866129 on September 12, 2017 10:14
Contributor

@davisp davisp left a comment


This seems wrong to me, but I could be reading it wrong.

So far as I can tell, hastings_util:rename_dir/1 takes a path that looks like "/srv/geo_index/shards/00-ff/davisp/testdb.12345/234adfe" and renames "/srv/geo_index/shards/00-ff/davisp/testdb.12345" to "/srv/geo_index/shards/00-ff/davisp/testdb.2017.09.21.09.50.34.deleted.12345/".

That is to say, rename_dir/1 takes a path to the directory of a specific index and then renames the containing directory, thus effectively renaming all indexes for that given database. That seems wrong given the variable names, and it also makes me think that hastings_vacuum:rename_all_indexes/1 is broken, given that it operates on all indexes even though they should all be gone after the first index is processed.

@jiangphcn
Contributor Author

@davisp Hi Paul, thanks a lot for your review on this PR.

So far as I can tell, hastings_util:rename_dir/1 takes a path that looks like "/srv/geo_index/shards/00-ff/davisp/testdb.12345/234adfe" and renames "/srv/geo_index/shards/00-ff/davisp/testdb.12345" to "/srv/geo_index/shards/00-ff/davisp/testdb.2017.09.21.09.50.34.deleted.12345/"

Yes, hastings_util:rename_dir/1 renames the dbname part of the geo index directory path.

That is to say, rename_dir/1 takes a path to the directory of a specific index and then renames the containing directory

yes, it renames the containing directory.

thus effectively renaming all indexes for that given database. 

No, each call to rename_dir/1 only renames the indexes belonging to one shard of the given database.

Let us look at an example:

When calling hastings_util:get_existing_index_dirs/2 at https://github.com/cloudant-labs/hastings/blob/86318-rename-geo-indexfiles-when-dbdeleted/src/hastings_vacuum.erl#L270, we get DirList with

[<<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/00000000-1fffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/20000000-3fffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/40000000-5fffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/60000000-7fffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/80000000-9fffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/a0000000-bfffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/c0000000-dfffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/e0000000-ffffffff/geotestdb03.1506045171/7ac670379186b68685b22ee460224496">>]

In lists:foreach/2, each call to hastings_util:rename_dir/1 at https://github.com/cloudant-labs/hastings/blob/86318-rename-geo-indexfiles-when-dbdeleted/src/hastings_vacuum.erl#L274 renames the dbname part in one shard.
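
For concreteness, the loop being described is roughly the following (sketch only; it assumes DirList is the list returned by hastings_util:get_existing_index_dirs/2 as shown above):

lists:foreach(fun(IdxDir) ->
    %% IdxDir is .../shards/<range>/<dbname>.<ts>/<signature>;
    %% rename_dir/1 renames the containing <dbname>.<ts> directory, so
    %% one call moves every index stored in that shard directory
    hastings_util:rename_dir(IdxDir)
end, DirList).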

If there are multiple geo design documents defined for one database, then the result from hastings_util:get_existing_index_dirs/2 could be

[<<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/00000000-1fffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/00000000-1fffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/20000000-3fffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/20000000-3fffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/40000000-5fffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/40000000-5fffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/60000000-7fffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/60000000-7fffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/80000000-9fffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/80000000-9fffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/a0000000-bfffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/a0000000-bfffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/c0000000-dfffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/c0000000-dfffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/e0000000-ffffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>,
 <<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/e0000000-ffffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>]

Thus, the dbname part of the indexes belonging to the same shard is renamed on each call to hastings_util:rename_dir/1.

Shall I rename hastings_util:rename_dir/1 to hastings_util:rename_dbname_path/1 to make this clearer?

@davisp
Contributor

davisp commented Sep 26, 2017

I don't think renaming the function is what you want; I think you need to change the behavior. Your second example with multiple indexes is the case I'm worried about.

Consider what you've got listed for multiple indexes:

<<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/00000000-1fffffff/geotestdb07.1506046378/74147bfc38a36db1c692f787673e15df">>
<<"/Users/jiangph/couchdb/db/dev/lib/node1/data/geo/shards/00000000-1fffffff/geotestdb07.1506046378/7ac670379186b68685b22ee460224496">>

For simplification I'll call those:

prefix/range/dbname.suffix/index_a
prefix/range/dbname.suffix/index_b

Your rename function takes this input:

prefix/range/dbname.suffix/index_a

And renames the directory:

prefix/range/dbname.suffix

To:

prefix/range/dbname.date.deleted.suffix

When you fold across the original list of index directories, this means that when you process the entry for index_a you are also moving index_b, so when you go to process index_b the original directory won't exist and the rename will fail.

What I think you should be doing instead is moving an index at a time, something along the lines of:

do_rename(IdxDir) ->
    %% IdxDir is prefix/range/dbname.suffix/index_sig
    IdxName = filename:basename(IdxDir),
    DbDir = filename:dirname(IdxDir),
    %% e.g. prefix/range/dbname.date.deleted.suffix
    DeleteDir = calculate_delete_directory(DbDir),
    RenamePath = filename:join([DeleteDir, IdxName]),
    %% filelib:ensure_dir/1 only creates the parent directories of its
    %% argument, so pass the full target path to make sure DeleteDir exists
    filelib:ensure_dir(RenamePath),
    file:rename(IdxDir, RenamePath).

Also, I noticed while re-reviewing that we're throwing away the actual shard name and then falling back to mem3:shards(DbName), which is wrong: a user could recreate the database quickly, which would have us looking in the wrong place. Rather than using mem3:dbname on the db event (which is a shard name), I think we should process each shard individually, rather than what we do currently, which is to process all shards on the node repeatedly (i.e. we currently try to clean all shards N times, where N is the number of shards on the node).

@davisp
Contributor

davisp commented Sep 26, 2017

To clarify, throwing away the shard name and using mem3:shards in the cleanup functions means that we're losing track of the suffix from dbname.suffix that caused the event. Thus, if a user cycles a database and design doc, it's possible we end up grabbing the index from the wrong dbname.suffix and accidentally move an index that should not have been moved (as it could belong to a new database with a newer suffix timestamp).
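
A rough sketch of the per-shard handling being suggested (hypothetical shape only, not code from this PR):

%% act on the shard name carried by the db event instead of expanding
%% the clustered db name back into shards via mem3:shards/1
handle_db_event(ShardName, deleted, St) ->
    %% ShardName is e.g. shards/00000000-1fffffff/dbname.1506046378, so
    %% the dbname.suffix of the exact deleted incarnation is preserved
    gen_server:cast(?MODULE, {cleanup, ShardName}),
    {ok, St}.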

@jiangphcn
Contributor Author

@davisp Hi Paul, thanks for your comments. My thinking was that if we need to rename all indexes, we can rename them in one call: even if renaming fails for the second or a later index, all indexes belonging to that shard have already been renamed. However, I agree with you that we need to modify the code to rename only the specified index. I have also adjusted the code to avoid the unexpected situation where a database is deleted and re-created quickly. Could you please review again when you get time? Thanks a lot.

Contributor

@davisp davisp left a comment


A few minor changes, but good work otherwise.

Contributor


We don't need to alias this anymore since it's not being modified.

Contributor


clean_sharddb is a bit difficult to read. Can you change it to clean_shard_db?

Contributor


You're gonna have a case match issue here since you don't have a clause matching the compaction context.

@jiangphcn
Contributor Author

@davisp can you double check my recent change when you get time? Thanks

Contributor

@davisp davisp left a comment


Minor change to help reading that case statement.

Contributor


Ah, this case clause is broken but in a non-obvious manner. We already asserted that Context == delete up in handle_cast so only deletion contexts come through this function. If you want to keep the assertion here, you should do something like delete = couch_util:get_value(context, Options, delete) on the first line and then remove the Context pattern match from the case.
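
In other words, something like this (hypothetical function and helper names, just to show the shape):

handle_delete(DbName, Options) ->
    %% handle_cast already checked Context == delete, so this line is
    %% only a defensive assertion
    delete = couch_util:get_value(context, Options, delete),
    case config:get_boolean("couchdb", "enable_database_recovery", false) of
        true -> rename_index_dirs(DbName);    %% hypothetical helper
        false -> delete_index_dirs(DbName)    %% hypothetical helper
    end.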

Contributor

@davisp davisp left a comment


+1

@jiangphcn jiangphcn force-pushed the 86318-rename-geo-indexfiles-when-dbdeleted branch from dd71e18 to a8ce2bb on October 11, 2017 02:28
@jiangphcn jiangphcn merged commit 1cca585 into master Oct 11, 2017
