Patch to fix DONT_CARE locality enforcement flag that is ignored at the moment #3

mesmorkalov · 2016-04-05T17:46:43Z

Issue description:

The 'DONT_CARE' locality enforcement option is ignored when a client specifies it in a resource request. After YARN allocates the resources, Llama matches them against the request queue, and the matching pattern includes node names, so a resource allocated on a node other than the requested one never matches its request.
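
To illustrate the failure mode, here is a minimal sketch of a matcher whose key includes the node name. The `Request` class, the `key` format and the `match` method are hypothetical stand-ins and are not taken from the Llama source; the only point is that the locality value never takes part in the comparison.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StrictMatchSketch {
    // Hypothetical stand-in for a pending resource request.
    static final class Request {
        final String node;
        final int vCores;
        final int memoryMb;
        final String locality; // stored, but never consulted by match()

        Request(String node, int vCores, int memoryMb, String locality) {
            this.node = node;
            this.vCores = vCores;
            this.memoryMb = memoryMb;
            this.locality = locality;
        }
    }

    // Hypothetical matching key; note that it includes the node name.
    static String key(String node, int vCores, int memoryMb) {
        return node + "|" + vCores + "|" + memoryMb;
    }

    // Removes and returns the first pending request whose key equals the
    // allocation's key, or returns null if there is none.
    static Request match(List<Request> pending, String allocNode, int vCores, int memoryMb) {
        String allocKey = key(allocNode, vCores, memoryMb);
        for (Iterator<Request> it = pending.iterator(); it.hasNext();) {
            Request r = it.next();
            if (key(r.node, r.vCores, r.memoryMb).equals(allocKey)) {
                it.remove();
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Request> pending = new ArrayList<>();
        pending.add(new Request("node1", 2, 1024, "DONT_CARE"));
        // The allocation arrives from node2, so the node-based key differs.
        Request matched = match(pending, "node2", 2, 1024);
        System.out.println("matched: " + (matched != null)); // prints "matched: false"
    }
}
```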

Patch description:

Note: Addresses only the configuration with both resource caching and normalization disabled.

Introduce an internal collection for 'DONT_CARE' requests that have not been granted yet.
After the resources allocated by YARN have been 'strongly' matched against the existing requests, match the remaining resources against the new collection in a 'weak' fashion (only by number of vCPUs and amount of memory).
Matched resources are taken by Llama and the corresponding requests are removed from the new collection, while the remaining resources are returned to the YARN pool.
Fix the inconsistency between the Llama source code and the Llama config in the name of the property that controls resource caching (llama.am.caching.enabled vs. llama.am.cache.enabled). Because of this mismatch, changes to the property are currently ignored.
Add a new config property (llama.am.resource.normalizing.enabled.#QUEUE#) that makes it possible to disable normalization for a particular queue, rather than disabling it globally for all queues.
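
The two-pass matching described in the first three items can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: the `Res` class, `find` and `process` are hypothetical names, and the sketch assumes that every 'DONT_CARE' request is tracked both in the main pending list and in the new collection.

```java
import java.util.ArrayList;
import java.util.List;

public class WeakMatchSketch {
    // Hypothetical type used for both requests and YARN allocations.
    static final class Res {
        final String node;
        final int vCores;
        final int memoryMb;

        Res(String node, int vCores, int memoryMb) {
            this.node = node;
            this.vCores = vCores;
            this.memoryMb = memoryMb;
        }
    }

    // Returns the first candidate with the same vCores and memory as the
    // allocation; when useNode is true the node name must agree as well.
    static Res find(List<Res> candidates, Res alloc, boolean useNode) {
        for (Res r : candidates) {
            boolean sameSize = r.vCores == alloc.vCores && r.memoryMb == alloc.memoryMb;
            if (sameSize && (!useNode || r.node.equals(alloc.node))) {
                return r;
            }
        }
        return null;
    }

    // Matches allocations against requests and returns the allocations that
    // matched nothing, i.e. the ones to hand back to YARN.
    static List<Res> process(List<Res> pending, List<Res> dontCare, List<Res> allocations) {
        List<Res> unmatched = new ArrayList<>();
        // Pass 1: 'strong' match on node, vCores and memory.
        for (Res a : allocations) {
            Res r = find(pending, a, true);
            if (r == null) {
                unmatched.add(a);
            } else {
                pending.remove(r);
                dontCare.remove(r); // no-op if r was not a DONT_CARE request
            }
        }
        // Pass 2: 'weak' match on vCores and memory only.
        List<Res> release = new ArrayList<>();
        for (Res a : unmatched) {
            Res r = find(dontCare, a, false);
            if (r == null) {
                release.add(a);
            } else {
                pending.remove(r);
                dontCare.remove(r);
            }
        }
        return release;
    }
}
```

Running the strong pass over all allocations first means a request that can be satisfied on its requested node is never given a resource from another node while an exact match is available.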

Signed-off-by: Mikhail Smorkalov [email protected]

kambatla · 2016-04-06T22:32:37Z

llama/src/main/java/com/cloudera/llama/am/api/LlamaAM.java


  public static final String CACHING_ENABLED_KEY =
-      PREFIX_KEY + "caching.enabled";
+      PREFIX_KEY + "cache.enabled";


Thanks for updating this. Can we update the reference to "caching.enabled" in llama-site.xml as well?
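
For context, a small sketch of why the two spellings have to agree. It uses `java.util.Properties` as a stand-in for the real configuration class, and assumes that `PREFIX_KEY` is `"llama.am."`, that the site file uses the `cache.enabled` spelling, and that caching defaults to enabled; `cachingEnabled` is a hypothetical helper.

```java
import java.util.Properties;

public class CachingKeySketch {
    static final String PREFIX_KEY = "llama.am."; // assumed prefix value
    static final String OLD_KEY = PREFIX_KEY + "caching.enabled";
    static final String NEW_KEY = PREFIX_KEY + "cache.enabled";

    // Reads the flag under the given key, defaulting to "enabled" when the
    // key is absent (assumed default).
    static boolean cachingEnabled(Properties site, String key) {
        return Boolean.parseBoolean(site.getProperty(key, "true"));
    }

    public static void main(String[] args) {
        Properties site = new Properties();
        // The operator disables caching using the spelling from the config.
        site.setProperty("llama.am.cache.enabled", "false");

        // Lookup under the old key misses the setting and falls back to the default.
        System.out.println(cachingEnabled(site, OLD_KEY)); // prints "true"
        // Lookup under the corrected key sees the operator's value.
        System.out.println(cachingEnabled(site, NEW_KEY)); // prints "false"
    }
}
```

The same silent fallback happens in the other direction if the site file keeps the old spelling after the constant changes, which is why both places need to be updated together.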

kambatla · 2016-04-06T23:12:19Z

Can we add unit tests to verify that this gives us the desired behavior? If that turns out to be hard, it would be nice to know why. It would also be good to hear about any manual validation.

mesmorkalov · 2016-04-07T10:29:09Z

The problem with adding a unit test is that the existing test infrastructure seems to lack the ability to generate different names for the fake nodes. When I create a miniYarn configuration with several nodes, all of them have the same name (the name of the host where I am running the tests). Modifying the test infrastructure seems non-trivial and may take significantly more effort than fixing the actual issue, so your ideas on how to overcome this are very much appreciated.
As for the manual validation, I ran the following test on a real cluster:

Disable resource caching and normalization.
The Llama client sends two requests to LlamaAM with the same node name specified as the 'location' and 'DONT_CARE' specified as the 'locality'.
Both requests are granted, and the second one gets its resources from a node different from the one originally specified.

I have also checked that the fix doesn't affect the default scenario with resource caching and normalization enabled. In that case there are simply no requests with DONT_CARE or PREFERRED locality, since MUST is hard-coded when a normalized request is created.
Please let me know if I missed any scenarios that should be taken into account.

Signed-off-by: Mikhail Smorkalov <[email protected]>

Patch to fix DONT_CARE locality enforcement flag that is ignored at the moment

17434bd

Signed-off-by: Mikhail Smorkalov <[email protected]>

kambatla reviewed Apr 6, 2016
Applied first portion of comments after review with Karthik

572892b

Signed-off-by: Mikhail Smorkalov <[email protected]>
