Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Tensor.dataToString() #272

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

cowwoc
Copy link

@cowwoc cowwoc commented Apr 3, 2021

Fixes #268

@google-cla
Copy link

google-cla bot commented Apr 3, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@cowwoc
Copy link
Author

cowwoc commented Apr 3, 2021

@googlebot I signed it!

@cowwoc cowwoc changed the title Added Tensor.dataToString() Added Tensor.dataToString() #268 Apr 3, 2021
@cowwoc cowwoc changed the title Added Tensor.dataToString() #268 Added Tensor.dataToString() Apr 3, 2021
@cowwoc
Copy link
Author

cowwoc commented Apr 3, 2021

@rnett Please review.

Apologies in advance. The implementation got hairy once I added maxWidth support. Please refactor this as you see fit to improve readability.

@cowwoc
Copy link
Author

cowwoc commented Apr 3, 2021

The build failure doesn't seem to be related to my changes. Please confirm.

@Craigacp
Copy link
Collaborator

Craigacp commented Apr 3, 2021

The quick build failing in javadoc generation is a known issue we're working on in other PRs.

@cowwoc cowwoc marked this pull request as draft April 3, 2021 14:30
@cowwoc cowwoc marked this pull request as ready for review April 3, 2021 14:48
Copy link
Contributor

@rnett rnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I have a few documentation requests and a request for float formatting. Did you consider adding an Operand.dataToString() default method that calls asTensor().dataToString() (or similar)?

@cowwoc
Copy link
Author

cowwoc commented Apr 4, 2021

Looks good, I have a few documentation requests and a request for float formatting. Did you consider adding an Operand.dataToString() default method that calls asTensor().dataToString() (or similar)?

Yes, I could do that. Are you saying that you would leave the main implementation on Tensor and simply add a convenience method on Operand that redirects to it?

cowwoc added 3 commits April 4, 2021 14:56
* Added support for Tensors.toString(RawTensor).
* Test multidimensional tensor, RawTensor.
@cowwoc
Copy link
Author

cowwoc commented Apr 4, 2021

FYI, I decided to remove trailing commas after closing brackets. I think it looks better this way and I think that tf.print does the same. In order words:

[
  [1, 2],
  [3, 4],
  [5, 6]
]

Will now show up as:

[
  [1, 2]
  [3, 4]
  [5, 6]
]

@rnett
Copy link
Contributor

rnett commented Apr 4, 2021

Looks good, I have a few documentation requests and a request for float formatting. Did you consider adding an Operand.dataToString() default method that calls asTensor().dataToString() (or similar)?

Yes, I could do that. Are you saying that you would leave the main implementation on Tensor and simply add a convenience method on Operand that redirects to it?

Yeah, that's exactly what I meant.

Copy link
Contributor

@rnett rnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the small comments, I made a PR (cowwoc#1) against your branch with additional data type tests, and (because of said tests) wrapping string elements in quotes. Use the PR if you want, but regardless, both of those would be good to have.

@google-cla
Copy link

google-cla bot commented Apr 6, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@cowwoc
Copy link
Author

cowwoc commented Apr 6, 2021

Other than the small comments, I made a PR (cowwoc#1) against your branch with additional data type tests, and (because of said tests) wrapping string elements in quotes. Use the PR if you want, but regardless, both of those would be good to have.

Merged. Thank you.

Let me know if there is anything else. Otherwise, how do we fix the quick-build so I can merge this PR?

@Craigacp
Copy link
Collaborator

Craigacp commented Apr 6, 2021

The test failure will be fixed in #276. The Javadoc failures we're still working on. Once #276 is merged in then I'll rerun the action and check that the tests complete.

@rnett
Copy link
Contributor

rnett commented Apr 6, 2021

@googlebot I consent.

Copy link
Contributor

@rnett rnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@karllessard karllessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @cowwoc , I'm not done with the review but I'll wait to see you have to say about my comments so far.

* is only a reference to a native tensor allowing basic operations and flat data access.</p>
* {@link RawTensor raw tensors}. The former maps the tensor native memory to an n-dimensional typed
* data space, allowing direct I/O operations from the JVM, while the latter is only a reference to
* a native tensor allowing basic operations and flat data access.</p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of (undesirable?) format change in this PR, can you please revert those and only preserve changes related to the dataToString feature?

* @param maxWidth the maximum width of the output in characters ({@code null} if unlimited). This
* limit may surpassed if the first or last element are too long.
*/
static ToStringOptions maxWidth(Integer maxWidth) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this method is in the right place. Maybe move it in ToStringOptions? Also you need to describe the returned value in the javadoc or the checks might complain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do it similar to this:
Layers.Options.create().inputShape(Shape.of(2,2))
without the ellipsis in the CTORs.

* @return the String representation of the tensor elements
* @throws IllegalStateException if this is an operand of a graph
*/
default String dataToString(ToStringOptions... options) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the use of a vararg to handle the optional presence of options is a good idea. Having a second method accepting no parameter would be better.

We use vararg options in the op wrappers because we want to limit the number of methods that ending up in the *Ops classes, which is already more than a thousand. But here it's fine "duplicating" it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @karllessard

* limit may surpassed if the first or last element are too long.
* @return the String representation of the tensor
*/
public static String toString(Tensor tensor, Integer maxWidth) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about having that many ways to print a tensors (if we can call tensor.toString, why having Tensors.toString(tensor)?) It's a bit confusing and I find that that feature maybe spreads out a bit in multiple parts of the code.

I have a new design to propose, let me know what you think. But what about having this toString logic in another called, let say, TensorPrinter, which can be returned or directly invoked in the Tensor class? i.e.

class TensorPrinter {

    TensorPrinter(Tensor tensor, int maxWidth) { ... }

    TensorPrinter withMaxWidth(int maxWidth) {
        return new TensorPrinter(this.tensor, maxWidth);
    }

    String print() {
        return .... (this logic here)
    }
}

interface Tensor {

    String print() {
        return new TensorPrinter(this, null).print();
    }

    TensorPrinter printer() {
        return new TensorPrinter(this, null);
    }
}

Tensor t = TFloat32.scalarOf(10.0f);
t.print();
t.printer().maxWidth(10).anotherOption(234).print();

Copy link
Author

@cowwoc cowwoc Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer your proposal. It is exactly what I was hoping to implement in the long term but I ended up pushing what you see above as a stepping stone in the right direction.

Also, consider the relationship (if any) between Ops.print() and this functionality. I may be wrong, but I believe the C++ implementation of tf.print() returns roughly what we're trying to implement in this PR. Does it make sense to have Ops.print() invoke this new code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to call it print() and not toString()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, then I would then expect it to work the same as Ops.print() since the two share the same name. Are we planning to merge the two?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Ops.print() simply writes a given string to an output stream, like the console, when the graph is being executed.

I picked the "print" expression in this example just to show the logic but it can for sure be named otherwise to avoid the confusion. Maybe TensorStringifier, tensor.stringify() and tensor.stringifier()? Or the methods could remain dataToString() as well.

"actual : " + tensor + "\n" +
"dataType: " + tensor.dataType() + "\n" +
"class : " + tensor.getClass());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you only accept typed tensors for this method, the endpoint should probably be added to TType instead of Tensor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karllessard Is there a way for me to handle all tensors (not just TType)? It's okay if not. I'm just asking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karllessard How do you create a RawTensor that doesn't inherit from NdArray? Or more precisely how do you create a plain Tensor object that doesn't inherit from NdArray without using one of the org.tensorflow.types classes?

RawTensor rawTensor = RawTensor.allocate(TFloat32.class, Shape.of(2, 2), -1);
Still inherits from NdArray.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't user-side, but all of the Tensor.fromHandle methods don't inherit, afaik. And in places where that's used (Session, somewhere in Operand.asTensor()) if a asTypedTensor() is missed somewhere it's nice to not crash when printing. However, afaik, that's covered by the tensor = ((RawTensor) tensor).asTypedTensor(); line, so the method should support all Tensors not just TType.

Copy link
Contributor

@JimClarke5 JimClarke5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have quite a bit of Tensor print helper classes that I use in test/java/org.tensorflow.framework.utils.TestSession, EagerTestSession, and GraphTestSession.

I would like to use this PR in those classes to make sure nothing is missed.
I'll test that and let you know how it goes.

@cowwoc
Copy link
Author

cowwoc commented Apr 13, 2021

@JimClarke5 Any chance you could pick up the torch (apply the changes that Karl asked for) on my behalf? I won't have time to pick this up for another week or two.

@JimClarke5
Copy link
Contributor

@cowwoc OK, one problem I have is I need to work with my latest version of the xxxSession classes that I am using for Models and Layers, I assume you could cherry pick the Tensor related changes.

@cowwoc
Copy link
Author

cowwoc commented Apr 13, 2021

@cowwoc OK, one problem I have is I need to work with my latest version of the xxxSession classes that I am using for Models and Layers, I assume you could cherry pick the Tensor related changes.

@JimClarke5 You should be able to fork my PR, merge in your changes and push a new PR. I will approve it which should merge the changes back into this initial PR.

@JimClarke5
Copy link
Contributor

I understand that, but I need to test with my latest code first.

@JimClarke5
Copy link
Contributor

Will Operand.dataToString() work in Graph mode with out first doing a try catch?

try (Tensor tensor = session.runner().fetch(input).run().get(0)) {
     tensor.dataToString()
}

@JimClarke5
Copy link
Contributor

Would we want the printed values enclosed in braces ( {} ) rather than square brackets( []}. Braces make it easier to copy and past into Java code.

@JimClarke5
Copy link
Contributor

I think we should pattern the output after java.util.Arrays.deepToString(), which takes us back to brackets with commas.

float[][][] f = new float[][][] {
            {{1,2}, {3,4}},
            {{5, 6}, {7, 8}},
            {{9, 10}, {11, 12}}
};   
System.out.println(Arrays.deepToString(f));
// [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], [[9.0, 10.0], [11.0, 12.0]]]

@cowwoc
Copy link
Author

cowwoc commented Apr 13, 2021

Will Operand.dataToString() work in Graph mode with out first doing a try catch?

try (Tensor tensor = session.runner().fetch(input).run().get(0)) {
     tensor.dataToString()
}

Per #268 (comment) I was asked to only support eager mode. I suggest discussing this part with @rnett.

I think we should pattern the output after java.util.Arrays.deepToString(), which takes us back to brackets with commas.

float[][][] f = new float[][][] {
            {{1,2}, {3,4}},
            {{5, 6}, {7, 8}},
            {{9, 10}, {11, 12}}
};   
System.out.println(Arrays.deepToString(f));
// [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], [[9.0, 10.0], [11.0, 12.0]]]

My 2 cents: Make the default output human-friendly (i.e. square brackets as TF.print() and others do) and provide a Builder option that will request Java-style output. I find the indentation I provided above easier to read (for humans) than Java code, but I agree that the latter is useful functionality.

@rnett
Copy link
Contributor

rnett commented Apr 13, 2021

Per #268 (comment) I was asked to only support eager mode. I suggest discussing this part with @rnett.

That's for Operands, Tensors should work fine regardless (which is what was in Jim's example).

My 2 cents: Make the default output human-friendly (i.e. square brackets as TF.print() and others do) and provide a Builder option that will request Java-style output. I find the indentation I provided above easier to read (for humans) than Java code, but I agree that the latter is useful functionality.

I'd agree with this: default to tf.print style but have options. I don't find the default Java style to be very readable for large tensors.

@JimClarke5
Copy link
Contributor

@rnett @cowwoc FYI, in Graph mode, accessing Operand.asTensor() will fail without calling the session fetch first.

@rnett
Copy link
Contributor

rnett commented Apr 13, 2021

@rnett @cowwoc FYI, in Graph mode, accessing Operand.asTensor() will fail without calling the session fetch first.

Yeah, that's why we didn't have dataToString() do anything in Graph mode. It just calls asTensor().dataToString(), so it will do the same thing as asTensor().

@JimClarke5
Copy link
Contributor

@cowwoc Your branch is behind master and that is causing me build issues.

@cowwoc
Copy link
Author

cowwoc commented Apr 14, 2021

@JimClarke5 One sec, I'll merge master into it.

@cowwoc
Copy link
Author

cowwoc commented Apr 14, 2021

@JimClarke5 Try now.

@JimClarke5
Copy link
Contributor

Actually the print code is keying on inheriting from NDArray. It has nothing to do specifically with TType. That is why @karllessard ’s comment is throwing me off.

@karllessard
Copy link
Collaborator

@JimClarke5 yeah maybe my comment wasn't clear. I was just pointing out that the dataToString() method could be added in TType instead of Tensor since only typed tensors are supported anyway. The dataToString() version for raw tensors is converting to a typed tensor before printing it, something the user could do directly if needed.

But I'm also fine leaving it there, in Tensor, if that makes things easier or if you think it make more sense.

@ajs1998
Copy link

ajs1998 commented Oct 16, 2022

Any updates on this? I am very new to Tensorflow and I have no idea how to visualize simple output. This seems perfect but it's been a year an a half since the last comment.

@karllessard
Copy link
Collaborator

@ajs1998 you can print the value of any operand (Operand) using TF debug operations, such as tf.print(tf.strings.stringFormat(tensors...)) for example.

But if you want to print the values of a Tensor object itself without iterating through it, then that is still lacking afaik and should probably be added to the ndarray module instead. Could be an interesting and easy first PR for anyone interested to contribute to the project!

@ajs1998
Copy link

ajs1998 commented Oct 16, 2022

@karllessard Excellent! Didn't know I could print an operand like that, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to convert Tensor to String representation
7 participants