Creating an Operand tensor from a large NdArray fails with "exceeded maximum protobuf size of 2GB" #464

Open
mullerhai opened this issue Jul 4, 2022 · 1 comment

Comments

@mullerhai

Hi,
I generate an org.tensorflow.ndarray.DoubleNdArray from a Spark DataFrame. When I then try to create an Operand[TFloat64] tensor from it, I get the following error:


scala> val featureVector = SparkConverter.sparkDataframeFeatureVectorConvertTfTensor(finalInputDf,"final_features" )
featureVector: org.tensorflow.ndarray.DoubleNdArray = org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray@e3f6a6a0
scala> val ft  = tf.constant(featureVector)
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/message_lite.cc:451] tensorflow.AttrValue exceeded maximum protobuf size of 2GB: 6279090916
org.tensorflow.exceptions.TFInvalidArgumentException: AttrValue missing value with expected type 'tensor'
         for attr 'value'
        ; NodeDef: {{node Const}}; Op<name=Const; signature= -> output:dtype; attr=value:tensor; attr=dtype:type>
  at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
  at org.tensorflow.EagerOperationBuilder.execute(EagerOperationBuilder.java:314)
  at org.tensorflow.EagerOperationBuilder.build(EagerOperationBuilder.java:77)
  at org.tensorflow.EagerOperationBuilder.build(EagerOperationBuilder.java:64)
  at org.tensorflow.op.core.Constant.create(Constant.java:1350)
  at org.tensorflow.op.core.Constant.tensorOf(Constant.java:521)
  at org.tensorflow.op.Ops.constant(Ops.java:1669)
  ... 59 elided

But if I filter the DataFrame down to a smaller part, it works:


scala> val featureVector = SparkConverter.sparkDataframeFeatureVectorConvertTfTensor(finalInputDf.filter(col("pay_status").equalTo(1)),"final_features" )
featureVector: org.tensorflow.ndarray.DoubleNdArray = org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray@627077a

scala> val ft_small  = tf.constant(featureVector)
ft_small: org.tensorflow.op.core.Constant[org.tensorflow.types.TFloat64] = <Const 'Const_2'>

scala> ft_small.asTensor().numBytes()
res43: Long = 1058424696
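
The two sizes reported above can be compared against the protobuf cap with plain arithmetic. This is a minimal sketch, not part of the original report: the object and method names are hypothetical, and it assumes the cap is exactly Int.MaxValue bytes (protobuf tracks message sizes with a signed 32-bit integer).

```scala
object ProtobufSizeCheck {
  // Assumption: the "2GB" protobuf cap is Int.MaxValue bytes (2147483647).
  val ProtobufLimitBytes: Long = Int.MaxValue.toLong

  // True when a payload of numBytes cannot fit in a single protobuf message.
  def exceedsLimit(numBytes: Long): Boolean = numBytes > ProtobufLimitBytes

  // Lower bound on the number of pieces needed so that each piece fits
  // under the cap (ceiling division; ignores per-message overhead).
  def minChunks(numBytes: Long): Long =
    (numBytes + ProtobufLimitBytes - 1) / ProtobufLimitBytes

  def main(args: Array[String]): Unit = {
    val failingBytes = 6279090916L // size printed in the libprotobuf error
    val workingBytes = 1058424696L // numBytes() of the filtered tensor

    println(exceedsLimit(failingBytes)) // true
    println(exceedsLimit(workingBytes)) // false
    println(minChunks(failingBytes))    // 3
  }
}
```

So the full array would need at least three pieces to stay under the cap, and in practice more, since each piece should sit comfortably below the limit.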
@mullerhai

Do I have to split the DoubleNdArray into several parts, or is there another way to convert it to an Operand[T]?

I found that the sequence returned by scalars() exposes a java.util.Spliterator:

scala> featureVector.shape
res46: org.tensorflow.ndarray.Shape = [900021, 147]

scala> featureVector.scalars()
res47: org.tensorflow.ndarray.NdArraySequence[org.tensorflow.ndarray.DoubleNdArray] = org.tensorflow.ndarray.impl.sequence.FastElementSequence@f9698af

scala> featureVector.scalars().spliterator
res48: java.util.Spliterator[org.tensorflow.ndarray.DoubleNdArray] = java.util.Spliterators$IteratorSpliterator@bdc74838

scala> featureVector.scalars().spliterator.trySplit
res49: java.util.Spliterator[org.tensorflow.ndarray.DoubleNdArray] = java.util.Spliterators$ArraySpliterator@f4f92e6e
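
Note that the shape printed above, [900021, 147], belongs to the filtered array: 900021 × 147 × 8 bytes = 1058424696, which is the numBytes() value reported earlier. Also, scalars() iterates over individual elements, so splitting by row ranges is probably a better fit than a Spliterator. The following is an untested sketch of that idea, assuming an eager session, assuming that Indices.range(start, end) and Ops.concat behave as in the tensorflow-java API as I understand it, and assuming that keeping each Const below the cap is enough to avoid the error. The object name, method names, and rowsPerChunk parameter are hypothetical.

```scala
import org.tensorflow.Operand
import org.tensorflow.ndarray.DoubleNdArray
import org.tensorflow.ndarray.index.Indices
import org.tensorflow.op.Ops
import org.tensorflow.types.TFloat64

object ChunkedConstant {
  // Half-open row ranges [start, end) that together cover 0 until totalRows.
  def rowRanges(totalRows: Long, rowsPerChunk: Long): Seq[(Long, Long)] =
    (0L until totalRows by rowsPerChunk).map { start =>
      (start, math.min(start + rowsPerChunk, totalRows))
    }

  // Builds one constant per row range and concatenates them along axis 0.
  // Assumption: each chunk (rowsPerChunk * columns * 8 bytes) is well
  // below the 2 GB protobuf cap.
  def chunkedConstant(tf: Ops, data: DoubleNdArray, rowsPerChunk: Long): Operand[TFloat64] = {
    val totalRows = data.shape().size(0)
    val parts = new java.util.ArrayList[Operand[TFloat64]]()
    for ((start, end) <- rowRanges(totalRows, rowsPerChunk)) {
      // slice() returns a view of the selected rows with all columns kept.
      val rows: DoubleNdArray = data.slice(Indices.range(start, end), Indices.all())
      parts.add(tf.constant(rows))
    }
    tf.concat(parts, tf.constant(0))
  }
}
```

With 147 columns of 8 bytes each, a row takes 1176 bytes, so a call such as `ChunkedConstant.chunkedConstant(tf, featureVector, 1000000L)` would produce chunks of roughly 1.18 GB. The concatenated result still has to fit in memory alongside the chunks, and in graph mode the constants would be embedded in the GraphDef, which has its own size limit, so this sketch only targets eager execution.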

