37 changes: 21 additions & 16 deletions libraries/packing.py
@@ -1,4 +1,5 @@
import numpy as np
import math


def pack(polycube: np.ndarray) -> int:
@@ -15,13 +16,16 @@ def pack(polycube: np.ndarray) -> int:

"""

pack_cube = np.packbits(polycube.flatten(), bitorder='big')
cube_hash = 0
for index in polycube.shape:
cube_hash = (cube_hash << 8) + int(index)
for part in pack_cube:
cube_hash = (cube_hash << 8) + int(part)
return cube_hash
# pack_cube = np.packbits(polycube.flatten(), bitorder='big')
# cube_hash = 0
# for index in polycube.shape:
# cube_hash = (cube_hash << 8) + int(index)
# for part in pack_cube:
# cube_hash = (cube_hash << 8) + int(part)
# return cube_hash

data = polycube.tobytes() + polycube.shape[0].to_bytes(1, 'big') + polycube.shape[1].to_bytes(1, 'big') + polycube.shape[2].to_bytes(1, 'big')
return int.from_bytes(data, 'big')
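
As an illustration of the new encoding (a minimal sketch, not taken from the repository): the packed value is just the raw voxel bytes followed by the three shape bytes, read as one big integer. Assuming a uint8 array, bytes(cube.shape) below is equivalent to the three to_bytes(1, 'big') calls for dimensions under 256:

import numpy as np

# A 1x1x2 polycube with a single filled voxel
cube = np.array([[[1, 0]]], dtype=np.uint8)        # shape (1, 1, 2)

# Voxel bytes followed by the three shape bytes (each dimension fits in one byte)
data = cube.tobytes() + bytes(cube.shape)          # b'\x01\x00' + b'\x01\x01\x02'
assert int.from_bytes(data, 'big') == 0x01_00_01_01_02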
VladimirFokow (Contributor) commented on Jul 13, 2023:

Do we need int.from_bytes(data, 'big'), or can we simply return data?

There is some time overhead to int.from_bytes.
Not large for a polycube of size 8000, for example (about 10 µs), but it can add up, and it is more noticeable for larger sizes.

int.from_bytes constructs an integer (if data is very big, e.g. 100_000 bytes, this can take a very long time).

  • But the bytes objects are already comparable (for the get_canonical_packing function in cubes.py).
  • And when adding cube_hash to the known_hashes in the generate_polycubes function in cubes.py, the set internally computes a 64-bit integer hash(cube_hash) anyway.

So the int specifically is not required; bytes will be enough for our purposes, right?


Maybe cube_hash would be better named cube_id - because it is not yet a hash (the set computes that internally); it is just another representation of a numpy array that is hashable and comparable, and that corresponds to our cube (rotation-invariantly).

pull request: bertie2#1
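
A minimal sketch of how I read this suggestion (not the actual code from bertie2#1; it assumes uint8 voxel arrays and dimensions under 256): keep the same byte layout but skip the int conversion, since bytes are already hashable and comparable:

import numpy as np


def pack(polycube: np.ndarray) -> bytes:
    """Pack a polycube into a hashable, comparable bytes id (cube_id)."""
    # Same layout as the current PR: voxel bytes followed by the three
    # shape bytes, just without the final int.from_bytes conversion.
    return polycube.tobytes() + bytes(polycube.shape)


def unpack(cube_id: bytes) -> np.ndarray:
    """Recover the 3D array from the bytes id produced by pack."""
    shape = tuple(cube_id[-3:])
    return np.frombuffer(cube_id[:-3], dtype=np.uint8).reshape(shape)

One side effect: because nothing is converted to an int, leading zero bytes are never dropped, so unpack does not need the size-based trimming that the integer version requires.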

RibenaJT commented:

I haven't had time to try it, so it may or may not help in practice, but I did wonder whether cube_hash should be a true hash - i.e. a value that indicates that two shapes MAY be identical (collisions allowed) - and, if a hash match is found, those candidates would then be tested for true equality.

I thought the hash could be a hash of these properties combined (which I think should yield the same hash for all rotations, so maybe cutting down the time spent on rotations):

  • as now, the width/height/depth (sorted to ensure all rotations give the same hash)
  • the number of cubes in the shape
  • a list of numbers that are the number of cubes in each 3d "slice" of the shape, again sorted to ensure rotations give the same hash.

E.g. a 2x2x2 cube with one corner missing would be (2, 2, 2, 7, ((3,4), (3,4), (3,4))) - with the slices sorted in some way.
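
A rough sketch of how I read this idea (hypothetical and untested; rotation_invariant_key is my name for it): build a cheap rotation-invariant key from the sorted dimensions, the cube count, and the sorted per-axis slice counts:

import numpy as np


def rotation_invariant_key(polycube: np.ndarray) -> tuple:
    """Cheap pre-hash: identical for all 24 rotations, but collisions are possible."""
    dims = tuple(sorted(polycube.shape))
    n_cubes = int(polycube.sum())
    # Filled-cell count of every 2D slice, per axis; sorting within each axis
    # and then across the three axes makes the result orientation-independent.
    slice_counts = tuple(sorted(
        tuple(sorted(int(polycube.take(i, axis=axis).sum())
                     for i in range(polycube.shape[axis])))
        for axis in range(3)
    ))
    return dims, n_cubes, slice_counts

For the 2x2x2 cube with one corner missing this gives ((2, 2, 2), 7, ((3, 4), (3, 4), (3, 4))).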

VladimirFokow (Contributor) replied:

@RibenaJT
Your comment seems unrelated to the changes that I've proposed in my comment.

But I'll reply to you here:
Oh, so you're thinking of a heuristic - applied even before any hash calculations for all 24 rotations...
However:

  • in case of a collision, how would the candidates be tested for equality? We would still need to consider all 24 rotations, right?
  • in case of no collision, we would still need to consider all 24 rotations - for future comparisons (I think it's almost certain that some future polycube will collide with the current one, so we might as well not wait until that happens and compute the cube_id right away).
    So we're calculating these 24 rotations in any case anyway?

  • "the number of cubes in the shape" - I don't think it's needed: we are not computing the polycubes of different N at the same time, so polycubes of different N are never stored in the same set and can never be confused with each other - unless you have a different application in mind.
  • What do you mean by a 3d "slice"? A 2d slice of each layer, I assume - from the bottom layer to the top layer, for example.

RibenaJT replied:

Yes, sorry - it should probably have been a new comment.

Yes, if there is a collision, you would still need to compare all 24 rotations.
If there is no collision, I don't think you need to rotate, since the hash should be the same for all rotations (provided the properties are sorted in a way that makes the orientation of the shape irrelevant, e.g. sorting the w/h/d and the slices so that a 3x2x4 oblong would have a hash of {2,3,4,{sorted_slices}} for all orientations) - therefore we know it is a new shape.

It was just a vague idea - how effective it would be depends on how "discriminating" the hash is (versus the cost of rotating).

Yes, by a 3d slice I meant taking all (2d) slices of the shape along all 3 axes.
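
A tiny check of the sorted-dimensions point (hypothetical snippet, not from the repository):

import numpy as np

oblong = np.ones((3, 2, 4), dtype=np.uint8)
rotated = np.rot90(oblong, axes=(0, 2))     # rotate in the 0-2 plane; shape becomes (4, 2, 3)

print(tuple(sorted(oblong.shape)))          # (2, 3, 4)
print(tuple(sorted(rotated.shape)))         # (2, 3, 4) - the same for any orientation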



@@ -36,14 +40,15 @@ def unpack(cube_hash: int) -> np.ndarray:
np.array: 3D Numpy byte array where 1 values indicate polycube positions

"""
parts = []
while (cube_hash):
parts.append(cube_hash % 256)
cube_hash >>= 8
parts = parts[::-1]
shape = (parts[0], parts[1], parts[2])
data = parts[3:]

length = math.ceil(math.log2(cube_hash))
parts = cube_hash.to_bytes(length, byteorder='big')
shape = (
parts[-3],
parts[-2],
parts[-1],
)
size = shape[0] * shape[1] * shape[2]
raw = np.unpackbits(np.array(data, dtype=np.uint8), bitorder='big')
final = raw[0:size].reshape(shape)
raw = np.frombuffer(parts[:-3], dtype=np.uint8)
final = raw[(len(raw) - size):len(raw)].reshape(shape)
return final
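
A quick round-trip sketch of the pack/unpack pair as changed above (the usage I would expect; it assumes a uint8 polycube array and that libraries is importable as a package):

import numpy as np
from libraries.packing import pack, unpack

cube = np.zeros((2, 2, 2), dtype=np.uint8)
cube[0, 0, 0] = 1                      # one voxel set in a 2x2x2 bounding box

packed = pack(cube)                    # int encoding voxel bytes + the 3 shape bytes
restored = unpack(packed)
assert np.array_equal(restored, cube)  # decodes back to the original array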