Checkpoint/Restart framework #32
Comments
I really think you're asking for trouble trying to do it that way. If you pickle a class, then change the class, then try to reload it, you're gonna have a bad time. I would explicitly save all the data that needs to be saved.
etc |
that's how I thought of it in the first place, but then I got stuck on the modules, as each module can hold any amount of arbitrary information which you don't want to manage from the MC class. To do it from python (and manually), the only way around that I see is this: each module pickles itself in a separate pickle file, then when MC.load() is called it tries to reset the states of the modules from their respective pickle files. For example there will be mc.cp.pickle, takestep.cp.pickle and so on to load from. This is the only way I see to maintain the modularity, since MC does not necessarily know everything the modules know. |
Yes, that could work. Maybe something like
|
I started writing my own basic sqlite3 wrapper for c++ but then realised that there is a large number of c++ wrappers out there, see here. |
It would be better to find an existing package. Try to find one that is
|
this question, however, suggests that there is really no need to write an exhaustive wrapper as one can use sqlite directly from c++. So maybe I should continue writing the most simple functions myself and then use sqlite directly from a SqliteDB.execute( command ) function (where SqliteDB is my own class)? |
OK, give it a try and see how far you get. You may run into trouble when you try to put a std::vector into the database. |
I would agree that if in doubt and if we need a wrapper we should not write it ourselves. |
So I have been working on writing an insert vector function, and since SQL does not understand arrays one has to turn the vector into a stream and then back. Picking up things around the web I came up with this serialization and deserialization of vectors, please let me know whether you think it can be improved (I am not an expert with stream and string manipulation in c++):
template<typename T>
std::string serialize_vector(std::vector<T> vec)
{
    std::ostringstream oss;
    //store the size of the vector first, separated from the data
    oss << vec.size() << "\n";
    if (!vec.empty()){
        //write all but the last element with a trailing delimiter, then the last one
        std::copy(vec.begin(), vec.end()-1, std::ostream_iterator<T>(oss, "\n"));
        oss << vec.back();
    }
    return oss.str();
}
template<typename T>
std::vector<T> deserialize_vector(std::string ser_vec)
{
    std::stringstream ss;
    size_t size;
    std::vector<T> vec;
    //turn string into stream
    ss.str(ser_vec);
    //read vector size
    ss>>size;
    //allocate the vector (not the string) before filling it
    vec.resize(size);
    for (size_t i=0; i<size; i++){
        ss >> vec[i];
    }
    return vec;
} |
Probably in the first function the vector could be passed by (const) reference. |
sorry I had to re-edit a couple of times |
If I read correctly you are turning the vector into a string of numbers, e.g. "1.245 1.434 5.3949". You will run into problems doing it this way. First, be careful about precision: make sure you are printing at extremely high precision. It would be better to serialize it in binary. I don't know how to do this, but presumably you can copy the whole block of memory vector.data() and cast it as a char array. You'll have to store the length somehow (in a header?). |
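To illustrate the precision point, here is a minimal sketch of a text round trip for doubles. The helper names `to_text` and `from_text` are made up for this example and are not part of the code in this thread. It assumes C++11, where `std::numeric_limits<double>::max_digits10` gives the number of decimal digits needed for a double to survive the trip through text unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: write the size, then each double with enough digits
// to be read back exactly.
std::string to_text(const std::vector<double>& vec)
{
    std::ostringstream oss;
    // max_digits10 decimal digits are enough to identify a double uniquely
    oss << std::setprecision(std::numeric_limits<double>::max_digits10);
    oss << vec.size();
    for (double x : vec) {
        oss << ' ' << x;
    }
    return oss.str();
}

// Hypothetical helper: inverse of to_text.
std::vector<double> from_text(const std::string& text)
{
    std::istringstream iss(text);
    std::size_t size = 0;
    iss >> size;
    std::vector<double> vec(size);
    for (std::size_t i = 0; i < size; ++i) {
        iss >> vec[i];
    }
    assert(!iss.fail());  // the text held a size and that many parsable numbers
    return vec;
}
```

With the default stream precision (6 significant digits) the same round trip would silently change most values, which is why a binary format is the more robust option.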
ok thanks, I'm going to have to read more on serialization then, for example look at how boost does it or how this guy does it
I imagine you could do something like this serialize a vector
You'll have to be careful because sizeof(size_t) is not constant across platforms. So you should probably store it as something that has constant size (e.g. int32_t) see http://en.cppreference.com/w/cpp/types/integer deserialize a vector
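As a rough sketch of the header idea (not the code originally posted in this comment), the element count can be stored as a fixed-width `int32_t` in front of the raw element bytes. The function names and the layout are assumptions made for illustration. The sketch requires trivially copyable element types and does not handle differing byte order between machines.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical layout: [int32_t element count][raw element bytes]
template<typename T>
std::string serialize_with_header(const std::vector<T>& vec)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    const std::int32_t count = static_cast<std::int32_t>(vec.size());
    std::string blob(sizeof(count) + vec.size() * sizeof(T), '\0');
    // the header has the same size on every platform
    std::memcpy(&blob[0], &count, sizeof(count));
    if (!vec.empty()) {
        std::memcpy(&blob[sizeof(count)], vec.data(), vec.size() * sizeof(T));
    }
    return blob;
}

template<typename T>
std::vector<T> deserialize_with_header(const std::string& blob)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    std::int32_t count = 0;
    assert(blob.size() >= sizeof(count));
    std::memcpy(&count, blob.data(), sizeof(count));
    // the header must agree with the number of bytes that follow it
    assert(count >= 0 && blob.size() == sizeof(count) + count * sizeof(T));
    std::vector<T> vec(count);
    if (count > 0) {
        std::memcpy(vec.data(), blob.data() + sizeof(count), count * sizeof(T));
    }
    return vec;
}
```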
|
Actually, if sqlite tells you the size of the char * array (or void * array) you can use that to determine the size of the vector
or maybe
Then you don't need to bother with storing the size at all. That will simplify things |
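A minimal sketch of the headerless variant, assuming the pointer and the byte count come from the database (in the sqlite3 C API these would be the return values of `sqlite3_column_blob` and `sqlite3_column_bytes`). The helper name `vector_from_blob` is invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: rebuild a vector from a headerless block of nbytes bytes.
template<typename T>
std::vector<T> vector_from_blob(const void* data, int nbytes)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    // the block must hold a whole number of elements
    assert(nbytes >= 0 && nbytes % sizeof(T) == 0);
    const std::size_t length = static_cast<std::size_t>(nbytes) / sizeof(T);
    std::vector<T> vec(length);
    if (length > 0) {
        std::memcpy(vec.data(), data, length * sizeof(T));
    }
    return vec;
}
```

The vector length is simply the byte count divided by `sizeof(T)`, so nothing besides the raw data needs to be stored.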
this answer has a cleaner way of doing it using streams and reinterpret_cast |
Once I used
std::ofstream stm( file_name.c_str(), std::ios::binary );
for (typename std::vector<F>::const_iterator i = vec.begin(); i != vec.end(); ++i) {
    stm.write((char*)(&*i), sizeof(F));
}
Probably you are looking for more advanced stuff.
On 9 September 2014 19:48, Jacob Stevenson [email protected] wrote:
|
Ah, sorry for the repetition. |
ok guys great! I'll give it a go. Let's bear in mind that these vectors must be readable from python too, so this means that the serialize and deserialize functions will have to be wrapped in cython |
Maybe this would be a chance to try out boost.python? On 10 September 2014 11:02, Stefano Martiniani [email protected]
|
that's what I thought too |
so this is what I came up with, based on the stack overflow suggestion:
/* Serialize and deserialize functions. These only work for vectors of Plain Old Data structures.
 * Furthermore these only work if the vectors contain only a single type and no pointers or
 * references.
 */
template<typename T>
std::string serialize_vector(const std::vector<T>& vec)
{
    // reject T if it is a pointer or a reference
    static_assert(!std::is_pointer<T>::value && !std::is_reference<T>::value,"type is pointer or reference");
    std::ostringstream strm;
    // data() is valid for empty vectors too, unlike &vec[0]
    strm.write(reinterpret_cast<const char*>(vec.data()), vec.size()*sizeof(T));
    return strm.str();
}
template<typename T>
std::vector<T> deserialize_vector(const std::string& ser_vec)
{
    static_assert(!std::is_pointer<T>::value && !std::is_reference<T>::value,"type is pointer or reference");
    std::stringstream strm(ser_vec);
    // the number of elements follows from the size of the string in bytes
    const size_t length = ser_vec.size() / sizeof(T);
    std::vector<T> vec(length);
    strm.read(reinterpret_cast<char*>(vec.data()), length*sizeof(T));
    return vec;
} |
I think that will work just fine. I assume you are working with strings because you will save it in the SQL database as a string type? It would be cleaner to store it as a BLOB type (or something similar that is just a chunk of memory). |
so you think I should return and take a pointer to void in serialize and deserialize respectively? Then static_cast the pointer to a string and pass it to a stream? For instance for deserialize:
template<typename T>
std::vector<T> deserialize_vector(const void* mem_block)
{
    static_assert(!std::is_pointer<T>::value && !std::is_reference<T>::value,"type is pointer or reference");
    // int_type is the fixed-width integer type used for the size header (e.g. int32_t)
    const size_t size = *static_cast<const int_type*>(mem_block);
    // pointer arithmetic is not allowed on void*, so step over the header as char*
    const char* vec_block = static_cast<const char*>(mem_block) + sizeof(int_type);
    const size_t length = size / sizeof(T);
    std::vector<T> vec(length);
    std::string casted_memory(vec_block, size);
    std::istringstream strm(casted_memory);
    strm.read(reinterpret_cast<char*>(vec.data()), length*sizeof(T));
    return vec;
}
where I have used your earlier suggestion to figure out the size of a vector |
The documentation will tell you how SQL accepts and returns blob data |
ok, it's quite a bit of code but at least it's already written: Sqlite blob example |
How about we use this strategy instead? It's a combination of boost::serialize and blob storing in the sqlite database. Overall I think it is cleaner than our current approach, more concise, and probably the best long-term strategy. |
I just don't think you need boost::serialize. Vectors are already serialized, how would boost::serialize help? |
It turns the class into one big blob and it deserialises it. Right now we Said that, this might not be so useful when it comes to reading from the
|
exactly. And we don't want to make it easy to serialize any class because that's very dangerous, especially for a rapidly changing project. If you change the definition of a class you then won't be able to extract it from the database. I can't see any need to store anything other than
All of these are trivially serialized. Let's not provide functionality that we don't want people to use. |
I gave this some more thought and it will be a pretty serious task, with some issues. Let's start with the one big issue I know of:
there is a bug in the stl rng for gcc4.6 (fixed in gcc4.7) : the state of the rng is not saved or loaded correctly (from stream) so that you can't just continue a calculation on the same random number sequence (hence any test, even if we reload the rng will yield different results). This issue is true at least if the state is read from a stream (I am not sure about other serialization methods, see boost serialize for example). Currently all our computers use gcc4.6.
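For reference, this is a minimal sketch of how the state of a standard library engine is meant to be saved and restored through a stream, which is the mechanism the bug above affects. The helper names `save_rng` and `load_rng` and the choice of `std::mt19937_64` are assumptions for illustration. On a standard library without the bug, the restored engine continues the same sequence.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: capture the full engine state as text.
std::string save_rng(const std::mt19937_64& rng)
{
    std::ostringstream oss;
    oss << rng;  // standard engines support stream insertion
    return oss.str();
}

// Hypothetical helper: restore an engine from text produced by save_rng.
void load_rng(std::mt19937_64& rng, const std::string& state)
{
    std::istringstream iss(state);
    iss >> rng;  // and stream extraction
    assert(!iss.fail());
}
```

Running a check like this on each target compiler would show whether the stream round trip can be trusted there.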
About the implementation:
each module will have to have a save and load function, and MC in turn will have to have a save and load function that saves its own state and calls the respective functions on the modules. I can't see a way around this because some of the modules have their own rng and a set of members that must be reinstated. That leaves the problem of I/O for all this information: it's not obvious to me how to serialize it out of c++, considering that there is also the python layer that needs to be taken care of. Online I found a lot of references to boost serialize for this sort of thing.
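The save/load delegation described above could look roughly like the following sketch. Every name in it (`Checkpointable`, `ToyTakeStep`, `ToyMC`, `CheckpointStore`, `pack`, `unpack`) is hypothetical and none of it is the project's actual API. The store is a plain in-memory map standing in for whatever backend (pickle files, sqlite) ends up being used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical store: maps a key such as "takestep" to a serialized blob.
typedef std::map<std::string, std::string> CheckpointStore;

// Copy a trivially copyable value into a string of raw bytes and back.
template<typename T>
std::string pack(const T& value)
{
    return std::string(reinterpret_cast<const char*>(&value), sizeof(T));
}

template<typename T>
T unpack(const std::string& blob)
{
    assert(blob.size() == sizeof(T));
    T value;
    std::memcpy(&value, blob.data(), sizeof(T));
    return value;
}

// Hypothetical interface that every module would implement.
class Checkpointable {
public:
    virtual ~Checkpointable() {}
    virtual std::string name() const = 0;
    virtual std::string save_state() const = 0;
    virtual void load_state(const std::string& blob) = 0;
};

// Toy module whose whole state is a step size.
class ToyTakeStep : public Checkpointable {
public:
    explicit ToyTakeStep(double stepsize) : m_stepsize(stepsize) {}
    std::string name() const { return "takestep"; }
    std::string save_state() const { return pack(m_stepsize); }
    void load_state(const std::string& blob) { m_stepsize = unpack<double>(blob); }
    double stepsize() const { return m_stepsize; }
private:
    double m_stepsize;
};

// Toy driver: saves its own state, then delegates to each module.
class ToyMC {
public:
    ToyMC() : m_niter(0) {}
    void add_module(std::shared_ptr<Checkpointable> module) { m_modules.push_back(module); }
    void set_niter(std::size_t n) { m_niter = n; }
    std::size_t niter() const { return m_niter; }
    void save(CheckpointStore& store) const
    {
        store["mc"] = pack(m_niter);
        for (const auto& module : m_modules) {
            store[module->name()] = module->save_state();
        }
    }
    void load(const CheckpointStore& store)
    {
        m_niter = unpack<std::size_t>(store.at("mc"));
        for (const auto& module : m_modules) {
            module->load_state(store.at(module->name()));
        }
    }
private:
    std::size_t m_niter;
    std::vector<std::shared_ptr<Checkpointable> > m_modules;
};
```

The driver never needs to know what a module stores, which keeps the modularity discussed in the comments: each module only has to produce and consume its own blob.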
Boost serialize would introduce an additional dependency (it could be made optional because check-pointing is a non-essential feature). This still requires quite a bit of writing at the c++ level, and the question remains of how to make this compatible with the python layer, although it doesn't seem impossible to do (one would have to rebuild the object from python and then call the MC::load function, which will make sure everything goes back to the old state).
If we can do this, then reinstating the python layer state is simple, because it's just a matter of reloading self.__dict__ before calling mc.load, unless there are unforeseen complications.
Oh let's not forget that the potential and optimizer class should be serialized too for proper checkpointing, so the Pele potentials and minimizers would have to undergo this revision too, at least in principle.
This is the only way I can see to implement a proper check-pointing system without something like the BLCR system.
Since this sounds like a ton of work to me I would like to get as many suggestions as possible and get everyone to agree on the best course of action.