Skip to content

Conversation

@afmagee42
Copy link
Contributor

@afmagee42 afmagee42 commented Jan 23, 2025

Resolves #67, and since that changes the behavior of max_infections also resolves #65, should lay the groundwork to make resolving #62 easy.

The core changes here are:

  1. Simulation.create_person(self) is now create_person(self, infector: Optional[str], t_exposed: float, generation: int)
  2. Simulation.run() no longer calls Simulation.create_person() except for the index case. All other calls are in Simulation.generate_infections()

There are a number of cascading changes required to accommodate this, including to Simulate (e.g. adding a new property to track whether an individual was actually simulated or just initialized), to summarization (e.g. now we have to filter out the not-actually-simulated infections to count detections), and to plotting (all around the need to filter out purely-initialized infections).

I think the new approach has some nice side-effects:

  1. It allows us to dispense with Simulate.register_infectee()
  2. The dataframe wrangling in prob_control_by_gen is much more straightforward

This also frees us to remove the redundant "infection_times" property, which is now entirely retrievable from other, more needed, information ([sim.get_person_property(infectee, "t_exposed") for infectee in sim.get_person_property(id, "infectees")]). But that's out of scope here.

Our new hard-coded (at least in the app) max_infections of 1 million is sufficiently robust to we can simulate 4 generations at R=10, which I think indicates we're good to go on that front.
It also means that to test that our exception, you either have to briefly adjust the source code so that it's smaller, or do something pretty absurd.

@afmagee42 afmagee42 marked this pull request as ready for review January 23, 2025 18:08
@afmagee42 afmagee42 requested a review from swo January 23, 2025 18:08
Copy link
Contributor

@swo swo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is an improvement

There are some name changes and docstrings I'd like to see

And then some unactionable comments about separations of powers


class Simulation:
PROPERTIES = {
INIT_PROPERTIES = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's docstring these? I wasn't immediately clear to me what "init" vs. "sim" meant here

Comment on lines +36 to 48
def create_person(
self, infector: Optional[str], t_exposed: float, generation: int
) -> str:
"""Add a new person to the data"""
id = str(len(self.infections))
self.infections[id] = {x: None for x in self.PROPERTIES}
self.infections[id] = {
"id": id,
"infector": infector,
"t_exposed": t_exposed,
"generation": generation,
"simulated": False,
} | {x: None for x in self.SIM_PROPERTIES}
return id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a small, ad hoc simulation, this doesn't matter, but I liked having create_person return an id and do nothing else. Now create_person is a mix of creating a database-entry-person and updating them.

I guess we would have wanted to draw a distinction between a person-as-dictionary vs. a simulated-person, who has some properties.

But again this doesn't matter here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to immediately instantiate people, it seemed to me like it no longer made sense to draw that distinction. Otherwise we'd just be adding functions to the pile that we have to call every time a new infection arises, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there would be an extra function, because the "create empty database entry" and "fill in defaults for this person" would be separate

Again, that might be good if we had a bigger simulation where wanted more clear separation between data mgmt and simulation, but here it doesn't matter so much

def run(self) -> None:
"""Run simulation"""
# queue is pairs (t_exposed, infector)
# queue is of infection ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although now it means that methods like generate_infections now directly interact with the underlying data storage, rather than having run() do all that. Which violates the separation of powers a little, but again doesn't matter so much in this ad hoc simulation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure I'm missing something, but that feels like a line we crossed somewhere between "we should instantiate infections earlier" and "this will always be a branching process"

Or, put another way, the reason I saw "this will always be a BP" as necessary scope for "infections should instantiate earlier" was because I don't see how to instantiate infections when their time of infection is drawn without doing it inside the infection simulation routine. Anything else would just land us back where we were, wouldn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just talking about class/static methods vs. instance methods, not necessarily about the ordering of anything

Like, the "resolve infection" function (i.e., figure out the natural history and no. of onward infections for some person) could be a class method that doesn't touch the database, but returns the natural history information back to run(), that in turn interacts with the database

I'm saying: I think there's a way to write code so that run() is the only instance method

# and add the infections they cause to the queue, in time order
id = self.create_person()
self.generate_infection(id=id, t_exposed=t_exposed, infector=infector)
offspring = self.generate_infection(id=id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "infection" is now used to refer to too many things

Like, maybe we want to rename infection_queue to infected_queue? Because now it's a queue of IDs of people who have been infected, and whose onward infections we need to process

And now generate_infection returns a list of infectees, that gets merged with infection_queue, so these are the same species

I guess generate_infection now processes/resolves an infection, and returns the list of downstream infectees

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the names are... not great at the moment. I feel like the less-descriptive queue might be best here.

The generate_ family also seems somewhat inadequate. I was tempted to rename them simulate_ since they're simulating parts of infection histories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've replaced the (time, infector) + dict paradigm with a dict + dict paradigm since they're all just infections now, except some have not been fleshed out to full histories.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cf. my note above. I think "generate" might have made more sense if/when that function was a static method that just returned information. But now that it's interacting with the database, I see that "simulate" or "process" or whatever makes more sense

# number of generations
for t in self.get_person_property(id, "infection_times"):
bisect.insort_right(infection_queue, (t, id), key=lambda x: x[0])
for child in offspring:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now "child" is essentially the same species too

== "Cumulative"
)

max_infections = 1000000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
max_infections = 1000000
max_infections = 1_000_000

or int(1e6)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Instantiate infections earlier? Make maximum number of infections an error

3 participants