2 changes: 2 additions & 0 deletions medcat-service/env/app.env
@@ -36,6 +36,8 @@ SERVER_PORT=5000
SERVER_WORKERS=1
SERVER_WORKER_TIMEOUT=300
SERVER_THREADS=1
SERVER_GUNICORN_MAX_REQUESTS=1000
SERVER_GUNICORN_MAX_REQUESTS_JITTER=50

# set the number of torch threads, this should be used ONLY if you are using CPUs and the default image
# set to -1 or 0 if you are using GPU
4 changes: 3 additions & 1 deletion medcat-service/env/app_deid.env
@@ -21,7 +21,7 @@ APP_MODEL_REL_PATH_LIST=
# MedCAT Model Pack path
# IMPORTANT: if this parameter has value IT WILL BE LOADED FIRST OVER EVERYTHING ELSE (CDB, Vocab, MetaCATs, etc.) declared above.
# Respect the same paths as above : /cat/models/model_pack_name.zip
APP_MEDCAT_MODEL_PACK=
APP_MEDCAT_MODEL_PACK=/cat/models/medcat_v2_deid_model_691c3f6a6e5400e7_686dfbf9c3c664e0.zip

# optionally, filter the reported concepts by CUIs
# APP_MODEL_CUI_FILTER_PATH=/cat/models/cui_filter.txt
@@ -36,6 +36,8 @@ SERVER_PORT=5000
SERVER_WORKERS=1
SERVER_WORKER_TIMEOUT=300
SERVER_THREADS=1
SERVER_GUNICORN_MAX_REQUESTS=1000
SERVER_GUNICORN_MAX_REQUESTS_JITTER=50

# set the number of torch threads, this should be used ONLY if you are using CPUs and the default image
# set to -1 or 0 if you are using GPU
4 changes: 2 additions & 2 deletions medcat-service/medcat_service/dependencies.py
@@ -10,14 +10,14 @@
log = logging.getLogger(__name__)


@lru_cache
@lru_cache(maxsize=1)
Collaborator:
Refer to the comment below for more detail.

Perhaps we can get away with something simpler for the settings singleton, i.e.:

```python
_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings
```

That way the get_settings method will always return the same instance and thus the caching for get_medcat_processor should always result in the same instance since the argument is always the same.

Collaborator:
This makes sense to me; it feels right to make it explicitly a global singleton. I would keep that log line in from before and really confirm it never gets created again.

Collaborator:
Yeah, if we fully force a singleton settings instance, I fail to see how we could ever have multiple MedCATProcessor instances from the cached method.

But yes, keeping the log message does still make sense!
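
Putting the two suggestions above together, a minimal sketch of that module-level singleton with the log line retained could look like this (editorial illustration, not part of the diff; it assumes the Settings class and the log object already defined in this module):

```python
from typing import Optional

_def_settings: Optional[Settings] = None


def get_settings() -> Settings:
    """Return one shared Settings instance, creating it on first use."""
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
        # Keep the log line so any repeated creation would show up in the logs.
        log.debug("Created settings: %s", _def_settings)
    return _def_settings
```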

def get_settings() -> Settings:
    settings = Settings()
    log.debug("Using settings: %s", settings)
    return settings


@lru_cache
@lru_cache(maxsize=1)
Collaborator:
I feel like this is kind of counterproductive. It may (effectively) fix the memory leak, but it doesn't actually fix the underlying issue, which, as far as I know, is that we expect only one instance of MedCatProcessor. Since the Settings class is frozen, one should always be using the same instance, and as such getting the same cached MedCatProcessor instance.

So I think the issue is in how FastAPI uses the dependency and, by doing so, seems to omit the lru_cache when calling the get_settings method.

Member Author:
This was just the easy way to enforce the MedCatProcessor singleton; we can switch to the lifespan manager then and see how it behaves.

Collaborator:
That's the thing though - it doesn't enforce a singleton. You'll still have multiple instances. They just won't be cached. And I understand the caching was the culprit for the memory leak, but having multiple instances doesn't seem to be the intended design.

Collaborator:
I agree that working out how there are multiple instances is the main thing.

In general I'm surprised by this being the fix, so I'm not really convinced this is the right answer.

I literally followed the FastAPI docs for settings here: https://fastapi.tiangolo.com/advanced/settings/#lru-cache-technical-details. The only difference I see is that I imported Settings directly from config instead of importing config.

I'd be really surprised if we have some situation that nobody else has encountered; FastAPI + torch is standard. The only unusual thing we do is that I set it to use gunicorn, just to keep the pre-existing torch thread code, so maybe there's an issue somewhere in there?

Collaborator:
Wildcard suggestion: we switch to plain uvicorn (and turn the logging up to info on those lru_cache methods) and confirm whether the error still occurs. My hope is that the FastAPI-copied code + torch + everything default = no issue. We could add a flag to the startup script to switch between uvicorn and gunicorn.
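
As an editorial aside, running the app directly under uvicorn for that comparison could look roughly like the sketch below (an assumption, not code from this PR; the host and port handling are illustrative):

```python
import os

import uvicorn

if __name__ == "__main__":
    # Serve the same ASGI app without gunicorn, single worker, for comparison.
    uvicorn.run(
        "medcat_service.main:app",
        host="0.0.0.0",
        port=int(os.environ.get("SERVER_PORT", "5000")),
        workers=1,
        log_level="info",
    )
```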

Member Author (@vladd-bit), Sep 19, 2025:
OK, clearly my use of the term 'singleton' was off, and yes, the proposed 'fix' is hacky, so it's update time. Regarding gunicorn with uvicorn: we are using one worker anyway, so it shouldn't be an issue unless there is some specific bug we haven't discovered yet. We could also try an alternative to uvicorn and just use ASGI middleware (though I'm not keen on that approach for now, and I haven't tested it yet). And with the singleton MedCatProcessor in place, if there are still memory leaks they will come from the CAT object itself. I will give it a test today.

def get_medcat_processor(settings: Annotated[Settings, Depends(get_settings)]) -> MedCatProcessor:
    log.debug("Creating new MedCat Processor using settings: %s", settings)
    return MedCatProcessor(settings)
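
As a sketch of the lifespan-manager idea raised in the discussion above (an assumption about how it could be wired up, not code from this PR; the MedCatProcessor import path is hypothetical), the processor could be built once at startup and shared via app.state:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request

from medcat_service.dependencies import get_settings
# Hypothetical import path, shown only for illustration.
from medcat_service.nlp_processor import MedCatProcessor


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create the single MedCatProcessor when the app starts...
    app.state.medcat_processor = MedCatProcessor(get_settings())
    yield
    # ...and drop the reference on shutdown.
    app.state.medcat_processor = None


app = FastAPI(lifespan=lifespan)


def get_medcat_processor(request: Request) -> MedCatProcessor:
    # Dependency that always hands out the one shared instance.
    return request.app.state.medcat_processor
```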
11 changes: 11 additions & 0 deletions medcat-service/start_service_production.sh
@@ -33,6 +33,15 @@ if [ -z ${SERVER_WORKER_TIMEOUT+x} ]; then
echo "SERVER_WORKER_TIMEOUT is unset -- setting to default (sec): $SERVER_WORKER_TIMEOUT";
fi

if [ -z ${SERVER_GUNICORN_MAX_REQUESTS+x} ]; then
SERVER_GUNICORN_MAX_REQUESTS=1000;
echo "SERVER_GUNICORN_MAX_REQUESTS is unset -- setting to default: $SERVER_GUNICORN_MAX_REQUESTS";
fi

if [ -z ${SERVER_GUNICORN_MAX_REQUESTS_JITTER+x} ]; then
SERVER_GUNICORN_MAX_REQUESTS_JITTER=50;
echo "SERVER_GUNICORN_MAX_REQUESTS_JITTER is unset -- setting to default: $SERVER_GUNICORN_MAX_REQUESTS_JITTER";
fi

SERVER_ACCESS_LOG_FORMAT="%(t)s [ACCESS] %(h)s \"%(r)s\" %(s)s \"%(f)s\" \"%(a)s\""

@@ -50,5 +59,7 @@ exec gunicorn \
--error-logfile=- \
--log-level info \
--config /cat/config.py \
--max-requests="$SERVER_GUNICORN_MAX_REQUESTS" \
--max-requests-jitter="$SERVER_GUNICORN_MAX_REQUESTS_JITTER" \
--worker-class uvicorn.workers.UvicornWorker \
medcat_service.main:app
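
For context on the new flags (editorial note, not part of the PR): gunicorn's --max-requests and --max-requests-jitter recycle each worker after roughly that many requests, with the jitter spreading restarts across workers so they don't all recycle at once. This bounds slow memory growth rather than fixing its cause. If these values were ever moved out of the command line, a minimal sketch of reading the same environment variables from a gunicorn config file (hypothetical, not the repo's /cat/config.py) could look like:

```python
# Hypothetical gunicorn config sketch -- not the repo's /cat/config.py.
import os

# Recycle each worker after roughly this many requests to bound memory growth.
max_requests = int(os.environ.get("SERVER_GUNICORN_MAX_REQUESTS", "1000"))

# Randomize the restart point per worker so workers don't all recycle at once.
max_requests_jitter = int(os.environ.get("SERVER_GUNICORN_MAX_REQUESTS_JITTER", "50"))
```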