Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
102 changes: 62 additions & 40 deletions src/swarm-lang/Swarm/Language/Load.hs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ import Data.Map (Map)
import Data.Map qualified as M
import Data.Set (Set)
import Data.Set qualified as S
import Data.Text (Text)
import Data.Text.Encoding qualified as T
import GHC.Generics (Generic)
import Network.HTTP.Simple (getResponseBody, httpBS, parseRequest)
Expand All @@ -40,6 +41,10 @@ import Swarm.Pretty (prettyText)
import Swarm.Util (readFileMayT, showT)
import Swarm.Util.Graph (findCycle)
import Witch (into)
import Swarm.Language.InternCache
import Control.Monad.IO.Class (MonadIO, liftIO)
import System.IO.Unsafe (unsafePerformIO)
import Swarm.Language.InternCache qualified as IC

type ResLoc = ImportLoc Import.Resolved

Expand Down Expand Up @@ -153,6 +158,11 @@ resolveImports ::
m (Set (ImportLoc Import.Resolved), Syntax Resolved)
resolveImports parent = runAccum S.empty . traverseSyntax pure (resolveImport parent)

-- | Cache imported modules. TODO: fix for module tree
moduleCache :: (MonadIO m) => InternCache m (ImportLoc Import.Resolved) (Module Resolved)
moduleCache = unsafePerformIO $ hoist liftIO <$> IC.newInternCache @IO @(ImportLoc Import.Resolved) @(Module Resolved)
{-# NOINLINE moduleCache #-}

-- | Given a parent directory relative to which any local imports
-- should be interpreted, load an import and all its imports,
-- transitively. Also return a canonicalized version of the import
Expand All @@ -170,31 +180,33 @@ resolveImport parent loc = do
-- Compute the canonicalized location for the import, and record it
canonicalLoc <- resolveImportLoc (unresolveImportDir parent <//> loc)
add $ S.singleton canonicalLoc

sendIO (IC.lookupCached moduleCache canonicalLoc) >>= \case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The big problem that I can see is that this will never consider reloading a file from disk. In other words, consider the following steps:

  1. Start Swarm.
  2. Load a scenario that imports foo/bar.sw.
  3. Edit foo/bar.sw.
  4. Reload the scenario.

You would like to get the new version of foo/bar.sw, of course, but you will get the old, cached version. To get the new version you would have to completely restart Swarm.

One solution to this I have considered is to add extra data to import locations that allow us to uniquely identify them in time as well as space: a "last modified" timestamp in the case of local files, and a content hash in the case of remote files.

However, there also needs to be a mechanism to identify when we are willing to update. For example, if we are loading a scenario and encounter import https://foo seven times while recursively loading imports, we should only fetch it over the network the first time --- even if (especially if!) the remote file changes while we are still in the process of loading. However, if we then restart the scenario, we have to fetch it over the network again to see whether it changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking that "last modified time" would need to be tracked as well, but I was not sure how exactly.

Here is my proposal:

  • add the timestamp info only to the cache, so the import location data stay the same
  • when loading file use getModificationTime and compare it with the time in cache
    • if they match, reuse the value in cache
    • if not, overwrite the value in cache - the previous value will never be useful again

I have not given network files much thought, do they have a timestamp we could compare?
Personally, I would not mind if they required a Swarm restart to update.

Copy link
Member

@byorgey byorgey Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not given network files much thought, do they have a timestamp we could compare?
Personally, I would not mind if they required a Swarm restart to update.

Oh, good point! I like this, it's much simpler. It seems reasonable that files fetched over the network are much less likely to change (and if they do, I might not even be expecting it). Hence it makes sense to never invalidate them from the cache.

As for files on disk, I like your proposal. I still think to be 100% correct we need to account for the fact that while, say, recursively loading all the code for a given scenario, we do not want to update a cached module even if it changes on disk in the middle of loading --- because all the imports of a given module should be consistent across a given scenario. However, I think we can make an issue for this / put it off until later. It won't happen very frequently.

Just cachedModule -> do
sendIO $ putStr "C\a"
modify @(SourceMap Resolved) (M.insert canonicalLoc cachedModule)
Nothing -> M.lookup canonicalLoc <$> get @(SourceMap Resolved) >>= \case
Just _ -> pure () -- Already loaded - do nothing
Nothing -> do
-- Record this import loc in the source map using a temporary, empty module,
-- to prevent it from attempting to load itself recursively
modify @(SourceMap Resolved) (M.insert canonicalLoc $ Module Nothing () mempty)

srcMap <- get @(SourceMap Resolved)
resMod <- case M.lookup canonicalLoc srcMap of
Just m -> pure m -- Already loaded - do nothing
Nothing -> do
-- Record this import loc in the source map using a temporary, empty module,
-- to prevent it from attempting to load itself recursively
modify @(SourceMap Resolved) (M.insert canonicalLoc $ Module Nothing () mempty)

-- Read it from network/disk
mt <- readLoc canonicalLoc
-- Read it from network/disk
mt <- readLoc canonicalLoc

-- Recursively resolve any imports it contains
mres <- traverse (resolveImports (importDir canonicalLoc)) mt
-- sequence :: Maybe (Set a, b) -> (Set a, Maybe b)
let (imps, mt') = sequence mres
-- Recursively resolve any imports it contains
mres <- traverse (resolveImports (importDir canonicalLoc)) mt
-- sequence :: Maybe (Set a, b) -> (Set a, Maybe b)
let (imps, mt') = sequence mres

-- Finally, record the loaded module in the SourceMap.
let m = Module mt' () imps
modify @(SourceMap Resolved) (M.insert canonicalLoc m)
-- Finally, record the loaded module in the SourceMap.
let m = Module mt' () imps
modify @(SourceMap Resolved) (M.insert canonicalLoc m)
sendIO $ IC.insertCached moduleCache canonicalLoc m

pure m

-- Make sure imports are pure, i.e. contain ONLY defs + imports.
validateImport canonicalLoc resMod
-- Make sure imports are pure, i.e. contain ONLY defs + imports.
validateImport canonicalLoc m

pure canonicalLoc

Expand All @@ -217,33 +229,43 @@ validateImport loc = maybe (pure ()) validate . moduleTerm
TConst Noop -> pure ()
t -> throwError $ ImpureImport loc (prettyText t)

-- | Cache file contents for import location.
fileCache :: (MonadIO m) => InternCache m (ImportLoc Import.Resolved) Text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to cache file contents separately from caching processed modules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, parsing the file could fail, so in that case we could reuse a little bit.

But mainly it was easier for me to write as a first test. 😁

fileCache = unsafePerformIO $ hoist liftIO <$> IC.newInternCache @IO @(ImportLoc Import.Resolved) @Text
{-# NOINLINE fileCache #-}

-- | Try to read and parse a term from a specific import location,
-- either over the network or on disk.
readLoc ::
(Has (Throw SystemFailure) sig m, Has (Lift IO) sig m) =>
ImportLoc Import.Resolved ->
m (Maybe (Syntax Raw))
readLoc loc = do
let path = locToFilePath loc
badImport :: Has (Throw SystemFailure) sig m => LoadingFailure -> m a
badImport = throwError . AssetNotLoaded (Data Script) path
withBadImport :: Has (Throw SystemFailure) sig m => (e -> LoadingFailure) -> Either e a -> m a
withBadImport f = either (badImport . f) pure

-- Try to read the file from network/disk, depending on the anchor
src <- case importAnchor loc of
-- Read from network
Web_ {} -> do
-- Try to parse the URL
req <- parseRequest (into @String path) & withBadImport (BadURL . showT)
-- Send HTTP request
resp <- sendIO $ httpBS req
-- Try to decode the response
T.decodeUtf8' (getResponseBody resp) & withBadImport CanNotDecodeUTF8

-- Read from disk
_ -> sendIO (readFileMayT path) >>= maybe (badImport (DoesNotExist File)) pure

src <- sendIO (IC.lookupCached fileCache loc) >>= \case
Just cachedSrc -> pure cachedSrc
Nothing -> do
s <- case importAnchor loc of
Web_ {} -> readFromNet
_ -> readFromDisk
_ <- sendIO $ IC.insertCached fileCache loc s
pure s

-- Finally, try to parse the contents
readTerm' (defaultParserConfig & importLoc ?~ loc) src
& withBadImport (SystemFailure . CanNotParseMegaparsec)

where
path = locToFilePath loc
badImport :: Has (Throw SystemFailure) sig m => LoadingFailure -> m a
badImport = throwError . AssetNotLoaded (Data Script) path
withBadImport :: Has (Throw SystemFailure) sig m => (e -> LoadingFailure) -> Either e a -> m a
withBadImport f = either (badImport . f) pure
readFromDisk = sendIO (readFileMayT path) >>= maybe (badImport (DoesNotExist File)) pure
readFromNet = do
-- Try to parse the URL
req <- parseRequest (into @String path) & withBadImport (BadURL . showT)
-- Send HTTP request
resp <- sendIO $ httpBS req
-- Try to decode the response
T.decodeUtf8' (getResponseBody resp) & withBadImport CanNotDecodeUTF8
80 changes: 80 additions & 0 deletions src/swarm-util/Swarm/Language/InternCache.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
{-# LANGUAGE BlockArguments #-}
module Swarm.Language.InternCache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a comment with a link to the source, https://chrispenner.ca/posts/intern-cache

( InternCache,
newInternCache,
lookupCached,
insertCached,
intern,
hoist,
)
where

import Control.Monad.IO.Class (MonadIO (..))
import Data.HashMap.Strict (HashMap)
import Data.HashMap.Strict qualified as HashMap
import Data.Hashable (Hashable)
import System.Mem.Weak
import UnliftIO.STM

-- | Parameterized by the monad in which it operates, the key type,
-- and the value type.
data InternCache m k v = InternCache
{ lookupCached :: k -> m (Maybe v),
insertCached :: k -> v -> m ()
}

-- | Creates an 'InternCache' which uses weak references to only
-- keep values in the cache for as long as they're reachable by
-- something else in the app.
--
-- This means you don't need to worry about a value not being
-- GC'd because it's in the cache.
newInternCache ::
forall m k v. (MonadIO m, Hashable k)
=> m (InternCache m k v)
newInternCache = do
var <- newTVarIO mempty
pure $
InternCache
{ lookupCached = lookupCachedImpl var,
insertCached = insertCachedImpl var
}
where
lookupCachedImpl :: TVar (HashMap k (Weak v)) -> k -> m (Maybe v)
lookupCachedImpl var ch = liftIO $ do
cache <- readTVarIO var
case HashMap.lookup ch cache of
Nothing -> pure Nothing
Just weakRef -> do
deRefWeak weakRef

insertCachedImpl :: TVar (HashMap k (Weak v)) -> k -> v -> m ()
insertCachedImpl var k v = liftIO $ do
wk <- mkWeakPtr v (Just $ removeDeadVal var k)
atomically $ modifyTVar' var (HashMap.insert k wk)

-- Use this as a finalizer to remove the key from the map
-- when its value gets GC'd
removeDeadVal :: TVar (HashMap k (Weak v)) -> k -> IO ()
removeDeadVal var k = liftIO do
atomically $ modifyTVar' var (HashMap.delete k)

-- | Changing the monad in which the cache operates with a natural transformation.
hoist :: (forall x. m x -> n x) -> InternCache m k v -> InternCache n k v
hoist f (InternCache lookup' insert') =
InternCache
{ lookupCached = f . lookup',
insertCached = \k v -> f $ insert' k v
}

-- | When a value is its own key, this ensures that the given value
-- is in the cache and always returns the single canonical in-memory
-- instance of that value, garbage collecting any others.
intern :: (Hashable k, Monad m) => InternCache m k k -> k -> m k
intern cache k = do
mVal <- lookupCached cache k
case mVal of
Just v -> pure v
Nothing -> do
insertCached cache k k
pure k
6 changes: 6 additions & 0 deletions swarm.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -384,6 +384,9 @@ common transformers
common unicode-show
build-depends: unicode-show >=0.1 && <0.2

common unliftio
build-depends: unliftio >=0.2 && <0.3

common unordered-containers
build-depends: unordered-containers >=0.2.14 && <0.3

Expand Down Expand Up @@ -942,6 +945,8 @@ library swarm-util
template-haskell,
text,
transformers,
unliftio,
unordered-containers,
vector,
witch,
witherable,
Expand All @@ -955,6 +960,7 @@ library swarm-util
Swarm.Effect.Metric
Swarm.Effect.Time
Swarm.Failure
Swarm.Language.InternCache
Swarm.Language.Phase
Swarm.Language.Syntax.Direction
Swarm.Language.Syntax.Import
Expand Down
Loading