-
Notifications
You must be signed in to change notification settings - Fork 62
WIP: Cache imports #2588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/import3
Are you sure you want to change the base?
WIP: Cache imports #2588
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,7 @@ import Data.Map (Map) | |
| import Data.Map qualified as M | ||
| import Data.Set (Set) | ||
| import Data.Set qualified as S | ||
| import Data.Text (Text) | ||
| import Data.Text.Encoding qualified as T | ||
| import GHC.Generics (Generic) | ||
| import Network.HTTP.Simple (getResponseBody, httpBS, parseRequest) | ||
|
|
@@ -40,6 +41,10 @@ import Swarm.Pretty (prettyText) | |
| import Swarm.Util (readFileMayT, showT) | ||
| import Swarm.Util.Graph (findCycle) | ||
| import Witch (into) | ||
| import Swarm.Language.InternCache | ||
| import Control.Monad.IO.Class (MonadIO, liftIO) | ||
| import System.IO.Unsafe (unsafePerformIO) | ||
| import Swarm.Language.InternCache qualified as IC | ||
|
|
||
| type ResLoc = ImportLoc Import.Resolved | ||
|
|
||
|
|
@@ -153,6 +158,11 @@ resolveImports :: | |
| m (Set (ImportLoc Import.Resolved), Syntax Resolved) | ||
| resolveImports parent = runAccum S.empty . traverseSyntax pure (resolveImport parent) | ||
|
|
||
| -- | Cache imported modules. TODO: fix for module tree | ||
| moduleCache :: (MonadIO m) => InternCache m (ImportLoc Import.Resolved) (Module Resolved) | ||
| moduleCache = unsafePerformIO $ hoist liftIO <$> IC.newInternCache @IO @(ImportLoc Import.Resolved) @(Module Resolved) | ||
| {-# NOINLINE moduleCache #-} | ||
|
|
||
| -- | Given a parent directory relative to which any local imports | ||
| -- should be interpreted, load an import and all its imports, | ||
| -- transitively. Also return a canonicalized version of the import | ||
|
|
@@ -170,31 +180,33 @@ resolveImport parent loc = do | |
| -- Compute the canonicalized location for the import, and record it | ||
| canonicalLoc <- resolveImportLoc (unresolveImportDir parent <//> loc) | ||
| add $ S.singleton canonicalLoc | ||
|
|
||
| sendIO (IC.lookupCached moduleCache canonicalLoc) >>= \case | ||
| Just cachedModule -> do | ||
| sendIO $ putStr "C\a" | ||
| modify @(SourceMap Resolved) (M.insert canonicalLoc cachedModule) | ||
| Nothing -> M.lookup canonicalLoc <$> get @(SourceMap Resolved) >>= \case | ||
| Just _ -> pure () -- Already loaded - do nothing | ||
| Nothing -> do | ||
| -- Record this import loc in the source map using a temporary, empty module, | ||
| -- to prevent it from attempting to load itself recursively | ||
| modify @(SourceMap Resolved) (M.insert canonicalLoc $ Module Nothing () mempty) | ||
|
|
||
| srcMap <- get @(SourceMap Resolved) | ||
| resMod <- case M.lookup canonicalLoc srcMap of | ||
| Just m -> pure m -- Already loaded - do nothing | ||
| Nothing -> do | ||
| -- Record this import loc in the source map using a temporary, empty module, | ||
| -- to prevent it from attempting to load itself recursively | ||
| modify @(SourceMap Resolved) (M.insert canonicalLoc $ Module Nothing () mempty) | ||
|
|
||
| -- Read it from network/disk | ||
| mt <- readLoc canonicalLoc | ||
| -- Read it from network/disk | ||
| mt <- readLoc canonicalLoc | ||
|
|
||
| -- Recursively resolve any imports it contains | ||
| mres <- traverse (resolveImports (importDir canonicalLoc)) mt | ||
| -- sequence :: Maybe (Set a, b) -> (Set a, Maybe b) | ||
| let (imps, mt') = sequence mres | ||
| -- Recursively resolve any imports it contains | ||
| mres <- traverse (resolveImports (importDir canonicalLoc)) mt | ||
| -- sequence :: Maybe (Set a, b) -> (Set a, Maybe b) | ||
| let (imps, mt') = sequence mres | ||
|
|
||
| -- Finally, record the loaded module in the SourceMap. | ||
| let m = Module mt' () imps | ||
| modify @(SourceMap Resolved) (M.insert canonicalLoc m) | ||
| -- Finally, record the loaded module in the SourceMap. | ||
| let m = Module mt' () imps | ||
| modify @(SourceMap Resolved) (M.insert canonicalLoc m) | ||
| sendIO $ IC.insertCached moduleCache canonicalLoc m | ||
|
|
||
| pure m | ||
|
|
||
| -- Make sure imports are pure, i.e. contain ONLY defs + imports. | ||
| validateImport canonicalLoc resMod | ||
| -- Make sure imports are pure, i.e. contain ONLY defs + imports. | ||
| validateImport canonicalLoc m | ||
|
|
||
| pure canonicalLoc | ||
|
|
||
|
|
@@ -217,33 +229,43 @@ validateImport loc = maybe (pure ()) validate . moduleTerm | |
| TConst Noop -> pure () | ||
| t -> throwError $ ImpureImport loc (prettyText t) | ||
|
|
||
| -- | Cache file contents for import location. | ||
| fileCache :: (MonadIO m) => InternCache m (ImportLoc Import.Resolved) Text | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need to cache file contents separately from caching processed modules? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, parsing the file could fail, so in that case we could reuse a little bit. But mainly it was easier for me to write as a first test. 😁 |
||
| fileCache = unsafePerformIO $ hoist liftIO <$> IC.newInternCache @IO @(ImportLoc Import.Resolved) @Text | ||
| {-# NOINLINE fileCache #-} | ||
|
|
||
| -- | Try to read and parse a term from a specific import location, | ||
| -- either over the network or on disk. | ||
| readLoc :: | ||
| (Has (Throw SystemFailure) sig m, Has (Lift IO) sig m) => | ||
| ImportLoc Import.Resolved -> | ||
| m (Maybe (Syntax Raw)) | ||
| readLoc loc = do | ||
| let path = locToFilePath loc | ||
| badImport :: Has (Throw SystemFailure) sig m => LoadingFailure -> m a | ||
| badImport = throwError . AssetNotLoaded (Data Script) path | ||
| withBadImport :: Has (Throw SystemFailure) sig m => (e -> LoadingFailure) -> Either e a -> m a | ||
| withBadImport f = either (badImport . f) pure | ||
|
|
||
| -- Try to read the file from network/disk, depending on the anchor | ||
| src <- case importAnchor loc of | ||
| -- Read from network | ||
| Web_ {} -> do | ||
| -- Try to parse the URL | ||
| req <- parseRequest (into @String path) & withBadImport (BadURL . showT) | ||
| -- Send HTTP request | ||
| resp <- sendIO $ httpBS req | ||
| -- Try to decode the response | ||
| T.decodeUtf8' (getResponseBody resp) & withBadImport CanNotDecodeUTF8 | ||
|
|
||
| -- Read from disk | ||
| _ -> sendIO (readFileMayT path) >>= maybe (badImport (DoesNotExist File)) pure | ||
|
|
||
| src <- sendIO (IC.lookupCached fileCache loc) >>= \case | ||
| Just cachedSrc -> pure cachedSrc | ||
| Nothing -> do | ||
| s <- case importAnchor loc of | ||
| Web_ {} -> readFromNet | ||
| _ -> readFromDisk | ||
| _ <- sendIO $ IC.insertCached fileCache loc s | ||
| pure s | ||
|
|
||
| -- Finally, try to parse the contents | ||
| readTerm' (defaultParserConfig & importLoc ?~ loc) src | ||
| & withBadImport (SystemFailure . CanNotParseMegaparsec) | ||
|
|
||
| where | ||
| path = locToFilePath loc | ||
| badImport :: Has (Throw SystemFailure) sig m => LoadingFailure -> m a | ||
| badImport = throwError . AssetNotLoaded (Data Script) path | ||
| withBadImport :: Has (Throw SystemFailure) sig m => (e -> LoadingFailure) -> Either e a -> m a | ||
| withBadImport f = either (badImport . f) pure | ||
| readFromDisk = sendIO (readFileMayT path) >>= maybe (badImport (DoesNotExist File)) pure | ||
| readFromNet = do | ||
| -- Try to parse the URL | ||
| req <- parseRequest (into @String path) & withBadImport (BadURL . showT) | ||
| -- Send HTTP request | ||
| resp <- sendIO $ httpBS req | ||
| -- Try to decode the response | ||
| T.decodeUtf8' (getResponseBody resp) & withBadImport CanNotDecodeUTF8 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| {-# LANGUAGE BlockArguments #-} | ||
| module Swarm.Language.InternCache | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should add a comment with a link to the source, https://chrispenner.ca/posts/intern-cache |
||
| ( InternCache, | ||
| newInternCache, | ||
| lookupCached, | ||
| insertCached, | ||
| intern, | ||
| hoist, | ||
| ) | ||
| where | ||
|
|
||
| import Control.Monad.IO.Class (MonadIO (..)) | ||
| import Data.HashMap.Strict (HashMap) | ||
| import Data.HashMap.Strict qualified as HashMap | ||
| import Data.Hashable (Hashable) | ||
| import System.Mem.Weak | ||
| import UnliftIO.STM | ||
|
|
||
| -- | Parameterized by the monad in which it operates, the key type, | ||
| -- and the value type. | ||
| data InternCache m k v = InternCache | ||
| { lookupCached :: k -> m (Maybe v), | ||
| insertCached :: k -> v -> m () | ||
| } | ||
|
|
||
| -- | Creates an 'InternCache' which uses weak references to only | ||
| -- keep values in the cache for as long as they're reachable by | ||
| -- something else in the app. | ||
| -- | ||
| -- This means you don't need to worry about a value not being | ||
| -- GC'd because it's in the cache. | ||
| newInternCache :: | ||
| forall m k v. (MonadIO m, Hashable k) | ||
| => m (InternCache m k v) | ||
| newInternCache = do | ||
| var <- newTVarIO mempty | ||
| pure $ | ||
| InternCache | ||
| { lookupCached = lookupCachedImpl var, | ||
| insertCached = insertCachedImpl var | ||
| } | ||
| where | ||
| lookupCachedImpl :: TVar (HashMap k (Weak v)) -> k -> m (Maybe v) | ||
| lookupCachedImpl var ch = liftIO $ do | ||
| cache <- readTVarIO var | ||
| case HashMap.lookup ch cache of | ||
| Nothing -> pure Nothing | ||
| Just weakRef -> do | ||
| deRefWeak weakRef | ||
|
|
||
| insertCachedImpl :: TVar (HashMap k (Weak v)) -> k -> v -> m () | ||
| insertCachedImpl var k v = liftIO $ do | ||
| wk <- mkWeakPtr v (Just $ removeDeadVal var k) | ||
| atomically $ modifyTVar' var (HashMap.insert k wk) | ||
|
|
||
| -- Use this as a finalizer to remove the key from the map | ||
| -- when its value gets GC'd | ||
| removeDeadVal :: TVar (HashMap k (Weak v)) -> k -> IO () | ||
| removeDeadVal var k = liftIO do | ||
| atomically $ modifyTVar' var (HashMap.delete k) | ||
|
|
||
| -- | Changing the monad in which the cache operates with a natural transformation. | ||
| hoist :: (forall x. m x -> n x) -> InternCache m k v -> InternCache n k v | ||
| hoist f (InternCache lookup' insert') = | ||
| InternCache | ||
| { lookupCached = f . lookup', | ||
| insertCached = \k v -> f $ insert' k v | ||
| } | ||
|
|
||
| -- | When a value is its own key, this ensures that the given value | ||
| -- is in the cache and always returns the single canonical in-memory | ||
| -- instance of that value, garbage collecting any others. | ||
| intern :: (Hashable k, Monad m) => InternCache m k k -> k -> m k | ||
| intern cache k = do | ||
| mVal <- lookupCached cache k | ||
| case mVal of | ||
| Just v -> pure v | ||
| Nothing -> do | ||
| insertCached cache k k | ||
| pure k | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big problem that I can see is that this will never consider reloading a file from disk. In other words, consider the following steps:
foo/bar.sw.foo/bar.sw.You would like to get the new version of
foo/bar.sw, of course, but you will get the old, cached version. To get the new version you would have to completely restart Swarm.One solution to this I have considered is to add extra data to import locations that allow us to uniquely identify them in time as well as space: a "last modified" timestamp in the case of local files, and a content hash in the case of remote files.
However, there also needs to be a mechanism to identify when we are willing to update. For example, if we are loading a scenario and encounter
import https://fooseven times while recursively loading imports, we should only fetch it over the network the first time --- even if (especially if!) the remote file changes while we are still in the process of loading. However, if we then restart the scenario, we have to fetch it over the network again to see whether it changed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was thinking that "last modified time" would need to be tracked as well, but I was not sure how exactly.
Here is my proposal:
getModificationTimeand compare it with the time in cacheI have not given network files much thought, do they have a timestamp we could compare?
Personally, I would not mind if they required a Swarm restart to update.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point! I like this, it's much simpler. It seems reasonable that files fetched over the network are much less likely to change (and if they do, I might not even be expecting it). Hence it makes sense to never invalidate them from the cache.
As for files on disk, I like your proposal. I still think to be 100% correct we need to account for the fact that while, say, recursively loading all the code for a given scenario, we do not want to update a cached module even if it changes on disk in the middle of loading --- because all the imports of a given module should be consistent across a given scenario. However, I think we can make an issue for this / put it off until later. It won't happen very frequently.