Performance
Since ble.sh is implemented in Bash script, it can be slow depending on the
environment, setup, programs running in the background, and other factors. In
some cases, performance might be improved by changing the setup.
Related discussions are also found in Issues - label:performance.
Bash 3: We recommend using ble.sh with Bash >= 4.0 although ble.sh supports
Bash >= 3.0. Bash < 4.0 lacks the feature necessary to run additional
processing in the background, so ble.sh blocks the user's inputs until it
completes all the processing for every byte in the input stream. For the same
reason, ble.sh cannot provide support for auto-complete and menu-filter with
Bash < 4.0. In addition, Bash < 4.0 does not provide a way to reliably bind a
shell function to C-d, so ble.sh uses a child process to detect C-d, which can
increase the overhead and can also be fragile depending on the setup.
macOS Bash: Bash versions below 4.0 are outdated and rare in modern distributions, but macOS still ships Bash 3.2 for licensing reasons. If you are a macOS user and use the Bash 3.2 shipped with macOS, please install the latest version of Bash and switch the login shell of your user account to it.
Here, we illustrate the Bash installation and setup using Homebrew. If you have not yet installed Homebrew, please install it by following the instructions on the Homebrew homepage. After Homebrew is set up, you can install the latest version of Bash with the following command (run it as a normal user, because Homebrew refuses to run as root):
$ brew install bash
The latest version of Bash will typically be installed at
/opt/homebrew/bin/bash. However, to use the installed version of Bash in
terminals, you also need to change the login shell of your account to
/opt/homebrew/bin/bash. To do so, you first need to make sure that
/opt/homebrew/bin/bash is included in /etc/shells. If it is not included,
please add a new line /opt/homebrew/bin/bash to /etc/shells.
# edit /etc/shells # <-- please replace "edit" with a text editor (such as vi, nano, etc)
Then, run the following command as your own user to change the login shell of
your account (which is typically recorded in /etc/passwd):
$ chsh -s /opt/homebrew/bin/bash
Some terminals directly reference the passwd entry to determine the user's
login shell, but most terminals reference the environment variable SHELL,
which is initialized from the passwd entry when the user logs in to the GUI
session (i.e., the window manager). For this reason, you will probably have to
reboot the system to make sure the latest version of Bash is used by your
terminal.
After rebooting, you can confirm the Bash version of the current session by pressing C-xC-v. Another way to check the Bash version is to use the following command:
$ "$BASH" --version
Note: The following is INCORRECT. It prints the version of Bash found first in
the current environment variable PATH, which is unrelated to the version of
Bash of the current session.
$ bash --version # WRONG WAY TO CHECK THE CURRENT SHELL
Note: The following is also INCORRECT. It prints the path of the login shell
of the current user, which is not necessarily the Bash of the current session.
Your terminal probably uses the login shell stored in SHELL for the
interactive shell, but this is not guaranteed. Even if it does, when a
different shell is started inside the terminal, the shell of the current
session can be different from the login shell.
$ echo "$SHELL" # WRONG WAY TO CHECK THE CURRENT SHELL
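The version of the running shell is also available in the BASH_VERSINFO array, which is convenient for checks inside a bashrc. The following is a minimal sketch; the function name bash_version_at_least is an assumption made for illustration and is not part of ble.sh.

```shell
# Hypothetical helper (not part of ble.sh): succeed when the running Bash
# is at least version MAJOR.MINOR.
bash_version_at_least() {
  local major=$1 minor=${2:-0}
  (( BASH_VERSINFO[0] > major || (BASH_VERSINFO[0] == major && BASH_VERSINFO[1] >= minor) ))
}

# Warn when the current session runs a Bash older than 4.0.
if ! bash_version_at_least 4 0; then
  echo "Bash $BASH_VERSION: auto-complete and menu-filter are unavailable" >&2
fi
```

Unlike "bash --version", this inspects the shell that is actually executing the bashrc.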
MSYS1 Bash: Until recently, MSYS1 provided by the MinGW project was shipped with Bash 3.1. MSYS1 Bash has other compatibility issues as well as the aforementioned performance issues. If you still use MSYS1, please switch to MSYS2.
Bash devel/alpha/beta: We recommend using a release version of Bash. The
devel/test versions of Bash can be extremely slow with ble.sh because of the
slow memory allocation for debugging used by the devel/test versions of Bash.
When ble.sh detects a devel/test version in its initialization, it prints a
warning to stderr. If you want to use ble.sh with a devel/test version of
Bash, it is recommended to build Bash with the configure option
--with-bash-malloc=no for practical performance:
~/bash-devel$ ./configure --with-bash-malloc=no
~/bash-devel$ make all
~/bash-devel$ make install
To suppress the warning message on startup, please specify the option
--bash-debug-version=short, once, or ignore when sourcing ble.sh.
# bashrc
# Show a short version of the message
source /path/to/ble.sh --bash-debug-version=short
# Show the full warning message only once for each debug version of Bash
source /path/to/ble.sh --bash-debug-version=once
# Do not print the warning message at all
source /path/to/ble.sh --bash-debug-version=ignore
4.2 <= Bash < 5.3: If you also care about the memory footprint, you should avoid the version range 4.2 <= Bash < 5.3, which has a bug that stores duplicate data for shell functions. The bug is fixed in Bash >= 5.3 (which has not yet been released as of 2024-11-07).
ble.sh is mainly tested in Linux (Fedora), where I do not feel any performance
issue except for the case of massive completion candidates. However,
performance issues sometimes seem to occur on specific systems. For example,
Bash may consume considerable computational resources when it runs on top of
an emulation layer, such as Cygwin/MSYS2/WSL1/Termux. The overhead of ble.sh
can also be an issue on low-power devices, such as Raspberry Pi and
smartphones (with Termux). In the past, macOS users also seemed to report slow
performance, but we have not heard similar reports recently.
The performance issue in such systems is due to the systems' inherent
limitation on the computational power, so there is no simple solution. If you
want to use ble.sh with such systems, you will need to adjust various ble.sh
settings to see if the situation is improved. See the later sections.
File systems can also be slow. For example, WSL2's /mnt contains bridges to
the file systems in the Windows subsystem, which internally seem to cause a
round-trip communication for every single syscall and are thus extremely slow
when a directory contains many file entries. This affects the executable file
search since PATH contains the bridges in WSL2 by default (GitHub#96). A
similar problem may happen with network-based file systems such as NFS,
FUSE-SSHFS, SMBFS, CIFS, etc.
For another example, macOS 15 (Sequoia) seems to have introduced a security
check for every attempt to open a file, which may cause a serious delay in
every part of ble.sh.
Slow file systems cause performance problems when a TAB completion or
auto-complete is attempted or when the highlighting of filenames is attempted.
In this case, one can either remove the file systems from the related search
paths (such as PATH or the ones referenced in completion settings), or turn
off the affected feature.
In the case of WSL2 (which is explained above), one may try removing /mnt/*
from PATH to see if the situation changes.
# blerc
ble/path#remove-glob PATH '/mnt/*'
Another way is to change the WSL setting in /etc/wsl.conf so that the initial
value of PATH does not include /mnt/*. If /etc/wsl.conf does not exist, you
can create it as a new text file. Then, you can add the following lines to
/etc/wsl.conf:
[interop]
appendWindowsPath = false
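For sessions where ble.sh is not loaded, the same kind of PATH filtering can be sketched with Bash builtins only. The helper name remove_path_entries and the sample directories below are assumptions made for illustration; this is not how ble/path#remove-glob is implemented.

```shell
# Hypothetical helper: remove from a colon-separated list every entry that
# matches a glob pattern, using only Bash builtins.
remove_path_entries() {
  local pattern=$1 list=$2 entry result=
  local -a entries
  IFS=: read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    # The unquoted right-hand side of == is treated as a glob pattern.
    [[ $entry == $pattern ]] && continue
    result=${result:+$result:}$entry
  done
  printf '%s\n' "$result"
}

# Sample list with two Windows bridges; prints /usr/bin:/bin
remove_path_entries '/mnt/*' '/usr/bin:/mnt/c/Windows:/bin:/mnt/c/Tools'
```

To apply it to the real search path, one could write PATH=$(remove_path_entries '/mnt/*' "$PATH") once in bashrc.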
According to the report by @dlyongemallo and reference therein, macOS started to check the file contents on every attempt to open an (executable) file by sending and receiving some information to an Apple server. Although the communication with the Apple server seems to be cached in the system, it still seems to perform the check for every new file. The discussion also suggests that an attempt to open an existing file after changing the file contents would also cause the server access.
ble.sh uses Bash 5.3's ${ funsub; } or an equivalent feature in lower Bash
versions, but ${ funsub; } involves a new temporary file for every evaluation.
This causes a serious delay in ble.sh. According to the report, the shell
startup may take up to 30 seconds with macOS's check enabled. Even after the
session has started, ble.sh seems to be affected by the delay.
If you use Terminal.app, in the macOS system settings, you may turn off the security check for the processes started inside the terminal.
- Input "security" in the top-left text box to find "Privacy & Security".
- Go to the category "Developer Tools". If the category is not found, you can enable it by running the following command in a terminal:
  $ sudo spctl developer-mode enable-terminal
- Press the button [+] and add "Terminal" to the list.
Warning
This setting will turn off the security check of all the processes started in the terminal, so the user needs to be careful not to run random scripts and programs obtained from the internet.
Note
You may add other terminals to the list, but it doesn't seem to be effective. This workaround can only be used with Terminal.app.
Highlighting of the words based on the filenames can be turned off by the following setting:
# blerc
bleopt highlight_filename=
Instead of completely turning off the highlighting of the words, one may set timeouts and limits for the highlighting. One can basically try to reduce the numbers:
# blerc
bleopt highlight_timeout_async=5000
bleopt highlight_timeout_sync=50
bleopt highlight_eval_word_limit=200
Highlighting of the completion candidates can be turned off by the following setting:
# blerc
# Note: This internally sets "bind 'set colored-stats off'".
bleopt complete_menu_color=off
# Note: This internally sets "bind 'set colored-completion-prefix off'".
bleopt complete_menu_color_match=off
Completions and related features can also cause a performance issue when the
number of generated candidates is too large or when a third-party programmable
completion setting takes time. This typically becomes a problem with
auto-complete because auto-complete tries to perform the completion generation
in the background while the user inputs the text.
To reduce the processing time for the completion generation, you can set related limits, timeouts, and frequency to check the user's input. These settings can be set to smaller numbers to reduce the blocking time.
# blerc
bleopt complete_limit_auto=2000
bleopt complete_limit_auto_menu=100
bleopt complete_timeout_auto=5000
bleopt complete_timeout_compvar=200
bleopt complete_polling_cycle=50
To reduce the time of constructing the menu, one might set the following settings:
# Limit the menu height
bleopt complete_menu_maxlines=10
# Use a simple layout for the menu
bleopt complete_menu_style=dense
To generate completions, ble.sh uses programmable completion settings, which
are set up by users and third-party frameworks. ble.sh can be blocked when the
corresponding programmable completion setting hangs or takes time. This is
typically the case when the input delay is increased for the arguments to
specific commands.
This is an issue with the programmable completion setting, and not a problem
of ble.sh. In the original Readline, this problem might be worked around by
canceling the completion by pressing C-c. However, this still breaks the
terminal layout, so the programmable completion setting is anyway broken or
badly designed. See the following discussions:
- https://github.com/akinomyoga/ble.sh/issues/121#issuecomment-863586579
- https://github.com/akinomyoga/ble.sh/issues/487#issuecomment-2310118328
In this case, one should try to fix the programmable completion setting. To identify the provider of the programmable completion setting, one can first check the completion setting by running the following command:
$ complete -p <command_name>
where <command_name> is the affected command name (for which the input delay
of arguments becomes significant). If you find -F <func> (where <func> is a
string) in the output, <func> is the name of the shell function that generates
the completion candidates for the specified command. If you find -C <cmd>
(where <cmd> is a string) in the output, <cmd> is the name of the command that
generates the completion candidates.
The first thing to consider is to optimize the implementation of the programmable completion setting. If you are not sure, you can consider reporting the performance issue to the provider of the programmable completion setting. You can run the following command to check the filename where the function is defined:
$ (shopt -s extdebug; declare -F <func>)
where <func> is the identified function name. Or if the completion setting is
specified by -C <cmd>, you can check the location of the command by running
$ type -p <cmd>
Based on the file location, you may try to identify the package that provides the file and to improve the implementation in the upstream.
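The lookup steps above can be combined into a small script. This is a minimal sketch: the helper completion_provider_of, the demo function _demo_complete, and the command name democmd are all hypothetical names introduced here, and the simple word splitting does not handle a -C argument that contains spaces.

```shell
# Hypothetical completion function and command name, defined only for this demo.
_demo_complete() { COMPREPLY=(alpha beta); }
complete -F _demo_complete democmd

# Hypothetical helper: print the word following -F or -C in the output of
# "complete -p" for the given command.
completion_provider_of() {
  local spec word prev=
  local -a words
  spec=$(complete -p "$1" 2> /dev/null) || return 1
  read -r -a words <<< "$spec"
  for word in "${words[@]}"; do
    if [[ $prev == -F || $prev == -C ]]; then
      printf '%s\n' "$word"
      return 0
    fi
    prev=$word
  done
  return 1
}

func=$(completion_provider_of democmd)
echo "provider: $func"
# With extdebug, "declare -F" also prints the line number and the source file.
(shopt -s extdebug; declare -F "$func")
```

The helper returns a non-zero status when no completion setting is registered for the command or when the setting uses neither -F nor -C.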
Another possibility is to adjust the behavior of the existing completion
setting using ble/function#advice.
For example, one may turn off the completion for a specific command in
auto-complete with the following settings:
# blerc
function blerc/disable-progcomp-for-auto-complete.advice {
  if [[ $BLE_ATTACHED && :$comp_type: == *:auto:* ]]; then
    return 0
  fi
  ble/function#advice/do
}
_comp_load <command_name1> && ble/function#advice around <func1> blerc/disable-progcomp-for-auto-complete.advice
_comp_load <command_name2> && ble/function#advice around <cmd2> blerc/disable-progcomp-for-auto-complete.advice
...
where <command_name1>, <command_name2>, ... are the command names for which
the programmable completion settings are provided, and <func1>, <cmd2>, etc.
are the function and command names that generate the completion candidates,
which are identified above. The original discussion is found in #522.
If the programmable completion setting hangs, it might not be a performance
issue but rather a problem with the setup of the programmable completion.
Possible problems with auto-complete are described on the Reporting Issue
page.
Completions performed in the background might be turned off entirely if you do
not use them. The auto-complete, menu-filter, and auto-menu features can be
turned off by the following settings:
# blerc
bleopt complete_auto_complete=
bleopt complete_menu_filter=
bleopt complete_auto_menu= # This is default; auto-menu is off by default
Instead of completely turning off auto-complete and auto-menu, one can specify
a delay before the processing starts with the following settings:
# blerc
bleopt complete_auto_delay=500
bleopt complete_auto_menu=500
If a long command line is input, ble.sh's response becomes slow. ble.sh is
implemented in Bash script and cannot use proper and efficient data
structures. The command line string is stored in a single scalar variable, and
related metadata are stored in a flat array. This implementation limitation
causes the processing time to scale poorly for a long command line.
ble.sh limits the command-line length by default, but the default limits are
relatively large. If long command lines are a problem for you, you may lower
the limits of the command length with the following options.
bleopt line_limit_length=10000
bleopt history_limit_length=10000
The behavior when the limit is reached can also be changed by the following option:
bleopt line_limit_type=editor
For behavioral consistency in navigating through the command history by
up, down, etc., history_limit_length should be set to a value equal to or
less than line_limit_length.
A large command history can also affect the initialization time and the memory footprint of the interactive Bash process. If you have an extremely large command history, you might consider reducing the size limit or removing irrelevant old commands.
If you have a setting like history -a; history -c; history -r to synchronize
the command history between sessions, please instead use the following
setting:
bleopt history_share=1
The setting history -a; history -c; history -r reloads the command history of
Bash every time a command is executed, and then ble.sh needs to reload and
process the entire command history every time, which is extremely inefficient.
The overhead becomes particularly significant when the command history size
becomes large. Even if the history size is small, it is redundant to reload
the entire command history. A setting like history -a; history -c; history -r
should not be used. The setting bleopt history_share=1 processes only the
newly added entries in the command history, and thus it is recommended.
If ble.sh naively emulated the Bash feature HISTCONTROL=erasedups, it could be
slow when the feature is enabled with a large command history because ble.sh
implements all the processing on Bash arrays in Bash scripts. For this reason,
by default, ble.sh limits the duplicate removal in the command history to the
commands that are added after the current session started.
If you want to recover the original behavior where all the commands are checked for duplicates, you can use the following setting (which may increase the delay of the command starting):
bleopt history_erasedups_limit=
The performance issue basically should not happen with the default setting.
However, if you run a large number of commands in a single session, the
performance of the erasedups feature might become a problem. In this case, one
can limit the target commands to the last N commands, where N can be specified
in the following way for N = 100:
bleopt history_erasedups_limit=100
For the delay of ESC in the vi/vim editing mode, please check Vi (Vim) editing mode.
Other Bash configurations might set the DEBUG trap, which makes the entire
shell execution slow. Although ble.sh temporarily removes the user's DEBUG
trap for its internal processing, it may still impact the performance when
ble.sh's utility is called outside the ble.sh internal state.
The DEBUG trap is typically used to emulate the preexec hook of Zsh.
- For this purpose, please use blehook PREEXEC instead of the DEBUG trap.
- An external framework, bash-preexec, uses the DEBUG trap to implement its preexec hook. If you want to use the bash-preexec feature, please use ble-import integration/bash-preexec instead of bash-preexec. Even when bash-preexec is loaded in a ble.sh session, ble.sh tries to adjust the hook introduced by bash-preexec. However, this may not be robust, so please consider switching to ble-import integration/bash-preexec.
- Starship uses the DEBUG trap when it does not detect other frameworks providing a preexec mechanism. Since Starship uses ble.sh's blehook PREEXEC when it detects ble.sh, please load ble.sh before initializing Starship by eval "$(starship init bash)".
- Atuin also wants to use the preexec hook and relies on either bash-preexec or ble.sh to make itself work properly. When ble.sh is loaded, bash-preexec does not need to be loaded because Atuin internally calls ble.sh's ble-import integration/bash-preexec.
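As an illustration of the first item, a bashrc might register its own preexec function through blehook only when ble.sh is present. This is a minimal sketch: the function name my_preexec and its body are hypothetical, and the check assumes that the variable BLE_VERSION is set whenever ble.sh has been loaded.

```shell
# Hypothetical hook function; the name and its body are assumptions.
my_preexec() {
  _command_start_time=$SECONDS
}

if [[ ${BLE_VERSION-} ]]; then
  # ble.sh is loaded: register the function through its hook mechanism
  # instead of installing a DEBUG trap.
  blehook PREEXEC+=my_preexec
fi
```

Because the registration is guarded, the same bashrc can still be sourced in sessions where ble.sh is not loaded.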
If you want to identify the cause of the bad performance you experience, you
might use the ble.sh profiler. The profiler can be started and stopped by
running ble/debug/profiler/start and ble/debug/profiler/stop.
$ ble/debug/profiler/start
$ # Do the operation that takes time
$ ble/debug/profiler/stop
The execution times of functions and lines are summarized in files
./prof.$$.*.txt (where . represents the working directory when
ble/debug/profiler/start is called). If you want to change the prefix of the
output files, you can specify it as ble/debug/profiler/start name so that the
output is written to name.*.txt. When the output files already exist, the
existing files will be updated by summing up the execution times of the old
and new results.
Warning
Since the profiler internally saves a large execution log (growing by 10 megabytes per second or more while processing is underway), the profiler should not be kept on for a long time.
Tip
The measurement results include the sleeping time. In particular, entries
containing the string ble/util/idle, sleep, msleep, etc. are related to
waiting for the user's inputs, so their long execution time is probably
unrelated to the performance issue.
To get a different type of statistics or to export the result in a different
format, you can adjust the option bleopt debug_profiler_opts.
You can also check the option bleopt debug_profiler_tree_threshold.
If you want to get a hint on the initialization time, you can also enable the
measurement of ble.sh's loading time using a modified version of ble.sh.
You can first rewrite the macro variable measure_load_time at the beginning
of ble.pp:
diff --git a/ble.pp b/ble.pp
index a8703e2b..82fb3362 100644
--- a/ble.pp
+++ b/ble.pp
@@ -1,7 +1,7 @@
#!/bin/bash
#%$> out/ble.sh
#%[release = 0]
-#%[measure_load_time = 0]
+#%[measure_load_time = 1]
#%[debug_keylogger = 1]
#%[leakvar = ""]
#%#----------------------------------------------------------------------------
Then, you can rebuild ble.sh by running make (and install it if necessary).
The initialization times will be measured and printed when the modified
ble.sh is loaded.
If you want to add new features in Bash or modify existing parts of ble.sh,
you will probably want to care about the performance because Bash is in
general slower than C/C++ implementations. Here, we do not discuss general
optimization that applies to any language (such as proper algorithms and data
structures). We instead discuss things that need special care in Bash.
The shell functions mentioned here may not be documented in the manual, but
you can check ble.sh's source code in that case. We usually have code
documentation for important utilities. You can also check the actual
implementation to understand the detailed behavior. The code documentation is
occasionally written in Japanese, but even in that case, you can use machine
translation to understand the usage.
In Bash, subshells are implemented by forking processes, which is
significantly slower than the other shell operations. Depending on the
operating system, a single subshell may take on the order of 100 milliseconds.
Even on faster systems like Linux, a fork would take a few milliseconds, which
is 50x or 100x slower than the typical shell operations without forking.
Subshells are used in many places of the shell, such as pipes [cmd1 | cmd2],
command substitutions [$(cmd) and `cmd`], process substitutions [<(cmd) and
>(cmd)], and subshell groups [(cmd)].
Calling external commands is also slow for the same reason. To launch an
external command, Bash first forks itself and then performs exec. The exec
syscall needs to load the binary image from the file system and perform all
the process initialization, so exec is also a slow operation of the same order
as fork.
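To make the cost of forking concrete, the following sketch compares a forking way and a builtin way of extracting the file name and the directory from a path. The sample path is an assumption, and the parameter expansions are not full replacements for basename and dirname (they differ for paths without a slash or with a trailing slash).

```shell
path=/usr/local/share/doc/readme.txt

# Forking version: one command substitution plus one external command per call.
name_fork=$(basename "$path")
dir_fork=$(dirname "$path")

# Builtin version: parameter expansion only, no new process.
name_builtin=${path##*/}
dir_builtin=${path%/*}

echo "$name_fork $name_builtin"
echo "$dir_fork $dir_builtin"

# Rough timing of 200 iterations each; absolute numbers depend on the system.
time for ((i = 0; i < 200; i++)); do name_fork=$(basename "$path"); done
time for ((i = 0; i < 200; i++)); do name_builtin=${path##*/}; done
```

On most systems the second loop finishes far sooner than the first, because it never creates a process.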
If the same result can be easily obtained by built-in Bash features, one
should use the built-in Bash features without using subshells. If you want to
obtain stdout of a command as a string, you can use ble.sh's shell function
ble/util/assign:
ble/util/assign varname 'command'
where varname is the name of the variable to store the result, and the command
is specified as 'command', which is internally executed by the eval builtin.
There are variants for arrays such as ble/util/assign-words and
ble/util/assign-array. The former splits stdout by white spaces and stores the
elements in an array. The latter does the same but splits stdout at newlines.
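The idea of capturing stdout without a subshell can be sketched in plain Bash by redirecting the output to a temporary file and reading it back with the read builtin. This is only an illustration under stated assumptions and not the implementation used by ble.sh: the names capture_stdout and _capture_tmp are hypothetical, a single temporary file is reused (so nested calls would clobber each other), and the target variable must not be named like the function's local variables.

```shell
# Hypothetical sketch, NOT the implementation used by ble.sh: store the stdout
# of a command in a variable without forking, by going through a temporary file.
_capture_tmp=${TMPDIR:-/tmp}/capture.$$.tmp

capture_stdout() {
  local _capture_data= _capture_status
  eval "$2" > "$_capture_tmp"
  _capture_status=$?
  # read returns non-zero at end of file, but the data is still stored.
  IFS= read -r -d '' _capture_data < "$_capture_tmp" || true
  # Drop trailing newlines as $(...) would.
  while [[ $_capture_data == *$'\n' ]]; do
    _capture_data=${_capture_data%$'\n'}
  done
  printf -v "$1" '%s' "$_capture_data"
  return "$_capture_status"
}

capture_stdout greeting 'echo hello; echo world'
echo "$greeting"
rm -f "$_capture_tmp"
```

The call itself creates no process; only the final cleanup with rm does, and it is needed once per session rather than once per call.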
However, if you need to use an external command to process some part of the
operation, you should try to minimize the number of calls of external
commands. In particular, even when only a part cannot be implemented with the
built-in features, one should consider implementing the whole processing in
the external command. The awk command is useful for that purpose because it
can easily do most of the processing that other tools like sed, seq, tr, cut,
etc. offer, and because it can also do many operations in a single call.
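As a small illustration of folding several tools into one awk call, the following sketch filters records, selects a field, and converts it to upper case. The sample data is made up for this example.

```shell
# Sample input (hypothetical data): "name:shell" records.
data=$'alice:/bin/bash\nbob:/bin/zsh\ncarol:/bin/bash'

# Several processes: grep, cut, and tr each require a fork and an exec.
printf '%s\n' "$data" | grep ':/bin/bash$' | cut -d: -f1 | tr '[:lower:]' '[:upper:]'

# One process: the same filtering, field selection, and conversion in awk.
printf '%s\n' "$data" | awk -F: '$2 == "/bin/bash" { print toupper($1) }'
```

Both pipelines print ALICE and CAROL, but the second one starts a single external command instead of three.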
Also, reading data from streams using the read builtin can be extremely slow
because the read builtin needs to read at most one byte at a time from a
stream. This is related to the expectation that the read builtin leaves the
unprocessed data in the stream for the subsequent operations (possibly
performed by a different process). By design, the read builtin reads only a
part of the stream, and its length is determined by looking for newline
characters (or specified delimiters) in the stream. For this reason, the read
builtin cannot predict where the data to be read ends, and it can only read
one byte at a time from unseekable streams. If the read builtin read too much
data, that data could not be seen by the other commands run after the read
builtin.
However, this means that the read builtin issues as many system calls as the
number of bytes in the processed data, which would be extremely slow (e.g.
sometimes 2000x slower). Therefore, one should usually avoid using the read
builtin for reading large data supplied through a pipe, etc.
When you implement an operation that may take time, you should also consider
providing a way to cancel the operation in the middle. In particular, the slow
operation should monitor the user's inputs in stdin, and if the user starts
inputting anything, the operation should be canceled at a suitable point. To
check if there is a user input, you can use the shell function
ble/util/is-stdin-ready:
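while some_slow_condition; do   # placeholder for the actual loop condition
  if ble/util/is-stdin-ready; then
    # The user has started typing: cancel the processing here
    break
  fi
  # Process one unit of work here
done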
Inside completion settings, one should use ble/complete/check-cancel instead
of ble/util/is-stdin-ready. When ble/complete/check-cancel succeeds, one
should cancel the current processing of completion and return exit status 148.
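A completion function following this rule might look like the sketch below. The function name _demo_slow_complete, the candidate names, and the check interval are hypothetical. So that the sketch also runs outside a ble.sh session, it defines a stub for ble/complete/check-cancel that never requests cancellation when the real function is missing.

```shell
# Outside a ble.sh session the function does not exist, so this sketch defines
# a stub that never requests cancellation (an assumption for standalone runs).
if ! declare -F ble/complete/check-cancel > /dev/null; then
  function ble/complete/check-cancel { return 1; }
fi

# Hypothetical completion function that generates candidates in a slow loop.
_demo_slow_complete() {
  local i
  COMPREPLY=()
  for ((i = 0; i < 1000; i++)); do
    # Check periodically rather than on every iteration to limit the overhead.
    if ((i % 100 == 0)) && ble/complete/check-cancel; then
      return 148
    fi
    COMPREPLY[i]=candidate$i
  done
}
```

Checking only every 100 iterations is a design choice of this sketch: it keeps the overhead of the check small while still reacting reasonably quickly to user input.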
When one wants to call a slow external shell function or command but wants to
cancel it under a certain condition such as ble/complete/check-cancel without
re-implementing or modifying the shell function or the command, one may
consider calling it through ble/util/conditional-sync. This utility calls the
provided command in a background subshell and monitors the condition. When the
condition becomes unsatisfied before the command completes,
ble/util/conditional-sync kills the background command and immediately
returns. One can also specify a timeout for the command in the option of
ble/util/conditional-sync.
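The general technique of running a command in the background and killing it when a limit is reached can be sketched in plain Bash as follows. This is not ble/util/conditional-sync itself: the function name run_with_timeout, the polling interval of 0.1 seconds, and the exit status 124 (borrowed from the convention of timeout(1)) are assumptions of this sketch, and it relies on a sleep command that accepts fractional seconds.

```shell
# Hypothetical sketch, NOT ble/util/conditional-sync itself: run a command in
# the background and kill it when it exceeds a timeout (in units of 0.1 s).
run_with_timeout() {
  local limit=$1 elapsed=0 pid
  shift
  "$@" &
  pid=$!
  # Poll while the background process is still alive.
  while kill -0 "$pid" 2> /dev/null; do
    if ((elapsed >= limit)); then
      kill "$pid" 2> /dev/null
      wait "$pid" 2> /dev/null
      return 124 # same convention as timeout(1); an arbitrary choice here
    fi
    sleep 0.1
    ((elapsed += 1))
  done
  # The command finished in time: propagate its exit status.
  wait "$pid"
}

run_with_timeout 20 true; echo "status: $?"
run_with_timeout 2 sleep 5; echo "status: $?"
```

The first call reports status 0 and the second reports status 124. Replacing the elapsed-time test with an arbitrary condition command gives the condition-based cancellation described above.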