Language Design Principles
andychu edited this page Aug 8, 2023 · 95 revisions
- Failures should not be ignored.
  - Example: in bash, when evaluating `strftime` in `printf` format strings like `%(%Y)T`, if the result overflows a 128-byte buffer, it's silently truncated!
- Syntax and Semantics Should Correspond
  - The same semantics should use the same syntax
  - Different semantics should use different syntax
  - e.g. discussion in The Five Meanings of #
  - e.g. `find -type f` and the "I'm too lazy to write a lexer" pattern are BANNED!
  - Shell has "topped out" in terms of its syntax. It's too elaborate and unfamiliar. We won't add more syntax that looks like `${x@P}`, `${x^^}`, `cat <<< 'hi'`, or `exec 2>&-`.
- The common behavior should be the default behavior. The short thing should be the right thing.
  - For example, simple word evaluation makes it so that you can use `$var` instead of `"$var"`. That's almost always what you want.
  - `read -r` should have been the default in bash: the `-r` flag inhibits backslash processing, which most people didn't intend when using `read`.
  - Note that `bin/oil` has all the right defaults, via `shopt --set oil:all`, while `bin/osh` is compatible.
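To make the `read -r` point concrete, here is a minimal sketch in plain bash (not OSH or Oil). The variable `input` and the functions `read_default` and `read_raw` are hypothetical names chosen for illustration.

```bash
#!/usr/bin/env bash
# A line containing literal backslashes, e.g. a Windows-style path.
input='C:\temp\new'

# Plain `read`: each backslash escapes the next character and is removed.
read_default() {
  printf '%s\n' "$input" | { read line; printf '%s\n' "$line"; }
}

# `read -r`: backslashes are kept as ordinary characters.
read_raw() {
  printf '%s\n' "$input" | { read -r line; printf '%s\n' "$line"; }
}

read_default   # prints C:tempnew
read_raw       # prints C:\temp\new
```

The short form silently mangles the data, which is why the longer `read -r` is what people almost always want.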
- Avoid inventing syntax that doesn't exist in any other language. Most of Oil should look familiar to programmers and shell users.
  - `@` has precedent in Perl, PowerShell, etc.
  - The expression syntax comes from Python, JavaScript, etc.
  - However, a corollary of the principle above is: if Oil has completely new semantics, then inventing a new syntax is justified.
  - See Oil Language Influences
- Minimize the use of global options (`shopt`).
  - Oil started out with many such options, but I eliminated them over time because they got unwieldy to explain and document.
  - There are still many of them, and they should be used sparingly. But note that the `strict_` ones don't really have any cost, because they abort your program on disallowed behavior. They don't silently change the semantics.
  - Rationale: Global state makes code harder to read. It's a "hidden mode".
  - They should mostly be hidden under groups like `oil:all`.
  - Counterexample: `simple_word_eval` is probably the most important one that silently changes behavior, and I think it's justified in that case.
- Every feature should have Predictable, Linear Performance (extended globs break this rule with backtracking, so they're in OSH but not Oil)
OSH is a "cleaned up shell/bash" and is heavily constrained by compatibility. But there are edge cases where we have to make choices. The spec tests have uncovered dozens of cases where existing shells disagree, so we have to make a choice!
- The Common Subset Principle: In general, OSH shouldn't introduce incompatible semantics for the same syntax, and it should be very compatible with legacy shells. It might not run every last bash script. However, in those cases, you should be able to make small modifications that allow your script to run under both OSH and bash. Most often these changes improve clarity.
  - Example: In bash, `echo X > @(*.py)` means the same thing as `echo X > '@(*.py)'` (yes, really). OSH disallows the former for clarity, but the latter is in the common subset of OSH and bash.
  - Example: The meaning of `()` in `declare -A assoc=()` is changed to obey the common subset principle. It means an empty assoc array rather than an empty indexed array, because the context is clear, and because in bash `declare -A dict` means something different.
- Static Parsing
  - Dynamic Parsing (parsing at runtime) Confuses Code and Data.
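A small bash illustration of how runtime parsing blurs code and data: the string stored in `x` is only parsed as an arithmetic expression when it is used, and the result depends on how it is referenced. This is plain bash behavior; the names `x`, `by_name`, and `by_text` are arbitrary.

```bash
#!/usr/bin/env bash
x='1+2'   # data that happens to look like code

# Referenced by name, the value of x is parsed as its own
# sub-expression: (1+2) * 2
by_name=$(( x * 2 ))

# Substituted as text first, the string is spliced into the
# expression before parsing: 1+2 * 2
by_text=$(( $x * 2 ))

echo "$by_name"   # prints 6
echo "$by_text"   # prints 5
```

A statically parsed language would reject the question entirely: `x` is a string, and strings aren't re-parsed as code.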
- Consider Interactions Between Language Features (bash doesn't do this, e.g. extended globs)
- Minimize the combined OSH+Oil language size to the degree possible.
  - Where Oil duplicates functionality from OSH (like arithmetic), it has to be significantly better.
  - This partly explains why we keep OSH string literals in Oil, why bash `declare -a`/`-A` behave differently in Oil, and why `declare -i` isn't supported.
  - It also explains some constraints on the syntax, i.e. that we only have a `ShCommand` lexer mode, and no `OilCommand` lexer mode.
- Don't Silently Change What Code Means. Instead, choose a new syntax.
  - Early on, I wanted to take over `set` for assignment (leaving all options for `shopt`). But now it's `setvar`. It was tempting to take it over, but it would have been a bad idea.
  - `cols` could have been `select`, but that rare feature was taken.
  - An exception is `shopt -s simple_word_eval`, which does (silently) change the meaning of unquoted `$x`. But most newcomers, and even some long-time shell users, are surprised by the splitting; many shell scripts actually only operate correctly on names without spaces. So in many cases this option will silently fix bugs, but it will require adding an explicit `split()` where you loop over unquoted variables.
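Both sides of that trade-off can be seen in plain bash. In this sketch (`show_args`, `file`, and `words` are hypothetical names), the first call shows the common splitting bug, and the loop shows the kind of code that relies on splitting and would need an explicit `split()` under `simple_word_eval`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each argument on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

file='my notes.txt'
show_args $file     # bash splits the value: prints [my] and [notes.txt]
show_args "$file"   # prints [my notes.txt]

# This loop depends on splitting: it runs once per space-separated word.
words='alpha beta gamma'
for w in $words; do
  echo "word: $w"
done
```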
- Local reasoning about code. You shouldn't have to look at the top of the file constantly to figure out how code behaves.
  - Blocks like `shopt --set errexit { }` allow local reasoning, rather than setting the global permanently.
  - `redefine_proc` prevents distant definitions from clobbering your code.
  - TODO: tag procs with `oil:all`? issue 1147
Blog: How OSH Is Designed / Why OSH Isn't Bash
YSH is less constrained by compatibility, although there is still some consideration for it.
- It should be a smooth upgrade from OSH. Avoid "wild" breakage.
- We keep all the good concepts and throw out some bad ones.
- It should be explainable as a clean-slate language! This principle is heavily in conflict with the first, but surprisingly few compromises were necessary!
- YSH should be familiar to Python and JavaScript users. Common features like assignment should behave similarly.
  - This principle has "leaked" into OSH in the omission of `declare -i`. Also, to some degree, our reluctance to implement `$a == ${a[0]}` is shaped by this.
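For readers coming from Python or JavaScript, both bash behaviors mentioned above are surprising. A minimal sketch in plain bash; the variable names `a` and `n` are arbitrary.

```bash
#!/usr/bin/env bash
# In bash, naming an array without a subscript means its first element.
a=(first second third)
echo "$a"          # prints first, same as "${a[0]}"
echo "${#a[@]}"    # prints 3, the actual length

# With the integer attribute, assignment silently evaluates arithmetic.
declare -i n
n='2+3'
echo "$n"          # prints 5, not the string 2+3
```

In Python or JavaScript, neither a bare array name nor a string assignment would behave like this.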
- Don't break the interactive shell / top level / examples printed in books
  - e.g. We don't break redirect syntax, and we don't break `PYTHONPATH=. foo.py`.
- There Should Only Be One Kind of Expression
  - Shell has 3 to 4 recursive expression languages: arith, bool, and word. And bash has regexes.
  - In contrast, YSH has just one expression language. Note that eggexes are "first class".
  - Exception: Globs are still a separate expression language. (But in Oil, they're unchanged and compatible. And they don't have recursive structure, unlike extended globs.)
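As a small bash illustration of several expression languages coexisting, the same `>` token compares numbers inside `(( ))` but compares strings inside `[[ ]]`. This sketch is plain bash and only shows the inconsistency; it says nothing about YSH syntax.

```bash
#!/usr/bin/env bash
# Arithmetic language: > compares numbers.
if (( 10 > 9 )); then echo 'arith: 10 > 9'; fi

# Bool language: > compares strings, and "10" sorts before "9".
if [[ 10 > 9 ]]; then echo 'bool: 10 > 9'; else echo 'bool: 10 is not > 9'; fi

# Bool language again, but -gt compares integers.
if [[ 10 -gt 9 ]]; then echo 'bool -gt: 10 > 9'; fi
```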
- Avoid single-letter flags and names. This was OK in the '70s but no longer scales!
  - For example, `shopt --set` is better than `shopt -s`; `test --file` is better than `test -f`.
- Arrays are first class
  - In particular, no silent splitting and joining, as happens with unquoted substitutions, `$@`, `echo`, `eval`, etc.
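The silent splitting and joining can be seen with bash's `$@` and `$*`. In this sketch, `count` and `demo` are hypothetical helpers; the only difference between the three calls is how the argument array is expanded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reports how many arguments it received.
count() { echo "$#"; }

demo() {
  count $@     # unquoted: 'a b' is split again, prints 3
  count "$@"   # each argument preserved, prints 2
  count "$*"   # all arguments joined into one string, prints 1
}

demo 'a b' c
```

Only the quoted `"$@"` treats the arguments as a real array; the other two forms are the string-based behavior this principle rules out.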
- YSH has reference semantics in general, but value semantics for everything that shell does:
  - Making copies of List
  - Passing List as ARGV
  - But for Python and JS stuff, you have reference semantics
- You should be able to express arbitrary byte strings. Everything should be "8-bit clean" by default.
  - UTF-8 is an optional (but common) layer on top. (Ditto for other encodings.)
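Bash itself is not fully 8-bit clean in this sense: a shell variable cannot hold a NUL byte. A quick check in plain bash (recent versions may also print an "ignored null byte" warning on stderr); the variable name `x` is arbitrary.

```bash
#!/usr/bin/env bash
# printf writes three bytes: 'a', NUL, 'b'.
x=$(printf 'a\0b')
echo "${#x}"   # prints 2: the NUL byte was dropped
```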
- You should be able to use existing Unix tools with new protocols. (e.g. `grep` still works with lines of QSN. In contrast, the `\0`-delimited format of `find -print0` doesn't work with `grep`.)
  - This is a narrow waist argument: conforming to the waist enables code reuse.
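A rough bash analogy for the narrow-waist argument, using bash's own `printf '%q'` quoting rather than QSN: once a file name containing a newline is escaped, it stays on one line, so `grep` and other line-oriented tools still see one record per name. The file name is hypothetical.

```bash
#!/usr/bin/env bash
# A (hypothetical) file name containing a newline.
name=$'two\nlines.txt'

# Raw output: one name becomes two "lines" for grep.
printf '%s\n' "$name" | grep -c .          # prints 2

# Escaped output: still one line per name, so grep counts it once.
printf '%q\n' "$name" | grep -c .          # prints 1
printf '%q\n' "$name" | grep -c 'lines'    # prints 1
```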
(Referring to the CSTR Proposal and TSV2 Proposal, and the deferred Shellac Protocol Proposal and Coprocess Protocol Proposal.)