This document explains some syntactical innovations included in merd.
See here for a more detailed syntax description.
Here are the different syntaxes used for function calls:
gcd(10, 4)
- math
- Algol-like (C, C++, Java, Perl, Python...)
(gcd 10 4)
gcd(10,4,r)
Facts:
- people know the math notation well
- ML notation can be counter-intuitive
eg: map gcd(10) list when you mean map (gcd 10) list
- one solution is to use horizontal layout
- another solution is to disallow this syntax (rationale: keeping both
notations is bad)
- ML notation enables easy partial application [1]
! partial application can be misleading if you don't know the function
! ML partial application mixes up two different beasts:
- let f1 x y = x + y which needs both arguments to compute something
- let f2 x = (print "f2 called" ; fun y -> x + y)
which really produces a function using the first argument.
The time of evaluation is different, but the type signatures of f1 and f2 are
the same.
This forbids eta-expansion: f2 x is not equivalent to y -> f2 x y
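The difference in evaluation time can be sketched in Python, used here only as a stand-in since the text gives ML-style definitions. The list `calls` and the names `add10` and `eta_add10` are assumptions made for the illustration; the print is replaced by an append so the effect can be counted.

```python
calls = []  # records each time the side effect of f2 runs


def f1(x, y):
    # Needs both arguments before anything is computed.
    return x + y


def f2(x):
    # The side effect runs as soon as the first argument arrives.
    calls.append("f2 called")
    return lambda y: x + y


add10 = f2(10)                   # side effect happens here, once
eta_add10 = lambda y: f2(10)(y)  # eta-expanded: side effect deferred to each call

add10(1)
add10(2)
after_direct = len(calls)        # only the single call made when add10 was built

eta_add10(1)
eta_add10(2)
after_eta = len(calls)           # two more calls, one per use
```

Both functions return the same sums, but the number of times the side effect runs differs, which is why the two forms cannot be swapped freely.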
In merd, I propose to use gcd(10,) as sugar for x -> gcd(10,x),
instead of ML's (gcd 10).
In gcd(10, ), I call the empty parameter a hole.
Example of use: length = foldl(, 0, (+ 1))
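As a rough illustration of what a hole desugars to, here is a minimal Python sketch. Python has no hole syntax, so the marker `HOLE` and the helper `with_holes` are hypothetical names invented for this example; this is not how merd implements it.

```python
from math import gcd

HOLE = object()  # hypothetical marker standing in for merd's empty parameter


def with_holes(f, *args):
    """Return a function that fills each HOLE in args, left to right."""
    n_holes = sum(1 for a in args if a is HOLE)

    def filled(*rest):
        if len(rest) != n_holes:
            raise TypeError(f"expected {n_holes} argument(s), got {len(rest)}")
        it = iter(rest)
        return f(*(next(it) if a is HOLE else a for a in args))

    return filled


gcd10 = with_holes(gcd, 10, HOLE)   # plays the role of gcd(10,)
square = with_holes(pow, HOLE, 2)   # hole in the first position, no flip needed
```

The hole position is visible at the call site, so the reader can tell that `gcd10` is a function still waiting for one argument.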
Pros:
- keep the simple math notation
- exhibit the partial application: in ML's (gcd 10) you can't
know whether it is a partial application, whereas in gcd(10,) you
know it is a partial application
- partial application is expressive enough compared to ML
- no problem with partial application on the second parameter (=> no need for
Haskell's flip) [2]
- no currying/uncurrying problem
- enables default values and overloading based on number of parameters (think C++, Java)
- no evaluation time problem caused by partial evaluation:
- f1(x,y) = x+y has type Int,Int -> Int and is
partially evaluated using f1(10,) whereas
- f2(x) = (print "f2 called" ; y -> x+y) has type
Int -> Int -> Int and is partially evaluated using f2(10) [3]
Cons:
- not as clean theoretically
- more sugared (Lisp doesn't need "," as a tuple constructor)
This hole can be used in function declaration too:
member?(e,) =
[] -> False
e : _ -> True
_ : l -> member?(e,l)
Is "id(1,2)" allowed when "id" expects one argument?
With "id(x) = x", one could allow "id(1,2)" where
x's value is the tuple (1,2).
Disallowing this makes higher-order programming harder:
apply(f,x) = f(x)
myfirst(a,b) = apply((x,_ -> x), (a,b))
That's why merd will allow id(1,2) and rely on type-checking to catch
bad use of functions.
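The idea that an argument list and a tuple value are the same thing can be emulated in Python. This is only a sketch: the dispatcher `call` is a hypothetical helper, and Python itself does not behave this way. It relies on `inspect.signature` to count parameters.

```python
import inspect


def call(f, *args):
    """Call f, treating a tuple value and an argument list as the same thing."""
    if len(args) == 1 and isinstance(args[0], tuple):
        args = args[0]                  # a tuple value spreads into arguments
    n_params = len(inspect.signature(f).parameters)
    if n_params == 1 and len(args) != 1:
        return f(args)                  # several arguments collapse into one tuple
    return f(*args)


def identity(x):
    return x


def apply(f, x):
    return call(f, x)


def myfirst(a, b):
    # The two-parameter lambda receives the pair (a, b) through apply.
    return apply(lambda x, _: x, (a, b))
```

With this convention `call(identity, 1, 2)` hands the tuple `(1, 2)` to `identity`, and `myfirst` works without a special tuple-unpacking version of `apply`.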
WYSIHIIP = What You See Is How It Is Parsed
I invented this word to classify the cases when the parsing is misleading. It
belongs to the more general idea of least surprise.
Here are some examples:
- Perl/Ruby/Python:
-3 ** 2 gives -9
- Haskell:
-3 ^ 2 gives -9
- OCaml:
- 3.**2. gives 9.
- Ruby:
Math.sqrt (1-2).abs
will fail because it is parsed as (Math.sqrt(1-2)).abs [4]
- C:
1 + i>>4
is parsed (1+i) >> 4
- ML (Haskell...):
map gcd(10) list
when you meant map (gcd 10) list
- Perl $c ? $i=2 : $j=3
is awful: Perl parses it as ($c ? ($i=2) : $j) = 3, and you don't get any warning
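The Python case from the list above can be checked directly with the standard `ast` module. The variable names are chosen for this example only.

```python
import ast

tree = ast.parse("-3 ** 2", mode="eval").body

# The root of the parse tree is the unary minus; the power is its operand.
root_is_negation = isinstance(tree, ast.UnaryOp) and isinstance(tree.op, ast.USub)
operand_is_power = isinstance(tree.operand, ast.BinOp) and isinstance(
    tree.operand.op, ast.Pow
)

value_as_written = -3 ** 2      # parsed as -(3 ** 2)
value_as_it_looks = (-3) ** 2   # what the spacing suggests
```

The two values differ in sign, which is exactly the gap between what is seen and how it is parsed.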
There are 2 (non-exclusive) solutions:
- Restrict or disallow those expressions
- Lisp uses a radical solution with no sugar at all
- latest GCC warns about most operator mixing [5]
- Use horizontal layout to disambiguate.
- zero spaces and one or more spaces are treated differently [6]
- tokens not separated by spaces are parenthesized
so 1+2 * 3 is parsed as (1+2)*3
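A minimal sketch of this horizontal layout rule in Python, restricted to integer arithmetic. The function names `group_by_spacing` and `evaluate` are assumptions, and the rule implemented is only the simple one stated above (a space-free chunk is parenthesized), not merd's actual parser.

```python
import re


def group_by_spacing(expr):
    """Wrap every space-free chunk that contains an operator in parentheses."""
    out = []
    for chunk in expr.split():
        is_compound = re.fullmatch(r"\d+(?:[-+*/]\d+)+", chunk) is not None
        out.append(f"({chunk})" if is_compound else chunk)
    return " ".join(out)


def evaluate(expr):
    """Evaluate an integer expression after applying the spacing rule."""
    grouped = group_by_spacing(expr)
    # Only digits, spaces, parentheses and arithmetic operators reach eval.
    if not re.fullmatch(r"[\d\s()+\-*/]+", grouped):
        raise ValueError(f"unsupported expression: {expr!r}")
    return eval(grouped)
```

With this rule `1+2 * 3` and `1 + 2*3` give different results, while `1 + 2 * 3` falls back to the usual precedence.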
Indentation-based grouping is also called vertical layout.
The classic example is
if (C1)
    if (C2)
        S1;
else
    S2;
which is terribly misleading because the indentation suggests that
- S2 is executed when !C1, whereas
- S2 is executed when C1 && !C2
The solution is to base the grouping on indentation.
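Python already groups by indentation, so it can illustrate the point: the position of the else decides which if it belongs to. The function names below are made up for the example, and S1/S2 are recorded as strings.

```python
def else_on_outer(c1, c2):
    ran = []
    if c1:
        if c2:
            ran.append("S1")
    else:                       # aligned with the outer if: runs when not c1
        ran.append("S2")
    return ran


def else_on_inner(c1, c2):
    ran = []
    if c1:
        if c2:
            ran.append("S1")
        else:                   # aligned with the inner if: runs when c1 and not c2
            ran.append("S2")
    return ran
```

In both functions the code does what its shape suggests, which is the WYSIHIIP property the C version lacks.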
Pros:
- follow WYSIHIIP
- standardize the code style
Cons:
- tabulations can make everything go wild: so tabulations must be
forbidden or the tabulation size must be specified in the source code (via a
pragma) (and anyway don't use tabulations in any language)
- completely rigid indentation is not possible (think Emacs), but
intelligent editors can be smart enough to make things smooth (try the Emacs
modes for Python or Haskell)
- may need special syntax to override the indentation rules
=> complicates the language
Proposal
merd completely generalizes the layout scheme found in Haskell (Python's layout
is even simpler):
aaaaa
  bbb
    c
    c
aaaaa
is the same as (aaaaa ; (bbb ; (c) ; (c))) ; (aaaaa)
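The grouping above can be reproduced with a small Python sketch that turns indentation into nested parentheses. The exact indentation of the sample (bbb nested under the first aaaaa, both c lines nested under bbb) is an assumption inferred from the parenthesized form, and `parse_layout` and `render` are hypothetical names, not merd's parser.

```python
SAMPLE = """\
aaaaa
  bbb
    c
    c
aaaaa
"""


def parse_layout(text):
    """Build a tree of (line, children) pairs from space indentation."""
    root = []
    stack = [(-1, root)]            # (indent level, children list) per open block
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        while stack[-1][0] >= indent:
            stack.pop()             # close blocks that are not less indented
        children = []
        stack[-1][1].append((raw.strip(), children))
        stack.append((indent, children))
    return root


def render(nodes):
    """Write each line with its nested block as one parenthesized group."""
    parts = []
    for line, children in nodes:
        inner = [line] + ([render(children)] if children else [])
        parts.append("(" + " ; ".join(inner) + ")")
    return " ; ".join(parts)


grouped = render(parse_layout(SAMPLE))
```

Each line opens a group that also contains every more-indented line following it, which is all the rule needs.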
Choosing the operator and function names
Choice of function names
Rules for choosing:
- choose the most commonly used function names (cf "Syntax Across Languages")
- keep the whole coherent
- choose the longer name if it enhances readability and the function is not used very often (Huffman compression) (eg: rev vs reverse)
- choose the shorter name when the longer doesn't enhance readability (eg: foldl vs fold_left) ??
- use "_" as word separator: separate_all_words_with_underscores instead of
capitalizeTheSecondaryWords and CamelCase.
(some rationale:
GNU Coding Standards (Stallman),
Ada,
Eiffel,
glasses emacs mode,
various)
See "Syntax Across Languages"
to see what other languages are using.
- ``.'' most common method invocation operator
(C++, Java, Python, Beta, Cecil, Delphi, Eiffel, Sather, Modula-3, Ruby, Visual Basic, Icon).
- ``::'' common package resolution operator (C++, Perl). The
``.'' operator can't be used for this (as in Java, Python, Ruby,
Modula-3), otherwise Module.method(para) would mean method(Module,
para), whereas when imported it is used as method(para). In other words,
the syntax Module.method(para) would need a special syntax rule,
disallowing modules as first-class values.
- ``{ ... }'' record selector
- ``:='', ``='' both assignment/declaration operators are available, with different priorities.
- ``!!'' type operator
- ``#'' most standard Unix commenting char (Perl, Ruby, Python, Tcl, Icon, Awk, Shell)
- ``+'' string concatenation (Ruby, Python, Java, C++)
- ``+'' list concatenation (Ruby, Python)
- ``[ a, b, c ]'' list constructor (Haskell)
- ``:'' list cons (Haskell)
- ``||'' logical or
Various
Introduction
Do you know FORTRAN? No? Well FORTRAN didn't have explicit typing. Instead it had
implicit typing based on the variable name. I, J, K,
L, M and N are ints and all others are
floats.
Of course, it is very limiting to have a type associated with each
variable name. That's why, since FORTRAN, languages have avoided this feature.
But people like that idea. The Hungarian
notation is based on this:
Long, long ago in the early days of DOS, Microsoft's
Chief Architect Dr. Charles Simonyi introduced an identifier naming convention
that adds a prefix to the identifier name to indicate the functional type of the
identifier.
A big limitation of the Hungarian notation is that it's only a convention, not
enforced by the C compiler[7]. It also takes away some readability.
Perl is another case of associating a variable name with a type. It uses the
prefixes $, @, %. This is quite verbose as most variables are
$ prefixed. It doesn't help readability and lowers expressivity.
Proposal
Give the programmer the ability to associate a variable name with a type. It is
different from a global variable. It just tells that every time the variable is
used, its type must be compatible. eg (inspired by Haskell's Prelude):
vartype c = Char
isDigit c = c >= '0' && c <= '9'
...
primExitWith :: Int -> IO a
primExitWith c = IO (\ f s -> Hugs_ExitWith c)
will fail to typecheck because of c in primExitWith.
Another example inspired by Scheme:
vartype ".*\?" = a -> Bool
vartype ".*\!" = a -> Unit
This enforces the convention that functions of the form xxx? are predicates and
xxx! are mutators.
A good scope for this association is the module. Exporting this association
seems a nice feature to ensure a global behaviour.
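To make the idea concrete, here is a rough Python approximation that checks parameter annotations against a module-level table when a function is defined. The table `VARTYPES`, the decorator `enforce_vartypes`, and the sample functions are all hypothetical; a real vartype would be checked by the typechecker for every use of the name, not only for parameters.

```python
import inspect
import re

# Hypothetical module-level table: name pattern -> required annotation.
VARTYPES = {r"c": str}


def enforce_vartypes(func):
    """Reject a definition whose parameter annotation contradicts VARTYPES."""
    for name, param in inspect.signature(func).parameters.items():
        for pattern, expected in VARTYPES.items():
            if not re.fullmatch(pattern, name):
                continue
            if param.annotation is inspect.Parameter.empty:
                continue  # unannotated parameters are not checked in this sketch
            if param.annotation is not expected:
                raise TypeError(
                    f"{func.__name__}: parameter {name!r} must be "
                    f"{expected.__name__}, not {param.annotation!r}"
                )
    return func


@enforce_vartypes
def is_digit(c: str) -> bool:
    return "0" <= c <= "9"


def try_define_prim_exit_with():
    """Defining a function that uses c as an int is rejected at definition time."""
    try:
        @enforce_vartypes
        def prim_exit_with(c: int) -> None:
            raise SystemExit(c)
    except TypeError as err:
        return str(err)
    return None
```

The error message names both the variable and the expected type, which is the kind of special reporting the association needs.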
Pros:
- stricter typechecking retaining expressivity
- give the ability to ensure a common behaviour for some variables
Cons:
- variable type and usage are separated. Good error reporting is needed:
the typechecker must issue special error messages when variables are
global-typed.
- complicates the language
- it may complicate the typechecker (for the type error reporting)
animals = [
"cat",
"dog",
]
is not valid Haskell code because of the last comma. This is very annoying
because the last line must be treated differently.
(OCaml, Python, Ruby, Perl, C... are ok)
But beware, it also means that
f(foo,
  bar,)
and
f(foo,
  bar,
)
are not the same. The first introduces a hole, but not the second one.
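For comparison, Python accepts a trailing comma in both list literals and argument lists and simply ignores it, so the two call layouts are equivalent there. The function `f` and the variable names are placeholders for this example.

```python
animals = [
    "cat",
    "dog",
]


def f(a, b):
    return (a, b)


# Trailing comma on the same line as the last argument.
same_line = f("foo",
              "bar",)

# Trailing comma followed by the closing parenthesis on its own line.
own_line = f("foo",
             "bar",
             )
```

Since Python has no holes, nothing depends on where the closing parenthesis sits; in merd the position matters.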
Why is 1-uple needed?
In languages allowing computing tuples (eg: (1,2) + (3,4) => (1,2,3,4)), it
is necessary to have 1 element tuples. Otherwise you have to allow:
(1,2) + 3 => (1,2,3)
which is no good for catching errors
(at compile-time for merd, at run-time for python)
The ability to compute tuples is very important to handle things like the
compile-time typed printf, or things like macro-processing.
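Python shows the run-time side of this: tuples concatenate with +, a 1-element tuple must be written explicitly, and adding a bare value is an error. The variable names are chosen for this example.

```python
pair_sum = (1, 2) + (3, 4)        # tuple concatenation
with_one_tuple = (1, 2) + (3,)    # explicit 1-element tuple

try:
    (1, 2) + 3                    # a bare value is rejected at run time
    bare_value_accepted = True
except TypeError:
    bare_value_accepted = False
```

Rejecting the bare value is what keeps a value and the 1-uple containing it distinct.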
The 1-uple syntax issue
merd uses the comma to construct tuples. Alas this doesn't handle 0-uple and
1-uple.
- the most commonly accepted syntax for empty tuples is "()".
- since merd makes a distinction between a value and the 1-uple containing
that value (for better type checking), you need a way to write this 1-uple.
- Python uses "(a,)". This syntactic construct is already used in merd
for partial application
- Perl uses parentheses for both grouping and tuples (called lists). This
causes some problems:
"Hello " x 3 #=> "Hello Hello Hello "
("Hello ") x 3 #=> ("Hello ", "Hello ", "Hello ")
("Hello " . world()) x 3 #=> ("Hello world", "Hello world", "Hello world")
In ("Hello " . world()) x 3, parentheses are necessary because the
priority of "x" is higher than ".". As a result, you must
write ("Hello " . world())[0] x 3 to have the string concatenation
behaviour.
- To escape the ambiguity of using parentheses for both grouping and
tuples, merd's parentheses have a tuple meaning only when
double parentheses are used: (2) is equivalent to 2
whereas ((2)) is the 1-uple containing 2.
- Of course, 1-uple could also be handled using a normal function:
"tuple(2)" would be the 1-uple containing 2. No need for special
sugar.
Some examples in the various syntaxes:
raw | Python | Merd |
tuple(2) | 2, | ((2)) |
tuple(1,2) | 1,2 | 1,2 |
tuple(tuple(1,2)) | (1,2), | ((1,2)) |
tuple(tuple(2)) | (2,), | ((((2)))) |
tuple(1) + tuple(2) --> tuple(1,2) | (1,) + (2,) --> 1,2 | ((1)) + ((2)) --> 1,2 |
tuple(tuple(1,2)) + tuple(tuple(3,4)) --> tuple(tuple(1,2), tuple(3,4)) | ((1,2),) + ((3,4),) --> (1,2), (3,4) | ((1,2)) + ((3,4)) --> ((1,2), (3,4)) |
Recursivity
- Haskell: a variable definition is recursive (unless you define another
value in a where clause)
- OCaml: a variable definition is not recursive. Recursive functions are
introduced using a special construct let rec.
- Merd: a function definition is recursive, a variable definition is
non-recursive. Detecting whether this is a variable or a function declaration is
based on the syntax. Examples of function declarations:
f(x) = x
f := x -> x
Notes
[1]
[2]
- This problem is partially solved in OCaml with named parameters
- There is a work-around in Haskell for partially applying the second parameter:
"(`f` 2)" is "(\x -> f x 2)"
[3]
And eta expansion is preserved:
- f1(10,) is the same as x -> f1(10,x) by definition of
the partial evaluation
- f2(10) is the same as x -> f2(10)(x)
But note that evaluation time is kind of weird in merd. Partial evaluation is
used...
[4]
Even worse, return (1-2).abs is parsed as return((1-2).abs), which
shows that return is parsed differently even though it has a functional
syntax just like Math.sqrt. return has a lower precedence.
Another non-WYSIHIIP Ruby example: p (1..10).to_a is parsed as (p(1..10)).to_a.
An example of why raising method priority would fail is sin(0.7).to_i
"ruby -w" catches most of these problems, so use it!
[5]
You can't even use the fact that && has precedence over || or
you get
``warning: suggest parentheses around && within ||''
[6]
experimentation is needed to know if this rule could work for more
than one space, eg:
1 + 2 * 3 parsed as (1+2)*3
[7]
Associating a type with a variable is not easy, especially in C where coercions
are everyday life. I don't think it would be possible to enforce the association
without losing a lot of expressivity.
Some more info about the Hungarian notation.
Pixel
This document is licensed under GFDL (GNU Free Documentation License).
Release: $Id: choices_syntax.html,v 1.21 2002/06/08 21:35:55 pixel_ Exp $