Personally I'm a little wary of being ghettoised into something overly domain-specific for scientific/numerical computing. Really good interop may mitigate that -- something which can navigate the unholy mix of C, C++, fortran, matlab, octave, R and python routines one comes across trying to reproduce others research work, would indeed be awesome.
I do wonder if some of the noble demands of this project might be better delegated to library developers though, after adding a bare minimum of syntax and feature support to a powerful general-purpose language. For now Python+numpy+scipy seems a great 90% solution here.
A huge part of the goal here is to reduce the need for the "unholy mix of C, C++, fortran, matlab, octave, R and python routines" in both academic research work and machine learning / data science code in industrial settings. The whole project kicked off with me ranting about how I was sick of cobbling things together in six or seven different languages.
So interop is a very, very high priority. We have pretty good C ABI interop now. You can just call a C function like this: `ccall(:clock, Int32, ())`.
We still want better C/C++ interop though. I've already looked into using libclang so that if you have the headers you don't even have to declare what the interface to a function is. That was very preliminary work, but I got some stuff working. Making Julia versions of C struct types transparently would be another goal here.
Another major interop issue is support for arrays of inline structs (as compared to arrays of pointers to heap-allocated structs). C can of course do this, but in any language where objects are boxed, it becomes very tricky. We're working on it, however; anyone who wants to discuss, hop on julia-dev@googlegroups.com :-)
in any language where objects are boxed, it becomes very tricky. We're working on it, however
Naive question (and I'll hold my tongue on my naive guesses): I understand why it is necessary to box user-defined types on the JVM, but why build this restriction into a new language that doesn't run on a restricted platform? Especially when performance and C interop are high priorities?
Or perhaps I misunderstood, and your statement should be read as sparsevector suggested.
Boxed values are pretty much necessary for dynamic languages — that's where the information about what kind of value something is gets stored. It is a pain for this kind of thing, though. In a fully statically compiled language like C, you can eliminate the need for a box entirely. If you want dynamic typing, that's the price we've gotta pay.
Not necessarily. You just store a pointer to the type info inline with the data (like C++'s vtables). You can have unboxed "value types" (const structs, essentially) in dynamic languages. In fact, you could even differentiate between a "boxed" ref type and a value type at runtime, because refs don't need all 64 bits of the pointer. So a ref is a 64 bit pointer with the MSB set to, say, 0, and a value (struct) type always begins with a 64 bit pointer to its type information, only tagged with an MSB of 1. Since you can't extend concrete types, you can easily store value types inline in an array, and just have the type-info pointer (which must be the same for all elements, b/c there is no inheritance) at the beginning of the array. And if your structs are aligned OK, you could easily pass them to C by skipping the type-info pointer both in the single value case and in the array case.
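To make the tagging concrete, here's a minimal sketch in Python (the function names, the 64-bit word size, and the MSB convention are just illustrative — this isn't anything Julia actually does):

```python
# Illustrative sketch of the tagging scheme described above: a 64-bit
# word is a reference if its most significant bit is 0, and the header
# of an inline value (a pointer to its type info) if the MSB is 1.
MSB = 1 << 63
WORD_MASK = (1 << 64) - 1

def tag_value_header(typeinfo_ptr):
    """Mark a type-info pointer as the header of an inline value."""
    return (typeinfo_ptr | MSB) & WORD_MASK

def is_inline_value(word):
    """True if the word begins an inline (unboxed) value."""
    return bool(word & MSB)

def typeinfo(word):
    """Recover the real type-info pointer by stripping the tag bit."""
    return word & ~MSB & WORD_MASK
```

Since every element of a homogeneous array shares one type, the tagged header need only appear once at the start of the array, as described above.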
Ok, having read this comment again, here's a more measured response. You're assuming in this comment the value-type vs. object-type dichotomy that's used in, e.g., C#. That's one way to go, but I'm not sold that it's the best way. Deciding whether you want something to be storable inline in arrays or not when you define a type is kind of a strange thing. Maybe sometimes you do and sometimes you don't. So the bigger question is really if that's the best way to go about the matter.
It seems that in dynamically typed languages, you either need to have two kinds of objects (value types vs. object types), or two kinds of storage slots (e.g. arrays that hold values inline vs. arrays that hold references to heap-allocated values). The boxing aspect is really only part of that since you can't get the shared-reference behavior unless the storage is heap-allocated, regardless of whether there's a box header or not.
This is an interesting scheme and seems like it might work, but I'm not sure. Would you be willing to pop onto julia-dev@googlegroups.com and post this suggestion there so we can have a full-blown discussion of it? Hard to do here — and some of the other developers would need to chime in on this too.
As a compromise, I think it would be helpful to be able to define structs that have a known layout in memory but no dynamic identity. They would be treated like primitive types in Java, but with the crucial difference that users could define their own. That way users could write Julia code that stores and accesses data in the same format they need for interoperating with whatever native libraries they use, instead of serializing and deserializing between Julia objects and C struct arrays (or using int or byte arrays in their Julia code and giving up most of the advantages of a modern programming language.)
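For a feel of what such fixed-layout, identity-free structs look like, Python's ctypes already offers roughly this compromise (`Complex128` is just an illustrative name, not an existing type):

```python
import ctypes

# A user-defined struct with a known, C-compatible memory layout:
# two adjacent float64s, no object header, no dynamic identity.
class Complex128(ctypes.Structure):
    _fields_ = [("re", ctypes.c_double), ("im", ctypes.c_double)]

# An inline array of such structs: 3 * 16 bytes, laid out exactly as
# a C `struct { double re, im; } a[3];` would be, so it can be handed
# to a native library without any serialization step.
Arr3 = Complex128 * 3
a = Arr3()
a[0].re, a[0].im = 1.0, -2.0
```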
We've discussed our way down that path but the design ends up being unsatisfying because "objects" and "structs" end up being so similar yet different. It may be what we have to do, but I haven't given up yet on having structs and Julia composite types be compatible somehow. pron's scheme is interesting.
Ah, there are finalisers. The function finalizer lets you define a function to be called when there are no more references to an object. I guess maybe the idea is to use this from within the constructor.
No, currently it's an inline array of immutable 128-bit numeric values and we use bit-twiddling to pull the real and imaginary parts out. However, that's a temporary hack. (It's also why the mandel benchmark is relatively slow — all the bit-twiddling is not very efficient.)
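The kind of bit-twiddling described can be sketched with Python's struct module (purely an illustration of the technique, not Julia's actual code): each complex occupies one 128-bit slot, and the real and imaginary halves are pulled back out of the raw bytes.

```python
import struct

def pack_complex(z):
    # Store re and im as two adjacent float64s in one 128-bit slot.
    return struct.pack("<dd", z.real, z.imag)

def unpack_complex(buf):
    # Pull the real and imaginary halves back out of the raw bytes.
    re, im = struct.unpack("<dd", buf)
    return complex(re, im)
```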
The longer-term approach is still up in the air and that's what I was talking about above. My favorite approach at this point is to allow fields to be declared as const — which in Julia means write-once. Then if all fields are const the object is immutable and can automatically be stored in arrays inline.
Can you please explain, or give a link describing, what the problem really is (that is, why you have "bit twiddling" at all) and how you imagine that const arrays of complex numbers can be write-once and still efficient?
What's the problem with having arrays of doubles and complexes as "basic" types even in a dynamic language? I believe this could give you a C memory footprint and C performance for array operations.
It sounds like he's saying it's hard for Julia to interop with languages that don't support arrays of inline structs (e.g. Java). I could be misreading it though.
I've been waiting for Fortress for a LONG time, but either progress has been real slow or the Fortress team has trouble communicating their progress to the community. Probably both.
From a quick look I can tell that while Fortress is more similar to Scala, with classes, mixins and static typing, Julia is closer to Clojure, with no encapsulation, dynamic typing, homoiconicity, and separation of behavior (methods) from concrete types (akin to Clojure's protocols).
I really like the choices Julia's designers have made, particularly multiple-dispatch methods, final concrete types and lispy macros. The language seems very elegant. Not too crazy about "begin" and "end" syntax, though :)
Yeah, we've been waiting for a while for Fortress too. It seems like a lot of time and effort has gone into a WYSIWYG IDE and not as much into the actual language implementation :-(
Chapel has made a lot of progress in the past 2.5 years (while we've been working on Julia) and is certainly a contender.
Julia certainly aims to be more dynamic like Clojure, but specially designed to be good for numerical and technical stuff, and yes, the begin/end is due to Matlab — scientists who have a lot of code in Matlab already are a major target demographic. As it is, a lot of Matlab code ports over with relatively trivial changes (see: http://julialang.org/manual/getting-started/#Major+Differenc...), which was the point of syntactic similarity to Matlab.
Octave has solved this by allowing both matlab-style "end", as well as block-specific endings, like "endif", "endfunction", "endfor", etc.
Having several end end end endings without any hint of what they are closing is one of the ugliest parts of matlab. Even though you are reimplementing this for compatibility reasons, it is rather sad that you chose to make this the default behaviour for Julia.
I agree. If all they say is "end end end" you might as well use braces. I'm depressed to hear that Matlab is an inspiration to the language design at all; the Matlab language is by far the worst aspect of Matlab, and it has nothing to recommend it. Think of the damage that Sun did to Java to make it look familiar to C programmers; no need to repeat that mistake (though if you're cynical, you might think it was the smartest thing they ever did.)
Braces would be better than ending keywords, in my opinion, but I'm disappointed not to see whitespace-defined blocks à la Python. Perhaps that doesn't work in a statically-typed, type-inferred language, but if it does, I think it would be much better. Scientists have no problem with it (certainly less of a problem than programmers do) and Python is pretty well accepted in the scientific computing community.
Using curly braces for blocking is a non-starter because they're used for a lot of other things, and honestly, bracket pairs like (), [], {} are way too precious, imo, to squander on something like blocks. Parens () are exclusively for function application; square brackets [] are exclusively for indexing operations; curly braces {} are for type parameterization. The other option is <>, whose use C++ popularized, but that's syntactically sketchy since both < and > are valid operators by themselves; it makes parsing a complete nightmare, both for machines and, to a lesser extent, people. I wouldn't be completely averse to indentation-based blocking, but I'm not really a huge fan of it either. I'm cool with the way both Matlab and Ruby do it, which is using `end`. This could conceivably change, but relatively trivial syntactic alterations like this are a really low priority. What we have now works well and is familiar to both Ruby and Matlab programmers.
Neither do many foreign language tokens like 我是加拿大人 but that doesn't stop me typing them in quickly using the IME. Admittedly those brackets aren't in any IME I know, but maybe they should be.
Matlab already allows you to omit the 'end' from function definitions, which many people find easier to read, and my experience seeing people transition from Matlab/Octave to Python+Numpy+Scipy is that people get on board with indentation-delimited blocks, because that's the way they write code anyway. But I agree - at the moment it's a pretty trivial thing. As a heavy user of Matlab, R and Py+N+S I'm looking forward to trying out Julia.
I personally found it worse to read, because it makes function blocks different from other blocks. Also, if you define nested functions in Matlab, all functions in the file have to be closed with end anyway.
Nothing is as bad, though, as the default Matlab behavior of not indenting first-level function code. This makes it really hard to scan a file with multiple functions and see where they separate.
I was referring specifically to supporting the Octave conventions, a superset of Matlab's, which make it easier to match up begins and ends.
When I said "easy fix" I meant "a trivial extension to the parser" without realizing you would translate that to "really low priority". Guess I'll save my non-trivial suggestions. :)
While you are at it, could you support alternate syntaxes for the same Expr? Perhaps by encoding the syntax version or setting the reader at the top of the file? For those porting matlab code over, either provide a tool to translate the source or put the reader in a matlab-compatible mode.
After that proposal is implemented (perhaps in version 3 of Scala, almost ten years after its first release, though granted it wasn't their top priority) they'll still have to write at least a little bit of boilerplate for each type they want to store in unboxed arrays, and they'll be limited to types that can be stored as Java primitive types under the covers. That seems like a pretty nasty limitation for a general-purpose scientific computing language, not only because it decreases native performance but because it makes C interop much more difficult.
For a general-purpose scientific language, I think it is imperative to adopt the principle that any user-designed type should be able to achieve the same power, performance, and elegance as if it were a built-in type. To the best of my knowledge, that rules out JVM languages.
(Scala was never meant to be a "JVM language," but in terms of its strengths, its weaknesses, its user base, and its future, it is very much a JVM language, and the obstacles to implementing efficient user-defined types are among the trade-offs they accepted when they went with the JVM.)
I agree, and I think Julia's designers have chosen correctly not to use the JVM (although, in the comments below you'll find that Julia uses boxed types for "structs" as well, though it would be far easier to build true array-embeddable complex types on LLVM than on the JVM). However, value types for the JVM are a work in progress (as are tail calls), and I think they might already have been implemented in the Da Vinci Machine project (future JVM improvements), so they will find their way to the JVM in due time (see https://blogs.oracle.com/jrose/entry/tuples_in_the_vm). Until then, scientific computing languages will most likely choose a different platform.
And what should the library writers use? The great advantage of this is that much of it can be done in one language.
From the article: "The library, mostly written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing."
I wouldn't say that it sucks - that's way too strong. I am one of the authors of Circuitscape (http://www.circuitscape.org), which uses python+numpy+scipy. The community loves it: the fact that it's python, open source, embeddable in other tools, etc.
However, much of the code is written in a vectorized style for performance reasons, as is the case with many high level scientific computing languages. This leads to unnatural code in some cases, and also uses too much memory. First I thought IronPython was the way, but have been looking forward to pypy+numpy+scipy eagerly.
If I were to use julia, the code would be a lot more natural, because type inference and all the compiler goodies make it possible to simply write loops over arrays when I need them. This was one of the reasons we started working on julia, because everything else seemed to fall just a little bit short.
OK, perhaps I should have qualified that. It's a good 90% solution for a lot of use cases, although obviously more useful for implementing algorithms in terms of numerical building blocks, than ultra-performance-critical code where you want to write a lot of your own tight inner loops. Is that what you meant?
The parallelisation story isn't great at the moment either, although seems like it has the potential to improve. Still 'sucks' is pretty harsh. For me, looking at how far it's come since I first played with it, I'm impressed. For machine learning, the majority of the building blocks one needs are there, and you get to sit back and put them together using a nice, clean, widely-adopted general purpose programming language. And unlike MATLAB still maintain a decent amount of control over things like memory usage and which BLAS routines it's calling.
Adding bindings for new libraries is more of a pain than it should be though on the occasions where you do really need some fortran or C++ library that doesn't have bindings yet. A language which bridges the gap between high and low levels (not C++!) and has great interop would be very interesting. I guess I'm just hopeful that this kind of thing can be achieved in a general-purpose language (new or existing) which like Python is adopted across the wider software engineering community. Perhaps that's my unreasonable demand to add to their list :)
Seriously? I mean, I know you are very excited about your work on PyPy, but why do you have to go around saying baseless stuff like this? There are many, many thousands of people who write lots of Python code every day that rely on Numpy, Scipy, and Matplotlib to get their work done, and who seem to be quite pleased with it. There is a lot of work left to do on all parts of that stack, but that's a far cry from "it sucks".
I guess I should clarify that - performance sucks. There are obviously various ways around it, but you just can't write a lot of performance critical python that way and I guess this is one of the reasons why julia exists in the first place.
But if you look at the actual cutting edge research work in the HPC and scientific space, they are working on languages that allow domain experts to express computation using higher level primitives, not on fancy compiler techniques to make general purpose imperative languages like C or Python "automagically" run faster on single cores.
The general consensus, if you look at languages like Chapel and Fortress and X10, seems to be that most scientific codes shouldn't be written using for-loops. That is the low-level control flow construct dating from the age of assembler. Instead, what scientists generally want to say is, "Apply this kernel across this domain, with these windowing conditions", or "Reduce values from this computation along these keys in my dataset". As software developers, our job is to provide the language runtime to allow them to do that; such a runtime will be the most robust, correct, maintainable, and performant.
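The "apply this kernel across this domain" style can be sketched in a few lines of Python (a toy 1-D example, not any particular language's API):

```python
def apply_kernel(domain, kernel):
    """Slide `kernel` across a 1-D `domain`, valid windows only:
    the scientist states *what* to compute per window, and the
    runtime is free to decide how to schedule the loop."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, domain[i:i + k]))
            for i in range(len(domain) - k + 1)]

# A moving sum is just the kernel [1, 1]:
# apply_kernel([1, 2, 3, 4], [1, 1]) -> [3, 5, 7]
```

Because the iteration order is never spelled out, a runtime could tile, vectorize, or distribute this computation without the user rewriting any for-loops.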
Right. I think we actually violently agree with each other on that :) Note that in general it's ok to not write much python and live very happily just using it for non-performance critical parts. No doubt we both know a lot of people who are quite happy with that.
The problem with numpy's performance is twofold:
* Numpy expressions might not be fast enough. I believe you guys at continuum are trying to address that one way or another. In general the kernel expressed using high-level constructs in python should not be slower than an equivalent loop in C.
* Sometimes you actually want to write a for loop, because you don't care, because it's faster, because it's a single run, because the data is manageable etc. You should not be punished for doing that with 100x performance drop. You can still be punished for that with 2x performance drop.
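A toy pure-Python sketch of the two styles (numpy's compiled loops are where the real 100x gap comes from; `map` here only hints at pushing the loop out of user code):

```python
import operator

def add_loop(xs, ys):
    # The explicit for-loop you shouldn't be punished for writing.
    out = []
    for x, y in zip(xs, ys):
        out.append(x + y)
    return out

def add_vectorized(xs, ys):
    # One whole-sequence operation, in the spirit of a numpy
    # expression: the per-element loop is hidden from user code.
    return list(map(operator.add, xs, ys))
```

Both compute the same thing; the complaint above is that, today, choosing the first form can cost orders of magnitude in performance.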