
I'm really happy to hear about Standard Chartered's success with this, and I want to know more. This is really promising stuff.

My current company is looking into our "next generation" platform for when our datasets exceed what we can do in R. R may not be the best language, but it's great for exploratory data science, has the best (or only) libraries for some ML purposes, and we've done a lot of things to make it production-worthy on small and medium data. We'll probably need something else down the road, when our datasets outgrow a single box. The leading candidates are Haskell, Clojure, and Scala (Scala because of Spark). I'll have to evaluate the languages fairly and relative to our needs, but I hope Haskell wins for a number of reasons, including the fact that Chicago + Haskell is an unfilled niche and we'd attract a ton of talent.

For those who've taken Haskell this far into production: have you encountered any negatives? Are there any times when you think it might be better not to have a strong type system?

To me, the biggest drawback of Haskell isn't anything intrinsic to the language, but the amount of stuff it forces a person to learn. For me, that's a fun challenge... but trying to convince 110 programmers to use a language that forces I/O into a set of types (loosely) called a monad seems like an epic task. Clojure has the advantage of being simple and beautiful once you get past the parentheses. Haskell is demanding and frustrating for the first 6 months (and pays off handsomely later on, but this can make it a hard sell).
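For anyone who hasn't seen what "I/O in the types" means concretely, here's a minimal sketch (names are my own invention): a pure function and an effectful one are distinguished by their types alone, which is the thing newcomers spend those first months internalizing.

```haskell
-- A pure function: its type promises no side effects, ever.
double :: Int -> Int
double x = x * 2

-- Any side effect shows up in the type: this returns IO (), not ().
-- The compiler won't let you call it from pure code.
greet :: String -> IO ()
greet name = putStrLn ("hello, " ++ name)

main :: IO ()
main = do
  greet "world"
  print (double 21)
```

The payoff is that a reviewer can tell from a signature alone whether a function can touch the database, the clock, or the network; the cost is that beginners have to learn `IO` before they can print anything.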

Also, how does the Any type in Mu (if anyone familiar with it is here) differ from Data.Dynamic?
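I can't speak to Mu's Any type, but for comparison, Data.Dynamic lets you erase a value's type behind a `Dynamic` and recover it later with a runtime-checked cast (a small sketch; `bag` and `firstInt` are made-up names):

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- Values of different types hidden behind one Dynamic wrapper.
bag :: [Dynamic]
bag = [toDyn (1 :: Int), toDyn "hello", toDyn True]

-- fromDynamic is a checked cast: it returns Nothing on a type mismatch
-- instead of crashing, so the first Int in the list is found safely.
firstInt :: [Dynamic] -> Maybe Int
firstInt = foldr (\d rest -> maybe rest Just (fromDynamic d)) Nothing

main :: IO ()
main = print (firstInt bag)  -- prints: Just 1
```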



I've put a large Haskell app in production, and before that I saw the research process at a couple of hedge funds. I'd have a couple of suggestions:

* You don't need everyone to be a ninja Haskeller. Get a couple of early ninjas to flesh out the architecture, and you'll find that you can "fill in the gaps" around them with less experienced people (a bit like yummyfajitas' experience above). FWIW I very quickly became one of the fill-in-the-gaps people ;-)

* If you can move to stream-based abstractions, you'll be onto a winner. Event streams are inherently immutable, and force you into a much purer data model. See the "unified log" LinkedIn blog post or read [1] for more.

* I don't think that you'll be able to retrain 110 programmers in Haskell. While I think that anyone with the right mindset can learn it, there will be a significant portion of any team who lack that mindset.
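To make the stream point above concrete: even without a streaming library, derived state can be a pure fold over an append-only list of events, so replaying the same events always reproduces the same state. A minimal sketch, with a made-up `Trade` event type:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical event type; in a real system these would arrive from a
-- log (Kafka etc.), but the stream itself is just immutable data.
data Trade = Trade { symbol :: String, qty :: Int }

-- Positions are *derived* by folding over the event stream.
-- No mutation anywhere: replaying the log rebuilds the state exactly.
positions :: [Trade] -> Map.Map String Int
positions = foldl' step Map.empty
  where step acc (Trade s q) = Map.insertWith (+) s q acc

main :: IO ()
main = print (positions [Trade "AAPL" 100, Trade "MSFT" 50, Trade "AAPL" (-30)])
```

The purity is the point: the fold can't sneak in side effects, so backfills, audits, and reprocessing are all just "run the fold again".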

Hope that helps. You know where to find me if you'd like anyone to help you sell adopting Haskell to your management. :-)

[1] http://manning.com/dean/


I'm sorry - I love Haskell, and it's a great choice for a lot of domains, but it's a poor choice for numeric/scientific computation. The ecosystem just isn't there yet.

Here's a recent r/haskell thread discussing this: http://www.reddit.com/r/haskell/comments/2rsxrb/is_haskell_a...


Exploratory data analysis isn't a killer experience in Haskell yet. I think types can help a lot here (god knows the amount of pain that R/NumPy spaghetti has inflicted), but the jury is still out on how to do it well. Row types seem like an important technology, though they feel a little bolted on to Haskell: both exciting and unclear at the moment.

https://acowley.github.io/Frames/

On the other hand, if you've got a solid data pipeline you're scaling then there are grand tools for streaming data in Haskell. You're in the complete sweet spot for strong typing. You can also drop down into C easily when speed is an issue. Matrix libraries exist for the intrepid user. HMatrix is BSD now so you can at least touch LAPACK.
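For the intrepid, a minimal sketch of what that looks like, assuming the hmatrix API (`Numeric.LinearAlgebra`, with its `(><)` matrix builder, `<>` product, and `inv`), which binds BLAS/LAPACK under the hood:

```haskell
import Numeric.LinearAlgebra  -- from the hmatrix package (now BSD-licensed)

main :: IO ()
main = do
  -- Build a 2x2 matrix row-by-row with the (><) constructor.
  let a = (2><2) [4, 7, 2, 6] :: Matrix Double
  -- Multiplying by the LAPACK-computed inverse should give
  -- something numerically close to the identity matrix.
  print (a <> inv a)
```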


I haven't encountered any negatives, by which I mean I never wished I was using another language. In fact, if I'd been using any other language, I'm confident I would have wished to use Haskell. (This is for reporting software using Opaleye to generate multi-hundred-line Postgres queries in a composable and type-safe manner.)


Not something I've used myself yet, but is there any particular reason Julia isn't on that list?


I haven't checked on Julia for about a year, but back then Julia didn't offer anything above and beyond R when it comes to "big" (really just not fitting into memory) data. As it is, Spark with Scala is both faster and more convenient than R or Julia.


Spark has a Python API, and Python has a bigger ecosystem for exploratory data analysis than Spark.

Also, Python has a big-data interface with out-of-core matrices and tables, as well as a compiler that can speed numerical code to C-like speeds with just a function decorator. The former is an interface to the latter.

http://blaze.pydata.org/docs/dev/index.html https://github.com/numba/numba





