My First Recommendation to New Scientific Coders: Learn Visualization (vincebuffalo.org)
66 points by nkurz on Nov 17, 2012 | 29 comments


Jesus, is that visualization supposed to help? I stared at it for minutes and I still don't really grok it. It seems to be trying to channel the same technique as the famous chart of Napoleon's march: http://bryanrulli.files.wordpress.com/2011/05/tufte_napolean...

But failing completely at it.

In any case, is this a common type of visualization now? Maybe with some experience it becomes easy to grok what is happening?


No, it's not common to see this kind of visualisation. Bad visualisations? Yeah, they're pretty common. The usual mistake is overuse of pie charts.

I'd recommend that the OP (and anyone else interested in communication) read:

The Visual Display of Quantitative Information http://www.amazon.co.uk/The-Visual-Display-Quantitative-Info...

Information Dashboard Design http://www.amazon.co.uk/Information-Dashboard-Design-Effecti...

Now You See It http://www.amazon.co.uk/Now-You-See-Stephen-Few/dp/097060198...


Ironically, the OP recommends that you read The Visual Display of Quantitative Information, as well. I suspect that the mediocre example (I thought it was reasonably readable and somewhat interesting) was there more to demonstrate that R makes non-scatter plots easy, too.


It's from http://www.jasondavies.com/parallel-sets/ and is one of the example plots for http://d3js.org/


I must say that the icicle plot of the same data is way more readable. The crisscrossing lines are all but useless for getting a feel for the data.

Go to http://www.jasondavies.com/parallel-sets/ and click "icicle plot" to see what is going on.
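
Both views on that page are drawn from the same cross-tabulated counts. As a rough sketch (this is not the d3 code behind the example), a parallel-sets ribbon between two adjacent axes has a width proportional to how many records share that pair of categories, while an icicle plot stacks the nested partitions of the first few fields as rectangles. The records below are made-up toy rows in the shape of the Titanic data, and `ribbon_counts` / `icicle_counts` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical toy records shaped like the Titanic data: (class, sex, age, survived).
# These are NOT the real counts used in the linked plot.
records = [
    ("1st", "Female", "Adult", "Yes"),
    ("1st", "Male", "Adult", "No"),
    ("3rd", "Male", "Adult", "No"),
    ("3rd", "Female", "Child", "Yes"),
    ("Crew", "Male", "Adult", "No"),
    ("Crew", "Male", "Adult", "Yes"),
]

def ribbon_counts(rows, left, right):
    """Count records for each (left category, right category) pair.

    In a parallel-sets plot each pair is drawn as a ribbon between two
    adjacent axes, with width proportional to its count.
    """
    return Counter((row[left], row[right]) for row in rows)

def icicle_counts(rows, depth):
    """Count records for each path through the first `depth` fields.

    An icicle plot draws these nested partitions as stacked rectangles
    instead of crossing ribbons.
    """
    return Counter(tuple(row[:depth]) for row in rows)

# Ribbons between the class axis (field 0) and the survived axis (field 3).
for (cls, survived), n in sorted(ribbon_counts(records, 0, 3).items()):
    print(f"{cls:>4} -> {survived:<3} {n}")
```

The crisscrossing in the parallel-sets view comes from every (class, survived) pair getting its own ribbon; the icicle view only ever subdivides a parent rectangle, so nothing crosses.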


I eventually figured it out, but there was no AHA! about it. I had to slowly pick it apart every step of the way, like working through an obfuscated C contest entry.


A dot chart would have been substantially better. I haven't seen that sort of visualization before. It's probably a mistake to try to show the relative size of each group in the same picture as the absolute size.
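
For anyone who hasn't seen one: a dot chart puts one row per group and a single marker per row on a shared scale, so each picture answers one question (absolute size here, or relative size in a separate chart). Below is a text-only sketch with hypothetical numbers; in R you would normally reach for the base graphics `dotchart()` function instead. It assumes non-negative values.

```python
def dot_chart(values, width=40):
    """Render a text dot chart: one row per group, largest first, with a
    single '*' marker on a shared scale from zero to the largest value.
    Assumes all values are non-negative."""
    top = max(values.values()) or 1  # avoid dividing by zero if everything is 0
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        pos = round(value / top * (width - 1))  # marker column, 0..width-1
        track = "." * pos + "*" + " " * (width - 1 - pos)  # dotted leader up to the marker
        lines.append(f"{label:<{label_width}} |{track}| {value}")
    return "\n".join(lines)

# Hypothetical percentages for four unnamed groups, for illustration only.
print(dot_chart({"Group A": 60, "Group B": 40, "Group C": 25, "Group D": 20}))
```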


I've seen a clearer version of the same chart; when it doesn't show children it is much easier to understand what is going on.


That's a terrible visualization. It takes way too long to work out what "the story" is.


It's really only useful when interactive - http://www.jasondavies.com/parallel-sets/ - and it still takes a while to figure out what's going on.


Clicking the "Show icicle plot!" checkbox further down the article converts this visualization into something much more readable.


I wouldn’t say it’s a terrible visualization, although in general I do believe conditional 2D plots are the best way to show trends in higher dimensions. For example, here are some conditional plots for similar data (Kaggle Titanic competition) that I’ve found insightful.

http://i.imgur.com/C4uyH.png

The conditioning on Sex/Class seems to be the best way to illustrate the trends in these 4 dimensions.
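
The linked image can't be reproduced here, but the conditioning step itself is simple: split the records by Sex and Class, then summarise each cell separately, and each panel of a conditional plot shows one of those cells. A minimal sketch on hypothetical toy rows (not the Kaggle data; the tuple layout is an assumption):

```python
from collections import defaultdict

# Hypothetical toy passenger rows: (sex, pclass, survived). Not the real Kaggle data.
passengers = [
    ("female", 1, True), ("female", 1, True), ("female", 3, False),
    ("female", 3, True), ("male", 1, False), ("male", 1, True),
    ("male", 3, False), ("male", 3, False),
]

def survival_by_group(rows):
    """Survival rate within each (sex, class) cell, i.e. the value each
    panel of a plot conditioned on Sex/Class would summarise."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for sex, pclass, alive in rows:
        totals[(sex, pclass)] += 1
        survived[(sex, pclass)] += alive  # True counts as 1, False as 0
    return {key: survived[key] / totals[key] for key in totals}

for (sex, pclass), rate in sorted(survival_by_group(passengers).items()):
    print(f"{sex:<6} class {pclass}: {rate:.0%}")
```

In a real conditional plot you would draw, say, age against fare inside each of those cells rather than printing one number, but the grouping is the same.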


I honestly spent a minute wondering whether the author was joking; the visualization is really terrible. I would rather look at the table than the visualization. I found this PyCon talk a really nice explanation of data visualization patterns: http://pyvideo.org/video/637/data-design-meaning


This is true of practically every programming discipline. I've lost count of the times that dumping a CSV for Excel or a dot file for Graphviz has saved me. Visualisation is just as essential a part of my toolbox as a text editor or debugger.
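
Both dumps take only a few lines of Python. A minimal sketch using the standard `csv` module and hand-written DOT text; the function names and the example node names are hypothetical:

```python
import csv
import io

def to_csv(rows, header):
    """Serialise rows to CSV text that Excel (or any spreadsheet) can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default line terminator is "\r\n"
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def _quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + str(name).replace('"', '\\"') + '"'

def to_dot(edges, name="G"):
    """Emit a Graphviz digraph with one line per (src, dst) edge."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f"    {_quote(src)} -> {_quote(dst)};")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical data: per-item counts and a small pipeline graph.
print(to_csv([("alpha", 3), ("beta", 5)], ["name", "count"]), end="")
print(to_dot([("load", "parse"), ("parse", "report")]))
```

Write the DOT text to a file such as `deps.dot` and render it with `dot -Tpng deps.dot -o deps.png`.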


More generally, be aggressive with your data: probe it, sort it, graph it, sanity check it, and never take it at face value.
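
A minimal sketch of "probe it, sort it, sanity check it" for one numeric column, using only the standard library. The helper names and the plausible range for ages are assumptions, and the data is a hypothetical column seeded with typical problems:

```python
import math
import statistics

def probe(values):
    """Summarise a numeric column before trusting it."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        raise ValueError("no usable values")
    ordered = sorted(present)
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "lowest": ordered[:3],   # sorting pushes sentinels and typos to the ends
        "highest": ordered[-3:],
    }

def out_of_range(values, lo, hi):
    """Return usable values outside [lo, hi], in their original order."""
    return [v for v in values
            if v is not None and not math.isnan(v) and not lo <= v <= hi]

# Hypothetical ages: a missing value, a NaN, a -1 sentinel and a 999 typo.
ages = [22, 38, None, 26, -1, 35, 999, float("nan"), 54]
print(probe(ages))
print(out_of_range(ages, 0, 120))
```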


It looks to me like the tabular data is inconsistent with the visualization - the tables showed 0 perished from first or second class or crew, the visualization showed something entirely different.


There is a mistake in the tables. Top two are copies of each other but they are supposed to show different things.


The table for "Age = Child, Survived = No" is the same as the table for "Age = Adult, Survived = No". Odd, to say the least. One wonders where this interesting fact is shown in the visualization.


If that were true, that would mean first and second class completely survived. But the visualization shows this is not the case.
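
A duplicated table like this is easy to catch mechanically: breakdowns for different subgroups should almost never match cell for cell, so an exact match usually means a copy-paste error. A minimal sketch; the labels and counts are hypothetical, not the article's tables:

```python
from itertools import combinations

def duplicate_tables(tables):
    """Return pairs of labels whose tables are identical cell for cell."""
    return [
        (a, b)
        for a, b in combinations(sorted(tables), 2)
        if tables[a] == tables[b]
    ]

# Hypothetical per-class counts; the child table was "copied by mistake".
tables = {
    "Adult, Survived=No": {"1st": 4, "2nd": 3, "3rd": 9, "Crew": 12},
    "Child, Survived=No": {"1st": 4, "2nd": 3, "3rd": 9, "Crew": 12},
    "Adult, Survived=Yes": {"1st": 6, "2nd": 2, "3rd": 3, "Crew": 5},
}
print(duplicate_tables(tables))
```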


Learn visualization, but not the underlying statistical common sense that lets you produce meaningful data? Yeah, right... R will do everything for you... no need to worry about the real work...


Curious about what other libraries people use. Matplotlib? Matlab?


Python is my go-to language, so I use matplotlib frequently. But I think R's ggplot2 is easily the best visualization library out there.

Bokeh is supposed to be a ggplot2 clone for Python, but it's still in the very early stages.


I’d strongly recommend matplotlib. It has a MATLAB-like procedural interface that makes it very easy to pick up if you’re coming from MATLAB.

Additionally, it’s wonderful to be able to “dive under the hood” if you need to create a special type of visualization: the internal model is easy to understand and highly customizable. This has enabled me to create several awesome plots that would have been very difficult, if not impossible, to create with any other plotting program or library.


I completely agree about the breadth of matplotlib, though finding examples can be difficult. I've often found the examples on the matplotlib website to be horribly out of date compared with features added in recent releases. Are the plots you've made available online? Any recommended blogs with interesting plotting examples?


In the color science community lots of people use MATLAB. I find matplotlib and NumPy to be a great combination.


I've used VTK before. The learning curve is much steeper and it's a huge library. But it does have a lot of built-in functionality, like database access with filters and conversions to format the data the way you need it.


Physicist here. I use CERN's ROOT, as it does just about everything. Occasionally I'll mix it up with Python and matplotlib. Gnuplot is still great, too.


Tulip is great if you deal with huge datasets that can be represented as graphs.

http://tulip.labri.fr/


In engineering we use a lot of Matlab.



