
>> A fun (but impractical and frustrating) variant of this feature is to have the image change as soon as the viewer looks away. So you’re looking at a painting, glance away to another room, and look back to find a new painting hanging on the wall.

Re this comment. There is this thing called "saccadic masking", the gist of which is that we are effectively blind while our eyes move from focusing on one position to the next. Depending on the "distance" (angle) traveled by the eye, this can last up to tens of milliseconds. Enough time to do some cool stuff!

One of the studies that conclusively showed this effect, a fairly long time ago, had participants wear elaborate headgear that allowed the researchers to track where their participants' eyes were looking. They had them look at a standard sentence like

"the quick brown fox jumped over the lazy dog",

but with a twist! Every word in the sentence was masked, except for the word the participant would be focusing on at that moment. So if he focused on the word fox, the screen would show

"xxx xxxxx xxxxx fox xxxxxx xxxx xxx xxxx xxx".

Whenever the system detected a saccade, it would recalculate which word the participant would be looking at (e.g. switch to dog) and change the display accordingly, now showing

"xxx xxxxx xxxxx xxx xxxxxx xxxx xxx xxxx dog"

Participants were asked if they noticed anything strange about the sentence, and they reported there was nothing strange about it! (Disregarding the heavy set of mirrors strapped to their heads, of course.)
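For anyone curious, the gaze-contingent masking itself is trivial to sketch. This is illustrative Python, obviously not the original apparatus software; the hard part in the real study was the eye tracking, not the string manipulation:

```python
def mask_sentence(sentence, fixated_index):
    """Return the sentence with every word replaced by x's,
    except the word at `fixated_index` (the fixated word)."""
    words = sentence.split()
    return " ".join(
        word if i == fixated_index else "x" * len(word)
        for i, word in enumerate(words)
    )

sentence = "the quick brown fox jumped over the lazy dog"
print(mask_sentence(sentence, 3))  # fixating "fox"
print(mask_sentence(sentence, 8))  # after a saccade to "dog"
```

Re-running this on every detected saccade, fast enough to land inside the saccadic-masking window, reproduces the "moving window" paradigm described above.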

My point: if you were able to detect saccades and calculate the new fixation position reasonably accurately, you could render the unattended regions of the image as colourful noise; the theory says your visitors would be none the wiser. (Of course, this would break down with multiple people looking at the same image.)

Even cooler: procedurally re-generate parts of the image that are unattended, so that you're looking at an ever shifting image, but wouldn't quite be able to pin down what's happening. Similar to this video: https://www.youtube.com/watch?v=ubNF9QNEQLA



This is also the principle behind foveated rendering (https://en.wikipedia.org/wiki/Foveated_imaging) where you only render the graphics in high detail where the viewer is looking, and use a low-res image for the rest with the goal of saving computing power. Big area of development right now for VR!


That's really cool! I'm not 100% sure, but what I remember from reading a few psych papers about this is that as long as the "mask" is similar to the original in a few easy-to-calculate sampling statistics (colour distribution, overall hue), it doesn't actually need to look anything like the original image; even calculating low-res imagery shouldn't be necessary. Our peripheral vision is just that bad (at least for static imagery; movement detection is actually pretty good). The foveated part is where the magic happens, and we've gotten pretty good at fooling ourselves into believing this "foveated" part is much larger than it actually is.
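A minimal numpy sketch of that idea, assuming the claim holds that matching per-channel mean and spread is enough (the statistics chosen here are my own guess, not from any particular paper):

```python
import numpy as np

def matched_noise(patch):
    """Replace an image patch with Gaussian noise matching its
    per-channel mean and standard deviation -- the kind of cheap
    'sampling statistics' the periphery is said to be sensitive to."""
    mean = patch.mean(axis=(0, 1))
    std = patch.std(axis=(0, 1))
    noise = np.random.normal(mean, std, size=patch.shape)
    return np.clip(noise, 0, 255).astype(np.uint8)

# Any RGB patch will do; random data here just to make it runnable.
patch = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
fake = matched_noise(patch)
```

The replacement shares the patch's coarse colour statistics while looking nothing like it up close, which is exactly the property being described.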


Hey, something I know about! I actually worked on one of those research projects in college, programming the experiments. The idea is that there is a certain radius around the focal point beyond which you stop being able to detect changes. I'm not sure what the final results were, but the theory was that you can calculate how blurry an image can be and still be discernible based on how far it is from the focal point of your vision. It was surprising how good people are at detecting changes in a blurry picture that's way out in their periphery.
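A common first-order way to express that theory (not necessarily the model this particular lab used) is to let the tolerable blur grow linearly with angular distance from the fixation point:

```python
def max_blur_sigma(eccentricity_deg, foveal_sigma=0.5, slope=0.3):
    """Largest Gaussian-blur sigma (illustrative units) expected to go
    unnoticed at a given eccentricity. `foveal_sigma` and `slope` are
    made-up parameters for illustration; a real system would fit them
    per display and per viewer."""
    return foveal_sigma + slope * eccentricity_deg

for e in (0, 5, 20):
    print(f"{e} deg from fixation: sigma up to {max_blur_sigma(e):.1f}")
```

A renderer can then pick mip levels or blur kernels per region from this one function of eccentricity.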


My prof used to joke, whenever he'd discuss the fovea and periphery, that if we had a fovea the size of our full field of view, we'd need a brain the size of an elephant's to process all that information. It's interesting how we're very sensitive to sudden changes (thus movement) in our periphery, but are so bad at classifying/identifying static imagery.

I remember reading about right-eye and left-eye dominance, where they'd keep an image on the screen saccade-invariant (i.e., compensate for any saccades that were made), slowly moving a letter/character/word to the edges of the participant's field of view and asking when it was no longer legible. This happened surprisingly quickly, but at different positions for the left eye and right eye for pretty much all participants.


Heh - I remember back in the very late '80s or early '90s, my dad was working on the flight simulator for the "swing wing" F111E's new avionics package that Australia was getting.

The sim cockpit had a pair of Silicon Graphics RealityEngine2s: one driving a projector that lit up a 5- or 6-meter-diameter quarter-spherical screen at low-ish resolution, and another driving an aimable projector slaved to the pilot's helmet (and maybe even eye tracking, I can't remember) that projected a small patch of high-resolution imagery exactly where the pilot was looking. If you knew what it was doing, it was easy enough to "catch out" the system and see the edges where the two images joined, but once you were immersed in flying it disappeared completely. It was spectacularly obvious what was going on if you were watching the screen while someone else had the helmet on.

I _so_ wanted one of those RealityEngines back then. I suspect my phone now has more graphics processing power, though (I'm pretty sure my Galaxy S6 in a Gear VR does a significantly better job than that multi-million-dollar military project ~20 years ago...).


The SGI "Reality Engine" was also the GPU that ended up in the Nintendo 64, wasn't it? How much more powerful was the RE2?


It's always seemed to me that we could get much higher-quality VR if we could manage to set up a foveated display: a high-DPI display embedded in concentric rings of progressively-lower-DPI displays. It's easy enough to get quality-control clamped for a 600DPI display if you only have to make them a square inch in size; and your memory bandwidth and parallel processing needs go way down if the outside rings can actually be treated as a small screen, rather than as an extremely-high-resolution screen displaying a (monotonous and blurry) image.

Of course, the big obvious problem with such a display is that the eye moves, and moves faster than you could possibly move around the display. The real key, I would think, would be something equivalent to a metamaterial convex lens, that could be "tilted" and "flexed" in the same way the lenses in our own eyes can, to redirect and "refocus" the centre of the image to the new eye position without actually moving it per se.

We already have a technology to achieve this sort of "tilting" and "flexing", it turns out: magnetic deflection, as seen in CRTs. There's no reason you couldn't use it to deflect a continuous parallel matrix of rays by a constant amount, rather than one continuously-shifting beam. Heck, you could use an array of coherent emitters (laser diodes) rather than point-source diodes, and use phosphor on the intermediary panel like the good old days.
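The bandwidth argument can be made concrete with a back-of-the-envelope pixel count. All sizes and DPI values below are invented for illustration, not taken from any real display:

```python
# Compare a uniform high-DPI panel against a tiered one: a small
# high-DPI foveal tile surrounded by progressively coarser rings.

def panel_pixels(size_in, dpi):
    """Pixel count of a square panel `size_in` inches on a side."""
    return (size_in * dpi) ** 2

uniform = panel_pixels(6, 600)                       # 6" square, all 600 DPI
fovea   = panel_pixels(1, 600)                       # 1" central tile at 600 DPI
mid     = panel_pixels(3, 150) - panel_pixels(1, 150)  # ring out to 3" at 150 DPI
outer   = panel_pixels(6, 50)  - panel_pixels(3, 50)   # ring out to 6" at 50 DPI
tiered  = fovea + mid + outer

print(uniform, tiered, uniform / tiered)  # ~21x fewer pixels to drive
```

Even with these rough numbers the tiered layout needs about a twentieth of the pixels, which is where the memory-bandwidth and processing savings come from.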


I've seen this idea used in machine learning as well.

There was a paper [1] whose goal was a binary classification of the center pixel in a region of interest. Interestingly, their results improved when they applied a foveal blurring surrounding the pixel to be classified.

[1]: http://people.idsia.ch/~juergen/nips2012.pdf


Interesting! Thanks for sharing!

There are algorithms that mimic the fixation paths the eyes follow when presented with a novel image, very much related to modeling dopaminergic systems. They seem to find task-relevant, information-dense areas first, and then slowly spread out to less information-dense areas. I wonder if there'd be any benefit to running these algorithms on images, basically turning them into a video, and then running classifiers on this video (with or without foveal blurring).
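A toy version of such a fixation-path algorithm, in the classic winner-take-all / inhibition-of-return style. This is far simpler than the dopaminergic models mentioned above, but it shows the "dense areas first, then spread out" behaviour:

```python
import numpy as np

def scan_path(saliency, n_fixations=3, inhibition_radius=1):
    """Toy fixation sequence: repeatedly fixate the most salient cell,
    then suppress a neighbourhood around it ('inhibition of return')
    so attention moves on to the next most salient region."""
    s = saliency.astype(float).copy()
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        path.append((int(y), int(x)))
        y0, y1 = max(0, y - inhibition_radius), y + inhibition_radius + 1
        x0, x1 = max(0, x - inhibition_radius), x + inhibition_radius + 1
        s[y0:y1, x0:x1] = -np.inf  # inhibit the just-visited region
    return path

sal = np.zeros((5, 5))
sal[1, 1], sal[4, 4], sal[0, 4] = 9, 5, 3
print(scan_path(sal))  # [(1, 1), (4, 4), (0, 4)]: most salient first
```

Cropping a foveal patch at each fixation in the returned path is one way to turn a still image into the "video" input you describe.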


Do you have any more information about these algorithms? Or perhaps links to a paper or two? It gives me a couple ideas and sounds very interesting! I do wonder the same thing you do. Feeding an image to a neural network not as a single input, but as a series of inputs separated over time, 'reusing' the same neurons for different portions of the image, might allow for interesting feedback to develop.


Apparently I can reply now, repost just so it shows up in your comment thread

I remember seeing a video of a robot that would attend to different parts of a scene based on the "saliency", in the sense of novelty, of its features. I can't find the specific video, but I think the model running the robot is related to: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3930917/ The paper is quite dense, as it describes biologically plausible models of dopaminergic systems, which makes the model quite complex as well, but it's interesting because this system is considered quite 'low-level', no cortex involved. I'll add more if I can find any..



I just posted this in another comment, but M$ research has awesome foveated rendering: http://research.microsoft.com/apps/pubs/default.aspx?id=1766...


I got a live demo of this at a conference a few years back. There was a line to see the demo, so I first waited while others did it. They first showed the full resolution image, then enabled the foveated rendering. The latter looked terrible - a tiny patch of hires imagery and a massive blur for everything else.

When it was my turn, they started with the full resolution image. I was waiting for them to enable the foveated rendering - until they told me it was already running. I could not tell at all that it was not all rendering at full resolution. Really impressive.

The main presenter mentioned that the optic nerve/brain processing 'shuts down' for up to 40ms during a saccade, so they have that long to render the small region at full resolution between the time when your eye has its new target and the optic nerve comes back on line.
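The budget implied by those numbers is easy to sanity-check. The 40 ms figure is the presenter's; the tracker latencies and frame times below are invented for illustration:

```python
SACCADIC_MASK_MS = 40  # window of suppressed vision during a saccade

def fits_in_saccade(eye_tracker_latency_ms, render_ms):
    """True if detecting the new gaze target and re-rendering the foveal
    patch both complete while the viewer is still effectively blind."""
    return eye_tracker_latency_ms + render_ms <= SACCADIC_MASK_MS

print(fits_in_saccade(8, 16))   # fast tracker + one 60 Hz frame -> True
print(fits_in_saccade(20, 33))  # slow tracker + one 30 Hz frame -> False
```

So a 60 Hz renderer with a low-latency tracker comfortably hides the switch, while a sluggish pipeline would let the viewer catch the blur.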


I'm so glad to hear this! Do you know of any progress since then? That very closely reminds me of my first time trying VR (last week). I sat and watched on the 2D monitor someone else doing it, and thought it looked incredibly stupid. Then I put on the headset..


I don't know of any explicit details of progress, but everyone I've mentioned it to in the VR space nods knowingly when I mention it, so I take that to mean that it is an area of active research and development.


I am pretty excited about this; I could see it being used with a variant of DLP technology to deliver ultra-high resolution and pixel density only to the area of the screen you are looking at.


Real-time raytracing would benefit so much from that.


I apologize for the stupid response to your intelligent and well-written comment, but this reminded me of the classic "Creepy Watson" video, playing around with a poorly implemented follower AI in a budget video game. Audio not required, but recommended. https://www.youtube.com/watch?v=13YlEPwOfmk


Crimes and Punishments, the follow-up to that game, is actually quite good. There's a new one out this year called The Devil's Daughter.


Thanks for the explanation, that was great!

I imagine you could play Where's Waldo without changing screens: just "find the next Waldo" in an ever-changing scene. The moment you find one, another is created at another spot.

Could be very addictive, although it sounds kind of stressful too.


You could also just have it decide exactly when it wants you to find Waldo. Imagine a Where's Waldo iPad app that had 20 different "pages" and let you set the amount of time it takes to get through them all, so you could keep your kid busy for a guaranteed minimum time. :)

Or even better, it's listening for you to say "ok, it's time to go in two minutes. Finish up," and it makes sure Waldo appears where the kid is looking just in time.
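The evil logic is only a few lines (all names and numbers invented, of course):

```python
import random

def place_waldo(gaze, deadline_reached, scene_size=(100, 100), min_dist=30):
    """Return Waldo's (x, y) position. Before the deadline he respawns
    far from the gaze point; once it's time to leave, he appears right
    where the child is looking, to be found 'just in time'."""
    if deadline_reached:
        return gaze
    while True:
        pos = (random.randrange(scene_size[0]), random.randrange(scene_size[1]))
        # Keep him at least min_dist (Manhattan distance) from the gaze.
        if abs(pos[0] - gaze[0]) + abs(pos[1] - gaze[1]) >= min_dist:
            return pos

print(place_waldo((50, 50), deadline_reached=True))  # (50, 50)
```

Call it on every fixation update with `deadline_reached` flipped by your "two minutes left" timer.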

The downside is when your kid finds out he's going to murder you.


That's brilliantly evil. Part of me wants to implement it just for the evil factor.


+1 for evil


Or it places Waldo just outside of your field of focus, and constantly moves him as you look around.


It would be easier if each Waldo was looking or pointing towards the next one.


Found the article I was referencing - McConkie & Rayner (1975) - http://link.springer.com/article/10.3758/BF03203972


There was an exhibit demonstrating a similar optical phenomenon in the Exploratorium (http://www.exploratorium.edu/) in San Francisco:

The viewer watches a series of projected photographs of a streetscape with small differences between each (a pedestrian appears, a taxi disappears), but they don't see these changes because when the image changes, a small flash occurs, which somehow resets the optic system's change-detection system. The viewer can then press a button to suppress the interstitial flashes, and with no flashes, they can easily see the differences as they occur.


M$ research has AWESOME stuff on foveated graphics. They essentially track your eye movement and only render detail in the specific spot on the display you are looking at. Everything outside your eye's focus area is still rendered, but at a much lower quality. It saves immense amounts of resources; imagine your GPU only having to render 20% of the screen at a time!

http://research.microsoft.com/apps/pubs/default.aspx?id=1766...


As an analog to lossy compression - lossy rendering?


Was it really just replacing the other words' letters with x's? I can notice the strings of x's with my peripheral vision. Also, the words must have been quite a bit farther apart than they are on my screen.


I was paraphrasing this from memory. The dependency is quite complex: it depends on the number of degrees the text occupies in your field of vision, but also on the mode of reading you're in (very focused reading seems to narrow the fixation span). Also, please note that your eye makes constant micro-saccades (https://en.wikipedia.org/wiki/Microsaccade), and the authors of the paper were probably compensating for these as well, more or less.

Quote from the actual paper

    This experiment has provided data which begin to
    answer the question about the size of the perceptual
    span during a fixation in reading. Although it may be
    possible in tasks other than reading for subjects to
    identify letters, word shapes, and word-length
    patterns some distance into the peripheral areas, in
    fluent reading this information appears to be obtained
    and used from a relatively narrow region. Thus, a
    theory of fluent reading need not suppose that
    word-shape and specific letter information is obtained
    from a region occupied by more than about three or
    four words during a fixation, and perhaps not that
    large if the span is not symmetrical around the point
    of central vision, a question not tested in the present
    study. Thus, it does not appear to be true that entire
    sentences are seen during a fixation; in fact, for most
    fixations, not even a complete phrase will lie within
    this area.
So the "window" or "span" that needed to be un-masked was about 3-4 words wide. (Interestingly, this did not necessarily depend fully on the length of the words.)


Yes, but you're also able to focus on the words made up entirely of 'x'. I don't think you can really judge the effect here, because 1) it's not the same effect, and 2) you already know about it now.


Your video reminded me of this selective attention test: https://www.youtube.com/watch?v=vJG698U2Mvo. This Wikipedia article also seems to be related to the phenomenon and talks about the video too: https://en.wikipedia.org/wiki/Inattentional_blindness.


Yes, this is always mentioned in the same breath as my GP post; the gorilla video is a classic. The video I posted shows much the same thing. It makes you realise how much of what we perceive is really what we think we perceive.


This is what happens when I try to read text in a dream.


Basically the inverse being applied? Sentence fully displayed, with the word that's focused on masked...


This could be very useful for optimizing 3D video games :) Basically rendering only part of the screen each frame.



