Walt Disney Animation Studios | Multiple Roles | Los Angeles, CA | Full Time | Onsite
At Walt Disney Animation Studios, technologists and artists work together to advance the art and science of animation. Inspired by our rich legacy, we look ahead to discover new tools and techniques that will shape the future of animated storytelling. Some open roles in Production Technology include:
On the Blu-ray of Pixar's Finding Dory, closely examine the sand and water (ignore the living creatures) in the first few minutes of the short film Piper. You know a priori that it's not real, but I'd like to hear people describe what they think looks off with the environments (not the creatures). Try to normalize that against the intent of a director trying to craft an identical ambiance from recorded video (color temperatures may be made warmer in post, saturation increased, more vignetting added, etc.).
In particular, I don't think people watching that night-time sequence of Deepwater Horizon in theaters would identify it as entirely computer-generated.
I read this discussion as sillysaurus3 arguing that ALL models are wrong, while not acknowledging dahart's examples of their utility: a model constrained or simplified for some purposes is still good enough for others.
Disney Animation | Interns | Burbank, CA
The Walt Disney Animation Studios Technology Department develops software for our animated films like Moana, Zootopia, and Frozen. Software Engineers work closely with production users to create tools for modeling, rigging, animation, dynamics, shading, effects, look, and/or rendering while leveraging experience in graphics technology, mathematics, and research.
What about using histograms? A histogram is N bins, where N is the number of values an integer can assume, and each bin stores the count of how many times that value is seen. Assume an integer is 32 bits: 2^32 ≈ 4 billion bins. To store counts of up to a trillion, we need a data type that goes up to at least a trillion, so a 64-bit uint will do. That gives 2^32 bins * 2^3 bytes per count = 2^35 bytes, or ~32 GB. My hard drive is bigger than that, so we could just break the bins down into a bunch of files, maybe even one file per bin if the OS lets us. After we've stored all the numbers in our bins, we just iterate from the first bin, adding up all our counts until we hit half a trillion. The bin that lands in is our median.
If we have more than one computer, we could split the bin range among all the computers (so with 2 computers, the first computer takes the first 2^31 bins, the second computer only cares about the second 2^31 bins, etc.). Then you iterate through all the computers in order, just passing along the running count, stopping when you hit half a trillion.
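A minimal sketch of both steps in Python, not from the parent comment: histogram_median uses an 8-bit value range purely so the bin array stays tiny for illustration (the same walk works with 2^32 bins), and median_across_nodes is a hypothetical helper showing the pass-the-running-count scan across computers.

    # Single-machine version: one bin per possible value, then walk the bins
    # in order until the running count reaches the halfway rank.
    # (8-bit values here only so the example fits comfortably in memory.)
    def histogram_median(values, value_bits=8):
        counts = [0] * (1 << value_bits)
        for v in values:
            counts[v] += 1
        half = (len(values) + 1) // 2          # rank of the (lower) median
        seen = 0
        for value, count in enumerate(counts):
            seen += count
            if seen >= half:
                return value

    # Distributed version: each computer owns a contiguous slice of the bins.
    # Visit the computers in bin order, passing along the running count.
    def median_across_nodes(node_histograms, total_count):
        # node_histograms: ordered list of (first_bin_value, counts) per computer
        half = (total_count + 1) // 2
        seen = 0
        for first_bin_value, counts in node_histograms:
            for offset, count in enumerate(counts):
                seen += count
                if seen >= half:
                    return first_bin_value + offset

    print(histogram_median([5, 3, 9, 3, 7]))   # -> 5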
Note that the range of integer values to consider is not specified. We don't know whether they are signed, nor whether they are 32-bit or 64-bit values.
The histogram is indeed the best algorithm and the one I would use. I call it the hash select. Note that it may be applied recursively, by narrowing the bins. Quick select is then in fact the hash select with only two bins.
Note also that since the histogram filling is distributed, one needs to add up the per-computer histograms to find the bin containing the median value. This is a good reason not to use histograms with 2^32 or 2^64 bins.
A 1024-bin histogram would allow finding the median of 32-bit integers in at most 4 iterations and of 64-bit integers in at most 7 iterations (each pass resolves 10 bits, so ceil(32/10) = 4 and ceil(64/10) = 7).
The algorithm to identify the bin containing the median value is then very similar: add up the counts of all lower bins until the bin containing the n/2-th value is found. That bin contains the median value.
Instead of adding up all the histograms, which may be done pairwise in O(log N) steps, one could do this bin by bin, progressively, until the bin containing the median value is found. I would guess there is a nice, optimal distributed algorithm for this out there, but we get the general picture.
Another optimization would be to adjust the histogram boundaries to fit the largest and smallest values in the data set.
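A rough sketch of that recursive narrowing (what the parent calls hash select), assuming a single machine; in the distributed setting each pass's counts would be the sum of the per-computer histograms. The function name and the re-fitting of the bin boundaries to min/max on every pass are illustrative choices, not taken from the comment.

    def hash_select_median(values, num_bins=1024):
        k = (len(values) + 1) // 2                      # 1-based rank of the (lower) median
        candidates = values
        while True:
            lo, hi = min(candidates), max(candidates)   # fit bins to the actual range
            if lo == hi:
                return lo
            width = (hi - lo + num_bins) // num_bins    # ceil(range / num_bins) values per bin
            counts = [0] * num_bins
            for v in candidates:
                counts[(v - lo) // width] += 1
            # Find the bin holding the k-th smallest value and narrow to it.
            for b, count in enumerate(counts):
                if k <= count:
                    lo_b = lo + b * width
                    candidates = [v for v in candidates if lo_b <= v < lo_b + width]
                    break
                k -= count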
I believe the original problem definition called for the data to be randomly spread over 1000 computers. Bringing together 32GB of data from 1000 nodes is going to stress your network, and pre-emptively sharding would be worse.
I think the best way to use histograms is 4 passes with 256 buckets (and bit-twiddling micro-optimizations), but other values are possible.
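A sketch of that 4-pass, 256-bucket variant, assuming unsigned 32-bit values on a single machine; the shifting and masking is the bit-twiddling part, and the function name is illustrative.

    def radix_select_median(values):
        # Bucket on one byte per pass, from the most significant byte down;
        # after 4 passes the fixed bytes spell out the median value.
        k = (len(values) + 1) // 2          # 1-based rank of the (lower) median
        prefix, candidates = 0, values
        for shift in (24, 16, 8, 0):
            counts = [0] * 256
            for v in candidates:
                counts[(v >> shift) & 0xFF] += 1
            # Locate the byte value at which the running count reaches rank k.
            for byte, count in enumerate(counts):
                if k <= count:
                    prefix |= byte << shift
                    candidates = [v for v in candidates if (v >> shift) & 0xFF == byte]
                    break
                k -= count
        return prefix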
the "brandykinin hypothsis" links poor COVID-19 outcomes to vitamin D deficiency as vitamin D is a regulator of the Renin Angiotensin System.
https://elifesciences.org/articles/59177#s1