This week on “Inside the Stack,” we hear from Steven Benario, a Product Manager at Ufora. Steven and his team have hired one Underdog.io candidate. Ufora has openings on its engineering team.
The Ufora platform empowers data scientists to analyze any data quickly and easily by automating the engineering. It combines speed and scale with the ease of use and flexibility of scripting languages like Python or MATLAB, so you can work as easily with 100 gigabytes of data as you can with 100 kilobytes.
C++, Python, CoffeeScript, Fora
Ufora is in a unique situation because we built a programming language, Fora, specifically to be the foundation of our software stack. Fora is a dynamically typed, just-in-time compiled, implicitly parallel language. Because we were writing our own JIT compiler, we chose to develop a large part of our infrastructure using C++ so that we could take advantage of LLVM and Clang to rapidly generate machine code.
The vast majority of our web components (both front-end and back-end) are written in CoffeeScript. Python could be considered our “outer shell”, as it’s the glue that holds many of our other components together. Python has a great set of wrappers and libraries that make it easy to interact directly with both AWS and our in-house C++ code.
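To make that concrete, here is a rough sketch of the shape that kind of Python-to-C++ glue can take. The shared library name and function signature below are hypothetical, made up purely for illustration; they are not our actual bindings.

```python
# Hypothetical illustration of Python-to-C++ glue via ctypes.
# The shared library name and function below are made up for this example.
import ctypes

engine = ctypes.CDLL("libexample_engine.so")   # imaginary in-house C++ library
engine.run_job.argtypes = [ctypes.c_char_p]    # declare the C-level signature
engine.run_job.restype = ctypes.c_int

status = engine.run_job(b"job-config.json")    # call straight into C++
print("exit status:", status)
```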
Angular.js, Backbone.js, Node.js
Historically, we’ve used a combination of Backbone.js and Node.js as our workhorse web frameworks, though we’re currently transitioning from Backbone to Angular.js. We really like the clean separation between logic and presentation that Angular enables.
Node, of course, is a popular choice—it’s flexible, fast, and has a dynamic ecosystem and community.
Redis, S3
We use Redis as our key-value store inside the product. It’s fast, lightweight, flexible, and did we mention fast? For our cloud deployments, we use Amazon S3 as our “big data” store for analytics performed in the platform, as well as for things like backups of user state, log files, etc.
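As a quick sketch of how those two stores get used from Python — assuming the redis-py and boto3 client libraries; the host, bucket, and key names here are examples, not our real configuration:

```python
# Minimal illustration only: library choice, bucket, and keys are examples.
import redis
import boto3

# Redis: fast key-value access for product state
r = redis.Redis(host="localhost", port=6379)
r.set("session:42:status", "active")
print(r.get("session:42:status"))          # b'active'

# S3: durable storage for analytics output, backups, and log files
s3 = boto3.client("s3")
s3.upload_file("results.csv", "example-bucket", "backups/results.csv")
```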
Which DevOps tools do you use?
Chef, Docker, CloudFormation, Pingdom, CloudWatch, TestLooper (proprietary)
Our cloud deployments are built on AWS, so we use a lot of the AWS DevOps tools like CloudFormation and CloudWatch. We also use Pingdom so that we know right away if something goes down.
For all deployments, we recently started using Docker and Chef — we’re big fans of Docker. It’s so easy to spin up Docker containers and try something out that we find ourselves doing it all the time, rather than turning on real “machines” in AWS. We’re also starting to use Docker pretty extensively in our test infrastructure.
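For a sense of why that workflow is so cheap to repeat, here is a minimal sketch using the Docker SDK for Python (docker-py). It is illustrative only, not our actual tooling, and the image and command are arbitrary examples.

```python
# Spin up a throwaway container, run a command, and clean it up -- the kind of
# quick experiment that replaces booting a full AWS instance. Illustrative only.
import docker

client = docker.from_env()
output = client.containers.run("python:3", "python -c 'print(2 ** 10)'", remove=True)
print(output)   # b'1024\n'
```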
The last major tool we use for DevOps is TestLooper — a homegrown framework for statistically backed testing. Much like other CI tools, it continuously monitors GitHub for new commits and automatically assigns machines from our test pool to run hundreds or thousands of hours of tests on each and every commit we make.
TestLooper is particularly useful for tracking down subtle defects — things like race conditions or non-deterministic failures that occur once every 100+ hours of runtime. When it identifies an increase in the failure rate, it walks backward in the commit chain until it can statistically identify which commit introduced the failure. Once there’s high confidence that a commit is “clean”, we stop allocating test machines to that commit.
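The core idea is easy to sketch: treat each commit’s test runs as samples of a failure rate, and walk backward until the observed rate is no longer statistically distinguishable from the historical baseline. The toy version below is not TestLooper’s actual implementation, and the baseline rate, threshold, and commit history are made-up parameters.

```python
# Toy sketch of statistically backed bisection -- not TestLooper itself.
from math import comb

def p_value_at_least(failures, runs, baseline_rate):
    """P(X >= failures) for X ~ Binomial(runs, baseline_rate):
    how surprising this many failures would be at the baseline rate."""
    return sum(comb(runs, k) * baseline_rate**k * (1 - baseline_rate)**(runs - k)
               for k in range(failures, runs + 1))

def find_suspect_commit(commits, baseline_rate=0.001, alpha=0.01):
    """commits: list of (sha, failures, runs), newest first.
    Walk backward while the failure rate stays significantly elevated;
    the oldest such commit is the likely culprit."""
    suspect = None
    for sha, failures, runs in commits:
        if runs and p_value_at_least(failures, runs, baseline_rate) < alpha:
            suspect = sha        # still "dirty": keep walking backward
        else:
            break                # statistically "clean": stop spending machines here
    return suspect

history = [("c9", 6, 200), ("c8", 5, 180), ("c7", 0, 400), ("c6", 1, 900)]
print(find_suspect_commit(history))   # -> "c8" introduced the regression
```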
TestLooper allows us to take advantage of AWS Spot Instances, which gives us access to dozens of machines at a fraction of the normal hourly price. For example, as I write this, we have 80 machines up and running in our test pool, for a total cost of around $8/hr. This infrastructure has been instrumental to our success.
Tell us about an interesting engineering challenge that you’ve overcome.
One way to think about Ufora is as an operating system that joins many small machines into one unified large machine. One of the primary challenges in building this operating system has been managing data and thread locality–in other words, ensuring that each computation instruction and the data it operates on are in the right place at the right time. This piece of infrastructure is called Cumulus.
Cumulus manages memory, computation schedules, cache locality, message passing, etc. It handles all of the tasks that you might take for granted when working on a single machine, and it does it all extremely quickly and completely automatically.
For example, when a user imports a 100GB dataset into Ufora, Cumulus is responsible for determining how to stripe that dataset in memory across all the machines in the cluster. When a user wants to start analyzing that data (training a gradient-boosted model, for example), Cumulus breaks the computation down into smaller chunks and ensures that each compute thread is physically co-located with the data it needs, even as that data is dynamically distributed and redistributed across the cluster while computations run.
This isn’t nearly as simple as job scheduling—Cumulus needs to make “smart” decisions about whether it’s cheaper to move data over the network to a specific execution thread, or to pick up a running thread and move it to the data on another machine. It can also pause and split a thread while it’s running, move one half elsewhere in the cluster, and re-split recursively.
For example, in one of our popular demos, we count how many prime numbers occur in the first 100M integers. On a 60-core cluster, Cumulus splits this operation more than 780 times, moving threads across CPU cores to automatically saturate all 60 cores with work.
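For a sense of scale, here is that computation written as plain single-threaded Python. Run this way it grinds along on one core for a long time, which is exactly the kind of work Cumulus splits up and spreads across the cluster automatically; this snippet is just an illustration of the workload, not the demo’s actual code.

```python
# The prime-counting computation, written naively as single-threaded Python.
# On Ufora, Cumulus splits equivalent work across every core in the cluster.
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

count = sum(1 for n in range(100_000_000) if is_prime(n))
print(count)   # 5,761,455 primes below 100 million
```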
Cumulus poses some very complex computer science challenges and we focus a lot of development time here because it’s one of the most critical components of our infrastructure. But it’s only one piece! The Ufora stack is quite large relative to other projects, spanning from the language level to our web front-end. And, of course, it’s constantly evolving to incorporate the latest developments in complementary technology and to meet the needs of our customers.
If these are areas you have expertise in, what are you waiting for? Get in touch!
Thanks to Steven and the whole Ufora team for a great post. Visit ufora.com to read more about the company and to see their technology in action.
Cumulus Cloud image courtesy of Jeff Kubina.