...

  • Cooliris - Cooliris transforms your browser into a lightning fast, cinematic way to browse photos and videos, both online and on your hard drive.
    • We have a 15-node Hadoop cluster where each machine has 8 cores, 8 GB RAM, and 3-4 TB of storage.
    • We use Hadoop for all of our analytics, and we use Pig to allow PMs and non-engineers the freedom to query the data in an ad-hoc manner.
  • Dataium
    • We use Pig to sort and prep our data before it is handed off to our Java Map/Reduce jobs (see the sketch after this list).
  • DropFire
    • We generate Pig Latin scripts that describe structural and semantic conversions between data contexts
    • We use Hadoop to execute these scripts for production-level deployments
    • Eliminates the need for explicit data and schema mappings during database integration
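
For illustration, a minimal Pig Latin sketch of the kind of sort-and-prep pass Dataium describes; the paths, schema, and field names here are hypothetical:

    -- load raw events (hypothetical path and schema)
    raw = LOAD '/data/incoming/events' USING PigStorage('\t')
          AS (session_id:chararray, ts:long, payload:chararray);
    clean = FILTER raw BY session_id IS NOT NULL;  -- drop malformed records
    sorted = ORDER clean BY ts;                    -- sort before the hand-off
    -- the stored output becomes the input of the downstream Java Map/Reduce jobs
    STORE sorted INTO '/data/prepped/events' USING PigStorage('\t');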

...

  • Mendeley
    • We are creating a platform for researchers to collaborate and share their research online
    • We moved all our catalogue stats and analysis to HBase and Pig
    • We are using Scribe in combination with Pig for all our server, application and user log processing.
    • Pig helps us get business analytics, user experience evaluation, feature feedback, and more out of these logs.
    • You can find more on how we use Pig and HBase on these slides: http://www.slideshare.net/danharvey/hbase-at-mendeley
  • Mortar Data
    • We provide an open-source development framework and Hadoop Platform-as-a-Service
    • Our service is powered by Pig, which we run on private, ephemeral clusters in Amazon Web Services
  • Ning
    • We use Hadoop to store and process our log files
    • We rely on Apache Pig for reporting, analytics, Cascading for machine learning, and on a proprietary JavaScript API for ad-hoc queries
    • We use commodity hardware, with 8 cores and 16 GB of RAM per machine
  • Nokia | Ovi
    • We use Pig for exploring unstructured datasets coming from logs, database dumps, data feeds, etc.
    • Several data pipelines that build product datasets and feed further analysis use Pig, tied to other jobs with Oozie
    • We have multiple Hadoop clusters, some for R&D and some for production jobs
    • In R&D we run on very commodity hardware: 8-core, 16GB RAM, 4x 1TB disk per data node
  • PayPal
    • We use Pig to analyze transaction data in order to prevent fraud.
    • We are the main contributors to the Pig-Eclipse project.
  • Realweb - an Internet advertising company based in Russia.
    • We are using Pig over Hadoop to compute statistics on banner views, clicks, post-click user behavior on target websites, etc. (a sketch of this kind of job appears after this list)
    • We've chosen Cloudera Hadoop (http://www.cloudera.com/hadoop/) packages on Ubuntu 10.04 servers. Each machine has 2 or 4 cores, 4 GB RAM, and 1 TB of storage.
    • All jobs are written in Pig Latin, and only a few user-defined functions were needed to meet our needs.
  • Salesforce.com
    • We have multiple clusters in production, as well as 10-node and 20-node development clusters
    • Hadoop (native Java MapReduce) is used for Search and Recommendations
    • We are using Apache Pig for log processing and Search, and to generate usage reports for several products and features at SFDC
    • Pig makes it easy to develop custom UDFs. We developed our own library of UDFs and loaders and are actively contributing back to the community (an example of registering such a UDF appears after this list)
    • The goal is to allow Hadoop/Pig to be used across Data Warehouse, Analytics and other teams, making it easier for folks outside engineering to use data
  • SARA Computing and Networking Services
    • We provide a Hadoop service for scientific computing in The Netherlands
    • Pig is being used by a number of scientists for fast exploration of large datasets
    • Fields that use Pig extensively include Information Retrieval and Natural Language Processing
    • Read more on our use of Hadoop in this presentation
    • Read about selected Hadoop use cases in this blog post
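
For illustration, a minimal Pig Latin sketch of the kind of banner-click statistics Realweb describes; the log path, schema, and field names are hypothetical:

    -- load banner events (hypothetical path and schema)
    events = LOAD '/logs/banners' USING PigStorage('\t')
             AS (banner_id:chararray, event:chararray, user_id:chararray);
    clicks = FILTER events BY event == 'click';
    by_banner = GROUP clicks BY banner_id;
    -- count clicks per banner
    stats = FOREACH by_banner GENERATE group AS banner_id, COUNT(clicks) AS num_clicks;
    STORE stats INTO '/reports/banner_clicks';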
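And a sketch of how a custom UDF of the kind Salesforce.com mentions is registered and invoked from Pig Latin; the jar name and UDF class are hypothetical:

    REGISTER sfdc-pig-udfs.jar;                          -- hypothetical UDF library
    DEFINE NormalizeUrl com.example.pig.NormalizeUrl();  -- hypothetical UDF class
    logs = LOAD '/logs/search' USING PigStorage('\t') AS (ts:long, url:chararray);
    normalized = FOREACH logs GENERATE ts, NormalizeUrl(url) AS url;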

...

  • Tynt
    • We use Hadoop to assemble web publishers' summaries of what users are copying from their websites, and to analyze user engagement on the web.
    • We use Pig and custom Java map-reduce code, as well as Chukwa (a sketch appears after this list).
    • We have 94 nodes (752 cores) in our clusters, as of July 2010, but the number grows regularly.
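
For illustration, a minimal Pig Latin sketch of the per-publisher copy summaries Tynt describes; the input path and schema are hypothetical:

    -- load copy events (hypothetical path and schema)
    copies = LOAD '/logs/copy_events' USING PigStorage('\t')
             AS (publisher:chararray, page_url:chararray, snippet:chararray);
    by_pub = GROUP copies BY publisher;
    -- summarize how often content was copied from each publisher's site
    summary = FOREACH by_pub GENERATE group AS publisher, COUNT(copies) AS num_copies;
    STORE summary INTO '/reports/publisher_copy_summary';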

...