This page is in draft. Refer to the dev mailing list for more information.

Droids Web Services is a proposed module (not yet implemented) that offers web crawling functionality on cloud computing platforms. It works as follows:

  • A web application that exposes Droids core functions as Web APIs
    • supports URL fetching, HTML/image parsing, and data extraction
    • Spring HTTP Invoker has been chosen as the remoting layer (any binary web remoting technology would work)
  • The original Droids client component, configured to use a remote worker
    • The worker no longer makes local requests to fetch pages. Instead, it makes remote calls to the web services and collects the results.
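The client-side switch can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Droids code: `FetchService`, `FetchResult` and `RemoteCapableWorker` are hypothetical names. Because the worker depends only on the interface, Spring could inject either a local implementation or an HTTP Invoker proxy (for example one built by `HttpInvokerProxyFactoryBean`) without any change to the worker itself.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type; Serializable because it must cross the wire.
final class FetchResult implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    final int status;
    final List<String> links;

    FetchResult(String url, int status, List<String> links) {
        this.url = url;
        this.status = status;
        this.links = new ArrayList<>(links); // ArrayList is Serializable
    }
}

// Hypothetical contract shared by the client and the web application.
interface FetchService {
    FetchResult fetch(String url);
}

// The worker only knows the interface. Whether `service` is a local object
// or a remote proxy is decided by configuration, not by this code.
final class RemoteCapableWorker {
    private final FetchService service;

    RemoteCapableWorker(FetchService service) {
        this.service = service;
    }

    // Fetches each URL through the service and collects the links found
    // on pages that returned HTTP 200.
    List<String> crawl(List<String> urls) {
        List<String> discovered = new ArrayList<>();
        for (String url : urls) {
            FetchResult result = service.fetch(url);
            if (result.status == 200) {
                discovered.addAll(result.links);
            }
        }
        return discovered;
    }
}
```

In a unit test, a lambda can stand in for the remote service, e.g. `new RemoteCapableWorker(url -> new FetchResult(url, 200, List.of()))`.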

Requirements

  • unlimited scalability / extreme throughput
  • support for any cloud computing platform, e.g. Google App Engine, Amazon EC2, etc.
  • a shared-nothing server application: no sessions; every remote method call is a complete, self-contained process
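To illustrate the shared-nothing requirement, here is a hypothetical parsing call in which the request carries all of its inputs and the service keeps no per-client state. The class names and the regex-based link extraction are assumptions made for brevity; a real implementation would use a proper HTML parser.

```java
import java.io.Serializable;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical request: carries everything the call needs, so the server
// never has to remember anything between calls.
final class ParseRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String baseUrl;
    final String html;

    ParseRequest(String baseUrl, String html) {
        this.baseUrl = baseUrl;
        this.html = html;
    }
}

// Hypothetical response: the complete result of the call.
final class ParseResponse implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> links;

    ParseResponse(List<String> links) {
        this.links = links;
    }
}

// Stateless service: no mutable fields and no session, so any server
// instance can handle any call and instances can be added or removed freely.
final class StatelessParseService {
    // Immutable, thread-safe constant; naive href extraction for illustration.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    ParseResponse parse(ParseRequest request) {
        URI base = URI.create(request.baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(request.html);
        while (m.find()) {
            // Resolve relative links against the base URL carried in the request.
            links.add(base.resolve(m.group(1)).toString());
        }
        return new ParseResponse(links);
    }
}
```

Calling `parse` twice with the same request yields the same links, which is what lets a load balancer send each call to whichever instance is free.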

Dependencies

  • Spring
    • allows transparently switching from a local component to a remote component in the client
    • makes it easy to expose any service as a Web API
  • Google App Engine API
    • for use in GAE
    • URL Fetching Service

Restrictions

  • Any component passed to the remote API must be serializable, since Spring HTTP Invoker transfers objects using standard Java serialization.
  • The master task/link queue stays in a single JVM, as in the original Droids.
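Because a non-serializable field only fails when the remote call is made, a unit test that round-trips each component through Java serialization can catch the problem early. A minimal sketch; `SerializationCheck`, `LinkTask` and `BadTask` are hypothetical names, not part of Droids or Spring.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationCheck {
    // Writes the value with standard Java serialization, then reads it back.
    // Fails if any object reachable from `value` is not serializable.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. NotSerializableException
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the value survives the round trip, false otherwise.
    static boolean isRemotable(Serializable value) {
        try {
            roundTrip(value);
            return true;
        } catch (UncheckedIOException e) {
            return false;
        }
    }
}

// Hypothetical task that is safe to send to the remote API.
final class LinkTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    final int depth;

    LinkTask(String url, int depth) {
        this.url = url;
        this.depth = depth;
    }
}

// Declares Serializable but holds a field that is not, so it fails at runtime.
final class BadTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final Object handle = new Object(); // java.lang.Object is not Serializable
}
```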

Reference: restrictions in Google App Engine

  • 30-second limit per request
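One way to stay within the per-request limit, assuming the server handles a batch of URLs per call, is to stop starting new fetches once a time budget is spent and hand the unprocessed URLs back to the client's master queue. In this sketch the 25-second budget, the injectable clock and all class names are assumptions, not part of Droids or the GAE API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical outcome of one time-boxed remote call.
final class BatchResult {
    final List<String> fetched = new ArrayList<>();   // URLs processed in this call
    final List<String> remaining = new ArrayList<>(); // URLs to re-queue on the client
}

final class TimeBoxedBatch {
    // Assumed safety margin below the 30 s request limit.
    static final long DEFAULT_BUDGET_MILLIS = 25_000;

    private final LongSupplier clockMillis;
    private final long budgetMillis;

    TimeBoxedBatch(LongSupplier clockMillis, long budgetMillis) {
        this.clockMillis = clockMillis;
        this.budgetMillis = budgetMillis;
    }

    // Uses the real wall clock and the default budget.
    TimeBoxedBatch() {
        this(System::currentTimeMillis, DEFAULT_BUDGET_MILLIS);
    }

    // Fetches URLs in order until the budget is spent; URLs that were not
    // started are returned so they can go back on the master queue.
    BatchResult run(List<String> urls, Consumer<String> fetch) {
        long start = clockMillis.getAsLong();
        BatchResult result = new BatchResult();
        for (String url : urls) {
            if (clockMillis.getAsLong() - start >= budgetMillis) {
                result.remaining.add(url);
            } else {
                fetch.accept(url);
                result.fetched.add(url);
            }
        }
        return result;
    }
}
```

The budget is checked only between fetches, so a single slow fetch can still overrun it. That is why the budget leaves a margin below 30 seconds, and each individual fetch should also carry its own timeout.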