Monday, 5 March 2012

Cloud and NoSQL: A use case

This article is inspired by the many questions I see in forums about which language, NoSQL database or cloud service is best to use today.

A frequent reply to these questions is “What is your exact goal and use case?”, which people often find difficult to answer. Writing a use case is indeed a tricky exercise. How much detail should you go into? What is meaningful and what is not? How do you make it fit inside a forum post? And how do you keep your startup's end goal private?

Another frequent, and objective, reply is “Experiment to find what works best for you”. Still, someone new to the Cloud and NoSQL space cannot experiment with everything. Wanting to narrow down the experimentation space is legitimate, which brings us back to the question of the use case.

I therefore thought I would share my personal use case, and how I decided to experiment with Clojure, CouchDB, Heroku and PostgreSQL before anything else. My hope is that it will help you formulate your own use case and put you on track to finding your first tool set, in particular by showing how a use case and a first technological bet connect.

A word of warning: objectivity is not the point of this article. Everything that follows is personal and subjective. It is based on my very own experience (or lack thereof), understanding (or lack thereof) and intuition (or lack thereof). This is intentional. Making your first bet should involve a huge amount of your own subjectivity and intuition. Experimentation will bring objective answers.

In the end, writing about this is a risky exercise. At best I will change my mind. At worst I will have got it all wrong... but I don't think so.

Interestingly, I found that my needs could be expressed in fairly generic terms without revealing the details of what I am up to; so if you want to ask for help on a forum, there is a lot you can say without selling your soul.

My use case

I have a few projects in the pipe now, and here is my reality in a nutshell.

  • 75% of my projects are data harvesting, aggregation and analysis/learning projects.
    • The shape of the data is data source driven / supplier driven. Data are structured, but heterogeneous in the sense that the same data item may come in different forms from different sources (e.g. the same dataset from different national government offices).
    • Data can be stored as text.
    • I value rapidity of development, robustness, data and process redundancy, ease of data harvesting.
    • Data harvesting and pre-processing should happen in the cloud, research and prototyping on an in-house box, and production crunching and deployment back in the cloud.
    • Research data on an in-house box should ideally mirror production data in the cloud.
  • 25% of my projects are social web applications
    • The shape of the data is application driven / consumer driven.
    • I value rapidity of development, ease of deployment, robustness
    • The user base will not be gigantic.
  • The ideal software development environment should work for data exploration, prototyping and large scale deployment (think Matlab and .Net in one).
  • Some projects may reach “big data” scale, so the platform should allow replicating and sharding data.
  • Administration should be minimal, i.e. doable by one or two developers alongside development work.
  • Managing several environments should be easy (dev, research, test, prod).
  • Cost should be minimal on day 1, and I prefer costs that grow linearly with usage to logarithmic costs (a large upfront outlay that then flattens out).
  • The solution should be part of a recognised and standard ecosystem (like Java, .Net, AWS, Azure, Hadoop).
  • I want to have a good understanding of every bit of the platform on day 1. My IT experience is in desktop, server, and web development with Java, .Net, MS SQL Server and a bit of Oracle, and data analysis with SQL Server and Matlab.
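To make the heterogeneous-data point concrete, here is a minimal Python sketch (an illustration only, not part of my actual stack). The two sources, their field names and the common shape are hypothetical: one small adapter per source maps its raw record onto a shared shape, and the raw record is kept alongside it, which is the kind of thing a document store makes cheap.

```python
# Hypothetical example: two government offices publish the same population
# figure in different shapes. One adapter per source maps a raw record onto
# a common shape; the untouched raw record is kept for traceability.


def from_source_a(raw: dict) -> dict:
    # Hypothetical source A: {"region": "North", "year": "2011", "pop": "1200"}
    return {"region": raw["region"], "year": int(raw["year"]), "population": int(raw["pop"])}


def from_source_b(raw: dict) -> dict:
    # Hypothetical source B: nested region name, population in thousands.
    return {
        "region": raw["area"]["name"],
        "year": raw["period"],
        "population": round(raw["population_thousands"] * 1000),
    }


ADAPTERS = {"a": from_source_a, "b": from_source_b}


def normalise(source: str, raw: dict) -> dict:
    """Return a document holding the source name, the common shape and the raw record."""
    return {"source": source, "normalised": ADAPTERS[source](raw), "raw": raw}


doc_a = normalise("a", {"region": "North", "year": "2011", "pop": "1200"})
doc_b = normalise("b", {"area": {"name": "North"}, "period": 2011, "population_thousands": 1.2})
print(doc_a["normalised"] == doc_b["normalised"])  # True
```

Adding a new supplier then means writing one more adapter rather than migrating a schema.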

If you are an online shop, a news publisher, an online RPG or an S&P 500 company, then parts of your equation will be different: you might have a supply of developers and administrators, you might be working with in-house data only, you might soon be snowed under petabytes of data, or you might be streaming digital data.

A McKinsey analysis would be more structured and polished, but this use case is good enough to narrow down which technologies to start experimenting with.

What about the technologies I know best?

Technically, Azure, .Net, SQL Server and Matlab could do all of this for me. But I think they would benefit projects that are a bit more mature than mine, if only because of the higher upfront costs they entail. Also:

  • I would prefer a single environment for research, development and deployment.
  • My intuition is that deploying and administering data and processes redundantly across the cloud and a local environment will require more skill and time than I would like.
  • I think my data will need a fair bit of work to normalise and store in SQL Server; some NoSQL document stores might allow me to cut down on data harvesting development.
  • I won't use some of Microsoft's (and Matlab's) killer features: rich desktop/office applications, highly accessed tiered systems and transactional databases, corporate standards, corporate support.

“Plan A”

What follows is my first shot at an overall platform that could work for my projects, and the reasons why. The choice is very much a choice of the whole, not just of individual parts, and the process of arriving at it was very iterative.

I will also describe technologies that I have set aside for now. I may well end up using them as plan B if plan A doesn't work as well as I hope.

Software: Open Source

  • reduced upfront cost
  • freedom to experiment

Cloud infrastructure: Amazon Web Services

  • hosts and connects a competitive ecosystem of services, many of which operate on a freemium model.
  • vision of modular and elastic services.

Preferred to:

  • Azure, for being a closed ecosystem and offering a less diverse set of options

Hosting: Heroku

  • Runs on Amazon Web Services
  • Clear deployment and scaling model; web & workers
  • Hub for a wealth of add-on products that fit my needs, like databases, caching, messaging...
  • Git-based development and deployment workflow
  • Minimal administration
  • Supports Clojure applications
  • Freemium model (same for many add-ons too)

Preferred to:

  • EC2/RDS: too administration-intensive for now. I will go there when I have a clear need for EC2-style elasticity.
  • Rackspace and similar: too administration-intensive for my needs.

SQL: PostgreSQL

  • Heroku's choice of hosted database
  • Coming from Microsoft SQL Server, I feel at home with its feature set and stored procedures (although these were in beta at Heroku at the time of writing)

Preferred to:

  • MySQL: I don't feel entirely comfortable with the modular storage engine system, and I think it would take me longer to make InnoDB sing. I don't really need the MyISAM storage engine. The "Editions" packaging adds complexity to an already complex space. One needs to know how to read between the lines of Oracle-style marketing, and my time and energy are in too short supply for this (acid test: I'll come back when Oracle's TCO is also compared to MySQL's).
  • Microsoft SQL Server: too costly for now, and not straightforward to run in the AWS ecosystem.
  • Oracle: too complex and costly for my needs.
  • Amazon RDS: only offers MySQL or Oracle (see above).

NoSQL: CouchDB

  • Candidate for main data warehouse.
  • I understand all of it, and see the benefit of all of it:
    • Document database, JSON
    • MVCC and Replication
    • Map/Reduce and View indexing
    • Web integration (REST, CouchApps)
  • CouchDB's limitations are honestly and clearly explained; best in class in the NoSQL space in this respect.
  • CouchDB's features complement relational databases for what I need to do.
    • Focus on large, atomic reads and low-contention writes.
    • Simple out-of-the box mirroring; subject to how well replication works over a slow connection.
    • If I understand correctly, a JSON document could contain the code to map itself, which would be a killer denormalisation feature for my heterogeneous data (I will write about this when I make it work).
    • CouchDB feels like a mini-Hadoop over JSON docs, with master-master replication and map/reduce.
  • CouchDB is scalable enough for my projects
  • Administration seems low and tuning straightforward
  • Creating a new test or development database from live data is trivial.
  • Creating data harvesting processes that are redundant and partition-tolerant should be trivial.
  • CouchApps are a clever deployment model for simple web applications.
  • Hosted by Iris Couch and Cloudant (Heroku add-on), which both have a freemium model.
  • CouchDB is an Apache project, and a recognised player in the big data world.
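For readers new to CouchDB views, here is an in-memory Python simulation of the map/reduce model listed above. It is not CouchDB's API: in CouchDB the map and reduce functions are typically JavaScript stored in a design document, the server maintains the view index incrementally, and a built-in reduce such as _sum would normally be used. The documents and field names are hypothetical. The sketch only shows the contract: map emits (key, value) pairs per document, rows are sorted by key, grouping runs the reduce once per distinct key, and the reduce must also accept its own outputs (re-reduce).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical harvested documents; not every document has the same shape.
DOCS = [
    {"_id": "1", "country": "FR", "value": 10},
    {"_id": "2", "country": "UK", "value": 5},
    {"_id": "3", "country": "FR", "value": 7},
    {"_id": "4", "note": "no country field"},
]


def map_fn(doc):
    """Like a CouchDB map function: emit zero or more (key, value) pairs per document."""
    if "country" in doc and "value" in doc:
        yield doc["country"], doc["value"]


def reduce_fn(keys, values, rereduce):
    """Mirrors the shape of CouchDB's reduce(keys, values, rereduce).

    On a first pass keys is a list of [key, doc_id]; on a re-reduce it is None
    and values are earlier reduce results. Summing is associative, so both work.
    """
    return sum(values)


def query_view(docs, group=True):
    # Build the index: one row per emitted pair, sorted by key, then by doc id.
    rows = sorted(
        ((key, doc["_id"], value) for doc in docs for key, value in map_fn(doc)),
        key=itemgetter(0, 1),
    )
    if not group:
        # Without grouping: a single reduce over every row.
        return reduce_fn([[k, d] for k, d, _ in rows], [v for _, _, v in rows], False)
    result = {}
    for key, grp in groupby(rows, key=itemgetter(0)):
        grp = list(grp)
        result[key] = reduce_fn([[k, d] for k, d, _ in grp], [v for _, _, v in grp], False)
    return result


print(query_view(DOCS))                # {'FR': 17, 'UK': 5}
print(query_view(DOCS, group=False))   # 22
print(reduce_fn(None, [17, 5], True))  # 22 (re-reduce of partial results)
```

Documents that don't fit the map function's expectations simply emit nothing, which is part of why views suit heterogeneous data.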

Preferred to:

  • SQL servers: I'd rather avoid normalising research data if I can get away with it, and replication may not be as easy. And there are indeed scenarios where document databases seem to offer better productivity and easier horizontal scaling (e.g. the typical blog post example).
  • MongoDB. I don't need write speed for now. I don't understand Mongo's risks as well as CouchDB's. Mongo's sharding logic is easy to understand but seems fairly elaborate to implement, and doesn't fit my current needs as naturally as CouchDB.
  • Couchbase. A marriage of CouchDB and Membase. I guess it should offer the best of the two products. What the product does and doesn't do is a bit confusing for now. Will definitely look at it, though, when a hosted Couchbase service appears.
  • Hadoop: too cumbersome and too much administration for now. Maybe when I hit CouchDB's limits.
  • Elastic MapReduce: I don't know my precise elasticity needs yet. Maybe when I hit CouchDB's limits.
  • Column oriented databases: too infrastructure and administration intensive, although I like how they handle sparse data.
  • Other products I'll be watching: Riak (for link walking and map/reduce), and Cassandra (for speed and for its tunable consistency, should I need this later)
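The remark that MongoDB's sharding logic is easy to understand can be illustrated with a deliberately simplified sketch of range-based routing on a shard key: each shard owns a contiguous key range, and a router finds the owning shard with a binary search over the range boundaries. This is not MongoDB's implementation (which splits data into chunks, migrates them between shards and keeps the routing table on config servers); the boundaries and shard names below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical routing table for a string shard key.
# Shard i owns keys in [BOUNDS[i - 1], BOUNDS[i]): keys below "g" go to shard-0,
# keys from "g" up to (but excluding) "p" go to shard-1, the rest to shard-2.
BOUNDS = ["g", "p"]
SHARDS = ["shard-0", "shard-1", "shard-2"]


def route(shard_key: str) -> str:
    """Return the shard owning shard_key via binary search over the boundaries."""
    return SHARDS[bisect_right(BOUNDS, shard_key)]


for key in ["apple", "g", "oak", "p", "zebra"]:
    print(key, "->", route(key))
```

The routing itself is one line; the operational work (splitting ranges as they grow, moving data between shards, keeping every router's copy of the table consistent) is what makes a real deployment elaborate.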

Functional programming: Clojure

  • Best language I found to use all the way from research and prototyping to industrial deployment
  • Compact, expressive and a delight to use
  • Runs in interactive and compiled modes
  • Excellent constructs for multi-threaded programming (refs, atoms, agents)
  • Deployability: runs on the JVM, the .NET CLR and JavaScript engines.
  • Java interoperability.
  • Straightforward and elegant web frameworks.
  • Also: Incanter for stats, Clutch and ClojureScript for CouchDB, Cascalog for Hadoop
  • Compact ecosystem (for now)
  • Hosted on Heroku

Preferred to:

  • Python: I think that functional programming will be a better fit for data analysis, web development and cloud deployment. Python offers a huge choice of libraries, but its ecosystem feels a bit wide and fragmented, in particular with the ongoing transition from v2 to v3.
  • Matlab: costly to deploy (although if I did a project for someone who already had Matlab then it would be the obvious choice).
  • Scala and F#: they are multi-paradigm languages. For now, I prefer using a predominantly functional language (Clojure) and a predominantly object-oriented language (Java) that can interoperate. This should push me to do things the idiomatic way in each language, and to get the best out of both.

Object programming: Java

  • Multi-platform OO language
  • Free development environment
  • Libraries relevant to my projects (Weka, Gephi...)
  • The language in which Hadoop is written.
  • Huge hosting ecosystem. Hosted on Heroku.

Preferred to:

  • .Net, as a result of all of the above

Summary

In a nutshell, Plan A is very much:

  • a bet on the trio Heroku, Clojure, CouchDB,
  • with PostgreSQL as the safe SQL bet,
  • and Hadoop over AWS as a long-term cloud scaling environment.

What I need to assess now is whether Heroku and hosted CouchDB deliver good enough performance, and whether CouchDB meets all my high expectations (too high?).

To summarize how tools match the use case:

Data analysis projects
  • Heterogeneous, source-driven data: CouchDB
  • Data can be stored as text: CouchDB
  • Rapidity of development: CouchDB, Clojure
  • Robustness, data redundancy: CouchDB
  • Process redundancy: Heroku, in parallel with in-house
  • Easy data harvesting: CouchDB, Clojure
  • Harvesting in the cloud, research in-house, production in the cloud: CouchDB, Clojure, Heroku
  • In-house research data mirrors production data in the cloud: CouchDB

Web application projects
  • Application-driven data: CouchDB, PostgreSQL
  • Rapidity of development, ease of deployment: Clojure, Heroku
  • Robustness: CouchDB, PostgreSQL

All projects
  • Single environment from data exploration to deployment: Clojure, CouchDB
  • Big data: CouchDB, AWS
  • Minimal administration: Heroku, CouchDB
  • Managing several environments: CouchDB
  • Minimal upfront cost, linear costs: open source and freemium
  • Recognised ecosystem: AWS, Java
  • Good personal understanding: Java, Clojure, CouchDB, PostgreSQL, Heroku

Your story?

I hope the above will help you formulate your own use case and find technologies that fulfil it. In particular, I hope it gives you an idea of how far an analysis can go based on information found on the web, in forums and in books (plus a little experimentation on the side). I would love to hear your own stories, and to share use cases and solutions that differ from the above.

In the end, the one piece of advice I would give when approaching this space is: understand what you plan to use, and make sure you and your team can become intimate with it. For example, I don't think I can become intimate enough with EC2 in the short term, so I am happily giving up its flavour of elasticity for now. I see myself becoming intimate with PostgreSQL more easily than with MySQL. I felt intimate with CouchDB soon after I started reading about it, and less so with other NoSQL solutions.

Now back to experimenting with all this. And I am looking forward (am I?) to telling you whether I change my mind or not!
