One of the most important things when working on the cloud is monitoring

I believe that one of the most important things when working in the cloud is monitoring. If you can’t have eyes everywhere, you won’t know when or where things break. As I implement our new Hadoop workflows, I’m adding ways for us to peek into the system’s status. One very fine tool for this is Edward Tufte’s sparklines, as implemented in Python by Joe Gregorio. The picture above shows a portion of our log file status in S3. We have many servers that upload their log files to S3 for further processing. The top part of the chart is the total size of the logs per hour. The bottom shows how many files are extra or missing in each hour. As you can see, we were missing over 60 files for a few hours, and the total size was often below the mean too. Now I have more work to do.
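
For readers who want to try something similar, here is a minimal sketch of the two per-hour numbers described in this post: total log size and the count of extra or missing files. It is not our production code. The function names, the `(uploaded_at, size_bytes)` input shape, and the `expected_per_hour` parameter are all assumptions made for illustration, and the small text sparkline is a stand-in built from Unicode block characters, not Joe Gregorio's implementation.

```python
from collections import defaultdict
from datetime import datetime


def hourly_status(files, expected_per_hour):
    """Summarize log uploads per hour.

    files: iterable of (uploaded_at: datetime, size_bytes: int) pairs
           (a hypothetical shape; adapt it to however you list your bucket).
    expected_per_hour: how many files we assume should arrive each hour.

    Returns {hour: (total_bytes, count_delta)}, where count_delta is
    positive for extra files and negative for missing ones.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for uploaded_at, size_bytes in files:
        # Truncate the timestamp to the start of its hour.
        hour = uploaded_at.replace(minute=0, second=0, microsecond=0)
        totals[hour] += size_bytes
        counts[hour] += 1
    return {
        hour: (totals[hour], counts[hour] - expected_per_hour)
        for hour in sorted(totals)
    }


def text_sparkline(values):
    """Render a sequence of numbers as a one-line text sparkline."""
    bars = "▁▂▃▄▅▆▇█"
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return "".join(
        bars[int((v - lo) * (len(bars) - 1) / span)] for v in values
    )
```

Feeding the per-hour totals into `text_sparkline` gives a quick look at the size trend in a terminal. One caveat of this sketch: an hour in which no files arrived at all never appears in the result, so a real check would need to walk the full list of expected hours as well.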




