gmond 信息搜集和发送策略

gmond信息搜集的最小单位是metric,每个metric又属于一个group,发送信息(UDP包)的最小单位是group

collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
    title = "Fifteen Minute Load Average"
  }
}

collect_every表示每过几秒搜集信息,比如上例中每过20秒,gmond搜集一次load信息。但是搜集后的信息是不发送的,发送信息的时间间隔由time_threshold控制,例如上例中每90秒才发送load的udp包。

为了及时搜集到异常数据,gmond中的metric还有一个配置项value_threshold。在上面这个例子中,每过20秒gmond会得到一个新数据,但是要过90秒间隔才会发送,但是当新数据和旧数据的差值超过阈值,即value_threshold时,将会忽略time_threshold所设置的值,直接发送数据。上述3个load指标只要一个发生异常,就会发送整个group,因为发送是以group为单位的。

gmond的python扩展

最先要参考的当然是ganglia的官方wiki

官方提供了个python扩展模块的github库

python扩展编写

因为没有看过源码,而官方wiki讲得又不是十分清晰,这里就按照wiki中描述的一份扩展对应两份文件来配置(模糊点在于多个python模块是否可以对应一个.pyconf配置文件,应该是可以,但是没有验证)。

下面以编写load监控为例

#!/usr/bin/env python
# -*- coding: utf-8 -*-

load_file = "/proc/loadavg"


def get_loads():  
  ret = None
  try:
    f = open(load_file, 'r')
  except IOError:
    return -1
  ret = f.read()
  ret = ret.split()
  return ret[0:3]

def get_load1(name):
  return float(get_loads()[0])
  
def get_load5(name):
  return float(get_loads()[1])

def get_load15(name):
  return float(get_loads()[2])  
  
def metric_init(params):
  global descriptors, load_file

  if 'load_file' in params:
    load_file = params['load_file']
  l1 = {'name': 'load1',
        'call_back': get_load1,
        'time_max': 90,  
        'value_type': 'float',
        'units': '',
        'format': '%f',
        'description': 'lond_one',
        'groups': 'load'}
  l5 = {'name': 'load5',
        'call_back': get_load5,
        'time_max': 90,  
        'value_type': 'float',
        'units': '',
        'format': '%f',
        'description': 'lond_five',
        'groups': 'load'}
  l15 = {'name': 'load15',
        'call_back': get_load15,
        'time_max': 90,  
        'value_type': 'float',
        'units': '',
        'format': '%f',
        'description': 'lond_load_fifteen',
        'groups': 'load'}
  descriptors = [l1,l5,l15]
  return descriptors

def metric_cleanup():
    '''Clean up the metric module.'''
    pass

#This code is for debugging and unit testing
if __name__ == '__main__':
    metric_init({})
    for d in descriptors:
        v = d['call_back'](d['name'])
        print 'value for %s is %f' % (d['name'],  v)

相应的配置文件

modules {
  module {
    name = "gmond_python_test"
    language = "python"
  }
}

collection_group {
  collect_every = 10
  time_threshold = 15
  metric {
    name = "load1"
    title = "load one"
    value_threshold = 1.0
  }
  metric {
    name = "load5"
    title = "load five"
    value_threshold = 1.0
  }
  metric {
    name = "load15"
    title = "load fifteen"
    value_threshold = 1.0
  }
}



blog comments powered by Disqus

Published

23 December 2013

Tags