Changeset 279
- Timestamp: 12/16/07 04:04:20
- Files:
  - trunk/README (modified) (19 diffs)
  - trunk/framework/core.rb (modified) (2 diffs)
  - trunk/framework/packet_master.rb (modified) (2 diffs)
  - trunk/server/master_worker.rb (modified) (2 diffs)
  - trunk/server/meta_worker.rb (modified) (5 diffs)
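The README hunks below rework the description of BackgrounDRb's seven-field cron trigger, whose first field is seconds. As a reading aid, here is a small Ruby helper that expands a single field according to the rules the README spells out: "*" for every value, "a-b" for an inclusive range, comma-separated lists, and "a/b" for "start at a, then every b" (so "*/b" starts at the field's minimum). It is a hypothetical sketch, not BackgrounDRb's own trigger parser; expand_cron_field is an illustrative name, and discarding values outside the field's range is an assumption of the sketch.

```ruby
# Hypothetical helper, NOT BackgrounDRb's trigger parser: expand one field of
# a seven-field trigger (sec min hour day month weekday year) into the sorted
# list of values it matches within [min, max].
#   "*"    every value in [min, max]
#   "a-b"  inclusive range
#   "a/b"  start at a, then every b up to max ("*/b" starts at min)
#   "x,y"  union of the comma-separated parts
def expand_cron_field(field, min, max)
  values = field.split(',').flat_map do |part|
    base, step = part.split('/', 2)
    step = step ? Integer(step, 10) : nil

    bounds =
      if base == '*'
        [min, max]
      elsif base.include?('-')
        base.split('-', 2).map { |v| Integer(v, 10) }
      else
        start = Integer(base, 10)
        # a bare number matches only itself; "a/b" runs on to max
        [start, step ? max : start]
      end

    first, last = bounds
    (first..last).step(step || 1).to_a
  end

  # Assumption of this sketch: values outside the field's range are dropped.
  values.select { |v| v.between?(min, max) }.uniq.sort
end
```

Applied to the README's final example, the seconds field "28/5,59" expands to 28, 33, 38, 43, 48, 53, 58 and 59, and the minutes field "1-4,6,20" to 1, 2, 3, 4, 6 and 20. The README describes the month field "5,0/2" as including month 0, which lies outside month[1,12]; this sketch drops it and returns 2, 4, 5, 6, 8, 10 and 12.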
trunk/README
--- trunk/README (r272)
+++ trunk/README (r279)
@@ -7,8 +7,8 @@
 from http request/response cycle.
 
-This new release of BackgrounDRb is also modular and can be used without
-Rails. So any Ruby program or framework can use it.
+This new release of BackgrounDRb is also modular and can be used without Rails so that any Ruby program or framework can use it.
 
 Copyright (c) 2006 Ezra Zygmuntowicz,skaar[at]waste[dot]org,
+
 Copyright (c) 2007 Hemant Kumar (mail[at]gnufied[dot]org)
 
@@ -50,10 +50,8 @@
 | :trigger_args: */10 * * * * * *
 
-Above sample configuration file would schedule worker methods 'foobar' and 'barbar'
-to be executed at different trigger periods. Also, it would load production rails environment.
-If you skip the :environment option, development env will be loaded.
-
-NOTE: Please note that, because of addition of this feature, format of backgroundrb.yml
-has changed slightly and hence modify your config file according to this new option.
+The above sample configuration file would schedule worker methods 'foobar' and 'barbar' from within FooWorker to be executed at different trigger periods.
+Also, it would load production rails environment. If you skip the :environment option, development environment will be loaded by default.
+
+NOTE: Because of the addition of this feature the format of backgroundrb.yml has changed slightly and you must modify your config file according to this new option.
 
 
@@ -83,9 +81,11 @@
 
 - Cron Scheduling
-You can use configuration file for cron scheduling of workers. Method specified in configuration
-file would be called periodically. You should take care of the fact that, time gap between periodic
-invocation of a method should be more than the time thats actually required to execute the method.
-If a method takes longer time than the time window specified, your method invocations would lag
-perpetually.
+
+You can use a configuration file for cron scheduling of workers. The method specified in the configuration
+file would be called periodically. You should accommodate for the fact that the time gap between periodic
+invocation of a method should be more than the time that is actually required to execute the method.
+If a method takes longer time than the time window specified, your method invocations will lag
+perpetually.
+
 
 - Normal Scheduler
@@ -93,5 +93,5 @@
 
 - add_periodic_timer method
-A third and very basic form of scheduling that you can use is, "add_periodic_timer" method. You can call
+A third and very basic form of scheduling that you can use is, "add_periodic_timer" method. You can call this
 method from anywhere in your worker.
 
@@ -99,15 +99,14 @@
 add_periodic_timer(5) { say_hello }
 end
-
-Above snippet would register the proc for periodic execution at every 5 seconds.
+The above snippet would register the proc for periodic execution at every 5 seconds.
 
 === A Word about Cron Scheduler
 
-Note that the initial field in the BackgrounDRb cron trigger, specifies
+Note that the initial field in the BackgrounDRb cron trigger specifies
 seconds, not minutes as with Unix-cron.
 
 The fields (which can be an asterisk, meaning all valid patterns) are:
 
-sec[0,59] min[0,59], hour[0,23], day[1,31], month[1,12 [, weekday[0,6], year
+sec[0,59] min[0,59], hour[0,23], day[1,31], month[1,12], weekday[0,6], year
 
 The syntax pretty much follows Unix-cron. The following will trigger
@@ -116,19 +115,19 @@
 0 30 1 * * * *
 
-Following trigger will trigger specified method every 10 seconds:
+The following will trigger the specified method every 10 seconds:
 
 */10 * * * * * *
 
-Following trigger will trigger specified method every 1 hour:
+The following will trigger the specified method every 1 hour:
 
 0 0 * * * * *
 
 For each field you can use a comma-separated list. The following would
-trigger on the fifth, sixteenth and twenty-third minute every hour:
+trigger on the 5th, 16th and 23rd minute every hour:
 
 0 5,16,23 * * * * *
 
 Fields also support ranges, using a dash between values. The following
-triggers the eighth through the seventeenth hour, five past the hour:
+triggers from 8th through the 17th hour, at five past the hour:
 
 0 5 8-17 * * * *
@@ -139,7 +138,8 @@
 0 */5 6/2 * * * *
 
-At last a more contrived example: months 0,2,4,5,6,8,10,12, every day
-and hour, minutes 1,2,3,4,6,20, seconds: every fifth second counting
-from the twenty-eighth second plus the fifty-ninth second:
+Here is a more complex example: months 0,2,4,5,6,8,10,12, every day
+and hour, minutes 1,2,3,4,6,20, seconds: every 5th second counting
+from the 28th second plus the 59th second:
+
 
 28/5,59 1-4,6,20 */1 * 5,0/2 * *
@@ -153,10 +153,9 @@
 * Generate a Worker
 
-Install the plugin, and run setup task. Create a worker, using worker generator.
-
-./script/generatr worker bar
-
-You will have a bar_worker.rb in your RAILS_ROOT/lib/workers/( called WORKER_ROOT henceforth ).
-Generated code will look like this:
+Install the plugin and run the setup task (rake backgroundrb:setup). Now create a worker using worker generator.
+
+./script/generate worker bar
+
+This will create a bar_worker.rb in your RAILS_ROOT/lib/workers/ (called WORKER_ROOT henceforth). The generated code will look like this:
 
 class BarWorker < BackgrounDRb::MetaWorker
@@ -169,7 +168,7 @@
 end
 
-All the workers inside WORKER_ROOT directory will be automatically loaded and forked into a
-separate process. If you don't want to start one particular worker automatically, then you can
-use following class method to disable that behaviour:
+All the workers inside WORKER_ROOT directory will be automatically loaded and forked into a separate process.
+If you don't want to start one particular worker automatically you can use following class method (set_no_auto_load) to disable that behaviour:
+
 
 class DynamicWorker < BackgrounDRb::MetaWorker
@@ -178,14 +177,15 @@
 end
 
-'create' method gets called, when worker is loaded and created. Each worker runs in its
+The 'create' method gets called when a worker is loaded and created. Each worker runs in its
 own process and you can use 'create' for initializing worker specific stuff.
 
-Following code snippet, would ask bdrb to execute method 'add_values' in 'foo_worker' with
+The following code snippet would ask bdrb to execute method 'add_values' in 'foo_worker' with
 arguments '10+10' and return the result.
 
 MiddleMan.send_request(:worker => :foo_worker, :worker_method => :add_values,:data => "10+10")
 
-When you are using "send_request" method, you are expecting a result back and hence, above code
-will block until, your worker invokes a send response. The worker code, for handling above
+When you are using the 'send_request' method, you are expecting a result back. As such, the above code
+will block until your worker invokes a send response. The worker code for handling the above
+
 method would look like
 
@@ -207,35 +207,38 @@
 MiddleMan.ask_work(:worker => :foo_worker, :worker_method => :add_values, :data => "10+10")
 
-You can also use register_status as described in following snippet to register status of
+You can also use register_status as described in the following snippet to register the status of
 your worker with master, which can be directly queried from rails.
 
 register_status(some_status_data)
 
-From rails, you can query status object using following code:
+From rails, you can query status of your worker object using following code:
 
 MiddleMan.ask_status(:worker => :foo_worker)
 
-Above code would return status object of 'foo_worker'. When you call register_status
-from a worker, it replaces older state of the worker with master. Since, master process
-stores status of the worker, all the status queries are served by master itself. It can be
+The above code would return status object of 'foo_worker'. When you call register_status
+from a worker, it replaces the older state of the worker with master. Since master process
+stores the status of the worker, all the status queries are served by master itself. It can be
+
 used to store result hashes and stuff.
 
 * Starting and stopping a worker from Rails :
 
-All the workers can be dynamically started and stopped from rails. You can also use separate job_keys
+All workers can be dynamically started and stopped from rails. You can also use separate job_keys
 to run more than one copy of a worker at a time.
 
-For example, following code in a rails controller will start "error_worker" and schedule to run according to trigger arguments.
+For example, the following code in a rails controller will start "error_worker" and schedule it to run according to the arguments associated with trigger_args.
 
 MiddleMan.new_worker(:worker => :error_worker, :job_key => :hello_world,:data => "wow_man",:schedule => { :hello_world => { :trigger_args => "*/5 * * * * * *",:data => "hello_world" }})
 
-NOTE: Please note that first data argument would be passed to create method inside your worker, however
-one specified under :schedule heading would be used by worker method, when its schedule comes.
+NOTE: The first data argument will be passed to the create method inside your worker. However,
+one specified under the :schedule heading would be used by the worker method when its schedule comes.
+
 
 To stop a worker, you can use:
 MiddleMan.delete_worker(:worker => :error_worker, :job_key => :hello_world)
 
-If not job_key is specified, general worker name itself becomes job_key. You should create job_keys with
-care, so as for one worker, they are never the same.
+If no job_key is specified the general worker name itself becomes job_key.
+You should create job_keys with care so they are never the same for one worker class.
+
 
 * Starting and stopping from CLI :
@@ -249,6 +252,7 @@
 * Query Status/Result of a worker :
 
-All Workers, can log their results with master, using 'register_status' method, this status can be queried from
-rails using ask_status. For example:
+All Workers can log their results with master, using the 'register_status' method.
+This status can be queried from rails using ask_status. For example:
+
 
 class ProgressWorker < BackgrounDRb::MetaWorker
@@ -264,5 +268,5 @@
 end
 
-And using MiddleMan proxy, you can keep queering status of your progress bar:
+And using MiddleMan proxy, you can keep querying the status of your progress bar:
 
 MiddleMan.ask_status(:worker => :progress_worker)
@@ -270,5 +274,5 @@
 * Query status of All workers :
 
-You can also, query status of all currently running workers in one shot.
+You can also query the status of all currently running workers in one shot.
 
 def ask_status
@@ -278,21 +282,22 @@
 end
 
-Currently, when a worker is deleted/exits, its result/status is also gone and hence
-you can't query status of a worker, which is not running. This behaviour is expected to
-change.
+Currently, when a worker is deleted/exits, its result/status is also gone (i.e. you can't
+query the status of a worker which is not running). This behaviour is expected to change in future releases.
+
 
 * Important difference between MiddleMan.ask_work and MiddleMan.send_request :
 
-As noted previously ask_work is used, when you want one shot execution of a worker method,
-without waiting for results in rails. So, there aren't any explicit return statement is required.
-But, when you use MiddleMan.send_request, you are asking BDRB, ok please execute this method
-on worker and I would wait for results until method returns. Hence in this case, you must return
+As noted previously ask_work is used when you want one shot execution of a worker method
+without waiting for results in rails. So an explicit return statement is not required.
+But when you use MiddleMan.send_request, you are asking BDRB, "ok please execute this method
+on worker and I will wait for results until the method returns". Hence in this case, you must return
 the value you want to get back in rails.
 
-Since, not all objects can be dumped in ruby, and if you are trying to send an object, which
-can't be dumped, you will get error messages logged in your log file and will get error string in
-controller too.
-
-For example, lets say you are invoking method "hello_world" from 'foo_controller' like this:
+Not all objects can be dumped in ruby. If you are trying to send an object which
+can't be dumped, you will get error messages logged in your log file and will get an error string in your
+controller, too.
+
+
+For example, let's say you are invoking method "hello_world" from 'foo_controller' like this:
 
 worker_response = MiddleMan.send_request(:worker => :foo_worker, :worker_method => :hello_world)
@@ -305,13 +310,50 @@
 end
 
-Now, since a lambda can't be dumped, the worker_response that you will receive in controller will be,
-'invalid_result_dump_check_log' and appropriate error will be logged in backgroundrb.log file too.
-Now, originally this error could have potentially aborted BDRB worker hence make sure that you
+Now since a lambda can't be dumped, the worker_response that you will receive in your controller will be,
+'invalid_result_dump_check_log' and an appropriate error will also be logged in the backgroundrb.log file.
+Now, such an error could potentially abort the BDRB worker. Hence, make sure that you
 avoid such cases.
 
+* Running BackgrounDRb clusters and storing of results in Memcache cluster
+
+New version allows access to worker status objects even after a worker has died/exited. By default,
+this data would be held in Master Process memory. Those of you, who want to run, BackgrounDRb in
+a cluster, and if you run a BackgrounDRb server on each node and would rather want results to be
+stored in MemCache, you can use following option for storing results in MemCache:
+
+# backgroundrb.yml
+
+| :backgroundrb:
+| :port: 11006
+| :ip: 0.0.0.0
+| :log: foreground
+| :result_storage:
+| :memcache: "10.10.10.2:11211,10.10.10.6:11211"
+
+
+* Using Threads inside BackgrounDRb
+
+Remember BackgrounDRb follows event model of network programming, but sad truth of life is
+not all networking libraries follow this model and hence they make use of blocking IO and threads.
+But you need not fear, BackgrounDRb allows you to run all such tasks concurrently in threads
+which are internally managed by BackgrounDRb thread pool.
+
+Each worker has access to object "thread_pool" which can be used to run task in threads concurrently.
+
+thread_pool.defer(wiki_scrap_url) { |wiki_url| scrap_wikipedia(wiki_url) }
+
+So whatever task you specify within scrap_wikipedia() is going to run concurrently.
+
+WARNING: You shouldn't try to use +register_status+ method from within the block supplied to +defer+. Because, if you do that,
+you can get corrupted result hashes. However, if you are confident, you should wrap your status_hash ( or whatever data type, you
+are going to store as a status ) in a mutex and then use +register_status+ . It would make sure that, only one thread
+resisters status at a time.
+
+
 * Internal Server and Unhandled Exception Logging on console :
 
-Sometimes, you may want all the internal error messages and unhandled exceptions to appear on console.
-For that, you can start backgroundrb with config option :
+Sometimes you may want all the internal error messages and unhandled exceptions to appear on the console.
+For that, you can start backgroundrb with the following config option :
+
 
 # backgroundrb.yml
@@ -328,7 +370,9 @@
 
 === Testing
-* where will you be without test cases Phaedrus? New version comes with a baked in mechanism to write test cases.
-First make sure that, you have bdrb_test_helper.rb in test directory of your rails app ( run
-rake backgroundrb:setup, if you dont have one ).
+
+* where will you be without test cases Phaedrus? This new version comes with a baked in mechanism to write test cases.
+First make sure that you have bdrb_test_helper.rb in the test directory of your rails app (run
+rake backgroundrb:setup, if you dont have one).
+
 Just put your worker test cases in test/unit directory of your rails application and require the helper.
 Now, you should be good to go.
@@ -347,6 +391,7 @@
 === Legacy and deprecated stuff
 
-Although, You need to wrap your head a bit for understanding "evented" model of network programming,
-but it gets easier once you get hang of it. Much of the older stuff is deprecated. Here is a brief list:
+Although You need to wrap your head a bit to understanding the "evented" model of network programming,
+it gets easier once you get hang of it. Much of the older stuff is deprecated. Here is a brief list:
+
 
 - ACL : gone, trust to thy firewalls.
@@ -354,16 +399,17 @@
 === Exciting new stuff
 * Rock solid stable ( or will be , after few bug reports )
-* Each worker comes with Event loop of its own and can potentially do lots of fancy stuff. Two noteworthy methods are:
+* Each worker comes with an Event loop of its own and can potentially do lots of fancy stuff. Two noteworthy methods are:
 
 connect(ip,port,Handler)
 start_worker(ip,port,Handler)
 
-If you are familiar with EventMachine or Twisted style of network programming, above methods allow you to
-start tcp servers inside your workers or lets you connect to external tcp servers. For Each accepted client or
-connected socket a instance of Handler class would be created and integrated with main event loop.
+If you are familiar with the EventMachine or Twisted style of network programming, the above methods allow you to
+start tcp servers inside your workers or let you connect to external tcp servers. For Each accepted client or
+connected socket, an instance of Handler class would be created and integrated with main event loop.
 This can be used for worker to worker communication between backgroundrb servers running on two machines.
 
-You are encouraged to look into framework directory, and see the code that implements all this stuff. Guts of
-new bdrb is based on this library, which would be released soon as separately.
+You are encouraged to look into framework directory and see the code that implements all this stuff. The guts of
+this new version of bdrb is based on this library which will be released soon as a separate entity.
+
 
 == Online Resources

trunk/framework/core.rb
--- trunk/framework/core.rb (r275)
+++ trunk/framework/core.rb (r279)
@@ -144,4 +144,5 @@
       loop do
         check_for_timer_events
+        user_thread_window #=> let user level threads run for a while
         ready_fds = select(@read_ios,@write_ios,nil,0.005)
         #next if ready_fds.blank?
@@ -158,4 +159,8 @@
         end
       end
+    end
+
+    def user_thread_window
+      run_user_threads if respond_to?(:run_user_threads)
     end
 

trunk/framework/packet_master.rb
--- trunk/framework/packet_master.rb (r275)
+++ trunk/framework/packet_master.rb (r279)
@@ -14,5 +14,5 @@
     def self.run
       master_reactor_instance = new
-      master_reactor_instance.result_hash = {}
+      # master_reactor_instance.result_hash = {}
       master_reactor_instance.live_workers = DoubleKeyedHash.new
       yield(master_reactor_instance)
@@ -20,4 +20,8 @@
       master_reactor_instance.start_reactor
     end # end of run method
+
+    def set_result_hash(hash)
+      @result_hash = hash
+    end
 
     def update_result(worker_key,result)

trunk/server/master_worker.rb
--- trunk/server/master_worker.rb (r278)
+++ trunk/server/master_worker.rb (r279)
@@ -123,18 +123,21 @@
 
   class MasterProxy
+    attr_accessor :config_file
     def initialize
-      config_file = YAML.load(ERB.new(IO.read("#{RAILS_HOME}/config/backgroundrb.yml")).result)
-      debug_logger = DebugMaster.new(config_file[:backgroundrb][:log])
+      raise "Running old Ruby version, upgrade to Ruby >= 1.8.5" unless check_for_ruby_version
+      @config_file = YAML.load(ERB.new(IO.read("#{RAILS_HOME}/config/backgroundrb.yml")).result)
+      debug_logger = DebugMaster.new(@config_file[:backgroundrb][:log])
 
-      load_rails_env(config_file)
+      load_rails_env
       Packet::Reactor.run do |t_reactor|
+        enable_memcache_result_hash(t_reactor) if @config_file[:backgroundrb][:result_storage] && @config_file[:backgroundrb][:result_storage][:memcache]
         t_reactor.start_worker(:worker => :log_worker)
-        t_reactor.start_server(config_file[:backgroundrb][:ip],config_file[:backgroundrb][:port],MasterWorker) { |conn| conn.debug_logger = debug_logger }
+        t_reactor.start_server(@config_file[:backgroundrb][:ip],@config_file[:backgroundrb][:port],MasterWorker) { |conn| conn.debug_logger = debug_logger }
       end
     end
 
-    def load_rails_env(config_file)
+    def load_rails_env
       db_config_file = YAML.load(ERB.new(IO.read("#{RAILS_HOME}/config/database.yml")).result)
-      run_env = config_file[:backgroundrb][:environment] || 'development'
+      run_env = @config_file[:backgroundrb][:environment] || 'development'
       ENV["RAILS_ENV"] = run_env
       RAILS_ENV.replace(run_env) if defined?(RAILS_ENV)
@@ -142,5 +145,23 @@
       ActiveRecord::Base.allow_concurrency = true
     end
-  end
+
+    def enable_memcache_result_hash(t_reactor)
+      require 'memcache'
+      memcache_options = {
+        :c_threshold => 10_000,
+        :compression => true,
+        :debug => false,
+        :namespace => 'backgroundrb_result_hash',
+        :readonly => false,
+        :urlencode => false
+      }
+      cache = MemCache.new(memcache_options)
+      cache.servers = @config_file[:backgroundrb][:result_storage][:memcache].split(',')
+      t_reactor.set_result_hash(cache)
+    end
+
+    def check_for_ruby_version; return RUBY_VERSION >= "1.8.5"; end
+
+  end # end of module BackgrounDRb
 end
 

trunk/server/meta_worker.rb
--- trunk/server/meta_worker.rb (r277)
+++ trunk/server/meta_worker.rb (r279)
@@ -14,4 +14,74 @@
     end
   end
+
+  class WorkData
+    attr_accessor :data,:block
+    def initialize(*args,&block)
+      @data = args
+      @block = block
+    end
+  end
+
+  class ThreadPool
+    attr_accessor :size
+    attr_accessor :threads
+    attr_accessor :work_queue
+    def initialize(size)
+      @size = size
+      @threads = []
+      @work_queue = Queue.new
+      @running_tasks = Queue.new
+      @size.times { add_thread }
+    end
+
+    # can be used to make a call in threaded manner
+    # passed block runs in a thread from thread pool
+    # def fetch_url(url)
+    #   puts "fetching url #{url}"
+    #   thread_pool.defer(url) do |url|
+    #     begin
+    #       data = Net::HTTP.get(url,'/')
+    #       File.open("#{RAILS_ROOT}/log/pages.txt","w") do |fl|
+    #         fl.puts(data)
+    #       end
+    #     rescue
+    #       logger.info "Error downloading page"
+    #     end
+    #   end
+    # end
+    # you can invoke above method from rails as:
+    # MiddleMan.ask_work(:worker => :rss_worker, :worker_method => :fetch_url, :data => "www.example.com")
+
+    def defer(*args,&block)
+      @work_queue << WorkData.new(args,&block)
+    end
+
+    def add_thread
+      @threads << Thread.new do
+        while true
+          task = @work_queue.pop
+          @running_tasks << task
+          if task.data && !task.data.empty?
+            task.block.call(*(task.data))
+          else
+            task.block.call
+          end
+          @running_tasks.pop
+        end
+      end
+    end
+
+    # method ensures exclusive run of deferred tasks for 2 seconds, so as they do get a chance to run.
+    def exclusive_run
+      if @running_tasks.empty? && @work_queue.empty?
+        return
+      else
+        puts "going to sleep for a while"
+        sleep(2)
+        return
+      end
+    end
+  end
+
   # == MetaWorker class
   # BackgrounDRb workers are asynchrounous reactors which work using events
@@ -78,9 +148,11 @@
   class MetaWorker < Packet::Worker
     attr_accessor :config_file, :my_schedule, :run_time, :trigger_type, :trigger
-    attr_accessor :logger
+    attr_accessor :logger, :thread_pool
 
     # does initialization of worker stuff and invokes create method in
     # user defined worker class
     def worker_init
+      @thread_pool = ThreadPool.new(20)
+
       @config_file = YAML.load(ERB.new(IO.read("#{RAILS_HOME}/config/backgroundrb.yml")).result)
       # load_rails_env
@@ -93,5 +165,6 @@
       end
       if respond_to?(:create)
-        create(@worker_options[:data])
+        create_arity = method(:create).arity
+        (create_arity == 0) ? create : create(@worker_options[:data])
       end
       @logger.info "#{worker_name} started"
@@ -166,4 +239,6 @@
     end
 
+    # probably this method should be made thread safe, so as a method needs to have a
+    # lock or something before it can use the method
     def register_status p_data
       status = {:type => :status,:data => p_data}
@@ -229,4 +304,9 @@
     end
 
+    # method would allow user threads to run exclusively for a while
+    def run_user_threads
+      @thread_pool.exclusive_run
+    end
+
     # we are overriding the function that checks for timers
     # def check_for_timer_events
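The ThreadPool added to trunk/server/meta_worker.rb in this changeset is a fixed set of threads that pop WorkData items off a Queue. The following is a minimal, self-contained sketch of the same pattern in plain Ruby for experimenting outside a worker; SketchPool, Task and drain are illustrative names, not BackgrounDRb API. It deliberately differs from the committed code in two places: arguments given to defer reach the block unchanged (in the diff, defer(*args) hands the args array to WorkData.new(*args), so a single-parameter block appears to receive an array rather than the value), and each task runs inside a rescue so that an exception ends only that task, not the pool thread.

```ruby
require 'thread'

# Minimal fixed-size thread pool, modelled loosely on the ThreadPool that
# r279 adds to server/meta_worker.rb. SketchPool, Task and #drain are
# illustrative names only; they are not part of BackgrounDRb.
class SketchPool
  Task = Struct.new(:args, :block)

  def initialize(size)
    @work_queue = Queue.new            # tasks waiting for a free thread
    @lock = Mutex.new                  # guards @pending
    @all_done = ConditionVariable.new  # signalled when @pending drops to 0
    @pending = 0                       # tasks queued or currently running
    @threads = Array.new(size) { spawn_thread }
  end

  # Queue a block; the arguments are handed to the block unchanged.
  def defer(*args, &block)
    @lock.synchronize { @pending += 1 }
    @work_queue << Task.new(args, block)
  end

  # Block the caller until every deferred task has finished (sketch-only helper).
  def drain
    @lock.synchronize { @all_done.wait(@lock) while @pending > 0 }
  end

  private

  def spawn_thread
    Thread.new do
      loop do
        task = @work_queue.pop # blocks until work arrives
        begin
          task.block.call(*task.args)
        rescue StandardError => e
          # keep this pool thread alive even if one task fails
          warn "deferred task failed: #{e.message}"
        ensure
          @lock.synchronize do
            @pending -= 1
            @all_done.broadcast if @pending.zero?
          end
        end
      end
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  # Shared status is updated under a Mutex, as the README advises for
  # anything touched from inside deferred blocks.
  status = {}
  status_lock = Mutex.new
  pool = SketchPool.new(4)
  %w[alpha beta gamma].each do |name|
    pool.defer(name) { |n| status_lock.synchronize { status[n] = n.length } }
  end
  pool.drain
  p status
end
```

The drain helper exists only so a caller (or a test) can wait for queued work; the committed ThreadPool instead relies on exclusive_run, which sleeps for two seconds whenever tasks are queued or running. The guarded demo updates a shared hash under a Mutex, in line with the README's warning about calling register_status from deferred blocks.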
