Please report new issues at https://github.com/DOCGroup
When I launched the NotificationService (manually or via the IMR) I got the same error:
  Loading the Cos Notification Service...
  in TAO_Properties ctos 4005cf9c
  Running 1 server threads
  Cannot activate clients threads
  Failed to initialize the Notification Service.
  ...
For information, this worked fine on the same platform with the same compiler with the ACE/TAO 1.3.1/5.3.1 release.
Greg (Greg_Mulyar@cable.comcast.com) sent the following patch to the list:
The patch below fixes the Notify_Service.cpp problem by reverting it back to where it was in older versions, and it seems to work OK.
8<---- 8<---- 8<----
--- ACE_wrappers/TAO/orbsvcs/Notify_Service/Notify_Service.cpp.old Tue Nov 18 11:11:37 2003
+++ ACE_wrappers/TAO/orbsvcs/Notify_Service/Notify_Service.cpp Tue Nov 18 11:13:03 2003
@@ -105,14 +105,14 @@
 // Task activation flags.
 long flags =
 THR_NEW_LWP |
- THR_JOINABLE |
- this->orb_->orb_core ()->orb_params ()->thread_creation_flags ();
+ THR_JOINABLE ; // |
+ // this->orb_->orb_core ()->orb_params ()->thread_creation_flags ();
- int priority = ACE_Sched_Params::priority_min (this->orb_->orb_core ()->orb_params ()->sched_policy ()
- , this->orb_->orb_core ()->orb_params ()->scope_policy ());
+ // int priority = ACE_Sched_Params::priority_min (this->orb_->orb_core ()->orb_params ()->sched_policy ()
+ // , this->orb_->orb_core ()->orb_params ()->scope_policy ());
- if (worker_.activate (flags,
- this->nthreads_, 0, priority) != 0)
+ // if (worker_.activate (flags,this->nthreads_, 0, priority) != 0)
+ if (worker_.activate (flags,this->nthreads_) != 0)
 ACE_ERROR_RETURN ((LM_ERROR, "Cannot activate client threads\n"), -1);
 }
8<---- 8<---- 8<----
Accepting.
Simon, any idea when this will be tested and integrated? We also see it in our HP builds.
Hi Johnny,
The patch as it is is apparently not suitable for integration. Doug said at the time:
"Sigh... We need to get a fix here that works properly on HP-UX and other platforms. Simon, can you please take a look at this HP-UX stuff when you come up for air and see if you can figure out a solution that works for HP-UX and other platforms without commenting out all the existing code?! I've enclosed Greg's workaround below, which solves the problem but not really in the "Right Way"[TM]."
I'm afraid I never got any time to look into this further, and I'm not in a position to commit to dedicating any time to doing so now. I'm assigning this over to Paul Morrison, who has responsibility for allocating resources to issues like this, so that he can answer you.
Any idea when someone has time to look at this? All notification service tests fail on HPUX.
Updated version numbers. This is still a problem on HPUX that prevents all notification tests from running.
Created attachment 544 [details] smaller patch for the client threads issue
I have the same problem on x86_64-pc-linux-gnu and on i686-pc-linux-gnu. When I start Notify_Service with the following parameters:
Notify_Service -Boot -d -ORBEndPoint iiop://localhost:6142 -Factory NotificationService -ORBSvcConf /home/emil/disimbin/disim-main-energy/notifysvc.conf
and with the following notifysvc.conf:
=====================================
static Notify_Default_Event_Manager_Objects_Factory "-DispatchingThreads 5"
static Server_Strategy_Factory "-ORBconcurrency thread-per-connection"
=====================================
then I get the following output:
>>>>>>>>>>>>>>>>>>>>>>>>>>
Using 5 threads for each ConsumerAdmin.
Loading the Cos Notification Service...
Running 1 ORB threads
Cannot activate client threads
Failed to initialize the Notification Service.
pure virtual method called
terminate called without an active exception
Aborted
>>>>>>>>>>>>>>>>>>>>>>>>>>>
A colleague of mine discovered that the problem is that there is no way to change the default thread creation flags, and proposed a patch that is similar to the previous one but smaller.
For the smaller patch that works on x86_64-pc-linux-gnu and on i686-pc-linux-gnu see http://deuce.doc.wustl.edu/bugzilla/showattachment.cgi?attach_id=544
Back into the pool.
Steve, could OCI maybe look at this issue as the main maintainer of the notification service?
This is a known problem on HP-UX. We have not seen the problem on any other platform. The solution is documented in the OCI TAO 1.4a Developer's Guide. Here is the note from the text that describes the problem and the solution:
<quote>
When the Notification Service creates threads for a thread pool, it specifies a default priority as part of the thread creation parameters. On HP-UX version 10 and later, the process owner must be a member of a group that has the RTSCHED privilege in order to specify a priority for a new thread. Without this privilege, the thread creation operation will return an error (EPERM). Other operating systems do not exhibit this behavior.
On HP-UX, you can add the RTSCHED privilege to all members of a group with the setprivgrp(1M) command. You must be super-user to execute this command. For example, for a group named "corbausers", the command would be entered as follows:
# setprivgrp corbausers RTSCHED
The user that starts the Notification Service should be a member of this group. By default, the effects of the setprivgrp command are lost after a reboot. See <http://www.faqs.org/faqs/hp/hpux-faq/> for information on how to ensure the privilege group changes become permanent.
</quote>
To make this permanent, you must edit /etc/privgroup and add a line,
corbausers RTSCHED
which will be used by the /sbin/init.d/set_prvgrp start command on the next reboot to re-enable this privilege.
Thanks, Steve, for sorting this out. I will change our HPUX system to use this and see what happens. I couldn't find this explained in the documentation in the TAO distribution; can it be added? Is it possible to check the privileges in the perl script when run on HPUX and, when they are not set correctly, give an understandable error with a reference to the documentation?
I have quickly checked HPUX: getprivgrp returns the setting. I am not a perl wizard, but it seems it should be possible, when on HPUX, to call getprivgrp, grep the output for RTSCHED, and return an error if it is not there.
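The check described above could look roughly like the following. This is a minimal sketch in C++ rather than the Perl used by the test scripts. The helper name `has_privilege` and the sample strings are hypothetical, and the only assumption about `getprivgrp` is that its output lists privilege names as whitespace-separated tokens.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `privilege` appears as a whole
// whitespace-separated token in the captured output of getprivgrp.
// The exact output layout is an assumption; only token matching is used.
bool has_privilege(const std::string& getprivgrp_output,
                   const std::string& privilege) {
  assert(!privilege.empty());
  std::istringstream tokens(getprivgrp_output);
  std::string token;
  while (tokens >> token) {
    if (token == privilege) return true;
  }
  return false;
}
```

A test script could run `getprivgrp`, pass the captured text to such a helper, and print a message pointing at the documentation when RTSCHED is missing.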
Re-assigning to Chad Elliott.
The solution Steve described does resolve the runtime problems on our build system. The test scripts should then be changed to check the preconditions on HPUX.
Couldn't we compensate for this programmatically? If the task activate fails, couldn't we try a lower priority? Apparently, from other comments on this bug, it isn't restricted to just HP-UX.
I don't think it is a matter of choosing a priority that is too high, but trying to set *any* priority on HP-UX if not in the RTSCHED class. You may be right, it may be possible to detect the failure programmatically. But, what should the Notification Service do in that case? It is a service, so it should probably not keep running if it cannot run properly. That will still show up as a failure from a test's point of view. As I mentioned, we've never seen this problem on any platform other than HP-UX. Perhaps Emil Georgiev could expand on the comments about seeing it on Linux to explain more about the scenario, the platform, permissions, etc.
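Detecting the failure programmatically might look like the sketch below. It is not TAO code: `ActivateFn` is a hypothetical stand-in for a call such as `worker_.activate()`, and it assumes the call returns -1 with `errno` set to EPERM when the priority cannot be applied. Whether carrying on with the default priority is acceptable for a service is exactly the open question raised above.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical stand-in for a call such as worker_.activate(): the argument
// says whether to request an explicit priority. Assumed to return 0 on
// success, or -1 with errno set on failure.
using ActivateFn = std::function<int(bool with_priority)>;

enum class Activation { WithPriority, DefaultPriority, Failed };

// Tries to activate with an explicit priority first and retries without one
// only when the failure was a permission error.
Activation activate_with_fallback(const ActivateFn& activate) {
  assert(static_cast<bool>(activate));
  errno = 0;
  if (activate(true) == 0) return Activation::WithPriority;
  // Any error other than EPERM is treated as a real failure.
  if (errno != EPERM) return Activation::Failed;
  return activate(false) == 0 ? Activation::DefaultPriority
                              : Activation::Failed;
}
```

The caller can log a warning for `Activation::DefaultPriority` or treat it as fatal, depending on which policy is preferred.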
The only code change I would like to see is that the notification service exits with a more understandable error than it does now. We now get a "pure virtual function called"; it would be much better if the notification service exited with an error message referring to the documentation. That way, when someone starts the naming service outside the test framework, they get an understandable message.
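A more understandable message could be built along these lines. This is only a sketch: the function name `activation_error_message` and the exact wording are hypothetical, and it assumes the error number from the failed thread creation is available to the caller.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: builds a diagnostic for a failed thread activation.
// The base text matches the existing "Cannot activate client threads"
// message; the EPERM hint is illustrative wording only.
std::string activation_error_message(int error_number) {
  std::string message = "Cannot activate client threads";
  if (error_number == EPERM) {
    message += ": insufficient privilege to set a thread priority"
               " (on HP-UX the process owner needs the RTSCHED privilege,"
               " see setprivgrp(1M))";
  }
  return message;
}
```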
Just found on the scoreboard that the new FC6 builds seem to have the same problems, see http://www.dre.vanderbilt.edu/scoreboard/FC6_IPV6/2007_01_16_01_45_Brief.html#section_7
I can reproduce this on Linux. I'm debugging it now.
This is an issue with static destruction. In frame 6, there is a call to CORBA::release() on a CORBA::TypeCode_ptr. This call should be fine, considering it uses a null reference counting policy. However, the static object to which it points has already been destroyed. So any attempt to dereference it causes the pure virtual function called issue.
#0 0x00000039e9f2f3b0 in raise () from /lib64/libc.so.6
#1 0x00000039e9f30860 in abort () from /lib64/libc.so.6
#2 0x00000039ec6bdd1f in __gnu_cxx::__verbose_terminate_handler () from /usr/lib64/libstdc++.so.6
#3 0x00000039ec6bc156 in __gxx_personality_v0 () from /usr/lib64/libstdc++.so.6
#4 0x00000039ec6bc17b in std::terminate () from /usr/lib64/libstdc++.so.6
#5 0x00000039ec6bc5bb in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6
#6 0x00002aaaac28327d in CORBA::release (obj=0x2aaaabb435a0) at /tao_builds/chad/doc/build/linux/TAO/tao/AnyTypeCode/TypeCode.inl:19
#7 0x00002aaaab9d5840 in TAO::Any_Dual_Impl_T<NotifyExt::ThreadPoolParams>::free_value (this=0x52f380) at /tao_builds/chad/doc/build/linux/TAO/tao/AnyTypeCode/Any_Dual_Impl_T.cpp:199
#8 0x00002aaaac249c7c in TAO::Any_Impl::_remove_ref (this=0x52f380) at AnyTypeCode/Any_Impl.cpp:104
#9 0x00002aaaac2422f4 in ~Any (this=0x524060) at AnyTypeCode/Any.cpp:49
#10 0x00002aaaaae5b34a in ~Property (this=0x524058) at ../../orbsvcs/orbsvcs/CosNotificationC.h:157
#11 0x00002aaaaae5b5d7 in TAO::details::unbounded_value_allocation_traits<CosNotification::Property, true>::freebuf (buffer=0x524058) at /tao_builds/chad/doc/build/linux/TAO/tao/Unbounded_Value_Allocation_Traits_T.h:45
#12 0x00002aaaaae5b601 in TAO::details::generic_sequence<CosNotification::Property, TAO::details::unbounded_value_allocation_traits<CosNotification::Property, true>, TAO::details::value_traits<CosNotification::Property, true> >::freebuf (buffer=0x524058) at /tao_builds/chad/doc/build/linux/TAO/tao/Generic_Sequence_T.h:300
#13 0x00002aaaaae5b629 in ~generic_sequence (this=0x52f2c0) at /tao_builds/chad/doc/build/linux/TAO/tao/Generic_Sequence_T.h:138
#14 0x00002aaaab944191 in ~unbounded_value_sequence (this=0x52f2c0) at /tao_builds/chad/doc/build/linux/TAO/tao/Unbounded_Value_Sequence_T.h:25
#15 0x00002aaaab93e9b3 in ~PropertySeq (this=0x52f2b8) at CosNotificationC.cpp:269
#16 0x00002aaaaae9c5de in ~TAO_Notify_Properties (this=0x52f278) at Notify/Properties.cpp:41
#17 0x00002aaaaae9d3f2 in ~TAO_Singleton (this=0x52f270) at /tao_builds/chad/doc/build/linux/TAO/tao/TAO_Singleton.h:48
#18 0x00002aaaaae9d46f in TAO_Singleton<TAO_Notify_Properties, ACE_Thread_Mutex>::cleanup (this=0x52f270) at /tao_builds/chad/doc/build/linux/TAO/tao/TAO_Singleton.cpp:107
#19 0x00002aaaac85c0d0 in ace_cleanup_destroyer (object=0x52f270, param=0x0) at Cleanup.cpp:33
#20 0x00002aaaac85c636 in ACE_OS_Exit_Info::call_hooks (this=0x517e58) at Cleanup.cpp:183
#21 0x00002aaaac5837d2 in TAO_Singleton_Manager::fini (this=0x517e30) at TAO_Singleton_Manager.cpp:249
#22 0x00002aaaac583595 in TAO_SINGLETON_MANAGER_CLEANUP_DESTROYER_NAME () at TAO_Singleton_Manager.cpp:54
#23 0x00002aaaac85c670 in ACE_OS_Exit_Info::call_hooks (this=0x50b1b8) at Cleanup.cpp:188
#24 0x00002aaaac895bc9 in ACE_Object_Manager::fini (this=0x50b1a0) at Object_Manager.cpp:609
#25 0x00002aaaac895f35 in ~ACE_Object_Manager (this=0x50b1a0) at Object_Manager.cpp:318
#26 0x00002aaaac896210 in ~ACE_Object_Manager_Manager (this=0x2aaaaca7b840) at Object_Manager.cpp:765
#27 0x00002aaaac897f5c in __tcf_0 () at Object_Manager.cpp:773
#28 0x00000039e9f31dcc in __cxa_finalize () from /lib64/libc.so.6
#29 0x00002aaaac80a733 in __do_global_dtors_aux () from /tao_builds/chad/doc/build/linux/ACE/lib/libACE.so.5.5.4
#30 0x00007fffffdc1180 in ?? ()
#31 0x00002aaaac8fd8b1 in _fini () from /tao_builds/chad/doc/build/linux/ACE/lib/libACE.so.5.5.4
#32 0x0000000000000000 in ?? ()
Created attachment 649 [details] Here is a possible "solution" to this problem. Unfortunately, it requires leaking memory but it would only happen once during program execution since TAO_Notify_Properties is a singleton.
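The attachment itself is not reproduced here. The general idea of a deliberately leaked singleton can be sketched as follows, with `Properties` as a hypothetical stand-in for TAO_Notify_Properties and `leaked_properties` as an illustrative accessor name. The object is allocated once and never deleted, so its destructor cannot run after another library's statics have already been destroyed.

```cpp
#include <cassert>

// Hypothetical stand-in for TAO_Notify_Properties.
struct Properties {
  static int destroyed;          // counts destructor calls, for illustration
  int dispatching_threads = 1;
  ~Properties() { ++destroyed; }
};
int Properties::destroyed = 0;

// Intentionally leaked singleton: allocated on first use and never deleted.
// The one-time leak is reclaimed by the OS at process exit.
Properties& leaked_properties() {
  static Properties* const instance = new Properties;
  return *instance;
}
```

The cost is a single allocation that memory checkers may report at exit, which matches the "only once during program execution" trade-off described above.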
Is it possible to get rid of the singleton? What if we want to run multiple notification services with different settings in the same process?
Anything's possible, but removing the TAO_Notify_Properties singleton would be a big change. 32 files under orbsvcs/orbsvcs/Notify use TAO_Notify_Properties. It may be easier to allow the Notify_Service to explicitly set the singleton from a stack or heap allocated TAO_Notify_Properties object to avoid the static destruction problem.
Created attachment 650 [details] Another possible "solution". This modification changes TAO_Notify_Properties into more of a holder than a singleton.
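The attachment is not shown in this thread. A holder in the sense described, where the caller supplies a stack or heap allocated object and the class only stores a pointer to it, might be sketched like this. `Properties` and `PropertiesHolder` are hypothetical names, and this sketch omits any fallback to an implicitly created singleton.

```cpp
#include <cassert>

// Hypothetical stand-in for TAO_Notify_Properties.
struct Properties {
  int dispatching_threads = 1;
};

// Holder: stores a pointer to an object owned by someone else, so the owner
// decides when the object is destroyed.
class PropertiesHolder {
public:
  // Installs `p` as the current instance (nullptr clears it) and returns
  // the previously installed pointer.
  static Properties* instance(Properties* p) {
    Properties* previous = current_;
    current_ = p;
    return previous;
  }

  // Returns the current instance, or nullptr if none is installed.
  static Properties* instance() { return current_; }

private:
  static Properties* current_;
};
Properties* PropertiesHolder::current_ = nullptr;
```

With this shape, an executable can install an object that lives inside `main` and clear the pointer before returning, so destruction happens before any static destructors run.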
Seems the holder is a good step forward in resolving the static problem. Can we get that into x.5.5 and then maybe get rid of the singleton completely in x.5.6?
The work-around is in and functional so I'm decreasing the priority and severity. I also looked at removing the properties singleton and it will be pretty extensive.
This one seems to block 2926: at program exit the destruction of the singleton causes a crash because the typecodes are already unloaded.
(In reply to comment #28)
> This one seems to block 2926, the singleton causes a crash because at the
> program exit the destruction of the singleton causes a crash because the
> typecodes are already unloaded
This bug is assigned to Chad, but he is very busy with a customer-funded activity. I will see if someone else can look into this issue, with Chad's guidance, next week.
Looks like the workaround is only used for the executable, not for the DLL that can be loaded on demand; we need to add it there also. The singleton shouldn't be destructed at this moment; we can only leak it at process shutdown and let the OS free the memory. The full fix would be to not use a singleton.
proposed related patch
when the notification service is loaded as dll, then the Notify_server in the notify_service directory (as in the last patch from Chad) is not used at all. Files below are from tao/orbsvcs/orbsvcs/notify. Chad, what do you think?
Index: CosNotify_Service.h
===================================================================
--- CosNotify_Service.h (revision 78772)
+++ CosNotify_Service.h (working copy)
@@ -21,6 +21,7 @@
 #include "orbsvcs/Notify/Service.h"
 #include "orbsvcs/Notify/Builder.h"
+#include "orbsvcs/Notify/Properties.h"
 #include "orbsvcs/Notify/Factory.h"
 TAO_BEGIN_VERSIONED_NAMESPACE_DECL
@@ -87,6 +88,9 @@
 /// Service component for building NS participants.
 ACE_Auto_Ptr< TAO_Notify_Builder > builder_;
+
+ /// Notify properties
+ TAO_Notify_Properties properties_;
 };
===================================================================
--- CosNotify_Service.cpp (revision 78772)
+++ CosNotify_Service.cpp (working copy)
@@ -1,7 +1,6 @@
 // $Id$
 #include "orbsvcs/Notify/CosNotify_Service.h"
-#include "orbsvcs/Notify/Properties.h"
 #include "orbsvcs/Notify/Default_Factory.h"
 #include "orbsvcs/Notify/Builder.h"
 #include "ace/Sched_Params.h"
@@ -19,9 +18,10 @@
 TAO_CosNotify_Service::TAO_CosNotify_Service (void)
 {
+ TAO_Notify_Properties::instance (&properties_);
 }
-TAO_CosNotify_Service::~TAO_CosNotify_Service ()
+TAO_CosNotify_Service::~TAO_CosNotify_Service (void)
 {
 }
Chad, with my patch, I think we could zap the workaround in Notify_Service/Notify_Server.cpp, no need to do it there, what do you think?
(In reply to comment #32)
> Chad, with my patch, I think we could zap the workaround in
> Notify_Service/Notify_Server.cpp, no need to do it there, what do you think?
>
I tested this patch and it doesn't seem to work. The same thing happens as happened before. The global objects from AnyTypeCode library are destroyed before the TAO_Notify_Properties in CosNotification_Serv and it dumps core. When the properties is in the executable, it gets destroyed before AnyTypeCode is unloaded.
> I tested this patch and it doesn't seem to work. The same thing happens as
> happened before. The global objects from AnyTypeCode library are destroyed
> before the TAO_Notify_Properties in CosNotification_Serv and it dumps core.
> When the properties is in the executable, it gets destroyed before AnyTypeCode
> is unloaded.
Chad, which tests did you run? I tried several of the notification service tests and these run fine on my Linux host. I also checked the FC6_IPV6 build and looked at the failing tests from Jan 16th; these also run fine.
(In reply to comment #34)
> Chad, which tests did you run, I tried several of the notification service
> tests, these run fine on my linux host, also I checked the FC6_IPV6 build and
> look at the failing tests from jan 16th, these also run fine.
>
Unfortunately, there is no regression test. However, it would be very easy for someone with the time to add one. Using the Notify_Service, I tested it by commenting out the properties setting in Notify_Server.cpp main and adding a "return -1;" on line 131 of Notify_Service.cpp just above the worker_.activate() call.
After these modifications, just running the Notify_Service (along with a Name Service) will cause the problem.
> Unfortunately, there is no regression test. However, it would be very easy for
> some one with the time to add one. Using the Notify_Service, I tested it by
> commenting out the properties setting in Notify_Server.cpp main and adding a
> "return -1;" on line 131 of Notify_Service.cpp just above the
> worker_.activate() call.
>
> After these modifications, just running the Notify_Service (along with a Name
> Service) will cause the problem.
Ok, that was unknown to me. Would it be an option to zap the fallback to the TAO_Singleton in the properties? I suspect that the crash happens because somewhere in the shutdown somebody tries to access the properties and this has not been set yet, so an implicit singleton is created, and destruction of that at shutdown then crashes. The cosnotify_service can then set the static pointer to 0 at destruction; maybe in some places during shutdown we then have to add a safety check that the properties could be 0.
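The idea of clearing the static pointer at destruction and guarding accesses against 0 can be sketched as below. All names (`Properties`, `Service`, `g_properties`, `dispatching_threads_or`) are hypothetical and not taken from TAO. The sketch only covers access after the owner has gone away; it does not by itself control the order in which libraries destroy their statics.

```cpp
#include <cassert>

// Hypothetical stand-in for TAO_Notify_Properties.
struct Properties {
  int dispatching_threads = 1;
};

// Hypothetical global access point; null when no service is alive.
Properties* g_properties = nullptr;

// Hypothetical service: publishes its own Properties member while alive and
// clears the pointer again in its destructor.
class Service {
public:
  Service() { g_properties = &properties_; }
  ~Service() {
    if (g_properties == &properties_) g_properties = nullptr;
  }
  Service(const Service&) = delete;
  Service& operator=(const Service&) = delete;

private:
  Properties properties_;
};

// Shutdown-safe accessor: returns `fallback` when nothing is registered.
int dispatching_threads_or(int fallback) {
  return g_properties ? g_properties->dispatching_threads : fallback;
}
```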
(In reply to comment #36)
> Ok, that was unknown to me. Would it be an option to zap the fallback to the
> TAO_Singleton in the properties? I suspect that the crash happens because
> somewhere in the shutdown somebody tries to access the properties and this has
> not been set yet, so an implicit singleton is created and destruction of that
> at shutdown then crashes. The cosnotify_service can then set the static pointer
> to 0 at destruction, maybe on some places during shutdown we then have to add a
> safety that the properties could be 0
>
No, the problem is that in the destruction of the TAO_Notify_Properties, it accesses one of the static type codes found in the TAO_AnyTypeCode library. The static has already been destroyed and thus the core dump. As long as we can ensure that the destruction of the TAO_Notify_Properties occurs before the static destruction in the TAO_AnyTypeCode library we're fine.
my patch also fails on head. Chad, do you know a place in the notification service that would be safe to create the properties and that works in an exe and when the notification service dll is loaded on demand?
Thu Jul 5 13:35:00 UTC 2007 Johnny Willemsen <jwillemsen@remedy.nl>
        * orbsvcs/Notify_Service/Notify_Server.cpp
        * orbsvcs/Notify_Service/Notify_Service.cpp
        * orbsvcs/orbsvcs/Notify/CosNotify_Service.cpp
        * orbsvcs/orbsvcs/Notify/CosNotify_Service.h
          Reverted my change below, it breaks the executable
Thu Jul 5 06:22:00 UTC 2007 Johnny Willemsen <jwillemsen@remedy.nl>
(In reply to comment #38)
> my patch also fails on head. Chad, do you know a place in the notification
> service that would be safe to create the properties and that works in an exe
> and when the notification service dll is loaded on demand?
>
Without spending any time investigating this more, the only thing that I can think of off the top of my head is to leak the properties. Since the properties would be created as part of the notification service library, there's no real way to ensure that it gets destroyed before the static destructors run in the AnyTypeCode library.
Changed to enhancement and updated summary. As a future enhancement we can look at reworking TAO_Notify_Properties so that it is no longer a singleton, which would let us support multiple services with different settings within one process.