Please report new issues at https://github.com/DOCGroup
Following is an accumulation of suggestions for redesigning invocation code from Carlos O'Ryan <coryan@atdesk.com>.

I have a bunch of disparate notes on these issues. I was hoping to fill in more details and clean them up further, but here it goes in raw form. Please feel free to shoot me emails about it, and I'll try to get to them after hours.

The problems with the generated code as it stands today are many. Some result in poor compilation times, some unduly increase the complexity of the compiler. You asked about simplifying stubs, so I'll do those first:

----------------------------------------------------------------

- Encapsulate code generation complexity in the library. For example, look at a very trivial stub generated today:

----------------
// IDL
module Test {
  interface Hello {
    string get_string ();
  };
};
----------------
// C++
char * Test::_TAO_Hello_Remote_Proxy_Impl::get_string (
    CORBA_Object *_collocated_tao_target_
    ACE_ENV_ARG_DECL
  )
  ACE_THROW_SPEC ((
    CORBA::SystemException
  ))
{
  CORBA::String_var _tao_retval;

  TAO_Stub *istub = _collocated_tao_target_->_stubobj ();
  if (istub == 0)
    {
      ACE_THROW_RETURN (CORBA::INTERNAL (),_tao_retval._retn ());
    }

  TAO_GIOP_Twoway_Invocation _tao_call (
      istub,
      "get_string",
      10,
      0,
      istub->orb_core ()
    );

  int _invoke_status;

#if (TAO_HAS_INTERCEPTORS == 1)
  TAO_ClientRequestInterceptor_Adapter _tao_vfr (
      istub->orb_core ()->client_request_interceptors (),
      &_tao_call,
      _invoke_status
    );
#endif /* TAO_HAS_INTERCEPTORS */

  for (;;)
    {
      _invoke_status = TAO_INVOKE_EXCEPTION;

#if TAO_HAS_INTERCEPTORS == 1
      TAO_ClientRequestInfo_Test_Hello_get_string _tao_ri (
          &_tao_call,
          _collocated_tao_target_
          ACE_ENV_ARG_PARAMETER
        );
      ACE_CHECK_RETURN (_tao_retval._retn ());
#endif /* TAO_HAS_INTERCEPTORS */

      CORBA::Short _tao_response_flag = TAO_TWOWAY_RESPONSE_FLAG;
      TAO_INTERCEPTOR (_tao_ri.response_expected (1));

#if TAO_HAS_INTERCEPTORS == 1
      ACE_TRY
        {
          _tao_vfr.send_request (
              &_tao_ri
              ACE_ENV_ARG_PARAMETER
            );
          ACE_TRY_CHECK;

          if (_invoke_status == TAO_INVOKE_RESTART)
            {
              _tao_call.restart_flag (1);
              continue;
            }
#endif /* TAO_HAS_INTERCEPTORS */

          _tao_call.start (ACE_ENV_SINGLE_ARG_PARAMETER);
          TAO_INTERCEPTOR_CHECK_RETURN (_tao_retval._retn ());

          _tao_call.prepare_header (
              ACE_static_cast (CORBA::Octet, _tao_response_flag)
              ACE_ENV_ARG_PARAMETER
            );
          TAO_INTERCEPTOR_CHECK_RETURN (_tao_retval._retn ());

          _invoke_status =
            _tao_call.invoke (0, 0 ACE_ENV_ARG_PARAMETER);
          TAO_INTERCEPTOR_CHECK_RETURN (_tao_retval._retn ());

          if (_invoke_status == TAO_INVOKE_EXCEPTION)
            {
              TAO_INTERCEPTOR_THROW_RETURN (
                  CORBA::UNKNOWN (
                      TAO_OMG_VMCID | 1,
                      CORBA::COMPLETED_YES
                    ),
                  0
                );
            }
          else if (_invoke_status == TAO_INVOKE_RESTART)
            {
              TAO_INTERCEPTOR (
                  _tao_ri.reply_status (_invoke_status);
                  _tao_vfr.receive_other (
                      &_tao_ri
                      ACE_ENV_ARG_PARAMETER
                    );
                  ACE_TRY_CHECK;
                )
              continue;
            }

          TAO_InputCDR &_tao_in = _tao_call.inp_stream ();
          if (!(
                (_tao_in >> _tao_retval.inout ())
              ))
            {
              TAO_INTERCEPTOR_THROW_RETURN (
                  CORBA::MARSHAL (
                      TAO_DEFAULT_MINOR_CODE,
                      CORBA::COMPLETED_YES
                    ),
                  0
                );
            }

#if TAO_HAS_INTERCEPTORS == 1
          char * _tao_retval_info = _tao_retval._retn ();
          _tao_ri.result (_tao_retval_info);
          _tao_retval = _tao_retval_info;
          _tao_ri.reply_status (_invoke_status);
          _tao_vfr.receive_reply (
              &_tao_ri
              ACE_ENV_ARG_PARAMETER
            );
          ACE_TRY_CHECK;
        }
      ACE_CATCHANY
        {
          _tao_ri.exception (&ACE_ANY_EXCEPTION);
          _tao_vfr.receive_exception (
              &_tao_ri
              ACE_ENV_ARG_PARAMETER
            );
          ACE_TRY_CHECK;

          PortableInterceptor::ReplyStatus _tao_status =
            _tao_ri.reply_status (ACE_ENV_SINGLE_ARG_PARAMETER);
          ACE_TRY_CHECK;

          if (_tao_status == PortableInterceptor::SYSTEM_EXCEPTION
              || _tao_status == PortableInterceptor::USER_EXCEPTION)
            {
              ACE_RE_THROW;
            }
        }
      ACE_ENDTRY;
      ACE_CHECK_RETURN (_tao_retval._retn ());

      PortableInterceptor::ReplyStatus _tao_status =
        _tao_ri.reply_status (ACE_ENV_SINGLE_ARG_PARAMETER);
      ACE_CHECK_RETURN (_tao_retval._retn ());

      if (_tao_status == PortableInterceptor::LOCATION_FORWARD
          || _tao_status == PortableInterceptor::TRANSPORT_RETRY)
        {
          continue;
        }
#endif /* TAO_HAS_INTERCEPTORS */

      break;
    }

  return _tao_retval._retn ();
}
----------------------------------------------------------------

I think that could be better. Here is an idea: what if we could pass the list of arguments to the invocation class, as follows:

----------------------------------------------------------------
// C++
char * Test::_TAO_Hello_Remote_Proxy_Impl::get_string (
    CORBA_Object *_tao_target_
    ACE_ENV_ARG_DECL
  )
  ACE_THROW_SPEC ((
    CORBA::SystemException
  ))
{
  // Store the arguments in _tao_arguments...
  // Store the return value in _tao_retval...

  // create the invocation... it has complete knowledge, so it can
  // demarshal, wait, marshal, location forward, call interceptors,
  // etc.
  TAO_GIOP_Twoway_Invocation _tao_call (
      _tao_target_,
      "get_string",
      10,
      0,
      _tao_retval,
      _tao_arguments,
      0
    );
  _tao_call.invoke();
  return _tao_retval._retn();
}
----------------------------------------------------------------

So you will say, "Carlos, you must be smoking something, how is the TAO_Invocation class going to deal with those arguments? They are generated *after* the TAO library is compiled, dude." Andy would probably add, "You surely do not want interpretive marshaling again."

The answer is, of course, use abstractions and a base class, just so:

----------------------------------------------------------------
namespace TAO {
  class Argument {
  public:
    virtual void marshal(TAO_OutputCDR&) = 0;
    virtual void unmarshal(TAO_InputCDR&) = 0;
  };
} // namespace TAO
----------------------------------------------------------------

and then you say: "That is very cute, but now you have to generate one 'Argument' class for each IDL type, so this is *more* complexity, not less".
The answer is, of course, external polymorphism:

----------------------------------------------------------------
template<class S>
class Struct_Argument : public Argument
{
public:
  Struct_Argument(S const & x, char const * argname)
    : x_(x)
    , argname_(argname)
  {}

  virtual void marshal(TAO_OutputCDR & cdr) { cdr << x_; }
  virtual void unmarshal(TAO_InputCDR&) { }

  virtual void interceptor_support(Interceptor_Arg_List & args)
  {
    CORBA::Any any;
    any <<= x_;
    args.append(any, argname_);
  }

private:
  S const & x_;
  char const * argname_;
};
----------------------------------------------------------------

and then you would ask, "Very nice, but creating all those arguments is going to be expensive....", phooey says I, they are created on the stack:

----------------------------------------------------------------
// IDL
interface Foo {
  void the_operation(in Struct_A a_arg, inout Struct_B b_arg);
};

// C++
void Foo::the_operation(
    Struct_A const & a_arg,
    Struct_B & b_arg)
{
  TAO::In_Struct_Argument<Struct_A> _tao_a_arg(a_arg, "a_arg");
  TAO::InOut_Struct_Argument<Struct_B> _tao_b_arg(b_arg, "b_arg");

  TAO::Argument * _tao__argument_list[] = {
    &_tao_a_arg,
    &_tao_b_arg,
  };
  ....
}
----------------------------------------------------------------

"Ohhh, so return types would be some TAO::Struct_Retn<> template, that's cool... but what about void returns?" Easy, you use a special class (or template specialization for void).

Jeff is thinking, "Boy, that's OK, but now I have to write a template for each one of the IN, INOUT, OUT and RETN directions... sigh, even worse, the template for variable sized structures is different from the template for fixed size structs or for unions. That is a lot of templates and a lot of code in the IDL compiler to figure out what template to use."

I have some answers for that too, but we can get to them later.
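To make the external polymorphism idea above concrete, here is a minimal, self-contained sketch. It is not TAO code: ToyOutputStream stands in for TAO_OutputCDR, and Point, In_Argument, marshal_all and the_operation are all hypothetical names chosen for illustration. What it tries to show is that "library" code written against the abstract Argument interface can marshal a type it has never seen, and that the wrappers live on the stack and hold only references.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for TAO_OutputCDR (assumption: plain text, not real CDR).
typedef std::ostringstream ToyOutputStream;

// Abstract base: the "library" only ever sees this interface.
class Argument {
public:
  virtual ~Argument() {}
  virtual void marshal(ToyOutputStream & out) const = 0;
};

// External polymorphism: one template adapts any type S that has an
// operator<< for the stream. S itself needs no virtual functions.
template<class S>
class In_Argument : public Argument {
public:
  In_Argument(S const & x, char const * argname)
    : x_(x), argname_(argname) {}

  virtual void marshal(ToyOutputStream & out) const {
    out << argname_ << '=' << x_ << ';';
  }

private:
  S const & x_;           // wrapped by reference, never copied
  char const * argname_;
};

// Hypothetical user type, roughly what an IDL struct might map to.
struct Point { int x; int y; };

inline std::ostream & operator<<(std::ostream & os, Point const & p) {
  return os << '(' << p.x << ',' << p.y << ')';
}

// "Library" code: knows nothing about Point.
inline std::string marshal_all(Argument const * const args[],
                               std::size_t nargs) {
  assert(args != 0);
  ToyOutputStream out;
  for (std::size_t i = 0; i != nargs; ++i) {
    args[i]->marshal(out);
  }
  return out.str();
}

// "Generated stub": wraps its parameters on the stack and hands the
// array of base-class pointers to the library.
inline std::string the_operation(Point const & a_arg, long b_arg) {
  In_Argument<Point> tao_a(a_arg, "a_arg");
  In_Argument<long> tao_b(b_arg, "b_arg");
  Argument const * const args[] = { &tao_a, &tao_b };
  return marshal_all(args, sizeof(args) / sizeof(args[0]));
}
```

The only per-type work left for a code generator in this sketch is the stream operator for Point; the wrapper template and the marshaling loop are written once.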
First, let me point out what we could do for the skeletons, for which we need to use "SArguments":

----------------------------------------------------------------
// IDL
// C++ (server side)
/* static */ void
POA_Foo::the_operation_skel( .... )
{
  TAO::In_Struct_SArgument<Struct_A> _tao_a_arg ("a_arg");
  TAO::InOut_Struct_SArgument<Struct_B> _tao_b_arg ("b_arg");

  TAO::SArgument * _tao_arguments[] = {
    &_tao_a_arg,
    &_tao_b_arg,
  };

  // Implements interceptors and other madness...
  // (_tao_retval wraps the return value, void here; its declaration
  // is elided.)
  TAO_Upcall_Wrapper _tao_upcall(
      _tao_arguments,
      sizeof(_tao_arguments) / sizeof(_tao_arguments[0]),
      _tao_retval);

  _tao_upcall.demarshal();

  servant->the_operation(
      _tao_a_arg.arg(),
      _tao_b_arg.arg());

  _tao_upcall.marshal();
}
----------------------------------------------------------------

Notice that the In_Struct_SArgument can also "know" how to declare the appropriate object on the stack, and/or the T_var needed to contain the argument.

What we need to convince ourselves of is that we can support all the required features (interceptors, location forwarding, exceptions, remarshaling, etc.) with that abstraction. Naturally the interface for "Argument" and "SArgument" will grow heavier; for example, we will need virtual methods to add an argument to the RequestInfo argument list, but I do not think any of those challenges are too bad.

Now, if you want to hear about really advanced techniques to reduce the complexity of the compiler, check this out:

----------------------------------------------------------------

The compiler has to deal with many different "rules" for different types. For example, the mapping for the return type changes for unions, interfaces, fixed size structs, var size structures, arrays, strings, wstrings, etc. By "changes" I do not mean that it is a different type, that's obvious, but if you have:

----------------
// IDL
.... X
interface Foo {
  X return_X();
};
----------------

What surrounds the following 'X':

----------------
// C++
... X ...
Foo::return_X() {}
----------------

changes with what is really there, as in:

----------------
X * Foo::return_X() {}             // variable size structure
char * Foo::return_X() {}          // string
CORBA::WChar * Foo::return_X() {}  // wstring
X Foo::return_X() {}               // primitive type
X_ptr Foo::return_X() {}           // interface
----------------

so the compiler has all this code to figure out exactly what it should print in different cases. It is not only return types: IN, INOUT and OUT parameter declarations change too, as do Any insertion operators, T_var and T_out types, skeleton declarations, and (potentially) the TAO::Argument wrapper used for each one.

Wouldn't it be nice if we could write:

----------------
TAO::Traits<X>::_retn_type Foo::return_X() {} // ALL TYPES!!
----------------

Well, we can. Just make the IDL compiler generate, in a *SINGLE SPOT*, the following code:

----------------
.... X; // forward declare X

namespace TAO { // reopen
  template<> class Traits<X> // specialize
    : /* type specific trait here */
      /* eg TAO::Fixed_Struct_Traits<X> */
      /* eg TAO::Var_Struct_Traits<X> */
      /* eg TAO::Interface_Traits<X> */
      /* eg TAO::Primitive_Traits<X> */
      /* eg TAO::Array_Traits<(underlying type), 15> */
      /* eg TAO::Union_Traits<X> */
  {};
} // namespace TAO
----------------

Isn't that cool? Now we can wrap all that complexity in the TAO library... Better, we can define some typedefs in the traits, as in:

----------------
namespace TAO { // reopen
  struct Fixed_Structure_Tag {}; // generic programming tag
  struct Var_Structure_Tag {};   // generic programming tag

  template<class T>
  class Fixed_Struct_Traits {
  public:
    typedef T _retn_type;
    typedef T const & _in_type;
    typedef T & _inout_type;
    typedef T & _out_type;

    typedef In_Struct_Argument<T> in_argument_type;
    typedef InOut_Struct_Argument<T> inout_argument_type;
    typedef Out_Struct_Argument<T> out_argument_type;
    /* Ditto for SArguments */

    typedef Fixed_Structure_Tag idl_classification_tag;

    /* Maybe some static methods too!
    */
  };
} // namespace TAO
----------------

Well, that sort of sums it up: the generated code complexity is reduced and the IDL compiler complexity is reduced. With a little partial template specialization we can shrink it even further, but you guys want^H^H^H^H need to keep dealing with brain dead compilers....

Some more random notes follow:

================================================================

= Compilation time issues:

- Eliminate unwanted includes for *C.h and *S.h files.
  + Most people do not need the full corba.h file, it includes:
    - NVList.h
    - Object.h
    - The complete Exception.h file (just SystemException would do)
    - CurrentC.h
    - BoundsC.h
    - PolicyC.h
    - tao/ServicesC.h
    - tao/WrongTransactionC.h !!
    - Remote_Object_Proxy_Impl.h!
    - PortableInterceptorC.h !
  + Recheck the #included files to eliminate (if possible) OS.h from the list. There should be no need to include OS wrappers to get CORBA functionality.

- Eliminate code that applications do not want. For example, all the Proxy_Impl classes should go into the .cpp file or into a '_TAO_Impl.h' file, ditto for the AMI_*Holder implementations.

- Move inline methods that are not in the critical path out of line, e.g. all sorts of constructors and destructors.

- Move classes out of the S.h file too!!!

- Encapsulate repeated code in the library, for example, all the T_var, T_out and similar types. The application then needs to parse the template only *ONCE*, instead of parsing the same code multiple times.

  The problems with templates are:

  1) Instantiation: many can be instantiated without problem (i.e. in the file where the class is *defined*), we already do this for TAO_Object_Manager<> and a couple of others. But sequences are hard for platforms that require explicit template instantiation. My approach is "if your platform sucks then you pay for it, not everyone else."
     In this case we should *NOT* instantiate the sequence template. On platforms with automatic template instantiation (most of them these days) this is not a problem. On other platforms we can simply document what to do, or add a #pragma so users can say "I want the template instantiated here", like this:

     // IDL - file1.idl
     typedef sequence<long> Long_Sequence;

     // C++ - file1C.cpp
     class Long_Sequence : public TAO_Sequence<CORBA::Long> {
       ....
     };

     // IDL - file2.idl
     typedef sequence<long> Another_Long_Sequence;
     #pragma tao_instantiate Another_Long_Sequence

     // C++ - file2C.cpp
     #if defined(ACE_HAS_EXPLICIT_TEMPLATE_INSTANTIATION)
     template class TAO_Sequence<CORBA::Long>;
     #elif defined(ACE_HAS_TEMPLATE_INSTANTIATION_PRAGMA)
     # pragma instantiate TAO_Sequence<CORBA::Long>
     #endif
     class Another_Long_Sequence : public TAO_Sequence<CORBA::Long> {
       ....
     };

  2) Some T_var and T_out types change depending on what the <T> really is. For example, the T_var for object references is quite different from the T_var for structures. I do not think this is a big deal, there are probably less than a dozen variations.

  3) Forward declared interfaces are hard because the release(), duplicate(), _nil() and other methods are not defined yet. Right now we are using a technique based on generated standalone functions with long names (like Module__Interface__duplicate()). We can use template specialization and traits for this, like so:

     // IDL - file1.idl
     interface Foo;

     // C++ - file1C.h
     class Foo;
     typedef Foo * Foo_ptr;

     namespace TAO { // reopen
       template<> class Interface_Traits<Foo> { // specialize...
       public:
         static Foo_ptr _nil();
         static Foo_ptr duplicate(Foo_ptr);
         static void release(Foo_ptr);
       };
     }
     typedef TAO::Interface_var<Foo> Foo_var;
     typedef TAO::Interface_out<Foo> Foo_out;

     // C++ - Interface_var.h
     namespace TAO {
       template<class Interface> class Interface_var {
         // use the Interface_Traits<Interface> template... the specialization
         // will be used when needed!
       };
     }

The arguments are passed in, just wrapped so the invocation class can treat any IDL type polymorphically... The arguments are *NOT* marshaled until needed. Maybe the following outline would help...

  void TAO_Invocation::invoke( ....
                               TAO::Argument * args[],
                               size_t nargs,
                               TAO::Return_Value * retval,
                               ....)
  {
    for(;;) { // location forwarding loop
      // Create the CDR stream, initialize the header, call whatever
      // interceptors are needed.

      // (*) marshal each argument:
      for(TAO::Argument** i = args; i != args + nargs; ++i) {
        (*i)->marshal(output_cdr);
      }

      // Send the request...
      // Wait for the response...

      // (*) demarshal the return value...
      retval->demarshal(input_cdr);
      // demarshal the arguments
      for(TAO::Argument** i = args; i != args + nargs; ++i) {
        (*i)->demarshal(input_cdr);
      }
    }
  }

The trick is to notice that the code in the stubs is generic already, except that the steps marked with (*) are currently inlined, while I want to replace those steps with polymorphism.

Naturally we want to avoid any copying or extra marshaling, maybe more code will help you see how:

  template<class S>
  class In_Struct_Argument : public Argument
  {
    S const & arg_;
  public:
    In_Struct_Argument(S const & arg, char const * argname)
      : Argument(argname)
      , arg_(arg)
    {
    }

    virtual void marshal(TAO_OutputCDR & cdr) {
      cdr << arg_;
    }
    virtual void demarshal(TAO_InputCDR &) {
      // no demarshaling of IN arguments on the client side
    }
  };

So, the argument in question is simply wrapped, not copied, and certainly not marshaled until needed.

One trick is to get this stuff working for the really nasty types, like arrays. I think it can be done.

The other trick is avoiding the 200 variations of TAO::{IN,INOUT,OUT,RETN}_XXXXX_Argument. I think we can do that with some care. For example, for all IN arguments the client-side demarshal operation is a noop, so we simply need to use decorators for that. Then if we can use the Traits in the implementation of these classes we could say something like:

  template<class S>
  class In_Argument : public Argument
  {
    typename Traits<S>::_in_type arg_;
  public:
    virtual void marshal(TAO_OutputCDR & cdr) {
      Traits<S>::marshal(cdr, arg_);
    }
    virtual void demarshal(TAO_InputCDR &) {}
  };

Probably there are more details to fill in, but with a little experimentation and thought we should be able to dramatically reduce the number of variations of the XXX_Argument<> templates.

> > TAO::In_Struct_SArgument<Struct_A> _tao_a_arg ("a_arg");
> > TAO::InOut_Struct_SArgument<Struct_B> _tao_b_arg ("b_arg");
[snip]
>
> Again, this looks so cool! But again I see too many copies being made
> out here.

There are none...

> A copy from the incoming data to TAO::In_Struct_SArgument (since you
> have a CDR underneath)

I do not have such a thing. The implementation should be something like this:

  template<class S>
  class In_Struct_SArgument : public SArgument
  {
    S arg_;
  public:
    // Store the name so interceptors can be implemented..
    In_Struct_SArgument(char const * argname)
      : SArgument(argname)
    {}

    // Convert...
    operator S const & () {
      return arg_;
    }

    virtual void demarshal(TAO_InputCDR & cdr) {
      cdr >> arg_;
    }
    virtual void marshal(TAO_OutputCDR &) {
      // noop for IN arguments.
    }
  };

BTW, this template is also able to hide the differences in argument declaration, allocation and cleanup on the skeletons, as well as making some of that exception-safe.
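The traits idea discussed in this comment can be tried out in isolation. The sketch below is only an illustration under stated assumptions: the namespace sketch, the types Fixed_S and Var_S, and the helper functions are hypothetical, and only two of the IDL type categories are modeled. It shows a single "generated" specialization per type selecting a library-side traits class, which then drives both the return type mapping and tag dispatch.

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical namespace, not TAO

// Generic programming tags, one per IDL type category.
struct Fixed_Structure_Tag {};
struct Var_Structure_Tag {};

// Library side: one traits class per category, written once.
template<class T>
struct Fixed_Struct_Traits {
  typedef T _retn_type;             // fixed size structs return by value
  typedef T const & _in_type;
  typedef Fixed_Structure_Tag idl_classification_tag;
};

template<class T>
struct Var_Struct_Traits {
  typedef T * _retn_type;           // variable size structs return a pointer
  typedef T const & _in_type;
  typedef Var_Structure_Tag idl_classification_tag;
};

// Primary template left undefined: every type must be specialized.
template<class T> struct Traits;

// "Generated" code: the user types plus a single specialization each.
struct Fixed_S { int a; };
struct Var_S { std::string s; };

template<> struct Traits<Fixed_S> : Fixed_Struct_Traits<Fixed_S> {};
template<> struct Traits<Var_S> : Var_Struct_Traits<Var_S> {};

// Tag dispatch: library code picks an overload from the tag alone.
inline char const * describe(Fixed_Structure_Tag) { return "fixed"; }
inline char const * describe(Var_Structure_Tag) { return "variable"; }

template<class T>
char const * classify() {
  return describe(typename Traits<T>::idl_classification_tag());
}

// The same spelling of the return type works for both categories.
inline Traits<Fixed_S>::_retn_type return_fixed() {
  Fixed_S r = { 42 };
  return r;
}

inline Traits<Var_S>::_retn_type return_var() {
  Var_S * r = new Var_S;  // caller owns the result, as with a _retn()
  r->s = "hello";
  return r;
}

}  // namespace sketch
```

In this sketch the code generator never has to decide between `X` and `X *`; it always prints `Traits<X>::_retn_type` and leaves the decision to the specialization.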
Accepted for tao-support.
I think we can use those techniques to further reduce the size of the generated code *and* solve a host of problems for collocated calls. Remember that the arguments (and return value) for an operation are passed to the Invocation class using an array of Argument* objects. Well, suppose the call was actually collocated. Instead of doing the crazy stuff we do today, we could simply generate two skeleton functions, the regular skeleton and a "skeleton for collocated calls", which would look like this:

  /* static */ void
  POA_MyServant::_tao_collocated_skel_my_operation(
      size_t argument_count,
      TAO::Argument *arguments[],
      ....)
  {

This skeleton function could extract the arguments using dynamic casts (or their emulated version if we lack RTTI support):

    // First check the argument count against the real number of
    // arguments
    if(argument_count != 5)
      throw CORBA::MARSHAL();

    // Then downcast each argument and raise if downcast is unsuccessful
    Real_Argument_Type * _tao_first_argument =
      dynamic_cast<Real_Argument_Type*>(arguments[0]);
    if(_tao_first_argument == 0)
      throw CORBA::MARSHAL();

Once all the arguments have been downcast we can get the Servant (POA mediated or not) and call the implementation method directly.

There is probably some trickery in getting this stuff to work while keeping the POA decoupled. Basically we will need to know if the object has *any* chance of being collocated (we can do this when the object is created, using the same techniques we use today). If so, we need to call the object adapter to handle the invocation object. If collocation does not apply (for whatever reason) we follow the normal path. In short, a few more methods in that object adapter abstract class.

What are the advantages of doing this vs. the current approach?

1) We would not need to generate special proxy classes for the collocated case, just the special skeleton. This may prove to provide some footprint savings... or not. Basically it is a tradeoff, so hopefully the difference is small. OTOH:

2) It also eliminates the need for the proxy brokers, thus we save in compilation time and code footprint. Basically we would need to implement the broker code once, at the library level.

3) Because the arguments are wrapped in the little Argument classes, the client-side and server-side interceptors for collocated calls can be moved to the library too (more features for the same footprint / compilation-time).

4) Because the code is in the library it is easier to control (for example via some Policy) when collocation is used, and what type applies.
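A small stand-alone sketch of the "skeleton for collocated calls" may help. Everything in it is a hypothetical stand-in: Marshal_Error plays the role of CORBA::MARSHAL, Adder_Servant is an invented servant, and In_Argument is a toy wrapper rather than a TAO class. It only illustrates the two checks described above, the argument count and the dynamic_cast of each wrapped argument, followed by a direct call on the servant.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy polymorphic argument base (stand-in for TAO::Argument).
class Argument {
public:
  virtual ~Argument() {}
};

// Toy typed wrapper holding a reference to the caller's value.
template<class T>
class In_Argument : public Argument {
public:
  explicit In_Argument(T const & v) : value_(v) {}
  T const & arg() const { return value_; }
private:
  T const & value_;
};

// Stand-in for CORBA::MARSHAL (assumption, not the real exception).
struct Marshal_Error : std::runtime_error {
  Marshal_Error() : std::runtime_error("MARSHAL") {}
};

// Hypothetical servant with one operation.
struct Adder_Servant {
  long add(long a, long b) { return a + b; }
};

// Downcast one wrapped argument, raising if the type does not match.
template<class T>
T const & downcast_in(Argument * a) {
  In_Argument<T> * typed = dynamic_cast<In_Argument<T> *>(a);
  if (typed == 0) throw Marshal_Error();
  return typed->arg();
}

// "Collocated skeleton": no marshaling, just checks and a direct call.
inline long collocated_skel_add(Adder_Servant & servant,
                                std::size_t argument_count,
                                Argument * arguments[]) {
  // First check the argument count against the real number of arguments.
  if (argument_count != 2) throw Marshal_Error();

  // Then downcast each argument.
  long const & a = downcast_in<long>(arguments[0]);
  long const & b = downcast_in<long>(arguments[1]);

  return servant.add(a, b);
}
```

A mismatched argument type is rejected at the downcast, so a stub and skeleton that disagree fail loudly instead of reading the wrong memory.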
When we remove the sequence base class generation, we should also remember to make sure that replace() and get_buffer() are in all the sequence template classes. Some of them are missing.
This isn't about invocation code refactoring, but it does apply to footprint reduction, so I wanted to get it down before I forget it. We can make the IDL compiler smart about what files from TAO it includes in generated code. One of the things this would require is to keep an inverse of the member lists - that is, declarations would need to know not only what other declarations they reference, but also what other declarations reference them, and what types these other declarations are. Of course, if no declarations of a certain type are present, that may also be important. Information of this kind can also affect directly what does or does not get generated. For example, if a given interface is not used in a sequence, there is no need to generate the forward declaration workarounds for the object sequence methods _downcast and _upcast. In a branch, I've already separated the code generation for these particular workarounds from the other four: _duplicate, CORBA::release, _nil, and marshal (a new one to prevent the marshaling of local interfaces), so it will be easy to turn this piece of code generation off, if it turns out to be possible to achieve this optimization.
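As a rough illustration of the "inverse of the member lists" mentioned above, the following sketch builds an index from each declaration name to the kinds of declarations that reference it, then uses it to decide whether the object sequence workarounds are needed. The Decl structure, the Decl_Kind enum and both functions are hypothetical; the real TAO_IDL AST classes are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified declaration model.
enum Decl_Kind { INTERFACE_DECL, SEQUENCE_DECL, STRUCT_DECL };

struct Decl {
  std::string name;
  Decl_Kind kind;
  std::vector<std::string> references;  // forward edges: what this decl uses
};

// Inverse edges: declaration name -> kinds of declarations that use it.
typedef std::map<std::string, std::set<Decl_Kind> > Referrer_Index;

inline Referrer_Index build_referrer_index(std::vector<Decl> const & decls) {
  Referrer_Index index;
  for (std::size_t i = 0; i != decls.size(); ++i) {
    for (std::size_t j = 0; j != decls[i].references.size(); ++j) {
      index[decls[i].references[j]].insert(decls[i].kind);
    }
  }
  return index;
}

// Only interfaces that appear in some sequence need the workarounds.
inline bool needs_sequence_workarounds(Referrer_Index const & index,
                                       std::string const & interface_name) {
  Referrer_Index::const_iterator it = index.find(interface_name);
  return it != index.end() && it->second.count(SEQUENCE_DECL) != 0;
}
```

With such an index, the generator can ask the question per interface and skip the extra code for every interface that no sequence refers to.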
Adding bug 133 as a dependency, because if the collocation decision is implemented in the library instead of at the time the object reference is created, then we can easily look at the target object and decide whether we need to follow the collocated path or not.

Strictly speaking, bug 133 can be resolved without the fixes for bug 1369, but the proxy has to be smarter:

- For each class, change the ProxyBroker to create collocated objects the first time a skeleton is instantiated.
- The collocated proxy looks at the effective target: if it is collocated (this can be cached at IOR creation time) the collocated path is used, otherwise it falls back on the remote path.
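The per-call decision described in the second bullet can be sketched in a few lines. Target, Smart_Proxy and the string results are all invented for illustration; the only point is that the proxy consults a flag cached on the effective target at invocation time instead of being fixed when the reference is created.

```cpp
#include <cassert>
#include <string>

// Hypothetical effective target of a call.
struct Target {
  bool collocated;  // assumption: computed once and cached at IOR creation
};

// Hypothetical proxy that picks the path per invocation.
class Smart_Proxy {
public:
  std::string invoke(Target const & effective_target) const {
    return effective_target.collocated ? collocated_path() : remote_path();
  }

private:
  // Placeholders for the real work: direct upcall vs. GIOP request.
  std::string collocated_path() const { return "collocated"; }
  std::string remote_path() const { return "remote"; }
};
```

Because the check happens on every call, a location forward to a non-collocated target would take the remote path without needing a different proxy object.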
Jeff and Bala already took care of the client side. I'm currently working on the server side.
Mine.
Update on skeleton refactoring progress:

* Enabled/fixed server side "SArg_Traits" support in TAO_IDL.
* Made accompanying fixes to PortableServer files (Upcall_Wrapper.*, ServerRequestInfo.*, etc)
* Removed all vestiges of the interceptor related files in TAO_IDL.
  * Includes zapping all interceptor related visitors, and consequently all interceptor related generated code in the skeleton.
  * Sweet!
* With the above completed, I'm now able to compile the skeleton in Jeff's Param_Test torture test with my changes in place.

(All changes/fixes have been committed to my skeleton-refactor branch.)

Since refactored skeletons can now be compiled, I was able to get some more meaningful numbers:

Skeleton: $TAO_ROOT/tests/Param_Test/param_testS.*

Configuration:
* g++ 3.4.2 on Debian GNU/Linux "unstable" distribution
* Exceptions and inlining enabled
* Optimization disabled
* Shared library / PIC binaries

"old" == "HEAD" branch as of 14 October 2004.
"new" == "skeleton-refactor" branch as of today -- split off from HEAD on 14 September 2004.

Source Metrics (not preprocessed)
=================================
Bytes (old / new / % change): 446820 / 215177 / 52%
Lines (old / new / % change):  16402 /   7728 / 53%

Object File Metrics
===================
Size (using `size' command)

......... HEAD branch ..........
$ size .obj/param_testS.o
   text    data     bss     dec     hex filename
 362078       4     232  362314   5874a .obj/param_testS.o

......... skeleton-refactor branch ..........
$ size .obj/param_testS.o
   text    data     bss     dec     hex filename
 196087       8     256  196351   2feff .obj/param_testS.o

% change = 46%

Updated List of TODO Items
==========================
1. Refactor ThruPOA collocation code in skeleton (direct collocation later)
2. Get linking to work
3. Measure footprint of linked applications
4. Perform run-time regression tests
5. Run performance benchmarks
6. Remember what I forgot to add to this list. :-)
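For anyone who wants to recheck the percentages, they appear to be reductions relative to the old value, rounded to the nearest whole percent. The helper below is a hypothetical sketch of that arithmetic, 100 * (old - new) / old, applied to the byte, line and total object size ("dec") figures quoted above.

```cpp
#include <cassert>
#include <cmath>

// Percentage reduction from old_value to new_value, rounded to the
// nearest whole percent. Assumes old_value > 0.
inline long percent_reduction(long old_value, long new_value) {
  assert(old_value > 0);
  double const reduction =
    100.0 * static_cast<double>(old_value - new_value)
          / static_cast<double>(old_value);
  return static_cast<long>(std::floor(reduction + 0.5));
}
```

Under that reading, the source bytes give about 51.8%, the source lines about 52.9%, and the object size about 45.8%, which match the 52%, 53% and 46% reported.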
Done. See the following ChangeLog entry: Tue Feb 22 02:03:20 2005 Ossama Othman <ossama@dre.vanderbilt.edu>