Bug 1637

Summary: TypeCode redesign
Product: TAO Reporter: Nanbor Wang <bala>
Component: ORBAssignee: Ossama Othman <ossama.othman>
Status: RESOLVED FIXED    
Severity: enhancement CC: bala, j.parsons
Priority: P2    
Version: 1.4   
Hardware: All   
OS: All   
Bug Depends on: 1369    
Bug Blocks: 550, 1803    

Description Nanbor Wang 2003-11-05 10:39:43 CST
More suggestions from Carlos

-----------------------------------------------------------------
char * Test::Hello::get_string (
    ACE_ENV_SINGLE_ARG_DECL
  )
  ACE_THROW_SPEC ((
    CORBA::SystemException
  ))
{
  if (!this->is_evaluated ())
    {
      ACE_NESTED_CLASS (CORBA, Object)::tao_object_initialize (this);
    }

  if (this->the_TAO_Hello_Proxy_Broker_ == 0)
  Test_Hello_setup_collocation (
          this->ACE_NESTED_CLASS (CORBA, Object)::_is_collocated ()
        );
    }
 ....
 ...
 ..
 .
----------------

        you may want to change that for a single call, like:

----------------
/* private */ /* inline ?? */ void Test::Hello::__tao_initialize()
{
  if (!this->is_evaluated ())
    {
      ACE_NESTED_CLASS (CORBA, Object)::tao_object_initialize (this);
    }

  if (this->the_TAO_Hello_Proxy_Broker_ == 0)
    {
      Test_Hello_setup_collocation (
          this->ACE_NESTED_CLASS (CORBA, Object)::_is_collocated ()
);
    }
}

char * Test::Hello::get_string (
    ACE_ENV_SINGLE_ARG_DECL
  )
  ACE_THROW_SPEC ((
    CORBA::SystemException
  ))
{
  __tao_initialize();
 ....
 ...
 ..
 .
----------------

        Not much difference if you only have one operation, but much
better if you have a lot.

 The next change is more interesting.  What can we do to reduce
the cost of the Typecodes and the Anys?  First notice the following
line counts (in TAO/tests/Hello):

$ /home/coryan/UCI/ACE_wrappers/build/Linux/TAO/TAO_IDL/tao_idl -Ge 1 -
DBORG_EVENT_HEADER  -St  Test.idl
$ wc -l TestC.cpp
    303 TestC.cpp

$ /home/coryan/UCI/ACE_wrappers/build/Linux/TAO/TAO_IDL/tao_idl -Ge 1 -
DBORG_EVENT_HEADER  Test.idl
$ wc -l TestC.cpp
    399 TestC.cpp

        Nearly 25% of the code in the C.cpp file is just Typecode
madness.  In terms of build time this is even more interesting:

-- Without -St --
$ time g++ -Wwrite-strings -W -Wall -Wpointer-arith -pipe -O3 -g    -D_LARGEFILE64_SOURCE -
D_REENTRANT -DACE_HA\
S_AIO_CALLS -D_GNU_SOURCE -DBORG_EVENT_HEADER  -I/home/coryan/UCI/ACE_wrappers/
build/Linux -I/home/coryan/UCI/A\
CE_wrappers/build/Linux/TAO -DACE_HAS_EXCEPTIONS -DACE_NO_INLINE    -DTAO_BUILD_DLL  -
c -o foo.o TestC.cpp

real    0m2.730s
user    0m2.550s
sys     0m0.210s

-- With -St --
$ /home/coryan/UCI/ACE_wrappers/build/Linux/TAO/TAO_IDL/tao_idl -Ge 1 -
DBORG_EVENT_HEADER  -St  Test.idl
[coryan@glamdring Hello]$ time g++ -Wwrite-strings -W -Wall -Wpointer-arith -pipe -O3 -g    -
D_LARGEFILE64_SOUR\
CE -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE -DBORG_EVENT_HEADER  -I/home/
coryan/UCI/ACE_wrappers/build/Li\
nux -I/home/coryan/UCI/ACE_wrappers/build/Linux/TAO -DACE_HAS_EXCEPTIONS -
DACE_NO_INLINE    -DTAO_BUILD_DLL  -cCE -D_REENTRANT -DACE_HAS_AIO_CALLS -
D_GNU_SOURCE -DBORG_EVENT_HEADER  -I/home/coryan/UCI/ACE_wrappers/build/Li\
nux -I/home/coryan/UCI/ACE_wrappers/build/Linux/TAO -DACE_HAS_EXCEPTIONS -
DACE_NO_INLINE    -DTAO_BUILD_DLL  -c\
 -o foo.o TestC.cpp

real    0m1.924s
user    0m1.810s
sys     0m0.150s

        That's nearly a 40% improvement!  I'll admit this is a small
file, anything with lots of operations and no types will not look this
good, but something with lots of types and no operations will look
much better :-)

        Furthermore, the Typecode.h files in the ORB are bloated and
slow to compile.

        What I will propose is that we completely change the way we
generate Typecode information.  First we need to remember that a
Typecode, after it is completely extracted, is simply a hierarchal
data structure that represents the elements in an IDL construct.  What
we could do is create this hierarchy directly, skipping the
intermediate binary representation.

        The solution below should work, I can only find one problem
with it: it does not work if B.idl includes (and uses types from)
A.idl but A.idl was compiled without -St....  Frankly, this
ill-defined at best.  I have some solutions to offer there (read [*])

        Before we go into abstractions, let me give you an example,
suppose we have IDL like this:

typedef long Token;

struct Quz {
  Token first_token;
  Token second_token;
  string the_string;
};

union Foo switch(short) {
  case 0:
  case 1:  short bar;
  case 2:  Quz the_quz;
  default: Token baz;
};

        I think if we generate code like this:

 // C++ (*C.cpp file)
 TAO::Alias_TypeCode _tao_tc_Token("IDL:Token/1.0", "Token", &CORBA::_tc_long);
 CORBA::TypeCode_ptr _tc_Token = &_tao_tc_Token;

       first notice that the *value* of CORBA::_tc_long is not know
at compile time, but its *address* is known (or at least well-defined)
at compile time.  Naturally the TAO::Alias_TypeCode class will have to
derive from CORBA::TypeCode, and it will need to know how to use the
parameters effectively.  I'll show how to implement such classes in a
second.

        Structures are easy:

 // C++ (*C.cpp file)
 TAO::TypeCode_Field _tao_fields_tc_Quz[] = {
   { "first_token", &_tc_Token },
   { "second_token", &_tc_Token },
   { "the_string", &CORBA::_tc_string },
 };
 TAO::Struct_TypeCode _tao_tc_Quz(
    "IDL:Quz/1.0", "Quz",
    _tao_fields_tc_Quz, sizeof(_tao_fields_tc_Quz));

        here I can show some of the implementation:

namespace TAO {
class Struct_TypeCode
  : public CORBA::TypeCode // This should be abstract
{
public:
  Struct_TypeCode(
    char const * id,
    char const * name,
    TAO::TypeCode_Field * fields,
    size_t nfields)
    : id_(id)
    , name_(name)
    , fields_(fields)
    , nfields_(nfields)
  { }

  virtual CORBA::TCKind() const { return _tk_struct; }
  char const * id() const { return id_; }
  char const * name() const { return name_; }
  CORBA::ULong member_count() const { return nfields_; }
  char const * member_name(CORBA::ULong i) const
  {
    if(i >= nfields_) {
      throw ....;

    return fields_[i].name_;
  }
  CORBA::TypeCode_ptr member_type(CORBA::ULong i) const
  {
 if(i >= nfields_) {
      throw ....;

    return *(fields_[i].type_); // notice the indirection
  }

  // operations that do not make sense are *supposed* to throw...
  CORBA::TypeCode_ptr discriminator_type() const {
    throw BadKind();
  }
};

        as you can see very is straightforward.  I think
arrays, sequences, interfaces and stuff like that are simple too.
Unions are a little more tricky:

 // C++ (*C.cpp file)
 TAO::TypeCode_Branch _tao_branches_tc_Foo[] = {
   { 0, "bar", &CORBA::_tc_short },
   { 1, "bar", &CORBA::_tc_short },
   { 2, "the_quz", &_tc_Quz },
 };
 TAO::Struct_TypeCode _tao_tc_Foo(
    "IDL:Foo/1.0", "Foo",
    &CORBA::_tc_short, // discriminator
_tao_branches_tc_Foo, sizeof(_tao_branches_tc_Foo),
    "baz", &_tc_Token // default (could be NULL if there is none)
 );

        I think recursive types simply work too, as in:

// IDL
struct Recursive;
typedef sequence<Recusive> Recursion;
struct Recursive {
  long x;
  Recursion y;
};

// C++ (Header)
CORBA::TypeCode_ptr _tc_Recursive;
CORBA::TypeCode_ptr _tc_Recursion;

// C++ (CPP)
TAO::Alias_TypeCode _tao_tc_Recursion(.. &_tc_Recursive ...); // legal!
CORBA::TypeCode_ptr _tc_Recursion = &_tao_tc_Recursion;

TAO::Struct_TypeCode _tao_tc_Recursive(.. &_tc_Recursion ...); // OK!
CORBA::TypeCode_ptr _tc_Recursive = &_tao_tc_Recursive;

_tao_branches_tc_Foo, sizeof(_tao_branches_tc_Foo),
    "baz", &_tc_Token // default (could be NULL if there is none)
 );

        I think recursive types simply work too, as in:

// IDL
struct Recursive;
typedef sequence<Recusive> Recursion;
struct Recursive {
  long x;
  Recursion y;
};

// C++ (Header)
CORBA::TypeCode_ptr _tc_Recursive;
CORBA::TypeCode_ptr _tc_Recursion;

// C++ (CPP)
TAO::Alias_TypeCode _tao_tc_Recursion(.. &_tc_Recursive ...); // legal!
CORBA::TypeCode_ptr _tc_Recursion = &_tao_tc_Recursion;

TAO::Struct_TypeCode _tao_tc_Recursive(.. &_tc_Recursion ...); // OK!
CORBA::TypeCode_ptr _tc_Recursive = &_tao_tc_Recursive;

     The demarshaling code to receive typecodes from the wire will
have to get smarter, but that's hidden in the library.


                                Let me know what you think.




[*]  So what if B.idl did not define _tc_Foo but A.idl needs it?
Several solutions:
- If we successfully eliminate the *motivation* for the -St option,
  then we can remove the option and avoid the problem completely!

- The application is invoking undefined behavior, i.e. as soon as the
  use -St they are outside the CORBA spec and we can tell them what
  the rules are.  In this case "-St everywhere or nowhere."

- Use macros to detect (at build time) if A.idl was compiled without
  -St.  In that case fallback on some other mechanism, like the old
  arrays.  For example, the generated code could include some macros
  like:

// AC.h
#define _tao_compile_trait__A_IDL__with_St

        which are used like this:

// BC.cpp

#if defined(_tao_compile_trait__A_IDL__with_St)
// Generate _tc_Foo here
namespace {
CORBA::TypeCode _tc_Foo = ....;
}
#endif

        Using macros will probably fail when the filenames or the
type names are ambiguous.  One could dream of template
meta-programming techniques to work around that, but it is overkill.
Comment 1 Nanbor Wang 2003-11-05 11:03:42 CST
Add dependency
Comment 2 Nanbor Wang 2004-01-03 09:31:30 CST
Accepting this
Comment 3 Ossama Othman 2004-03-29 11:30:48 CST
I'll be taking over this work.
Comment 4 Ossama Othman 2004-03-29 11:31:40 CST
Mine.
Comment 5 Ossama Othman 2004-08-17 01:01:34 CDT
Made subject more descriptive of the content.
Comment 6 Ossama Othman 2004-08-17 01:15:09 CDT
*** Bug 142 has been marked as a duplicate of this bug. ***
Comment 7 Ossama Othman 2004-08-24 02:00:04 CDT
No need for me to be on the CC list since this enhancement request is already
assigned to me.
Comment 8 Carlos O'Ryan 2004-11-19 15:58:37 CST
Hopefully the new TypeCode classes will automatically fix bug 550 or render the
bug obsolete.
Comment 9 Ossama Othman 2005-04-08 03:40:24 CDT
The TypeCode rewrite has finally been merged to the HEAD branch.  However, the
suggested approach above, which is the one used in the new implemenation,
requires that the "Template Method" design pattern be used to allow the
appropriate TypeCode method implementation to be bound at run-time. 
Unfortunately, this approach appears to be quite heavy.  For example,
tao/AnySeqA.o nearly doubled in size:

pre-typecode-rewrite-merge
==========================
$ size .shobj/AnySeqA.o
   text    data     bss     dec     hex filename
   5401      88     144    5633    1601 .shobj/AnySeqA.o

HEAD (as of today)
==========================
$ size .shobj/AnySeqA.o
   text    data     bss     dec     hex filename
   9810     544     164   10518    2916 .shobj/AnySeqA.o

AFAICT, the new implementation is fairly lean source code-wise, meaning that it
is likely the virtual tables may be the cause of the code bloat.  Another cause
may be the use of templates in the new implementation.  However, quick
experiments imply otherwise.

It may be possible to subset the virtual tables, so to speak, by using a
variation of the "Bridge" design pattern for each method in the CORBA::TypeCode
interface.  The idea is to only pay for virtual table entries for virtual
methods specific to the TypeCode subclass.  TAO-specific virtual methods,
including the destructor, in the CORBA::TypeCode base class could then be
entirely removed.  Presumably this approach would reduce the vtable footprint
penalty.
Comment 10 Ossama Othman 2005-04-08 17:25:55 CDT
It turns out the code bloat is infact related to the use of templates.  By
fully specializing the Alias<> and Sequence<> TypeCode subclasses, and
dropping the template source includes (e.g. Alias_TypeCode.cpp) from
their corresponding headers, I got a substantial reduction in footprint:

pre-typecode-rewrite-merge
==========================
$ size .shobj/AnySeqA.o
   text    data     bss     dec     hex filename
   5401      88     144    5633    1601 .shobj/AnySeqA.o

Current HEAD
============
ossama@void:~/work/head/ACE_wrappers/TAO/tao$ size .shobj/AnySeqA.o
   text    data     bss     dec     hex filename
   9810     544     164   10518    2916 .shobj/AnySeqA.o

Modified HEAD  (code named "Hair Cut"  :-))
=============
$ size .shobj/AnySeqA.o
   text    data     bss     dec     hex filename
   4570     160      84    4814    12ce .shobj/AnySeqA.o

As you can see, the footprint is slightly less than the
pre-typecode-rewrite-merge code.  TypeCode CDR extraction operations are
not reflected in the HEAD branch numbers, so we may still take a hit in
that case.  I'd prefer not to specialize the templates so I'll see what
I can dig up on alternatives to preventing the code bloat.
Comment 11 Ossama Othman 2005-04-08 17:27:49 CDT
Similar effects can be attained without specialization simply removing the
template source includes (e.g. Alias_TypeCode.cpp) in the concrete TypeCode headers.
Comment 12 Ossama Othman 2005-05-16 14:39:00 CDT
The new TypeCode implementation is available in TAO 1.4.5.