Hi all,

Does anyone know if it is possible to write a C++ macro that forces the compiler to construct Pascal/custom-style string constants during compilation?

The basic idea is to avoid unnecessary run-time construction from C-style to custom-style string literals.

For example, when evaluating the expression {String2 = String1 + _MyMacro("Hello") + String2;}, the "Hello" literal would already be pre-compiled into the appropriate format.

Any ideas/comments would be much appreciated.

What gives you the impression that creating such a macro would be more efficient than doing without it? Obviously each compiler might do it differently, but if you look at the disassembled code for VC++ 2008 Express compiled for Debug, all the compiler has to do is push the address of the std::string object on the stack, push the address of the string literal on the stack, and call a std::string method to concatenate the two. Seems to me that inserting your _MyMacro would just complicate things and slow it down, not speed it up, or it would have absolutely no effect one way or the other.

Then allow me to elaborate:

As chip designers, not C++ programmers, we would normally construct these routines at ASM/uC level and thus have complete control over both implicit/explicit code generation. However, due to product restructuring, we must now produce native C/C++ "programmer-friendly" firmware which is readily accessible by designer and non-designer alike. As most of our systems are tailored specifically for the embedded market, these routines MUST often maintain a customer-defined T-State/Memory metric in order to complete post-production testing.

From your rather elementary diagnosis of the VC++ debug code, the parameter reference semantic is indeed correct - after all, what else would it be - but it's what happens subsequent to method invocation that constitutes the real crux of our problem. No matter which way you cut it, ASCIIZ strings often require iterative length parsing when size/processing phases are not easily interchangeable and as such, introduce additional back-end T-States that were not present prior to our adoption of C/C++.

In order to alleviate this problem, we have created a custom string class - std::string is neither desirable nor available - which successively stores the reference count, string length, buffer length and string content as part of its transient reference buffer. We have of course included methods which explicitly deal with standard ASCIIZ strings at run-time, but it would be both "programmer-friendly" and efficient if we could coerce the implicit (Construct -> "String Literal", Reference -> char *) compile-time semantic in order to construct our own custom literals directly within the executable's data section.

Thus, depending upon preference, the programmer could invoke:

String1("Hello"); <- Familiar syntax, but length processed during run-time construction.

OR

String1(_MyMacro("Hello")); <- Reasonably familiar syntax, but length processed via compile-time construction and immediately available at run-time.

Note that the transient buffer containing both the string parameters and the fixed-length content is of POD type and is subsequently referenced from within one or multiple non-POD string objects. Of course, the literal transient buffers could be manually constructed via standard C/C++ constructs, but this regresses back to our "unfriendly" ASM/uC practices.

As chip designers, not C++ programmers, we would normally construct these routines at ASM/uC level and thus have complete control over both implicit/explicit code generation. However, due to product restructuring, we must now produce native C/C++ "programmer-friendly" firmware which is readily accessible by designer and non-designer alike. As most of our systems are tailored specifically for the embedded market, these routines MUST often maintain a customer-defined T-State/Memory metric in order to complete post-production testing.

You can use the inline assembler if you need finer low-level control:
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

From your rather elementary diagnosis of the VC++ debug code...

not so friendly hey?

String1("Hello"); <- Familiar syntax, but length processed during run-time construction.

OR

String1(_MyMacro("Hello")); <- Reasonably familiar syntax, but length processed via compile-time construction and immediately available at run-time.

So what do you actually want? The length of the string at compile time?

#define _MyMacro(cstr) cstr, sizeof(cstr)

Should work if your string constructor accepts (const char *st, int len). Note that sizeof counts the terminating NUL, so len here is one more than strlen would return.
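As a rough sketch of that idea, assuming a constructor that takes a pointer and a length: `String` below is only a hypothetical stand-in for the custom class, and `MY_LITERAL` is a made-up name used instead of `_MyMacro`, because identifiers that begin with an underscore followed by a capital letter are reserved for the implementation in C++. This variant subtracts one so that the length is the character count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the custom string class discussed in this thread.
class String {
public:
    // The caller supplies the length, so no strlen-style scan is needed here.
    String(const char *st, std::size_t len) : text_(st), length_(len) {
        assert(st != 0);
    }
    const char *c_str() const { return text_; }
    std::size_t length() const { return length_; }

private:
    const char *text_;
    std::size_t length_;
};

// Expands to two constructor arguments: the literal and its character count.
// sizeof counts the terminating NUL, hence the "- 1".
#define MY_LITERAL(cstr) (cstr), (sizeof(cstr) - 1)

// Example call site: the length 5 is a compile-time constant.
inline String makeHello() { return String(MY_LITERAL("Hello")); }
```

The macro only gives the right answer for an actual literal or array. Handed a plain `const char *`, `sizeof` would report the size of the pointer, not the length of the string.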

If you actually want to build the whole "string" class object at compile time, just build up an initialiser for the struct/class and cast to type "string".

An example is shown below. I've called your "string" class "stwing", as otherwise there's a problem with name uniqueness. You'll also see there's a macro STWING(x) that will build the whole class up for you during compile time, I would expect.

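#include <iostream>

using namespace std;

// Kept as a plain aggregate (no user-declared constructors) so that it can
// be brace-initialised from compile-time constants.
struct stwing {
   int len, capacity;   // sizeof counts the terminating NUL in both
   const char *x;
};

// The original (stwing){...} form is a C99 compound literal, which g++ only
// accepts as an extension; stwing{...} is the standard C++11 spelling.
#define STWING(x) stwing{sizeof(x), sizeof(x), x}

int main() {
   stwing xyz = {sizeof("Hello"), sizeof("Hello"), "Hello"};
   stwing jkl = STWING("Goodbye");

   cout << xyz.len << ", " << xyz.capacity << ", " << xyz.x << endl;
   cout << jkl.len << ", " << jkl.capacity << ", " << jkl.x << endl;
   cin.get();   // wait for Enter so the console window stays open
   return 0;
}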

You might also need an extra flag in your class to state whether the class references an explicitly allocated buffer or not, so that you don't try to deallocate it.

not so friendly hey?

Quite right; looking back at my second post, this certainly doesn't read as it was originally intended. I apologise for the somewhat arrogant undertones.

So what do you actually want? The length of the string at compile time?

When we initially considered the C/C++ language option, we quickly realised that the compiler exhibited an implicit string semantic in order to expose a unified syntax across both literal character buffer and immediate machine type alike.

Thus, {int V1 = 5;} and {char *V1 = "Hello";} are both valid expressions, with the latter requiring implicit compile-time string construction within the executable's data section. Assuming that the prerequisite constructors are already in place, we were looking to co-opt this implicit string semantic - possibly via a preprocessor macro? - in order to procure similar functionality within our own application-specific POD types. In our perfect and quite deluded world, for example, the expression {TransientBuffer *V1 = _MyMacro("Hello");} would implicitly construct all requisite information within the executable's data section and subsequently return the appropriate address for inclusion within its associated assignment expression. As this effectively emulates the existing C/C++ string semantic, the real question is whether the preprocessor is capable of performing such convoluted tasks?

We could of course implement explicit initialisation/referencing as illustrated within your previous post, but this regresses back to our existing and somewhat "unfriendly" ASM/uC practices. We are all rather enamoured by the syntax/performance balance of C/C++'s literal semantic and it would be a shame if this is not extensible with respect to compile-time customization.
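One possible shape for such a macro is sketched below, on the assumption that a C++11 compiler is available (it relies on a lambda; an older compiler would need a named static object instead). Everything here is hypothetical: the names `TransientHeader`, `LiteralBuffer`, `MY_LITERAL_BUFFER`, `textOf` and `helloLiteral`, the use of `unsigned` fields, and the initial reference count of 1 are all illustrative guesses rather than the layout actually used in the firmware.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical header; the field order follows the description in this thread.
struct TransientHeader {
    unsigned refCount;
    unsigned length;    // characters, excluding the terminating NUL
    unsigned capacity;  // bytes in the text area, including the NUL
};

// Header plus inline storage sized for one particular literal.
template <std::size_t N>
struct LiteralBuffer {
    TransientHeader header;
    char text[N];
};

// The accessor below assumes the text starts right after the header.
static_assert(offsetof(LiteralBuffer<1>, text) == sizeof(TransientHeader),
              "unexpected padding between header and text");

// Each expansion owns one function-local static aggregate. Its initialiser is
// made only of constants, so the object is statically initialised.
#define MY_LITERAL_BUFFER(s)                                   \
    ([]() -> TransientHeader * {                               \
        static LiteralBuffer<sizeof(s)> buf = {                \
            {1u, sizeof(s) - 1, sizeof(s)}, s};                \
        return &buf.header;                                    \
    }())

// Returns the characters stored directly behind the header.
inline const char *textOf(const TransientHeader *h) {
    assert(h != nullptr);
    return reinterpret_cast<const char *>(h + 1);
}

// Example call site; every call returns the same pre-built buffer.
inline TransientHeader *helloLiteral() { return MY_LITERAL_BUFFER("Hello"); }
```

A call site then reads `TransientHeader *V1 = MY_LITERAL_BUFFER("Hello");`. Which section the buffer ends up in is up to the toolchain, and each expansion gets its own buffer, so two uses of the same literal text are not pooled the way ordinary string literals may be.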

You might also need an extra flag in your class to state whether the class references an explicitly allocated buffer or not, so that you don't try to deallocate it.

Again, due to performance/size constraints, it has been necessary to integrate this functionality within the existing reference-count logic; an augmented reference-count ensures that all literal/constant buffers never exhibit a free unreferenced state.
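To illustrate the augmented-count idea, here is a minimal sketch under stated assumptions: a literal buffer starts with a count of 1 that no owner ever releases, while a heap buffer starts at 0. The names `BufferHeader`, `acquire`, `release` and `everFreed`, and the two initial values, are hypothetical and not taken from the actual class.

```cpp
#include <cassert>

// Hypothetical minimal header: just the count shared by all owners.
struct BufferHeader {
    unsigned refCount;
};

// Assumed starting values: a literal buffer carries one permanent reference,
// a heap buffer is counted up from zero by its first owner.
enum { kLiteralInitialCount = 1, kHeapInitialCount = 0 };

inline void acquire(BufferHeader &b) { ++b.refCount; }

// Returns true when the last counted owner has gone and the buffer may be freed.
inline bool release(BufferHeader &b) {
    assert(b.refCount > 0);   // releasing an unowned buffer is a logic error
    return --b.refCount == 0;
}

// Simulates `owners` strings sharing and then dropping one buffer, and
// reports whether any release() asked for the buffer to be freed.
inline bool everFreed(unsigned initialCount, int owners) {
    BufferHeader b = {initialCount};
    bool freed = false;
    for (int i = 0; i < owners; ++i) acquire(b);
    for (int i = 0; i < owners; ++i) freed = release(b) || freed;
    return freed;
}
```

With these assumptions the release path needs no separate "is this a literal?" test, because the count of a literal buffer never reaches zero. The trade-off is that the literal buffers must live in writable memory, since their counts are still incremented and decremented.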
