This document is still under construction, and still subject to significant changes. Still, I hope parts of it will be useful, so I'm releasing it even though it's not done.
For the most part, it's a collection of anecdotal information that already assumes some familiarity with the Perl sources. I really need an introductory section that describes the organization of the sources and all the various auxiliary files that are part of the distribution.
Subscribe by sending the message (in the body of your letter)
subscribe perl5-porters
to perl5-porters-request@africa.nicoh.com.
print "You've got an old perl\n" if $] < 5.002;
(Observations about the imprecision of floating point numbers for representing reality probably have more relevance than you might imagine :-)
You can also require a particular version (or later) with
use 5.002;
These sub-versions can also be used as floating point numbers, so you can do things such as
print "You've got an unstable perl\n" if $] == 5.00303;
You can also require a particular version (or later) with
use 5.00303;
The sub-versions are usually available on CPAN in the src/5.0/unsupported directory.
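The same comparison can be scripted outside perl. The following is only a sketch: version_number and at_least are hypothetical helper names, and it assumes that a release named 5.003_07 corresponds to the number 5.00307, i.e. that the underscore is simply dropped.

```shell
# Hypothetical helpers, not part of the perl distribution.

# version_number NAME: turn a release name such as 5.003_07 into the
# floating point form 5.00307 (assumption: the "_" is simply dropped).
version_number() {
    printf '%s\n' "$1" | tr -d '_'
}

# at_least HAVE WANT: succeed if release HAVE is numerically >= WANT.
at_least() {
    awk -v have="$(version_number "$1")" -v want="$(version_number "$2")" \
        'BEGIN { exit !(have + 0 >= want + 0) }'
}
```

For example, at_least 5.003_07 5.002 succeeds, while at_least 5.002 5.003_03 fails.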
First, we need some way to identify releases that are known to have new
features that need testing and exploration. The subversion scheme does that
nicely while fitting into the
use 5.003;
mold.
Second, since most of the folks who help maintain perl do so on a free-time voluntary basis, perl development does not proceed at a precise pace, though it always seems to be moving ahead quickly. We needed some way to pass around the ``patch pumpkin'' to allow different people chances to work on different aspects of the distribution without getting in each other's way. It wouldn't be constructive to have multiple people working on incompatible implementations of the same idea. Instead what was needed was some kind of ``baton'' or ``token'' to pass around so everyone knew whose turn was next.
[begin quote]
Who has the patch pumpkin?
To explain: A former co-worker told me once that at a previous job, there was one tape drive and multiple systems that used it for backups. But instead of some high-tech exclusion software, they used a low-tech method to prevent multiple simultaneous backups: a stuffed pumpkin. No one was allowed to make backups unless they had the ``backup pumpkin''.
[end quote]
The name has stuck.
Consider writing the appropriate documentation first and then implementing it to correspond to the documentation.
Configure and config_h.SH are also automatically generated by metaconfig. In general, you should patch the metaconfig units instead of patching these files directly.
Here are the steps I go through to prepare a patch & distribution.
Lots of it could doubtless be automated but isn't.
At the same time, announce what you plan to do with the patch pumpkin, to allow folks a chance to object or suggest alternatives, or do it for you. Naturally, the patch pumpkin holder ought to incorporate various bug fixes and documentation improvements that are posted while he or she has the pumpkin, but there might also be larger issues at stake.
One of the precepts of the subversion idea is that we shouldn't give the patch pumpkin to anyone unless we have some idea what he or she is going to do with it.
metaconfig -m
will regenerate Configure and config_h.SH. More information on obtaining and running metaconfig is in the U/README file that comes with Perl's metaconfig units. Perl's metaconfig units should be available in the same place you found this file. On CPAN, look under my directory id/ANDYD/ for a file such as 5.003_07-02.U.tar.gz. That file should be unpacked in your main perl source directory. It contains the files needed to run metaconfig to reproduce Perl's Configure script.
Alternatively, do consider if the *ish.h files might be a better place for your changes.
perl -MExtUtils::Manifest -e fullcheck
to do half the job. This will make sure everything listed in MANIFEST is included in the distribution. dist's manicheck command will also list extra files in the directory that are not listed in MANIFEST.
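Both halves of that check can be approximated with standard tools. This is a rough sketch only: manifest_diff is a hypothetical helper, not part of perl or dist, and it assumes each MANIFEST line starts with a file name followed by whitespace and a description.

```shell
# manifest_diff DIR (hypothetical helper): compare column one of
# DIR/MANIFEST with the regular files actually present under DIR.
manifest_diff() (
    dir=$1
    listed=$(mktemp) || exit 1
    present=$(mktemp) || exit 1
    # File names are the first whitespace-separated field of each line.
    awk '{print $1}' "$dir/MANIFEST" | LC_ALL=C sort > "$listed"
    (cd "$dir" && find . -type f | sed 's|^\./||') | LC_ALL=C sort > "$present"
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C comm -23 "$listed" "$present" | sed 's/^/missing: /'
    LC_ALL=C comm -13 "$listed" "$present" | sed 's/^/extra: /'
    rm -f "$listed" "$present"
)
```

Run as manifest_diff . from the top of the source tree. An empty result means MANIFEST and the directory agree.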
The MANIFEST is normally sorted, with one exception. Perl includes both a Configure script and a configure script. The configure script is a front-end to the main Configure, but is there to aid folks who use autoconf-generated configure files for other software. The problem is that Configure and configure are the same on case-insensitive file systems, so I deliberately put configure first in the MANIFEST so that the extraction of Configure will overwrite configure and leave you with the correct script. (The configure script must also have write permission for this to work, so it's the only file in the distribution I normally have with write permission.)
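That ordering can be reproduced mechanically. A minimal sketch, where sort_manifest is a hypothetical name and plain byte-order sorting (LC_ALL=C) is my assumption about what "sorted" means here:

```shell
# sort_manifest FILE (hypothetical helper): print the MANIFEST entries
# with the "configure" line first and every other line sorted after it.
sort_manifest() {
    # The lowercase front-end script goes first so that a later
    # extraction of Configure overwrites it on case-insensitive systems.
    awk '$1 == "configure"' "$1"
    awk '$1 != "configure"' "$1" | LC_ALL=C sort
}
```

Something like sort_manifest MANIFEST > MANIFEST.sorted would then give a file to compare against the existing MANIFEST.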
If you are using metaconfig to regenerate Configure, then you should note that metaconfig actually uses MANIFEST.new, so you want to be sure MANIFEST.new is up-to-date too. I haven't found the MANIFEST/MANIFEST.new distinction particularly useful, but that's probably because I still haven't learned how to use the full suite of tools in the dist distribution.
In all, the following files should probably be executable:
Configure configpm configure embed.pl installperl installman keywords.pl lib/splain myconfig opcode.pl perly.fixer t/TEST t/*/*.t *.SH vms/ext/Stdio/test.pl vms/ext/filespec.t vms/fndvers.com x2p/*.SH
Other things ought to be readable, at least :-).
Probably, the permissions for the files could be encoded in MANIFEST somehow, but I'm reluctant to change MANIFEST itself because that could break old scripts that use MANIFEST.
I seem to recall that some SVR3 systems kept some sort of file that listed permissions for system files; something like that might be appropriate.
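One way such a permissions file might look is sketched below. Everything here is hypothetical: the "MODE PATH" file format, the file itself, and the apply_perms name are illustrations, not anything the distribution provides. Paths are taken literally, so patterns such as t/*/*.t would have to be expanded when the list is written.

```shell
# apply_perms LIST DIR (hypothetical helper): LIST holds "MODE PATH"
# lines; each PATH is relative to DIR.  Blank lines and lines whose
# first field starts with '#' are skipped.
apply_perms() {
    while read -r mode path; do
        case $mode in
            ''|'#'*) continue ;;
        esac
        chmod "$mode" "$2/$path" || return 1
    done < "$1"
}
```

Keeping the list in a separate file leaves MANIFEST itself untouched, so old scripts that read MANIFEST would keep working.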
It may also be necessary to update vms/config.vms and plan9/config.plan9, though you should be quite careful in doing so if you are not familiar with those systems. You might want to issue your patch with a promise to quickly issue a follow-up that handles those directories.
Some additional notes from Larry on this:
Don't forget to regenerate perly.c.diff.
diff -c perly.c.orig perly.c >perly.c.diff
It may be necessary to edit perly.c first if the previous attempt to patch didn't successfully remove the chunk of lines starting with
#line 29 "perly.y"
and ending one line before
#define YYERRCODE 256
This only happens when you add or remove a token type, causing that hunk of the patch to fail. I suppose this could be automated, but it doesn't happen very often nowadays.
Larry
I used to include rules like the following in the makefile:
# The following three header files are generated automatically
# The correct versions should be already supplied with the perl kit,
# in case you don't have perl or 'sh' available.
# The - is to ignore error return codes in case you have the source
# installed read-only or you don't have perl yet.
keywords.h: keywords.pl
	@echo "Don't worry if this fails."
	- perl keywords.pl
However, I got lots of mail consisting of people worrying because the
command failed. I eventually decided that I would save myself time and
effort by manually running make regen_headers
myself rather than answering all the questions and complaints about the
failing command.
Of course, some incompatible changes may well be necessary. I'm just suggesting that we not make any such changes without thinking carefully about them first. If possible, we should provide backwards-compatibility stubs. There's a lot of XS code out there. Let's not force people to keep changing it.
mkdir ../perl5.003_08
awk '{print $1}' MANIFEST | cpio -pdm ../perl5.003_08
cd ../
tar cf perl5.003_08.tar perl5.003_08
gzip --best perl5.003_08.tar
# Print a reassuring "End of Patch" note so people won't
# wonder if their mailer truncated patches.
print "\n\nEnd of Patch.\n";
at the end. That's because I used to get questions from people asking if their mail was truncated.
Here's how I generate a new patch. I'll use the hypothetical 5.003_07 to 5.003_08 patch as an example.
# unpack perl5.003_07/
gzip -d -c perl5.003_07.tar.gz | tar -xof -
# unpack perl5.003_08/
gzip -d -c perl5.003_08.tar.gz | tar -xof -
makepatch perl5.003_07 perl5.003_08 > perl5.003_08.pat
Makepatch will automatically generate appropriate rm commands to remove deleted files. Unfortunately, it will not correctly set permissions for newly created files, so you may have to do so manually. For example, patch 5.003_04 created a new test t/op/gv.t which needs to be executable, so at the top of the patch, I inserted the following lines:
# Make a new test
touch t/op/gv.t
chmod +x t/op/gv.t
Of course, my patch is now wrong because makepatch didn't know I was going to run those commands, and it patched against /dev/null.
So, what I do is sort out all such shell commands that need to be in the patch (including possible mv-ing of files, if needed) and put that in the shell commands at the top of the patch. Next, I delete all the patch parts of perl5.003_08.pat, leaving just the shell commands. Then, I do the following:
cd perl5.003_07
sh ../perl5.003_08.pat
cd ..
makepatch perl5.003_07 perl5.003_08 >> perl5.003_08.pat
(Note the append to preserve my shell commands.) Now, my patch will line up with what the end users are going to do.
rm -rf perl5.003_07
gzip -d -c perl5.003_07.tar.gz | tar -xf -
cd perl5.003_07
sh ../perl5.003_08.pat
patch -p1 -N < ../perl5.003_08.pat
cd ..
gdiff -r perl5.003_07 perl5.003_08
where gdiff is GNU diff. Other diffs may also do recursive checking.
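If this check is run from a script, the exit status of the recursive diff is probably all that matters. A small sketch, assuming a diff that supports -r (verify_trees is a hypothetical name):

```shell
# verify_trees OLD NEW (hypothetical helper): report whether two
# directory trees are identical, based on the exit status of diff -r.
verify_trees() {
    if diff -r "$1" "$2" > /dev/null; then
        echo "trees match"
    else
        echo "trees differ"
        return 1
    fi
}
```

After applying the patch, verify_trees perl5.003_07 perl5.003_08 should print "trees match". If it doesn't, rerun the diff by hand to see what is left over.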
If your changes include conditional code, try to test the different branches as thoroughly as you can. For example, if your system supports dynamic loading, you can also test static loading with
sh Configure -Uusedl
You can also hand-tweak your config.h to try out different #ifdef branches.
#if defined(I_FOO)
#    include <foo.h>
#elif defined(I_BAR)
#    include <bar.h>
#else
#    include <fubar.h>
#endif
You have to do the more Byzantine
#if defined(I_FOO)
#    include <foo.h>
#else
#    if defined(I_BAR)
#        include <bar.h>
#    else
#        include <fubar.h>
#    endif
#endif
Incidentally, whitespace between the leading '#' and the preprocessor command is not guaranteed, but is very portable and you may use it freely. I think it makes things a bit more readable, especially once things get rather deeply nested. I also think that things should almost never get too deeply nested, so it ought to be a moot point :-)
memcmp() and bcmp(). The perl.h header file handles these by appropriate #defines, selecting the POSIX mem*() functions if available, but falling back on the b*() functions, if need be.
More serious is the case where some brilliant person decided to use the same function name but give it a different meaning or calling sequence :-). getpgrp() and setpgrp() come to mind. These are a real problem on systems that aim for conformance to one standard (e.g. POSIX), but still try to support the other way of doing things (e.g. BSD). My general advice (still not really implemented in the source) is to do something like the following. Suppose there are two alternative versions, fooPOSIX() and fooBSD().
#ifdef HAS_FOOPOSIX
    /* use fooPOSIX(); */
#else
#  ifdef HAS_FOOBSD
    /* try to emulate fooPOSIX() with fooBSD();
       perhaps with the following:  */
#    define fooPOSIX fooBSD
#  else
    /* Uh, oh.  We have to supply our own. */
#    define fooPOSIX Perl_fooPOSIX
#  endif
#endif
#ifdef HAS_NEATO_FEATURE
    /* use neato feature */
#else
    /* use some fallback mechanism */
#endif
rather than the more impenetrable
#ifndef MISSING_NEATO_FEATURE
    /* Not missing it, so we must have it, so use it */
#else
    /* Are missing it, so fall back on something else. */
#endif
Of course, for this toy example, there's not much difference. But when the #ifdef's start spanning a couple of screenfuls, and the #else's are marked something like
#else /* !MISSING_NEATO_FEATURE */
I find it easy to get lost.
pause()
function as an illustration.
Perl5.003 has the following in perl.h:
#ifndef HAS_PAUSE
#define pause() sleep((32767<<16)+32767)
#endif
Configure sets HAS_PAUSE if the system has the pause() function, so this #define only kicks in if the pause() function is missing. Nice idea, right?
Unfortunately, some systems apparently have a prototype for pause() in unistd.h, but don't actually have the function in the library. (Or maybe they do have it in a library we're not using.)
Thus, the compiler sees something like
extern int pause(void);
/* . . . */
#define pause() sleep((32767<<16)+32767)
and dies with an error message. (Some compilers don't mind this; others apparently do.)
To work around this, 5.003_03 and later have the following in perl.h:
/* Some unistd.h's give a prototype for pause() even though
   HAS_PAUSE ends up undefined.  This causes the #define
   below to be rejected by the compiler.  Sigh.
*/
#ifdef HAS_PAUSE
#  define Pause pause
#else
#  define Pause() sleep((32767<<16)+32767)
#endif
This works.
The curious reader may wonder why I didn't do the following in util.c instead:
#ifndef HAS_PAUSE
void pause()
{
    sleep((32767<<16)+32767);
}
#endif
That is, since the function is missing, just provide it. Then things would probably have been all right, it would seem.
Well, almost. It could be made to work. The problem arises from the conflicting needs of dynamic loading and namespace protection.
For dynamic loading to work on AIX (and VMS) we need to provide a list of symbols to be exported. This is done by the script perl_exp.SH, which reads global.sym and interp.sym. Thus, the pause symbol would have to be added to global.sym. So far, so good.
On the other hand, one of the goals of Perl5 is to make it easy to either extend or embed perl and link it with other libraries. This means we have to be careful to keep the visible namespace ``clean''. That is, we don't want perl's global variables to conflict with those in the other application library. Although this work is still in progress, the way it is currently done is via the embed.h file. This file is built from the global.sym and interp.sym files, since those files already list the globally visible symbols. If we had added pause to global.sym, then embed.h would contain the line
#define pause Perl_pause
and calls to pause in the perl sources would now point to Perl_pause. Now, when ld is run to build the perl executable, it will go looking for Perl_pause, which probably won't exist in any of the standard libraries. Thus the build of perl will fail.
Those systems where HAS_PAUSE is not defined would be ok, however, since they would get a Perl_pause function in util.c. The rest of the world would be in trouble.
And yes, this scenario has happened. On SCO, the function chsize is available. (I think it's in -lx, the Xenix compatibility library.) Since the perl4 days (and possibly before), Perl has included a chsize function that gets called something akin to
#ifndef HAS_CHSIZE
I32 chsize(fd, length)
/* . . . */
#endif
When 5.003 added
#define chsize Perl_chsize
to embed.h, the compile started failing on SCO systems.
The ``fix'' is to give the function a different name. The one implemented in 5.003_05 isn't optimal, but here's what was done:
#ifdef HAS_CHSIZE
# ifdef my_chsize  /* Probably #defined to Perl_my_chsize in embed.h */
#   undef my_chsize
# endif
# define my_chsize chsize
#endif
My explanatory comment in patch 5.003_05 said:
Undef and then re-define my_chsize from Perl_my_chsize to just plain chsize if this system HAS_CHSIZE. This probably only applies to SCO. This shows the perils of having internal functions with the same name as external library functions :-).
Now, we can safely put my_chsize in global.sym, export it, and hide it with embed.h.
To be consistent with what I did for pause, I probably should have called the new function Chsize, rather than my_chsize. However, the perl sources are quite inconsistent on this. (Consider New, Mymalloc, and Myremalloc, to name just a few.)
There is a problem with this fix, however, in that Perl_chsize was available as a libperl.a library function in 5.003, but it isn't available any more (as of 5.003_07). This means that we've broken binary compatibility. This is not good.
Part of the problem is that we want to have some functions listed as exported but not have their names mangled by embed.h or possibly conflict with names in standard system headers. We actually already have such a list at the end of perl_exp.SH (though that list is out-of-date):
# extra globals not included above.
cat <<END >> perl.exp
perl_init_ext
perl_init_fold
perl_init_i18nl14n
perl_alloc
perl_construct
perl_destruct
perl_free
perl_parse
perl_run
perl_get_sv
perl_get_av
perl_get_hv
perl_get_cv
perl_call_argv
perl_call_pv
perl_call_method
perl_call_sv
perl_requirepv
safecalloc
safemalloc
saferealloc
safefree
This still needs much thought, but I'm inclined to think that one possible solution is to prefix all such functions with perl_ in the source and list them along with the other perl_* functions in perl_exp.SH.
Thus, for chsize, we'd do something like the following:
/* in perl.h */
#ifdef HAS_CHSIZE
#  define perl_chsize chsize
#endif
then in some file (e.g. util.c or doio.c) do
#ifndef HAS_CHSIZE
I32 perl_chsize(fd, length)
/* implement the function here . . . */
#endif
Alternatively, we could just always use chsize everywhere and move chsize from global.sym to the end of perl_exp.SH. That would probably be fine as long as our chsize function agreed with all the chsize function prototypes in the various systems we'll be using. As long as the prototypes in actual use don't vary that much, this is probably a good alternative. (As a counter-example, note how Configure and perl have to go through hoops to find and use Malloc_t and Free_t for malloc and free.)
At the moment, this latter option is what I tend to prefer.
Metaconfig and autoconf are two tools with very similar purposes. Metaconfig is actually the older of the two, and was originally written by Larry Wall, while autoconf is probably now used in a wider variety of packages. The autoconf info file discusses the history of autoconf and how it came to be. The curious reader is referred there for further information.
Overall, both tools are quite good, I think, and the choice of which one to use could be argued either way. In March, 1994, when I was just starting to work on Configure support for Perl5, I considered both autoconf and metaconfig, and eventually decided to use metaconfig for the following reasons:
Metaconfig's Configure scripts, on the other hand, can be interactive. Thus if Configure is guessing things incorrectly, you can go back and fix them. This isn't as important now as it was when we were actively developing Configure support for new features such as dynamic loading, but it's still useful occasionally.
@INC is the following:
$archlib
$privlib
$sitearch
$sitelib
Specifically, on my Solaris/x86 system, I run sh Configure -Dprefix=/opt/perl and I have the following directories:
/opt/perl/lib/i86pc-solaris/5.00307
/opt/perl/lib
/opt/perl/lib/site_perl/i86pc-solaris
/opt/perl/lib/site_perl
That is, perl's directories come first, followed by the site-specific directories.
The site libraries come second to support the usage of extensions across perl versions. Read the relevant section in INSTALL for more information. If we ever make $sitearch version-specific, this topic could be revisited.
Apparently, most folks who want to override one of the standard library files simply do it by overwriting the standard library files.
The main intent of APPLLIB_EXP is for folks who want to send out a version of Perl embedded in their product. They would set the symbol to be the name of the library containing the files needed to run or to support their particular application. This works at the "override" level to make sure they get their own versions of any library code that they absolutely must have configuration control over.
As such, I don't see any conflict with a sysadmin using it for an override-ish sort of thing, when installing a generic Perl. It should probably have been named something to do with overriding though. Since it's undocumented we could still change it... :-)
Given that it's already there, you can use it to override distribution modules. If you do
sh Configure -Dccflags='-DAPPLLIB_EXP=/my/override'
then perl.c will put /my/override ahead of ARCHLIB and PRIVLIB.
I typically upload both the patch file, e.g. perl5.003_08.pat.gz, and the full tar file, e.g. perl5.003_08.tar.gz.
If you want your patch to appear in the src/5.0/unsupported directory on CPAN, send e-mail to the CPAN master librarian. (Check out http://www.perl.com/CPAN/CPAN.html).
Configure -Dinstallprefix=/blah/blah
Currently, we support -Dprefix=/blah/blah, but changing the install location has to be handled by something like the config.over trick described in INSTALL. AFS users also are treated specially. We should probably duplicate the metaconfig prefix stuff for an install prefix.
free(0), for example.) This might be a time-saver for systems that already have a good malloc. (Recent Linux libc's apparently have a nice malloc that is well-tuned for the system.)
long long on systems where long long is larger than what we've been using for IV? What if you can't sprintf a long long?
$firstmakefile that the make command will try to use before it uses Makefile. Such may not be the case for all make commands, particularly those on non-Unix systems.
Probably some variant of the BSD .depend file will be useful. We ought to check how other packages do this, if they do it at all. We could probably pre-generate the dependencies (with the exception of malloc.o, which could probably be determined at Makefile.SH extraction time).
lockf(), flock(), and/or fcntl() file locking. It's a mess.
All opinions expressed herein are my own.