Thoughts on being a package registry maintainer

Introduction

Julia is a modern programming language with a fairly large package ecosystem (currently ~10k packages) that provide all kinds of useful functionality to build on. Packages are registered in a global registry called General, which is installed by default by the package manager, allowing users to easily add and use registered packages.

Packages are identified by a name (like Convex.jl, DataFrames.jl, or Makie.jl) and a UUID (like f65535da-76fb-5f13-bab9-19810c17039a). To avoid confusion, General has a rule that every package must have a unique name (although Julia and its package manager Pkg.jl can tolerate name collisions in nested dependencies). General also has many guidelines (as opposed to strict rules) around package names, such as avoiding overly similar names (e.g. the name Websocket.jl was rejected because there was already a WebSockets.jl), avoiding acronyms, avoiding names shorter than 5 letters, and so forth. When a new package is being registered, automation checks the guidelines and refuses to auto-merge packages that violate them; such packages need to be manually merged by a registry maintainer. If the package isn’t breaking a hard rule (such as a name collision), it can still be merged at the maintainers’ discretion, but that is not guaranteed. There is a sense that the namespace is a shared resource, and registering under the name you want is not an automatic right, because it affects everyone (by consuming that shared resource in potentially more or less useful ways).
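To make the guideline checks concrete, here is a minimal sketch of what an automated name check could look like. It is hypothetical and not the code General’s automation actually runs: `osa_distance` and `name_guideline_issues` are made-up names, the “acronym” test is a crude all-uppercase heuristic, and the similarity threshold (a Damerau–Levenshtein distance of at least 2 between lowercased names, computed here in its restricted “optimal string alignment” form) is the one discussed later in this post.

```julia
# Hypothetical sketch of a General-style name check; not the registry's real code.

# Restricted Damerau–Levenshtein ("optimal string alignment") distance:
# counts insertions, deletions, substitutions, and adjacent transpositions.
function osa_distance(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    d = zeros(Int, m + 1, n + 1)   # d[i+1, j+1] = distance between s[1:i] and t[1:j]
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i+1, j+1] = min(d[i, j+1] + 1,   # deletion
                          d[i+1, j] + 1,   # insertion
                          d[i, j] + cost)  # substitution (or match)
        if i > 1 && j > 1 && s[i] == t[j-1] && s[i-1] == t[j]
            d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + 1)  # adjacent transposition
        end
    end
    return d[m+1, n+1]
end

# Returns the (hypothetical) reasons a name would not be auto-merged;
# an empty vector means no guideline was tripped.
function name_guideline_issues(name::AbstractString, existing::AbstractVector{<:AbstractString})
    issues = String[]
    length(name) < 5 && push!(issues, "name shorter than 5 letters")
    all(isuppercase, name) && push!(issues, "possible acronym (all uppercase)")
    for other in existing
        if osa_distance(lowercase(name), lowercase(other)) < 2
            push!(issues, "too similar to $other")
        end
    end
    return issues
end
```

For example, `name_guideline_issues("Websocket", ["WebSockets"])` flags the similarity to WebSockets, mirroring the rejection described above, while `name_guideline_issues("Ark", ["DataFrames"])` only flags the length.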

I have become one of these registry maintainers, so I sometimes find myself trying to answer questions like: should I merge this package registration? What factors should I consider? Should I just not “get involved” at all? (But isn’t that also de facto making a decision?) Why do I have this power, and what responsibility does it come with?

I’ll try to work backwards through these.

How did I become a maintainer & should I exercise that power?

I think the last question is in some ways the most straightforward but least satisfying: I became a registry maintainer because I got involved in improving the automation that powers the registry, and at some point years ago I was given commit access to General to facilitate that work. I believe at the time I asked if I should only use it vis-à-vis the automation and was told no, just use it responsibly, but I have a truly terrible memory and really can’t say for sure whether that exchange happened or was just in my head. At least no one has told me not to merge stuff. (Feel free to contact me if you have issues with my use of commit privileges to General, and I will try my best not to be defensive.)

So then I have this possibly-self-proclaimed power as “registry maintainer”. Should I exercise it at all? Sometimes I avoid it; if I’m not sure about something or it seems like a complicated situation, it’s at least easier not to get involved. But it’s a specific form of community moderation, and I think community moderation is important. Julia’s community forum software tells me I’ve spent 40 days reading posts there over the years, and I am convinced it would be a much less pleasant place to be if it weren’t for excellent moderators like Matt Bauman. So I think if I can approach issues that arise in General (typically “I want my package to be named X, but the guidelines or another community member disagrees”) thoughtfully and considerately, then that’s a good thing. (On the flip side, if I’m feeling frustrated, I think it’s probably better to stay out of it or approach it another time.)

What factors should I consider in registration decisions?

So, if I am going to try to get involved and make a registration decision (typically, merging a package registration), what factors should I consider in doing so?

First, I think it helps to discuss a bit more background. General is in an interesting position as a package registry: it is more permissive than registries like R’s CRAN or TeX’s CTAN (which involve human review for every registration), but more restrictive than free-for-all registries like npm, PyPI, and Cargo, which automatically and ~instantly register eligible packages. General’s approach is to impose a 3-day waiting period for community review, during which anyone with a GitHub account, as well as our automated guideline checks, can block auto-registration. But it’s “default yes” in the sense that if there are no blocks, the registration is automatically merged after the waiting period.
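The “default yes” policy can be summarized as a small decision rule. The sketch below is a hypothetical model, not General’s actual automation: the `Registration` type, its fields, and `automerge_decision` are illustrative names chosen for this example.

```julia
using Dates

# Hypothetical model of a pending registration; not General's real data model.
struct Registration
    opened::DateTime
    guideline_issues::Vector{String}  # problems found by automated guideline checks
    blocked::Bool                     # someone in the community blocked auto-merge
end

const WAITING_PERIOD = Day(3)

# "Default yes": merge automatically once the waiting period has passed,
# unless a person or an automated check has blocked the registration.
function automerge_decision(r::Registration, current_time::DateTime)
    if r.blocked || !isempty(r.guideline_issues)
        return :needs_maintainer   # only a registry maintainer can merge it now
    elseif current_time >= r.opened + WAITING_PERIOD
        return :automerge          # no objections during the waiting period
    else
        return :wait
    end
end
```

The asymmetry is the point of the design: silence leads to a merge, so human attention is only required when someone objects or a guideline check fails.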

I’ve written a bit more about General’s permissiveness in a community discussion I started about disallowing vibe-coded packages in General. That discussion surfaced a spectrum of opinions on permissiveness, from Christopher Rackauckas’s opinion that

Everyone still remembers Tony right? We learned all the way back then that having a code czar doesn’t scale. That’s when we changed to General being permissive by default, and it was required in order to make v1.0 Julia work given the growth we had. From time to time someone comes in trying to police it again, and every time that happens they grow tired pretty quickly for the same reason.
I think we just have to be permissive. We should test what’s easy, for example we should probably require docs and not auto-merge anything without docs or CI. But beyond what is easy to test, the human labor should flag issues when we can, rollback / ban when we need to, etc. and be willing to accept that there are some things in General which may not match what we call high quality and that’s okay as long as any truly bad issues (something gets abandoned, name-squatted, security issue, etc.) are what the human time is saved for rapidly handling.

to Michael Goerz’s reply

I think if anything, we’re moving towards tightening the supervision of the General registry. Certainly, there’s a lot more emphasis on good package naming. We don’t generally do detailed code / quality reviews, but it’s a lot more regulated the PyPI. I feel like the Julia community takes the “collective ownership” of the ecosystem pretty seriously, and that’s a good thing.

Overall, we don’t really currently have a full “system” for making these decisions. We have guidelines, we have past precedent, we have community values, and we have judgement calls.

Three-letter package names

I started writing this post because I wanted to discuss a particular registration question that came up recently (but I have an endless need to provide context, which is why we’re 1,000 words in and just getting to the point). A package author wanted to register a package named Ark.jl, but it does not meet the guideline that package names be at least 5 letters long, so it is not eligible for auto-registration. This is a pretty decent name: it is not overly similar to another name (there’s no Arks.jl, for example), it’s somewhat “unique” (it’s not “Analysis.jl” or whatever), it’s not jargon-y, it’s not an acronym, and it is a Julia port of an existing Go project (https://github.com/mlange-42/ark) by the same author. But there can’t be that many three-letter names. If we use the guideline that the Damerau–Levenshtein distance between lowercased names must be at least 2, and assume the first letter is capitalized, the latter two are not, and only letters (no digits) are used, there are somewhere between 654 and 676 names available, as shown by the following mixed-integer linear program solve. (I actually tried a few things to get the number here, and I think there’s probably a clever way to do it. I ended up just running a mixed-integer solver for 30 minutes and reporting the bounds from there, since it’s good enough to make the point. If someone comes up with a way to get the exact number, let me know! I will update the post. Edit: I wrote a followup here with a full solution.) That’s not that many!

Code

# Code written by ChatGPT 5.1 thinking
using JuMP
using HiGHS
using MathOptInterface
const MOI = MathOptInterface

# -----------------------------
# 1. All [A-Z][a-z][a-z] names
# -----------------------------
function all_names()
    names = String[]
    for C in 'A':'Z', x in 'a':'z', y in 'a':'z'
        push!(names, string(C, x, y))
    end
    return names
end

# ----------------------------------------------------------
# 2. DL=1 neighbors for 3-letter lowercase strings
#
# Damerau–Levenshtein distance = 1 (for equal length) means:
#   - one substitution in any position, OR
#   - one adjacent transposition (1↔2, 2↔3)
# We enumerate those directly instead of calling the metric.
# ----------------------------------------------------------
function neighbors_DL1(s::String)
    @assert ncodeunits(s) == 3
    c1, c2, c3 = s[1], s[2], s[3]
    neigh = String[]

    # substitutions
    for a in 'a':'z'
        if a != c1
            push!(neigh, string(a, c2, c3))
        end
        if a != c2
            push!(neigh, string(c1, a, c3))
        end
        if a != c3
            push!(neigh, string(c1, c2, a))
        end
    end

    # adjacent transpositions
    if c1 != c2
        push!(neigh, string(c2, c1, c3))  # swap 1,2
    end
    if c2 != c3
        push!(neigh, string(c1, c3, c2))  # swap 2,3
    end

    return neigh
end

# ----------------------------------------------------------
# 3. Build conflict edge list for full DL≥2 constraint
#
# DL(lowercase(x), lowercase(y)) ≥ 2
# ⇔ forbid DL = 0 or 1.
# DL = 0 = duplicate name; we just don't include duplicates.
# DL = 1 neighbors are exactly neighbors_DL1().
# ----------------------------------------------------------
function build_conflicts_full()
    names  = all_names()
    lowers = lowercase.(names)
    N = length(names)

    # map lowercase string -> index 1..N
    name_to_idx = Dict{String,Int}(lowers .=> collect(1:N))

    conflicts = Tuple{Int,Int}[]
    for i in 1:N
        s = lowers[i]
        for t in neighbors_DL1(s)
            j = name_to_idx[t]
            if j > i
                push!(conflicts, (i, j))
            end
        end
    end

    return names, conflicts
end

# ----------------------------------------------------------
# 4. Build MIS MILP model with 30-minute timeout
#    Maximize number of chosen names
#    subject to: x[i] + x[j] ≤ 1 for every DL=1 pair (i,j).
# ----------------------------------------------------------
function build_max_DL_model()
    names, conflicts = build_conflicts_full()
    N = length(names)
    println("Total candidate names: $N")
    println("Conflict edges (DL=1 pairs): ", length(conflicts))

    model = Model(HiGHS.Optimizer)
    # 30 minute time limit
    set_attribute(model, MOI.TimeLimitSec(), 1800.0)

    @variable(model, x[1:N], Bin)
    @objective(model, Max, sum(x))

    for (i, j) in conflicts
        @constraint(model, x[i] + x[j] <= 1)
    end

    return model, names, x
end

# ----------------------------------------------------------
# 5. Solve; HiGHS prints the primal and dual bounds in its log
# ----------------------------------------------------------
function solve_with_timeout()
    model, names, x = build_max_DL_model()
    println("\nSolving with HiGHS (30 min time limit)...")
    optimize!(model)
    return model, names, x
end

solve_with_timeout()

which yielded

Total candidate names: 17576
Conflict edges (DL=1 pairs): 676000

Solving with HiGHS (30 min time limit)...
Running HiGHS 1.12.0 (git hash: 755a8e027): Copyright (c) 2025 HiGHS under MIT licence terms
MIP has 676000 rows; 17576 cols; 1352000 nonzeros; 17576 integer variables (17576 binary)
Coefficient ranges:
  Matrix  [1e+00, 1e+00]
  Cost    [1e+00, 1e+00]
  Bound   [1e+00, 1e+00]
  RHS     [1e+00, 1e+00]
Presolving model
676000 rows, 17576 cols, 1352000 nonzeros  1s
18929 rows, 17576 cols, 103431 nonzeros  3s
18929 rows, 17576 cols, 103431 nonzeros  4s
Presolve reductions: rows 676000(-0); columns 17576(-0); nonzeros 1352000(-0) - Not reduced
Objective function is integral with scale 1

Solving MIP model with:
   18929 rows
   17576 cols (17576 binary, 0 integer, 0 implied int., 0 continuous, 0 domain fixed)
   103431 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; H => Heuristic;
     I => Shifting; J => Feasibility jump; L => Sub-MIP; P => Empty MIP; R => Randomized rounding;
     S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution; Y => HiGHS solution;
     Z => ZI Round; l => Trivial lower; p => Trivial point; u => Trivial upper; z => Trivial zero

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

 z       0       0         0   0.00%   inf             -0                 Large        0      0      0         0     5.2s
 J       0       0         0   0.00%   inf             1                  Large        0      0      0         0     5.3s
 S       0       0         0   0.00%   1300            23              5552.17%        0      0      0         0    12.6s
 R       0       0         0   0.00%   676             24              2716.67%        0      0      0     10393    12.7s
 S       0       0         0   0.00%   676             26              2500.00%       86      3      0     10696    18.5s
         0       0         0   0.00%   676             26              2500.00%      176      7      0     11288    24.6s
 C       0       0         0   0.00%   676             27              2403.70%      253      8      0     11474    29.9s
         0       0         0   0.00%   676             27              2403.70%      329      9      0     11650    35.0s
         0       0         0   0.00%   676             27              2403.70%      447     11      0     12044    41.9s
         0       0         0   0.00%   676             27              2403.70%      528     14      0     12387    49.3s
 L       0       0         0   0.00%   676             628                7.64%      580     16      0     12675    82.5s
 S       0       0         0   0.00%   676             638                5.96%      580     15      0     45715   474.1s
 B       0       0         0   0.00%   676             645                4.81%      580     15      0     45715   474.2s
 B     480     469         0   0.00%   676             648                4.32%      726     20      0    929147   860.2s
       634     551         1   0.00%   676             648                4.32%      771     22      1     1005k   879.6s
 T     654     547        12   0.00%   676             649                4.16%      771     22      1     1006k   880.7s
       734     610        19   0.00%   676             649                4.16%      832     24      1     1035k   902.3s
       848     675        37   0.00%   676             649                4.16%      837      6      1     1074k   922.6s
       951     757        46   0.00%   676             649                4.16%      882      7      1     1109k   941.7s
 T     953     750        47   0.00%   676             650                4.00%      882      7      4     1110k   942.1s

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

      1043     832        53   0.00%   676             650                4.00%      944      9     58     1138k   957.5s
 T    1050     810        57   0.00%   676             651                3.84%      944      9     58     1138k   957.9s
 T    1065     800        60   0.00%   676             652                3.68%      944      9     58     1139k   958.6s
      1148     871        62   0.00%   676             652                3.68%     1152     11     58     1170k   974.3s
      1240     951        72   0.00%   676             652                3.68%     1152     12     58     1198k   989.4s
 T    1264     937        84   0.00%   676             653                3.52%     1152     12     60     1200k   991.3s
      1339    1004        86   0.00%   676             653                3.52%     1213     10     61     1228k  1007.5s
      1430    1075       101   0.00%   676             653                3.52%     1188     12     61     1263k  1035.6s
      1941    1611       119   0.00%   676             653                3.52%     1269     14     61     1650k  1320.3s
      2002    1610       120   0.00%   676             653                3.52%     1304     16     61     2091k  1710.7s
      2107    1674       138   0.00%   676             653                3.52%     1365     17     61     2128k  1730.5s
      2205    1739       156   0.00%   676             653                3.52%     1333     17     61     2160k  1747.6s
 T    2241    1694       172   0.00%   676             654                3.36%     1333     17     81     2163k  1750.6s
      2310    1754       174   0.00%   676             654                3.36%     1358     12     81     2201k  1768.9s
      2421    1824       191   0.00%   676             654                3.36%     1300     12     81     2248k  1790.5s
      2467    1919       204   0.00%   676             654                3.36%     1382     13     81     2262k  1800.2s
      2467    1919       204   0.00%   676             654                3.36%     1382     13     81     2262k  1800.2s

Solving report
  Status            Time limit reached
  Primal bound      654
  Dual bound        676
  Gap               3.36% (tolerance: 0.01%)
  P-D integral      11301.577112
  Solution status   feasible
                    654 (objective)
                    0 (bound viol.)
                    4.4408920985e-16 (int. viol.)
                    0 (row viol.)
  Timing            1800.20
  Max sub-MIP depth 6
  Nodes             2467
  Repair LPs        0
  LP iterations     2262922
                    558768 (strong br.)
                    5277 (separation)
                    786633 (heuristics)
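The 676 upper bound has a short explanation: two names that share the same first two (lowercased) letters differ only by a substitution in the third position, so they are at Damerau–Levenshtein distance 1. At most one name can therefore be chosen per pair of leading letters, giving at most 26 × 26 = 676. Any valid set of names gives a lower bound; the sketch below uses a simple greedy pass (a swapped-in heuristic, not the MIP approach above), so its count is valid but not necessarily tight. `dl1_neighbors` and `greedy_names` are hypothetical helper names.

```julia
# Sketch: greedy lower bound and pigeonhole upper bound for names [A-Z][a-z][a-z]
# whose lowercased forms must be at Damerau–Levenshtein distance >= 2.
# A swapped-in heuristic, not the MIP approach used above.
const LETTERS = 'a':'z'

# All lowercase strings at DL distance exactly 1 from the 3-letter string s.
# For equal-length strings that means one substitution or one adjacent swap.
function dl1_neighbors(s::AbstractString)
    c = collect(s)
    out = String[]
    for pos in 1:3, a in LETTERS
        if a != c[pos]
            t = copy(c)
            t[pos] = a
            push!(out, join(t))
        end
    end
    for pos in 1:2
        if c[pos] != c[pos+1]
            t = copy(c)
            t[pos], t[pos+1] = t[pos+1], t[pos]
            push!(out, join(t))
        end
    end
    return out
end

# Visit names in lexicographic order and keep each one that is not within
# distance 1 of a name already kept, so every kept pair is at distance >= 2.
function greedy_names()
    kept = String[]
    blocked = Set{String}()
    for x in LETTERS, y in LETTERS, z in LETTERS
        s = string(x, y, z)
        s in blocked && continue
        push!(kept, string(uppercase(x), y, z))
        union!(blocked, dl1_neighbors(s))
    end
    return kept
end

upper_bound = length(LETTERS)^2  # at most one name per (first, second) letter pair
found = greedy_names()
println("greedy lower bound: ", length(found), "; upper bound: ", upper_bound)
```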

So, should the name be accepted? I tried to see what we have done before. To that end, I queried the GitHub API to pull down all the registration requests and the comments made on them. I filtered to only requests that were closed or were merged by not-a-robot, indicating human intervention (or lack thereof). Then, for this particular question, I selected only the registration attempts for packages with names of length 3, which left me with 210 registrations. These are tabulated here along with the following plots, analysis scripts, and so forth.
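The filtering step can be sketched as follows, assuming the pull-request data has already been downloaded and parsed into simple records. `RegistrationPR`, its fields, and the helper functions are hypothetical; they do not mirror the GitHub API’s response schema.

```julia
# Hypothetical record for a registration pull request; field names are
# illustrative, not the GitHub API's schema.
struct RegistrationPR
    package::String       # package name without the ".jl" suffix
    merged::Bool
    closed::Bool          # note: merged PRs are also closed on GitHub
    merged_by_bot::Bool   # true if the automerge bot performed the merge
end

# Human intervention: closed without merging, or merged by a person.
human_decided(pr::RegistrationPR) =
    (pr.closed && !pr.merged) || (pr.merged && !pr.merged_by_bot)

# Keep only human-decided registrations of three-letter package names.
three_letter_decisions(prs) =
    filter(pr -> human_decided(pr) && length(pr.package) == 3, prs)
```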

First, I found that we indeed seem to be getting stricter over time, with fewer accepted registrations than in the past:

Next, I categorized each accepted registration into one of six categories, using LLMs to analyze the comments (I spot-checked the results and they look quite good):

We see that the “discretionary” category has always been the largest, but it too has been declining.

I also categorized the rejected registrations into six categories with the same methodology; here we can see the results:

We see that “duplicate/superseded PR” is increasingly common. This typically means the package author agreed to a new, longer, name and made a new registration for it.

So, what to make of this? My takeaway in this case was:

Seems like we sometimes merge stuff like this, sometimes we don’t. To me ArkECS.jl is a better name as it describes that it’s an ECS and I don’t think Ark is well known enough as a trigram to indicate it’s an ECS otherwise. But on the other hand, I don’t really see some other domain that should get to claim Ark either. So probably no one should get Ark or we should merge this.
In my opinion, the best reason for “no one should” is we could make no-3-letter-packages a hard rule so we don’t have to deliberate every time. But we don’t actually have that rule now, it’s still ultimately discretionary and we would have to get buy-in from everyone with commit access to General, otherwise some will still get merged depending on who happens to be looking at the PR.
So I guess I’m leaning towards merging in a day or two under the discretionary banner, unless someone has an objection, or [the author] is OK with ArkECS.jl.

The author kindly expressed their preference for Ark.jl, and I merged it. I’m still not totally sure this was the right decision, but at least it made the package author happy and doesn’t seem to harm anyone. (The author’s kindness really does help. Some folks can be pretty upset when their registration is blocked, especially if they are coming from another ecosystem with a much more permissive registry. They can feel like registering with a preferred name is a right they have that is being infringed on. That’s not the case in the Julia world, but I can see how it can feel frustrating if your expectations are different.)

Do we need a system for this?

I think the long-term trend will basically be what Chris said in the Discourse post I quoted above: over time, manual arbitration burns people out, and we will always end up defaulting to whatever gets auto-merged. So I think the best way to have a sustainable influence on the system is through thoughtful automation. Of course, I would say that, as that is what I’ve mostly worked on (or at least have tried to, over the years, with regard to General 🙂).

But the question remains as to what the role of discretionary power is here. I am certainly uncomfortable using it; it feels very difficult to do fairly. Having some framework would likely help, at least to provide some scaffolding upon which to make any particular decision. But it also feels like there will always be edge cases and situational things, and allowing some discretionary power is a good thing (I guess I would say this too, as I have that power?).