Overall it feels like UV is the best thing to happen to Python packaging in two decades, by circumventing the endless non-productive discussions on PEPs and instead just building something that works and is fast. In Rust, naturally.
UV is great but also builds on existing PEPs. While they have the ability to experiment (which is great), they also benefit from those "endless non-productive discussions on PEPs", as you called it.
I think UV proves that dedicated funding can make a huge impact on a project and benefit the community. They are doing a damn good job.
They mostly took inspiration from other languages for UV. Cargo (Rust) was a huge inspiration, but they got stuff from Ruby as well, I believe. There was an episode of "The Changelog" about it. I don't remember them saying anything about PEPs, although that might just be me not having listened to the entire thing. However, Charlie Marsh was extremely insistent on the advantages of being a polyglot and hiring people with diverse programming experiences. So I think it's quite safe to assume that played a bigger role than just PEPs.
I can't speak for the UV team. My 2C on how I would treat the PEPs: If there is an accepted one, and implementing it doesn't go too strongly against your competing design goals, do it for compatibility. This does not imply that the PEP is driving your design, or required to make your software. It is a way to improve compatibility.
> dedicated funding can make a huge impact on a project
Where does Astral's funding come from, anyway?
Rye was already pretty good before it was donated to Astral and renamed to uv, though...
I wrote an earlier one (Rust, inspired by Cargo; managed deps, scripts, and Python versions) called PyFlow that I abandoned, because nobody used it. "Why should I use this when we have pip, pipenv, and poetry?"
Programming is 80% marketing, eh?
Yet more proof that "the best way to do anything in python is to not do it in python."
It's true that where Python offers critical performance it's typically by providing a nice interface to existing compiled code. But people who work through those interfaces are still fundamentally "doing it in Python"; the most important "it" is that which makes a useful system on top of the number-crunching.
But putting that aside, a big part of uv's performance is due to things that are not the implementation language. Most of the actually necessary parts of the installation process are I/O bound, and work through C system calls even in pip. The crunchy bits are package resolution (only needed in rare cases, since lock files cache the result entirely) and pre-compiling Python to .pyc bytecode files (which is embarrassingly parallel if you don't need byte-for-byte reproducibility, and normally optional unless you're installing something with admin rights to be used by unprivileged users).
Uv simply has better internal design. You know that original performance chart "installing the Trio dependencies with a warm cache"?
It turns out that uv at the time defaulted to not pre-compiling while pip defaulted to doing it; an apples-to-apples comparison is not as drastic, but uv also does the compilation in parallel which pip hasn't been doing. I understand this functionality is coming to pip soon, but it just hasn't been a priority even though it's really not that hard (there's even higher-level support for it via the standard library `compileall` module!).
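For reference, the standard-library route mentioned above can be as small as the sketch below; the site-packages path is just an illustrative assumption, and `workers=0` asks `compileall` for one worker process per CPU core.

```python
# Parallel byte-compilation using the standard library's compileall module.
# The path below is a hypothetical venv location; adjust for your environment.
import compileall

compileall.compile_dir(
    ".venv/lib/python3.12/site-packages",  # hypothetical install target
    workers=0,   # 0 = use one worker process per CPU core
    quiet=1,     # print only errors
)
```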
More strikingly, though, uv's cache actually caches the unpacked files from a wheel. It doesn't have to unzip anything; it just hard-links the files. Pip's cache, on the other hand, is really an HTTPS cache; it basically simulates an Internet connection locally, "downloading" a wheel by copying (the cached artifact has a few bytes of metadata prepended) and unpacking it anew. And the files are organized and named according to a hash of the original URL, so you can't even trivially reach in there and directly grab a wheel. I guess this setup is a little better for code reuse given that it was originally designed without caching and with the assumption of always downloading from PyPI. But it's worse for, like, everything else.
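A minimal sketch of the hard-link idea, not uv's actual code (the cache and site-packages paths are made up): the point is that installation becomes creating directory entries rather than copying and unzipping anything.

```python
# Sketch: "install" an already-unpacked wheel by hard-linking its files from a
# cache directory into site-packages, copying only when hard links aren't possible.
import os
import shutil
from pathlib import Path

def link_tree(cache_dir: Path, target_dir: Path) -> None:
    for src in cache_dir.rglob("*"):
        dst = target_dir / src.relative_to(cache_dir)
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            os.link(src, dst)       # no data copied: just another name for the same inode
        except OSError:
            shutil.copy2(src, dst)  # e.g. cache on a different filesystem: fall back to a copy

# Hypothetical usage:
# link_tree(Path.home() / ".cache/mytool/unpacked/trio-0.24.0",
#           Path(".venv/lib/python3.12/site-packages"))
```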
Yet more proof that confirmation bias exists.
Find something better than Django.
Or matplotlib.
Or PyTorch.
PyTorch is a C++ project with a Python wrapper.
> Find something better than Django
Rails. QED.
Rails is a good project no doubt.
Django comes batteries included for basic apps, including an admin.
Can someone explain why UV is so praised when Poetry achieved a lot of the same several years earlier? Maybe I missed the train, but I've been using Poetry since its first version and all the benefits people praise UV for have long been in my builds.
Ruff was the gateway drug for me, much better than the black/isort/etc combo.
Led me to try uv, which fixed a couple of egregious bugs in pip. Add speed and it’s a no brainer.
I don’t think poetry has these advantages, and heard about bugs early on. Is that completely fair? Probably not. But it’s obvious astral tools have funding and a competent team.
Speed and simplicity. Now I can fetch one binary on a system and in seconds fetch everything needed to run a Python tool or work on a code base.
I can do all that without having to even worry about virtualenvs, or Python versions too.
Poetry was in some senses before its time; people were frustrated with pip, but didn't really understand the problems they were encountering, meanwhile the ecosystem had started trying to move away from doing everything with Setuptools for everyone. There's a ton that I'd love to explain here about the history of pyproject.toml etc. but the point is that Poetry had its own idea about what it would mean to be an all-in-one tool... and so did everyone else. Meanwhile, the main packaging people had been designing standards with the expectation of making a UNIX-style tool ecosystem work.
Everyone seems to like uv's answer better, but I'm still a believer in composable toolchains, since I've already been using those forever. I actually was an early Poetry adopter for a few reasons. In particular, I would have been fine sticking with Setuptools for building projects if it had supported PEPs 517/518/621 promptly. 621 came later, but Poetry's workaround was nicer than Setuptools' to me. And it was easier to use with the new pyproject.toml setup, and I really wanted to get away from the expectation of using setup.py even for pure-Python projects.
But that was really it. The main selling point of Poetry was (and is) that they offered a lockfile and dependency resolution, but these weren't really things I personally needed. So there was nothing really to outweigh the downsides:
* The way Poetry does the actual installation is, as far as I can tell, not much different from what pip does. And there are a ton of problems with that model.
* The early days of Poetry were very inconsistent in terms of installation and upgrade procedures. There was at least once that it seemed that the only thing that would work was a complete manual uninstall and reinstall, and I had to do research to figure out what I had to remove for the uninstallation as there was nothing provided to automate that.
* In the end, Poetry didn't have PEP 621 support for about four years (https://github.com/python-poetry/roadmap/issues/3 ; the OP was already almost a year after PEP acceptance in https://discuss.python.org/t/_/5472/109); there was this whole thing about how you were supposed to use pyproject.toml to describe the basic metadata of your project for packaging purposes, but if you used Poetry then you used Masonry to build, and that meant using a whole separate metadata configuration. Setuptools was slow in getting PEP 621 support off the ground (and really, PEP 621 itself was slow! It's hard to justify expecting anyone to edit pyproject.toml manually without PEP 621!), but Poetry was far slower still. I had already long given up on it at that point.
So for me, Poetry was basically there to provide Masonry, and Masonry was still sub-par. I was still creating venvs manually, using `twine` to upload to PyPI etc. because that's just how I think. Writing something like `poetry shell` (or `uv run`) makes about as much sense to me as `git run-unit-tests` would.
Unfortunately:
> Some of uv's functionality cannot be expressed in the pylock.toml format; as such, uv will continue to use the uv.lock format within the project interface.
> However, uv supports pylock.toml as an export target and in the uv pip CLI.
— https://docs.astral.sh/uv/concepts/projects/layout/#the-lock...
That is pretty unfortunate. Would have been cool if all of them could have used the same file.
Of course there must have been really good reasons for this decision. I hope no one will hold it against the maintainers of any of the projects. Especially because it looks like it is easy to move between the two lock files.
Which functionality of uv's cannot be expressed in the pylock.toml format?
More information here: https://github.com/astral-sh/uv/issues/12584
> The biggest limitation is that there's no support for arbitrary entrypoints to the graph, because pylock.toml includes a fixed marker for each package entry rather than recording a graph of dependencies.
I've exited countless Python/Django threads discussing future plans.
Year -1: The community has a problem.
Year 0: Proposal to fix the problem.
Year 1: A small but vocal subset of the Python/Django community pops up in every thread: "It's not actually a problem." or "It's not an issue that my project would ever encounter so limited resources shouldn't be expended on it."
Year 2: People are choosing other solutions because Python/Django isn't addressing the problem.
Year 3: We'll form a committee. The committee doesn't think it's a problem.
Year 4: The community has a problem. Fine. Why doesn't the community write a Python Enhancement Proposal/Django Enhancement Proposal (PEP/DEP)?
Years 5-10: PEP/DEP ignored.
Year 11: The community has a problem. PEP/DEP implemented and released.
Year 12-22: Major packages refuse to support the change.
Year 23: Last version not supporting the change deprecated.
Year 23+1 day: Fork of the last deprecated Python version not supporting the change created and released.
I have 15 years of code in Python still running but spend a little more than 50% of my time in other stacks. I don't notice as many people arguing against basic features, like a REST API package in Django, in non-Python/Django communities. The precursor to a Django REST API package, content negotiation, has been a draft DEP since early 2014 (https://github.com/django/deps/blob/main/draft/content-negot...). That's going on 12 years of stalled progress for a feature that needed to be released in 2010.
With Python/Django you learn not to wait because nothing is going to change.
And yes, Python/Django are open source. And yes again, I donate over $1,000/year to support F/OSS projects that I depend on.
My uninformed impression of the Python steering committee has always been that it's like the C/C++ one: ponderously bureaucratic, trying to find solutions that work for every competing interest; by the time they get to an agreement, the real world has already moved on and solved things in its own way, which makes fragmentation and intercompatibility worse.
I know that Guido isn’t around any more, but this is what a BDFL is useful for. To have the final say when, inevitably, the squabbling parties are unable to find a compromise.
No worries; it is taking even longer to be able to rely on C++ modules for portable code, to have Valhalla available on the JVM, for Android to support anything beyond Java 17, or to get a JIT in CPython. Some things take their time to finally become widespread, for various kinds of reasons.
Interesting to see the seemingly canonical meaning of lockfile (semaphore vs package version lock) change over the years. I at least was curious how one could specify a format for typically empty files.
So… what is a good example of a consensus driven culture on something popular with a lot of opinions, some legacy use cases, that can get these things done quickly?
This is a systems problem. Successful examples wanted.
The Postgres community is one example I can think of. Linux may be another, but I'm not intimately aware of its inner workings.
https://www.postgresql.org/community/ is a good start to get a feel of all things related to Postgres community.
Angular turned things around quickly. Corporate, sure, but if you know anything about how Angular is used within Google, that was a massive job.
You need a consensus-driven culture with a final decision maker for when that culture fails to reach a decision.
In the general case what you say is true. But look at the specific example of PEP 751 (the thread we're in). Normally you'd designate some person who has to gain consensus and add the feature to Python or to the standard library. Even if everyone isn't on board, they'll get on board when they upgrade Python.
PEP 751 isn't a Python feature; it's a feature that will be implemented by 3 projects - PDM, pip and uv. Consensus isn't optional or nice to have, it's necessary. If any of the 3 maintainers felt their needs weren't met, they wouldn't have implemented it.
Some projects wait too long for consensus because they prioritise not rushing into a suboptimal solution or hurting people's feelings. Sometimes it's ok to just go ahead and implement something. PEP 751 is not one of those cases.
> A lock file is meant to record all the dependencies your code needs to work along with how to install those dependencies.
It's about dependency locking in Python packaging.
The post didn't answer why it took over 4 years.
Why couldn't everyone be flown to the same place and have it all figured out in a week, instead of having the process drag on for years?
An endless multiplication of veto points in a consensus culture is a failure mode. Funny thing is people embedded in such a culture will see its slowness as a feature and sneer at the world as it leaves them behind.
Python was better with a BDFL.
The best thing that could happen to Python right now would be for someone to fork it. Maybe just have Astral run away with the whole language. This lock file format should have taken a weekend, not four damn years.
> Python was better with a BDFL.
The problem is that the BDFL also didn't want to think about packaging, certainly not the issues that inspired Conda.
I wouldn't mind seeing a PyPy-like fork done in Rust. Maybe take the opportunity to redesign the standard library, too.
A governance model where "time-based decision thresholds" steadily erode veto power as time passes could be used to harness this sort of BDFL-only power in a consensus culture.
This is why having a benevolent dictator sometimes results in better progress than committees. It's a double-edged sword, obviously, if the dictator has limited skill, but having someone like a Steve Jobs or Linus clears the way for progress when things like "consensus" cause decisions to take years or die from inertia. I've seen this first-hand at FAANGs, where bureaucracy kills great ideas because bureaucrats in key areas don't want to lift a finger to make changes.
The big counterexample is C++, which I feel is too productive and should slow down its decisions by a factor of 3.
Why did we have to call them “lock files?” There is an existing thing known as a lock file for actual file locking.
Call them literally anything else. Freeze file, version spec, dependency pin…
There really are only two hard problems in computer science, as the saying goes. Cache invalidation and epithet manufacturing (cough).
Python is following the precedent of many other "language ecosystems" here.
we should have called the other thing mutex files
Locks don't actually work in POSIX in real life anyway.
doesn't opening a file with O_CREAT|O_EXCL work in posix?
POSIX only guarantees advisory locks; mandatory locks are an optional feature and are not supported in Linux. See for example <https://stackoverflow.com/questions/77931997/linux-mandatory...>.
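To the question above: `O_CREAT|O_EXCL` does give you atomic creation of a sentinel lock file, but the locks POSIX guarantees beyond that are advisory, i.e. they only constrain processes that also ask for them. A small POSIX-only sketch of both (the paths are arbitrary):

```python
# POSIX-only sketch: an atomic sentinel lock file plus an advisory flock() lock.
import fcntl
import os

# 1) Atomic "create if absent": fails if another process created the file first.
try:
    fd = os.open("/tmp/myapp.lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
except FileExistsError:
    raise SystemExit("another instance appears to be running")

# 2) Advisory lock: only processes that also call flock() are excluded;
#    nothing stops an unaware process from writing to the file anyway.
with open("/tmp/myapp.data", "a") as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # blocks until we hold the exclusive lock
    f.write("one writer at a time, among processes that opt in\n")
    fcntl.flock(f, fcntl.LOCK_UN)

os.close(fd)
os.unlink("/tmp/myapp.lock")        # release the sentinel
```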
I was hoping part of this delay was due to people arguing lock files are poor engineering to begin with, but alas, no mention of that. I guess we've just given up on any kind of package version flexibility.
That would be because package version flexibility is an entirely orthogonal concept to lock files, and to conflate them shows a lack of understanding.
pyproject.toml describes the supported dependency versions. Those dependencies are then resolved to some specific versions, and the output of that resolution is the lock file. This allows someone else to install the same dependencies in a reproducible way. It doesn't prevent someone resolving pyproject.toml to a different set of dependency versions.
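A small illustration of that split using the third-party `packaging` library (the specifier and version numbers are made up): the project declares a range, resolution picks one concrete version, and the lock file records that pin.

```python
# Range in pyproject.toml vs. the concrete pin a lock file records.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

declared = SpecifierSet(">=2.0,<3")                # what pyproject.toml declares
available = ["1.9.0", "2.0.0", "2.8.1", "3.0.0"]   # what the index happens to offer

candidates = [v for v in available if Version(v) in declared]
locked = max(candidates, key=Version)              # one resolver's choice today
print(locked)  # 2.8.1: the pin the lock file records so others can reproduce it
```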
If you are building a library, downstream users of your library won't use your lockfile. Lockfiles can still be useful for a library: one can use multiple lockfiles to try to validate its dependency specifications. For example you might generate a lockfile using minimum-supported-versions of all dependencies and then run your test suite against that, in addition to running the test suite against the default set of resolved dependencies.
> I guess we've just given up on any kind of package version flexibility.
Presumably because decades of experience have demonstrated that humans are extremely bad at maintaining compatibility between releases, and dealing with the fallout from badly specified package versions is probably second only to NULL in terms of engineering time wasted?
Or possibly it's just because a lot of the Python ecosystem doesn't even try and follow semver and you have no guarantee that any two versions are compatible with each other without checking the changelog and sacrificing a small chicken...
> Or possibly it's just because a lot of the Python ecosystem doesn't even try and follow semver
Even if they try, semver can only ever be a suggestion of the possibility of compatibility at best because people are not oracles and they misjudge the effects of changes all the time.
The PHP ecosystem is almost universally semver and has been going strong for years now, without any major outages from accidental breaking changes.
A little discipline and commitment to backwards compatibility and it isn’t too hard, really?
PHP compatibility and its commitment (across the ecosystem) to backwards compatibility is actually pretty cool. If there is one thing PHP does right, it’s this.
One opinion that gets me flamed all the things is this: I hate semver. Just use linear version numbers. Incompatibility? That's a new package with a new name.
In ecosystems with hundreds of dependencies, that requires you to review the changelog and source code of all of them all the time, especially if you want to avoid security issues fixed in more recent versions. I’d rather have a clear indication of something I need to take care of (a new major version), something new I might benefit from (a new minor version), or something that improved the library in some way (a new patch version). That alone slices required effort on my part down considerably.
That would be ideal, but in practice, SemVer is frequently broken with technicalities or outright disregard for standards. Look at K8s: still on v1, but there have been countless breaking changes. Their argument is that the core application hasn’t changed its API, it’s all of the parts that go into it to make it useful (Ingress, Service, etc.) that have had breaking changes. This is an absurd argument IMO.
Then there’s Python - which I dearly love - that uses something that looks exactly like SemVer, but isn’t. This is often confusing for newcomers, especially because most libraries use SemVer.
If a cure was effective on 99.9999% of all patients, would you say that it’s frequently failing? Millions of projects use SemVer just fine.
Singling out two behemoths designed by committee to demonstrate SemVer's shortcomings seems misleading.
> Look at K8s
Maybe I'm being too pedantic here, but semver for applications is always going to be broken and driven by marketing. SemVer, for my money, is only applicable realistically to libraries.
What do you mean, "have to take care of something"? You don't have to upgrade to a new major version. The problem with major versions is that they make it too easy to break other people and cause work for them.
Software is churn. Stick to outdated versions for too long and the rest of the world evolves without you, until other things start breaking. For example, a new dependency A you need may depend on another package B you already have, but require a newer version of B than the one you use. At that point, you have a huge undertaking ahead of you that blocks productivity and comes with a lot of risk of inadvertently breaking something.
Whether someone else or I am the problem doesn’t matter to my customers at the end of the day, if I’m unable to ship a feature I’m at fault.
Sometimes you do have to upgrade. We were using a package that was two years old and the Google APIs it called were renamed one day. I’m sure there was an announcement or something to give us warning, but for whatever reason, we didn’t get them. So that day, everything came crashing to a halt. We spent the afternoon upgrading and then we were done.
To say that you don’t have to upgrade is true, but it always comes at a price.
I have to upgrade if I want security fixes. Even if they patch old majors for a time, that’s not perpetual.
> That's a new package with a new name.
Well, yeah, it's reasonable that people flame you there. What is the difference between
- zlib-1 v 23
- zlib 1.2.3
except that automatic updates and correlation are harder for the first approach? It also will likely make typosquatting so much more fun and require namespacing at the very least (to avoid someone publishing, e.g., zlib-3 when official projects only cover zlib-{1,2}).
I can already typosquat a "zlib2", so what's the difference?
Sure, but when bumping an existing zlib from 1 -> 2, you would increase the version number (in a package manager) instead of removing & adding separate dependencies.
At least that'll allow you to install both in parallel. Which is an absolutely essential requirement IMHO, and there not being a solution for this for semver'd Python packages is a root cause of all this I'd say.
That's why other spaces have machine tools for this. There seems to be an overall drift in Python to use more type annotations anyway; making a tool that compares 2 versions of a package isn't rocket science (maybe it already exists?)
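As a rough sketch of what such a comparison could do (this is not an existing tool, and the module paths in the usage comment are hypothetical): load two versions of a module and diff the public names and call signatures. It would only catch accidental, surface-level breakage, not behavioural changes.

```python
# Sketch: compare the public API surface of two versions of a module.
import importlib.util
import inspect

def load_module(name, path):
    """Load a module from an explicit file path under a unique name."""
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

def public_api(mod):
    """Map public callable names to their signatures (as strings)."""
    api = {}
    for name, obj in vars(mod).items():
        if name.startswith("_") or not callable(obj):
            continue
        try:
            api[name] = str(inspect.signature(obj))
        except (ValueError, TypeError):
            api[name] = "<signature unavailable>"
    return api

# Hypothetical usage with two unpacked versions of the same package:
# old = public_api(load_module("pkg_old", "old/mypkg/__init__.py"))
# new = public_api(load_module("pkg_new", "new/mypkg/__init__.py"))
# print("removed:", sorted(old.keys() - new.keys()))
# print("signature changed:", sorted(n for n in old.keys() & new.keys() if old[n] != new[n]))
```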
Literally everyone posting here is using a system built on compatible interfaces: stable DLLs on Windows, dylib framework versions on OSX, ELF SO versioning on Linux.
It's clearly not impossible, just a question of priorities and effort, and that makes it a policy decision to do it or not. And I lean towards thinking we've been shifting that policy too far in the wrong direction.
The only reason DLLs are stable on Windows is that every application now ships all the DLLs they need to avoid the DLL Hell caused by this exact thing not working.
I look forward to you demonstrating your tool that can check if two python packages are compatible with each other, maybe you can solve the halting problem when you're done with that?
I'm not a Windows developer, you'll have to excuse my ignorance on that. Pretty sure you're not shipping user32.dll with your applications though.
Also, I didn't claim any tool would give a perfect answer on compatibility. They don't for ELF libraries either, they just catch most problems, especially accidental ones. The goal is 99.9%, not 100%. Just being unable to solve a problem perfectly doesn't mean you should give up without trying.
Windows kept system libraries stable and modern software does a lot to avoid them with abstractions on top because they are unpleasant to use.
You could call that success but I think it’s just an extra layer of cruft.
> I'm not a Windows developer, you'll have to excuse my ignorance on that. Pretty sure you're not shipping user32.dll with your applications though.
Microsoft's famously terrifyingly large amount of engineering work that goes into maintaining backwards compatibility to allow you to run Windows 3.1 software on Windows 11 is certainly impressive, but maybe also is the exception that proves the rule.
> Just being unable to solve a problem perfectly doesn't mean you should give up without trying.
Currently no one can solve that problem at all, let alone imperfectly. If you can, I'd gladly sponsor your project, since it would make my life a lot easier.
I think it'd be near impossible to guarantee API compatibility, regardless of type hinting. E.g., if a function returns a list, in a new version I can add/remove items from that list such that it's a breaking change to users, without any API compatibility issues.
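For instance, a hypothetical library function whose name, signature, and type hints never change between versions, while its behaviour still breaks a caller:

```python
# Hypothetical v1 and v2 of the same function: identical name, signature, and
# return type, yet v2 breaks any caller that relied on the third element.
def get_default_headers_v1() -> list[str]:
    return ["Accept", "User-Agent", "Connection"]

def get_default_headers_v2() -> list[str]:
    return ["Accept", "User-Agent"]  # "Connection" dropped; still a valid list[str]

headers = get_default_headers_v2()
print(headers[2])  # IndexError at runtime; nothing in the API surface warned us
```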
Lock files are about what version the application needs installed, not what a library depends on. They don’t prevent package version flexibility.