amprolla

devuan's apt repo merger
git clone git://parazyd.cf/amprolla.git

commit 1d9670ade4cc7c28dfd1c6de9bc14ca099be0c9d
parent 0454dba27c9b281b9eaca4b75184a8bc1f54cf15
Author: parazyd <parazyd@dyne.org>
Date:   Mon,  5 Jun 2017 21:47:59 +0200

add readme; remove obsoleteness

Diffstat:
README.md | 23 +++++++++++++++++++++++
doc/dan-notes | 109 -------------------------------------------------------------------------------
orchestrate.py | 6 ++----
3 files changed, 25 insertions(+), 113 deletions(-)

diff --git a/README.md b/README.md
@@ -0,0 +1,23 @@
+amprolla
+========
+
+amprolla is an apt repository merger originally intended for use with
+the [Devuan](https://devuan.org) infrastructure. This version is the
+third iteration of the software. The original version of amprolla was
+not performing well in terms of speed, and the second version was never
+finished - therefore this version has emerged.
+
+Dependencies
+------------
+
+### Devuan
+
+```
+gnupg2 python3-requests, python3-gnupg
+```
+
+### Gentoo:
+
+```
+app-crypt/gnupg dev-python/requests dev-python/python-gnupg
+```
diff --git a/doc/dan-notes b/doc/dan-notes
@@ -1,109 +0,0 @@
-Ok... so the debian repo is essentially a directory heirarchy...
-
-Ok.. Do you understand the repo heirarchy? ie the main folder (in
-amprolla case /merged) with sub folders 'dist' (for repo metadata) and
-'pool' (where the actual binary and source packages go)??
-forget about the "pool" folder, amprolla doesn't touch it...
-
-in "dists/" you have all the suites ie: jessie, ascii, ceres and all
-the and stable, unstable and version symlinks.
-
-in the suite folder, you find the section folders: main contrib non-free
-and files InRelease, Release and Release.gpg
-
-InRelease is just the pgp/smime version of the Release file - the gpg
-sig is the same as Release.gpg
-
-Anyway the Release file basically is a dictionary of most of the files
-in the subdirectory with size and checksums (SHA256, SHA512 etc) in what
-is essentially RFC822 format, with a bunch of headers at the top that
-specify details about the Release of that suite.
-
-In the suite subdirectories you have a bunch of folders, binary-<arch>
-which contains the Packages file, and compressed copies of that, and a
-Release Stanza, and similar for the source folder with Sources file and
-compressed copies etc.
-
-the Contents files (currently not processed) are their too.
-(They contain a list of all the files in each package)
-
-their is also the i8n - folder which contains the processed files.
-oops s/processed files/translation files/
-
-
-Amprolla takes several mirrors and merges them in order of priority
-starting with the highest priority. It firsts iterates over the structure
-to create it's repo structure, ie dists/<suite>/<section>/ etc and then first
-copies the highest priority mirror Packages and Sources files in and then for
-the othermirrors iterates over the Packages and Sources files and compares
-each package stanza for a match, and if there is a match on name then the highest
-priority mirror version is kept, if not then the package is added in.
-(This is where the inefficient model really shows up)
-
-
-After all the new Source and Packages files are processed then the Release and
-InRelease files are generated by walking the hierarchy and adding those files in.
-
-There is a lot of complexities, part of which is in the design of amprolla.
-What I had started to do, and in describing it now, it seems obvious to me
-I should probably have started pretty much from scratch is instead of this
-iterative approach of compare and add or skip is keep a cache of each mirrors
-last state, and then on each run create a delta between the last state and
-current state.
-
-
-* and how does dak integrate in all of this?
-it doesn't. Dak is a standalone repository which just deals with the packages built by our CI
-* so it's the same as any debian repo
-Yup, slightly modified to handle our CI and some other tweaks
-and I checked and our version is in gdo too.
-
-
-anyway as I was saying about my approach re delta's:
-There are big efficiencies in this approach. For starters, we only download the InRelease or
-Release and Release.gpg file and after verifying it, compare to the previous state, and we
-can use the delta generated to pick what files are new, changed or removed from the repo.
-This means we only download the changed files in the repo for a start. And for the
-Packages and Sources files we create a delta list of changed stanza's to apply.
-
-Instead of building the entire repo from scratch, we apply the delta
-to a copy of our merged repo with handling for priority etc...
-
-What stumped me in the end is we actually should verify that we only have packages go in that
-have a matching source stanza and we really need to process the contents and translations
-at the same time.
-
-I suspect that nextime realised this which is why he started on amprolla2 which essentially
-replicates dak + amprolla function...
-
-I just realised, I forgot to mention the overrides processing in amprolla. In the very
-top of the dir in "merged/" is the "indices" folder that contains overrides. These
-files specify for each Packages files, any metadata changes that need to be applied to
-package stanza's
-
-In debian their is a entry for every single deb package/source in the archive making
-them very large. We did away with that to reduce the overhead of processing it created.
-
-So we only have entries for those that need changing, usually to change priorities of
-systemd packages and remove recommends and suggests for systemd related packages.
-
-* are indices a part of the repo or only needed by amprolla?
-both. In debian, dak generates them and they are hand modified by the repo masters to
-apply needed fixes. With amprolla, we only create them for applying our own changes as needed.
-Technically they don't need to be in the repo, as they're not used by apt, but practically
-it's good to have them there.
-
-hmmm, I think I've cracked my problem...
-If I use the Sources delta to identify changed packages, I can use that to pick and apply
-the changed Packages stanza's Contents and Translations. This would save lot's of
-iterations, and I only need the delta Processing to be done on the Sources files.
-Wow that would really speed things up
-
-The other benefit, is we can side load packages this way too and use it to replace dak
-as well as either a standalone repo or directly into the merged repo.
-And all without a hefty database. or the writeup
-
-your welcome. It has helped me probably as much as you. I think it's
-turning into a full rewrite, but seems better design and possibly far easier to
-write from scratch.
-Anyway, it's nearly 3:30am here, so better get a couple hours sleep!
diff --git a/orchestrate.py b/orchestrate.py
@@ -2,7 +2,7 @@
 # see LICENSE file for copyright and license details
 """
-Module used to orchestrace the entire amprolla merge
+Module used to orchestrate the entire amprolla merge
 """
 from os.path import join
@@ -12,8 +12,6 @@
 from lib.config import (arches, categories, suites, mergedir, mergesubdir,
                         pkgfiles, srcfiles, spooldir, repos)
 from lib.release import write_release
-# from pprint import pprint
-
 def do_merge():
     """
@@ -33,7 +31,7 @@ def do_merge():
     am = __import__('amprolla_merge')
-    p = Pool(4)
+    p = Pool(4) # Set it to the number of CPUs you want to use
     p.map(am.main, pkg)
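
For reference, the priority rule described in the removed doc/dan-notes (when two mirrors carry a package with the same name, the higher-priority mirror's stanza is kept; otherwise the package is added) can be sketched roughly as below. This is not amprolla's code: `parse_stanzas`, `merge_by_priority`, and the sample package data are hypothetical names invented for illustration, and the sketch uses a dictionary keyed on package name rather than the stanza-by-stanza comparison the notes describe.

```python
def parse_stanzas(text):
    """Split Packages-style text into {package name: stanza text}.

    Assumes stanzas are separated by blank lines and each contains
    a 'Package:' field, as in a Debian Packages file.
    """
    stanzas = {}
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if not block:
            continue
        for line in block.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
                stanzas[name] = block
                break
    return stanzas


def merge_by_priority(mirrors):
    """Merge stanza dicts; `mirrors` is ordered highest priority first."""
    merged = {}
    for stanzas in mirrors:
        for name, stanza in stanzas.items():
            # A name already present came from a higher-priority mirror,
            # so setdefault leaves it untouched and only adds new names.
            merged.setdefault(name, stanza)
    return merged


# Made-up sample data: two mirrors that overlap on one package.
devuan = parse_stanzas("Package: init\nVersion: 1.0+devuan1\n")
debian = parse_stanzas(
    "Package: init\nVersion: 1.0\n\nPackage: bash\nVersion: 4.4\n"
)
merged = merge_by_priority([devuan, debian])
print(sorted(merged))
```

Keying on the package name makes each lookup constant time, which avoids the repeated scans the notes single out as the inefficient part of the original design.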