Open to Collab

18 4 22

AbstractPhila PRO

AbstractPhil

https://civitai.com/user/AbstractPhila

AbstractEyes

AI & ML interests

datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.

Recent Activity

updated a model about 3 hours ago

AbstractPhil/geolip-svae-h2-64

repliedto their post about 3 hours ago

Ever see a 1024x1024 3 channel a little over 1m param noise classifier? This is one. This is phase 1 of the omega experiments and it's successful on a very high-accuracy selectivity level through statistics aggregation and pooling via... a tiny MLP attached to the battery array. SVAE don't care what resolution you use. They never had that concern, they are solvers that fly through solutions. Perfect for math solutions of many formats and many structures, exactly what I need for the next stages. https://huggingface.co/AbstractPhil/geolip-svae-h2-64/blob/main/experiment_2/cell_j_output.txt Currently the primary use case for tests is noise format identification. There are multiple experiments to go before a full nth classification system is ready, however as it stands the only stopping point is training batteries now. They mostly train within about 10 million samples of tiny data so they will fly out hundreds a day if I find purposes for them. Also I trained too many gaussian-related batteries, so there's really only about 50-100 or so batteries useful in the 192 array I set up. There's really only 64 batteries trained total but there are multiple epochs involved. Now that there is a 57k parameter variation that converges on 16 variants of random noise like Johanna and Freckles before, you ask this model questions differently. You check the MSE to train downstream models, so if your array isn't conclusively working it won't work just yet. It's not perfect yet, but it's improving daily. A bad battery in the mix can be replaced at runtime. ``` ============================================================================== PHASE J VERDICT ============================================================================== Subset: 18 batteries, 1,029,870 params (vs 10.9M for full array) Resolution A (summary) B (attn-pool) 256 96.6% 93.1% 512 95.4% 92.0% 1024 95.4% 95.4% ```

updated a model about 4 hours ago

AbstractPhil/geolip-svae-implicit-solver-experiments

View all activity

Organizations

replied to their post about 3 hours ago

Alright the reusable module is being prepared exactly to the math specifications. Any omega-grade battery will have projective differentiated sampling now utilizing rigid axial geometric structures.

This is a massive win. It means the omega shapes retain full structural behavioral potential along with the spherical alignment.

This also means that individual batteries will be considerably more dependable as inference agents and the full battery array will have more robust information from the batteries per inference due to the guaranteed confines of the battery axial rotational limitations.

In other words, they can't go outside of certain ranges because the model size does not allow them, there's no space to solve outside of those ranges so the outliers are the rarity rather than the normality. This behavior may not work with larger structures with more geometric room to explore, as those batteries may have a more monotone sampling for the axial polygons, however that may not be the case and I will be testing the johanna, fresnel, and freckles models for this behavior soon.

The module is ready, it's now directly attached to the geolip battery array sampling.

replied to their post about 8 hours ago

I have found a variation with 14/16 antipodal pairs, which implies there is a stable attractor that can be cross-interpolated to rigid point attractor space, not just spherical. I'm working out the logistics now in order to attempt to curate a direct geometric spherical linkage to the more traditional linear rigid structures that I've been building for 2 years now.

replied to their post about 8 hours ago

replied to their post about 9 hours ago

New format of micro-solver discovered, I've dubbed this variation the P-class battery.
I am probing them currently. D=3 is their current format size and they are smaller than the D=4 variations with a different inner shape, my hunch is polynomial shaped spherical responses, but the probes will say.

This is an antipodal structure, this may be a proper learning omega gate shape. More tests needed.

The geometric readout shows a direct causal response similarity to a Gyroscope behavior. The model found a home using a divergent pathology and offset alignment, the guarantee is baked into the architecture as a guaranteed outcome. This is another unknown shape omega that autosolves in a deviant pathology than simple sphere.

replied to their post about 14 hours ago

Target: The smallest sphere omega.

57k params is small, but how small can we truly make a noise finder that produces a fair MSE on a specific type of data, that doesn't have good MSE with another type of data, and how effectively quick can we make this structure scale upward while still functioning?

These are questions of the day. I believe I can find a much smaller sphere solver and potentially another boundary of trajectory passage, possibly allowing autosolving routed differentiation based on precalculated solvers, which should allow highly complex formulas to autosolve using linear through a properly formatted array.

This concept says it's possible, so it's worth a shot. Producing autosolver arrays is the difference between 500m flops and 50k flops.

posted an update 2 days ago

Post

Ever see a 1024x1024 3 channel a little over 1m param noise classifier? This is one. This is phase 1 of the omega experiments and it's successful on a very high-accuracy selectivity level through statistics aggregation and pooling via... a tiny MLP attached to the battery array.

SVAE don't care what resolution you use. They never had that concern, they are solvers that fly through solutions. Perfect for math solutions of many formats and many structures, exactly what I need for the next stages.
AbstractPhil/geolip-svae-h2-64

Currently the primary use case for tests is noise format identification. There are multiple experiments to go before a full nth classification system is ready, however as it stands the only stopping point is training batteries now. They mostly train within about 10 million samples of tiny data so they will fly out hundreds a day if I find purposes for them.

Also I trained too many gaussian-related batteries, so there's really only about 50-100 or so batteries useful in the 192 array I set up. There's really only 64 batteries trained total but there are multiple epochs involved.

Now that there is a 57k parameter variation that converges on 16 variants of random noise like Johanna and Freckles before, you ask this model questions differently. You check the MSE to train downstream models, so if your array isn't conclusively working it won't work just yet.
It's not perfect yet, but it's improving daily.

A bad battery in the mix can be replaced at runtime.

==============================================================================
PHASE J VERDICT
==============================================================================
Subset: 18 batteries, 1,029,870 params (vs 10.9M for full array)

Resolution       A (summary)   B (attn-pool)
256                   96.6%          93.1%
512                   95.4%          92.0%
1024                  95.4%          95.4%

5 replies

replied to their post 8 days ago

First note, there is no degeneracy in this cell now. As per hundreds of bulk tests with many readouts, the degeneracy is swept up in the SVD kernel, the fl_gram eigh svd, the FLEigh structure, or any of the subsequent catches that the pytorch handles.

The degeneracy problem is solved, and with that introduced a massive amount of new problems. Problems that I have built prototypes to address; each core problem has been narrowed down to three core components as solutions for information movement.

S^N sequential
Scattered S^N * D for orthogonal clustering
S * D + D * D for structural cohesive memory annealing

This comes down to three important utilities that many core structures depend on.

Sequence, distance, cosine similarity, QKV support, rotary support, and more.

Sequential structural cohesion; LLM, tokens, next token prediction, spearman, and so on.
Behavioral attenuated implicit; ViT, Resnets, diffusers, etc
Geometric alignment structure; Distillation, transfer learning, teacher/student, genetic inheritance, generational learning, SVAE, geolip prototypes, and constellations.

Third is least useful to the out of scope, first two are very useful so they are my predominant focus here.

I have 14 potential prototypes and I will be forming a notebook for each, testing the robustness, the positives, the negatives, the storage and recall capacity, the magnitude standardization vs normalization accuracy, the flow matched directional EMA vs non-EMA, the structural supported ensemble approach vs the residual approach, and a few other elemental substructures.

The biggest tradeoffs will be between normalization clipping and standardization unit structured tokens. These are inherently entirely different expectations and produce entirely different opinions.

Each of these experiments will be fully documented, the subsequent models included in the notebook sections, and the notebooks represented in the cell repo.

The Cell is a fickle beast, but I believe I have tamed the monster. The battery will be substantially stronger with the new cell upgrades, as the battery includes multiple constellation elements such as FILM solidification, normalization at curative points rather than destructive, and a few other elements to assist with producing tokenizations such as direct Conv support and huggingface transformer capacity for the MOE substructures.

As it stands, the transformer tokens here are represented simply as [b, S, D, V] also [b, S, U, Vt], and they have direct embedding tokenization potentials on many structures, but not all structures. There are multiple deviant structures that suffer from certain rules that require additional solutions before those work.

The prototypes may not exactly reflect this shape, and the shape may change for packaging and reuse purposes so bare with it for now. I'm only one person and I'm heavily relying on Claude to handle many of the logistics. I can code all of this, it just takes a lot longer for me to do manually so I'm basically on NO GELU HERE - NO NORMS HERE - NO PROJECTION HERE duty. I'm basically babysitting Claude so the code is correct and making sure the tests come out as they are supposed to.

replied to their post 8 days ago

replied to their post 10 days ago

Original Question:

Can the cells utilize positional encoding patches from the triton d=2 decompositions? I'm thinking maybe, and it's worth a shot.

Update:

They can, and without degenerates when curated correctly. Their usefulness isn't as helpful unless applied to an regression cascade that projects upward to the largest structure and compares rotation. The rotation for the projections are perfect procrustes if handled correctly as per the experimental documentation on the SVD triton kernel.

Original Question Assessment:

If the degenerates can converge in-model implicitly, this changes the game entirely. The SVD cell can handle ensuring the preservation, the reconstruction of the cells have no rival. Now... lets see if this holds up over huge dimensional spaces, or if the models simply... shatter with the thin triton 500x speed.

Getting access to that 500x speed is beyond reproach in terms of speed, there is no comparison.

Update:

The model can benefit from the 5000x speed, yes 5000 not 500, and it can be useful - however there are stipulations that require multiple uses so the gains are not as useful as I'd hoped. You need to run many more of them to get a useful informational cluster, otherwise you just end up norming everything to death down the line without magnitudes as shown by the cascade tests.
There NEEDS to be an fp64 triton kernel and I'll work that out today if it's even possible to run fp64 through triton in this fashion. The fp32 is showing serious problems with rounding faults and it needs to be addressed with fp64.
The D=2 models operate around 8x faster, at a much lower accuracy overall. Without the high fidelity access to the FLEigh or gram eigh SVD structure, the model simply does not have the necessary matmul accuracy to represent the outcome.
The D=2 magnitudes are useful only for D=2, if you project the rotation upward to the higher D you lose the magnitude directional accuracy as per the tests and documentation. This means D=2 when ran 32 times is still less accurate than a single D=8 in magnitude-sense, even when cross-correlation is used to determine the most likely magnitude.
This behavior is ideal when distilling a model's signals into another, not as useful when forming a proper embedding encoder utility chain. The encoder needs to be fairly stable, so you need to make sure the model is capably learning the encoding spectrum, and each subsequent encoder chain down the line sees the same structural system and the residual opinions of the last - otherwise the encodings are simply lost. I've mentioned it before, residuals are lossless in this regime, and with that the lossless behavior is essentially explained as rigidity and difficult to differentiate strictness. This when correctly aligned is a powerful implicit structural controller, and explicitly a nightmare to tune into something that isn't just ON OR OFF gating.

There are no more paved roads here... It's time to chart some jungle.

I've baked in CM config controllers into the head of the spectral cell. This will allow the CM to be crutched heavily, letting the model legitimately diverge and drift into impossible terrain and still maintain order - catching everything invalid as it cuts through.

replied to their post 10 days ago

replied to their post 11 days ago

======================================================================
  COMPLETE
======================================================================
  Best val acc: 93.8%
  Time: 979s (8.2s/epoch)
  Conv: 4,251,200  Cells: 366,176  Head: 167,946  Total: 4,785,322

  Comparison:
    SpectralCell standalone (D=16 V=16 h=256 +conv +aug): 79.1%  926K  1.2s/ep
    ConduitBattery backbone (GPT trainer, ep55/120):       88.7%   ~2M  ?s/ep
    Conv + SpectralCell inline:                     93.8%  4,785,322  8.2s/ep

replied to their post 11 days ago

By default the transfer learning from these batteries is not going to go be as effective as say raw pixel transfer.

However, you can achieve from a pure noise model nearly 72% accuracy on cifar100 using just the Freckles-256 (256 patches) trained purely on noise with CrossEntropy, Conv, and direct bottleneck ingestion - BEFORE the conduit-svd was introduced.

With conduit-svd the transfer-potential of the transformer will expand this behavior exponentially with QKV, treating the QKV as a uniquely differentiable format - specifically aligned to the geometric battery-state itself.

This is only possible due to the increased accuracy from the geolip.linalg.eigh structure and speed of the geolip.linalg.svd.

Without them, degenerate eigh and SVD cannot form, and the full structural awareness will never coalesce internally. Without enough degenerate EIGH and SVD, the structural basin for the miniature patchwork accuracy will never coalesce into opinions.

Odd, I know, but it's required. Degenerate SVD create a highly difficult to measure void response that I at first tried to patch out, until direct analysis showed CM is definitely preserving the structure - just in an unexpected series of ways. Near-degenerate and degenerate are a predominant structural learning, so when a huge influx of these structural boundaries format into a utilizable shape, the upshoot structure behaves in a uniformly geometric format that can be analyzed.

I didn't expect it either.

By clamping the CM above near degenerate to guarantee non-degenerate volumes, the structure shows that the volumes aren't in fact there most of the time. It's predominantly directions and almost all magnitude is devoid.

posted an update 11 days ago

Post

136

The geolip-svd-transformer is almost ready.

I've spent multiple days preparing the substructure, scaling, testing, and expanding the system. The conduit is meant to reorganize data. Just like the SVAE prototypes, they are meant to sort and organize, not compress and compact.

The organization is almost prepared and almost ready. The resulting structure will produce projection-capable geometric aligned memory, compacted and transformed into a utilizable token set. The remaining structural components are specifically SVD-related utilities, and each of those are utilizing the variant natures of how difficult, how dispersed, and so on each component is as it's learned over time.

The SVAE components were perfect for testing this playground. They appear to be larger when analyzed, however the representation of those are meant to represent huge vocabularies. Patch 16x16 expanded upward to 768 is meant to encapsulate the behavior of near-pi upscaled, condensed into a considerably simpler smaller form.

This model is behaving perfectly. It does not encode in the traditional sense, it analyzes and produces geometric opinions throughout it's structure. Each of them proved one after the other the model could not only learn, but it can perfectly reconstruct, and with that produce utility-driven expansion capacity directly.

Fresnel -> effective image analysis battery.
Johanna -> effective noise analysis
Grandmaster -> Johanna finetuned with sigma restoration using Fresnel's opinions.
Freckles -> massive analysis array for noise (4096 to 16k tks)

Geometric batteries.

Cayley rotation is meant to encapsulate that potential and expand it, allowing further differentiation down the chain of model structural behavioral events.

Suffice it to say, this is the geometric transformer's evolved state. These will exist as conduits throughout the models, the expanded behavioral attenuation units meant to provide geometric analysis internally within models for data-oriented CV alignment.

6 replies

replied to their post 15 days ago

I see a potential answer to the over-compaction SVD problem that enables Omega self solving tokenization, while leaving the system rigid and difficult to learn from or directly manipulate. Everything short of reconstitution of SVD in it's entirety is short of the correct measure. SVD is one of the most powerful compressors not because it compresses, but because it's a series of mathematical conduits meant to streamline mathematics into a point of high-complexity solution.

Each tool is a nearly perfect compaction system and they all form utility-driven token structures that when fully extracted will produce immense amount of information. They are fully functional self-solving Omega structures - however, bigger in these case isn't better. We want SMALLER. Bigger is making the problem larger rather than adding additional resolution, which is what the models need. Additional resolution at higher accuracy.

The core and absolute core problem; the bottleneck cannot be processed by any other device. The formula is far too rigid, the structure deviant, and the internals are unique with MLP-based structures. As aligned as they are, without a conduit for capture it's just a closed circuit. I must open this circuit.

These formulas are NOT fully deterministic in computation. There is error, and it has everything to do with rounding error, iterative decomposition, and curative rejuvenation. Every single one of these processes can be analyzed, learned from, and solved for - but they cannot be replaced directly to achieve SVD's outcome.

This formula originated from an era before computers. The men and women of the time NEEDED legitimate mechanistic and autonomous creation mechanisms to solve large-scale problems and produce utility-based outcome within reasonable amounts of time. Otherwise, they would be there for potentially years trying to calculate ONE thing, when they can spend a day or two plugging away at their special autosolver function that allows the mathematician to debug the problem autonomously.

SVD is to be deconstructed and a new theorem to be created in it's stead, using every single formal mathematical principal required to reconstruct SVD while introducing direct shunts of deviation for learning offset adjudication.

I see, traces of SVD everywhere in AI and the more I learn about SVD the more I see them. The people who forged AI KNEW of the power of SVD. There is no doubt in my mind that I'm on paved road, and yet I see... a path that I believe few have dared to take, and I believe with Claude, GPT, Grokk, and Gemini we CAN solve this problem. It must be organized, adjudicated, understood, and solved accordingly. I believe this is a potential route and I will directly explore it. This is no longer a problem of choosing a formula, this is now a problem of how we use it to encode and decode information.

MSE 0.0000004 imagenet on SVAE-Fresnel-64-t256 says that it is more than worth the time. I CAN'T keep training Omega models that simply crush or replace information, I must discover how to control the solver, instead of just letting it operate.

I have many theories. My first target being - framed adjudicated trajectory resonance flow alignment prediction.

Put simple, how well the structured flow resonates with the structural resonance surrounding it. This will require SVD information.

I will publish my findings based on heavy sweep-driven notebook experimentation.

replied to their post 16 days ago

Introducing Freckles, the 4x4 1024 SVD patch model.

Already defeating the twins at epoch 1.

6x smaller size, 12:1 compression instead of 48:1 but still effective.

replied to their post 17 days ago

Here are the first three surge trained experts. They should encompass almost any need if used correctly.

The image line.

The specific image trained SVAE structures are dubbed;

SVAE-Fresnel

tiny - 64x64
small - 128x128
base - 256x256 <- cooking current MSE=0.000181 -> Operating CV: 0.3769
large - 512x512 <- upcoming
xl - 1024x1024 <-upcoming
xxl - 2048x2048 <-upcoming
giant - 4096x4096 <-upcoming

The initial Fresnel shows the model can reconstruct images far out of scope at entirely different sizes, entirely never seen images can be fully reconstructed within the same spectrum of MSE as the trained images.

Tests show;

the Fresnel models can piecemeal images back together at a higher accuracy and lower error rate than running the full model. Tested up to 1024x1024 with near perfect reconstruction. 0.0000029 MSE

Fresnel CANNOT reconstruct noise directly; 1.0~ MSE

The 256x256 variant is cooking right now. The MSE is dropping rapidly and it's nearly as accurate as the 128x128 counterpart with only partial cooking.

The noise line. the specific noise trained SVAE structures;

SVAE-Johanna

This model is capable of learning and reconstructing noise and this will train a noise compressor that can deconstruct/reconstruct any noise automatically with it.

tiny - 64x64 <-first train faulted, tried 16 types of noise out of the gate, going to restart with curriculum training.
small - 128x128 <-gaussian prototype ready = 0.012 MSE <- back in the oven 16 spectrum noise
small - 128x128 - 16 noise; <- MSE=0.053170 CV=0.4450 -> learning 16 noise types
base - 256x256 <- upcoming
large - 512x512 <- upcoming
xl - 1024x1024 <-upcoming POSSIBLE if large works

Johanna is being trained on 12 types of noise. The MSE is dropping as expected and the noises are in fact being learned and represented to be replicated.

The text line is exactly the same as the others.
-SVAE-Alexandria

Alexandria is meant to encode/decode text in a perfect or near-perfect reconstruction capacity.

posted an update 18 days ago

Post

150

Say hello to surge resonance training. From random init, 1 epoch trained the 128x128 imagenet SVAE with test reconstruction over 99% accurate by epoch 1 to 99.9% accurate by epoch 5.
AbstractPhil/geolip-SVAE

Epoch 1 test recon error 0.0064
Epoch 2 test recon error 0.0022
Epoch 8 is now 0.000294
Epoch 12 is now 0.000206
Epoch 14 is now 0.000190
Epoch 18 is now 0.000187
Epoch 24 is now 0.000117
Epoch 30 landmark 0.000099

There are NO EXPERTS HERE. This is pure self learning. The model learns the entire behavioral set within 1 epoch to reconstruct imagenet's test set to a useful state. By epoch 12 a recon of 0.000202 recall is now measured. This means, 99.99% accuracy at RECONSTRUCTING the test set through the bottleneck, while simultaneously leaving a trail of centerwise extraction as rich or richer.

ONE epoch. Just one.
Took about 10 minutes to train an already converged epoch, and I set it up for 200 epochs. This model will not need 200 epochs. I'd be surprised if it needs 3.
What you're looking at here, is the emergence of surge resonance. The power of a single epoch when the geometric CV alignment hits the tuning fork of absolute resonant perfection and counterpointed with the concerto's dissonant harmonic response.

I give you, surge resonance.

The metrics will be ready by morning and I'll begin building utilities to figure out what went right and what went wrong.

This model is rewarded when it exists within the geometric spectrum while simultaneously dual punished when leaving. There is no benefit to stray, and the benefit to exist within prevents the model from leaving the validated CV band.

This allows the model to exist perfectly within the tuning fork resonance structure.

The model CONTINUES to refine, even when the CV drift has begun to drift away from home. The model has left home and is now seeking new proximity.

Upcoming training will be the 256x256, 512x512, 1024x1024, and larger if the model holds. Each will be named.

3 replies

replied to their post 20 days ago

I see the answer. The behavioral sweep shows CV of 0.29154 between 0.291 and 0.292 are within a very special band of variations.

1024v, 24d - the entire operating spectrum of the T5 series embeddings when alignment differentiated by the configuration. This is effectively a threshold between what works operationally, going beyond this causes degraded behavioral response without attenuated compensation.

So I've managed to finally get the right questions to discover the connection between the fly in the ointment that kept returning, and the structural systems responsible for curating the behavior around it.

replied to their post 20 days ago

Finding: Geometric controlled structures do not require CV loss if the D is within the expected band. To compensate for the dimensional difference with the CV measured, the CV loss must be adjusted to the distillation target.

The vocabulary when established as geometrically valid throughout the lifecycle of the existence. The CV loss is only attuned and useful when running distillation paradigms. The current CV loss has no impact on the CV measured capacity of the embeddings consistent or pretrained.

This effectively allows compartmentalization to any vectorized locality as accumulated throughout a structure, allowing direct

posted an update 21 days ago

Post

157

The geolip-transformer-v8 requires a fundamental rethinking of training a core structure.

I'll make this brief and to the point.

GEOLIP is an observer system at it's core. It watches, triangulates, and assists with correct answers.

Many experiments worked very well, many fell down and turned into a pile of broken circuits. The recent geometric-transformer being one of my biggest fumbles, still taught me many things about what I'm TRULY trying to accomplish here.

**Save money and lives**. Less hardware use for less need at inference. Train more calculations into a more reusable and accurate structure for near instant zero-shot or sequential inference.

In the process v8 unlocked a missing puzzle piece, EMA trajectory alignment compensation. I'm doing my best to build something that works.

The geolip distillation system is very powerful but requires much experimentation still.
* Genetic experiments were successful
* Data transfer experiments successful
* Analysis experiments successful - and expand large model accuracy
* Many distillation experiments were successful.
* The largest successes being the kernels, the distillation tools, and the geometric analysis systems.

With the good comes the bad, the faulty VITs, the simultaneous trains that fault, the internalized confusion that happens occasionally.
*** The observer NEEDS something to OBSERVE. If the observer observes the progressive development of point cloud structures, it learns how to observe THAT LEARNING PROCESS - drifting fault assessment.
*** In the process it DOES NOT learn how to improve the CE relations by embedding and compensating with anchored triangulation opinions.

BIGGEST CONCLUSION. Staged curriculum training.

These components must be DECOUPLED. One must be a compounding structural awareness beacon, the other must be an informationally aligned composition in a utilizable fashion.

This means stage-by-stage freeze/unfreeze processing. Independent task-oriented structural alignment.

2 replies

AbstractPhila PRO

AI & ML interests

Recent Activity

Organizations

AbstractPhil's activity

Original Question:

Update:

Original Question Assessment:

Update: