🤝 Open to Collab

19 6 25

AbstractPhila PRO

AbstractPhil

https://civitai.com/user/AbstractPhila

AbstractEyes

AI & ML interests

datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.

Recent Activity

updated a collection about 10 hours ago

Flagships

updated a model about 12 hours ago

AbstractPhil/Qwen3.5-0.8B-json-captioner

published a model about 14 hours ago

AbstractPhil/Qwen3.5-0.8B-json-captioner

View all activity

Organizations

updated a collection about 10 hours ago

Flagships

Collection

My flagship models that actually work or are the best I have capable from a category currently. • 13 items • Updated about 10 hours ago

updated a model about 12 hours ago

AbstractPhil/Qwen3.5-0.8B-json-captioner

Image-Text-to-Text • 0.9B • Updated about 10 hours ago

published a model about 14 hours ago

AbstractPhil/Qwen3.5-0.8B-json-captioner

Image-Text-to-Text • 0.9B • Updated about 10 hours ago

updated a dataset about 15 hours ago

AbstractPhil/diffusion-pretrain-set-ft1

Viewer • Updated about 6 hours ago • 1.29M • 2.8k • 1

liked a model 2 days ago

mnemic/paligemma-longprompt-v1-safetensors

Image-Text-to-Text • 3B • Updated Jan 10 • 4 • 2

updated a model 5 days ago

AbstractPhil/geolip-constellation-aleph

Updated 5 days ago

published a model 5 days ago

AbstractPhil/geolip-constellation-aleph

Updated 5 days ago

posted an update 6 days ago

Post

138

The article for aleph attention routing needs more work on vision, as the vision portion has not been fully validated, while the LM prototype has been semi-validated for small and medium-small scale. I will post my findings in the coming days with the consequences of training an LM and a VIT utilizing the prototype system.

The current structure for the Geometric Vocabulary does nearly reflect the intended shape as discussed in the earlier posts and articles, so that's coming along nicely - but there are stipulations and problems involved that I did not foresee.

My apologies for the incomplete article I just released on a whim. I jumped to the conclusion a bit early in anticipation before the formulas were fully converged. I also released an early post the other day speaking about the prototype AlephLM - which I removed as an invalid conclusion.

I'm doing my best to only release validated empirical information instead of speculative - however I do sometimes jump to conclusions without proper validation from time to time. Occasionally, I get a bit theory-overzealous and require tidying up through thorough experimentation which I'm currently approaching directly.

updated a model 8 days ago

AbstractPhil/geolip-aleph-lm

Text Generation • Updated 8 days ago • 1

published a model 8 days ago

AbstractPhil/geolip-aleph-lm

Text Generation • Updated 8 days ago • 1

updated a model 9 days ago

AbstractPhil/geolip-aleph-void

Feature Extraction • Updated 9 days ago

reacted to OzTianlu's post with 🧠 10 days ago

Post

6296

ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?

Read online: https://datawhalechina.github.io/learning-terrain/

I wrote an open-source monograph on learning dynamics — The Terrain of Learning. Bilingual (Chinese/English), 4 volumes, 12 chapters, 30+ print-grade figures. Completely free (CC BY-NC-SA 4.0).

The core argument: gradient descent is not optimization. It's terrain motion. The loss function is a landscape. The gradient is the direction of slope. The optimizer is how you choose each step. Once you see it this way, everything clicks:

ResNet = explicit Euler integration on a vector field. The residual branch is the vector field. Each layer takes one Euler step.

GPT autoregression = implicit-state Euler iteration. Stable where explicit Euler explodes. That's why transformers handle long-range dependencies.

DEQ = the Banach fixed-point theorem in production. The forward pass is root-finding. There are no layers to backprop through.

KL divergence = a Bregman divergence on the entropy landscape. Your belief space is curved, not flat.

Chain-of-thought reasoning = hidden states flowing along a reasoning field toward an attractor basin. Correct answers have wide basins. The number of reasoning steps is determined by the terrain, not by the problem.

Diffusion models = systems flowing downhill along a score vector field, from noise to structure, from high energy to low energy.

The book traces one idea across 337 years — from F=ma (Newton, 1687) to H=T+V (Hamilton, 1833) to loss landscape + gradient field (2020s). Hamilton replaced a catalog of forces with one geometric object. This book does the same for deep learning.

GitHub: https://github.com/datawhalechina/learning-terrain
Discussion: https://github.com/datawhalechina/learning-terrain/discussions/2

Convergence is not hope. Convergence is geometry. You see.

1 reply

replied to OzTianlu's post 10 days ago

geolip-aleph-void and the LM aleph routing is implicit recursive infinities confined into a microcosm of forced rebounding finite space forced through a gelu sift - more akin to an emulated quaternion. All because quaternion is computationally heavy and Cantor's fractals are additionally computationally precise (often >fp64 required), requiring an entirely deviant approach to rotary in order to computationally stabilize the system at BF16 so it won't take 2 weeks for a single epoch on a model 35m params.

Makes me feel a little overdressed for the occasion.

posted an update 10 days ago

Post

109

Claude Fable 5 was temp/perma? banned for security reasons.

Working with Fable I have to say the model is capable at handling highly complex geometric mathematics ACTUALLY to the point of me getting some work done without a headache. I hope Fable returns soon so I can finish cobbling without a headache and a week per prototype again.

During Fable's existence I managed to cobble together a multi-series aleph paradigm that can handle direct implicit and explicit learning for an LM with a trigram context window. This essentially provides expert directional utilization based on a stable codebook without requiring expert distillation into singular experts and duplicated.

Details soon. There are over 20 functional formula prototypes and around 8 potential heads that all lead to the same outcome, the math is rock solid - each with their own benefits and downsides based on the assigned text tasks.

updated a dataset 13 days ago

AbstractPhil/diffusion-pretrain-set-ft1-1024

Viewer • Updated 13 days ago • 1.14M • 692

replied to their post 14 days ago

Currently am upscaling everything in my big diffusion pretrain dataset to start training some real structure.
If a couple epochs of that data doesn't activate the model, I'll need to employ a David structure and attempt to teach global attention to a shared battery set.

Simultaneously heavy experimentation on the geolip-aleph-void structure and potential offshoot objectives are being transcribed and curated. There are multiple prototypes based on functional known structures that have potential and among the discoveries today include a stable attention mechanism that can be curated further. This is based off an earlier experiment named cantor fractal routing.

https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/model/layers/attention/cantor_multiheaded_fusion_fp64_v2.py

This system was a badly optimized prototype that managed to stabilize deep-complexity fractal routes with low vram at the cost of time. Primary problem with it, was the time only matters if you're training a massive model. You don't get benefit from small models like how I usually train, so it was mothballed.

The geolip aleph routed attention is a viable option to train a david and it can in fact handle small models but needs much testing. As it stands it does not benefit from the same large model routing optimizations for vram as the cantor fractal routing. This essentially means that it will OOM like traditional attention. However, because it's based on the aleph structure it'll stabilize point clouds for Q and K, which when employed structurally can provide a cached V. I'm testing structural changes that will allow the structure to bind deterministic systems to K so KV caching can happen and Q can operate normally.

With the aleph routed attention worked out I'll be able to provide an actual backbone to SDXL instead of just a partial one through tokens. This will allow the model to directly differentiate tokens through gated learning and attention anchoring, which in theory could enable surge training through procrustes. They are essentially different towers though, so I'm uncertain still if the effect will transcribe or be topical until after the experiments.

published a dataset 14 days ago

AbstractPhil/diffusion-pretrain-set-ft1-1024

Viewer • Updated 13 days ago • 1.14M • 692

replied to their post 15 days ago

Massive expansion to optimization happening today. I can't spend all these upcoming days training when optimization can happen now. My target today is to have a marked and improved speed, as well as enabling accelerate training for upcoming heavy runpod expansion. Likely switching to 8 a40s to train will be a more reasonable use of cost and effectiveness of training.

I advise whenever using qwen 3.5 to install fast path linear attention.

published an article 16 days ago

Article

geolip-aleph-void: The First Relational Geometric Vocabulary Patchwork

AbstractPhil

•

16 days ago

updated a model 16 days ago

AbstractPhil/geolip-sdxl-aleph

Text-to-Image • Updated 15 days ago • • 1

AbstractPhila PRO

AI & ML interests

Recent Activity

Organizations

AbstractPhil's activity

geolip-aleph-void: The First Relational Geometric Vocabulary Patchwork