Daily Papers

by AK and the research community

Jul 1

SphOR: A Representation Learning Perspective on Open-set Recognition for Identifying Unknown Classes in Deep Learning Models

The reliance on Deep Neural Network (DNN)-based classifiers in safety-critical and real-world applications necessitates Open-Set Recognition (OSR). OSR enables the identification of input data from classes unknown during training as unknown, as opposed to misclassifying them as belonging to a known class. DNNs consist of a feature extraction backbone and classifier head; however, most OSR methods typically train both components jointly, often yielding feature representations that adapt poorly to unknown data. Other approaches employ off-the-shelf objectives, such as supervised contrastive learning, which are not specifically designed for OSR. To address these limitations, we propose SpHOR, which explicitly shapes the feature space via supervised representation learning, before training a classifier. Instead of relying on generic feature learning, SpHOR custom-designs representation learning for OSR through three key innovations: (1) enforcing discriminative class-specific features via orthogonal label embeddings, ensuring clearer separation between classes; (2) imposing a spherical constraint, modeling representations as a mixture of von Mises-Fisher distributions; and (3) integrating Mixup and Label Smoothing (LS) directly into the representation learning stage. To quantify how these techniques enhance representations for OSR, we introduce two metrics: the Angular Separability (AS) and Norm Separability (NS). Combining all three innovations, SpHOR achieves state-of-the-art results (in AUROC and OSCR) across various coarse-grained and fine-grained open-set benchmarks, particularly excelling on the Semantic Shift Benchmark with improvements up to 5.1%. Code at https://github.com/nadarasarbahavan/SpHOR

  • 3 authors
·
Feb 21

GNN-Coder: Boosting Semantic Code Retrieval with Combined GNNs and Transformer

Code retrieval is a crucial component in modern software development, particularly in large-scale projects. However, existing approaches relying on sequence-based models often fail to fully exploit the structural dependencies inherent in code, leading to suboptimal retrieval performance, particularly with structurally complex code fragments. In this paper, we introduce GNN-Coder, a novel framework based on Graph Neural Networks (GNNs) that utilizes the Abstract Syntax Tree (AST). We make the first attempt to study how a GNN-integrated Transformer can promote the development of semantic retrieval tasks by capturing the structural and semantic features of code. We further propose an innovative graph pooling method tailored for ASTs, utilizing the number of child nodes as a key feature to highlight the intrinsic topological relationships within the AST. This design effectively integrates both sequential and hierarchical representations, enhancing the model's ability to capture code structure and semantics. Additionally, we introduce the Mean Angular Margin (MAM), a novel metric for quantifying the uniformity of code embedding distributions, providing a standardized measure of feature separability. The proposed method achieves a lower MAM, indicating a more discriminative feature representation. This underscores GNN-Coder's superior ability to distinguish between code snippets, thereby enhancing retrieval accuracy. Experimental results show that GNN-Coder significantly boosts retrieval performance, with a 1%-10% improvement in MRR on the CSN dataset, and a notable 20% gain in zero-shot performance on the CosQA dataset.

  • 4 authors
·
Feb 20, 2025

Separable neural architectures as a primitive for unified predictive and generative intelligence

Intelligent systems across physics, language and perception often exhibit factorisable structure, yet are typically modelled by monolithic neural architectures that do not explicitly exploit this structure. The separable neural architecture (SNA) addresses this by formalising a representational class that unifies additive, quadratic and tensor-decomposed neural models. By constraining interaction order and tensor rank, SNAs impose a structural inductive bias that factorises high-dimensional mappings into low-arity components. Separability need not be a property of the system itself: it often emerges in the coordinates or representations through which the system is expressed. Crucially, this coordinate-aware formulation reveals a structural analogy between chaotic spatiotemporal dynamics and linguistic autoregression. By treating continuous physical states as smooth, separable embeddings, SNAs enable distributional modelling of chaotic systems. This approach mitigates the nonphysical drift characteristics of deterministic operators whilst remaining applicable to discrete sequences. The compositional versatility of this approach is demonstrated across four domains: autonomous waypoint navigation via reinforcement learning, inverse generation of multifunctional microstructures, distributional modelling of turbulent flow and neural language modelling. These results establish the separable neural architecture as a domain-agnostic primitive for predictive and generative intelligence, capable of unifying both deterministic and distributional representations.

  • 5 authors
·
Mar 12

General teleparallel geometric theory of defects

We revisit the geometric theory of defects. In the differential-geometric models of defects that have been adopted since the 1950s, dislocations have been associated with torsion, disclinations with the full curvature, and point defects with the first kind trace of non-metricity. The mainstream formulation exhibits several conceptual and technical shortcomings, most notably a hierarchy inconsistency, the non-existence of a genuine metric formulation, and the potential emergence of Ostrogradsky-type instabilities. These issues have motivated us to develop a new framework, namely a generalized teleparallel geometric theory of defects. In our model, dislocations are identified with the trace of torsion, disclinations with the second kind trace of the non-metricity, and point defects with the first kind trace of the non-metricity. In addition, we retain the scalar part of the torsion as a free parameter for describing some possible unknown degrees of freedom in the theory of defects. The proposed geometric theory of defects is free from all of the aforementioned drawbacks and is therefore worthy of further investigation. To ensure the coherence and completeness of the discussion, we begin our analysis with elastic deformations, then summarize the existing metric-affine geometric theory of defects, and finally proceed to our original contribution, namely the new theory introduced here. We formulate the entire theory in Eulerian coordinates. Naturally, all results can be reformulated in Lagrangian coordinates as well. All analyses and formulae are expressed in the language of exterior algebra and are carried out in coordinate-independent orthonormal frames.

  • 3 authors
·
Feb 1

Perfect Detection, Failed Control: The Geometry of Knowing vs. Steering in Language Models

A central aspiration of mechanistic interpretability is controllability: if we know where a behavior is represented in a model's activations, we should be able to modify it. This rests on a hidden premise -- that the direction which detects a behavior and the direction which controls it are the same, or close. We test this geometrically: what is the angle between the direction that best detects a behavior and the one that best causes it? If detection implies control the cosine is near 1; otherwise it quantifies a detection-intervention gap. On Gemma 2-2B-it, output format (clean JSON vs markdown fencing) collapses both roles onto one axis. Hallucination does not: the model detects fake entities with perfect linear separability (AUC = 1.000 from layer 5), yet that direction sits at cos = 0.12 (about 83 degrees) from the direction producing a refusal -- a small, reproducible alignment, far from the cos = 1 that "detection is control" would require. A detector built from activations, with no chosen tokens, likewise fails to align (cos = -0.06). The gap generalizes: across four models from three families and two scales (1B-9B), cos stays in [0.12, 0.20], identical before and after instruction tuning (0.1197 vs 0.1200), placing its origin in pretraining. A 15-degree rotation toward the refusal direction partially bridges it -- 73% and 60% refusal on two held-out fake-entity categories at 1.8% false positives. We then ask whether this cosine predicts steerability, and it does not: detection is a high-dimensional class, not a single direction, and what separates the steerable case is functional, not readable from a static angle. The cosine is a weight-computable signature of the dissociation between knowing and steering, not a predictor of it.

  • 5 authors
·
Jun 22
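The cosine-to-angle figures quoted in the abstract above (cos = 0.12 read as roughly 83 degrees) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names and the toy 3-dimensional vectors are assumptions, not taken from the paper. `rotate_toward` shows one plausible way to rotate a direction by a fixed angle toward another inside the plane they span, and the paper's actual rotation procedure may differ.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [a / norm for a in v]

def angle_degrees(cos_value):
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

def rotate_toward(d, r, degrees):
    """Rotate direction d by `degrees` toward r, within span{d, r} (hypothetical helper)."""
    d = normalize(d)
    r = normalize(r)
    overlap = dot(r, d)
    # Component of r orthogonal to d; fails if r is parallel to d.
    perp = normalize([ri - overlap * di for ri, di in zip(r, d)])
    phi = math.radians(degrees)
    return [math.cos(phi) * di + math.sin(phi) * pi for di, pi in zip(d, perp)]

# Toy vectors constructed so that their cosine is exactly 0.12.
detect = [1.0, 0.0, 0.0]
refuse = [0.12, math.sqrt(1.0 - 0.12 ** 2), 0.0]

gap = angle_degrees(dot(normalize(detect), normalize(refuse)))   # about 83.1 degrees
rotated = rotate_toward(detect, refuse, 15.0)
gap_after = angle_degrees(dot(rotated, normalize(refuse)))       # gap minus 15 degrees
print(round(gap, 1), round(gap_after, 1))
```

In this toy setting a 15-degree rotation still leaves the two directions about 68 degrees apart, which matches the abstract's description of the rotation as only partially bridging the gap.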

Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution

Exploiting spatial-angular correlation is crucial to light field (LF) image super-resolution (SR), but is highly challenging due to its non-local property caused by the disparities among LF images. Although many deep neural networks (DNNs) have been developed for LF image SR and have achieved continuously improved performance, existing methods cannot fully leverage the long-range spatial-angular correlation and thus suffer a significant performance drop when handling scenes with large disparity variations. In this paper, we propose a simple yet effective method to learn the non-local spatial-angular correlation for LF image SR. In our method, we adopt the epipolar plane image (EPI) representation to project the 4D spatial-angular correlation onto multiple 2D EPI planes, and then develop a Transformer network with repetitive self-attention operations to learn the spatial-angular correlation by modeling the dependencies between each pair of EPI pixels. Our method can fully incorporate the information from all angular views while achieving a global receptive field along the epipolar line. We conduct extensive experiments with insightful visualizations to validate the effectiveness of our method. Comparative results on five public datasets show that our method not only achieves state-of-the-art SR performance, but is also robust to disparity variations. Code is publicly available at https://github.com/ZhengyuLiang24/EPIT.

  • 6 authors
·
Feb 15, 2023

Optimised angular power spectra for spectroscopic galaxy surveys

The angular power spectrum is a gauge-independent observable that is in principle the natural tool for analysing galaxy number counts. In practice, the problem is that the computational requirements for next-generation spectroscopic surveys such as Euclid and the Square Kilometre Array are currently unfeasible. We propose a new method to save computational time for spectroscopic angular power spectra. This hybrid method is modelled on the Fourier power spectrum approach of treating relatively thick redshift bins (redshift width ~0.1) as separate surveys. In the hybrid method, each thick bin is further subdivided into thin bins (redshift width ~0.01); all the correlations within each thick bin are computed, while cross-bin correlations beyond the thick bins are neglected. Constraints on cosmological parameters from the hybrid method are comparable to those from the standard galaxy power spectrum analysis - but they have the advantage that cosmic evolution, wide-angle and lensing effects are naturally included, while no Alcock-Paczynski correction is needed. The hybrid method delivers much tighter constraints than a 2D tomographic approach that is typical for photometric surveys, which considers only thick bins and the correlations between them. Furthermore, for standard cosmological parameters our method is not biased by neglecting the effects of lensing on number counts, while the tomographic method is strongly biased.

  • 4 authors
·
Mar 28, 2018

On the Continuity of Rotation Representations in Neural Networks

In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate what are continuous and discontinuous representations for 2D, 3D, and n-dimensional rotations. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses.

  • 5 authors
·
Dec 17, 2018
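As an illustration of what a 6D rotation representation can look like, the sketch below maps two arbitrary 3-vectors to a rotation matrix by Gram-Schmidt orthonormalisation followed by a cross product. The abstract above does not spell out the construction, so treating this as the mapping is an assumption, and the function names are made up for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("degenerate input: zero-length vector")
    return [x / norm for x in v]

def rotation_from_6d(a1, a2):
    """Return the three columns of a 3x3 rotation matrix built from a 6D input (a1, a2)."""
    b1 = normalize(a1)
    # Remove the part of a2 that lies along b1, then normalise what is left.
    overlap = dot(b1, a2)
    b2 = normalize([y - overlap * x for x, y in zip(b1, a2)])
    # The third column is fixed by right-handedness, so it needs no extra input.
    b3 = cross(b1, b2)
    return [b1, b2, b3]

columns = rotation_from_6d([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
for col in columns:
    print([round(x, 4) for x in col])
```

The construction fails only when `a1` is zero or `a2` is parallel to `a1`. Everywhere else small changes in the six inputs give small changes in the matrix, which is the kind of continuity the abstract argues matters for learning.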

Kinematical correlations via κ-Poincaré coproducts

We study a kinematical consequence of the Hopf-algebraic momentum composition law in κ-Minkowski spacetime. The same curved momentum space can be described in different coordinates. In the bicrossproduct basis the ordered-plane-wave labels are the translation-generator eigenvalues, so the relevant map is one-to-one. In the classical basis, instead, the translation eigenvalues $P_\mu$ are nonlinearly related to the ordered-plane-wave labels $p_\mu$. This relation can fail to be globally one-to-one in a high-momentum region. When a given classical-basis four-momentum admits more than one real auxiliary preimage, the branch-sensitive quantity $P_+ \equiv P_0 + P_4 = \kappa e^{p_0/\kappa}$ enters the coproduct and resolves the branches in two-particle states. Imposing the vanishing total-momentum constraint therefore gives branch-dependent κ-deformed back-to-back momentum correlations. In a single-branch regime this is just a deformed correlated product, while in a multibranch regime a state specified only by $P_\mu$ can be expanded into distinct auxiliary branches. If $P_\mu$ are taken as the directly meaningful momenta, the physical content is the resulting deformed correlation pattern. If the auxiliary variables $p_\mu$ are assigned operational meaning, the same constrained state can be interpreted as a superposition over different auxiliary branches. We also compare this structure with standard regular self-adjoint nonrelativistic minimal-length models and find no analogous smooth local two-real-branch inversion on their physical domains.

  • 2 authors
·
Jun 1

Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors

Modern neural network training relies on optimizers such as Adam and Muon which act on each weight matrix as a single object. Yet every weight matrix carries two distinct quantities -- a magnitude and a direction -- and all optimizers stepping in the matrix as a whole couple their dynamics: the directional change from an update depends on the current magnitude, while the magnitude drifts as a byproduct of learning the direction, so neither is governed directly by the learning rate. Typical training therefore leans on surrounding recipes such as weight decay and warmup to keep learning stable at scale, though these regulate the coupling only indirectly; other recent methods instead constrain the weight to a fixed-norm sphere, but add no learnable magnitude, leaving scale control to normalization layers alone. We propose Magnitude--Direction (MD) Decoupling, an optimizer modification that factorizes each weight into a fixed-norm direction on a hypersphere and learnable per-row and per-column magnitude gains, updated at separate learning rates, all while the model still sees a single fused weight tensor. The method is agnostic to the base optimizer and removes the need for weight decay and warmup. Across both Adam and Muon, MD Decoupling improves on well-tuned baselines, transfers the optimal LR across model width without retuning, and continues to help at scale on large Mixture-of-Experts (MoE) models. Treating magnitude and direction as separately controlled quantities thus yields more predictable training dynamics and a simple, broadly applicable improvement to modern optimizers.

  • 4 authors
·
Jun 23

GARDEN: Gravity-Aligned Reconstruction of Disentangled ENvironments from RGB images

Converting multi-view RGB observations into simulation-ready 3D environments remains challenging because current reconstruction pipelines produce monolithic scene representations without explicit physical structure. They are typically defined up to an arbitrary global rotation and entangle rigid foreground objects with background geometry, which hinders stable physical interaction. Existing solutions often recover interactivity by replacing reconstructed objects with retrieved CAD assets, but this introduces a slow retrieval-and-replacement stage and weakens scene-specific geometric fidelity. We propose GARDEN, an RGB-only framework that reformulates reconstruction as physically-grounded scene factorization and outputs a structured hybrid scene representation. The key idea is to use gravity as a universal physical prior: we first align the reconstruction to a unified Gravity-View frame to resolve gauge ambiguity, then recover object-centric rigid meshes with accurate 6-DoF placement, and finally remove duplicate object geometry from the background through conditional 3D point classification. The resulting representation combines explicit rigid bodies with a decoupled background, enabling direct physics simulation while preserving visual realism. Experiments on both simulated and real multi-view scenes show that GARDEN improves object placement reliability, disentanglement quality, and rendering-simulation efficiency compared with retrieval-based baselines.

  • 6 authors
·
Jun 2

Cylindric plane partitions, Lambda determinants, Commutants in semicircular systems

This thesis is divided into three parts. The first part deals with cylindric plane partitions, the second with lambda-determinants, and the third with commutators in semicircular systems. For a more detailed abstract, please see inside. Cylindric plane partitions may be thought of as a natural generalization of reverse plane partitions. A generating series for the enumeration of cylindric plane partitions was recently given by Borodin. The first result of section one is a new bijective proof of Borodin's identity which makes use of Fomin's growth diagram framework for generalized RSK correspondences. The second result is a (q,t)-analog of Borodin's identity which extends previous work by Okada in the reverse plane partition case. The third result is an explicit combinatorial interpretation of the Macdonald weight occurring in the (q,t)-analog using the non-intersecting lattice path model for cylindric plane partitions. Alternating sign matrices were discovered by Robbins and Rumsey whilst studying λ-determinants. In the second part of this thesis we prove a multi-parameter generalization of the λ-determinant, generalizing a recent result by di Francesco. Like the original λ-determinant, our formula exhibits the Laurent phenomenon. Semicircular systems were first introduced by Voiculescu as a part of his study of von Neumann algebras. In the third part of this thesis we study certain commutator subalgebras of the semicircular system. We find a projection matrix with an interesting self-similar structure. Making use of our projection formula we give an alternative, elementary proof that the semicircular system is a factor.

  • 1 author
·
Oct 25, 2021

The Simons Observatory: Cryogenic Half Wave Plate Rotation Mechanism for the Small Aperture Telescopes

We present the requirements, design and evaluation of the cryogenic continuously rotating half-wave plate (CHWP) for the Simons Observatory (SO). SO is a cosmic microwave background (CMB) polarization experiment at Parque Astronómico Atacama in northern Chile that covers a wide range of angular scales using both small (0.42 m) and large (6 m) aperture telescopes. In particular, the small aperture telescopes (SATs) focus on large angular scales for primordial B-mode polarization. To this end, the SATs employ a CHWP to modulate the polarization of the incident light at 8 Hz, suppressing atmospheric 1/f noise and mitigating systematic uncertainties that would otherwise arise due to the differential response of detectors sensitive to orthogonal polarizations. The CHWP consists of a 505 mm diameter achromatic sapphire HWP and a cryogenic rotation mechanism, both of which are cooled down to ~50 K to reduce detector thermal loading. Under normal operation the HWP is suspended by a superconducting magnetic bearing and rotates with a constant 2 Hz frequency, controlled by an electromagnetic synchronous motor. We find that the number of superconductors and magnets that make up the superconducting magnetic bearing are important design parameters, especially for the rotation mechanism's vibration performance. The rotation angle is detected through an angular encoder with a noise level of 0.07 μrad s. During a cooldown, the rotor is held in place by a grip-and-release mechanism that serves as both an alignment device and a thermal path. In this paper we provide an overview of the SO SAT CHWP: its requirements, hardware design, and laboratory performance.

  • 27 authors
·
Sep 26, 2023

How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing

When a language model is fed a wrong answer, what happens inside the network? Current understanding treats truthfulness as a static property of individual-layer representations: a direction to be probed, a feature to be extracted. Less is known about the dynamics: how internal representations diverge across the full depth of the network when the model processes correct versus incorrect continuations. We introduce forced-completion probing, a method that presents identical queries with known correct and incorrect single-token continuations and tracks five geometric measurements across every layer of four decoder-only models (1.5B-13B parameters). We report three findings. First, correct and incorrect paths diverge through rotation, not rescaling: displacement vectors maintain near-identical magnitudes while their angular separation increases, meaning factual selection is encoded in direction on an approximate hypersphere. Second, the model does not passively fail on incorrect input; it actively suppresses the correct answer, driving internal probability away from the right token. Third, both phenomena are entirely absent below a parameter threshold and emerge at 1.6B, suggesting a phase transition in factual processing capability. These results show that factual constraint processing has a specific geometric character (rotational, not scalar; active, not passive) that is invisible to methods based on single-layer probes or magnitude comparisons.

  • 1 author
·
Feb 24

Three-Phase Transformer

We present Three-Phase Transformer (3PT), a residual-stream structural prior for decoder-only Transformers on a standard SwiGLU + RMSNorm + RoPE + GQA backbone. The hidden vector is partitioned into N equally-sized cyclic channels, each maintained by phase-respecting ops: a per-channel RMSNorm, a 2D Givens rotation between attention and FFN that rotates each channel by theta + i*(2*pi/N), and a head-count constraint aligning GQA heads with the partition. The architecture is a self-stabilizing equilibrium between scrambling and re-imposition, not a bolted-on module. The partition carves out a one-dimensional DC subspace orthogonal to the channels, into which we inject a fixed Gabriel's horn profile r(p) = 1/(p+1) as an absolute-position side-channel composing orthogonally with RoPE's relative-position rotation. The canonical N=3 borrows its metaphor from balanced three-phase AC, where three sinusoids 120 degrees apart sum to zero with no anti-correlated pair. At 123M parameters on WikiText-103, 3PT achieves -7.20% perplexity (-2.62% bits-per-byte) over a matched RoPE-Only baseline at +1,536 parameters (0.00124% of total), with 1.93x step-count convergence speedup (1.64x wall-clock). N behaves as a parameter-sharing knob rather than a unique optimum: at 5.5M an N-sweep over {1,2,3,4,6,8,12} is near-monotone with N=1 winning; at 123M a three-seed sweep finds N=3 and N=1 statistically indistinguishable. The load-bearing mechanism is the channel-partitioned residual stream, per-block rotation, per-phase normalization, and horn DC injection. We characterize (a) self-stabilization of the geometry without explicit enforcement, a novel instance of the conservation-law framework for neural networks; (b) a U-shaped depth profile of rotation-angle drift at 12 layers; (c) orthogonal composition with RoPE, attention, and FFN.

BrainsBuild BrainsBuild
·
Apr 14 5
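Two geometric facts behind the Three-Phase Transformer abstract above are easy to verify numerically: N phases spaced 2*pi/N apart sum to zero, and a 2D Givens rotation by theta + i*(2*pi/N) changes direction but not length. The sketch below is a toy check under stated assumptions. How the model pairs up coordinates inside each channel is not given in the abstract, so the single (x, y) pair per channel is purely illustrative, and the function names are invented for the example.

```python
import math

def phase_angles(theta, n):
    """Angle theta + i*(2*pi/n) for channel i = 0..n-1, as written in the abstract."""
    return [theta + i * (2.0 * math.pi / n) for i in range(n)]

def balanced_sum(t, n=3):
    """Sum of n sinusoids evaluated at time t with equally spaced phase offsets."""
    return sum(math.sin(t + offset) for offset in phase_angles(0.0, n))

def givens_rotate(x, y, angle):
    """Rotate the 2D pair (x, y) counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

# Three phases 120 degrees apart cancel at every instant (up to rounding).
print(abs(balanced_sum(0.7)) < 1e-12)

# Each channel gets its own rotation angle; the pair's length is preserved.
pair = (3.0, 4.0)
for channel, angle in enumerate(phase_angles(0.25, 3)):
    rx, ry = givens_rotate(pair[0], pair[1], angle)
    print(channel, round(math.hypot(rx, ry), 6))

# Any two of the three phases have correlation cos(120 degrees) = -0.5, not -1.
print(round(math.cos(2.0 * math.pi / 3.0), 6))
```

The last line is the numerical content of "no anti-correlated pair": the pairwise cosine is -0.5, so no two phases are exact opposites even though all three cancel.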

OAM-Induced Lattice Rotation Reveals a Fractional Optimum in Fault-Tolerant GKP Quantum Sensing

Photon loss and dephasing rapidly degrade the sensitivity of quantum sensors, yet systematic methods for designing error-correcting codes whose geometry is simultaneously adapted to the sensing task and the noise channel do not exist. Here we establish that orbital-angular-momentum (OAM) encoding and Gottesman-Kitaev-Preskill (GKP) lattice geometry are structurally coupled: an OAM mode of topological charge $\ell$ induces a phase-space rotation $\theta_\ell = \ell\pi/\ell_{\max}$, corresponding to a family of twisted GKP stabilizer lattices. Using an end-to-end differentiable Strawberry Fields--TensorFlow circuit, we jointly optimise $\ell$, the lattice aspect ratio $r$, and the finite-energy envelope $\varepsilon$ to maximise quantum Fisher information subject to $P_{\mathrm{err}} \leq 10^{-3}$. The optimum occurs at the fractional charge $\ell = 1.5$ ($\theta = 67.5^\circ$), implementable with a half-integer spiral phase plate, which reduces $P_{\mathrm{err}}$ by $23.9\times$ relative to the square-lattice baseline while leaving $F_Q$ unchanged to within 0.2%. This surpasses the best integer value ($\ell = 2$, $15.7\times$) and arises from an exact $180^\circ$ periodicity of the $P_{\mathrm{err}}(\theta)$ landscape, confirmed analytically and numerically. We derive a transcendental balance equation for the optimal angle $\theta^*(\eta,\gamma,r)$ and prove that it decreases with both $\gamma$ and $\eta$. A Shannon-inspired metrological capacity $C = F_Q \cdot (-\ln P_{\mathrm{err}})$, maximised at $\ell = 1.5$ with a 41% gain over the square lattice, quantifies the joint sensitivity--fault-tolerance resource. These results establish a geometric design principle for noise-adaptive quantum sensors and a fully open-source differentiable template extensible to other bosonic code families.

  • 2 authors
·
May 13

Flat-sky Angular Power Spectra Revisited

We revisit the flat-sky approximation for evaluating the angular power spectra of projected random fields by retaining information about the correlations along the line of sight. With broad, overlapping radial window functions, these line-of-sight correlations are suppressed and are ignored in the Limber approximation. However, retaining the correlations is important for narrow window functions or unequal-time spectra but introduces significant computational difficulties due to the highly oscillatory nature of the integrands involved. We deal with the integral over line-of-sight wave-modes in the flat-sky approximation analytically, using the FFTlog expansion of the 3D power spectrum. This results in an efficient computational method, which is a substantial improvement compared to any full-sky approaches. We apply our results to galaxy clustering (with and without redshift-space distortions), CMB lensing and galaxy lensing observables. For clustering, we find excellent agreement with the full-sky results on large (percent-level agreement) and intermediate or small (subpercent agreement) scales, dramatically out-performing the Limber approximation for both wide and narrow window functions, and in equal- and unequal-time cases. In the case of lensing, we show on the full sky that the angular power spectrum of the convergence can be very well approximated by projecting the 3D Laplacian (rather than the correct angular Laplacian) of the gravitational potential, even on large scales. Combining this approximation with our flat-sky techniques provides an efficient and accurate evaluation of the CMB lensing angular power spectrum on all scales.

  • 3 authors
·
Jul 25, 2023

Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings

Multi-Resolution Hash Encoding (MHE), the foundational technique behind Instant Neural Graphics Primitives, provides a powerful parameterization for neural fields. However, its spatial behavior lacks rigorous understanding from a physical systems perspective, leading to reliance on heuristics for hyperparameter selection. This work introduces a novel analytical approach that characterizes MHE by examining its Point Spread Function (PSF), which is analogous to the Green's function of the system. This methodology enables a quantification of the encoding's spatial resolution and fidelity. We derive a closed-form approximation for the collision-free PSF, uncovering inherent grid-induced anisotropy and a logarithmic spatial profile. We establish that the idealized spatial bandwidth, specifically the Full Width at Half Maximum (FWHM), is determined by the average resolution, N_{avg}. This leads to a counterintuitive finding: the effective resolution of the model is governed by the broadened empirical FWHM (and therefore N_{avg}), rather than the finest resolution N_{max}, a broadening effect we demonstrate arises from optimization dynamics. Furthermore, we analyze the impact of finite hash capacity, demonstrating how collisions introduce speckle noise and degrade the Signal-to-Noise Ratio (SNR). Leveraging these theoretical insights, we propose Rotated MHE (R-MHE), an architecture that applies distinct rotations to the input coordinates at each resolution level. R-MHE mitigates anisotropy while maintaining the efficiency and parameter count of the original MHE. This study establishes a methodology based on physical principles that moves beyond heuristics to characterize and optimize MHE.

  • 2 authors
·
Feb 10

Space-time tradeoffs of lenses and optics via higher category theory

Optics and lenses are abstract categorical gadgets that model systems with bidirectional data flow. In this paper we observe that the denotational definition of optics - identifying two optics as equivalent by observing their behaviour from the outside - is not suitable for operational, software oriented approaches where optics are not merely observed, but built with their internal setups in mind. We identify operational differences between denotationally isomorphic categories of cartesian optics and lenses: their different composition rule and corresponding space-time tradeoffs, positioning them at two opposite ends of a spectrum. With these motivations we lift the existing categorical constructions and their relationships to the 2-categorical level, showing that the relevant operational concerns become visible. We define the 2-category 2-Optic(C) whose 2-cells explicitly track optics' internal configuration. We show that the 1-category Optic(C) arises by locally quotienting out the connected components of this 2-category. We show that the embedding of lenses into cartesian optics gets weakened from a functor to an oplax functor whose oplaxator now detects the different composition rule. We determine the difficulties in showing this functor forms a part of an adjunction in any of the standard 2-categories. We establish a conjecture that the well-known isomorphism between cartesian lenses and optics arises out of the lax 2-adjunction between their double-categorical counterparts. In addition to presenting new research, this paper is also meant to be an accessible introduction to the topic.

  • 1 author
·
Sep 19, 2022

Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions

Uniform random rotations are a useful primitive in applications such as fast Johnson-Lindenstrauss embeddings, kernel approximation, communication-efficient learning, and recent AI compression pipelines, but they are computationally expensive to generate and apply in high dimensions. A common practical replacement is repeated structured random rotations built from Walsh-Hadamard transforms and random sign diagonals. Applying the structured random rotation twice has been shown empirically to be useful, but the supporting theory is still limited. In this paper we study the approximation quality achieved when using this two-block structured Hadamard rotation. Our results are both positive and negative. On the positive side, we prove that every fixed coordinate of the two-block transform converges uniformly, over all inputs, to the corresponding coordinate of a uniformly rotated vector, with an explicit Kolmogorov-distance bound of order d^{-1/5}. On the negative side, we prove an explicit lower bound on the Wasserstein distance between the full vector distributions, showing that the two-block transform is not a globally accurate surrogate for a uniform random rotation in the worst case. For the extremal input used in the lower bound, we also prove a matching asymptotic upper bound, showing that the lower-bound scale is sharp for that input. Taken together, the results identify a clear separation between one-dimensional marginal behavior, where approximation improves with dimension, and full high-dimensional geometry, where a nonvanishing discrepancy remains. This provides a partial theoretical explanation for the empirical success of structured Hadamard rotations in some algorithms, while also clarifying the limitations of treating them as drop-in replacements for true uniform random rotations.
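The two-block construction described above can be sketched in a few lines. This is a minimal illustration, assuming each block is a random sign diagonal followed by an orthonormal Walsh-Hadamard transform; the block ordering, the normalization, and the names `fwht` and `two_block_rotation` are assumptions made for the example, not details taken from the paper.

```python
import math
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * scale for v in out]

def two_block_rotation(vec, rng):
    """Apply two blocks, each a random sign flip followed by the transform."""
    out = list(vec)
    for _ in range(2):
        signs = [rng.choice((-1.0, 1.0)) for _ in out]
        out = fwht([s * v for s, v in zip(signs, out)])
    return out

def norm(vec):
    return math.sqrt(sum(v * v for v in vec))

rng = random.Random(0)
d = 16
x = [1.0] + [0.0] * (d - 1)  # a standard basis vector as input
y = two_block_rotation(x, rng)

# Each block is orthogonal, so the Euclidean norm is preserved.
assert math.isclose(norm(x), norm(y))
```

Each block costs O(d log d) operations instead of the O(d^2) needed to apply a dense rotation matrix, which is the practical appeal of the structured replacement.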

  • 2 authors
·
Apr 24

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Despite significant progress in alignment, large language models (LLMs) remain vulnerable to adversarial attacks that elicit harmful behaviors. Activation steering techniques offer a promising inference-time intervention approach, but existing methods suffer from critical limitations: activation addition requires careful coefficient tuning and is sensitive to layer-specific norm variations, while directional ablation provides only binary control. Recent work on Angular Steering introduces continuous control via rotation in a 2D subspace, but its practical implementation violates norm preservation, causing distribution shift and generation collapse, particularly in models below 7B parameters. We propose Selective Steering, which addresses these limitations through two key innovations: (1) a mathematically rigorous norm-preserving rotation formulation that maintains activation distribution integrity, and (2) discriminative layer selection that applies steering only where feature representations exhibit opposite-signed class alignment. Experiments across nine models demonstrate that Selective Steering achieves 5.5x higher attack success rates than prior methods while maintaining zero perplexity violations and approximately 100% capability retention on standard benchmarks. Our approach provides a principled, efficient framework for controllable and stable LLM behavior modification. Code: https://github.com/knoveleng/steering
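To make the idea of a norm-preserving rotation in a 2D subspace concrete, here is a minimal sketch of a generic plane rotation applied to an activation vector. It is not the paper's formulation: the helper names (`orthonormal_pair`, `rotate_in_plane`), the example vectors, and the angle are all hypothetical, and layer selection is not modeled.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def orthonormal_pair(u, v):
    """Gram-Schmidt: orthonormal b1, b2 spanning the same plane as u and v."""
    n_u = norm(u)
    b1 = [a / n_u for a in u]
    proj = dot(v, b1)
    w = [a - proj * b for a, b in zip(v, b1)]
    n_w = norm(w)
    b2 = [a / n_w for a in w]
    return b1, b2

def rotate_in_plane(h, b1, b2, theta):
    """Rotate the component of h lying in span(b1, b2) by theta; leave the rest."""
    c1, c2 = dot(h, b1), dot(h, b2)
    r1 = math.cos(theta) * c1 - math.sin(theta) * c2
    r2 = math.sin(theta) * c1 + math.cos(theta) * c2
    # Swap the old in-plane coordinates (c1, c2) for the rotated ones (r1, r2).
    return [x + (r1 - c1) * p + (r2 - c2) * q for x, p, q in zip(h, b1, b2)]

h = [0.5, -1.2, 2.0, 0.3]  # hypothetical activation vector
b1, b2 = orthonormal_pair([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0])
steered = rotate_in_plane(h, b1, b2, math.radians(60))

# The rotation changes direction inside the plane but not the vector's norm.
assert math.isclose(norm(steered), norm(h))
```

Norm preservation here depends on `b1` and `b2` being exactly orthonormal; if the basis is only approximately orthonormal, the update no longer acts as a rotation and the norm drifts.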

Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations

SO(3)-equivariant networks are the dominant models for machine learning interatomic potentials (MLIPs). The key operation of such networks is the Clebsch-Gordan (CG) tensor product, which is computationally expensive. To accelerate the computation, we develop tensor decomposition networks (TDNs) as a class of approximately equivariant networks in which CG tensor products are replaced by low-rank tensor decompositions, such as the CANDECOMP/PARAFAC (CP) decomposition. With the CP decomposition, we prove (i) a uniform bound on the induced error of SO(3)-equivariance, and (ii) the universality of approximating any equivariant bilinear map. To further reduce the number of parameters, we propose path-weight sharing that ties all multiplicity-space weights across the O(L^3) CG paths into a single shared parameter set without compromising equivariance, where L is the maximum angular degree. The resulting layer acts as a plug-and-play replacement for tensor products in existing networks, and the computational complexity of tensor products is reduced from O(L^6) to O(L^4). We evaluate TDNs on PubChemQCR, a newly curated molecular relaxation dataset containing 105 million DFT-calculated snapshots. We also use existing datasets, including OC20 and OC22. Results show that TDNs achieve competitive performance with dramatic speedup in computations. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS/tree/main/OpenMol/TDN).
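The saving from a CP decomposition can be seen on a generic bilinear map. The sketch below assumes a rank-R tensor T[i][j][k] = sum_r A[i][r] B[j][r] C[k][r] with random factors; it contains no Clebsch-Gordan coefficients, and the names `cp_bilinear` and `dense_bilinear` are hypothetical, so it illustrates the factorization idea rather than the TDN layer itself.

```python
import math
import random

def cp_bilinear(A, B, C, x, y):
    """Evaluate z_k = sum_r C[k][r] * (A^T x)_r * (B^T y)_r without forming T."""
    rank = len(A[0])
    ax = [sum(A[i][r] * x[i] for i in range(len(x))) for r in range(rank)]
    by = [sum(B[j][r] * y[j] for j in range(len(y))) for r in range(rank)]
    return [sum(C[k][r] * ax[r] * by[r] for r in range(rank)) for k in range(len(C))]

def dense_bilinear(A, B, C, x, y):
    """Reference: contract x and y against the full tensor T[i][j][k]."""
    rank = len(A[0])
    out = []
    for k in range(len(C)):
        total = 0.0
        for i in range(len(x)):
            for j in range(len(y)):
                t_ijk = sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
                total += t_ijk * x[i] * y[j]
        out.append(total)
    return out

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: inputs of length 5 and 4, output of length 6, rank 3.
A, B, C = rand_matrix(5, 3), rand_matrix(4, 3), rand_matrix(6, 3)
x = [rng.gauss(0.0, 1.0) for _ in range(5)]
y = [rng.gauss(0.0, 1.0) for _ in range(4)]

z_fast = cp_bilinear(A, B, C, x, y)
z_dense = dense_bilinear(A, B, C, x, y)
assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(z_fast, z_dense))
```

The factored evaluation touches each factor matrix once, so its cost grows with the rank times the sum of the dimensions, whereas the dense contraction grows with their product.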

  • 9 authors
·
Jul 1, 2025

Synthetic Light Curves and Spectra for the Photospheric Phase of a 3D Stripped-Envelope Supernova Explosion Model

We present synthetic light curves and spectra from three-dimensional (3D) Monte Carlo radiative transfer simulations based on a 3D core-collapse supernova explosion model of an ultra-stripped 3.5 M_{\odot} progenitor. Our calculations predict a fast and faint transient with \Delta m_{15} \sim 1-2 mag and peak bolometric luminosity between -15.3 mag and -16.4 mag. Due to a large-scale unipolar asymmetry in the distribution of ^{56}Ni, there is a pronounced viewing-angle dependence with about 1 mag difference between the directions of highest and lowest luminosity. The predicted spectra for this rare class of explosions do not yet match any observed counterpart. They are dominated by prominent Mg II lines, but features from O, C, Si, and Ca are also found. In particular, the O I line at 7774 Å appears as a blended feature together with Mg II emission. Our model is not only faster and fainter than the observed Ib/c supernova population, but also shows a correlation between higher peak luminosity and larger \Delta m_{15} that is not present in observational samples. A possible explanation is that the unusually small ejecta mass of our model accentuates the viewing-angle dependence of the photometry. We suggest that the viewing-angle dependence of the photometry may be used to constrain asymmetries in explosion models of more typical stripped-envelope supernova progenitors in future.

  • 5 authors
·
Oct 28, 2024

An Unsupervised Method for Estimating Class Separability of Datasets with Application to LLMs Fine-Tuning

This paper proposes an unsupervised method that leverages topological characteristics of data manifolds to estimate class separability of the data without requiring labels. Experiments conducted in this paper on several datasets demonstrate a clear correlation and consistency between the class separability estimated by the proposed method and supervised metrics like Fisher Discriminant Ratio (FDR) and cross-validation of a classifier, which both require labels. This can enable implementing learning paradigms aimed at learning from both labeled and unlabeled data, like semi-supervised and transductive learning. This would be particularly useful when we have limited labeled data and a relatively large unlabeled dataset that can be used to enhance the learning process. The proposed method is implemented for language model fine-tuning with an automated stopping criterion by monitoring class separability of the embedding-space manifold in an unsupervised setting. The proposed methodology was first validated on synthetic data, where the results show a clear consistency between class separability estimated by the proposed method and class separability computed by FDR. The method has also been implemented on both public and internal data. The results show that the proposed method can effectively aid -- without the need for labels -- a decision on when to stop or continue the fine-tuning of a language model and which fine-tuning iteration is expected to achieve a maximum classification performance through quantification of the class separability of the embedding manifold.

  • 6 authors
·
May 24, 2023

Ghost on the Shell: An Expressive Representation of General 3D Shapes

The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open, surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parameterize open surfaces by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.

  • 7 authors
·
Oct 23, 2023

Magic sizes enable minimal-complexity, high-fidelity assembly of programmable shells

Recent advances in synthetic methods enable designing subunits that self-assemble into structures with well-defined sizes and architectures, but yields are frequently suppressed by the formation of off-target metastable structures. Increasing the complexity (number of distinct inter-subunit interaction types) can inhibit off-target structures, but leads to slower kinetics and higher synthesis costs. Here, we use icosahedral shells formed of programmable triangular subunits as a model system, and identify design principles that produce the highest target yield at the lowest complexity. We use a symmetry-based construction to create a range of design complexities, starting from the maximal symmetry Caspar-Klug assembly up to the fully addressable, zero-symmetry assembly. Kinetic Monte Carlo simulations reveal that the most prominent defects leading to off-target assemblies are a class of disclinations. We derive symmetry-based rules for identifying the optimal (lowest-complexity, highest-symmetry) design that inhibits these disclinations, leading to robust, high-fidelity assembly of targets with arbitrarily large sizes. Optimal complexity varies non-monotonically with target size, with 'magic' sizes appearing for high-symmetry designs in which symmetry axes do not intersect vertices of the triangular net. The optimal designs at magic sizes require 12 times fewer inequivalent interaction-types than the (minimal symmetry) fully addressable construction.

  • 6 authors
·
Nov 6, 2024

Spherical convolutions on molecular graphs for protein model quality assessment

Processing information on 3D objects requires methods stable to rigid-body transformations, in particular rotations, of the input data. In image processing tasks, convolutional neural networks achieve this property using rotation-equivariant operations. However, contrary to images, graphs generally have irregular topology. This makes it challenging to define a rotation-equivariant convolution operation on these structures. In this work, we propose Spherical Graph Convolutional Network (S-GCN) that processes 3D models of proteins represented as molecular graphs. In a protein molecule, individual amino acids have common topological elements. This allows us to unambiguously associate each amino acid with a local coordinate system and construct rotation-equivariant spherical filters that operate on angular information between graph nodes. Within the framework of the protein model quality assessment problem, we demonstrate that the proposed spherical convolution method significantly improves the quality of model assessment compared to the standard message-passing approach. It is also comparable to state-of-the-art methods, as we demonstrate on Critical Assessment of Structure Prediction (CASP) benchmarks. The proposed technique operates only on geometric features of protein 3D models. This makes it universal and applicable to any other geometric-learning task where the graph structure allows constructing local coordinate systems.

  • 3 authors
·
Nov 16, 2020

Direction-Preserving Number Representations

Low-precision number formats are widely used in modern machine learning systems due to their efficiency. Accurate direction representation is key to the accuracy of vector operations. This work precisely explores the extent to which the direction of a vector can be represented by selecting its scalar elements from a common finite alphabet of a given size. This is standard practice in machine learning, where low-precision significands may be narrow-width floating-point or integer values. A geometric framework is introduced for analyzing the directional coverage of such product-structured codes. This work analytically quantifies the suboptimality gap between such product-structured codes and spherical codes for the vector as a whole, in both low and asymptotically high dimensions. Furthermore, within the product code class, it is proven that the standard formats of two's complement, fixed-point, and floating-point are suboptimal, again with quantified gap, pointing to the potential to develop new scalar number formats. Such scalar alphabets are numerically optimized across multiple block dimensions for directional coverage, including the dimension used in NVIDIA's NVFP4 format. Experimental results are presented comparing the performance of standard formats and the optimized alphabet. We find that for four bits, NVIDIA's choice of E2M1 closely approximates the optimized alphabet, providing a geometric explanation for its strong performance in low-precision machine learning workloads and an analytical understanding of the link between that superiority and block size. We provide open-source formal proofs in Lean for the theorems in this work, along with the experimental code and the optimized alphabets obtained.

  • 2 authors
·
May 7

Parameter estimation from the core-bounce phase of rotating core collapse supernovae in real interferometer noise

In this work we propose an analytical model that reproduces the core-bounce phase of gravitational waves (GW) from Rapidly Rotating (RR) Core Collapse Supernovae (CCSNe), as a function of three parameters: the arrival time \tau, the ratio of the kinetic and potential energy \beta, and a phenomenological parameter \alpha related to rotation and equation of state (EOS). To validate the model we use 126 waveforms from the Richers catalog (Richers et al. 2017), selected to explore a range of rotation profiles and EOS. To quantify the degree of accuracy of the proposed model, with a particular focus on the rotation parameter \beta, we show that the average Fitting Factor (FF) between the simulated waveforms and the templates is 94.4%. In order to estimate the parameters we propose a frequentist matched filtering approach in real interferometric noise which does not require assigning any priors. We use the Matched Filter (MF) technique, where we inject a bank of templates considering simulated colored Gaussian noise and the real noise of O3L1. For example, for A300w6.00_BHBLP at 10 kpc we obtain a standard deviation of \sigma = 3.34 \times 10^{-3} for simulated colored Gaussian noise and \sigma = 1.46 \times 10^{-2} for real noise. On the other hand, from the asymptotic expansion of the variance we obtain the theoretical minimum error for \beta at 10 kpc and optimal orientation. The estimation error in this case is from 10^{-2} to 10^{-3} as \beta increases. We show that the results of the estimation error of \beta for the 3-parameter space (3D) are consistent with the single-parameter space (1D), which allows us to conclude that \beta is decoupled from the other two parameters.

  • 5 authors
·
Apr 3, 2023

VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations

Rotation estimation of high precision from an RGB-D object observation is a huge challenge in 6D object pose estimation, due to the difficulty of learning in the non-linear space of SO(3). In this paper, we propose a novel rotation estimation network, termed VI-Net, to make the task easier by decoupling the rotation as the combination of a viewpoint rotation and an in-plane rotation. More specifically, VI-Net bases the feature learning on the sphere with two individual branches for the estimates of two factorized rotations, where a V-Branch is employed to learn the viewpoint rotation via binary classification on the spherical signals, while another I-Branch is used to estimate the in-plane rotation by transforming the signals to view from the zenith direction. To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution (SPA-SConv), which settles the boundary problem of spherical signals via feature padding and realizes viewpoint-equivariant feature extraction by symmetric convolutional operations. We apply the proposed VI-Net to the challenging task of category-level 6D object pose estimation for predicting the poses of unknown objects without available CAD models; experiments on the benchmarking datasets confirm the efficacy of our method, which outperforms the existing ones by a large margin in the regime of high precision.

  • 4 authors
·
Aug 19, 2023

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization

This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics. Existing implementations adapt classic isosurface extraction algorithms like Marching Cubes or Dual Contouring; these techniques were designed to extract meshes from fixed, known fields, and in the optimization setting they lack the degrees of freedom to represent high-quality feature-preserving meshes, or suffer from numerical instabilities. We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives. Our main insight is to introduce additional carefully-chosen parameters into the representation, which allow local flexible adjustments to the extracted mesh geometry and connectivity. These parameters are updated along with the underlying scalar field via automatic differentiation when optimizing for a downstream task. We base our extraction scheme on Dual Marching Cubes for improved topological properties, and present extensions to optionally generate tetrahedral and hierarchically-adaptive meshes. Extensive experiments validate FlexiCubes on both synthetic benchmarks and real-world applications, showing that it offers significant improvements in mesh quality and geometric fidelity.

  • 10 authors
·
Aug 10, 2023

Canonicalizing Multimodal Contrastive Representation Learning

As models and data scale, independently trained networks often induce analogous notions of similarity. But matching similarities is weaker than establishing an explicit correspondence between the representation spaces, especially for multimodal models, where consistency must hold not only within each modality, but also for the learned image-text coupling. We therefore ask: given two independently trained multimodal contrastive models (with encoders (f, g) and (\tilde{f}, \tilde{g})) -- trained on different distributions and with different architectures -- does a systematic geometric relationship exist between their embedding spaces? If so, what form does it take, and does it hold uniformly across modalities? In this work, we show that across model families such as CLIP, SigLIP, and FLAVA, this geometric relationship is well approximated by an orthogonal map (up to a global mean shift), i.e., there exists an orthogonal map Q where Q^\top Q = I such that \tilde{f}(x) \approx Q f(x) for paired images x. Strikingly, the same Q simultaneously aligns the text encoders, i.e., \tilde{g}(y) \approx Q g(y) for texts y. Theoretically, we prove that if the multimodal kernel agrees across models on a small anchor set, i.e., \langle f(x), g(y) \rangle \approx \langle \tilde{f}(x), \tilde{g}(y) \rangle, then the two models must be related by a single orthogonal map Q and the same Q maps images and text across models. More broadly, this finding enables backward-compatible model upgrades, avoiding costly re-embedding, and has implications for the privacy of learned representations. Our project page: https://canonical-multimodal.github.io/
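One standard way to fit an orthogonal map between two sets of paired embeddings is the orthogonal Procrustes solution via an SVD. The sketch below uses synthetic data and is an assumption about how such a Q could be estimated, not the paper's procedure; it also omits the global mean shift mentioned in the abstract, and the function name `fit_orthogonal_map` is hypothetical.

```python
import numpy as np

def fit_orthogonal_map(F, F_tilde):
    """Orthogonal Q minimizing ||F_tilde - F @ Q.T||_F; rows are paired embeddings."""
    M = F_tilde.T @ F            # cross-covariance between the two spaces
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt                # closest orthogonal matrix to M

rng = np.random.default_rng(0)
dim = 8

# Synthetic "model A" image and text embeddings (stand-ins for f(x) and g(y)).
F = rng.normal(size=(200, dim))
G = rng.normal(size=(50, dim))

# Synthetic "model B": the same embeddings under a hidden orthogonal map.
Q_true, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
F_tilde = F @ Q_true.T
G_tilde = G @ Q_true.T

# Fit Q on image pairs only, then check that it also aligns the text embeddings.
Q_hat = fit_orthogonal_map(F, F_tilde)
assert np.allclose(Q_hat @ Q_hat.T, np.eye(dim))
assert np.allclose(G @ Q_hat.T, G_tilde)
```

In this noise-free setup the recovered map matches the hidden one exactly; with real encoders the fit would only be approximate, and the residual on held-out text pairs would measure how well a single Q transfers across modalities.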

  • 5 authors
·
Feb 19

Characterising gravitational wave stochastic background anisotropy with Pulsar Timing Arrays

Detecting a stochastic gravitational wave background, particularly radiation from individually unresolvable super-massive black hole binary systems, is one of the primary targets for Pulsar Timing Arrays. Increasingly more stringent upper limits are being set on these signals under the assumption that the background radiation is isotropic. However, some level of anisotropy may be present and the characterisation of the power at different angular scales carries important information. We show that the standard analysis for isotropic backgrounds can be generalised in a conceptually straightforward way to the case of generic anisotropic background radiation by decomposing the angular distribution of the gravitational wave power on the sky into multipole moments. We introduce the concept of generalised overlap reduction functions which characterise the effect of the anisotropy multipoles on the correlation of the timing residuals from the pulsars timed by a Pulsar Timing Array. In a search for a signal characterised by a generic anisotropy, the generalised overlap reduction functions play the role of the so-called Hellings and Downs curve used for isotropic radiation. We compute the generalised overlap reduction functions for a generic level of anisotropy and Pulsar Timing Array configuration. We also provide an order of magnitude estimate of the level of anisotropy that can be expected in the background generated by super-massive black hole binary systems.

  • 4 authors
·
Jun 23, 2013

Riemannian Flow Matching for Disentangled Graph Domain Adaptation

Graph Domain Adaptation (GDA) typically uses adversarial learning to align graph embeddings in Euclidean space. However, this paradigm suffers from two critical challenges: Structural Degeneration, where hierarchical and semantic representations are entangled, and Optimization Instability, which arises from oscillatory dynamics of minimax adversarial training. To tackle these issues, we propose DisRFM, a geometry-aware GDA framework that unifies Riemannian embedding and flow-based transport. First, to overcome structural degeneration, we embed graphs into a Riemannian manifold. By adopting polar coordinates, we explicitly disentangle structure (radius) from semantics (angle). Then, we enforce topology preservation through radial Wasserstein alignment and semantic discrimination via angular clustering, thereby preventing feature entanglement and collapse. Second, we address the instability of adversarial alignment by using Riemannian flow matching. This method learns a smooth vector field to guide source features toward the target along geodesic paths, guaranteeing stable convergence. The geometric constraints further guide the flow to maintain the disentangled structure during transport. Theoretically, we prove the asymptotic stability of the flow matching and derive a tighter bound for the target risk. Extensive experiments demonstrate that DisRFM consistently outperforms state-of-the-art methods.

  • 5 authors
·
Jan 31

Enabling Efficient Equivariant Operations in the Fourier Basis via Gaunt Tensor Products

Developing equivariant neural networks for the E(3) group plays an important role in modeling 3D data across real-world applications. Enforcing this equivariance primarily involves the tensor products of irreducible representations (irreps). However, the computational complexity of such operations increases significantly as higher-order tensors are used. In this work, we propose a systematic approach to substantially accelerate the computation of the tensor products of irreps. We mathematically connect the commonly used Clebsch-Gordan coefficients to the Gaunt coefficients, which are integrals of products of three spherical harmonics. Through Gaunt coefficients, the tensor product of irreps becomes equivalent to the multiplication between spherical functions represented by spherical harmonics. This perspective further allows us to change the basis for the equivariant operations from spherical harmonics to a 2D Fourier basis. Consequently, the multiplication between spherical functions represented by a 2D Fourier basis can be efficiently computed via the convolution theorem and Fast Fourier Transforms. This transformation reduces the complexity of full tensor products of irreps from O(L^6) to O(L^3), where L is the max degree of irreps. Leveraging this approach, we introduce the Gaunt Tensor Product, which serves as a new method to construct efficient equivariant operations across different model architectures. Our experiments on the Open Catalyst Project and 3BPA datasets demonstrate both the increased efficiency and improved performance of our approach.

  • 3 authors
·
Jan 18, 2024

Detection asymmetry in solar energetic particle events

Context. Solar energetic particles (SEPs) are detected in interplanetary space in association with flares and coronal mass ejections (CMEs) at the Sun. The magnetic connection between the observing spacecraft and the solar active region (AR) source of the event is a key parameter in determining whether SEPs are observed and the properties of the particle event. Aims. We investigate whether an east-west asymmetry in the detection of SEP events is present in observations and discuss its possible link to corotation of magnetic flux tubes with the Sun. Methods. We used a published dataset of 239 CMEs recorded between 2006 and 2017 and having source regions both on the front side and far side of the Sun as seen from Earth. We produced distributions of occurrence of in-situ SEP intensity enhancements associated with the CME events, versus \Delta \phi, the separation in longitude between the source active region and the magnetic footpoint of the observing spacecraft based on the nominal Parker spiral. We focused on protons of energy >10 MeV measured by the STEREO A, STEREO B and GOES spacecraft at 1 au. We also considered the occurrence of 71-112 keV electron events detected by MESSENGER between 0.31 and 0.47 au. Results. We find an east-west asymmetry in the detection of >10 MeV proton events and of 71-112 keV electron events. For protons, observers for which the source AR is on the east side of the spacecraft footpoint and not well connected (-180 < \Delta \phi < -40) are 93% more likely to detect an SEP event compared to observers with +40 < \Delta \phi < +180. The asymmetry may be a signature of corotation of magnetic flux tubes with the Sun, given that for events with \Delta \phi < 0 corotation sweeps the particle-filled flux tubes towards the observing spacecraft, while for \Delta \phi > 0 it takes them away from it.

  • 9 authors
·
Nov 12, 2024

Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now

Video generators are increasingly evaluated as potential world models, which requires them to encode and understand physical laws. We investigate their representation of a fundamental law: gravity. Out-of-the-box video generators consistently generate objects falling at an effectively slower acceleration. However, these physical tests are often confounded by ambiguous metric scale. We first investigate if observed physical errors are artifacts of these ambiguities (e.g., incorrect frame rate assumptions). We find that even temporal rescaling cannot correct the high-variance gravity artifacts. To rigorously isolate the underlying physical representation from these confounds, we introduce a unit-free, two-object protocol that tests the timing ratio t_1^2/t_2^2 = h_1/h_2, a relationship independent of g, focal length, and scale. This relative test reveals violations of Galileo's equivalence principle. We then demonstrate that this physical gap can be partially mitigated with targeted specialization. A lightweight low-rank adaptor fine-tuned on only 100 single-ball clips raises g_{eff} from 1.81 m/s^2 to 6.43 m/s^2 (reaching 65% of terrestrial gravity). This specialist adaptor also generalizes zero-shot to two-ball drops and inclined planes, offering initial evidence that specific physical laws can be corrected with minimal data.
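The reason the two-object ratio is independent of g follows from free fall from rest, h = g t^2 / 2, so t^2 = 2h/g and the factor 2/g cancels in t_1^2/t_2^2. A minimal numerical check is sketched below; the drop heights are hypothetical, and 9.81 m/s^2 is assumed as the terrestrial reference value.

```python
import math

def fall_time(height_m, g):
    """Time for an object released from rest to fall height_m under constant g."""
    return math.sqrt(2.0 * height_m / g)

def timing_ratio(h1, h2, g):
    """Squared fall-time ratio t_1^2 / t_2^2 for two drop heights."""
    t1, t2 = fall_time(h1, g), fall_time(h2, g)
    return (t1 ** 2) / (t2 ** 2)

h1, h2 = 2.0, 0.5  # hypothetical drop heights in metres

# The ratio equals h1 / h2 whatever value of g the generator has "learned".
for g in (1.81, 6.43, 9.81):
    assert math.isclose(timing_ratio(h1, h2, g), h1 / h2)

# Fraction of terrestrial gravity reached after fine-tuning (9.81 m/s^2 assumed).
fraction = 6.43 / 9.81
assert 0.65 < fraction < 0.66
```

Because the ratio cancels g, a generator can fail this test only by making the two objects fall inconsistently with each other, which is why it isolates the physics from frame-rate and scale ambiguities.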

  • 4 authors
·
Dec 1, 2025

Principled Reflection Separation via Nonlinear Superposition and Feature Interaction

Single-image reflection separation is fundamentally challenged by the entanglement of transmission and reflection layers under complex image formation processes. Existing approaches largely rely on simplified assumptions or independent modeling, limiting their ability to handle real-world scenarios. In this work, we revisit the problem from a unified perspective and identify a key issue of existing approaches, i.e., the widely adopted linear composition model in the sRGB domain fails to capture the nonlinear coupling introduced by real-world image signal processing pipelines. To address this, we introduce a learnable nonlinear superposition model that more faithfully characterizes layer interactions and improves decomposition fidelity. Building upon this formulation, we propose a generalized dual-stream interactive framework that explicitly models bidirectional dependencies between transmission and reflection through feature exchange. This framework unifies activation-, gating-, and attention-based interaction mechanisms, and is compatible with both CNN and Transformer backbones. Extensive experiments on diverse real-world benchmarks demonstrate that the proposed approach achieves superior performance with strong generalization capability. More importantly, our study reveals that reflection separation is not about undoing a linear mixture, but about learning nonlinear formation and interaction, offering new insights into the design of principled image decomposition models. Code and models are publicly available at https://mingcv.github.io/DIRS-Page.

  • 4 authors
·
May 31

Extensions of Schoen--Simon--Yau and Schoen--Simon theorems via iteration à la De Giorgi

We give an alternative proof of the Schoen--Simon--Yau curvature estimates and associated Bernstein-type theorems (1975), and extend the original result by including the case of 6-dimensional (stable minimal) immersions. The key step is an ε-regularity theorem, that assumes smallness of the scale-invariant L^2 norm of the second fundamental form. Further, we obtain a graph description, in the Lipschitz multi-valued sense, for any stable minimal immersion of dimension n \geq 2, that may have a singular set Σ of locally finite H^{n-2}-measure, and that is weakly close to a hyperplane. (In fact, if H^{n-2}(Σ)=0, the conclusion is strengthened to a union of smooth graphs.) This follows directly from an ε-regularity theorem, that assumes smallness of the scale-invariant L^2 tilt-excess (verified when the hypersurface is weakly close to a hyperplane). Specialising the multi-valued decomposition to the case of embeddings, we recover the Schoen--Simon theorem (1981). In both ε-regularity theorems the relevant quantity (respectively, length of the second fundamental form and tilt function) solves a non-linear PDE on the immersed minimal hypersurface. The proof is carried out intrinsically (without linearising the PDE) by implementing an iteration method à la De Giorgi (from the linear De Giorgi--Nash--Moser theory). Stability implies estimates (intrinsic weak Caccioppoli inequalities) that make the iteration effective despite the non-linear framework. (In both ε-regularity theorems the method gives explicit constants that quantify the required smallness.)

  • 1 author
·
Sep 11, 2025

Faces of highest weight modules and the universal Weyl polyhedron

Let V be a highest weight module over a Kac-Moody algebra g, and let conv V denote the convex hull of its weights. We determine the combinatorial isomorphism type of conv V, i.e. we completely classify the faces and their inclusions. In the special case where g is semisimple, this brings closure to a question studied by Cellini-Marietti [IMRN 2015] for the adjoint representation, and by Khare [J. Algebra 2016; Trans. Amer. Math. Soc. 2017] for most modules. The determination of faces of finite-dimensional modules up to the Weyl group action and some of their inclusions also appears in previous work of Satake [Ann. of Math. 1960], Borel-Tits [IHES Publ. Math. 1965], Vinberg [Izv. Akad. Nauk 1990], and Casselman [Austral. Math. Soc. 1997]. For any subset of the simple roots, we introduce a remarkable convex cone which we call the universal Weyl polyhedron, which controls the convex hulls of all modules parabolically induced from the corresponding Levi factor. Namely, the combinatorial isomorphism type of the cone stores the classification of faces for all such highest weight modules, as well as how faces degenerate as the highest weight gets increasingly singular. To our knowledge, this cone is new in finite and infinite type. We further answer a question of Michel Brion, by showing that the localization of conv V along a face is always the convex hull of the weights of a parabolically induced module. Finally, as we determine the inclusion relations between faces representation-theoretically from the set of weights, without recourse to convexity, we answer a similar question for highest weight modules over symmetrizable quantum groups.

  • 2 authors
·
Oct 31, 2016

High-order finite element method for atomic structure calculations

We introduce featom, an open source code that implements a high-order finite element solver for the radial Schrödinger, Dirac, and Kohn-Sham equations. The formulation accommodates various mesh types, such as uniform or exponential, and the convergence can be systematically controlled by increasing the number and/or polynomial order of the finite element basis functions. The Dirac equation is solved using a squared Hamiltonian approach to eliminate spurious states. To address the slow convergence of the κ = ±1 states due to divergent derivatives at the origin, we incorporate known asymptotic forms into the solutions. We achieve a high level of accuracy (10^{-8} Hartree) for total energies and eigenvalues of heavy atoms such as uranium in both Schrödinger and Dirac Kohn-Sham solutions. We provide detailed convergence studies and computational parameters required to attain commonly required accuracies. Finally, we compare our results with known analytic results as well as the results of other methods. In particular, we calculate benchmark results for atomic numbers (Z) from 1 to 92, verifying current benchmarks. We demonstrate significant speedup compared to the state-of-the-art shooting solver dftatom. An efficient, modular Fortran 2008 implementation is provided under an open source, permissive license, including examples and tests, wherein particular emphasis is placed on the independence (no global variables), reusability, and generality of the individual routines.

  • 8 authors
·
Jul 11, 2023
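
For readers unfamiliar with the radial Schrödinger problem this abstract refers to, the sketch below solves it for hydrogen (Z = 1, Hartree atomic units) on a uniform mesh. It deliberately swaps in a second-order finite-difference discretization with a dense eigensolver, not the high-order finite element method that featom implements, so it reaches only about four digits rather than 10^{-8} Hartree. The function name `hydrogen_radial_energies` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def hydrogen_radial_energies(r_max=30.0, n_points=1500, l=0, n_states=2):
    """Lowest eigenvalues of -u''/2 + [l(l+1)/(2 r^2) - 1/r] u = E u.

    Assumptions: Z = 1, Hartree atomic units, a uniform mesh, and
    Dirichlet conditions u(0) = u(r_max) = 0. Second-order central
    differences replace the finite element basis used by featom.
    """
    h = r_max / (n_points + 1)
    r = h * np.arange(1, n_points + 1)  # interior mesh points only

    # Kinetic term -(1/2) d^2/dr^2 gives 1/h^2 on the diagonal and
    # -1/(2 h^2) on the off-diagonals; the potential is diagonal.
    diagonal = 1.0 / h**2 + l * (l + 1) / (2.0 * r**2) - 1.0 / r
    off_diagonal = np.full(n_points - 1, -0.5 / h**2)

    hamiltonian = (np.diag(diagonal)
                   + np.diag(off_diagonal, 1)
                   + np.diag(off_diagonal, -1))
    return np.linalg.eigvalsh(hamiltonian)[:n_states]

energies = hydrogen_radial_energies()
print(energies)  # compare with the analytic values -1/(2 n^2)
```

The two lowest l = 0 eigenvalues can be compared against the analytic values -1/2 and -1/8 Hartree. Shrinking the mesh spacing improves the agreement only quadratically, which is the kind of slow convergence that motivates high-order basis functions and non-uniform meshes.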