Errors installing DAM
Hello!
I was following the directions on the DAM model page for installation. I am on a Linux HPC. After cloning the repo, when I try to run
pip install -r requirements.txt
I get the following error:
ERROR: Could not find a version that satisfies the requirement pytorch==2.6.0 (from versions: 0.1.2, 1.0.2)
ERROR: No matching distribution found for pytorch
I replicated this error (both on the cluster and on my Mac) by trying to install just that version of torch in a fresh conda env, using the directions in their documentation:
https://pytorch.org/get-started/previous-versions/ (and go to v2.6.0).
Note that torch uses 'torch' rather than 'pytorch' as the package name, but I get this error in both cases.
I will try a newer version of torch (since I can successfully install a current version), but I wanted to check in case there are versioning issues w.r.t. testing, and in case others run into this too!
Cheers,
Rahul
Just an update! If I instead install the most recent versions of the packages in requirements.txt from PyPI, I get a Python error:
from pipeline import Pipeline
pipeline = Pipeline()
Loading weights: 100%|████████████████████████████████| 479/479 [00:00<00:00, 876.23it/s]
Loading weights: 100%|████████████████████████████████| 479/479 [00:00<00:00, 1652.40it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pipeline = Pipeline()
File "/orcd/home/002/rfbrito/modeling/dam/pipeline.py", line 25, in __init__
state_dict = torch.load(checkpoint, map_location=device)
File "/home/rfbrito/miniconda3/envs/dam/lib/python3.14/site-packages/torch/serialization.py", line 1572, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error:
Unsupported operand 118
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
If I try to update torch.load() to add weights_only=False (not sure if this is even what you want), I get:
pipeline = Pipeline()
...
Loading weights: 100%|████████████████████████████████| 479/479 [00:00<00:00, 1272.81it/s]
Loading weights: 100%|████████████████████████████████| 479/479 [00:00<00:00, 1650.12it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pipeline = Pipeline()
File "/orcd/home/002/rfbrito/modeling/dam/pipeline.py", line 25, in __init__
state_dict = torch.load(checkpoint, map_location=device, weights_only=False)
File "/home/rfbrito/miniconda3/envs/dam/lib/python3.14/site-packages/torch/serialization.py", line 1573, in load
return _legacy_load(
opened_file, map_location, pickle_module, **pickle_load_args
)
File "/home/rfbrito/miniconda3/envs/dam/lib/python3.14/site-packages/torch/serialization.py", line 1822, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
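For what it's worth, the "invalid load key, 'v'." error is reproducible with plain pickle on any text file starting with the letter "v", since 'v' is not a valid pickle opcode. A small check (no torch needed) that seems to confirm the loader isn't seeing pickle data at all:

```python
import pickle

# 'v' (the first byte of a file beginning with "version ...") is not a
# valid pickle opcode, so unpickling fails before any weights are read.
try:
    pickle.loads(b"version https://git-lfs.github.com/spec/v1\n")
except pickle.UnpicklingError as e:
    print(e)  # invalid load key, 'v'.
```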
So I think I am stuck here. Thanks for your help when you get the chance!
I can reproduce this when trying to install via pip. When I made the requirements file, I only tested it with mamba, where it still works for me via
mamba env create -n dam -f requirements.txt
mamba activate dam
Could you try installing via mamba and see if that works for you? I believe it should be possible with pip but requires pointing it at the right index. If this works I can update the docs.
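For pip, something along these lines might work; this is an untested sketch assuming a CUDA 12.4 build, so check https://pytorch.org/get-started/previous-versions/ for the exact index URL matching your setup:

```shell
# Sketch only: point pip at the PyTorch wheel index for torch 2.6.0
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
```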
I was able to get the install working and pipeline = Pipeline() to run properly after a minor tweak to the command, and the code to run after some changes to the script.
For the install, I ran this to add the channels on my cluster:
mamba env create -n dam -f requirements.txt -c conda-forge -c pytorch -c nvidia
However, I was still getting this error:
>>>pipeline = Pipeline()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/orcd/home/002/rfbrito/modeling/dam/pipeline.py", line 25, in __init__
state_dict = torch.load(checkpoint, map_location=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/orcd/home/002/rfbrito/.conda/envs/dam/lib/python3.12/site-packages/torch/serialization.py", line 1494, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 118
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
So I made the fix below, per the error (with unchanged lines shown before and after):
self.device = device
self.model = Classifier(**config)
self.preprocessor = Preprocessor(**self.model.preprocessor_config)
state_dict = torch.load(checkpoint, map_location=device, weights_only=False) # change here to add this flag
self.model.load_state_dict(state_dict)
self.model.to(self.device)
self.model.eval()
And I got the same error as before. After debugging with Claude, I was able to figure out that dam3.1.ckpt was just a pointer file (only 143 bytes), not the actual checkpoint. I did the below:
# Remove the pointer file
rm dam3.1.ckpt
# Download the real file directly
wget https://huggingface.co/KintsugiHealth/dam/resolve/main/dam3.1.ckpt
And then pipeline = Pipeline() worked!
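In case it helps others, here's a small check I could have used to spot the problem sooner. It's a heuristic sketch (the real check on my setup would be is_lfs_pointer("dam3.1.ckpt")):

```python
import os
import tempfile

def is_lfs_pointer(path):
    """Heuristic: Git LFS pointer files are tiny text files that begin
    with 'version https://git-lfs.github.com/spec/v1'."""
    if os.path.getsize(path) > 1024:  # real checkpoints are far larger
        return False
    with open(path, "rb") as f:
        return f.read(7) == b"version"

# Demo on a fake pointer file written to a temp location:
with tempfile.NamedTemporaryFile("w", suffix=".ckpt", delete=False) as f:
    f.write("version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 143\n")
print(is_lfs_pointer(f.name))  # True
os.unlink(f.name)
```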
A side note, I do get this warning:
>>>from pipeline import Pipeline
/orcd/home/002/rfbrito/.conda/envs/dam/lib/python3.12/site-packages/sympy/external/gmpy.py:139: UserWarning: gmpy2 version is too old to use (2.0.0 or newer required)
gmpy = import_module('gmpy2', min_module_version=_GMPY2_MIN_VERSION,
Thanks! I can add details on the mamba portion to the documentation.
Did you have git lfs and/or git xet installed? These should enable you to fetch the real file rather than the pointer without needing a separate wget command. I think you don't need to make the weights_only change once you have the real file, right?
This gmpy2 warning is weird. I'm getting it too now, but I don't remember seeing it before. I'm not sure if this could be related to only having tested the instructions on a GPU machine. According to mamba, the version of gmpy2 installed is the latest 2.3.0, but according to import gmpy2; print(gmpy2.__version__) it's 0.0.0. I tried a few of google's suggestions on that with no luck. Please let me know if it seems to be a blocker and/or you find a workaround.
Thank you! I don't have git lfs; I am trying to figure out a way to but I keep running into permissions errors on the cluster. Trying to see if I can figure that out.
gmpy2 was not a blocker!
Fortunately I was able to get the model running (very small note, had to update pipeline.run_on_file(file, quantized=True) per the getting started steps in the data card to pipeline.run_on_file(file, quantized=True) )
You're welcome! I might be missing something obvious or there might be a copy-paste error but it looks like you wrote pipeline.run_on_file(file, quantized=True) twice. What was the change?
Oh, copy-paste error, my bad! "quantize" vs "quantized" (with no d) is what I meant to say. It's different between the docs and what's in the code, unless I've truly lost my mind :)
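To make the mismatch concrete, here's a hypothetical sketch (run_on_file here is a stand-in, not the real pipeline code): calling a function with a keyword argument the signature doesn't define raises a TypeError naming the bad keyword, which is what happens when the docs and the code disagree on the spelling.

```python
# Stand-in for the real method: signature uses "quantize", per the code
def run_on_file(file, quantize=True):
    return (file, quantize)

try:
    run_on_file("audio.wav", quantized=True)  # "quantized", per the docs
except TypeError as e:
    print(e)  # ...got an unexpected keyword argument 'quantized'
```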