La llamada a ProcessInput del hardware de gráficos Intel H264 MFT falla después de alimentar algunas muestras de entrada, lo mismo funciona bien con el hardware MFT de Nvidia

Estoy capturando el escritorio usando DesktopDuplication API y convirtiendo las muestras de RGBA a NV12 en GPU y alimentando el mismo al hardware MediaFoundation H264 MFT. Esto funciona bien con los gráficos de Nvidia, y también con los codificadores de software, pero falla cuando solo está disponible el hardware de gráficos Intel MFT. El código funciona bien en la misma máquina de gráficos Intel si recurro a Software MFT. También me aseguré de que la codificación se realice en hardware en máquinas gráficas Nvidia.

En los gráficos Intel, MFT devuelve MEError ( "Error no especificado" ), que ocurre solo después de que se alimenta la primera muestra, y las llamadas posteriores a ProcessInput (cuando el generador de eventos activa METransformNeedInput) devuelve "La persona que llama actualmente no acepta más entradas" . Es raro que MFT consuma algunas muestras más antes de devolver estos errores. Este comportamiento es confuso, estoy alimentando una muestra solo cuando el generador de eventos activa METransformNeedInput de forma asincrónica a través de IMFAsyncCallback, y también compruebo correctamente si METransformHaveOutput se activa tan pronto como se alimenta una muestra. Esto realmente me desconcierta cuando la misma lógica asincrónica funciona bien con los codificadores de hardware MFT y Microsoft de Nvidia.

También hay una pregunta similar sin resolver en el foro de Intel. Mi código es similar al mencionado en el subproceso de Intel, excepto por el hecho de que también estoy configurando el administrador de dispositivos d3d en el codificador como se muestra a continuación.

Y, hay otros tres subprocesos de desbordamiento de pila que informan un problema similar sin una solución dada ( codificador MFTransform-> ProcessInput devuelve E_FAIL y Cómo crear IMFSample a partir de la textura D11 para el codificador Intel MFT y MFT asíncrono no envía el evento MFTransformHaveOutput (Decodificador Intel Hardware MJPEG MFT) ). He probado todas las opciones posibles sin mejorar esto.

El código del convertidor de color se toma de muestras de Intel Media SDK. También he subido mi código completo aquí .

Método para configurar el administrador d3d:

void SetD3dManager() {

    HRESULT hr = S_OK;

    if (!deviceManager) {

        // Create device manager
        hr = MFCreateDXGIDeviceManager(&resetToken, &deviceManager);
    }

    if (SUCCEEDED(hr)) 
    {
        if (!pD3dDevice) {

            pD3dDevice = GetDeviceDirect3D(0);
        }
    }

    if (pD3dDevice) {

        // NOTE: Getting ready for multi-threaded operation
        const CComQIPtr<ID3D10Multithread> pMultithread = pD3dDevice;
        pMultithread->SetMultithreadProtected(TRUE);

        hr = deviceManager->ResetDevice(pD3dDevice, resetToken);
        CHECK_HR(_pTransform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(deviceManager.p)), "Failed to set device manager.");
    }
    else {
        cout << "Failed to get d3d device";
    }
}

Getd3ddevice:

CComPtr<ID3D11Device> GetDeviceDirect3D(UINT idxVideoAdapter)
{
    // Create DXGI factory:
    CComPtr<IDXGIFactory1> dxgiFactory;
    DXGI_ADAPTER_DESC1 dxgiAdapterDesc;

    // Direct3D feature level codes and names:

    struct KeyValPair { int code; const char* name; };

    const KeyValPair d3dFLevelNames[] =
    {
        KeyValPair{ D3D_FEATURE_LEVEL_9_1, "Direct3D 9.1" },
        KeyValPair{ D3D_FEATURE_LEVEL_9_2, "Direct3D 9.2" },
        KeyValPair{ D3D_FEATURE_LEVEL_9_3, "Direct3D 9.3" },
        KeyValPair{ D3D_FEATURE_LEVEL_10_0, "Direct3D 10.0" },
        KeyValPair{ D3D_FEATURE_LEVEL_10_1, "Direct3D 10.1" },
        KeyValPair{ D3D_FEATURE_LEVEL_11_0, "Direct3D 11.0" },
        KeyValPair{ D3D_FEATURE_LEVEL_11_1, "Direct3D 11.1" },
    };

    // Feature levels for Direct3D support
    const D3D_FEATURE_LEVEL d3dFeatureLevels[] =
    {
        D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_11_0,
        D3D_FEATURE_LEVEL_10_1,
        D3D_FEATURE_LEVEL_10_0,
        D3D_FEATURE_LEVEL_9_3,
        D3D_FEATURE_LEVEL_9_2,
        D3D_FEATURE_LEVEL_9_1,
    };

    constexpr auto nFeatLevels = static_cast<UINT> ((sizeof d3dFeatureLevels) / sizeof(D3D_FEATURE_LEVEL));

    CComPtr<IDXGIAdapter1> dxgiAdapter;
    D3D_FEATURE_LEVEL featLevelCodeSuccess;
    CComPtr<ID3D11Device> d3dDx11Device;

    std::wstring_convert<std::codecvt_utf8<wchar_t>> transcoder;

    HRESULT hr = CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
    CHECK_HR(hr, "Failed to create DXGI factory");

    // Get a video adapter:
    dxgiFactory->EnumAdapters1(idxVideoAdapter, &dxgiAdapter);

    // Get video adapter description:
    dxgiAdapter->GetDesc1(&dxgiAdapterDesc);

    CHECK_HR(hr, "Failed to retrieve DXGI video adapter description");

    std::cout << "Selected DXGI video adapter is \'"
        << transcoder.to_bytes(dxgiAdapterDesc.Description) << '\'' << std::endl;

    // Create Direct3D device:
    hr = D3D11CreateDevice(
        dxgiAdapter,
        D3D_DRIVER_TYPE_UNKNOWN,
        nullptr,
        (0 * D3D11_CREATE_DEVICE_SINGLETHREADED) | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
        d3dFeatureLevels,
        nFeatLevels,
        D3D11_SDK_VERSION,
        &d3dDx11Device,
        &featLevelCodeSuccess,
        nullptr
    );

    // Might have failed for lack of Direct3D 11.1 runtime:
    if (hr == E_INVALIDARG)
    {
        // Try again without Direct3D 11.1:
        hr = D3D11CreateDevice(
            dxgiAdapter,
            D3D_DRIVER_TYPE_UNKNOWN,
            nullptr,
            (0 * D3D11_CREATE_DEVICE_SINGLETHREADED) | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
            d3dFeatureLevels + 1,
            nFeatLevels - 1,
            D3D11_SDK_VERSION,
            &d3dDx11Device,
            &featLevelCodeSuccess,
            nullptr
        );
    }

    // Get name of Direct3D feature level that succeeded upon device creation:
    std::cout << "Hardware device supports " << std::find_if(
        d3dFLevelNames,
        d3dFLevelNames + nFeatLevels,
        [featLevelCodeSuccess](const KeyValPair& entry)
        {
            return entry.code == featLevelCodeSuccess;
        }
    )->name << std::endl;

done:

    return d3dDx11Device;
}

Implementación de devolución de llamada asíncrona:

struct EncoderCallbacks : IMFAsyncCallback
{
    EncoderCallbacks(IMFTransform* encoder)
    {
        TickEvent = CreateEvent(0, FALSE, FALSE, 0);
        _pEncoder = encoder;
    }

    ~EncoderCallbacks()
    {
        eventGen = nullptr;
        CloseHandle(TickEvent);
    }

    bool Initialize() {

        _pEncoder->QueryInterface(IID_PPV_ARGS(&eventGen));

        if (eventGen) {

            eventGen->BeginGetEvent(this, 0);
            return true;
        }

        return false;
    }

    // dummy IUnknown impl
    virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void** ppvObject) override { return E_NOTIMPL; }
    virtual ULONG STDMETHODCALLTYPE AddRef(void) override { return 1; }
    virtual ULONG STDMETHODCALLTYPE Release(void) override { return 1; }

    virtual HRESULT STDMETHODCALLTYPE GetParameters(DWORD* pdwFlags, DWORD* pdwQueue) override
    {
        // we return immediately and don't do anything except signaling another thread
        *pdwFlags = MFASYNC_SIGNAL_CALLBACK;
        *pdwQueue = MFASYNC_CALLBACK_QUEUE_IO;
        return S_OK;
    }

    virtual HRESULT STDMETHODCALLTYPE Invoke(IMFAsyncResult* pAsyncResult) override
    {
        IMFMediaEvent* event = 0;
        eventGen->EndGetEvent(pAsyncResult, &event);
        if (event)
        {
            MediaEventType type;
            event->GetType(&type);
            switch (type)
            {
            case METransformNeedInput: InterlockedIncrement(&NeedsInput); break;
            case METransformHaveOutput: InterlockedIncrement(&HasOutput); break;
            }
            event->Release();
            SetEvent(TickEvent);
        }

        eventGen->BeginGetEvent(this, 0);
        return S_OK;
    }

    CComQIPtr<IMFMediaEventGenerator> eventGen = nullptr;
    HANDLE TickEvent;
    IMFTransform* _pEncoder = nullptr;

    unsigned int NeedsInput = 0;
    unsigned int HasOutput = 0;
};

Generar método de muestra:

bool GenerateSampleAsync() {

    DWORD processOutputStatus = 0;
    HRESULT mftProcessOutput = S_OK;
    bool frameSent = false;

    // Create sample
    CComPtr<IMFSample> currentVideoSample = nullptr;

    MFT_OUTPUT_STREAM_INFO StreamInfo;

    // wait for any callback to come in
    WaitForSingleObject(_pEventCallback->TickEvent, INFINITE);

    while (_pEventCallback->NeedsInput) {

        if (!currentVideoSample) {

            (pDesktopDuplication)->releaseBuffer();
            (pDesktopDuplication)->cleanUpCurrentFrameObjects();

            bool bTimeout = false;

            if (pDesktopDuplication->GetCurrentFrameAsVideoSample((void**)& currentVideoSample, waitTime, bTimeout, deviceRect, deviceRect.Width(), deviceRect.Height())) {

                prevVideoSample = currentVideoSample;
            }
            // Feed the previous sample to the encoder in case of no update in display
            else {
                currentVideoSample = prevVideoSample;
            }
        }

        if (currentVideoSample)
        {
            InterlockedDecrement(&_pEventCallback->NeedsInput);
            _frameCount++;

            CHECK_HR(currentVideoSample->SetSampleTime(mTimeStamp), "Error setting the video sample time.");
            CHECK_HR(currentVideoSample->SetSampleDuration(VIDEO_FRAME_DURATION), "Error getting video sample duration.");

            CHECK_HR(_pTransform->ProcessInput(inputStreamID, currentVideoSample, 0), "The resampler H264 ProcessInput call failed.");

            mTimeStamp += VIDEO_FRAME_DURATION;
        }
    }

    while (_pEventCallback->HasOutput) {

        CComPtr<IMFSample> mftOutSample = nullptr;
        CComPtr<IMFMediaBuffer> pOutMediaBuffer = nullptr;

        InterlockedDecrement(&_pEventCallback->HasOutput);

        CHECK_HR(_pTransform->GetOutputStreamInfo(outputStreamID, &StreamInfo), "Failed to get output stream info from H264 MFT.");

        CHECK_HR(MFCreateSample(&mftOutSample), "Failed to create MF sample.");
        CHECK_HR(MFCreateMemoryBuffer(StreamInfo.cbSize, &pOutMediaBuffer), "Failed to create memory buffer.");
        CHECK_HR(mftOutSample->AddBuffer(pOutMediaBuffer), "Failed to add sample to buffer.");

        MFT_OUTPUT_DATA_BUFFER _outputDataBuffer;
        memset(&_outputDataBuffer, 0, sizeof _outputDataBuffer);
        _outputDataBuffer.dwStreamID = outputStreamID;
        _outputDataBuffer.dwStatus = 0;
        _outputDataBuffer.pEvents = nullptr;
        _outputDataBuffer.pSample = mftOutSample;

        mftProcessOutput = _pTransform->ProcessOutput(0, 1, &_outputDataBuffer, &processOutputStatus);

        if (mftProcessOutput != MF_E_TRANSFORM_NEED_MORE_INPUT)
        {
            if (_outputDataBuffer.pSample) {

                CComPtr<IMFMediaBuffer> buf = NULL;
                DWORD bufLength;
                CHECK_HR(_outputDataBuffer.pSample->ConvertToContiguousBuffer(&buf), "ConvertToContiguousBuffer failed.");

                if (buf) {

                    CHECK_HR(buf->GetCurrentLength(&bufLength), "Get buffer length failed.");
                    BYTE* rawBuffer = NULL;

                    fFrameSize = bufLength;
                    fDurationInMicroseconds = 0;
                    gettimeofday(&fPresentationTime, NULL);

                    buf->Lock(&rawBuffer, NULL, NULL);
                    memmove(fTo, rawBuffer, fFrameSize > fMaxSize ? fMaxSize : fFrameSize);

                    bytesTransfered += bufLength;

                    FramedSource::afterGetting(this);

                    buf->Unlock();

                    frameSent = true;
                }
            }

            if (_outputDataBuffer.pEvents)
                _outputDataBuffer.pEvents->Release();
        }
        else if (MF_E_TRANSFORM_STREAM_CHANGE == mftProcessOutput) {

            // some encoders want to renegotiate the output format. 
            if (_outputDataBuffer.dwStatus & MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE)
            {
                CComPtr<IMFMediaType> pNewOutputMediaType = nullptr;
                HRESULT res = _pTransform->GetOutputAvailableType(outputStreamID, 1, &pNewOutputMediaType);

                res = _pTransform->SetOutputType(0, pNewOutputMediaType, 0);//setting the type again
                CHECK_HR(res, "Failed to set output type during stream change");
            }
        }
        else {
            HandleFailure();
        }
    }

    return frameSent;
}

Crear muestra de video y conversión de color:

bool GetCurrentFrameAsVideoSample(void **videoSample, int waitTime, bool &isTimeout, CRect &deviceRect, int surfaceWidth, int surfaceHeight)
{

FRAME_DATA currentFrameData;

m_LastErrorCode = m_DuplicationManager.GetFrame(&currentFrameData, waitTime, &isTimeout);

if (!isTimeout && SUCCEEDED(m_LastErrorCode)) {

    m_CurrentFrameTexture = currentFrameData.Frame;

    if (!pDstTexture) {

        D3D11_TEXTURE2D_DESC desc;
        ZeroMemory(&desc, sizeof(D3D11_TEXTURE2D_DESC));

        desc.Format = DXGI_FORMAT_NV12;
        desc.Width = surfaceWidth;
        desc.Height = surfaceHeight;
        desc.MipLevels = 1;
        desc.ArraySize = 1;
        desc.SampleDesc.Count = 1;
        desc.CPUAccessFlags = 0;
        desc.Usage = D3D11_USAGE_DEFAULT;
        desc.BindFlags = D3D11_BIND_RENDER_TARGET;

        m_LastErrorCode = m_Id3d11Device->CreateTexture2D(&desc, NULL, &pDstTexture);
    }

    if (m_CurrentFrameTexture && pDstTexture) {

        // Copy diff area texels to new temp texture
        //m_Id3d11DeviceContext->CopySubresourceRegion(pNewTexture, D3D11CalcSubresource(0, 0, 1), 0, 0, 0, m_CurrentFrameTexture, 0, NULL);

        HRESULT hr = pColorConv->Convert(m_CurrentFrameTexture, pDstTexture);

        if (SUCCEEDED(hr)) { 

            CComPtr<IMFMediaBuffer> pMediaBuffer = nullptr;

            MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), pDstTexture, 0, FALSE, (IMFMediaBuffer**)&pMediaBuffer);

            if (pMediaBuffer) {

                CComPtr<IMF2DBuffer> p2DBuffer = NULL;
                DWORD length = 0;
                (((IMFMediaBuffer*)pMediaBuffer))->QueryInterface(__uuidof(IMF2DBuffer), reinterpret_cast<void**>(&p2DBuffer));
                p2DBuffer->GetContiguousLength(&length);
                (((IMFMediaBuffer*)pMediaBuffer))->SetCurrentLength(length);

                //MFCreateVideoSampleFromSurface(NULL, (IMFSample**)videoSample);
                MFCreateSample((IMFSample * *)videoSample);

                if (videoSample) {

                    (*((IMFSample **)videoSample))->AddBuffer((((IMFMediaBuffer*)pMediaBuffer)));
                }

                return true;
            }
        }
    }
}

return false;
}

El controlador de gráficos Intel en la máquina ya está actualizado.

Solo el evento TransformNeedInput se activa todo el tiempo, pero el codificador se queja de que no puede aceptar más entradas. El evento TransformHaveOutput nunca se ha activado.

Actualización: he tratado de burlarme solo de la fuente de entrada (mediante la creación programática de una muestra de animación NV12 rectángulo) dejando todo lo demás intacto. Esta vez, el codificador Intel no se queja de nada, incluso tengo muestras de salida. Excepto el hecho de que el video de salida del codificador Intel está distorsionado, mientras que el codificador Nvidia funciona perfectamente bien.

Además, sigo recibiendo el error ProcessInput para mi fuente NV12 original con codificador Intel. No tengo problemas con Nvidia MFT y los codificadores de software.

Salida del hardware Intel MFT: (Mire la salida del codificador Nvidia)

Salida de hardware Nvidia MFT:

Estadísticas de uso de gráficos de Nvidia:

Estadísticas de uso de gráficos Intel (no entiendo por qué el motor GPU se muestra como decodificación de video):

video-streaming directx h.264 ms-media-foundation hardware-acceleration RAM
fuente

No se muestra ningún código relevante. Es probable que algo salga mal exactamente al recibir la "necesidad de entrada" y proporcionarla ProcessInput.

Roman R.

@RomanR. Si ese es el caso, también podría haber fallado para el software y las MFT de hardware de Nvidia, ¿no? No he mostrado ningún código relacionado con la enumeración de MFT y las configuraciones de entrada y salida porque será redundante, innecesario y demasiado largo para un hilo ya que mencioné que he seguido exactamente el mismo código dado en el foro de Intel ( software.intel.com / es-es / foros / intel-media-sdk / topic / 681571 ). Intentaré actualizar este hilo con los bloques de código necesarios.

Ram

No, no es. Las MFT de hardware de AMD, Intel y NVIDIA se implementan de manera similar pero al mismo tiempo con un comportamiento ligeramente diferente. Los tres funcionan principalmente como MFT asíncronos, por lo que su pregunta es una indicación aparente de que está haciendo algo mal. Sin código es solo una conjetura qué es exactamente. El codificador de software de Microsoft es MFT AFAIR de sincronización, por lo que es muy probable que sea parte de la comunicación con MFT asíncrono donde algo no está bien.

Roman R.

Por cierto, el código de ese enlace del foro de Intel funciona para mí y produce video.

Roman R.

@RomanR. He actualizado el hilo con mi implementación de IMFAsyncCallback, creación de muestras y conversión de color, ProcessInput y ProcessOutput. El convertidor de color simplemente se toma desde aquí ( github.com/NVIDIA/video-sdk-samples/blob/master/… ).

Ram

La llamada a ProcessInput del hardware de gráficos Intel H264 MFT falla después de alimentar algunas muestras de entrada, lo mismo funciona bien con el hardware MFT de Nvidia

Respuestas: